Security & Privacy

InferKit is built privacy-first. This brief is written for developers and for enterprise security reviews; for a deeper architecture discussion, contact [email protected].

The core guarantee: local mode keeps data on-device

On a WebGPU-capable browser, InferKit runs the language model entirely in the visitor’s browser (via WebLLM). In this mode:

Page content, the user’s questions, and the model’s answers are processed on the user’s own device.
None of it is sent to InferKit or any third-party LLM provider.
The InferKit API is contacted once at startup to validate the key and return configuration — it is a control plane, not a data pipe.

Remote inference (available on paid tiers as a fallback when local mode isn’t available) routes to the configured LLM provider; only the page-grounded prompt required to answer is sent.
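
The local-first behavior described above can be sketched as a small decision function. This is only an illustration, not InferKit's SDK: the names InferenceMode, Environment, chooseMode, and offDevicePayload are hypothetical, and it assumes the startup configuration call tells the client whether remote fallback is enabled for the key.

```typescript
// Hypothetical sketch: none of these names come from the InferKit SDK.
type InferenceMode = "local" | "remote" | "unavailable";

interface Environment {
  hasWebGPU: boolean;     // in a real browser, something like `"gpu" in navigator`
  remoteAllowed: boolean; // assumed to come from the startup configuration call
}

// Prefer on-device inference; fall back to remote only when it is enabled.
function chooseMode(env: Environment): InferenceMode {
  if (env.hasWebGPU) return "local";
  if (env.remoteAllowed) return "remote";
  return "unavailable";
}

// What leaves the device in each mode: nothing in local mode,
// only the page-grounded prompt in remote mode.
function offDevicePayload(mode: InferenceMode, groundedPrompt: string): string | null {
  return mode === "remote" ? groundedPrompt : null;
}
```

Keeping the decision in one pure function makes the privacy property easy to audit: the only path that returns a non-null payload is the remote one.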

Key model

Publishable / secret split (Stripe-style). Publishable keys (ik_pub_…) are safe for the browser and fenced by an Origin / domain allowlist. Secret keys (ik_secret_…) are server-side only and support an IP allowlist.
Grace-period rotation. When you rotate a key, the old one keeps working for a configurable window and is then revoked automatically, so rotation causes no downtime.
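
A minimal sketch of how a server might apply the key model above. Only the ik_pub_ and ik_secret_ prefixes come from this brief; the function names, the exact-match origin comparison, and the RotatedKey shape are assumptions made for illustration.

```typescript
// Hypothetical sketch: only the key prefixes are taken from the brief.
type KeyKind = "publishable" | "secret" | "unknown";

function keyKind(key: string): KeyKind {
  if (/^ik_pub_/.test(key)) return "publishable";
  if (/^ik_secret_/.test(key)) return "secret";
  return "unknown";
}

// Publishable keys: accept a request only if its Origin is on the allowlist.
// Exact string match on scheme + host (+ port) is an assumption here.
function originAllowed(origin: string, allowlist: readonly string[]): boolean {
  return allowlist.indexOf(origin) !== -1;
}

interface RotatedKey {
  rotatedAtMs: number | null; // null means the key has not been rotated
  graceMs: number;            // configurable grace window
}

// A rotated key stays usable until its grace window ends.
function isKeyUsable(key: RotatedKey, nowMs: number): boolean {
  if (key.rotatedAtMs === null) return true;
  return nowMs < key.rotatedAtMs + key.graceMs;
}
```

For example, a key rotated at t = 1000 ms with a 500 ms grace window is accepted at t = 1499 and rejected from t = 1500 onward.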

Abuse & bot protection

Challenge before spend. Remote inference on paid keys can require a Cloudflare Turnstile challenge before any provider cost is incurred.
Anomaly auto-suspend. An off-path engine watches for usage spikes, single-IP concentration, and origin mismatches, and can automatically suspend a key.
Hard quotas. Monthly token caps are enforced server-side.
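
Two of the checks above, the monthly token cap and single-IP concentration, reduce to simple arithmetic. The sketch below is illustrative only: the field names, the threshold, and the minimum-sample rule are assumptions, not InferKit's actual detection logic.

```typescript
// Hypothetical sketch: thresholds and names are illustrative assumptions.
interface Usage {
  tokensUsedThisMonth: number;
  monthlyCap: number;
}

// Hard quota: reject a request that would push usage past the cap.
function withinQuota(usage: Usage, requestedTokens: number): boolean {
  return usage.tokensUsedThisMonth + requestedTokens <= usage.monthlyCap;
}

// Share of recent requests that came from the single busiest IP (0 to 1).
function topIpShare(requestIps: readonly string[]): number {
  if (requestIps.length === 0) return 0;
  const counts: Record<string, number> = {};
  let max = 0;
  for (const ip of requestIps) {
    const n = (counts[ip] || 0) + 1;
    counts[ip] = n;
    if (n > max) max = n;
  }
  return max / requestIps.length;
}

// Flag a key when one IP dominates a large enough sample of traffic.
function shouldSuspend(requestIps: readonly string[], threshold: number, minRequests: number): boolean {
  return requestIps.length >= minRequests && topIpShare(requestIps) >= threshold;
}
```

The minimum-sample guard keeps a key with a handful of requests from one developer machine from being flagged as concentrated traffic.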

Platform hardening

Edge origin lock. The API only trusts traffic that arrives through our CDN (Cloudflare), preventing direct-to-origin spoofing of client IPs.
Encrypted BYOK vault. Bring-your-own provider keys are stored encrypted (AES-256-GCM); InferKit can also proxy to your own endpoint so we never hold the key.
Role-based access (RBAC). Organizations have roles (owner/admin/billing/member/viewer); sensitive actions are enforced server-side, not just hidden in the UI.
Least-data telemetry. Every request carries a correlation reference for support without exposing conversation content.
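
Server-side role enforcement and content-free correlation can be sketched together. The five roles are the ones listed above; the Action names and the permission matrix are hypothetical and do not describe InferKit's actual policy.

```typescript
// Roles come from the brief; actions and the matrix are hypothetical.
type Role = "owner" | "admin" | "billing" | "member" | "viewer";
type Action = "rotate_key" | "view_usage" | "update_billing";

const PERMISSIONS: Record<Action, readonly Role[]> = {
  rotate_key: ["owner", "admin"],
  view_usage: ["owner", "admin", "billing", "member", "viewer"],
  update_billing: ["owner", "billing"],
};

// The check runs on the server, so hiding a button in the UI is never
// the only thing standing between a role and a sensitive action.
function authorize(role: Role, action: Action): boolean {
  return PERMISSIONS[action].indexOf(role) !== -1;
}

interface ActionResult {
  status: 200 | 403;
  correlationId: string; // reference for support; carries no conversation content
}

function handleAction(role: Role, action: Action, correlationId: string): ActionResult {
  return { status: authorize(role, action) ? 200 : 403, correlationId };
}
```

A viewer calling rotate_key gets a 403 along with the same correlation reference that support would use to trace the request.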

Compliance posture

Data minimization by design — local mode means there’s often no conversation data to process or store in the first place.
EU/GDPR-friendly — on-device inference avoids cross-border data transfer for the conversation itself.
Formal certifications (e.g., SOC 2) are on the roadmap; reach out for the current status and a security questionnaire.

Questions or a vendor assessment? [email protected]