Token reduction, no complexity overhead. Scale from prototype to 100M+ token flows.
Can I optimize existing API calls?
Yes. Drop in a proxy endpoint or modify headers. No model changes needed. Optimization happens at request orchestration layer.
What models are supported?
GPT-4, Claude, Llama, Mixtral, and custom endpoints. Add new models by name. Router automatically learns cost/latency tradeoffs.
Is there a monthly commitment?
No. Starter and Professional are month-to-month. Pay only for tokens used. Enterprise customers usually sign annual agreements.
How much can I save?
Typical reduction: 40-80% depending on traffic patterns and model diversity. Biggest wins come from batching and fallback routing to cheaper models.
What about latency?
Smart batching adds ~50ms for non-real-time. Streaming requests bypass batching. P95 latency target is within 100ms of direct API call.
Can I self-host?
Enterprise only. Includes Docker image, Kubernetes manifests, and 6-month onboarding support.