Edge LLM gateway, multi-provider routing, budget tokens and cost metering
Overview
An edge-deployed LLM gateway that sits in front of Anthropic, OpenAI and OpenRouter. It accepts a budget-scoped billing token, looks up the issuer’s real provider keys, routes to the right upstream by model, and returns per-request cost and token usage in response headers.
Why It Exists
Running agents and apps against several LLM providers raises three recurring problems: keeping real API keys out of clients, attributing and capping spend per user, and not rewriting call sites for each provider’s request/response format. This Worker centralizes all three behind one endpoint.
What We Built
A single Cloudflare Worker (src/index.ts) that receives a billing token in the
x-api-key header and validates it against a KV-stored issuer record, checking expiry
and budget (with budgets expressed in microdollars over an hour/day/week/month window).
It detects the target provider from the model name, gpt-/o1/o3/o4/chatgpt-
prefixes route to OpenAI, known claude-* models to Anthropic, and the remainder to
OpenRouter (including custom/self-hosted endpoints), and substitutes the issuer’s own
provider key. Requests and responses are translated between Anthropic and OpenAI formats
so callers can use one shape. Every response carries cost-accounting headers
(X-Request-Cost-USD, X-Input-Tokens, X-Output-Tokens, X-Model), computed from
per-model pricing with support for issuer-level custom pricing overrides. The KV issuer
namespace is shared with a sibling agent-cloud service, so token issuance and spend
tracking stay consistent across the platform.
Technologies & Approach
A stateless Cloudflare Worker keeps the gateway globally close to callers with minimal
ops. KV holds issuer secrets, provider keys and pricing; budget tokens carry a subject,
issuer, expiry and a windowed budget so enforcement is self-describing. A fallback
ANTHROPIC_API_KEY secret covers issuer records without their own key.
Outcome / Impact
A reusable internal building block that decouples applications from LLM providers while enforcing per-user budgets and producing exact cost telemetry on every call, the metering and key-isolation layer behind the studio’s agent/automation platform.
Capabilities Demonstrated
- Multi-provider LLM routing (Anthropic, OpenAI, OpenRouter) by model detection
- Anthropic↔OpenAI request/response format translation
- Budget-scoped billing tokens with windowed spend enforcement
- Per-request cost and token metering via response headers
- Provider-key isolation behind an edge gateway, with custom pricing overrides