Proxy Design
The proxy worker (apps/proxy-worker/src/index.ts) is a single Cloudflare Worker that handles the entire request lifecycle: authentication, dimension validation, routing, HTTP/streaming passthrough, usage extraction, and event emission.
Responsibilities
- Authentication: Validates the `Authorization: Bearer` token by computing its HMAC-SHA-256 hash and looking it up in KV. Returns 401/403 for invalid or inactive keys.
- Dimension validation: Checks `X-ASO-*` dimension headers against the key's allowed dimension schema. Rejects requests with unknown or invalid dimensions.
- Routing: Maps the provider path segment to the correct upstream base URL and forwards the request.
- HTTP/streaming passthrough: Streams responses byte-for-byte using SSE with no buffering. Respects client aborts (ReadableStream cancellation).
- Usage extraction: Parses the final response chunk (or response body for non-streaming) to extract token counts, cost data, and timing information. Runs in `ctx.waitUntil` to avoid adding latency to the response path.
- Event emission: Queues a usage event (or denial event) to the appropriate Cloudflare Queue.
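As a rough illustration of the usage-extraction step, the sketch below scans the captured tail of an SSE stream for the last `data:` event that carries a `usage` object. It assumes an OpenAI-style payload (`prompt_tokens`, `completion_tokens`); the `extractUsage` function and `Usage` type are hypothetical names, not taken from the worker source, and cost and timing fields are left out.

```typescript
// Hypothetical result shape; the source field names follow the OpenAI-style
// `usage` object, which is an assumption about the upstream payload.
interface Usage {
  promptTokens: number;
  completionTokens: number;
}

// Walk the SSE text backwards and return the last usage object found, if any.
function extractUsage(sseTail: string): Usage | null {
  const lines = sseTail.split(/\r?\n/);
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i];
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    try {
      const usage = JSON.parse(payload)?.usage;
      if (
        usage &&
        typeof usage.prompt_tokens === "number" &&
        typeof usage.completion_tokens === "number"
      ) {
        return {
          promptTokens: usage.prompt_tokens,
          completionTokens: usage.completion_tokens,
        };
      }
    } catch {
      // Truncated or non-JSON event: skip it and keep scanning.
    }
  }
  return null;
}
```

In a Worker, a function like this would typically be called from inside `ctx.waitUntil(...)`, so the parse never sits on the response path.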
URL Structure
https://proxy.aispendops.com/v1/{provider}/{path}
- `{provider}`: One of 14 supported providers: `openrouter`, `openai`, `anthropic`, `google`, `xai`, `groq`, `deepinfra`, `novita`, `fireworks`, `perplexity`, `cerebras`, `mistral`, `deepseek`, `nebius`
- `{path}`: The upstream API path, forwarded as-is (e.g., `chat/completions`, `messages`)
Example: POST https://proxy.aispendops.com/v1/openai/chat/completions
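The mapping from proxy URL to upstream URL can be sketched as follows. This is a minimal sketch, not the worker's actual routing code: `resolveUpstream` and `UPSTREAM_BASE` are hypothetical names, only two of the 14 providers are listed, the base URLs are assumptions made for the example, and forwarding the query string unchanged is also an assumption.

```typescript
// Illustrative subset only. The base URLs here are assumptions for the sketch.
const UPSTREAM_BASE = new Map<string, string>([
  ["openai", "https://api.openai.com/v1"],
  ["anthropic", "https://api.anthropic.com/v1"],
]);

// Split /v1/{provider}/{path} and join {path} onto the provider's base URL.
// Returns null when the URL does not match or the provider is unknown.
function resolveUpstream(requestUrl: string): string | null {
  const url = new URL(requestUrl);
  const match = /^\/v1\/([^/]+)\/(.+)$/.exec(url.pathname);
  if (!match) return null;
  const [, provider, path] = match;
  const base = UPSTREAM_BASE.get(provider);
  if (base === undefined) return null;
  // The path (and, by assumption, the query string) is forwarded as-is.
  return `${base}/${path}${url.search}`;
}
```

A `null` result corresponds to the "Unknown provider" denial in this sketch; a malformed path is treated the same way for brevity.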
Header Handling
Stripped headers (not forwarded upstream)
Headers removed before forwarding to prevent leaking proxy metadata or conflicting with upstream expectations:
- `host`: Replaced with the upstream host
- `cf-*`: Cloudflare-specific headers
- `x-aso-*`: AI SpendOps dimension headers (consumed by the proxy)
- `cdn-*`: CDN headers
- `x-forwarded-*`, `x-real-ip`: Proxy chain headers
- `authorization`: Replaced with the upstream API key from the key policy
Passthrough headers
All other headers (e.g., content-type, accept, x-stainless-*, provider-specific headers) are forwarded unchanged.
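The strip and passthrough rules above could be expressed as a small filter over the standard `Headers` class. This is a sketch under assumptions: `buildUpstreamHeaders` is a hypothetical name, and the upstream key is sent with the `Bearer` scheme purely for illustration (some providers expect a different header or scheme). The `host` header is simply dropped here, on the assumption that `fetch` sets it for the upstream URL.

```typescript
// Exact header names and prefixes that are never forwarded upstream.
const STRIPPED_EXACT = new Set(["host", "x-real-ip", "authorization"]);
const STRIPPED_PREFIXES = ["cf-", "x-aso-", "cdn-", "x-forwarded-"];

function buildUpstreamHeaders(incoming: Headers, upstreamApiKey: string): Headers {
  const out = new Headers();
  incoming.forEach((value, name) => {
    const lower = name.toLowerCase();
    if (STRIPPED_EXACT.has(lower)) return;
    if (STRIPPED_PREFIXES.some((prefix) => lower.startsWith(prefix))) return;
    out.set(name, value); // everything else passes through unchanged
  });
  // Assumption: the upstream accepts a Bearer token in `authorization`.
  out.set("authorization", `Bearer ${upstreamApiKey}`);
  return out;
}
```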
Streaming
The proxy streams SSE responses byte-for-byte with zero buffering:
- Uses `TransformStream` to pipe upstream response chunks directly to the client
- No parsing or modification of intermediate chunks
- Client abort detection: when the client disconnects, the readable side cancels, which tears down the upstream connection
- The final SSE chunk (containing `[DONE]` or usage data) is captured in `waitUntil` for usage extraction without delaying the stream
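One way to get this behaviour is an identity `TransformStream` that forwards each chunk untouched while keeping a rolling copy of the last few kilobytes. The sketch below is an assumption about how such a passthrough might look, not the worker's actual code; `passthroughWithTail` and the `maxTailBytes` limit are hypothetical.

```typescript
// Forward the upstream body unchanged and expose the last bytes of the stream
// as a promise that settles once the stream ends or is aborted.
function passthroughWithTail(
  upstream: Response,
  maxTailBytes = 8192,
): { response: Response; tail: Promise<string> } {
  if (!upstream.body) {
    return { response: upstream, tail: Promise.resolve("") };
  }
  let tailBytes = new Uint8Array(0);
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      controller.enqueue(chunk); // bytes reach the client exactly as received
      // Keep only the trailing maxTailBytes for later usage extraction.
      const merged = new Uint8Array(tailBytes.length + chunk.length);
      merged.set(tailBytes);
      merged.set(chunk, tailBytes.length);
      tailBytes =
        merged.length > maxTailBytes
          ? merged.slice(merged.length - maxTailBytes)
          : merged;
    },
  });
  // If the client cancels `readable`, the pipe fails and, by default, cancels
  // the upstream body too. Errors are swallowed so `tail` always resolves.
  const tail = upstream.body
    .pipeTo(writable)
    .then(() => undefined, () => undefined)
    .then(() => new TextDecoder().decode(tailBytes));
  const response = new Response(readable, {
    status: upstream.status,
    statusText: upstream.statusText,
    headers: upstream.headers,
  });
  return { response, tail };
}
```

In a Worker, the handler would return `response` immediately and pass something like `tail.then(...)` to `ctx.waitUntil`, so extraction happens after the last byte has been sent. Cutting the tail at a byte boundary can split a multi-byte character at the very start of the buffer, which is harmless when only the final event is of interest.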
Stream Options Injection
For OpenAI-compatible providers (OpenAI, Google, xAI), the proxy injects stream_options: { include_usage: true } into streaming requests if not already present. This ensures the final chunk contains token counts, which are required for usage extraction. The injection is done by parsing the request body, adding the field, and re-serialising — only for streaming requests to these specific providers.
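The parse, add, and re-serialise step might look like the sketch below. `injectStreamOptions` is a hypothetical name, and two details are assumptions rather than documented behaviour: "already present" is read as "the client set `include_usage` explicitly", and a body that is not a JSON object is forwarded untouched.

```typescript
// Providers named in the text as receiving the injection.
const INJECT_PROVIDERS = new Set(["openai", "google", "xai"]);

// Return the body with stream_options.include_usage set to true when the
// request is a streaming request to one of the listed providers; otherwise
// return the original string unchanged.
function injectStreamOptions(provider: string, body: string): string {
  if (!INJECT_PROVIDERS.has(provider)) return body;
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return body; // not JSON: leave it for the upstream to reject
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return body;
  }
  const request = parsed as Record<string, unknown>;
  if (request.stream !== true) return body; // non-streaming requests are untouched
  const existing = request.stream_options;
  const options =
    typeof existing === "object" && existing !== null
      ? (existing as Record<string, unknown>)
      : {};
  if ("include_usage" in options) return body; // respect the client's choice
  request.stream_options = { ...options, include_usage: true };
  return JSON.stringify(request);
}
```

Returning the original string when nothing changes avoids a needless re-serialisation, so non-streaming bodies reach the upstream byte-for-byte.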
Denial Points
The proxy has 8 distinct denial points, each producing a denial event with a specific type and reason:
| Stage | Denial | HTTP Status |
|---|---|---|
| Pre-auth | Missing API key header | 401 |
| Pre-auth | Invalid key prefix | 401 |
| Pre-auth | Hash not found in KV | 401 |
| Post-auth | Inactive key | 403 |
| Post-auth | Unknown provider | 400 |
| Post-auth | Provider blocked for key | 403 |
| Post-auth | Dimension validation failure | 400 |
| Post-auth | Model blocked for key | 403 |
All denials are queued to denial-events-{env} for audit. See Denial Event Pipeline for details.
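For reference, the table can be encoded as a typed lookup so that every denial carries its stage and status together. The reason identifiers, the `DENIALS` map, and the JSON body returned by `denialResponse` are all hypothetical; only the stages and HTTP statuses come from the table above.

```typescript
type DenialStage = "pre-auth" | "post-auth";

// Hypothetical reason identifiers, one per row of the table.
type DenialReason =
  | "missing_api_key"
  | "invalid_key_prefix"
  | "hash_not_found"
  | "inactive_key"
  | "unknown_provider"
  | "provider_blocked"
  | "dimension_validation_failed"
  | "model_blocked";

interface DenialSpec {
  stage: DenialStage;
  status: 400 | 401 | 403;
}

const DENIALS: Record<DenialReason, DenialSpec> = {
  missing_api_key: { stage: "pre-auth", status: 401 },
  invalid_key_prefix: { stage: "pre-auth", status: 401 },
  hash_not_found: { stage: "pre-auth", status: 401 },
  inactive_key: { stage: "post-auth", status: 403 },
  unknown_provider: { stage: "post-auth", status: 400 },
  provider_blocked: { stage: "post-auth", status: 403 },
  dimension_validation_failed: { stage: "post-auth", status: 400 },
  model_blocked: { stage: "post-auth", status: 403 },
};

// Build the HTTP response for a denial. The body shape is an assumption.
function denialResponse(reason: DenialReason): Response {
  const { stage, status } = DENIALS[reason];
  return new Response(JSON.stringify({ error: reason, stage }), {
    status,
    headers: { "content-type": "application/json" },
  });
}
```

Using `Record<DenialReason, DenialSpec>` means the compiler flags any denial point that is added to the union without a matching status.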