
Proxy Design

The proxy worker (apps/proxy-worker/src/index.ts) is a single Cloudflare Worker that handles the entire request lifecycle: authentication, dimension validation, routing, HTTP/streaming passthrough, usage extraction, and event emission.

Responsibilities

  1. Authentication — Validates the Authorization: Bearer token by computing its HMAC-SHA-256 hash and looking it up in KV. Returns 401/403 for invalid or inactive keys.
  2. Dimension validation — Checks X-ASO-* dimension headers against the key's allowed dimension schema. Rejects requests with unknown or invalid dimensions.
  3. Routing — Maps the provider path segment to the correct upstream base URL and forwards the request.
  4. HTTP/streaming passthrough — Forwards upstream responses byte-for-byte, including SSE streams, with no buffering. Respects client aborts (ReadableStream cancellation).
  5. Usage extraction — Parses the final response chunk (or response body for non-streaming) to extract token counts, cost data, and timing information. Runs in ctx.waitUntil to avoid adding latency to the response path.
  6. Event emission — Queues a usage event (or denial event) to the appropriate Cloudflare Queue.
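
The dimension validation step (item 2) can be sketched as follows. This is a minimal illustration, not the worker's actual code: the schema shape (`DimensionSchema`, a map from dimension name to allowed values) and the function name `validateDimensions` are assumptions, since the real key policy format is not shown here.

```typescript
// Hypothetical schema shape: dimension name -> allowed values.
// An empty array is assumed to mean "any non-empty value is accepted".
type DimensionSchema = Record<string, string[]>;

interface ValidationResult {
  ok: boolean;
  errors: string[];
}

const DIMENSION_PREFIX = "x-aso-";

function validateDimensions(
  headers: Record<string, string>,
  schema: DimensionSchema,
): ValidationResult {
  const errors: string[] = [];
  for (const [rawName, value] of Object.entries(headers)) {
    const name = rawName.toLowerCase();
    if (!name.startsWith(DIMENSION_PREFIX)) continue; // not a dimension header
    const dimension = name.slice(DIMENSION_PREFIX.length);
    // Own-property check so names like "constructor" are not treated as known.
    if (!Object.prototype.hasOwnProperty.call(schema, dimension)) {
      errors.push(`unknown dimension: ${dimension}`);
      continue;
    }
    const allowed = schema[dimension];
    if (value.trim() === "") {
      errors.push(`empty value for dimension: ${dimension}`);
    } else if (allowed.length > 0 && !allowed.includes(value)) {
      errors.push(`invalid value for dimension: ${dimension}`);
    }
  }
  return { ok: errors.length === 0, errors };
}
```

A request whose result has `ok: false` would correspond to the "Dimension validation failure" denial (HTTP 400).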

URL Structure

https://proxy.aispendops.com/v1/{provider}/{path}
  • {provider} — One of 14 supported providers: openrouter, openai, anthropic, google, xai, groq, deepinfra, novita, fireworks, perplexity, cerebras, mistral, deepseek, nebius
  • {path} — The upstream API path, forwarded as-is (e.g., chat/completions, messages)

Example: POST https://proxy.aispendops.com/v1/openai/chat/completions
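
As a rough sketch of how this URL structure could be resolved to an upstream target: the function `resolveRoute` and the `UPSTREAM_BASE_URLS` table below are hypothetical, and the two base URLs are assumptions included only to make the example concrete.

```typescript
// Illustrative subset only: the real table would cover all 14 providers, and
// these base URLs are assumptions, not values taken from the worker source.
const UPSTREAM_BASE_URLS = new Map<string, string>([
  ["openai", "https://api.openai.com/v1"],
  ["anthropic", "https://api.anthropic.com/v1"],
]);

interface Route {
  provider: string;
  upstreamUrl: string;
}

// Splits /v1/{provider}/{path} and joins {path} onto the provider's base URL.
// Returns null for malformed paths and for providers not in the table.
function resolveRoute(pathname: string): Route | null {
  const match = /^\/v1\/([^/]+)\/(.+)$/.exec(pathname);
  if (match === null) return null;
  const provider = match[1];
  const path = match[2];
  const base = UPSTREAM_BASE_URLS.get(provider);
  if (base === undefined) return null;
  return { provider, upstreamUrl: `${base}/${path}` };
}
```

In this sketch a `null` result for an unrecognised provider would map to the "Unknown provider" denial (HTTP 400).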

Header Handling

Stripped headers (not forwarded upstream)

Headers removed before forwarding to prevent leaking proxy metadata or conflicting with upstream expectations:

  • host — Replaced with the upstream host
  • cf-* — Cloudflare-specific headers
  • x-aso-* — AI SpendOps dimension headers (consumed by the proxy)
  • cdn-* — CDN headers
  • x-forwarded-*, x-real-ip — Proxy chain headers
  • authorization — Replaced with the upstream API key from the key policy

Passthrough headers

All other headers (e.g., content-type, accept, x-stainless-*, provider-specific headers) are forwarded unchanged.
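
The strip and passthrough rules above might be expressed as a single filter. This is a sketch under assumptions: `buildUpstreamHeaders` is a hypothetical name, headers are modelled as a plain object for brevity, and the `Bearer` scheme for the upstream key is assumed (some providers expect the key in a different header).

```typescript
// Exact header names and prefixes taken from the stripped-headers list above.
const STRIPPED_EXACT = new Set(["host", "authorization", "x-real-ip"]);
const STRIPPED_PREFIXES = ["cf-", "x-aso-", "cdn-", "x-forwarded-"];

function isStripped(name: string): boolean {
  const lower = name.toLowerCase();
  return (
    STRIPPED_EXACT.has(lower) ||
    STRIPPED_PREFIXES.some((prefix) => lower.startsWith(prefix))
  );
}

function buildUpstreamHeaders(
  incoming: Record<string, string>,
  upstreamApiKey: string,
): Record<string, string> {
  const outgoing: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    // Everything not on the strip list passes through unchanged.
    if (!isStripped(name)) outgoing[name.toLowerCase()] = value;
  }
  // Assumption: the upstream key is sent as a bearer token.
  outgoing["authorization"] = `Bearer ${upstreamApiKey}`;
  return outgoing;
}
```

The `host` header is not set explicitly here because an HTTP client normally derives it from the upstream URL.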

Streaming

The proxy streams SSE responses byte-for-byte with zero buffering:

  • Uses TransformStream to pipe upstream response chunks directly to the client
  • No parsing or modification of intermediate chunks
  • Client abort detection: when the client disconnects, the readable side cancels, which tears down the upstream connection
  • The final SSE chunk (containing [DONE] or usage data) is captured and processed in ctx.waitUntil, so usage extraction does not delay the stream
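
To illustrate the last point, here is a minimal sketch of pulling token counts out of a captured copy of the stream's tail. It assumes the OpenAI-style `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`); the function name `extractUsageFromSse` and the `Usage` shape are hypothetical, and other providers may report usage differently.

```typescript
interface Usage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

// Scans SSE text from the end and returns the last `data:` payload that
// carries a usage object. Operates on a captured copy, never on the live stream.
function extractUsageFromSse(tail: string): Usage | null {
  const lines = tail.split(/\r?\n/);
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i];
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    try {
      const parsed = JSON.parse(payload);
      const usage = parsed?.usage;
      if (
        usage &&
        typeof usage.prompt_tokens === "number" &&
        typeof usage.completion_tokens === "number"
      ) {
        const total =
          typeof usage.total_tokens === "number"
            ? usage.total_tokens
            : usage.prompt_tokens + usage.completion_tokens;
        return {
          promptTokens: usage.prompt_tokens,
          completionTokens: usage.completion_tokens,
          totalTokens: total,
        };
      }
    } catch {
      // The captured tail may begin mid-event; skip fragments that do not parse.
    }
  }
  return null;
}
```

Scanning backwards means the sentinel `[DONE]` line is skipped and the usage-bearing chunk just before it is found first.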

Stream Options Injection

For OpenAI-compatible providers (OpenAI, Google, xAI), the proxy injects stream_options: { include_usage: true } into streaming requests if not already present. This ensures the final chunk contains token counts, which are required for usage extraction. The injection is done by parsing the request body, adding the field, and re-serialising — only for streaming requests to these specific providers.
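
A minimal sketch of that parse, add, and re-serialise step is shown below. The function name `injectStreamOptions` is hypothetical, and "already present" is interpreted here as "the client already set `include_usage`", which is an assumption about the worker's behaviour.

```typescript
// Providers named above as receiving the injection.
const INJECTING_PROVIDERS = new Set(["openai", "google", "xai"]);

// Returns the body text to forward upstream. The original text is returned
// untouched whenever injection does not apply, so non-matching requests are
// never re-serialised.
function injectStreamOptions(provider: string, bodyText: string): string {
  if (!INJECTING_PROVIDERS.has(provider)) return bodyText;

  let body: unknown;
  try {
    body = JSON.parse(bodyText);
  } catch {
    return bodyText; // not JSON; leave it for the upstream to reject
  }
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return bodyText;
  }

  const record = body as Record<string, unknown>;
  if (record.stream !== true) return bodyText; // only streaming requests

  const existing = record.stream_options;
  const options: Record<string, unknown> =
    typeof existing === "object" && existing !== null && !Array.isArray(existing)
      ? { ...(existing as Record<string, unknown>) }
      : {};
  // Assumption: respect an explicit client choice, even include_usage: false.
  if ("include_usage" in options) return bodyText;

  options.include_usage = true;
  record.stream_options = options;
  return JSON.stringify(record);
}
```

For example, `injectStreamOptions("openai", '{"model":"m","stream":true}')` yields a body whose `stream_options.include_usage` is `true`, while the same body sent to `anthropic` is returned unchanged.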

Denial Points

The proxy has 8 distinct denial points, each producing a denial event with a specific type and reason:

Stage     | Denial                       | HTTP Status
Pre-auth  | Missing API key header       | 401
Pre-auth  | Invalid key prefix           | 401
Pre-auth  | Hash not found in KV         | 401
Post-auth | Inactive key                 | 403
Post-auth | Unknown provider             | 400
Post-auth | Provider blocked for key     | 403
Post-auth | Dimension validation failure | 400
Post-auth | Model blocked for key        | 403

All denials are queued to denial-events-{env} for audit. See Denial Event Pipeline for details.
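
One way to keep the eight denial points and their status codes in a single place is a typed lookup table. The sketch below is illustrative only: the reason identifiers, the `DenialEvent` fields, and the helper names are hypothetical; only the stages, status codes, and queue name pattern come from this page.

```typescript
type DenialStage = "pre-auth" | "post-auth";

// Hypothetical reason identifiers, one per denial point listed above.
type DenialReason =
  | "missing_api_key"
  | "invalid_key_prefix"
  | "hash_not_found"
  | "inactive_key"
  | "unknown_provider"
  | "provider_blocked"
  | "dimension_validation_failed"
  | "model_blocked";

const DENIALS: Record<DenialReason, { stage: DenialStage; status: number }> = {
  missing_api_key: { stage: "pre-auth", status: 401 },
  invalid_key_prefix: { stage: "pre-auth", status: 401 },
  hash_not_found: { stage: "pre-auth", status: 401 },
  inactive_key: { stage: "post-auth", status: 403 },
  unknown_provider: { stage: "post-auth", status: 400 },
  provider_blocked: { stage: "post-auth", status: 403 },
  dimension_validation_failed: { stage: "post-auth", status: 400 },
  model_blocked: { stage: "post-auth", status: 403 },
};

// Hypothetical event shape; the real denial event schema is not shown here.
interface DenialEvent {
  reason: DenialReason;
  stage: DenialStage;
  status: number;
  timestamp: string;
}

function buildDenialEvent(reason: DenialReason, now: Date = new Date()): DenialEvent {
  const { stage, status } = DENIALS[reason];
  return { reason, stage, status, timestamp: now.toISOString() };
}

// Queue name pattern from the text: denial-events-{env}.
function denialQueueName(env: string): string {
  return `denial-events-${env}`;
}
```

Because `DENIALS` is keyed by the `DenialReason` union, adding a ninth reason without a stage and status would fail type-checking.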