Usage Enrichment
The usage consumer (apps/usage-consumer) receives raw usage events from the proxy and enriches them with pricing data to produce finalised cost records in ClickHouse.
ExtractedUsage Interface
Every usage event carries these fields (all optional — providers populate different subsets):
| Field | Type | Description |
|---|---|---|
| `prompt_tokens` | number | Input tokens |
| `completion_tokens` | number | Output tokens |
| `total_tokens` | number | Sum of prompt + completion (some providers report it directly) |
| `provider_cost` | number | Provider-reported cost (authoritative for OpenRouter) |
| `cache_read_tokens` | number | Tokens served from provider cache |
| `cache_write_tokens` | number | Tokens written to provider cache |
| `reasoning_tokens` | number | Tokens used for chain-of-thought reasoning (OpenAI o-series, etc.) |
| `prompt_audio_tokens` | number | Audio input tokens |
| `prompt_image_tokens` | number | Image input tokens |
| `completion_audio_tokens` | number | Audio output tokens |
| `web_search_requests` | number | Web search tool invocations |
| `image_count` | number | Number of images generated |
| `image_size` | string | Generated image dimensions (e.g., `1024x1024`) |
| `image_quality` | string | Generated image quality level (e.g., `hd`, `standard`) |
| `started_at_ms` | number | Request start timestamp (epoch ms) |
| `first_byte_at_ms` | number | Timestamp of the first response byte (epoch ms) |
| `ended_at_ms` | number | Request end timestamp (epoch ms) |
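The table above can be sketched as a TypeScript interface. This is an illustrative shape inferred from the table, not necessarily the exact declaration in `apps/usage-consumer`:

```typescript
// Sketch of the ExtractedUsage shape described above; every field is
// optional because providers populate different subsets.
interface ExtractedUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  total_tokens?: number;
  provider_cost?: number;
  cache_read_tokens?: number;
  cache_write_tokens?: number;
  reasoning_tokens?: number;
  prompt_audio_tokens?: number;
  prompt_image_tokens?: number;
  completion_audio_tokens?: number;
  web_search_requests?: number;
  image_count?: number;
  image_size?: string;    // e.g. "1024x1024"
  image_quality?: string; // e.g. "hd", "standard"
  started_at_ms?: number;
  first_byte_at_ms?: number;
  ended_at_ms?: number;
}

// A typical OpenAI-style event populates only the token fields.
const example: ExtractedUsage = {
  prompt_tokens: 100,
  completion_tokens: 50,
  total_tokens: 150,
};
```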
Provider-Specific Extraction
OpenAI
Standard usage object in the response body or final SSE chunk. Fields map directly. For streaming, stream_options.include_usage is injected by the proxy to ensure the final chunk contains usage data.
Reasoning tokens are reported in completion_tokens_details.reasoning_tokens. These are a subset of completion_tokens, not additive.
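The OpenAI mapping can be sketched as follows. The input field names come from the OpenAI API; the function itself is illustrative. Note that `reasoning_tokens` is copied, not added, because it is already counted inside `completion_tokens`:

```typescript
// Sketch: map an OpenAI-style usage object onto the normalised fields.
// reasoning_tokens is a subset of completion_tokens, so no arithmetic
// is performed on it here.
function extractOpenAIUsage(usage: {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}) {
  return {
    prompt_tokens: usage.prompt_tokens,
    completion_tokens: usage.completion_tokens,
    total_tokens: usage.total_tokens,
    reasoning_tokens: usage.completion_tokens_details?.reasoning_tokens,
  };
}
```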
Anthropic
Usage is split across two SSE events for streaming:
- `message_start`: contains `usage.input_tokens`, `usage.cache_read_input_tokens`, and `usage.cache_creation_input_tokens`
- `message_delta`: contains `usage.output_tokens`
The proxy merges these into a single usage record.
Cache token handling: Anthropic includes cache tokens within input_tokens (i.e., input_tokens = regular input + cache read + cache write). The usage consumer subtracts cache tokens from prompt_tokens to avoid double-counting when calculating costs.
Web search: Anthropic reports web search usage in usage.server_tool_use.web_search_requests.
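The merge and the cache-token subtraction can be sketched together (in the real pipeline the merge happens in the proxy and the subtraction in the consumer; they are combined here for brevity, and the function name is illustrative):

```typescript
// Sketch: merge Anthropic streaming usage events. message_start carries
// the input-side counts, message_delta the output count. Cache tokens
// are subtracted from input_tokens because Anthropic includes them in it.
function mergeAnthropicUsage(
  start: {
    input_tokens: number;
    cache_read_input_tokens?: number;
    cache_creation_input_tokens?: number;
  },
  delta: { output_tokens: number },
) {
  const cacheRead = start.cache_read_input_tokens ?? 0;
  const cacheWrite = start.cache_creation_input_tokens ?? 0;
  return {
    prompt_tokens: start.input_tokens - cacheRead - cacheWrite,
    cache_read_tokens: cacheRead,
    cache_write_tokens: cacheWrite,
    completion_tokens: delta.output_tokens,
  };
}
```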
Google
Uses a different field structure: usageMetadata instead of usage.
- `promptTokenCount` maps to `prompt_tokens`
- `candidatesTokenCount` maps to `completion_tokens`
- `cachedContentTokenCount` maps to `cache_read_tokens`
The proxy normalises these to the standard field names.
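The normalisation is a straight field rename, sketched here (function name illustrative; the `usageMetadata` field names are from the Gemini API):

```typescript
// Sketch: rename Gemini-style usageMetadata fields to the standard
// ExtractedUsage names.
function normaliseGoogleUsage(meta: {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  cachedContentTokenCount?: number;
}) {
  return {
    prompt_tokens: meta.promptTokenCount,
    completion_tokens: meta.candidatesTokenCount,
    cache_read_tokens: meta.cachedContentTokenCount,
  };
}
```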
xAI
Follows the OpenAI format. stream_options.include_usage is injected for streaming.
Image Generation
For image generation endpoints (e.g., OpenAI DALL-E), usage is not token-based. The proxy extracts metadata from the request body:
- `n`: number of images requested (defaults to 1)
- `size`: dimensions string
- `quality`: quality level
Cost is looked up from the pricing blob using a composite key: provider:model:size:quality.
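Building the composite key is a simple join; the key shape is taken from the doc, while the helper name is illustrative:

```typescript
// Sketch: build the composite pricing-blob key for image generation.
function imagePricingKey(
  provider: string,
  model: string,
  size: string,
  quality: string,
): string {
  return `${provider}:${model}:${size}:${quality}`;
}
```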
Cost Calculation
Base formula:
cost = tokens * rate / 1_000_000
Where rate is the per-1M-token rate from the pricing blob.
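As code, with the rate expressed per 1M tokens (helper name illustrative):

```typescript
// Sketch of the base cost formula: cost = tokens * rate / 1_000_000,
// where ratePerMillion is the per-1M-token rate from the pricing blob.
function tokenCost(tokens: number, ratePerMillion: number): number {
  return (tokens * ratePerMillion) / 1_000_000;
}
```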
Anthropic Caching Example
Given: 1,000 prompt tokens (reported by Anthropic, which includes cache tokens), 200 cache read tokens, 50 cache write tokens, 500 completion tokens.
adjusted_prompt = 1000 - 200 - 50 = 750 (subtract cache tokens to avoid double-counting)
prompt_cost = 750 * input_rate / 1_000_000
cache_read_cost = 200 * (input_rate * cache_read_multiplier) / 1_000_000
cache_write_cost = 50 * (input_rate * cache_write_multiplier) / 1_000_000
completion_cost = 500 * output_rate / 1_000_000
total_cost = prompt_cost + cache_read_cost + cache_write_cost + completion_cost
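The worked example above, as runnable code. The rates and multipliers are illustrative assumptions chosen for the example, not real Anthropic pricing:

```typescript
// Reproduce the Anthropic caching example. All rates below are assumed
// values for illustration only.
const inputRate = 3;               // $ per 1M input tokens (assumed)
const outputRate = 15;             // $ per 1M output tokens (assumed)
const cacheReadMultiplier = 0.1;   // assumed
const cacheWriteMultiplier = 1.25; // assumed

// Subtract cache tokens from the reported prompt tokens to avoid
// double-counting (Anthropic includes them in input_tokens).
const adjustedPrompt = 1000 - 200 - 50; // 750

const promptCost = (adjustedPrompt * inputRate) / 1_000_000;
const cacheReadCost = (200 * inputRate * cacheReadMultiplier) / 1_000_000;
const cacheWriteCost = (50 * inputRate * cacheWriteMultiplier) / 1_000_000;
const completionCost = (500 * outputRate) / 1_000_000;

const totalCost = promptCost + cacheReadCost + cacheWriteCost + completionCost;
```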
OpenAI Reasoning Example
Given: 100 prompt tokens, 800 completion tokens (of which 600 are reasoning tokens).
prompt_cost = 100 * input_rate / 1_000_000
completion_cost = 800 * output_rate / 1_000_000 (reasoning tokens use the same output rate)
total_cost = prompt_cost + completion_cost
Reasoning tokens are not priced separately — they are a subset of completion tokens.
Image Generation Example
Given: 2 images, 1024x1024, HD quality, DALL-E 3.
per_image_cost = pricing_blob["openai:dall-e-3:1024x1024:hd"]
total_cost = 2 * per_image_cost
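The same lookup-and-multiply, sketched with a hypothetical pricing blob (the per-image price below is an assumed value, not real DALL-E 3 pricing):

```typescript
// Reproduce the image-generation example with a hypothetical pricing blob.
const pricingBlob: Record<string, number> = {
  "openai:dall-e-3:1024x1024:hd": 0.08, // assumed per-image price
};

const perImageCost = pricingBlob["openai:dall-e-3:1024x1024:hd"];
const totalImageCost = 2 * perImageCost; // n = 2 images
```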
OpenRouter
When provider_cost is present in the OpenRouter response, it is treated as authoritative and used directly. The consumer does not recalculate from token counts for OpenRouter events that include provider_cost.
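The precedence rule can be sketched as a small resolver (function name and fallback shape are illustrative; only the prefer-`provider_cost` behaviour is from the doc):

```typescript
// Sketch: use provider-reported cost when present (authoritative for
// OpenRouter); otherwise fall back to a token-based calculation.
function resolveCost(
  usage: { provider_cost?: number; prompt_tokens?: number; completion_tokens?: number },
  inputRate: number,  // $ per 1M input tokens
  outputRate: number, // $ per 1M output tokens
): number {
  if (usage.provider_cost !== undefined) {
    return usage.provider_cost; // authoritative: do not recalculate
  }
  return (
    ((usage.prompt_tokens ?? 0) * inputRate +
      (usage.completion_tokens ?? 0) * outputRate) /
    1_000_000
  );
}
```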