Usage Enrichment

The usage consumer (apps/usage-consumer) receives raw usage events from the proxy and enriches them with pricing data to produce finalised cost records in ClickHouse.

ExtractedUsage Interface

Every usage event carries these fields (all optional — providers populate different subsets):

Field                   | Type   | Description
------------------------|--------|------------------------------------------------------------------
prompt_tokens           | number | Input tokens
completion_tokens       | number | Output tokens
total_tokens            | number | Sum of prompt + completion (some providers report this directly)
provider_cost           | number | Provider-reported cost (authoritative for OpenRouter)
cache_read_tokens       | number | Tokens served from provider cache
cache_write_tokens      | number | Tokens written to provider cache
reasoning_tokens        | number | Tokens used for chain-of-thought reasoning (OpenAI o-series, etc.)
prompt_audio_tokens     | number | Audio input tokens
prompt_image_tokens     | number | Image input tokens
completion_audio_tokens | number | Audio output tokens
web_search_requests     | number | Web search tool invocations
image_count             | number | Number of images generated
image_size              | string | Generated image dimensions (e.g., 1024x1024)
image_quality           | string | Generated image quality level (e.g., hd, standard)
started_at_ms           | number | Request start timestamp (epoch ms)
first_byte_at_ms        | number | Timestamp of the first response byte (epoch ms)
ended_at_ms             | number | Request end timestamp (epoch ms)
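The table above can be sketched as a TypeScript interface. This is a reconstruction from the field list, not the consumer's actual type definition:

```typescript
// Sketch of the ExtractedUsage shape described in the table above.
// All fields are optional — providers populate different subsets.
interface ExtractedUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  total_tokens?: number;
  provider_cost?: number;
  cache_read_tokens?: number;
  cache_write_tokens?: number;
  reasoning_tokens?: number;
  prompt_audio_tokens?: number;
  prompt_image_tokens?: number;
  completion_audio_tokens?: number;
  web_search_requests?: number;
  image_count?: number;
  image_size?: string;    // e.g. "1024x1024"
  image_quality?: string; // e.g. "hd", "standard"
  started_at_ms?: number;
  first_byte_at_ms?: number;
  ended_at_ms?: number;
}

// A typical chat-completion event populates only the token fields:
const usage: ExtractedUsage = {
  prompt_tokens: 100,
  completion_tokens: 800,
  total_tokens: 900,
};
```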

Provider-Specific Extraction

OpenAI

Standard usage object in the response body or final SSE chunk. Fields map directly. For streaming, stream_options.include_usage is injected by the proxy to ensure the final chunk contains usage data.

Reasoning tokens are reported in completion_tokens_details.reasoning_tokens. These are a subset of completion_tokens, not additive.
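A minimal mapping sketch, assuming the OpenAI `usage` object shape; the helper name is illustrative, not the consumer's actual function:

```typescript
// Shape of OpenAI's usage object (chat completions).
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}

// Hypothetical helper mapping OpenAI fields to the common names.
function extractOpenAIUsage(u: OpenAIUsage) {
  return {
    prompt_tokens: u.prompt_tokens,
    completion_tokens: u.completion_tokens,
    total_tokens: u.total_tokens,
    // Reasoning tokens are a subset of completion_tokens, not additive.
    reasoning_tokens: u.completion_tokens_details?.reasoning_tokens,
  };
}
```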

Anthropic

Usage is split across two SSE events for streaming:

  • message_start — Contains usage.input_tokens, usage.cache_read_input_tokens, and usage.cache_creation_input_tokens
  • message_delta — Contains usage.output_tokens

The proxy merges these into a single usage record.

Cache token handling: Anthropic includes cache tokens within input_tokens (i.e., input_tokens = regular input + cache read + cache write). The usage consumer subtracts cache tokens from prompt_tokens to avoid double-counting when calculating costs.
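The merge and the cache adjustment can be sketched together. Event shapes mirror the Anthropic SSE fields named above; the function name is illustrative, and whether the subtraction happens at merge time or at cost time, the arithmetic is the same:

```typescript
// Fields carried by Anthropic's two streaming usage events.
interface MessageStartUsage {
  input_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}
interface MessageDeltaUsage {
  output_tokens: number;
}

// Illustrative merge of message_start and message_delta into one record.
function mergeAnthropicUsage(start: MessageStartUsage, delta: MessageDeltaUsage) {
  const cacheRead = start.cache_read_input_tokens ?? 0;
  const cacheWrite = start.cache_creation_input_tokens ?? 0;
  return {
    // input_tokens includes cache tokens, so subtract to avoid double-counting.
    prompt_tokens: start.input_tokens - cacheRead - cacheWrite,
    cache_read_tokens: cacheRead,
    cache_write_tokens: cacheWrite,
    completion_tokens: delta.output_tokens,
  };
}
```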

Web search: Anthropic reports web search usage in usage.server_tool_use.web_search_requests.

Google

Uses a different field structure: usageMetadata instead of usage.

  • promptTokenCount maps to prompt_tokens
  • candidatesTokenCount maps to completion_tokens
  • cachedContentTokenCount maps to cache_read_tokens

The proxy normalises these to the standard field names.
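The mapping above as code — field names follow Google's usageMetadata; the helper name is illustrative:

```typescript
// Google reports usage under usageMetadata with camelCase field names.
interface GoogleUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  cachedContentTokenCount?: number;
}

// Illustrative normalisation to the standard field names.
function normalizeGoogleUsage(m: GoogleUsageMetadata) {
  return {
    prompt_tokens: m.promptTokenCount,
    completion_tokens: m.candidatesTokenCount,
    cache_read_tokens: m.cachedContentTokenCount,
  };
}
```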

xAI

Follows the OpenAI format. stream_options.include_usage is injected for streaming.

Image Generation

For image generation endpoints (e.g., OpenAI DALL-E), usage is not token-based. The proxy extracts metadata from the request body:

  • n — Number of images requested (defaults to 1)
  • size — Dimensions string
  • quality — Quality level

Cost is looked up from the pricing blob using a composite key: provider:model:size:quality.
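A one-line sketch of building that key; the function name is illustrative:

```typescript
// Builds the composite pricing-blob key provider:model:size:quality.
function imagePricingKey(provider: string, model: string, size: string, quality: string): string {
  return `${provider}:${model}:${size}:${quality}`;
}
```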

Cost Calculation

Base formula:

cost = tokens * rate / 1_000_000

Where rate is the per-1M-token rate from the pricing blob.
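The base formula as code; the function name is illustrative:

```typescript
// tokens * rate / 1_000_000, where rate is the per-1M-token price.
function tokenCost(tokens: number, ratePerMillion: number): number {
  return (tokens * ratePerMillion) / 1_000_000;
}
```

For example, 1,000 tokens at a $3/1M rate cost $0.003.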

Anthropic Caching Example

Given: 1,000 prompt tokens (reported by Anthropic, which includes cache tokens), 200 cache read tokens, 50 cache write tokens, 500 completion tokens.

adjusted_prompt = 1000 - 200 - 50 = 750  (subtract cache tokens to avoid double-counting)
prompt_cost      = 750 * input_rate / 1_000_000
cache_read_cost  = 200 * (input_rate * cache_read_multiplier) / 1_000_000
cache_write_cost = 50  * (input_rate * cache_write_multiplier) / 1_000_000
completion_cost  = 500 * output_rate / 1_000_000
total_cost       = prompt_cost + cache_read_cost + cache_write_cost + completion_cost
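The same arithmetic in code, with illustrative numbers — the rates and multipliers below are assumptions for the example, not real Anthropic pricing:

```typescript
// Assumed example rates (USD per 1M tokens) and cache multipliers.
const inputRate = 3;
const outputRate = 15;
const cacheReadMultiplier = 0.1;
const cacheWriteMultiplier = 1.25;

const reportedPrompt = 1000; // includes cache tokens, as reported by Anthropic
const cacheRead = 200;
const cacheWrite = 50;
const completion = 500;

// Subtract cache tokens from the reported prompt count to avoid double-counting.
const adjustedPrompt = reportedPrompt - cacheRead - cacheWrite; // 750

const promptCost = (adjustedPrompt * inputRate) / 1_000_000;
const cacheReadCost = (cacheRead * inputRate * cacheReadMultiplier) / 1_000_000;
const cacheWriteCost = (cacheWrite * inputRate * cacheWriteMultiplier) / 1_000_000;
const completionCost = (completion * outputRate) / 1_000_000;
const totalCost = promptCost + cacheReadCost + cacheWriteCost + completionCost;
```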

OpenAI Reasoning Example

Given: 100 prompt tokens, 800 completion tokens (of which 600 are reasoning tokens).

prompt_cost     = 100 * input_rate / 1_000_000
completion_cost = 800 * output_rate / 1_000_000 (reasoning tokens use the same output rate)
total_cost = prompt_cost + completion_cost

Reasoning tokens are not priced separately — they are a subset of completion tokens.

Image Generation Example

Given: 2 images, 1024x1024, HD quality, DALL-E 3.

per_image_cost = pricing_blob["openai:dall-e-3:1024x1024:hd"]
total_cost = 2 * per_image_cost

OpenRouter

When provider_cost is present in the OpenRouter response, it is treated as authoritative and used directly. The consumer does not recalculate from token counts for OpenRouter events that include provider_cost.
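The precedence rule can be sketched as a guard at the top of the cost path — a minimal sketch with illustrative names, reduced to the prompt/completion rates for brevity:

```typescript
interface CostInputs {
  provider_cost?: number;
  prompt_tokens?: number;
  completion_tokens?: number;
}

// Provider-reported cost wins; otherwise recalculate from token counts.
function finalCost(u: CostInputs, inputRate: number, outputRate: number): number {
  if (u.provider_cost !== undefined) {
    return u.provider_cost; // authoritative, e.g. OpenRouter
  }
  const promptCost = ((u.prompt_tokens ?? 0) * inputRate) / 1_000_000;
  const completionCost = ((u.completion_tokens ?? 0) * outputRate) / 1_000_000;
  return promptCost + completionCost;
}
```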