Usage Enrichment

The usage consumer (apps/usage-consumer) receives raw usage events from the proxy and enriches them with pricing data to produce finalised cost records in ClickHouse.

ExtractedUsage Interface

Every usage event carries these fields (all optional — providers populate different subsets):

Field                   | Type   | Description
------------------------|--------|------------------------------------------------------------------
prompt_tokens           | number | Input tokens
completion_tokens       | number | Output tokens
total_tokens            | number | Sum of prompt + completion (some providers report this directly)
provider_cost           | number | Provider-reported cost (authoritative for OpenRouter)
cache_read_tokens       | number | Tokens served from provider cache
cache_write_tokens      | number | Tokens written to provider cache
reasoning_tokens        | number | Tokens used for chain-of-thought reasoning (OpenAI o-series, etc.)
prompt_audio_tokens     | number | Audio input tokens
prompt_image_tokens     | number | Image input tokens
completion_audio_tokens | number | Audio output tokens
web_search_requests     | number | Web search tool invocations
image_count             | number | Number of images generated
image_size              | string | Generated image dimensions (e.g., 1024x1024)
image_quality           | string | Generated image quality level (e.g., hd, standard)
started_at_ms           | number | Request start timestamp (epoch ms)
first_byte_at_ms        | number | Timestamp of the first response byte (epoch ms)
ended_at_ms             | number | Request end timestamp (epoch ms)
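The table above can be sketched as a TypeScript interface. This is a reconstruction from the field list, not the consumer's actual type definition:

```typescript
// Sketch of the ExtractedUsage shape described in the table above.
// All fields are optional — providers populate different subsets.
interface ExtractedUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  total_tokens?: number;
  provider_cost?: number;
  cache_read_tokens?: number;
  cache_write_tokens?: number;
  reasoning_tokens?: number;
  prompt_audio_tokens?: number;
  prompt_image_tokens?: number;
  completion_audio_tokens?: number;
  web_search_requests?: number;
  image_count?: number;
  image_size?: string;    // e.g. "1024x1024"
  image_quality?: string; // e.g. "hd", "standard"
  started_at_ms?: number;
  first_byte_at_ms?: number;
  ended_at_ms?: number;
}

// A typical chat-completion event populates only the token fields:
const usage: ExtractedUsage = {
  prompt_tokens: 100,
  completion_tokens: 800,
  total_tokens: 900,
};
```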

Provider-Specific Extraction

OpenAI

Standard usage object in the response body or final SSE chunk. Fields map directly. For streaming, stream_options.include_usage is injected by the proxy to ensure the final chunk contains usage data.

Reasoning tokens are reported in completion_tokens_details.reasoning_tokens. These are a subset of completion_tokens, not additive.
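A minimal mapping sketch, assuming the OpenAI `usage` object shape; the helper name is illustrative, not the consumer's actual function:

```typescript
// Shape of OpenAI's usage object (chat completions).
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}

// Hypothetical helper mapping OpenAI fields to the common names.
function extractOpenAIUsage(u: OpenAIUsage) {
  return {
    prompt_tokens: u.prompt_tokens,
    completion_tokens: u.completion_tokens,
    total_tokens: u.total_tokens,
    // Reasoning tokens are a subset of completion_tokens, not additive.
    reasoning_tokens: u.completion_tokens_details?.reasoning_tokens,
  };
}
```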

Anthropic

Usage is split across two SSE events for streaming:

  • message_start — Contains usage.input_tokens, usage.cache_read_input_tokens, and usage.cache_creation_input_tokens
  • message_delta — Contains usage.output_tokens

The proxy merges these into a single usage record.

Cache token handling: Anthropic includes cache tokens within input_tokens (i.e., input_tokens = regular input + cache read + cache write). The usage consumer subtracts cache tokens from prompt_tokens to avoid double-counting when calculating costs.
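The merge and the cache adjustment can be sketched together. Event shapes mirror the Anthropic SSE fields named above; the function name is illustrative, and whether the subtraction happens at merge time or at cost time, the arithmetic is the same:

```typescript
// Fields carried by Anthropic's two streaming usage events.
interface MessageStartUsage {
  input_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}
interface MessageDeltaUsage {
  output_tokens: number;
}

// Illustrative merge of message_start and message_delta into one record.
function mergeAnthropicUsage(start: MessageStartUsage, delta: MessageDeltaUsage) {
  const cacheRead = start.cache_read_input_tokens ?? 0;
  const cacheWrite = start.cache_creation_input_tokens ?? 0;
  return {
    // input_tokens includes cache tokens, so subtract to avoid double-counting.
    prompt_tokens: start.input_tokens - cacheRead - cacheWrite,
    cache_read_tokens: cacheRead,
    cache_write_tokens: cacheWrite,
    completion_tokens: delta.output_tokens,
  };
}
```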

Web search: Anthropic reports web search usage in usage.server_tool_use.web_search_requests.

Google

Uses a different field structure: usageMetadata instead of usage.

  • promptTokenCount maps to prompt_tokens
  • candidatesTokenCount maps to completion_tokens
  • cachedContentTokenCount maps to cache_read_tokens

The proxy normalises these to the standard field names.
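The mapping above as code — field names follow Google's usageMetadata; the helper name is illustrative:

```typescript
// Google reports usage under usageMetadata with camelCase field names.
interface GoogleUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  cachedContentTokenCount?: number;
}

// Illustrative normalisation to the standard field names.
function normalizeGoogleUsage(m: GoogleUsageMetadata) {
  return {
    prompt_tokens: m.promptTokenCount,
    completion_tokens: m.candidatesTokenCount,
    cache_read_tokens: m.cachedContentTokenCount,
  };
}
```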

xAI

Follows the OpenAI format. stream_options.include_usage is injected for streaming.

Image Generation

For image generation endpoints (e.g., OpenAI DALL-E), usage is not token-based. The proxy extracts metadata from the request body:

  • n — Number of images requested (defaults to 1)
  • size — Dimensions string
  • quality — Quality level

Cost is looked up from the pricing blob using a composite key: provider:model:size:quality.
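A one-line sketch of building that key; the function name is illustrative:

```typescript
// Builds the composite pricing-blob key provider:model:size:quality.
function imagePricingKey(provider: string, model: string, size: string, quality: string): string {
  return `${provider}:${model}:${size}:${quality}`;
}
```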

Cost Calculation

Base formula:

cost = tokens * rate / 1_000_000

Where rate is the per-1M-token rate from the pricing blob.
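The base formula as code; the function name is illustrative:

```typescript
// tokens * rate / 1_000_000, where rate is the per-1M-token price.
function tokenCost(tokens: number, ratePerMillion: number): number {
  return (tokens * ratePerMillion) / 1_000_000;
}
```

For example, 1,000 tokens at a $3/1M rate cost $0.003.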

Anthropic Caching Example

Given: 1,000 prompt tokens (reported by Anthropic, which includes cache tokens), 200 cache read tokens, 50 cache write tokens, 500 completion tokens.

adjusted_prompt = 1000 - 200 - 50 = 750  (subtract cache tokens to avoid double-counting)
prompt_cost      = 750 * input_rate / 1_000_000
cache_read_cost  = 200 * (input_rate * cache_read_multiplier) / 1_000_000
cache_write_cost = 50  * (input_rate * cache_write_multiplier) / 1_000_000
completion_cost  = 500 * output_rate / 1_000_000
total_cost       = prompt_cost + cache_read_cost + cache_write_cost + completion_cost
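The same arithmetic in code, with illustrative numbers — the rates and multipliers below are assumptions for the example, not real Anthropic pricing:

```typescript
// Assumed example rates (USD per 1M tokens) and cache multipliers.
const inputRate = 3;
const outputRate = 15;
const cacheReadMultiplier = 0.1;
const cacheWriteMultiplier = 1.25;

const reportedPrompt = 1000; // includes cache tokens, as reported by Anthropic
const cacheRead = 200;
const cacheWrite = 50;
const completion = 500;

// Subtract cache tokens from the reported prompt count to avoid double-counting.
const adjustedPrompt = reportedPrompt - cacheRead - cacheWrite; // 750

const promptCost = (adjustedPrompt * inputRate) / 1_000_000;
const cacheReadCost = (cacheRead * inputRate * cacheReadMultiplier) / 1_000_000;
const cacheWriteCost = (cacheWrite * inputRate * cacheWriteMultiplier) / 1_000_000;
const completionCost = (completion * outputRate) / 1_000_000;
const totalCost = promptCost + cacheReadCost + cacheWriteCost + completionCost;
```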

OpenAI Reasoning Example

Given: 100 prompt tokens, 800 completion tokens (of which 600 are reasoning tokens).

prompt_cost     = 100 * input_rate / 1_000_000
completion_cost = 800 * output_rate / 1_000_000 (reasoning tokens use the same output rate)
total_cost = prompt_cost + completion_cost

Reasoning tokens are not priced separately — they are a subset of completion tokens.

Image Generation Example

Given: 2 images, 1024x1024, HD quality, DALL-E 3.

per_image_cost = pricing_blob["openai:dall-e-3:1024x1024:hd"]
total_cost = 2 * per_image_cost

OpenRouter

When provider_cost is present in the OpenRouter response, it is treated as authoritative and used directly. The consumer does not recalculate from token counts for OpenRouter events that include provider_cost.
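The precedence rule can be sketched as a guard at the top of the cost path — a minimal sketch with illustrative names, reduced to the prompt/completion rates for brevity:

```typescript
interface CostInputs {
  provider_cost?: number;
  prompt_tokens?: number;
  completion_tokens?: number;
}

// Provider-reported cost wins; otherwise recalculate from token counts.
function finalCost(u: CostInputs, inputRate: number, outputRate: number): number {
  if (u.provider_cost !== undefined) {
    return u.provider_cost; // authoritative, e.g. OpenRouter
  }
  const promptCost = ((u.prompt_tokens ?? 0) * inputRate) / 1_000_000;
  const completionCost = ((u.completion_tokens ?? 0) * outputRate) / 1_000_000;
  return promptCost + completionCost;
}
```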