Usage Tracking
AI SpendOps captures detailed usage data from every API call, including tokens, cost, and granular breakdowns.
Principles
- Provider usage is authoritative: when the provider reports token counts, those counts are used as-is
- Exactly one usage event per request: no duplicates
- Estimation only as a fallback: if the provider doesn't report usage, tokens are estimated from character count (chars / 4)
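The character-count fallback can be sketched as follows. This is a minimal illustration, not the service's actual code: the function name estimateTokens is hypothetical, and rounding up is an assumption, since the rule only states chars / 4.

```typescript
// Hypothetical helper: estimate tokens as characters / 4.
// Rounding up is an assumption; the documented rule is only "chars / 4".
// Note: String.length counts UTF-16 code units, not user-visible characters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("Hello, world!")); // 13 characters -> 4
```

An estimate like this is only a rough guide. Real tokenizers vary by model and language, which is why provider-reported counts take precedence.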
What gets tracked
Core token counts
Every request captures:
| Field | Description |
|---|---|
| prompt_tokens | Input tokens sent to the model |
| completion_tokens | Output tokens generated by the model |
| total_tokens | Total tokens (input + output) |
Granular breakdowns
When the provider reports them, additional fields are captured:
| Field | Description | Providers |
|---|---|---|
| cache_read_tokens | Tokens served from prompt cache (discounted) | OpenAI, Anthropic, Google, xAI |
| cache_write_tokens | Tokens written to prompt cache | Anthropic |
| reasoning_tokens | Tokens used for reasoning/thinking | OpenAI (o3/o4-mini), Google |
| prompt_audio_tokens | Audio input tokens | OpenAI |
| prompt_image_tokens | Image input tokens | OpenAI |
| completion_audio_tokens | Audio output tokens | OpenAI |
| web_search_requests | Number of web searches performed | Anthropic |
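To illustrate how provider-reported details could map onto these fields, here is a sketch for one provider. It assumes the usage object shape of the OpenAI Chat Completions API (only a subset is shown). The UsageBreakdown type and normalizeOpenAIUsage function are hypothetical names, and the actual mapping used by AI SpendOps may differ.

```typescript
// Subset of the `usage` object in an OpenAI Chat Completions response.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number; audio_tokens?: number };
  completion_tokens_details?: { reasoning_tokens?: number; audio_tokens?: number };
}

// Hypothetical normalized record using the field names from the tables above.
interface UsageBreakdown {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  cache_read_tokens?: number;
  reasoning_tokens?: number;
  prompt_audio_tokens?: number;
  completion_audio_tokens?: number;
}

function normalizeOpenAIUsage(u: OpenAIUsage): UsageBreakdown {
  return {
    prompt_tokens: u.prompt_tokens,
    completion_tokens: u.completion_tokens,
    total_tokens: u.total_tokens,
    // Optional fields stay undefined when the provider omits them.
    cache_read_tokens: u.prompt_tokens_details?.cached_tokens,
    reasoning_tokens: u.completion_tokens_details?.reasoning_tokens,
    prompt_audio_tokens: u.prompt_tokens_details?.audio_tokens,
    completion_audio_tokens: u.completion_tokens_details?.audio_tokens,
  };
}
```

Leaving absent fields undefined, rather than defaulting them to zero, keeps "not reported" distinguishable from "reported as zero".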
Cost tracking
| Field | Description |
|---|---|
| provider_cost | Actual cost reported by the provider (OpenRouter only) |
| total_cost_usd | Calculated cost based on token counts and pricing data |
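As a rough illustration of how a calculated cost could be derived from token counts, the sketch below multiplies each count by a per-million-token rate. The Pricing shape, the calculateCostUsd name, and the example rates are all assumptions for illustration. The real pricing data may also account for cached, reasoning, or audio tokens.

```typescript
// Hypothetical pricing record: USD per one million tokens.
interface Pricing {
  input_per_mtok_usd: number;
  output_per_mtok_usd: number;
}

function calculateCostUsd(
  promptTokens: number,
  completionTokens: number,
  pricing: Pricing,
): number {
  const inputCost = (promptTokens * pricing.input_per_mtok_usd) / 1e6;
  const outputCost = (completionTokens * pricing.output_per_mtok_usd) / 1e6;
  return inputCost + outputCost;
}

// Made-up rates: $2 per million input tokens, $8 per million output tokens.
const exampleRates: Pricing = { input_per_mtok_usd: 2, output_per_mtok_usd: 8 };
console.log(calculateCostUsd(1_000_000, 500_000, exampleRates)); // 6
```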
Timing metrics
| Field | Description |
|---|---|
| started_at_ms | When the request was received |
| first_byte_at_ms | When the first byte arrived from the provider |
| ended_at_ms | When the response completed |
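Assuming all three fields are millisecond timestamps, as the _at_ms suffix suggests, latency figures can be derived by subtraction. The deriveLatency function and its output field names are hypothetical.

```typescript
interface Timing {
  started_at_ms: number;
  first_byte_at_ms: number;
  ended_at_ms: number;
}

// Derive time to first byte and total duration from the three timestamps.
function deriveLatency(t: Timing): { ttfb_ms: number; duration_ms: number } {
  return {
    ttfb_ms: t.first_byte_at_ms - t.started_at_ms,
    duration_ms: t.ended_at_ms - t.started_at_ms,
  };
}

console.log(
  deriveLatency({ started_at_ms: 1000, first_byte_at_ms: 1250, ended_at_ms: 3000 }),
); // { ttfb_ms: 250, duration_ms: 2000 }
```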
Usage sources
Each event records how usage data was obtained:
| Source | Description |
|---|---|
| provider | Usage data from the provider's response |
| estimate | Tokens estimated from character count |
| partial | Some fields from provider, others estimated |
| none | No usage data available |
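One way to read the four sources is as a function of which fields came from the provider and which were estimated. The sketch below encodes that reading. The classifySource function and its inputs are hypothetical, and the actual decision logic is not described in this document.

```typescript
type UsageSource = "provider" | "estimate" | "partial" | "none";

// Classify an event by which fields were reported versus estimated.
function classifySource(
  providerFields: string[],
  estimatedFields: string[],
): UsageSource {
  const hasProvider = providerFields.length > 0;
  const hasEstimate = estimatedFields.length > 0;
  if (hasProvider && hasEstimate) return "partial";
  if (hasProvider) return "provider";
  if (hasEstimate) return "estimate";
  return "none";
}

console.log(classifySource(["prompt_tokens"], ["completion_tokens"])); // partial
```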
Async extraction
Usage data extraction happens asynchronously in ctx.waitUntil after the response is returned to you. This means:
- Zero overhead on your response latency
- Usage events are processed in the background
- Events are enriched with pricing data and written to the analytics database
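The pattern described above can be sketched as follows. The Ctx interface is a local stand-in for a waitUntil-style execution context (such as the one Cloudflare Workers passes to handlers). The respondAndTrack function and the extract callback are hypothetical names. A real proxy would typically stream the response rather than hold it as a string.

```typescript
// Minimal stand-in for an execution context; only waitUntil is used here.
interface Ctx {
  waitUntil(promise: Promise<unknown>): void;
}

// Return the provider body to the caller without awaiting usage extraction.
function respondAndTrack(
  providerBody: string,
  ctx: Ctx,
  extract: (body: string) => Promise<void>,
): string {
  // Hand the extraction promise to the runtime so it can finish in the
  // background; the caller does not wait for it to settle.
  ctx.waitUntil(extract(providerBody));
  return providerBody;
}
```

Because the extraction promise is handed to waitUntil rather than awaited, pricing enrichment and the database write do not delay the response.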