Pricing Pipeline
The pricing pipeline ensures the usage consumer always has up-to-date, accurate pricing data available in Cloudflare KV with zero per-event KV reads.
Architecture
Helicone API → PricingSync (Azure Function) → SQL → CloudflarePricingSync (Azure Function) → CF KV → usage-consumer
- PricingSync — Azure Function (timer-triggered). Pulls the latest model pricing from the Helicone API, reconciles it against the
pricing_modelstable in SQL, and inserts/updates rows. Computes model expense rankings at the same time. - CloudflarePricingSync — Azure Function (timer + HTTP trigger). Reads all pricing data from SQL, builds a single JSON blob with per-entry SHA-256 hashes, and writes it to Cloudflare KV. Skips the write entirely if the blob hash is unchanged.
KV Keys
| Key | Contents |
|---|---|
pricing:blob | Single JSON blob containing all base pricing rates, aliases, and known models. The usage consumer loads this once on cold start and caches it in memory. |
pricing:version | Monotonically increasing version number. The consumer checks this periodically to detect changes and reload the blob. |
pricing:tenant:{tid} | Per-tenant pricing overlay. Contains customer-specific overrides and markup_pct for margin calculation. |
Single Blob Design
Pricing is stored as a single KV blob rather than individual keys per model. This means the usage consumer performs zero KV reads per event — it loads the blob once, caches it in worker memory, and only reloads when the version changes. This is critical for keeping per-event overhead near zero.
Rate Format
All rates are per-1M-token. For example, a rate of 2.50 means $2.50 per 1,000,000 tokens.
Rates are stored in an abbreviated format:
| Field | Meaning |
|---|---|
i | Input (prompt) token rate |
o | Output (completion) token rate |
crm | Cache read multiplier (fraction of i rate) |
cwm | Cache write multiplier (fraction of i rate) |
ws | Web search request cost (flat rate per request) |
Alias Resolution
Many providers return model IDs with date suffixes or alternative names. The pipeline resolves these to canonical model IDs in three stages:
- Date stripping — Removes date suffixes (e.g.,
gpt-4o-mini-2024-07-18becomesgpt-4o-mini) - Helicone aliases — Uses the alias map from Helicone's API (e.g., provider-specific model names)
- Manual aliases — A hand-maintained list for edge cases not covered by the above
Alias keys are model-only (no provider prefix): "gpt-4o-mini-2024-07-18" resolves to "gpt-4o-mini".
Pricing map keys are provider:model_id (case-insensitive): "openai:gpt-4o-mini".
Customer Overrides
Each tenant can have a pricing overlay stored at pricing:tenant:{tid}. This contains:
- Per-model rate overrides (replaces base rates entirely)
markup_pct— A percentage markup applied on top of base rates for margin calculation
The usage consumer applies the overlay on top of base rates at enrichment time.
Change Detection
To minimise unnecessary KV writes (which have cost and latency implications):
- Each pricing entry has a per-entry SHA-256 hash computed from its rate fields
- The entire blob has a blob-level hash computed from the sorted concatenation of all entry hashes
- CloudflarePricingSync compares the new blob hash against the current
pricing:versionand performs zero KV writes when unchanged
This means the pipeline runs frequently (every 15 minutes) but only writes to KV when pricing actually changes.