Pricing Pipeline

The pricing pipeline ensures the usage consumer always has up-to-date, accurate pricing data available in Cloudflare KV with zero per-event KV reads.

Architecture

Helicone API → PricingSync (Azure Function) → SQL → CloudflarePricingSync (Azure Function) → CF KV → usage-consumer
  1. PricingSync — Azure Function (timer-triggered). Pulls the latest model pricing from the Helicone API, reconciles it against the pricing_models table in SQL, and inserts/updates rows. Computes model expense rankings at the same time.
  2. CloudflarePricingSync — Azure Function (timer + HTTP trigger). Reads all pricing data from SQL, builds a single JSON blob with per-entry SHA-256 hashes, and writes it to Cloudflare KV. Skips the write entirely if the blob hash is unchanged.

KV Keys

Key | Contents
pricing:blob | Single JSON blob containing all base pricing rates, aliases, and known models. The usage consumer loads this once on cold start and caches it in memory.
pricing:version | Monotonically increasing version number. The consumer checks this periodically to detect changes and reload the blob.
pricing:tenant:{tid} | Per-tenant pricing overlay. Contains customer-specific overrides and markup_pct for margin calculation.

Single Blob Design

Pricing is stored as a single KV blob rather than individual keys per model. This means the usage consumer performs zero KV reads per event — it loads the blob once, caches it in worker memory, and only reloads when the version changes. This is critical for keeping per-event overhead near zero.
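
The load-once, reload-on-version-change behaviour can be sketched as follows. This is a minimal illustration, not the consumer's actual code: the PricingCache class, the kv_get callable, and the 60-second check interval are hypothetical stand-ins for whatever KV binding and polling period the consumer really uses.

```python
import json
import time
from typing import Any, Callable, Dict, Optional


class PricingCache:
    """Keeps the pricing blob in memory and reloads it only when the version changes."""

    def __init__(
        self,
        kv_get: Callable[[str], Optional[str]],  # hypothetical KV read function
        check_interval_s: float = 60.0,          # assumed polling period
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._kv_get = kv_get
        self._check_interval_s = check_interval_s
        self._clock = clock
        self._blob: Optional[Dict[str, Any]] = None
        self._version: Optional[str] = None
        self._last_check = 0.0

    def _load_blob(self) -> None:
        raw = self._kv_get("pricing:blob")
        if raw is None:
            raise LookupError("pricing:blob is missing from KV")
        self._blob = json.loads(raw)

    def get(self) -> Dict[str, Any]:
        """Return the cached blob. Most calls touch no KV keys at all."""
        now = self._clock()
        if self._blob is None:
            # Cold start: read the version and the blob once.
            self._version = self._kv_get("pricing:version")
            self._load_blob()
            self._last_check = now
        elif now - self._last_check >= self._check_interval_s:
            # Periodic check: one small read, and a blob reload only on change.
            self._last_check = now
            version = self._kv_get("pricing:version")
            if version != self._version:
                self._load_blob()
                self._version = version
        assert self._blob is not None
        return self._blob
```

Calling get() once per event costs a clock read and a comparison in the common case, which is what keeps the per-event KV read count at zero.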

Rate Format

All token rates are expressed per 1M tokens. For example, a rate of 2.50 means $2.50 per 1,000,000 tokens. The one exception is ws, which is a flat cost per web search request.

Rates are stored in an abbreviated format:

Field | Meaning
i | Input (prompt) token rate
o | Output (completion) token rate
crm | Cache read multiplier (as a fraction of the i rate)
cwm | Cache write multiplier (as a fraction of the i rate)
ws | Web search request cost (flat rate per request)
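
As a worked illustration of how these fields might combine into a per-event cost, here is a small sketch. The Rates dataclass and event_cost function are hypothetical names. The sketch also assumes that cache read and cache write tokens are counted separately from regular input tokens and billed at i × crm and i × cwm respectively; the rate values in the usage note are made up for the arithmetic.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # token rates are quoted per 1M tokens


@dataclass(frozen=True)
class Rates:
    i: float          # input (prompt) rate per 1M tokens
    o: float          # output (completion) rate per 1M tokens
    crm: float = 0.0  # cache read multiplier, as a fraction of i
    cwm: float = 0.0  # cache write multiplier, as a fraction of i
    ws: float = 0.0   # flat cost per web search request


def event_cost(
    rates: Rates,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
    web_searches: int = 0,
) -> float:
    """Cost of one usage event in dollars (assumed billing model, see lead-in)."""
    token_cost = (
        input_tokens * rates.i
        + output_tokens * rates.o
        + cache_read_tokens * rates.i * rates.crm
        + cache_write_tokens * rates.i * rates.cwm
    ) / TOKENS_PER_UNIT
    return token_cost + web_searches * rates.ws
```

With illustrative rates i = 2.50 and o = 10.00, an event with 1,000,000 input tokens and 100,000 output tokens costs 2.50 + 1.00 = $3.50 before any cache or web search charges.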

Alias Resolution

Many providers return model IDs with date suffixes or alternative names. The pipeline resolves these to canonical model IDs in three stages:

  1. Date stripping — Removes date suffixes (e.g., gpt-4o-mini-2024-07-18 becomes gpt-4o-mini)
  2. Helicone aliases — Uses the alias map from Helicone's API (e.g., provider-specific model names)
  3. Manual aliases — A hand-maintained list for edge cases not covered by the above

Alias keys are model-only (no provider prefix): "gpt-4o-mini-2024-07-18" resolves to "gpt-4o-mini".

Pricing map keys are provider:model_id (case-insensitive): "openai:gpt-4o-mini".
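
The three stages and the key format can be sketched as below. Several details are assumptions rather than facts from the pipeline: the stages are applied in order, each to the output of the previous one; the date suffix is matched only in the -YYYY-MM-DD form; and case-insensitivity is implemented by lowercasing the key. The function names and the alias entries in the usage note are hypothetical.

```python
import re
from typing import Mapping

# Assumed suffix shape: a trailing -YYYY-MM-DD, as in gpt-4o-mini-2024-07-18.
_DATE_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def resolve_model(
    model_id: str,
    helicone_aliases: Mapping[str, str],
    manual_aliases: Mapping[str, str],
) -> str:
    """Resolve a provider-reported model ID to a canonical model ID."""
    # Stage 1: strip a trailing date suffix.
    model = _DATE_SUFFIX.sub("", model_id)
    # Stage 2: aliases taken from Helicone's alias map.
    model = helicone_aliases.get(model, model)
    # Stage 3: hand-maintained aliases for remaining edge cases.
    model = manual_aliases.get(model, model)
    return model


def pricing_key(provider: str, model_id: str) -> str:
    """Build the provider:model_id key used to look up the pricing map."""
    # Lowercasing is one way to make lookups case-insensitive (an assumption).
    return f"{provider}:{model_id}".lower()
```

For example, resolve_model("gpt-4o-mini-2024-07-18", {}, {}) yields "gpt-4o-mini", and pricing_key("OpenAI", "gpt-4o-mini") yields "openai:gpt-4o-mini".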

Customer Overrides

Each tenant can have a pricing overlay stored at pricing:tenant:{tid}. This contains:

  • Per-model rate overrides (these replace the base rates entirely)
  • markup_pct — A percentage markup applied on top of base rates for margin calculation

The usage consumer applies the overlay on top of base rates at enrichment time.
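
One way the overlay could be applied is sketched below. The overlay shape (an "overrides" map plus "markup_pct") and the effective_rates function are hypothetical. The sketch also makes two assumptions the text above does not settle: a per-model override is used as-is with no markup added, and the markup scales only the price fields (i, o, ws), because scaling the crm and cwm multipliers as well would apply the markup twice to cache costs.

```python
from typing import Any, Dict, Mapping

# Fields that carry a price. crm and cwm are multipliers of i, so they are left alone.
PRICE_FIELDS = ("i", "o", "ws")


def effective_rates(
    key: str,
    base_rates: Mapping[str, Mapping[str, float]],
    overlay: Mapping[str, Any],
) -> Dict[str, float]:
    """Return the rates a tenant pays for one provider:model_id key."""
    overrides = overlay.get("overrides", {})
    if key in overrides:
        # An override replaces the base rates entirely.
        return dict(overrides[key])
    # Otherwise apply the percentage markup on top of the base rates.
    factor = 1.0 + overlay.get("markup_pct", 0.0) / 100.0
    rates = dict(base_rates[key])
    for field in PRICE_FIELDS:
        if field in rates:
            rates[field] = rates[field] * factor
    return rates
```

With a base input rate of 2.00 and markup_pct of 25, the tenant's effective input rate is 2.00 × 1.25 = 2.50.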

Change Detection

To minimise unnecessary KV writes (which have cost and latency implications):

  • Each pricing entry has a per-entry SHA-256 hash computed from its rate fields
  • The entire blob has a blob-level hash computed from the sorted concatenation of all entry hashes
  • CloudflarePricingSync compares the new blob hash against the current pricing:version and performs zero KV writes when unchanged

As a result, the pipeline can run frequently (every 15 minutes) while writing to KV only when pricing actually changes.
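
The two-level hashing can be sketched as follows. The canonical serialisation (compact JSON with sorted keys over the five rate fields) is an assumption, as are the function names. How the previously published hash is stored and retrieved is left out, so needs_write simply takes it as an argument.

```python
import hashlib
import json
from typing import Mapping, Optional

RATE_FIELDS = ("i", "o", "crm", "cwm", "ws")


def entry_hash(entry: Mapping[str, float]) -> str:
    """SHA-256 over one entry's rate fields, serialised deterministically."""
    canonical = json.dumps(
        {field: entry.get(field) for field in RATE_FIELDS},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def blob_hash(entries: Mapping[str, Mapping[str, float]]) -> str:
    """SHA-256 over the sorted concatenation of all entry hashes."""
    hashes = sorted(entry_hash(entry) for entry in entries.values())
    return hashlib.sha256("".join(hashes).encode("utf-8")).hexdigest()


def needs_write(
    entries: Mapping[str, Mapping[str, float]],
    published_hash: Optional[str],
) -> bool:
    """True when the new blob differs from what was last published."""
    return blob_hash(entries) != published_hash
```

Sorting the entry hashes before concatenating makes the blob hash independent of the order in which rows come back from SQL, so a run that reads the same data in a different order still produces no write.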