Streaming

AI SpendOps fully supports streaming for all providers. Use "stream": true in your request body as you normally would.

How it works

  • SSE streams are forwarded byte-for-byte — no buffering
  • Client aborts are respected
  • Usage data is extracted from the stream asynchronously
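
To make the SSE format concrete, here is a minimal client-side sketch of parsing the `data:` lines the proxy forwards unchanged. The function name and the `[DONE]` terminator handling are illustrative (the terminator is used by OpenAI-style streams); this is not part of the proxy itself.

```python
import json

def iter_sse_data(lines):
    """Yield parsed JSON payloads from SSE `data:` lines, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives, comments, and event-name lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # OpenAI-style stream terminator
        yield json.loads(payload)
```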

Automatic stream options injection

For OpenAI-compatible providers, the proxy automatically injects stream_options if you haven't set it:

{
  "stream_options": { "include_usage": true }
}

This ensures usage data is included in the final stream chunk. It does not affect the response content you receive.

Providers with auto-injection: OpenAI, Google AI Studio, xAI
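
The injection rule can be sketched as follows. This is an illustration of the behavior described above, not the proxy's actual code, and `inject_stream_options` is a hypothetical name:

```python
def inject_stream_options(body):
    """Add stream_options only when the request streams and the
    caller hasn't already set it themselves."""
    if body.get("stream") and "stream_options" not in body:
        return {**body, "stream_options": {"include_usage": True}}
    return body
```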

Native streaming usage

Some providers always include usage in streaming responses:

  • Anthropic — Usage arrives in message_start (input) and message_delta (output) events
  • OpenRouter — Usage always included in the final chunk
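
For Anthropic, the split between the two events can be sketched like this. The event shapes follow Anthropic's documented /v1/messages streaming schema; the helper function and the dict-based events are illustrative:

```python
def usage_from_events(events):
    """Collect provider-reported token usage from Anthropic stream events."""
    usage = {"input_tokens": 0, "output_tokens": 0}
    for event in events:
        if event["type"] == "message_start":
            # Input tokens are reported up front on the initial message.
            usage["input_tokens"] = event["message"]["usage"]["input_tokens"]
        elif event["type"] == "message_delta":
            # Output counts are cumulative; the last delta holds the total.
            usage["output_tokens"] = event["usage"]["output_tokens"]
    return usage
```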

Provider-specific notes

Anthropic: Use /v1/messages for accurate streaming

Anthropic's native endpoint (/v1/messages) includes full usage data in streaming responses. However, their OpenAI-compatible endpoint (/v1/chat/completions) does not return usage data during streaming and does not support stream_options.

When streaming through the OpenAI-compatible endpoint, token counts will be estimated rather than provider-reported. For accurate usage tracking, use the native /v1/messages endpoint.

# Recommended: native endpoint with accurate streaming usage
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-ant-...",
    base_url="https://proxy.aispendops.com/v1/anthropic",
    default_headers={"X-ASO-API-Key": "aso_k_yourkey.secret"},
)

with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")