Streaming

AI SpendOps fully supports streaming for all providers. Use "stream": true in your request body as you normally would.

How it works

  • SSE streams are forwarded byte-for-byte — no buffering
  • Client aborts are respected
  • Usage data is extracted from the stream asynchronously
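
To make the SSE format concrete, here is a minimal client-side sketch of parsing the `data:` lines the proxy forwards unchanged. The function name and the `[DONE]` terminator handling are illustrative (the terminator is used by OpenAI-style streams); this is not part of the proxy itself.

```python
import json

def iter_sse_data(lines):
    """Yield parsed JSON payloads from SSE `data:` lines, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives, comments, and event-name lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # OpenAI-style stream terminator
        yield json.loads(payload)
```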

Automatic stream options injection

For OpenAI-compatible providers, the proxy automatically injects stream_options if you haven't set it:

{
  "stream_options": { "include_usage": true }
}

This ensures usage data is included in the final stream chunk. It does not affect the response content you receive.

Providers with auto-injection: OpenAI, Google AI Studio, xAI
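
The injection rule can be sketched as follows. This is an illustration of the behavior described above, not the proxy's actual code, and `inject_stream_options` is a hypothetical name:

```python
def inject_stream_options(body):
    """Add stream_options only when the request streams and the
    caller hasn't already set it themselves."""
    if body.get("stream") and "stream_options" not in body:
        return {**body, "stream_options": {"include_usage": True}}
    return body
```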

Native streaming usage

Some providers always include usage in streaming responses:

  • Anthropic — Usage arrives in message_start (input) and message_delta (output) events
  • OpenRouter — Usage always included in the final chunk
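
For Anthropic, the split between the two events can be sketched like this. The event shapes follow Anthropic's documented /v1/messages streaming schema; the helper function and the dict-based events are illustrative:

```python
def usage_from_events(events):
    """Collect provider-reported token usage from Anthropic stream events."""
    usage = {"input_tokens": 0, "output_tokens": 0}
    for event in events:
        if event["type"] == "message_start":
            # Input tokens are reported up front on the initial message.
            usage["input_tokens"] = event["message"]["usage"]["input_tokens"]
        elif event["type"] == "message_delta":
            # Output counts are cumulative; the last delta holds the total.
            usage["output_tokens"] = event["usage"]["output_tokens"]
    return usage
```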

Provider-specific notes

Anthropic: Use /v1/messages for accurate streaming

Anthropic's native endpoint (/v1/messages) includes full usage data in streaming responses. However, their OpenAI-compatible endpoint (/v1/chat/completions) does not return usage data during streaming and does not support stream_options.

When streaming through the OpenAI-compatible endpoint, token counts will be estimated rather than provider-reported. For accurate usage tracking, use the native /v1/messages endpoint.

# Recommended: native endpoint with accurate streaming usage
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-ant-...",
    base_url="https://proxy.aispendops.com/v1/anthropic",
    default_headers={"X-ASO-API-Key": "aso_k_yourkey.secret"},
)

with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")