Streaming
AI SpendOps fully supports streaming for all providers. Use "stream": true in your request body as you normally would.
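For example, a streaming request body looks like any other chat-completions request with the flag set (model name is illustrative):

```json
{
  "model": "gpt-4o",
  "stream": true,
  "messages": [{ "role": "user", "content": "Hello" }]
}
```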
How it works
- SSE streams are forwarded byte-for-byte — no buffering
- Client aborts are respected
- Usage data is extracted from the stream asynchronously
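The behavior above can be sketched as a pass-through relay that yields each chunk untouched while watching the side channel for a usage payload. This is a simplified illustration, not the proxy's actual implementation; `relay_sse` and `usage_sink` are hypothetical names:

```python
import json

def relay_sse(upstream_chunks, usage_sink):
    """Yield each SSE chunk unchanged; record any usage payload into usage_sink."""
    for chunk in upstream_chunks:
        yield chunk  # forwarded byte-for-byte, no buffering
        # Side channel: inspect "data:" lines for a usage object.
        for line in chunk.decode("utf-8", errors="ignore").splitlines():
            data = line[5:].strip() if line.startswith("data:") else None
            if data and data != "[DONE]":
                try:
                    event = json.loads(data)
                except json.JSONDecodeError:
                    continue
                if isinstance(event, dict) and event.get("usage"):
                    usage_sink.update(event["usage"])
```

Because the chunks are yielded before they are inspected, a slow or failing usage parse never delays the bytes reaching the client.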
Automatic stream options injection
For OpenAI-compatible providers, the proxy automatically injects stream_options if you haven't set it:
{
  "stream_options": { "include_usage": true }
}
This ensures usage data is included in the final stream chunk. It does not affect the response content you receive.
Providers with auto-injection: OpenAI, Google AI Studio, xAI
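The injection logic is equivalent to the following sketch (`inject_stream_options` is a hypothetical helper, not the proxy's source):

```python
def inject_stream_options(body: dict) -> dict:
    """Add stream_options.include_usage to a streaming request if absent."""
    if body.get("stream") and "stream_options" not in body:
        # Copy rather than mutate, so the caller's request body is untouched.
        body = {**body, "stream_options": {"include_usage": True}}
    return body
```

Note that an explicit `stream_options` in your request is left as-is, and non-streaming requests are never modified.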
Native streaming usage
Some providers always include usage in streaming responses:
- Anthropic — Usage arrives in message_start (input) and message_delta (output) events
- OpenRouter — Usage always included in the final chunk
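For Anthropic, the input and output counts arrive in different events and must be combined. A sketch of that accumulation over plain event dicts, assuming the event shapes described above (message_delta carries a cumulative output count):

```python
def accumulate_anthropic_usage(events):
    """Combine input tokens (message_start) with output tokens (message_delta)."""
    usage = {"input_tokens": 0, "output_tokens": 0}
    for event in events:
        if event["type"] == "message_start":
            usage["input_tokens"] = event["message"]["usage"]["input_tokens"]
        elif event["type"] == "message_delta":
            # Cumulative count: the last message_delta wins.
            usage["output_tokens"] = event["usage"]["output_tokens"]
    return usage
```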
Provider-specific notes
Anthropic: Use /v1/messages for accurate streaming
Anthropic's native endpoint (/v1/messages) includes full usage data in streaming responses. However, their OpenAI-compatible endpoint (/v1/chat/completions) does not return usage data during streaming and does not support stream_options.
When streaming through the OpenAI-compatible endpoint, token counts will be estimated rather than provider-reported. For accurate usage tracking, use the native /v1/messages endpoint.
# Recommended: native endpoint with accurate streaming usage
from anthropic import Anthropic

client = Anthropic(
    api_key="sk-ant-...",
    base_url="https://proxy.aispendops.com/v1/anthropic",
    default_headers={"X-ASO-API-Key": "aso_k_yourkey.secret"},
)

with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")