z.ai

Route your z.ai API calls through AI SpendOps for automatic usage tracking and cost attribution.

Configuration

| Setting | Value |
| --- | --- |
| Route | `/v1/zai/*` |
| Upstream | `https://api.z.ai/api/paas/v4` |
| Auth header | `Authorization: Bearer ...` |
| Streaming usage | Auto-injected (`stream_options.include_usage`) |
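The streaming auto-injection in the table above can be sketched as follows. This is an illustration of the behavior, not the proxy's actual internals; the function name `inject_stream_usage` is hypothetical.

```python
def inject_stream_usage(body: dict) -> dict:
    """Sketch: for streaming requests, ensure the final chunk
    will carry usage stats by setting stream_options.include_usage.
    Non-streaming requests pass through untouched."""
    if body.get("stream"):
        opts = dict(body.get("stream_options") or {})
        opts.setdefault("include_usage", True)  # don't clobber an explicit setting
        return {**body, "stream_options": opts}
    return body

streamed = inject_stream_usage({"model": "glm-4.5", "stream": True})
plain = inject_stream_usage({"model": "glm-4.5"})
```

Because `include_usage` is only defaulted (not overwritten), a caller that explicitly sets `stream_options` keeps its own value.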

SDK base URL

https://proxy.aispendops.com/v1/zai

Example

curl https://proxy.aispendops.com/v1/zai/chat/completions \
-H "Authorization: Bearer your-zai-key" \
-H "X-ASO-API-Key: aso_k_yourkey.secret" \
-H "Content-Type: application/json" \
-d '{"model":"glm-4.5","messages":[{"role":"user","content":"Hello"}]}'
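The same request can be built from Python with only the standard library. This is a minimal sketch; the two key values are placeholders, and the request is constructed but not sent.

```python
import json
import urllib.request

# Placeholder credentials for illustration only.
ZAI_KEY = "your-zai-key"
ASO_KEY = "aso_k_yourkey.secret"

BASE_URL = "https://proxy.aispendops.com/v1/zai"

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build a chat-completions request routed through the proxy."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {ZAI_KEY}",  # upstream z.ai key
            "X-ASO-API-Key": ASO_KEY,              # AI SpendOps key
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("glm-4.5", [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it.
```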

Usage fields

| Field | Description |
| --- | --- |
| `prompt_tokens` | Input tokens |
| `completion_tokens` | Output tokens (includes reasoning tokens when thinking mode is enabled) |
| `cache_read_tokens` | Cached input tokens (via `prompt_tokens_details.cached_tokens`) |
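Reading these fields off a response can be sketched as below; the payload and its token counts are made-up sample values, and `prompt_tokens_details` is read defensively since cached-token details may be absent.

```python
# Hypothetical response payload with sample numbers.
response = {
    "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 48,
        "prompt_tokens_details": {"cached_tokens": 100},
    }
}

usage = response["usage"]
prompt = usage["prompt_tokens"]
completion = usage["completion_tokens"]  # includes reasoning tokens in thinking mode
cache_read = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
uncached = prompt - cache_read  # input tokens not served from cache
```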

Thinking mode

z.ai supports a thinking/reasoning mode for GLM-4.5, GLM-4.7, GLM-5, and GLM-5.1 models. Enable it by adding `"thinking": {"type": "enabled"}` to your request body. The proxy passes this through transparently.

Note: Reasoning tokens are included in completion_tokens and are not reported separately.
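A request body with thinking mode enabled looks like this (a minimal sketch; the message content is illustrative):

```python
# Chat-completions body with z.ai thinking mode enabled.
payload = {
    "model": "glm-4.5",
    "messages": [{"role": "user", "content": "Hello"}],
    # Passed through to z.ai unchanged by the proxy.
    "thinking": {"type": "enabled"},
}
```

The resulting reasoning tokens show up inside `completion_tokens`, as noted above.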

Notes

  • z.ai (formerly Zhipu AI) provides the GLM model family through an OpenAI-compatible API.
  • Supported models include GLM-4.5, GLM-4.7, GLM-5, and GLM-5.1.