# z.ai

Route your z.ai API calls through AI SpendOps for automatic usage tracking and cost attribution.
## Configuration
| Setting | Value |
|---|---|
| Route | /v1/zai/* |
| Upstream | https://api.z.ai/api/paas/v4 |
| Auth header | Authorization: Bearer ... |
| Streaming usage | Auto-injected (stream_options.include_usage) |
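The streaming row above means the proxy rewrites streaming request bodies so that z.ai reports token usage in the final stream chunk. A minimal sketch of that rewrite (illustrative only, not the proxy's actual code):

```python
import json

def inject_usage(body_json: str) -> str:
    """Illustrative: add stream_options.include_usage to a streaming
    chat request, as the proxy does automatically."""
    body = json.loads(body_json)
    if body.get("stream"):
        # Merge rather than overwrite any stream_options the caller sent.
        opts = body.setdefault("stream_options", {})
        opts.setdefault("include_usage", True)
    return json.dumps(body)

req = '{"model":"glm-4.5","stream":true,"messages":[{"role":"user","content":"Hi"}]}'
print(inject_usage(req))
```

Non-streaming requests pass through untouched, since z.ai already returns a usage object in those responses.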
## SDK base URL

`https://proxy.aispendops.com/v1/zai`
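Any OpenAI-compatible SDK can be pointed at this base URL. A sketch using the official `openai` Python package (package availability and placeholder keys assumed):

```python
from openai import OpenAI  # assumes the openai package is installed

client = OpenAI(
    base_url="https://proxy.aispendops.com/v1/zai",
    api_key="your-zai-key",  # sent upstream as Authorization: Bearer ...
    default_headers={"X-ASO-API-Key": "aso_k_yourkey.secret"},
)
# Calls such as client.chat.completions.create(model="glm-4.5", ...)
# now route through the proxy and are tracked automatically.
```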
## Example

```bash
curl https://proxy.aispendops.com/v1/zai/chat/completions \
  -H "Authorization: Bearer your-zai-key" \
  -H "X-ASO-API-Key: aso_k_yourkey.secret" \
  -H "Content-Type: application/json" \
  -d '{"model":"glm-4.5","messages":[{"role":"user","content":"Hello"}]}'
```
## Usage fields

| Field | Description |
|---|---|
| prompt_tokens | Input tokens |
| completion_tokens | Output tokens (includes reasoning tokens when thinking mode is enabled) |
| cache_read_tokens | Cached input tokens (via prompt_tokens_details.cached_tokens) |
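A sketch of reading these fields from a parsed response body; the sample usage object below is schematic, and the cached-token detail block may be absent:

```python
# Schematic usage object as it appears in an OpenAI-compatible response.
usage = {
    "prompt_tokens": 120,
    "completion_tokens": 45,
    "prompt_tokens_details": {"cached_tokens": 100},
}

prompt_tokens = usage["prompt_tokens"]
completion_tokens = usage["completion_tokens"]
# Cached input tokens are nested; default to 0 when the detail is missing.
cache_read_tokens = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
print(prompt_tokens, completion_tokens, cache_read_tokens)
```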
## Thinking mode

z.ai supports a thinking/reasoning mode for GLM-4.5, GLM-4.7, GLM-5, and GLM-5.1 models. Enable it by adding `"thinking": {"type": "enabled"}` to your request body; the proxy forwards this field to z.ai unchanged.

Note: Reasoning tokens are included in completion_tokens and are not reported separately.
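A sketch of a request body with thinking mode enabled, using the model name from the example above:

```python
import json

payload = {
    "model": "glm-4.5",
    "messages": [{"role": "user", "content": "Prove that 17 is prime."}],
    # Enables z.ai's thinking mode; the proxy passes this field through as-is.
    "thinking": {"type": "enabled"},
}
print(json.dumps(payload))
```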
## Notes

- z.ai (formerly Zhipu AI) provides the GLM model family; its API is OpenAI-compatible.
- Models include GLM-4.5, GLM-4.7, GLM-5, and GLM-5.1.