z.ai

Route your z.ai API calls through AI SpendOps for automatic usage tracking and cost attribution.

Configuration

| Setting | Value |
| --- | --- |
| Route | `/v1/zai/*` |
| Upstream | `https://api.z.ai/api/paas/v4` |
| Auth header | `Authorization: Bearer ...` |
| Streaming usage | Auto-injected (`stream_options.include_usage`) |
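The streaming auto-injection in the table above can be sketched as follows. This is an illustration of the behavior, not the proxy's actual internals; the function name `inject_stream_usage` is hypothetical.

```python
def inject_stream_usage(body: dict) -> dict:
    """Sketch: for streaming requests, ensure the final chunk
    will carry usage stats by setting stream_options.include_usage.
    Non-streaming requests pass through untouched."""
    if body.get("stream"):
        opts = dict(body.get("stream_options") or {})
        opts.setdefault("include_usage", True)  # don't clobber an explicit setting
        return {**body, "stream_options": opts}
    return body

streamed = inject_stream_usage({"model": "glm-4.5", "stream": True})
plain = inject_stream_usage({"model": "glm-4.5"})
```

Because `include_usage` is only defaulted (not overwritten), a caller that explicitly sets `stream_options` keeps its own value.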

SDK base URL

https://proxy.aispendops.com/v1/zai

Example

curl https://proxy.aispendops.com/v1/zai/chat/completions \
-H "Authorization: Bearer your-zai-key" \
-H "X-ASO-API-Key: aso_k_yourkey.secret" \
-H "Content-Type: application/json" \
-d '{"model":"glm-4.5","messages":[{"role":"user","content":"Hello"}]}'
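The same request can be built from Python with only the standard library. This is a minimal sketch; the two key values are placeholders, and the request is constructed but not sent.

```python
import json
import urllib.request

# Placeholder credentials for illustration only.
ZAI_KEY = "your-zai-key"
ASO_KEY = "aso_k_yourkey.secret"

BASE_URL = "https://proxy.aispendops.com/v1/zai"

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build a chat-completions request routed through the proxy."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {ZAI_KEY}",  # upstream z.ai key
            "X-ASO-API-Key": ASO_KEY,              # AI SpendOps key
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("glm-4.5", [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it.
```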

Usage fields

| Field | Description |
| --- | --- |
| `prompt_tokens` | Input tokens |
| `completion_tokens` | Output tokens (includes reasoning tokens when thinking mode is enabled) |
| `cache_read_tokens` | Cached input tokens (via `prompt_tokens_details.cached_tokens`) |
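Reading these fields off a response can be sketched as below; the payload and its token counts are made-up sample values, and `prompt_tokens_details` is read defensively since cached-token details may be absent.

```python
# Hypothetical response payload with sample numbers.
response = {
    "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 48,
        "prompt_tokens_details": {"cached_tokens": 100},
    }
}

usage = response["usage"]
prompt = usage["prompt_tokens"]
completion = usage["completion_tokens"]  # includes reasoning tokens in thinking mode
cache_read = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
uncached = prompt - cache_read  # input tokens not served from cache
```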

Thinking mode

z.ai supports a thinking/reasoning mode for GLM-4.5, GLM-4.7, GLM-5, and GLM-5.1 models. Enable it by adding `"thinking": {"type": "enabled"}` to your request body. The proxy passes this through transparently.

Note: Reasoning tokens are included in completion_tokens and are not reported separately.
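A request body with thinking mode enabled looks like this (a minimal sketch; the message content is illustrative):

```python
# Chat-completions body with z.ai thinking mode enabled.
payload = {
    "model": "glm-4.5",
    "messages": [{"role": "user", "content": "Hello"}],
    # Passed through to z.ai unchanged by the proxy.
    "thinking": {"type": "enabled"},
}
```

The resulting reasoning tokens show up inside `completion_tokens`, as noted above.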

Notes

  • z.ai (formerly Zhipu AI) provides the GLM model family through an OpenAI-compatible API.
  • Supported models include GLM-4.5, GLM-4.7, GLM-5, and GLM-5.1.