GLM-5V-Turbo

Available

Z.ai (Zhipu AI)Proprietary

Z.ai's native-multimodal vision agent: the first GLM model designed from the start as a multimodal agent, taking image, video, and text input and producing agent-oriented output (tool calling, task decomposition, and GUI interaction). Served via API with a ~203K-token context.

Official page ↗

Specifications

License: Proprietary
Weights: Not released
Architecture: unknown
Parameters: Undisclosed
Context window: 203K tokens
Max output: 131K tokens
Knowledge cutoff: —
Price (in / out, $/M): $1.2 / $4
Modalities: TextVisionCode

Benchmarks

No benchmark scores recorded yet. Spotted some? Submit a correction.

Vendor-reported figures are claims until independently verified. See methodology.