Step-3.7-Flash

Available

StepFun · Open source

StepFun's high-efficiency multimodal sparse-MoE successor to Step-3.5-Flash: a ~196B-total / ~11B-active vision-language model with native image and video understanding, a 256K context, and selectable reasoning tiers (high/medium/low). Tuned for coding agents and search workflows.
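To put the sparse-MoE figures in perspective, the short sketch below computes what share of the weights is used for each token. It also gives a rough per-token forward-pass compute estimate based on the active parameter count. The "2 FLOPs per active parameter" rule of thumb is an assumption borrowed from dense-transformer estimates. It ignores attention and routing overhead, so treat the result as an order-of-magnitude figure, not a measured number.

```python
# Approximate, vendor-reported figures from this listing.
TOTAL_PARAMS = 196e9   # all experts combined
ACTIVE_PARAMS = 11e9   # parameters actually used for each token


def active_fraction(total: float, active: float) -> float:
    """Share of the weights a sparse MoE touches per token."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per active parameter,
    ignoring attention and expert-routing overhead."""
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.1%}")
    print(f"approx. forward FLOPs/token: {forward_flops_per_token(ACTIVE_PARAMS):.2e}")
```

Only about 5.6% of the parameters run per token, which is where the efficiency claim comes from. All ~196B parameters still have to be stored and loaded to serve the model, so memory requirements track the total count, not the active one.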


Specifications

License: Open source · Apache-2.0
Weights: Downloadable
Architecture: Mixture-of-Experts
Parameters: 196B · 11B active
Context window: 256K tokens
Max output: —
Knowledge cutoff: —
Price (input / output, $ per 1M tokens): $0.20 / $1.15
Modalities: Text, Vision, Video, Code
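Because input and output tokens are billed at different per-million rates, estimating the cost of a call is simple arithmetic. The sketch below assumes plain per-token billing at the listed prices. The helper name `request_cost` and the example token counts are hypothetical. The listing does not say which provider these prices apply to or how any reasoning tokens are billed.

```python
# Listed prices in USD per 1M tokens (assumed plain per-token billing).
PRICE_IN_PER_M = 0.20
PRICE_OUT_PER_M = 1.15


def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float = PRICE_IN_PER_M,
                 price_out: float = PRICE_OUT_PER_M) -> float:
    """Estimated USD cost of one request, with prices given per million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical long agent turn: 200K tokens of context in, 4K tokens out.
    print(f"${request_cost(200_000, 4_000):.4f}")
```

Output tokens cost 5.75x as much as input tokens at these rates. A long context can still dominate the bill, though: in the example above, the 200K input tokens cost $0.04 and the 4K output tokens only about $0.0046.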

Benchmarks

No benchmark scores recorded yet.

Vendor-reported figures are claims until independently verified.