Quick answer
A reasoning model is an LLM that works through a problem step by step — using extra inference-time compute — before giving its final answer.
That deliberation raises accuracy on hard, multi-step tasks like math, science, and software engineering, at the cost of higher latency and more output tokens.
How reasoning models work
Instead of answering immediately, a reasoning model first generates an internal chain of thought, then writes its final response. The longer it is allowed to “think”, the better it tends to do on difficult problems, a pattern often called inference-time or test-time scaling.
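To get a feel for why extra inference-time compute can help, here is a minimal sketch of a different and much simpler form of test-time scaling: sampling several independent answers and taking a majority vote. This is not how a reasoning model's chain of thought works internally. It is a toy model under stated assumptions (each sample is independently correct with probability `p_correct`, and all wrong samples agree with each other, which is the worst case for voting). The function name and parameters are illustrative.

```python
from math import comb


def majority_vote_accuracy(p_correct: float, n_samples: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Toy assumption: every sample is right with probability p_correct,
    independently of the others, and all wrong samples vote together.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    if n_samples < 1 or n_samples % 2 == 0:
        raise ValueError("use a positive odd number of samples to avoid ties")

    needed = n_samples // 2 + 1  # smallest count that forms a strict majority
    return sum(
        comb(n_samples, k) * p_correct**k * (1 - p_correct) ** (n_samples - k)
        for k in range(needed, n_samples + 1)
    )


if __name__ == "__main__":
    # More samples means more compute spent per question.
    for n in (1, 3, 5):
        print(n, round(majority_vote_accuracy(0.6, n), 4))
```

Under these assumptions, a solver that is right 60% of the time reaches 64.8% with three samples and about 68.3% with five. The gain only appears when single-sample accuracy is above 50%, which is one reason extra compute is not a guaranteed improvement.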
Many current flagships expose this as a control: a reasoning effort level, a thinking budget, or a separate “thinking” variant. A growing number are hybrid, answering instantly for easy prompts and escalating to deliberation only when needed.
The trade-off
- Better at: competition math, graduate-level science, long-horizon coding and agentic tasks, anything with dependent steps.
- Costs more: higher latency and more billed output tokens, since the thinking itself is generated text.
- Overkill for: short factual lookups, classification, autocomplete, and high-volume low-stakes calls.
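The cost point above can be made concrete with a little arithmetic. The sketch below assumes thinking tokens are billed at the same rate as visible output tokens, as the list describes. The prices and token counts are made-up placeholders, not any provider's real rates, and `Usage` and `estimate_cost_usd` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    input_tokens: int
    thinking_tokens: int  # reasoning text, assumed billed as output
    answer_tokens: int    # the visible final response


def estimate_cost_usd(
    usage: Usage,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the cost of one request, with prices given per million tokens."""
    billed_output = usage.thinking_tokens + usage.answer_tokens
    total = (
        usage.input_tokens * input_price_per_mtok
        + billed_output * output_price_per_mtok
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder prices: $2 per million input tokens, $10 per million output.
    fast = Usage(input_tokens=1_000, thinking_tokens=0, answer_tokens=300)
    deliberate = Usage(input_tokens=1_000, thinking_tokens=4_000, answer_tokens=300)
    for label, usage in (("fast", fast), ("deliberate", deliberate)):
        print(label, round(estimate_cost_usd(usage, 2.0, 10.0), 4))
```

With these placeholder numbers the same prompt costs about $0.005 without thinking and about $0.045 with 4,000 thinking tokens, roughly nine times more for an identical visible answer length.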
Scores on benchmarks such as GPQA Diamond and AIME are commonly cited as evidence that a model's reasoning is strong.
Reasoning-oriented models in the catalog
Recent models whose launch materials or benchmark results emphasize reasoning:
| Model | Lab | GPQA | AIME | Context | Released |
|---|---|---|---|---|---|
| Kimi K2.7 Code | Moonshot | - | - | 262K | Jun 18, 2026 |
| GLM-5.2 | Z.ai | 91.2% | - | 1M | Jun 17, 2026 |
| MiniMax-M3 | MiniMax | - | - | 1M | Jun 16, 2026 |
| GPT-5.6 | OpenAI | 88.1% | - | 1.5M | Jun 9, 2026 |
| Claude Fable 5 | Anthropic | - | - | - | Jun 9, 2026 |
| Nemotron 3 Ultra 550B-A55B | NVIDIA | - | - | 1M | Jun 4, 2026 |
| Qwen3.7-Plus | Qwen | - | - | 1M | Jun 2, 2026 |
| Claude Opus 4.8 | Anthropic | 89% | - | 500K | May 28, 2026 |
Browse the live reasoning model catalog for the full, filterable list.
Frequently asked questions
What is a reasoning model?
A reasoning model is an LLM trained to spend additional inference-time compute, in the form of an internal chain of thought, before producing a final answer. This trades latency and token cost for higher accuracy on multi-step problems like math, science, and code.
How is it different from a normal chat model?
A standard chat model starts writing its answer right away and is optimized for speed and fluency. A reasoning model deliberates first, often with a tunable “effort” or “thinking” level. Many recent models are hybrid: they can answer quickly or switch into a thinking mode for harder questions.
When should I not use a reasoning model?
For short, latency-sensitive, or high-volume tasks — autocomplete, classification, simple Q&A — the extra thinking adds cost and delay without improving the result. Reserve reasoning modes for problems where a wrong answer is expensive or the task has several dependent steps.
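The rule of thumb in this answer can be sketched as a small routing function. Everything here is an assumption for illustration: the `Task` fields, the `should_use_reasoning` name, and the threshold of three dependent steps are not taken from any real API or from the catalog. When the criteria conflict, this sketch lets an expensive wrong answer win and otherwise prefers the fast path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    dependent_steps: int             # rough count of steps that build on each other
    latency_sensitive: bool
    high_volume: bool
    wrong_answer_is_expensive: bool


def should_use_reasoning(task: Task, min_dependent_steps: int = 3) -> bool:
    """Decide whether a task is worth the extra thinking cost and delay."""
    # An expensive mistake justifies deliberation even if the call is slow.
    if task.wrong_answer_is_expensive:
        return True
    # Latency-sensitive or high-volume calls stay on the fast path.
    if task.latency_sensitive or task.high_volume:
        return False
    # Otherwise, deliberate only when the task has several dependent steps.
    return task.dependent_steps >= min_dependent_steps


if __name__ == "__main__":
    autocomplete = Task(1, latency_sensitive=True, high_volume=True,
                        wrong_answer_is_expensive=False)
    migration_plan = Task(6, latency_sensitive=False, high_volume=False,
                          wrong_answer_is_expensive=True)
    print(should_use_reasoning(autocomplete))
    print(should_use_reasoning(migration_plan))
```

In practice the threshold and the tie-breaking order would need tuning against real traffic; the point is only that the decision can be made per request rather than per application.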
Where to go next
See the frontier model leaderboard, learn how benchmarks are scored, or read about open-weight access.