What is a reasoning model?

Quick answer

A reasoning model is an LLM that works through a problem step by step — using extra inference-time compute — before giving its final answer.

That deliberation raises accuracy on hard, multi-step tasks like math, science, and software engineering, at the cost of higher latency and more output tokens.

How reasoning models work

Instead of emitting its answer immediately, a reasoning model first generates an internal chain of thought, then writes a final response. The longer it is allowed to “think”, the better it tends to do on difficult problems, a pattern often called inference-time or test-time scaling.

Many current flagships expose this as a control: a reasoning effort level, a thinking budget, or a separate “thinking” variant. A growing number are hybrid, answering instantly for easy prompts and escalating to deliberation only when needed.
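The hybrid pattern can be sketched as a small router. This is only an illustration: `fast_answer`, `deliberate_answer`, and `looks_hard` are hypothetical names, not any provider's API, and the keyword heuristic is a placeholder rather than how a real model decides when to think.

```python
from typing import Callable

# Hypothetical cues that suggest a multi-step problem (an assumption for this sketch).
HARD_CUES = ("prove", "derive", "step by step", "debug", "optimize")


def looks_hard(prompt: str, max_easy_words: int = 40) -> bool:
    """Placeholder heuristic: long prompts or multi-step cues count as hard."""
    text = prompt.lower()
    return len(text.split()) > max_easy_words or any(cue in text for cue in HARD_CUES)


def route(
    prompt: str,
    fast_answer: Callable[[str], str],
    deliberate_answer: Callable[[str], str],
    is_hard: Callable[[str], bool] = looks_hard,
) -> str:
    """Answer instantly for easy prompts; escalate to deliberation otherwise."""
    if is_hard(prompt):
        return deliberate_answer(prompt)
    return fast_answer(prompt)


# Stub backends so the sketch runs without any external service.
def fast_stub(prompt: str) -> str:
    return "fast: " + prompt


def thinking_stub(prompt: str) -> str:
    return "thinking: " + prompt


print(route("What is the capital of France?", fast_stub, thinking_stub))
print(route("Prove that the sum of two even numbers is even.", fast_stub, thinking_stub))
```

In practice the two callables would wrap whichever control a given model exposes (an effort level, a thinking budget, or a separate variant); the routing logic itself stays the same.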

The trade-off

Better at: competition math, graduate-level science, long-horizon coding and agentic tasks, anything with dependent steps.
Costs more: higher latency and more billed output tokens, since the thinking itself is generated text.
Overkill for: short factual lookups, classification, autocomplete, and high-volume low-stakes calls.
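The cost side of the trade-off is simple arithmetic once thinking tokens are counted as output. A minimal sketch, assuming a flat per-token output price; the function name, token counts, and the $10 per million price are illustrative values, not figures for any real model.

```python
def output_cost(thinking_tokens: int, answer_tokens: int, price_per_million_output: float) -> float:
    """Dollar cost of one response when thinking tokens are billed as output."""
    billed_tokens = thinking_tokens + answer_tokens
    return billed_tokens * price_per_million_output / 1_000_000


# Illustrative numbers only: a 300-token answer, with and without 4,000 thinking tokens.
direct = output_cost(thinking_tokens=0, answer_tokens=300, price_per_million_output=10.0)
with_thinking = output_cost(thinking_tokens=4_000, answer_tokens=300, price_per_million_output=10.0)

print(f"direct:        ${direct:.4f}")         # 300 billed tokens
print(f"with thinking: ${with_thinking:.4f}")  # 4,300 billed tokens
print(f"ratio:         {with_thinking / direct:.1f}x")
```

Under these assumed numbers the visible answer is identical in length, yet the bill grows by the ratio of total generated tokens, which is why high-volume, low-stakes calls rarely justify a thinking mode.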

Scores on benchmarks such as GPQA Diamond (graduate-level science questions) and AIME (competition math) are the usual evidence cited for a model's reasoning strength.

Reasoning-oriented models in the catalog

Recent models whose release framing or benchmarks emphasize reasoning:

Model	Lab	GPQA	AIME	Context	Released
Kimi K2.7 Code	Moonshot	-	-	262K	Jun 18, 2026
GLM-5.2	Z.ai	91.2%	-	1M	Jun 17, 2026
MiniMax-M3	MiniMax	-	-	1M	Jun 16, 2026
GPT-5.6	OpenAI	88.1%	-	1.5M	Jun 9, 2026
Claude Fable 5	Anthropic	-	-	-	Jun 9, 2026
Nemotron 3 Ultra 550B-A55B	NVIDIA	-	-	1M	Jun 4, 2026
Qwen3.7-Plus	Qwen	-	-	1M	Jun 2, 2026
Claude Opus 4.8	Anthropic	89%	-	500K	May 28, 2026

Browse the live reasoning model catalog for the full, filterable list.

Frequently asked questions

What is a reasoning model?

A reasoning model is an LLM trained to spend additional inference-time compute — an internal chain of thought — before producing a final answer. This trades latency and token cost for higher accuracy on multi-step problems like math, science, and code.

How is it different from a normal chat model?

A standard chat model starts writing its answer immediately and is optimized for speed and fluency. A reasoning model deliberates first, often with a tunable “effort” or “thinking” level. Many recent models are hybrid: they can answer quickly or switch into a thinking mode for harder questions.

When should I not use a reasoning model?

For short, latency-sensitive, or high-volume tasks — autocomplete, classification, simple Q&A — the extra thinking adds cost and delay without improving the result. Reserve reasoning modes for problems where a wrong answer is expensive or the task has several dependent steps.
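That rule of thumb can be written down as a small decision function. This is a sketch under stated assumptions: the `Task` fields, the `min_steps` threshold of 3 for "several", and the choice to let a costly wrong answer override speed concerns are all hypothetical and should be tuned to the workload.

```python
from dataclasses import dataclass


@dataclass
class Task:
    dependent_steps: int       # how many steps build on earlier ones
    wrong_answer_costly: bool  # is a mistake expensive to recover from?
    latency_sensitive: bool    # does the caller need a near-instant reply?
    high_volume: bool          # is this a high-volume, low-stakes call?


def use_reasoning(task: Task, min_steps: int = 3) -> bool:
    """Decide whether a reasoning mode is worth its latency and token cost."""
    # Assumption: an expensive mistake outweighs speed and volume concerns.
    if task.wrong_answer_costly:
        return True
    # Latency-sensitive or high-volume work stays on the fast path.
    if task.latency_sensitive or task.high_volume:
        return False
    # Otherwise, reason only when the task has several dependent steps.
    return task.dependent_steps >= min_steps


autocomplete = Task(dependent_steps=1, wrong_answer_costly=False,
                    latency_sensitive=True, high_volume=True)
migration_plan = Task(dependent_steps=6, wrong_answer_costly=True,
                      latency_sensitive=False, high_volume=False)

print(use_reasoning(autocomplete))    # False
print(use_reasoning(migration_plan))  # True
```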

Where to go next

See the frontier model leaderboard, learn how benchmarks are scored, or read about open-weight access.