What is a reasoning model?

“Reasoning” or “thinking” models spend extra compute deliberating before they answer. This guide explains what that buys you, what it costs, and when to reach for one.

Quick answer

A reasoning model is an LLM that works through a problem step by step — using extra inference-time compute — before giving its final answer.

That deliberation raises accuracy on hard, multi-step tasks like math, science, and software engineering, at the cost of higher latency and more output tokens.

How reasoning models work

Rather than starting its answer right away, a reasoning model first generates an internal chain of thought, then writes a final response. The longer it is allowed to “think”, the better it tends to do on difficult problems, a pattern often called inference-time or test-time scaling.

Many current flagships expose this as a control: a reasoning effort level, a thinking budget, or a separate “thinking” variant. A growing number are hybrid, answering instantly for easy prompts and escalating to deliberation only when needed.

The trade-off

  • Better at: competition math, graduate-level science, long-horizon coding and agentic tasks, anything with dependent steps.
  • Costs more: higher latency and more billed output tokens, since the thinking itself is generated text.
  • Overkill for: short factual lookups, classification, autocomplete, and high-volume low-stakes calls.
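To make the cost side concrete, here is a minimal sketch of the arithmetic, assuming thinking tokens are billed at the same rate as ordinary output tokens. The `Pricing` class, the `estimate_cost` helper, and every price and token count below are hypothetical and chosen only for illustration; they are not any provider's real rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in dollars (not real rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(pricing: Pricing, input_tokens: int, answer_tokens: int,
                  thinking_tokens: int = 0) -> float:
    """Estimate the dollar cost of one call.

    Assumption: thinking tokens are generated text, so they are billed
    as output tokens alongside the visible answer.
    """
    billed_output = answer_tokens + thinking_tokens
    total = (input_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


# Illustrative numbers only.
pricing = Pricing(input_per_million=2.0, output_per_million=10.0)

direct = estimate_cost(pricing, input_tokens=1_000, answer_tokens=500)
with_thinking = estimate_cost(pricing, input_tokens=1_000, answer_tokens=500,
                              thinking_tokens=4_000)

print(f"direct:        ${direct:.3f}")         # direct:        $0.007
print(f"with thinking: ${with_thinking:.3f}")  # with thinking: $0.047
```

With these made-up numbers the visible answer is identical in both calls, but the thinking tokens dominate the bill. The same reasoning applies to latency, since those tokens also have to be generated before the answer arrives.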

GPQA Diamond (graduate-level science questions) and AIME (competition math) are the benchmarks most often cited as evidence that a model's reasoning is strong.

Reasoning-oriented models in the catalog

Recent models whose release framing or benchmarks emphasize reasoning:

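| Model                      | Lab       | GPQA  | AIME | Context | Released     |
|----------------------------|-----------|-------|------|---------|--------------|
| Kimi K2.7 Code             | Moonshot  | -     | -    | 262K    | Jun 18, 2026 |
| GLM-5.2                    | Z.ai      | 91.2% | -    | 1M      | Jun 17, 2026 |
| MiniMax-M3                 | MiniMax   | -     | -    | 1M      | Jun 16, 2026 |
| GPT-5.6                    | OpenAI    | 88.1% | -    | 1.5M    | Jun 9, 2026  |
| Claude Fable 5             | Anthropic | -     | -    |         | Jun 9, 2026  |
| Nemotron 3 Ultra 550B-A55B | NVIDIA    | -     | -    | 1M      | Jun 4, 2026  |
| Qwen3.7-Plus               | Qwen      | -     | -    | 1M      | Jun 2, 2026  |
| Claude Opus 4.8            | Anthropic | 89%   | -    | 500K    | May 28, 2026 |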

Browse the live reasoning model catalog for the full, filterable list.

Frequently asked questions

What is a reasoning model?

A reasoning model is an LLM trained to spend additional inference-time compute — an internal chain of thought — before producing a final answer. This trades latency and token cost for higher accuracy on multi-step problems like math, science, and code.

How is it different from a normal chat model?

A standard chat model responds in roughly one pass, optimized for speed and fluency. A reasoning model deliberates first, often with a tunable “effort” or “thinking” level. Many recent models are hybrid: they can answer quickly or switch into a thinking mode for harder questions.

When should I not use a reasoning model?

For short, latency-sensitive, or high-volume tasks — autocomplete, classification, simple Q&A — the extra thinking adds cost and delay without improving the result. Reserve reasoning modes for problems where a wrong answer is expensive or the task has several dependent steps.
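One way to apply that rule of thumb in an application is a small router that decides how much deliberation to request per call. The sketch below is only an illustration: the `Effort` levels, the task labels, the `choose_effort` function, and the three-step threshold are all hypothetical and do not correspond to any particular provider's API.

```python
from enum import Enum


class Effort(Enum):
    """Hypothetical effort levels; real controls vary by provider."""
    NONE = "none"   # answer right away, no thinking
    LOW = "low"     # brief deliberation
    HIGH = "high"   # full deliberation


# Hypothetical labels for short, high-volume task types.
FAST_TASKS = {"autocomplete", "classification", "simple_qa"}

# Assumed cutoff for "several dependent steps"; tune for your workload.
MULTI_STEP_THRESHOLD = 3


def choose_effort(task_type: str, dependent_steps: int, high_stakes: bool) -> Effort:
    """Pick an effort level from the task type, step count, and stakes."""
    # Short, low-stakes tasks gain little from extra thinking.
    if task_type in FAST_TASKS and not high_stakes:
        return Effort.NONE
    # Expensive mistakes or several dependent steps justify the extra cost.
    if high_stakes or dependent_steps >= MULTI_STEP_THRESHOLD:
        return Effort.HIGH
    return Effort.LOW


print(choose_effort("autocomplete", dependent_steps=1, high_stakes=False))  # Effort.NONE
print(choose_effort("proof", dependent_steps=5, high_stakes=False))         # Effort.HIGH
print(choose_effort("summarize", dependent_steps=1, high_stakes=False))     # Effort.LOW
```

Keeping the decision in one function makes it easy to adjust the threshold or task list once you have measured where extra thinking actually changes results.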

Where to go next

See the frontier model leaderboard, learn how benchmarks are scored, or read about open-weight access.