Question 1

Why are input and output prices listed separately?

Accepted Answer

Providers bill input (prompt) and output (generated) tokens at different rates, and output is usually several times more expensive. Your real cost depends on the ratio of the two in your workload, so a model that looks cheap on input can be expensive for long generations, and vice versa.
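As a rough illustration of how the ratio can flip a comparison, the sketch below takes the ERNIE 5.1 and Grok 4.3 list prices from the table on this page and computes the output-to-input token ratio at which the two cost the same. The function name `break_even_ratio` is hypothetical, and the calculation assumes model A is cheaper on input and pricier on output than model B.

```python
def break_even_ratio(a_in, a_out, b_in, b_out):
    """Output/input token ratio at which models A and B cost the same.

    Rates are in dollars per million tokens. Assumes A is cheaper on
    input (a_in < b_in) and more expensive on output (a_out > b_out).
    Derived from: in*a_in + out*a_out == in*b_in + out*b_out.
    """
    return (b_in - a_in) / (a_out - b_out)

# List prices from the table on this page (input, output), $/Mtok.
ernie = (0.59, 2.65)  # ERNIE 5.1
grok = (1.25, 2.5)    # Grok 4.3

ratio = break_even_ratio(*ernie, *grok)
print(f"Break-even at about {ratio:.1f} output tokens per input token")
```

Under these prices, ERNIE 5.1 is cheaper while a workload generates fewer than roughly 4.4 output tokens per input token, and Grok 4.3 is cheaper above that.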

Question 2

How do I estimate my actual cost?

Accepted Answer

Rates are quoted per million tokens, so divide your expected input and output token counts by one million, multiply each by its rate, then add the two results. For chat and retrieval workloads, input usually dominates; for drafting and agentic generation, output dominates.
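The arithmetic can be sketched in a few lines. The function name `estimate_cost` and the monthly token volumes are made up for illustration; the rates are the Claude Opus 4.8 list prices from the table on this page.

```python
def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Estimated cost in USD; rates are in dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical workload: 200M input tokens and 20M output tokens per month,
# at $15 input and $75 output per million tokens.
monthly = estimate_cost(200_000_000, 20_000_000, 15, 75)
print(f"${monthly:,.2f}")
```

Here the input side contributes $3,000 and the output side $1,500, so output is a third of the bill even though it is only a tenth of the input volume.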

Question 3

Are open-weight models free?

Accepted Answer

Self-hosting open-weight models has no per-token licence fee, but you still pay for the hardware or for a hosting provider's API. The prices here are list API prices for each model's primary hosted endpoint, where one is published.

Model	Lab	Access	Input $/Mtok	Output $/Mtok	Context	Released	Source
Grok 4.3	xAI	Proprietary	$1.25	$2.5	1M	May 6, 2026	source
ERNIE 5.1	Baidu	Proprietary	$0.59	$2.65	128K	May 8, 2026	source
Kimi K2.7 Code	Moonshot	Modified MIT	$0.95	$4	262K	Jun 18, 2026	source
Qwen3.7-Plus	Qwen	Proprietary	$2.5	$7.5	1M	Jun 2, 2026	source
Qwen3.7-Max	Qwen	Proprietary	$2.5	$7.5	1M	May 19, 2026	source
Claude Opus 4.8	Anthropic	Proprietary	$15	$75	500K	May 28, 2026	source
Claude Opus 4.6	Anthropic	Proprietary	$15	$75	200K	Feb 5, 2026	source

LLM API pricing comparison
