Cost per million tokens
LLM API pricing comparison
Models with published API pricing, sorted cheapest-first by output cost. Input and output rates are shown separately because they are billed at different rates.
The headline “price” for an LLM is really two numbers: the cost per million input tokens and the cost per million output tokens. Output is typically the pricier of the two, so the table is sorted by output cost, the figure that tends to dominate generation-heavy and agentic workloads.
Only models with a published price appear below. Open-weight models you run yourself are excluded unless their lab also lists a hosted API rate.
Prices are list rates per 1M tokens in USD and can change without notice; always confirm against the provider before budgeting.
| Model | Lab | Access | Input $/Mtok | Output $/Mtok | Context | Released | Source |
|---|---|---|---|---|---|---|---|
| Grok 4.3 | xAI | Proprietary | $1.25 | $2.5 | 1M | May 6, 2026 | source |
| ERNIE 5.1 | Baidu | Proprietary | $0.59 | $2.65 | 128K | May 8, 2026 | source |
| Kimi K2.7 Code | Moonshot | Modified MIT | $0.95 | $4 | 262K | Jun 18, 2026 | source |
| Qwen3.7-Plus | Qwen | Proprietary | $2.5 | $7.5 | 1M | Jun 2, 2026 | source |
| Qwen3.7-Max | Qwen | Proprietary | $2.5 | $7.5 | 1M | May 19, 2026 | source |
| Claude Opus 4.8 | Anthropic | Proprietary | $15 | $75 | 500K | May 28, 2026 | source |
| Claude Opus 4.6 | Anthropic | Proprietary | $15 | $75 | 200K | Feb 5, 2026 | source |
Frequently asked questions
Why are input and output prices listed separately?
Providers bill input (prompt) and output (generated) tokens at different rates, and output is usually several times more expensive. Your real cost depends on the ratio of the two in your workload, so a model that looks cheap on input can be expensive for long generations, and vice versa.
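To make the ratio point concrete, here is a minimal sketch that finds the output-to-input token ratio at which two models cost the same. The function name `break_even_output_ratio` is hypothetical, and the rates are the list prices from the table above for Grok 4.3 and ERNIE 5.1; treat it as an illustration rather than a budgeting tool.

```python
def break_even_output_ratio(in_a, out_a, in_b, out_b):
    """Return the output/input token ratio at which models A and B cost the same.

    All rates are USD per 1M tokens. Returns None when there is no positive
    break-even point, i.e. one model is never more expensive than the other
    at any mix, or the output rates are identical.
    """
    # Setting I*in_a + O*out_a == I*in_b + O*out_b and solving for O/I gives:
    #   O / I = (in_a - in_b) / (out_b - out_a)
    d_in = in_a - in_b
    d_out = out_b - out_a
    if d_out == 0 or d_in / d_out <= 0:
        return None
    return d_in / d_out


# Rates taken from the table: Grok 4.3 ($1.25 in, $2.5 out)
# versus ERNIE 5.1 ($0.59 in, $2.65 out).
ratio = break_even_output_ratio(1.25, 2.5, 0.59, 2.65)
print(f"Break-even at about {ratio:.1f} output tokens per input token")
```

With these two rows the break-even works out to roughly 4.4 output tokens per input token. Below that ratio ERNIE 5.1 is the cheaper of the two, and above it Grok 4.3 is, even though ERNIE 5.1 has the lower input rate.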
How do I estimate my actual cost?
Divide your expected input tokens by one million and multiply by the input rate, do the same for your expected output tokens with the output rate, then add the two results. For chat and retrieval workloads input usually dominates; for drafting and agentic generation, output dominates.
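The calculation can be sketched in a few lines. The function name `estimate_cost_usd` and the token volumes are made up for illustration; the rates used are the Claude Opus 4.8 list prices from the table above.

```python
TOKENS_PER_MILLION = 1_000_000


def estimate_cost_usd(input_tokens, output_tokens, input_rate, output_rate):
    """Estimate spend in USD from token counts and per-1M-token list rates."""
    input_cost = input_tokens / TOKENS_PER_MILLION * input_rate
    output_cost = output_tokens / TOKENS_PER_MILLION * output_rate
    return input_cost + output_cost


# Hypothetical monthly workload: 40M prompt tokens and 5M generated tokens,
# priced at $15 input and $75 output per 1M tokens.
monthly = estimate_cost_usd(40_000_000, 5_000_000, input_rate=15, output_rate=75)
print(f"${monthly:,.2f}")  # prints "$975.00"
```

In this example the input side contributes $600 and the output side $375, so even at a 5x higher output rate, an input-heavy workload is dominated by input cost.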
Are open-weight models free?
Self-hosting open-weight models has no per-token licence fee, but you still pay for the hardware or a hosting provider's API. The prices here are list API prices for the model's primary hosted endpoint, where one is published.