Downloadable models
Local LLM shortlist
A practical shortlist of downloadable releases that are either compact in total size or use MoE designs with few active parameters, intended for local evaluation or self-hosted deployment.
Memory figures are rough weight-only estimates: about 0.55GB per billion parameters for 4-bit quantization and 2.05GB per billion for FP16, rounded up to the next whole GB. For MoE models they are computed from total parameters, not active parameters. Runtime overhead, KV cache, batching, the serving stack, and context length can materially increase actual requirements.
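The rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not the report's own tooling: the function names are hypothetical, and rounding up to a whole GB is an assumption inferred from the table values.

```python
import math

# Rule-of-thumb factors quoted in the text (GB per billion parameters).
GB_PER_B_INT4 = 0.55
GB_PER_B_FP16 = 2.05


def weight_gb(total_params_b: float, gb_per_b: float) -> int:
    """Weight-only estimate in whole GB, rounded up (assumed rounding rule)."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    # Round to 6 decimals first so floating-point noise cannot add a spurious GB.
    return math.ceil(round(total_params_b * gb_per_b, 6))


def estimate(total_params_b: float) -> tuple[int, int]:
    """Return (INT4 GB, FP16 GB) for a model with the given total parameters.

    For MoE models pass the total parameter count, not the active count.
    """
    return (
        weight_gb(total_params_b, GB_PER_B_INT4),
        weight_gb(total_params_b, GB_PER_B_FP16),
    )
```

Under these assumptions, `estimate(7)` returns `(4, 15)`, which matches the 7B rows in the tables. The result covers weights only and says nothing about KV cache or runtime overhead.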
Single-workstation candidates
| Model | Lab | Params | INT4 rough | FP16 rough | Context | License | Released | Source |
|---|---|---|---|---|---|---|---|---|
| LFM2 1.2B | Liquid | 1.17B | 1GB | 3GB | 32K | LFM Open License v1.0 | Nov 28, 2025 | source |
| SmolLM2 1.7B | Hugging Face | 1.7B | 1GB | 4GB | — | Apache-2.0 | Nov 4, 2024 | source |
| Stable LM 2 1.6B | Stability | 1.6B | 1GB | 4GB | — | Stability AI Membership | Jan 19, 2024 | source |
| TinyLlama 1.1B Chat | TinyLlama | 1.1B | 1GB | 3GB | — | Apache-2.0 | Jan 1, 2024 | source |
| Phi-1 | Microsoft | 1.3B | 1GB | 3GB | — | MIT | Jun 21, 2023 | source |
| GPT-2 | OpenAI | 1.5B | 1GB | 4GB | 1K | MIT | Nov 5, 2019 | source |
| BERT | Google | 0.34B | 1GB | 1GB | 512 | Apache-2.0 | Oct 11, 2018 | source |
| SmolLM3 3B | Hugging Face | 3B | 2GB | 7GB | 128K | Apache-2.0 | Jul 8, 2025 | source |
| Sarvam-1 | Sarvam | 2B | 2GB | 5GB | — | Sarvam AI License | Oct 22, 2024 | source |
| Phi-2 | Microsoft | 2.7B | 2GB | 6GB | — | MIT | Dec 12, 2023 | source |
| Phi-3 Mini | Microsoft | 3.8B | 3GB | 8GB | 128K | MIT | Apr 23, 2024 | source |
| Qwen2.5-Omni-7B | Qwen | 7B | 4GB | 15GB | — | Qwen License | Mar 26, 2025 | source |
| OLMoE 1B-7B | Ai2 | 7B · 1B active | 4GB | 15GB | — | Apache-2.0 | Sep 3, 2024 | source |
| CodeGemma 7B | DeepMind | 7B | 4GB | 15GB | 8K | Gemma Terms of Use | Apr 9, 2024 | source |
| Gemma 7B | DeepMind | 7B | 4GB | 15GB | 8K | Gemma Terms of Use | Feb 21, 2024 | source |
| OLMo 7B | Ai2 | 7B | 4GB | 15GB | 4K | Apache-2.0 | Feb 1, 2024 | source |
| OpenChat 3.5 | OpenChat | 7B | 4GB | 15GB | — | Apache-2.0 | Jan 6, 2024 | source |
| OpenHathi-7B | Sarvam | 7B | 4GB | 15GB | — | Llama 2 Community License | Dec 12, 2023 | source |
| Mistral 7B | Mistral | 7B | 4GB | 15GB | 8K | Apache-2.0 | Sep 27, 2023 | source |
| Qwen-7B | Qwen | 7B | 4GB | 15GB | 32K | Tongyi Qianwen License | Aug 3, 2023 | source |
| ChatGLM2-6B | Z.ai | 6B | 4GB | 13GB | 32K | ChatGLM2 License | Jun 25, 2023 | source |
| MPT-7B | Databricks | 7B | 4GB | 15GB | 2K | Apache-2.0 | May 5, 2023 | source |
| ChatGLM-6B | Z.ai | 6B | 4GB | 13GB | 2K | ChatGLM License | Mar 14, 2023 | source |
| Granite 3.3 8B | IBM | 8B | 5GB | 17GB | 128K | Apache-2.0 | Apr 30, 2025 | source |
| Granite 3.2 8B | IBM | 8B | 5GB | 17GB | 128K | Apache-2.0 | Feb 26, 2025 | source |
| DeepHermes 3 Llama 3 8B | Nous | 8B | 5GB | 17GB | 8K | Llama 3 Community License | Feb 18, 2025 | source |
| Dolphin 3.0 Llama 3.1 8B | Cognitive | 8B | 5GB | 17GB | 128K | Llama 3.1 Community License | Feb 2, 2025 | source |
| Granite 3.1 8B | IBM | 8B | 5GB | 17GB | 128K | Apache-2.0 | Dec 18, 2024 | source |
| Command R7B | Cohere | 8B | 5GB | 17GB | 128K | CC-BY-NC-4.0 | Dec 13, 2024 | source |
| Granite 3.0 8B | IBM | 8B | 5GB | 17GB | 4K | Apache-2.0 | Oct 21, 2024 | source |
| Yi-Coder-9B | 01.AI | 9B | 5GB | 19GB | 128K | Yi License | Sep 5, 2024 | source |
| EXAONE 3.0 7.8B | LG AI | 7.8B | 5GB | 16GB | — | EXAONE AI Model License | Aug 7, 2024 | source |
| MiniCPM-V 2.6 | OpenBMB | 8B | 5GB | 17GB | — | MiniCPM Model License | Aug 2, 2024 | source |
| GLM-4-9B | Z.ai | 9B | 5GB | 19GB | 128K | GLM-4 License | Jun 5, 2024 | source |
| Kimi-Audio-7B-Instruct | Moonshot | 10B | 6GB | 21GB | — | MIT | Apr 25, 2025 | source |
| Falcon 3 10B | TII | 10B | 6GB | 21GB | 32K | TII Falcon License 2.0 | Dec 17, 2024 | source |
| Pixtral 12B | Mistral | 12B | 7GB | 25GB | 128K | Apache-2.0 | Sep 17, 2024 | source |
| Mistral NeMo | Mistral | 12B | 7GB | 25GB | 128K | Apache-2.0 | Jul 18, 2024 | source |
| Falcon 2 11B | TII | 11B | 7GB | 23GB | 8K | TII Falcon License | May 13, 2024 | source |
| Phi-4 Reasoning | Microsoft | 14B | 8GB | 29GB | — | MIT | Apr 30, 2025 | source |
| Phi-4 | Microsoft | 14B | 8GB | 29GB | 16K | MIT | Dec 12, 2024 | source |
| LLaVA 1.5 13B | LLaVA | 13B | 8GB | 27GB | — | Llama 2 / research license | Sep 30, 2023 | source |
| Qwen-14B | Qwen | 14B | 8GB | 29GB | 8K | Tongyi Qianwen License | Sep 25, 2023 | source |
| Granite 13B | IBM | 13B | 8GB | 27GB | — | Apache-2.0 | Sep 7, 2023 | source |
| Nous-Hermes-Llama2-13B | Nous | 13B | 8GB | 27GB | 4K | Llama 2 Community License | Jul 24, 2023 | source |
| Vicuna 13B | LMSYS | 13B | 8GB | 27GB | — | LLaMA Research License | Mar 30, 2023 | source |
| Kimi-VL-A3B-Thinking-2506 | Moonshot | 16B · 3B active | 9GB | 33GB | 128K | MIT | Jun 21, 2025 | source |
| Kimi-VL-A3B-Instruct | Moonshot | 16B · 3B active | 9GB | 33GB | 128K | MIT | Apr 17, 2025 | source |
| Moonlight-16B-A3B-Instruct | Moonshot | 16B · 3B active | 9GB | 33GB | 8K | MIT | Feb 24, 2025 | source |
| StarCoder2 15B | BigCode | 16B | 9GB | 33GB | 16K | BigCode OpenRAIL-M v1 | Feb 28, 2024 | source |
| DeepSeekMoE 16B | DeepSeek | 16B · 2.8B active | 9GB | 33GB | 4K | DeepSeek License | Jan 11, 2024 | source |
| gpt-oss-20b | OpenAI | 21B · 3.6B active | 12GB | 44GB | 128K | Apache-2.0 | Aug 5, 2025 | source |
| Codestral 22B | Mistral | 22B | 13GB | 46GB | 32K | Mistral AI Non-Production License | May 29, 2024 | source |
| Magistral Small | Mistral | 24B | 14GB | 50GB | 40K | Apache-2.0 | Jun 10, 2025 | source |
| Mistral Small 3.1 | Mistral | 24B | 14GB | 50GB | 128K | Apache-2.0 | Mar 17, 2025 | source |
| Mistral Small 3 | Mistral | 24B | 14GB | 50GB | 32K | Apache-2.0 | Jan 30, 2025 | source |
| Qwen3.6-27B | Qwen | 27B | 15GB | 56GB | 256K | Apache-2.0 | May 12, 2026 | source |
| Gemma 3 27B | DeepMind | 27B | 15GB | 56GB | 128K | Gemma Terms of Use | Mar 12, 2025 | source |
| Gemma 2 27B | DeepMind | 27B | 15GB | 56GB | 8K | Gemma Terms of Use | Jun 27, 2024 | source |
| Nemotron 3 Nano 30B-A3B | NVIDIA | 30B · 3B active | 17GB | 62GB | 1M | Nemotron Open Model License | Dec 15, 2025 | source |
| Gemma 4 31B | DeepMind | 31B | 18GB | 64GB | — | Apache-2.0 | Apr 2, 2026 | source |
| OLMo 3 Think 32B | Ai2 | 32B | 18GB | 66GB | — | Apache-2.0 | Dec 15, 2025 | source |
| EXAONE 4.0 32B | LG AI | 32B | 18GB | 66GB | — | EXAONE AI Model License | Jul 15, 2025 | source |
| OLMo 2 32B | Ai2 | 32B | 18GB | 66GB | 4K | Apache-2.0 | Mar 13, 2025 | source |
| EXAONE 3.5 32B | LG AI | 32B | 18GB | 66GB | 32K | EXAONE AI Model License | Dec 9, 2024 | source |
| QwQ-32B-Preview | Qwen | 32B | 18GB | 66GB | 32K | Apache-2.0 | Nov 28, 2024 | source |
| Qwen2.5-Coder-32B | Qwen | 32B | 18GB | 66GB | 128K | Apache-2.0 | Nov 12, 2024 | source |
| Falcon-H1 34B | TII | 34B | 19GB | 70GB | 256K | Falcon LLM License 2.0 | Jul 31, 2025 | source |
| Yi-1.5-34B | 01.AI | 34B | 19GB | 70GB | 4K | Yi License | May 13, 2024 | source |
| Granite Code 34B | IBM | 34B | 19GB | 70GB | 8K | Apache-2.0 | May 6, 2024 | source |
| Yi-34B | 01.AI | 34B | 19GB | 70GB | 200K | Yi License | Nov 6, 2023 | source |
| DeepSeek Coder 33B | DeepSeek | 33B | 19GB | 68GB | 16K | DeepSeek License | Nov 2, 2023 | source |
| Code Llama 34B | Meta | 34B | 19GB | 70GB | 16K | Llama 2 Community License | Aug 24, 2023 | source |
| Aya 23 35B | Cohere | 35B | 20GB | 72GB | — | CC-BY-NC-4.0 | May 23, 2024 | source |
Larger local or small-cluster candidates
| Model | Lab | Params | INT4 rough | FP16 rough | Context | License | Released | Source |
|---|---|---|---|---|---|---|---|---|
| Seed-OSS-36B-Instruct | Seed | 36B | 20GB | 74GB | 512K | Apache-2.0 | Aug 20, 2025 | source |
| Falcon 40B | TII | 40B | 22GB | 82GB | 2K | Falcon License | May 25, 2023 | source |
| Phi-3.5 MoE | Microsoft | 42B · 6.6B active | 24GB | 87GB | 128K | MIT | Aug 20, 2024 | source |
| Nous Hermes 2 Mixtral | Nous | 47B · 13B active | 26GB | 97GB | 32K | Apache-2.0 | Jan 11, 2024 | source |
| Mixtral 8x7B | Mistral | 47B · 13B active | 26GB | 97GB | 32K | Apache-2.0 | Dec 11, 2023 | source |
| Kimi-Linear-48B-A3B-Instruct | Moonshot | 48B · 3B active | 27GB | 99GB | 1M | MIT | Oct 31, 2025 | source |
| Llama-3.3-Nemotron-Super-49B | NVIDIA | 49B | 27GB | 101GB | 128K | NVIDIA Open Model License | Apr 2, 2025 | source |
| Jamba | AI21 | 52B · 12B active | 29GB | 107GB | 256K | Jamba Open Model License | Mar 28, 2024 | source |
| LLaMA | Meta | 65B | 36GB | 134GB | 2K | LLaMA Research License | Feb 24, 2023 | source |
| DeepSeek LLM 67B | DeepSeek | 67B | 37GB | 138GB | 4K | DeepSeek License | Nov 29, 2023 | source |
| Llama 3.3 70B | Meta | 70B | 39GB | 144GB | 128K | Llama 3.3 Community License | Dec 6, 2024 | source |
| Llama-3.1-Nemotron-70B | NVIDIA | 70B | 39GB | 144GB | 128K | Llama 3.1 Community License | Oct 15, 2024 | source |
| Llama 3 70B | Meta | 70B | 39GB | 144GB | 8K | Llama 3 Community License | Apr 18, 2024 | source |
| Llama 2 70B | Meta | 70B | 39GB | 144GB | 4K | Llama 2 Community License | Jul 18, 2023 | source |
| Qwen2.5-VL-72B | Qwen | 72B | 40GB | 148GB | 128K | Qwen License | Jan 26, 2025 | source |
| Molmo 72B | Ai2 | 72B | 40GB | 148GB | — | Apache-2.0 / OLMo License | Sep 25, 2024 | source |
| Qwen2.5-72B | Qwen | 72B | 40GB | 148GB | 128K | Qwen License | Sep 19, 2024 | source |
| Qwen2-72B | Qwen | 72B | 40GB | 148GB | 128K | Qwen License | Jun 7, 2024 | source |
| Qwen-72B | Qwen | 72B | 40GB | 148GB | 32K | Tongyi Qianwen License | Nov 30, 2023 | source |
| Kimi-Dev-72B | Moonshot | 73B | 41GB | 150GB | — | MIT | Jun 17, 2025 | source |
| Hunyuan-A13B-Instruct | Hunyuan | 80B · 13B active | 44GB | 164GB | — | Tencent Hunyuan A13B License | Apr 22, 2026 | source |
| Qwen3-Coder-Next | Qwen | 80B · 3B active | 44GB | 164GB | 256K | Apache-2.0 | Feb 3, 2026 | source |
| Sarvam-105B | Sarvam | 105B · 10.3B active | 58GB | 216GB | 128K | Apache-2.0 | Mar 6, 2026 | source |
| GLM-4.5V | Z.ai | 106B · 12B active | 59GB | 218GB | — | MIT | Aug 11, 2025 | source |
| GLM-4.5-Air | Z.ai | 106B · 12B active | 59GB | 218GB | 128K | MIT | Jul 28, 2025 | source |
| gpt-oss-120b | OpenAI | 117B · 5.1B active | 65GB | 240GB | 128K | Apache-2.0 | Aug 5, 2025 | source |
| Nemotron 3 Super 120B-A12B | NVIDIA | 120B · 12B active | 66GB | 246GB | 1M | Nemotron Open Model License | Mar 16, 2026 | source |
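One way to use these tables is to filter rows against a memory budget while reserving headroom for KV cache and runtime overhead. The sketch below is illustrative only: the `Model` class, the `fits` helper, and the 25% default headroom are hypothetical choices, and the sample rows are copied from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    int4_gb: int  # "INT4 rough" column
    fp16_gb: int  # "FP16 rough" column


# A few rows copied from the tables above.
ROWS = [
    Model("Phi-3 Mini", 3, 8),
    Model("Mistral 7B", 4, 15),
    Model("gpt-oss-20b", 12, 44),
    Model("Qwen2.5-Coder-32B", 18, 66),
    Model("Llama 3.3 70B", 39, 144),
]


def fits(rows, budget_gb, precision="int4", headroom=0.25):
    """Names of models whose weight estimate fits in the budget.

    headroom is the fraction of memory held back for KV cache and
    runtime overhead; 0.25 is an arbitrary illustrative default.
    """
    if precision not in ("int4", "fp16"):
        raise ValueError("precision must be 'int4' or 'fp16'")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_gb = budget_gb * (1 - headroom)
    column = "int4_gb" if precision == "int4" else "fp16_gb"
    return [m.name for m in rows if getattr(m, column) <= usable_gb]
```

With a 24GB budget and the default headroom, the usable weight budget is 18GB, so `fits(ROWS, 24)` keeps the first four sample rows at INT4 but only Phi-3 Mini and Mistral 7B at FP16. Long contexts or large batches may call for more headroom than this default.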