Multimodal LLM releases

Moonshot AI · Frontier · Open weights

Moonshot's open coding-focused agentic model built on K2.6, with native vision/video input, forced thinking mode, and stronger long-horizon software-engineering performance.

Architecture: MoE · Parameters: 1T · Context: 262K · Released: Jun 18, 2026

MiniMax-M3

MiniMax · Frontier · Open weights

Native multimodal MiniMax model with a one-million-token context and sparse attention, positioned for agentic coding and cowork use cases.

Architecture: MoE · Parameters: 428B · Context: 1M · Released: Jun 16, 2026

GPT-5.6

Preview

OpenAI · Frontier · Proprietary

OpenAI's mid-2026 flagship, headlined by a 1.5M-token context window and long-horizon agentic tool use.

Architecture: MoE · Parameters: Undisclosed · Context: 1.5M · Released: Jun 9, 2026

Claude Fable 5

Withdrawn

Anthropic · Frontier · Proprietary

The public, guardrailed sibling of Mythos and Anthropic's most capable widely released model, built for long-horizon agentic work. Launched June 9, 2026 across the Claude API, AWS, and Microsoft Foundry, then pulled three days later under a US government export-control directive barring access by foreign nationals.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Jun 9, 2026

Claude Opus 4.8

Anthropic · Frontier · Proprietary

Anthropic's most capable model at release, with strengthened agentic and long-running task performance.

Architecture: n/a · Parameters: Undisclosed · Context: 500K · Released: May 28, 2026

Gemini 3.5 Pro

Preview

Google DeepMind · Frontier · Proprietary

Announced at Google I/O 2026; emphasizes deep multimodal reasoning over a 2M-token context.

Architecture: MoE · Parameters: Undisclosed · Context: 2M · Released: May 19, 2026

Grok 4.3

xAI · Frontier · Proprietary

xAI's agentic flagship with a 1M-token context and aggressive API pricing.

Architecture: MoE · Parameters: Undisclosed · Context: 1M · Released: May 6, 2026

Gemma 4 31B

Google DeepMind · Open source

Advanced-reasoning open model from Google DeepMind's April 2026 Gemma 4 family, built to run on personal computers.

Architecture: Dense · Parameters: 31B · Context: n/a · Released: Apr 2, 2026

Kimi K2.6

Moonshot AI · Frontier · Open weights

Moonshot's open native multimodal agentic model for long-horizon coding, visual interface generation, and autonomous tool orchestration.

Architecture: MoE · Parameters: 1T · Context: 256K · Released: Mar 30, 2026

GPT-5.4

OpenAI · Frontier · Proprietary

Workhorse GPT-5 release with a dedicated Thinking mode; widely deployed across ChatGPT and the API.

Architecture: MoE · Parameters: Undisclosed · Context: 400K · Released: Mar 5, 2026

Qwen3.5-397B

Alibaba (Qwen) · Frontier · Open source

Native vision-language MoE supporting 201 languages with a 1M-token context.

Architecture: MoE · Parameters: 397B · Context: 1M · Released: Feb 20, 2026

Gemini 3.1 Pro

Google DeepMind · Frontier · Proprietary

Generally available multimodal flagship with native tool use and a 2M-token context.

Architecture: MoE · Parameters: Undisclosed · Context: 2M · Released: Feb 19, 2026

Claude Opus 4.6

Anthropic · Frontier · Proprietary

Introduced autonomous multi-file coding and stronger computer use.

Architecture: n/a · Parameters: Undisclosed · Context: 200K · Released: Feb 5, 2026

Kimi K2.5

Moonshot AI · Frontier · Open weights

Open multimodal Kimi model that adds native visual agentic intelligence, instant and thinking modes, and agent-swarm workflows on top of the K2 base.

Architecture: MoE · Parameters: 1T · Context: 256K · Released: Jan 27, 2026

GLM-4.6V

Z.ai (Zhipu AI) · Open source

Open 106B-class vision-language model with native multimodal function calling for visual agents.

Architecture: MoE · Parameters: 106B · Context: 128K · Released: Dec 8, 2025

Mistral Large 3

Mistral AI · Frontier · Open weights

Mistral's largest open-weight MoE, aimed at frontier reasoning while remaining self-hostable.

Architecture: MoE · Parameters: 675B · Context: 256K · Released: Dec 2, 2025

Gemma 3 27B

Google DeepMind · Open weights

Google's open multimodal model: 128K context, 140+ languages, runs on a single GPU.

Architecture: Dense · Parameters: 27B · Context: 128K · Released: Sep 4, 2025

GLM-4.5V

Z.ai (Zhipu AI) · Open source

Vision-language GLM based on GLM-4.5-Air, covering image, video, document, grounding, and GUI-agent tasks.

Architecture: MoE · Parameters: 106B · Context: n/a · Released: Aug 11, 2025

Grok 4

xAI · Proprietary

xAI's fourth-generation Grok line, predecessor to later 4.x updates such as Grok 4.3.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Jul 9, 2025

ERNIE-4.5-VL-424B-A47B

Baidu · Open source

Baidu's largest ERNIE 4.5 vision-language MoE, supporting text, image, and video inputs with thinking and non-thinking modes.

Architecture: MoE · Parameters: 424B · Context: 128K · Released: Jun 30, 2025

Kimi-VL-A3B-Thinking-2506

Moonshot AI · Open source

Updated MIT-licensed Kimi-VL reasoning model with better multimodal reasoning, video understanding, high-resolution perception, and lower thinking-token use.

Architecture: MoE · Parameters: 16B · Context: 128K · Released: Jun 21, 2025

Claude Opus 4

First Claude 4 Opus model, positioned for long-running agentic and coding work before the 4.x point releases.

Architecture: n/a · Parameters: Undisclosed · Context: 200K · Released: May 22, 2025

Kimi-Audio-7B-Instruct

Moonshot AI · Open source

Open audio foundation model for audio understanding, generation, speech recognition, audio QA, captioning, and speech conversation.

Architecture: Hybrid · Parameters: 10B · Context: n/a · Released: Apr 25, 2025

Kimi-VL-A3B-Instruct

Moonshot AI · Open source

Efficient MIT-licensed vision-language MoE for OCR, image/video understanding, long documents, and OS-style agent tasks.

Architecture: MoE · Parameters: 16B · Context: 128K · Released: Apr 17, 2025

OpenAI o3

Reasoning model released alongside o4-mini with tool use, image reasoning, and stronger agentic problem solving.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Apr 16, 2025

GPT-4.1

API model family focused on coding, instruction following, and one-million-token long-context work.

Architecture: n/a · Parameters: Undisclosed · Context: 1M · Released: Apr 14, 2025

Llama 4 Maverick

Meta AI · Frontier · Open weights

Meta's flagship open-weight MoE; highest MMLU among open models at release.

Architecture: MoE · Parameters: 400B · Context: 1M · Released: Apr 5, 2025

Llama 4 Scout

Meta AI · Open weights

Efficient open-weight MoE designed for very long context on modest hardware.

Architecture: MoE · Parameters: 109B · Context: 10M · Released: Apr 5, 2025

Qwen2.5-Omni-7B

Alibaba (Qwen) · Open weights

Local omni-modal Qwen model in a 7B package that accepts text, image, audio, and video input and generates both text and speech.

Architecture: Dense · Parameters: 7B · Context: n/a · Released: Mar 26, 2025

Gemini 2.5 Pro

Reasoning-focused Gemini 2.5 model that made thinking a core part of Google's flagship model line.

Architecture: n/a · Parameters: Undisclosed · Context: 1M · Released: Mar 25, 2025

Mistral Small 3.1

Mistral AI · Open source

Apache-licensed Small update adding vision and a 128K context window to the efficient 24B line.

Architecture: Dense · Parameters: 24B · Context: 128K · Released: Mar 17, 2025

Claude 3.7 Sonnet

Anthropic's first hybrid-reasoning Sonnet. Shut down May 11, 2026 as the 4.x line matured.

Architecture: n/a · Parameters: Undisclosed · Context: 200K · Released: Feb 24, 2025

Grok 3

xAI · Proprietary

xAI's third-generation model family, introduced with stronger reasoning, search, and coding modes.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Feb 17, 2025

Qwen2.5-VL-72B

Alibaba (Qwen) · Open weights

Vision-language Qwen2.5 model for image, document, video, and agentic visual grounding tasks.

Architecture: Dense · Parameters: 72B · Context: 128K · Released: Jan 26, 2025

Doubao-1.5-pro

ByteDance Seed · Proprietary

Doubao 1.5 Pro update positioned for stronger multimodal, reasoning, and agentic work in Volcano Engine.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Jan 22, 2025

Kimi k1.5

Moonshot AI · Proprietary

Moonshot's multimodal reinforcement-learning reasoning model, reported as matching OpenAI o1 on math, coding, and multimodal reasoning.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Jan 20, 2025

MiniMax-01

MiniMax · Open weights

Open-weight MiniMax generation comprising the long-context MiniMax-Text-01 and MiniMax-VL-01 models.

Architecture: Hybrid · Parameters: 456B · Context: 4M · Released: Jan 15, 2025

Step-2

StepFun · Proprietary

Second-generation StepFun foundation model line with larger-scale multimodal and reasoning ambitions.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Dec 23, 2024

Gemini 2.0 Flash

First Gemini 2.0 release, built for native multimodal input/output, tool use, and agentic product integrations.

Architecture: n/a · Parameters: Undisclosed · Context: 1M · Released: Dec 11, 2024

OpenAI o1

General release of OpenAI's o1 reasoning model with stronger deliberative reasoning and multimodal ChatGPT integration.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Dec 5, 2024

Amazon Nova Pro

Amazon · Proprietary

AWS-native multimodal model with a 300K context; size and architecture undisclosed.

Architecture: n/a · Parameters: Undisclosed · Context: 300K · Released: Dec 3, 2024

Amazon Nova Lite

Amazon · Proprietary

Lower-cost multimodal Nova understanding model for text, image, and video inputs.

Architecture: n/a · Parameters: Undisclosed · Context: 300K · Released: Dec 3, 2024

Claude 3.5 Haiku

Fast, lower-cost Claude 3.5 model for latency-sensitive coding, tool-use, and customer-facing workloads.

Architecture: n/a · Parameters: Undisclosed · Context: 200K · Released: Oct 22, 2024

Llama 3.2 90B Vision

Meta AI · Open weights

First Llama family release with native vision models, alongside smaller edge-oriented 1B and 3B text models.

Architecture: Dense · Parameters: 90B · Context: 128K · Released: Sep 25, 2024

Molmo 72B

Allen Institute for AI (Ai2) · Open weights

Open multimodal model family trained for strong image understanding, pointing, and visual grounding.

Architecture: Dense · Parameters: 72B · Context: n/a · Released: Sep 25, 2024

Pixtral 12B

Mistral AI · Open source

Mistral's first open multimodal model, adding image understanding to a Mistral text backbone.

Architecture: Dense · Parameters: 12B · Context: 128K · Released: Sep 17, 2024

Grok-2

xAI · Proprietary

Second-generation Grok release with Grok-2 and Grok-2 mini for chat, coding, reasoning, and image-enabled product experiences.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Aug 13, 2024

MiniCPM-V 2.6

OpenBMB · Open weights

8B vision-language model for local image, multi-image, OCR, and video understanding, with llama.cpp and Ollama support.

Architecture: Dense · Parameters: 8B · Context: n/a · Released: Aug 2, 2024

Claude 3.5 Sonnet

Major Sonnet upgrade that became Anthropic's default high-intelligence workhorse for coding, writing, and visual reasoning.

Architecture: n/a · Parameters: Undisclosed · Context: 200K · Released: Jun 20, 2024

GPT-4o

The 2024 omni-modal model that defined a generation of assistants. Deprecated in February 2026 and fully retired across ChatGPT on April 3, 2026.

Architecture: n/a · Parameters: Undisclosed · Context: 128K · Released: May 13, 2024

Falcon 2 11B

Technology Innovation Institute · Open weights

Falcon 2 generation, including text and vision-language 11B models under a permissive TII license.

Architecture: Dense · Parameters: 11B · Context: 8K · Released: May 13, 2024

Step-1V

StepFun · Proprietary

StepFun's first major vision-language model, released after the Step-1 language model.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Apr 12, 2024

Claude 3 Opus

Highest-capability Claude 3 model, launched alongside Sonnet and Haiku as part of Anthropic's first major vision-capable Claude family.

Architecture: n/a · Parameters: Undisclosed · Context: 200K · Released: Mar 4, 2024

Gemini 1.5 Pro

Gemini generation that introduced production-scale long context, eventually expanding to a two-million-token window.

Architecture: MoE · Parameters: Undisclosed · Context: 2M · Released: Feb 15, 2024

GLM-4

Z.ai (Zhipu AI) · Proprietary

Zhipu's GLM-4 flagship generation, launched as the successor to ChatGLM3 with stronger tool use and multimodal variants.

Architecture: n/a · Parameters: Undisclosed · Context: 128K · Released: Jan 16, 2024

Gemini 1.0 Ultra

Google's first natively multimodal Gemini flagship, since superseded by the 1.5/2/3 lines.

Architecture: n/a · Parameters: Undisclosed · Context: 32K · Released: Dec 6, 2023

GPT-4 Turbo

Lower-cost GPT-4 generation with a 128K context window, introduced at OpenAI DevDay.

Architecture: n/a · Parameters: Undisclosed · Context: 128K · Released: Nov 6, 2023

ERNIE 4.0

Baidu · Proprietary

Baidu's fourth-generation ERNIE flagship, announced with stronger understanding, generation, reasoning, and memory.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Oct 17, 2023

LLaVA 1.5 13B

LLaVA · Open weights

Open vision-language assistant and one of the most widely run early local multimodal models.

Architecture: Hybrid · Parameters: 13B · Context: n/a · Released: Sep 30, 2023

EXAONE 2.0

LG AI Research · Proprietary

Second EXAONE generation, improving bilingual Korean-English performance and enterprise deployment options.

Architecture: n/a · Parameters: Undisclosed · Context: n/a · Released: Jul 19, 2023

GPT-4

The model that brought reliable multi-step reasoning to the mainstream; size never disclosed.

Architecture: n/a · Parameters: Undisclosed · Context: 8K · Released: Mar 14, 2023

EXAONE 1.0