Three quick choices without scrolling: one for the hardest work, one for cheap volume and one for teams that do not want to depend only on a closed API.
Claude Fable 5
top pick
Anthropic
My read
coding pick backed by current AA.
✓ Use for: coding · AI agents · company knowledge
✗ Skip for: mass volume · strict latency
The sources list Claude Fable 5 at IQ 64.9 and an input price of $10/M. Consider it for coding and AI agents; for mass-volume or strict-latency workloads, run a second benchmark before rollout.
premium, when mistakes hurt · market frontier · verified by external data
Signals: market frontier · premium, when mistakes hurt · coding · AI agents
GPT-5.5
top pick
OpenAI
My read
coding pick backed by current AA.
✓ Use for: coding · AI agents · document extraction
✗ Skip for: self-hosted stack · strict latency
The sources list GPT-5.5 at IQ 60.2, an input price of $5/M and a DeepSWE pass@1 of 70.0%. Consider it for coding and AI agents; for a self-hosted stack or strict-latency workloads, run a second benchmark before rollout.
mid-budget · market frontier · verified by external data
Signals: market frontier · mid-budget · coding · AI agents
DeepSeek V4 Flash
top pick
DeepSeek
My read
batch pick backed by current AA.
✓ Use for: large batches · fast replies · self-hosted stack
✗ Skip for: deep frontier reasoning · top coding
The sources list DeepSeek V4 Flash at IQ 46.5 and an input price of $0.14/M. Consider it for large batches and fast replies; for deep frontier reasoning or top coding, run a second benchmark before rollout.
open / self-run · specialist · verified by external data
Signals: specialist · open / self-run · large batches · fast replies
This is the broader catalogue. Filters narrow it by situation; details keep hard numbers and sources out of the first read.
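As a rough illustration of how a filter "by situation" could work, the sketch below matches a stated need against each card's "Use for" tags and rejects cards whose "Skip for" tags collide with your constraints. The `Card` class, the `shortlist` function and the matching rule are assumptions made for this example, not the page's actual filter logic; the tags are copied from four of the catalogue's cards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """One catalogue card, reduced to its name and its two tag lists (hypothetical type)."""
    name: str
    use_for: frozenset
    skip_for: frozenset


# Tags copied from the "Use for" / "Skip for" lines of four cards.
CARDS = [
    Card("Claude Fable 5",
         frozenset({"coding", "AI agents", "company knowledge"}),
         frozenset({"mass volume", "strict latency"})),
    Card("GPT-5.5",
         frozenset({"coding", "AI agents", "document extraction"}),
         frozenset({"self-hosted stack", "strict latency"})),
    Card("DeepSeek V4 Flash",
         frozenset({"large batches", "fast replies", "self-hosted stack"}),
         frozenset({"deep frontier reasoning", "top coding"})),
    Card("GLM-5.1",
         frozenset({"self-hosted stack", "sensitive deployments", "large batches"}),
         frozenset({"premium agents", "top coding"})),
]


def shortlist(cards, need, constraints=()):
    """Return names of cards that cover every needed tag and skip none of the constraints."""
    need, constraints = set(need), set(constraints)
    return [
        card.name
        for card in cards
        if need <= card.use_for and not (constraints & card.skip_for)
    ]


if __name__ == "__main__":
    print(shortlist(CARDS, need={"self-hosted stack"}))
    print(shortlist(CARDS, need={"coding"}, constraints={"mass volume"}))
```

A shortlist produced this way only narrows the field; the cards' own advice to run a second benchmark before rollout still applies.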
Claude Fable 5
Anthropic
My read
coding pick backed by current AA.
✓ Use for: coding · AI agents · company knowledge
✗ Skip for: mass volume · strict latency
The sources list Claude Fable 5 at IQ 64.9 and an input price of $10/M. Consider it for coding and AI agents; for mass-volume or strict-latency workloads, run a second benchmark before rollout.
premium, when mistakes hurt · market frontier · verified by external data
Signals: market frontier · premium, when mistakes hurt · coding · AI agents
Claude Opus 4.8
Anthropic
My read
coding pick backed by current AA.
✓ Use for: coding · AI agents · company knowledge
✗ Skip for: mass volume · strict latency
The sources list Claude Opus 4.8 at IQ 61.4, an input price of $5/M and a DeepSWE pass@1 of 58.2%. Consider it for coding and AI agents; for mass-volume or strict-latency workloads, run a second benchmark before rollout.
mid-budget · market frontier · verified by external data
Signals: market frontier · mid-budget · coding · AI agents
GPT-5.5
OpenAI
My read
coding pick backed by current AA.
✓ Use for: coding · AI agents · document extraction
✗ Skip for: self-hosted stack · strict latency
The sources list GPT-5.5 at IQ 60.2, an input price of $5/M and a DeepSWE pass@1 of 70.0%. Consider it for coding and AI agents; for a self-hosted stack or strict-latency workloads, run a second benchmark before rollout.
mid-budget · market frontier · verified by external data
Signals: market frontier · mid-budget · coding · AI agents
Gemini 3.1 Pro Preview
Google
My read
RAG pick backed by current AA.
✓ Use for: company knowledge · vision and multimodal · multilingual content
✗ Skip for: strict latency · self-hosted stack
The sources list Gemini 3.1 Pro Preview at IQ 57.2, an input price of $2/M and a DeepSWE pass@1 of 9.7%. Consider it for company knowledge and for vision and multimodal work; for strict-latency workloads or a self-hosted stack, run a second benchmark before rollout.
mid-budget · market frontier · verified by external data
Signals: market frontier · mid-budget · company knowledge · vision and multimodal
Qwen3.7 Max
Alibaba
My read
multilingual pick backed by current AA.
✓ Use for: multilingual content · coding · large batches
✗ Skip for: self-hosted stack · premium agents
The sources list Qwen3.7 Max at IQ 56.6, an input price of $2.5/M and a DeepSWE pass@1 of 17.7%. Consider it for multilingual content and coding; for a self-hosted stack or premium agents, run a second benchmark before rollout.
mid-budget · market frontier · verified by external data
Signals: market frontier · mid-budget · multilingual content · coding
Gemini 3.5 Flash
Google
My read
batch pick backed by current AA.
✓ Use for: large batches · company knowledge · fast replies
✗ Skip for: hard coding work · strict latency
The sources list Gemini 3.5 Flash at IQ 55.3, an input price of $1.5/M and a DeepSWE pass@1 of 28.3%. Consider it for large batches and company knowledge; for hard coding work or strict-latency workloads, run a second benchmark before rollout.
mid-budget · specialist · verified by external data
Signals: specialist · mid-budget · large batches · company knowledge
Kimi K2.6
Moonshot
My read
batch pick backed by current AA.
✓ Use for: large batches · coding · company knowledge
✗ Skip for: company controls and audit · reliable tool use
The sources list Kimi K2.6 at IQ 53.9, an input price of $0.95/M and a DeepSWE pass@1 of 23.9%. Consider it for large batches and coding; where company controls and audit or reliable tool use matter, run a second benchmark before rollout.
cheap at volume · specialist · verified by external data
Signals: specialist · cheap at volume · large batches · coding
Claude Sonnet 4.6
Anthropic
My read
coding pick backed by current AA.
✓ Use for: coding · AI agents · company knowledge
✗ Skip for: self-hosted stack · mass volume
The sources list Claude Sonnet 4.6 at IQ 51.7, an input price of $3/M and a DeepSWE pass@1 of 31.8%. Consider it for coding and AI agents; for a self-hosted stack or mass volume, run a second benchmark before rollout.
mid-budget · specialist · verified by external data
Signals: specialist · mid-budget · coding · AI agents
GLM-5.1
Z.AI/Zhipu
My read
self-hosted pick backed by current AA.
✓ Use for: self-hosted stack · sensitive deployments · large batches
✗ Skip for: premium agents · top coding
The sources list GLM-5.1 at IQ 51.4, an input price of $1.4/M and a DeepSWE pass@1 of 17.5%. Consider it for a self-hosted stack and sensitive deployments; for premium agents or top coding, run a second benchmark before rollout.
open / self-run · specialist · verified by external data
Signals: specialist · open / self-run · self-hosted stack · sensitive deployments
DeepSeek V4 Flash
DeepSeek
My read
batch pick backed by current AA.
✓ Use for: large batches · fast replies · self-hosted stack
✗ Skip for: deep frontier reasoning · top coding
The sources list DeepSeek V4 Flash at IQ 46.5 and an input price of $0.14/M. Consider it for large batches and fast replies; for deep frontier reasoning or top coding, run a second benchmark before rollout.
open / self-run · specialist · verified by external data
Signals: specialist · open / self-run · large batches · fast replies
DeepSeek V4 Pro
DeepSeek
My read
batch pick backed by current AA.
✓ Use for: large batches · coding · self-hosted stack
✗ Skip for: company controls and audit · premium agents
The sources list DeepSeek V4 Pro at IQ 51.5, an input price of $0.435/M and a DeepSWE pass@1 of 7.5%. Consider it for large batches and coding; where company controls and audit or premium agents matter, run a second benchmark before rollout.
open / self-run · specialist · verified by external data
Signals: specialist · open / self-run · large batches · coding
Command A+
Cohere
My read
RAG pick backed by current AA.
✓ Use for: company knowledge · document extraction · sensitive deployments
✗ Skip for: top coding · deep frontier reasoning
The sources list Command A+ at IQ 37.2 and an input price of $0/M. Consider it for company knowledge and document extraction; for top coding or deep frontier reasoning, run a second benchmark before rollout.
cheap at volume · specialist · verified by external data
Signals: specialist · cheap at volume · company knowledge · document extraction
Grok 4.3
xAI
My read
coding pick backed by current AA.
✓ Use for: coding · fast replies · document extraction
✗ Skip for: self-hosted stack · sensitive deployments
The sources list Grok 4.3 at IQ 53.2 and an input price of $1.25/M. Consider it for coding and fast replies; for a self-hosted stack or sensitive deployments, run a second benchmark before rollout.
mid-budget · specialist · verified by external data
Signals: specialist · mid-budget · coding · fast replies
Llama 4 Maverick
Meta
My read
self-hosted pick backed by current AA.
✓ Use for: self-hosted stack · sensitive deployments · company knowledge
✗ Skip for: managed API comfort · deep frontier reasoning
The sources list Llama 4 Maverick at IQ 18.4 and an input price of $0.35/M. Consider it for a self-hosted stack and sensitive deployments; if you want managed API comfort or deep frontier reasoning, run a second benchmark before rollout.
open / self-run · specialist · verified by external data
Signals: specialist · open / self-run · self-hosted stack · sensitive deployments
Mistral Medium 3.5
Mistral AI
My read
compliance pick backed by current AA.
✓ Use for: sensitive deployments · company knowledge · document extraction
✗ Skip for: deep frontier reasoning · top coding
The sources list Mistral Medium 3.5 at IQ 39.2 and an input price of $1.5/M. Consider it for sensitive deployments and company knowledge; for deep frontier reasoning or top coding, run a second benchmark before rollout.
mid-budget · specialist · verified by external data
Signals: specialist · mid-budget · sensitive deployments · company knowledge
Mistral Large 3
Mistral AI
My read
compliance pick backed by current AA.
✓ Use for: sensitive deployments · document extraction · large batches
✗ Skip for: deep frontier reasoning · AI agents
The sources list Mistral Large 3 at IQ 22.8 and an input price of $0.5/M. Consider it for sensitive deployments and document extraction; for deep frontier reasoning or AI agents, run a second benchmark before rollout.
cheap at volume · specialist · verified by external data
Signals: specialist · cheap at volume · sensitive deployments · document extraction
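To make the price gap between the cards concrete, here is a minimal sketch that turns the listed input prices into a rough input-side budget. It assumes "$/M" means US dollars per million input tokens, ignores output-token pricing (which the cards do not list), and uses a hypothetical monthly volume; `INPUT_PRICE_PER_M` and `input_cost` are names invented for this example.

```python
# Input prices copied from the cards; the unit (USD per million input tokens)
# is an assumption about what "$/M" means.
INPUT_PRICE_PER_M = {
    "Claude Fable 5": 10.0,
    "GPT-5.5": 5.0,
    "DeepSeek V4 Flash": 0.14,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Estimated input-side cost in dollars for the given number of input tokens."""
    return INPUT_PRICE_PER_M[model] * input_tokens / 1_000_000


if __name__ == "__main__":
    monthly_tokens = 200_000_000  # hypothetical workload, not from the sources
    for model in INPUT_PRICE_PER_M:
        print(f"{model}: ${input_cost(model, monthly_tokens):,.2f} per month (input only)")
```

The estimate covers input tokens only, so treat it as a lower bound when comparing a premium pick against a cheap-volume one.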