Models

One platform. Every frontier model.

Pick the right brain for the job. Route the easy stuff to a fast, cheap model and reserve the expensive reasoners for the work that actually needs them.

100+Models available
6Providers
1M+Tokens of context

How to read the cost meter

$ = cheap. $$$$$ = expensive.

The dollar-sign meter on each card ranks every listed model against the others by output-token cost. Five $ marks the priciest on this page; one $ marks the cheapest. Hover the card to see the actual per-million-token rates.

$$$$$Cheapest$$$$$Low cost$$$$$Mid$$$$$Premium$$$$$Most expensive

Featured catalog

The headline models

Twelve flagships across the providers we support today. The full catalog is bigger — sign in to see specialty models, regional variants, and the latest previews.

Claude Opus 4.7

Anthropic
$$$$$Premium

Most capable Claude with 1M context and adaptive thinking. Excels at complex reasoning, agentic coding, planning, and debugging.

Context1M
FamilyClaude 4 (Opus)
Input$5/M
Output$25/M
ReasoningVisionToolsPrompt cache

Claude Sonnet 4.6

Anthropic
$$$$$Premium

Best speed-to-intelligence ratio with 1M context and extended thinking. The default for most production workloads.

Context1M
FamilyClaude 4 (Sonnet)
Input$3/M
Output$15/M
ReasoningVisionToolsPrompt cache

Claude Haiku 4.5

Anthropic
$$$$$Mid

Fast and cost-effective — near-Sonnet quality at roughly one-third the price. Built for latency-sensitive, high-volume work.

Context200K
FamilyClaude 4 (Haiku)
Input$1/M
Output$5/M
ReasoningVisionToolsPrompt cache

GPT-5.5 Pro

OpenAI
$$$$$Most expensive

OpenAI's smartest and most precise — extra compute to think harder on the hardest tasks. Available via Responses, Chat Completions, and Batch.

Context1M
FamilyGPT-5 series
Input$30/M
Output$180/M
ReasoningVisionToolsPrompt cache

GPT-5.5

OpenAI
$$$$$Most expensive

OpenAI's flagship for real work — excels at coding, debugging, research, data analysis, and document creation. 1M+ context.

Context1M
FamilyGPT-5 series
Input$5/M
Output$30/M
ReasoningVisionToolsPrompt cache

GPT-5 mini

OpenAI
$$$$$Low cost

Cost-efficient GPT-5 variant for high-volume, low-latency workloads. 400K context with reasoning.

Context400K
FamilyGPT-5 series
Input$0.25/M
Output$2/M
ReasoningVisionToolsPrompt cache

GPT-4.1

OpenAI
$$$$$Mid

1M context flagship for coding and instruction following. Reliable workhorse without reasoning overhead.

Context1M
FamilyGPT-4.1 series
Input$2/M
Output$8/M
VisionToolsPrompt cache

Gemini 3.5 Flash

Google
$$$$$Premium

Google's fast multimodal — text, image, video, and audio input with adaptive thinking. 1M context on Vertex AI.

Context1M
FamilyGemini 3.5
Input$2.08/M
Output$12.47/M
ReasoningVisionVideoAudioToolsPrompt cache

Grok 4.3

xAI
$$$$$Low cost

xAI's flagship Grok 4.3. Reasoning always-on with configurable effort. Strong agentic performance, 1M context, text + image.

Context1M
FamilyGrok 4
Input$1.25/M
Output$2.50/M
ReasoningVisionToolsPrompt cache

DeepSeek V4-Pro

DeepSeek
$$$$$Cheapest

Premium MoE (1.6T total / 49B active) with thinking + non-thinking modes. 1M context, 384K max output, FIM completion.

Context1M
FamilyDeepSeek V4
Input$0.43/M
Output$0.87/M
ReasoningToolsPrompt cache

DeepSeek V4-Flash

DeepSeek
$$$$$Cheapest

Lightweight flagship MoE (284B / 13B active). Thinking + non-thinking modes, 1M context, 384K max output — at a fraction of the cost.

Context1M
FamilyDeepSeek V4
Input$0.14/M
Output$0.28/M
ReasoningToolsPrompt cache

Mistral Large 3

Mistral
$$$$$Low cost

Open-weight 675B MoE (41B active) under Apache 2.0. General-purpose multimodal with vision input and 256K context.

Context256K
FamilyMistral Large
Input$0.50/M
Output$1.50/M
VisionTools

How we think about cost

The right model for the right task

You don't need the most expensive model for every step. AI Expedite lets you route by stage — drafting on cheap models, reasoning on premium ones — and shows the true cost of every execution.

$$$$$High volume

Use cheap models for drafts, classifications, fan-out, and boilerplate. DeepSeek V4-Flash and GPT-5 mini stretch a credit the furthest.

$$$$$General workloads

The default tier — Sonnet, Haiku, GPT-4.1, Gemini Flash. Good quality, sane prices, multimodal where it matters.

$$$$$Hard reasoning

Opus, GPT-5.5, and GPT-5.5 Pro for the calls that actually matter — gnarly debugging, architectural decisions, anything you'd ask a senior engineer to review.

Mix and match across providers

No vendor lock-in. Switch a step from Claude to Gemini to GPT-5 to DeepSeek without rewriting your agent.

Start free

Pricing reflects vendor list rates as of May 2026 in USD per 1M tokens. Catalog updated continuously as new models ship.