Models

One platform. Every frontier model.

Pick the right brain for the job. Route the easy stuff to a fast, cheap model and reserve the expensive reasoners for the work that actually needs them.

100+Models available

6Providers

1M+Tokens of context

How to read the cost meter

$ = cheap. $$$$$ = expensive.

The dollar-sign meter on each card ranks every listed model against the others by output-token cost. Five $ marks the priciest on this page; one $ marks the cheapest. Hover the card to see the actual per-million-token rates.

$$$$$Cheapest$$$$$Low cost$$$$$Mid$$$$$Premium$$$$$Most expensive

Featured catalog

The headline models

Twelve flagships across the providers we support today. The full catalog is bigger — sign in to see specialty models, regional variants, and the latest previews.

Claude Opus 4.7

Anthropic

$$$$$Premium

Most capable Claude with 1M context and adaptive thinking. Excels at complex reasoning, agentic coding, planning, and debugging.

Context1M

FamilyClaude 4 (Opus)

Input$5/M

Output$25/M

ReasoningVisionToolsPrompt cache

Claude Sonnet 4.6

Anthropic

$$$$$Premium

Best speed-to-intelligence ratio with 1M context and extended thinking. The default for most production workloads.

Context1M

FamilyClaude 4 (Sonnet)

Input$3/M

Output$15/M

ReasoningVisionToolsPrompt cache

Claude Haiku 4.5

Anthropic

$$$$$Mid

Fast and cost-effective — near-Sonnet quality at roughly one-third the price. Built for latency-sensitive, high-volume work.

Context200K

FamilyClaude 4 (Haiku)

Input$1/M

Output$5/M

ReasoningVisionToolsPrompt cache

GPT-5.5 Pro

OpenAI

$$$$$Most expensive

OpenAI's smartest and most precise — extra compute to think harder on the hardest tasks. Available via Responses, Chat Completions, and Batch.

Context1M

FamilyGPT-5 series

Input$30/M

Output$180/M

ReasoningVisionToolsPrompt cache

GPT-5.5

OpenAI

$$$$$Most expensive

OpenAI's flagship for real work — excels at coding, debugging, research, data analysis, and document creation. 1M+ context.

Context1M

FamilyGPT-5 series

Input$5/M

Output$30/M

ReasoningVisionToolsPrompt cache

GPT-5 mini

OpenAI

$$$$$Low cost

Cost-efficient GPT-5 variant for high-volume, low-latency workloads. 400K context with reasoning.

Context400K

FamilyGPT-5 series

Input$0.25/M

Output$2/M

ReasoningVisionToolsPrompt cache

GPT-4.1

OpenAI

$$$$$Mid

1M context flagship for coding and instruction following. Reliable workhorse without reasoning overhead.

Context1M

FamilyGPT-4.1 series

Input$2/M

Output$8/M

VisionToolsPrompt cache

Gemini 3.5 Flash

Google

$$$$$Premium

Google's fast multimodal — text, image, video, and audio input with adaptive thinking. 1M context on Vertex AI.

Context1M

FamilyGemini 3.5

Input$2.08/M

Output$12.47/M

ReasoningVisionVideoAudioToolsPrompt cache

Grok 4.3

xAI

$$$$$Low cost

xAI's flagship Grok 4.3. Reasoning always-on with configurable effort. Strong agentic performance, 1M context, text + image.

Context1M

FamilyGrok 4

Input$1.25/M

Output$2.50/M

ReasoningVisionToolsPrompt cache

DeepSeek V4-Pro

DeepSeek

$$$$$Cheapest

Premium MoE (1.6T total / 49B active) with thinking + non-thinking modes. 1M context, 384K max output, FIM completion.

Context1M

FamilyDeepSeek V4

Input$0.43/M

Output$0.87/M

ReasoningToolsPrompt cache

DeepSeek V4-Flash

DeepSeek

$$$$$Cheapest

Lightweight flagship MoE (284B / 13B active). Thinking + non-thinking modes, 1M context, 384K max output — at a fraction of the cost.

Context1M

FamilyDeepSeek V4

Input$0.14/M

Output$0.28/M

ReasoningToolsPrompt cache

Mistral Large 3

Mistral

$$$$$Low cost

Open-weight 675B MoE (41B active) under Apache 2.0. General-purpose multimodal with vision input and 256K context.

Context256K

FamilyMistral Large

Input$0.50/M

Output$1.50/M

VisionTools

How we think about cost

The right model for the right task

You don't need the most expensive model for every step. AI Expedite lets you route by stage — drafting on cheap models, reasoning on premium ones — and shows the true cost of every execution.

$$$$$High volume

Use cheap models for drafts, classifications, fan-out, and boilerplate. DeepSeek V4-Flash and GPT-5 mini stretch a credit the furthest.

$$$$$General workloads

The default tier — Sonnet, Haiku, GPT-4.1, Gemini Flash. Good quality, sane prices, multimodal where it matters.

$$$$$Hard reasoning

Opus, GPT-5.5, and GPT-5.5 Pro for the calls that actually matter — gnarly debugging, architectural decisions, anything you'd ask a senior engineer to review.

Mix and match across providers

No vendor lock-in. Switch a step from Claude to Gemini to GPT-5 to DeepSeek without rewriting your agent.

Start free

Pricing reflects vendor list rates as of May 2026 in USD per 1M tokens. Catalog updated continuously as new models ship.