Claude Opus 4.7
AnthropicMost capable Claude with 1M context and adaptive thinking. Excels at complex reasoning, agentic coding, planning, and debugging.
Models
Pick the right brain for the job. Route the easy stuff to a fast, cheap model and reserve the expensive reasoners for the work that actually needs them.
How to read the cost meter
The dollar-sign meter on each card ranks every listed model against the others by output-token cost. Five $ marks the priciest on this page; one $ marks the cheapest. Hover the card to see the actual per-million-token rates.
Featured catalog
Twelve flagships across the providers we support today. The full catalog is bigger — sign in to see specialty models, regional variants, and the latest previews.
Most capable Claude with 1M context and adaptive thinking. Excels at complex reasoning, agentic coding, planning, and debugging.
Best speed-to-intelligence ratio with 1M context and extended thinking. The default for most production workloads.
Fast and cost-effective — near-Sonnet quality at roughly one-third the price. Built for latency-sensitive, high-volume work.
OpenAI's smartest and most precise — extra compute to think harder on the hardest tasks. Available via Responses, Chat Completions, and Batch.
OpenAI's flagship for real work — excels at coding, debugging, research, data analysis, and document creation. 1M+ context.
Cost-efficient GPT-5 variant for high-volume, low-latency workloads. 400K context with reasoning.
1M context flagship for coding and instruction following. Reliable workhorse without reasoning overhead.
Google's fast multimodal — text, image, video, and audio input with adaptive thinking. 1M context on Vertex AI.
xAI's flagship Grok 4.3. Reasoning always-on with configurable effort. Strong agentic performance, 1M context, text + image.
Premium MoE (1.6T total / 49B active) with thinking + non-thinking modes. 1M context, 384K max output, FIM completion.
Lightweight flagship MoE (284B / 13B active). Thinking + non-thinking modes, 1M context, 384K max output — at a fraction of the cost.
Open-weight 675B MoE (41B active) under Apache 2.0. General-purpose multimodal with vision input and 256K context.
How we think about cost
You don't need the most expensive model for every step. AI Expedite lets you route by stage — drafting on cheap models, reasoning on premium ones — and shows the true cost of every execution.
Use cheap models for drafts, classifications, fan-out, and boilerplate. DeepSeek V4-Flash and GPT-5 mini stretch a credit the furthest.
The default tier — Sonnet, Haiku, GPT-4.1, Gemini Flash. Good quality, sane prices, multimodal where it matters.
Opus, GPT-5.5, and GPT-5.5 Pro for the calls that actually matter — gnarly debugging, architectural decisions, anything you'd ask a senior engineer to review.
No vendor lock-in. Switch a step from Claude to Gemini to GPT-5 to DeepSeek without rewriting your agent.
Start freePricing reflects vendor list rates as of May 2026 in USD per 1M tokens. Catalog updated continuously as new models ship.