Compare API pricing across 43+ LLM models. Estimate your monthly costs for Gemini 3.1, Claude Opus 4.7, GPT-5.4, Gemma 4, GLM-5.1, Canva AI 2.0, DeepSeek, Llama 4 and more.
AI APIs charge per token processed. Input tokens (your prompt) and output tokens (the response) are priced separately, and most providers price per 1 million tokens. Your cost per request = (input tokens × input price) + (output tokens × output price); multiply that by your number of requests to get your total.
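The formula above can be sketched as a small helper (the token counts and prices in the example are illustrative, not tied to any specific model):

```python
def monthly_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m, requests):
    """Estimate monthly API cost in dollars.

    Token counts are per request; prices are dollars per 1 million tokens.
    """
    per_request = (input_tokens * input_price_per_m
                   + output_tokens * output_price_per_m) / 1_000_000
    return per_request * requests

# Example: 2,000 input + 500 output tokens per request,
# at $2/M input and $12/M output, over 100,000 requests/month.
cost = monthly_cost(2_000, 500, 2.00, 12.00, 100_000)
print(f"${cost:,.2f}")  # → $1,000.00
```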
A token is roughly 3/4 of a word in English. 1,000 tokens ≈ 750 words. Code tends to use more tokens per character than natural language. Different models use different tokenizers, so exact counts vary slightly.
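The 3/4-of-a-word rule of thumb can be turned into a quick estimator. Note this is a heuristic only, not a real tokenizer; actual counts vary by model:

```python
def estimate_tokens(word_count):
    # Rule of thumb: 1 token ≈ 3/4 of a word in English,
    # so tokens ≈ words / 0.75. Code and non-English text
    # typically use more tokens; real counts vary by tokenizer.
    return round(word_count / 0.75)

print(estimate_tokens(750))  # → 1000
```

For exact counts, use the provider's own tokenizer or token-counting endpoint before committing to a budget.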
As of April 2026, DeepSeek V3 and Google Gemini 2.0 Flash offer the lowest per-token pricing. However, the cheapest model may not be the best value — consider quality, speed, and context window size for your use case.
Claude Sonnet 4 is generally cheaper than GPT-4o for most workloads, especially for output-heavy tasks. Claude Opus 4 costs more than GPT-4o. Use the calculator above to compare with your specific usage pattern.
Key strategies:

1. Use prompt caching to avoid re-processing repeated context.
2. Choose the right model tier: use smaller models for simple tasks.
3. Optimize prompts to reduce input tokens.
4. Batch requests where possible.
5. Use cached/discounted input pricing when available.
Prompt caching lets you reuse previously processed input tokens at a discount (typically 50-90% off). Anthropic, OpenAI, and Google all offer caching. It's especially valuable for RAG applications or system prompts that repeat across requests.
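The savings from caching can be estimated with a simple model. The 90% discount and the token counts below are illustrative assumptions, not any provider's actual terms:

```python
def input_cost_with_caching(cached_tokens, fresh_tokens,
                            input_price_per_m, cache_discount=0.9):
    """Input-token cost in dollars when a cached prefix is reused.

    cache_discount=0.9 means cache hits cost 90% less than fresh input.
    """
    cached_cost = cached_tokens * input_price_per_m * (1 - cache_discount)
    fresh_cost = fresh_tokens * input_price_per_m
    return (cached_cost + fresh_cost) / 1_000_000

# Example: a 50,000-token system prompt served from cache plus
# 1,000 fresh tokens per request, at $5/M input with a 90% discount.
with_cache = input_cost_with_caching(50_000, 1_000, 5.00)     # $0.030
without    = input_cost_with_caching(0, 51_000, 5.00)         # $0.255
print(f"saves ${without - with_cache:.3f} per request")
```

The larger the repeated prefix (long system prompts, RAG context), the closer the savings approach the full cache discount.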
Gemma 4 is Google's open-weight model family released April 2026, with 31B and 26B MoE variants. Since the weights are freely available, you can self-host it and pay only compute costs. Via API providers like Google AI Studio, the 31B model costs roughly $0.15/M input and $0.60/M output tokens — significantly cheaper than Gemini 2.5 Pro. The 26B MoE variant is even cheaper at $0.06/M input. Quality is lower than the Gemini Pro models, but it's ideal for edge/local deployment or budget-constrained applications.
Gemini 3.1 Pro, Google's latest flagship released April 2026, costs $2.00 per million input tokens and $12.00 per million output tokens. It features a massive 2M token context window, real-time voice and vision processing, and strong multi-task reasoning. The Flash-Lite variant starts at just $0.25/M input — ideal for high-volume, cost-sensitive applications.
Gemini 3.1 Pro ($2/M input, $12/M output) is cheaper than Claude Opus 4.7 ($5/$25) — 2.5x cheaper on input and roughly 2x on output. Gemini 3.1 offers a 2M context window vs Opus's 1M. Both excel at complex reasoning. Claude Opus 4.7 leads on autonomous agent tasks and self-verification; Gemini 3.1 Pro leads on multimodal real-time processing. For cost-sensitive workloads, Gemini 3.1 Pro delivers better value.
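A quick side-by-side using the prices quoted above; the 10,000-in / 2,000-out workload is just a sample, so plug in your own numbers:

```python
# $ per 1M tokens, per the prices quoted above.
PRICES = {
    "Gemini 3.1 Pro":  {"input": 2.00, "output": 12.00},
    "Claude Opus 4.7": {"input": 5.00, "output": 25.00},
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in dollars for one request at a model's list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Sample workload: 10,000 input + 2,000 output tokens per request.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}/request")
```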
Claude Opus 4.7, the latest Anthropic flagship released April 16, 2026, costs $5 per million input tokens and $25 per million output tokens — same as Opus 4.6. It supports 1M+ token context with prompt caching at $0.50/M for cache hits (up to 90% savings). Batch API offers 50% off. Opus 4.7 adds improved vision (3x image resolution), better self-verification, and stronger performance on complex reasoning and autonomous coding tasks.
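Putting the discounts above together in a rough sketch. This assumes the 50% batch discount applies to the whole request and stacks with cache-hit pricing, which may not match the provider's actual terms — check current documentation before relying on it:

```python
OPUS_INPUT = 5.00      # $/M fresh input tokens (as quoted above)
OPUS_OUTPUT = 25.00    # $/M output tokens
CACHE_HIT = 0.50       # $/M cached input tokens (90% savings)
BATCH_DISCOUNT = 0.50  # Batch API: 50% off (assumed to apply to all tokens)

def opus_request_cost(cached_in, fresh_in, out, batch=False):
    """Dollars per request for Opus 4.7 under the quoted pricing."""
    cost = (cached_in * CACHE_HIT
            + fresh_in * OPUS_INPUT
            + out * OPUS_OUTPUT) / 1_000_000
    return cost * (1 - BATCH_DISCOUNT) if batch else cost

# Example: 100k cached + 5k fresh input, 2k output.
print(opus_request_cost(100_000, 5_000, 2_000))             # → 0.125
print(opus_request_cost(100_000, 5_000, 2_000, batch=True)) # → 0.0625
```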
Claude Opus 4.7 ($5/$25 per M tokens) and GPT-4.1 ($2/$8) are both top coding models in April 2026. Opus 4.7 scores higher on complex reasoning, autonomous agent tasks, and self-verification, while GPT-4.1 is cheaper and has a 1M context window. For heavy coding workloads, GPT-4.1 costs ~60% less; for hard reasoning and agentic tasks, Opus 4.7 delivers superior results.