
The True Cost of LLM APIs in 2026: 40+ Models Compared

· Michael · 10 min read

We maintain a pricing table for 40+ models across 16 providers, compiled into Stockyard’s binary for real-time cost tracking. Here’s every model we track, what it costs, and where the real value is.

All prices are per 1 million tokens. Output tokens are consistently more expensive than input: input tokens are processed in parallel during prefill, while output tokens are generated one at a time, each requiring a full forward pass.

The Full Pricing Table

Frontier Models (Best Quality)

| Model | Provider | Input ($/1M) | Output ($/1M) |
| --- | --- | --- | --- |
| o1 | OpenAI | $15.00 | $60.00 |
| claude-opus-4-6 | Anthropic | $15.00 | $75.00 |
| o3 | OpenAI | $10.00 | $40.00 |
| gpt-4-turbo | OpenAI | $10.00 | $30.00 |
| grok-3 | xAI | $3.00 | $15.00 |
| claude-sonnet-4-5 | Anthropic | $3.00 | $15.00 |
| sonar-pro | Perplexity | $3.00 | $15.00 |

The frontier tier runs $3–75 per million output tokens. Claude Opus 4.6 is the most expensive output at $75/M. For most production workloads, you don’t need frontier models — the mid-tier has caught up dramatically.

Mid-Tier (Best Value for Production)

| Model | Provider | Input ($/1M) | Output ($/1M) |
| --- | --- | --- | --- |
| gpt-4o | OpenAI | $2.50 | $10.00 |
| command-r-plus | Cohere | $2.50 | $10.00 |
| gemini-2.5-pro | Google | $1.25 | $10.00 |
| gpt-4.1 | OpenAI | $2.00 | $8.00 |
| mistral-large | Mistral | $2.00 | $6.00 |
| grok-2 | xAI | $2.00 | $10.00 |
| gemini-2.0-pro | Google | $1.25 | $5.00 |

Gemini 2.5 Pro is the standout here: $1.25 input is half the price of GPT-4o, with competitive quality. Mistral Large is the best European option at $2/$6.

Budget Tier (High Volume / Low Cost)

| Model | Provider | Input ($/1M) | Output ($/1M) |
| --- | --- | --- | --- |
| gpt-4o-mini | OpenAI | $0.15 | $0.60 |
| gemini-2.5-flash | Google | $0.15 | $0.60 |
| deepseek-chat | DeepSeek | $0.14 | $0.28 |
| command-r | Cohere | $0.15 | $0.60 |
| gpt-4.1-nano | OpenAI | $0.10 | $0.40 |
| gemini-2.0-flash | Google | $0.10 | $0.40 |
| gemini-1.5-flash | Google | $0.075 | $0.30 |
| llama-3.1-8b | Groq | $0.05 | $0.08 |

DeepSeek Chat at $0.14/$0.28 is the cheapest model with near-GPT-4o quality. If you’re optimizing for cost and don’t need real-time speed, it’s hard to beat. Groq’s Llama 3.1 8B at $0.05/$0.08 is the absolute floor — but you’re trading quality for price.

Reasoning Models (Thinking Tokens)

| Model | Provider | Input ($/1M) | Output ($/1M) |
| --- | --- | --- | --- |
| o1 | OpenAI | $15.00 | $60.00 |
| o3 | OpenAI | $10.00 | $40.00 |
| o3-mini | OpenAI | $1.10 | $4.40 |
| o4-mini | OpenAI | $1.10 | $4.40 |
| deepseek-reasoner | DeepSeek | $0.55 | $2.19 |

Reasoning models use “thinking tokens” that count as output. A complex reasoning query might generate 5,000+ thinking tokens before producing a 200-token answer. That makes the effective cost 10–20x higher than the sticker price suggests. DeepSeek’s Reasoner at $0.55/$2.19 is dramatically cheaper than OpenAI’s o-series for comparable reasoning tasks.
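To see how thinking tokens distort the sticker price, here is a small sketch. The token counts are illustrative assumptions (a 500-token prompt, 5,000 thinking tokens, a 200-token answer); the prices are o1's from the table above.

```python
# Effective cost of a reasoning query when hidden "thinking" tokens
# bill as output. Token counts here are illustrative, not measured.

def query_cost(input_toks, thinking_toks, answer_toks,
               in_price, out_price):
    """Cost in dollars; prices are per 1M tokens."""
    billed_output = thinking_toks + answer_toks
    return (input_toks * in_price + billed_output * out_price) / 1_000_000

# o1: $15 input / $60 output per 1M tokens
naive = query_cost(500, 0, 200, 15.00, 60.00)      # sticker-price view
actual = query_cost(500, 5000, 200, 15.00, 60.00)  # with thinking tokens

print(f"naive:  ${naive:.4f}")                     # $0.0195
print(f"actual: ${actual:.4f} ({actual / naive:.1f}x)")  # $0.3195 (16.4x)
```

With those assumptions, the real cost per request is about 16x the naive estimate, which is where the 10–20x figure comes from.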

What 1 Million Tokens Actually Looks Like

A million tokens is roughly:

— 750,000 English words (about 10 novels)
— 4,000 typical API calls (250 tokens average)
— 2,000 customer support conversations
— 500 long-form content generations

For a SaaS product doing 10,000 API calls per day at roughly 500 input and 500 output tokens per call, you’re using about 150M input tokens and 150M output tokens per month. On GPT-4o that’s about $375/month input + $1,500/month output = $1,875/month. Switch to GPT-4o-mini and it drops to $22.50 + $90 = $112.50/month, a 94% reduction for workloads where mini-quality is sufficient.
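The arithmetic above can be packaged as a small estimator, assuming (as the figures imply) ~500 input and ~500 output tokens per call and a 30-day month:

```python
# Rough monthly bill estimate. Prices are per 1M tokens, from the
# tables above; the traffic shape is the article's example scenario.

def monthly_cost(calls_per_day, in_toks, out_toks, in_price, out_price,
                 days=30):
    in_m = calls_per_day * in_toks * days / 1_000_000    # input Mtok/mo
    out_m = calls_per_day * out_toks * days / 1_000_000  # output Mtok/mo
    return in_m * in_price + out_m * out_price

gpt4o = monthly_cost(10_000, 500, 500, 2.50, 10.00)
mini = monthly_cost(10_000, 500, 500, 0.15, 0.60)
print(f"gpt-4o:      ${gpt4o:,.2f}/month")   # $1,875.00/month
print(f"gpt-4o-mini: ${mini:,.2f}/month")    # $112.50/month
```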

The Hidden Costs Nobody Talks About

Retries. When a provider returns a 429 or 500, you retry. Every retry is a full re-send of the input tokens. At 5% error rates with one retry, your actual input cost is 5% higher than the sticker price. At 10% error rates (common during peak hours), it’s 10% higher.
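The expected overhead compounds slightly if a retry can itself fail and be retried. A minimal sketch of the multiplier:

```python
# Expected number of input-token sends per request, given an error
# rate p and a retry budget. Each failed attempt re-sends the input.

def retry_multiplier(error_rate, max_retries=1):
    expected = 1.0
    p_fail = error_rate
    for _ in range(max_retries):
        expected += p_fail      # this retry happens with prob p_fail
        p_fail *= error_rate    # chance the retry also fails
    return expected

print(retry_multiplier(0.05))  # 1.05 → input spend ~5% higher
print(retry_multiplier(0.10))  # 1.10 → ~10% higher
```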

Cache misses. The same question asked 100 times costs 100x without caching. A semantic cache that recognizes “What’s the weather?” and “How’s the weather today?” as the same query can cut your bill by 30–60% depending on your traffic patterns.

Prompt bloat. System prompts accumulate cruft. A 2,000-token system prompt sent with every 200-token user message means 90% of your input spend is the system prompt. Token trimming and context packing can compress this significantly.
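The bloat share is easy to quantify, using the token counts from the example above:

```python
# Fraction of input spend going to the system prompt when it rides
# along with every user message.

def system_prompt_share(system_toks, user_toks):
    return system_toks / (system_toks + user_toks)

share = system_prompt_share(2_000, 200)
print(f"{share:.0%} of input tokens are the system prompt")  # 91%
```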

No output caps. Ask GPT-4o to “explain quantum computing” without a max_tokens limit and you might get 4,000 tokens back. Set max_tokens: 500 and you get 500. That’s an 8x difference in output cost for a single request.
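Here is the shape of a capped request and the cost delta it produces. The request body follows the OpenAI-style chat completions format (where `max_tokens` is a real parameter); the cost math uses GPT-4o’s $10/M output price from the table.

```python
# Capping output length with max_tokens, and what it saves.

request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "max_tokens": 500,  # without this, the answer may run to thousands
}

out_price = 10.00 / 1_000_000              # dollars per output token
uncapped = 4_000 * out_price               # a long unconstrained answer
capped = request["max_tokens"] * out_price

print(f"uncapped: ${uncapped:.3f}, capped: ${capped:.3f}")  # 8x apart
```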

How to Cut Your LLM Bill by 50–80%

Based on the traffic patterns we see through Stockyard, here’s what actually moves the needle:

1. Cache aggressively. Even a simple exact-match cache cuts costs 20–40%. Semantic caching (fuzzy matching) gets you to 40–60%. This is the single highest-impact optimization.
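An exact-match cache is a few lines. This is an illustrative sketch, not Stockyard’s implementation; `call_model` stands in for whatever client you use:

```python
# Minimal exact-match response cache keyed on a hash of the prompt.

import hashlib

_cache: dict = {}

def cached_completion(prompt, call_model):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]          # cache hit: zero tokens spent
    response = call_model(prompt)   # cache miss: pay for the call
    _cache[key] = response
    return response

# Usage: the second identical prompt never reaches the provider.
calls = []
fake_model = lambda p: calls.append(p) or f"answer to: {p}"
cached_completion("What's the weather?", fake_model)
cached_completion("What's the weather?", fake_model)
print(len(calls))  # 1
```

Semantic caching replaces the hash key with an embedding-similarity lookup, which is what lets “What’s the weather?” and “How’s the weather today?” share an entry.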

2. Use the cheapest model that works. Route simple queries (classification, extraction, formatting) to GPT-4o-mini or Gemini Flash. Only send complex reasoning to frontier models. Stockyard’s tierdrop module does this automatically based on query complexity.
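A toy version of that routing decision might look like the following. The length threshold and keyword list are assumptions for illustration; Stockyard’s tierdrop module uses its own complexity scoring.

```python
# Toy complexity router: short prompts that look like classification,
# extraction, or formatting go to a cheap model; the rest go frontier.

SIMPLE_KEYWORDS = ("classify", "extract", "format", "translate")

def pick_model(prompt):
    simple = len(prompt) < 300 and any(
        k in prompt.lower() for k in SIMPLE_KEYWORDS
    )
    return "gpt-4o-mini" if simple else "gpt-4o"

print(pick_model("Classify this ticket as bug/feature/question: ..."))
# gpt-4o-mini
print(pick_model("Design a sharding strategy for our multi-region DB"))
# gpt-4o
```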

3. Set output caps. Always set max_tokens. For structured output (JSON, classifications), 500 tokens is usually plenty. For summarization, 1,000. This alone can reduce output costs by 50%.

4. Trim your prompts. Audit your system prompts monthly. Remove examples that aren’t improving quality. Compress instructions. Every token you remove from a system prompt saves money on every single request.

5. Failover to cheaper providers. If OpenAI returns a 429, don’t retry on OpenAI — failover to Anthropic or Groq. You avoid the retry cost and get faster recovery. Stockyard’s failover module handles this automatically across all 16 providers.
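The failover logic reduces to trying providers in preference order and skipping to the next one on retryable status codes. A sketch, with placeholder provider functions:

```python
# Failover sketch: on a rate limit or server error, move to the next
# provider instead of retrying the same one.

class ProviderError(Exception):
    def __init__(self, status):
        self.status = status

def complete_with_failover(prompt, providers):
    """providers: list of (name, call_fn) in preference order."""
    last_err = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as e:
            if e.status in (429, 500, 502, 503):
                last_err = e
                continue        # retryable: fail over to next provider
            raise               # non-retryable: surface immediately
    raise last_err

# Usage: OpenAI is rate-limited, so the request lands on Anthropic.
def openai_call(p): raise ProviderError(429)
def anthropic_call(p): return "ok"

name, _ = complete_with_failover("hi", [("openai", openai_call),
                                        ("anthropic", anthropic_call)])
print(name)  # anthropic
```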

The Math: What This Costs in Practice

Scenario: SaaS product, 10K requests/day, 500 tokens avg

Without optimization (GPT-4o): ~$1,875/month
With caching (40% hit rate): ~$1,125/month
With model routing (60% to mini): ~$540/month
With output caps (avg 300 output): ~$380/month

Total savings: ~80% ($1,875 → $380)
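These reductions stack multiplicatively, not additively. The multipliers below are back-derived from the scenario figures above (0.60 from the cache hit rate; 0.48 and 0.70 solved from the $540 and $380 steps):

```python
# The savings stack as multipliers on the $1,875 unoptimized base.

base = 1875.0
cache = 0.60     # 40% cache hit rate → pay for 60% of requests
routing = 0.48   # cheap-model routing, solved from the $540 step
caps = 0.70      # output trimmed from ~500 to ~300 avg tokens

cost = base
for factor in (cache, routing, caps):
    cost *= factor
print(f"${cost:,.0f}/month")  # $378/month, the ~$380 figure above
```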

These aren’t theoretical numbers. They’re the kind of reductions we see when customers enable Stockyard’s cost control modules: cache, tierdrop, outputcap, tokentrim, and costwarn.

This Data Is in the Binary

Stockyard’s cost tracking uses this pricing table to calculate the exact cost of every request in real time. The x-stockyard-cost response header shows you exactly what each call cost. The Lookout dashboard aggregates this into per-model, per-provider, and per-customer cost attribution.

The pricing table is updated with each release. You can also query it via the API:

GET /api/proxy/pricing returns the full table. GET /api/observe/costs shows your actual spend.

Try it: curl -sSL stockyard.dev/install.sh | sh
