We maintain a pricing table for 40+ models across 16 providers, compiled into Stockyard’s binary for real-time cost tracking. Here’s every model we track, what it costs, and where the real value is.
All prices are per 1 million tokens. Output tokens are consistently more expensive than input because the model does more work generating them: each output token requires its own forward pass, while input tokens are processed in a single prefill pass.
| Model | Provider | Input/1M | Output/1M |
|---|---|---|---|
| o1 | OpenAI | $15.00 | $60.00 |
| claude-opus-4-6 | Anthropic | $15.00 | $75.00 |
| o3 | OpenAI | $10.00 | $40.00 |
| gpt-4-turbo | OpenAI | $10.00 | $30.00 |
| grok-3 | xAI | $3.00 | $15.00 |
| claude-sonnet-4-5 | Anthropic | $3.00 | $15.00 |
| sonar-pro | Perplexity | $3.00 | $15.00 |
The frontier tier runs $3–75 per million output tokens. Claude Opus 4.6 is the most expensive output at $75/M. For most production workloads, you don’t need frontier models — the mid-tier has caught up dramatically.
| Model | Provider | Input/1M | Output/1M |
|---|---|---|---|
| gpt-4o | OpenAI | $2.50 | $10.00 |
| command-r-plus | Cohere | $2.50 | $10.00 |
| gemini-2.5-pro | Google | $1.25 | $10.00 |
| gpt-4.1 | OpenAI | $2.00 | $8.00 |
| mistral-large | Mistral | $2.00 | $6.00 |
| grok-2 | xAI | $2.00 | $10.00 |
| gemini-2.0-pro | Google | $1.25 | $5.00 |
Gemini 2.5 Pro is the standout here: $1.25 input is half the price of GPT-4o, with competitive quality. Mistral Large is the best European option at $2/$6.
| Model | Provider | Input/1M | Output/1M |
|---|---|---|---|
| gpt-4o-mini | OpenAI | $0.15 | $0.60 |
| gemini-2.5-flash | Google | $0.15 | $0.60 |
| deepseek-chat | DeepSeek | $0.14 | $0.28 |
| command-r | Cohere | $0.15 | $0.60 |
| gpt-4.1-nano | OpenAI | $0.10 | $0.40 |
| gemini-2.0-flash | Google | $0.10 | $0.40 |
| gemini-1.5-flash | Google | $0.075 | $0.30 |
| llama-3.1-8b | Groq | $0.05 | $0.08 |
| Model | Provider | Input/1M | Output/1M |
|---|---|---|---|
| o1 | OpenAI | $15.00 | $60.00 |
| o3 | OpenAI | $10.00 | $40.00 |
| o3-mini | OpenAI | $1.10 | $4.40 |
| o4-mini | OpenAI | $1.10 | $4.40 |
| deepseek-reasoner | DeepSeek | $0.55 | $2.19 |
Reasoning models use “thinking tokens” that are billed as output. A complex reasoning query might generate 5,000+ thinking tokens before producing a 200-token answer, so you pay for 5,200+ output tokens, roughly 26x what the visible answer suggests. DeepSeek’s Reasoner at $0.55/$2.19 is dramatically cheaper than OpenAI’s o-series for comparable reasoning tasks.
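To see what thinking tokens do to effective cost, here is a small sketch using the example numbers above (5,000 thinking tokens, 200-token answer); the helper function is illustrative, not a Stockyard API:

```python
# Sketch: effective output cost when thinking tokens bill as output.
# Numbers come from the example above; the helper is illustrative.
def effective_output_cost(thinking_tokens, answer_tokens, price_per_million):
    """Thinking tokens count as output, so they are part of the bill."""
    billed_tokens = thinking_tokens + answer_tokens
    return billed_tokens * price_per_million / 1_000_000

# o1 at $60/M output vs deepseek-reasoner at $2.19/M output
print(effective_output_cost(5_000, 200, 60.00))           # 0.312
print(round(effective_output_cost(5_000, 200, 2.19), 6))  # 0.011388
```

You pay for 5,200 output tokens to get a 200-token answer, which is why per-request reasoning cost runs far above what the visible output implies.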
A million tokens is roughly:

- 750,000 English words (about 10 novels)
- 4,000 typical API calls (250 tokens average)
- 2,000 customer support conversations
- 500 long-form content generations
For a SaaS product doing 10,000 API calls per day at 500 input and 500 output tokens per call, you’re running about 150M input tokens and 150M output tokens per month. On GPT-4o that’s about $375/month input + $1,500/month output = $1,875/month. Switch to GPT-4o-mini and it drops to $22.50 + $90 = $112.50/month, a 94% reduction for workloads where mini-quality is sufficient.
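The arithmetic above can be checked in a few lines; prices come from the tables, and the traffic numbers are the example’s assumptions (including a 30-day month):

```python
# Sketch: monthly spend for the example workload above.
# Assumes 10,000 calls/day, 500 input + 500 output tokens per call, 30 days.
def monthly_cost(calls_per_day, in_tokens, out_tokens, in_price, out_price, days=30):
    calls = calls_per_day * days
    input_cost = calls * in_tokens * in_price / 1_000_000
    output_cost = calls * out_tokens * out_price / 1_000_000
    return input_cost + output_cost

gpt_4o = monthly_cost(10_000, 500, 500, 2.50, 10.00)
mini = monthly_cost(10_000, 500, 500, 0.15, 0.60)
print(round(gpt_4o, 2), round(mini, 2))  # 1875.0 112.5
print(f"{1 - mini / gpt_4o:.0%} cheaper")  # 94% cheaper
```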
**Retries.** When a provider returns a 429 or 500, you retry. Every retry is a full re-send of the input tokens. At 5% error rates with one retry, your actual input cost is 5% higher than the sticker price. At 10% error rates (common during peak hours), it’s 10% higher.
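Assuming one retry per failed request (and that the retry succeeds), the inflation factor is straightforward; the $375 base is the input spend from the GPT-4o example above:

```python
# Sketch: retry inflation on input spend. Assumes one retry per failed
# request and that the retry succeeds.
def input_cost_with_retries(base_monthly_cost, error_rate):
    """Each failed request re-sends its full input once."""
    return base_monthly_cost * (1 + error_rate)

print(round(input_cost_with_retries(375.00, 0.05), 2))  # 393.75
print(round(input_cost_with_retries(375.00, 0.10), 2))  # 412.5
```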
**Cache misses.** The same question asked 100 times costs 100x without caching. A semantic cache that recognizes “What’s the weather?” and “How’s the weather today?” as the same query can cut your bill by 30–60% depending on your traffic patterns.
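An exact-match cache is only a few lines; the normalization step and the stand-in `call_model` function below are illustrative assumptions, not Stockyard’s cache module:

```python
# Sketch of an exact-match response cache. `call_model` is a stand-in
# for a real provider call; normalization is just case and whitespace.
cache = {}

def call_model(prompt):
    return f"answer to: {prompt}"  # pretend this is the expensive API call

def cached_completion(prompt):
    key = " ".join(prompt.lower().split())
    if key not in cache:
        cache[key] = call_model(prompt)  # miss: pay for the call once
    return cache[key]

cached_completion("What's the weather?")
cached_completion("what's  the   WEATHER?")  # hit after normalization
print(len(cache))  # 1
```

Note that an exact-match cache cannot equate “What’s the weather?” with “How’s the weather today?”; that requires semantic matching, typically embedding the queries and comparing similarity.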
**Prompt bloat.** System prompts accumulate cruft. A 2,000-token system prompt sent with every 200-token user message means 90% of your input spend is the system prompt. Token trimming and context packing can compress this significantly.
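A quick check of the 90% figure, using the 2,000/200 split above:

```python
# Sketch: share of input tokens consumed by the system prompt.
def system_prompt_share(system_tokens, user_tokens):
    return system_tokens / (system_tokens + user_tokens)

print(round(system_prompt_share(2_000, 200), 3))  # 0.909
```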
**No output caps.** Ask GPT-4o to “explain quantum computing” without a `max_tokens` limit and you might get 4,000 tokens back. Set `max_tokens: 500` and you get 500. That’s an 8x difference in output cost for a single request.
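The 8x figure follows directly from GPT-4o’s $10/M output price in the table above:

```python
# Sketch: per-request output cost, capped vs uncapped, at $10/M output.
def output_cost(tokens, price_per_million=10.00):
    return tokens * price_per_million / 1_000_000

uncapped = output_cost(4_000)
capped = output_cost(500)
print(round(uncapped, 3), round(capped, 3))  # 0.04 0.005
print(round(uncapped / capped, 1))           # 8.0
```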
Based on the traffic patterns we see through Stockyard, here’s what actually moves the needle:
1. Cache aggressively. Even a simple exact-match cache cuts costs 20–40%. Semantic caching (fuzzy matching) gets you to 40–60%. This is the single highest-impact optimization.
2. Use the cheapest model that works. Route simple queries (classification, extraction, formatting) to GPT-4o-mini or Gemini Flash. Only send complex reasoning to frontier models. Stockyard’s tierdrop module does this automatically based on query complexity.
3. Set output caps. Always set max_tokens. For structured output (JSON, classifications), 500 tokens is usually plenty. For summarization, 1,000. This alone can reduce output costs by 50%.
4. Trim your prompts. Audit your system prompts monthly. Remove examples that aren’t improving quality. Compress instructions. Every token you remove from a system prompt saves money on every single request.
5. Failover to cheaper providers. If OpenAI returns a 429, don’t retry on OpenAI — failover to Anthropic or Groq. You avoid the retry cost and get faster recovery. Stockyard’s failover module handles this automatically across all 16 providers.
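As an illustration of the tier-routing idea in point 2, here is a toy router. The keyword heuristic and model choices are invented for this example and are not how Stockyard’s tierdrop module actually classifies queries:

```python
# Toy sketch of complexity-based routing. The heuristic and model
# names are illustrative, not Stockyard's tierdrop logic.
CHEAP, FRONTIER = "gpt-4o-mini", "claude-opus-4-6"

def route(prompt):
    """Send long or reasoning-flavored prompts to the frontier tier."""
    reasoning_hints = ("why", "prove", "analyze", "compare", "design")
    is_complex = len(prompt.split()) > 200 or any(
        hint in prompt.lower() for hint in reasoning_hints
    )
    return FRONTIER if is_complex else CHEAP

print(route("Classify this ticket as billing or technical."))       # gpt-4o-mini
print(route("Analyze the tradeoffs between these architectures."))  # claude-opus-4-6
```

A production router would classify on more than keywords (length, structure, past routing outcomes), but the cost mechanics are the same: every query that stays on the cheap tier costs a small fraction of a frontier call.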
These aren’t theoretical numbers. They’re the kind of reductions we see when customers enable Stockyard’s cost control modules: cache, tierdrop, outputcap, tokentrim, and costwarn.
Stockyard’s cost tracking uses this pricing table to calculate the exact cost of every request in real time. The `x-stockyard-cost` response header shows you exactly what each call cost. The Lookout dashboard aggregates this into per-model, per-provider, and per-customer cost attribution.
The pricing table is updated with each release. You can also query it via the API:
`GET /api/proxy/pricing` returns the full table. `GET /api/observe/costs` shows your actual spend.
Try it:

```shell
curl -sSL stockyard.dev/install.sh | sh
```