Add cost tracking, caching, failover, and 76 middleware modules to your Groq requests. One URL change, no SDK swap.
Groq runs open-source models on custom LPU hardware with extremely low latency. Proxying through Stockyard adds cost tracking (Groq is cheap but not free), response caching (save even more), and failover to other providers when Groq hits rate limits.
Groq is already OpenAI-compatible, so the translation overhead is minimal. Stockyard adds the operational layer that Groq does not provide: per-request logging, audit trails, and middleware modules.
```sh
# Install Stockyard
curl -fsSL stockyard.dev/install.sh | sh

# Set your Groq API key
export GROQ_API_KEY=your-key-here

# Start the proxy
stockyard
# Provider: groq (from GROQ_API_KEY)
# Proxy listening on :4200

# Send a request through the proxy
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.3-70b-versatile","messages":[{"role":"user","content":"hello"}]}'
```
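Because Groq already speaks the OpenAI wire format, "one URL change" really is the whole migration. A minimal sketch of that idea, assuming the proxy address from the quickstart above and Groq's documented OpenAI-compatible base URL:

```python
# Sketch: the same OpenAI-style payload works against Groq directly or
# through the proxy -- only the base URL differs.
# PROXY_BASE comes from the quickstart above (:4200); GROQ_BASE is Groq's
# documented OpenAI-compatible endpoint.
GROQ_BASE = "https://api.groq.com/openai/v1"
PROXY_BASE = "http://localhost:4200/v1"

def chat_url(base: str) -> str:
    """Build the chat-completions endpoint for a given base URL."""
    return f"{base.rstrip('/')}/chat/completions"

payload = {
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "hello"}],
}

# Same payload, same path; the only difference is the host.
print(chat_url(GROQ_BASE))   # https://api.groq.com/openai/v1/chat/completions
print(chat_url(PROXY_BASE))  # http://localhost:4200/v1/chat/completions
```

In practice this means an OpenAI-compatible SDK pointed at Groq only needs its base URL swapped to the proxy; the model name, messages, and response shape are untouched.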
Groq enforces aggressive rate limits on free tiers. Stockyard's rate-limiting module smooths out request bursts, and the response cache reduces redundant API calls.
Groq's LPU hardware delivers sub-second responses, but free-tier rate limits can throttle you at 30 requests per minute. Stockyard helps in two ways:
Identical prompts return cached responses instantly. For iterative development, this can cut your effective request count by 50-80%.
When Groq returns a 429 rate limit error, Stockyard automatically retries on your fallback provider (OpenAI, DeepSeek, etc.) so your app never sees the error.
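The two mechanisms above compose naturally: check the cache first, and only on a miss walk the provider list, falling through to the next provider on a 429. A minimal sketch of that flow (the provider functions here are simulated stand-ins, not real API calls, and the cache-key scheme is an illustrative assumption, not Stockyard's actual implementation):

```python
import hashlib
import json

def cache_key(payload: dict) -> str:
    # Identical prompts serialize identically, so repeats hit the cache.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

class RateLimited(Exception):
    """Stands in for an HTTP 429 from a provider."""

def call_with_failover(payload: dict, providers: list, cache: dict) -> dict:
    key = cache_key(payload)
    if key in cache:
        return cache[key]            # cache hit: no API call at all
    last_err = None
    for call in providers:           # primary first, then fallbacks
        try:
            result = call(payload)
            cache[key] = result
            return result
        except RateLimited as err:   # e.g. Groq throttling at 30 req/min
            last_err = err
    raise last_err

# Simulated providers for illustration only:
def groq(payload):
    raise RateLimited("429 Too Many Requests")   # pretend Groq is throttled

def openai_fallback(payload):
    return {"provider": "openai", "content": "hello back"}

cache = {}
req = {"messages": [{"role": "user", "content": "hi"}]}
first = call_with_failover(req, [groq, openai_fallback], cache)
print(first["provider"])   # openai -- the 429 never reached the caller
second = call_with_failover(req, [groq, openai_fallback], cache)
print(second is first)     # True -- identical prompt served from cache
```

The caller never sees the 429; it only sees a successful response, whichever provider (or cache entry) produced it.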
Route Groq through Stockyard in under 60 seconds.
Install Guide · All 16 providers · Proxy-only mode · What is an LLM proxy? · vs LiteLLM · vs Helicone