Reliability

Your LLM provider goes down. Your app does not.

Stockyard automatically fails over to the next provider when one returns errors. Model-aware routing, circuit breakers, and streaming failover are all built into the proxy. No app code changes needed.


The problem

Every LLM provider has outages. OpenAI goes down. Anthropic returns 500s. Google rate-limits you. When your app calls one provider directly and that provider fails, your users see errors.

The usual fix is wrapping every API call in retry logic with fallback chains. That means your application code knows about providers, manages provider-specific error handling, and handles streaming retries. It is fragile, it is scattered across your codebase, and it breaks every time you add a new model.

Stockyard handles failover at the proxy layer. Your app sends a request for a model. The proxy knows which provider to try first, detects failures, and routes to the next provider in the chain. Your application code never changes.

When you send a request for claude-sonnet-4-5, the failover chain works like this:

1. Anthropic (natural provider)

Model-aware routing detects "claude" in the model name and tries Anthropic first.


2. Anthropic returns 503

Circuit breaker records the failure. After the configured threshold, the circuit opens and skips Anthropic for subsequent requests.


3. OpenAI (next in chain)

The request is re-sent to the next provider. If the model is available there, the user gets a response. If not, the chain continues.


The same logic works for GPT models (OpenAI first), Gemini models (Google first), and local models (Ollama first).
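Model-aware routing can be sketched as a simple match on the model name. This is an illustrative approximation, not Stockyard's actual implementation; the hint list and the default are assumptions.

```python
# Hypothetical model-name hints mapping to a "natural" provider.
# Order matters: the first matching hint wins.
PROVIDER_HINTS = [
    ("claude", "anthropic"),
    ("gpt", "openai"),
    ("gemini", "google"),
    ("llama", "ollama"),
]

def natural_provider(model: str, default: str = "openai") -> str:
    """Return the provider to try first for a given model name."""
    name = model.lower()
    for hint, provider in PROVIDER_HINTS:
        if hint in name:
            return provider
    return default
```

With this sketch, `claude-sonnet-4-5` routes to Anthropic first and `gpt-4o` to OpenAI first, matching the flow above.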

Configuration

Enable failover by listing your providers in priority order. Stockyard reorders the chain per-request based on the model being requested, so the "natural" provider always gets tried first.

```yaml
# stockyard.yaml
failover:
  enabled: true
  strategy: priority
  providers:
    - openai
    - anthropic
    - google
    - ollama
  circuit_breaker:
    failure_threshold: 3
    recovery_timeout: 30s
```

With this config, a request for gpt-4o tries OpenAI first (natural provider), then Anthropic, Google, and Ollama. A request for claude-sonnet-4-5 tries Anthropic first, then OpenAI, Google, and Ollama. The provider order adapts to the model automatically.
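The per-request reordering amounts to moving the natural provider to the front of the configured list while preserving the priority order of the rest. A minimal sketch (function name is illustrative):

```python
def failover_chain(providers: list[str], natural: str) -> list[str]:
    """Reorder the configured chain so the natural provider is tried first."""
    if natural not in providers:
        return list(providers)  # unknown provider: keep configured order
    return [natural] + [p for p in providers if p != natural]
```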

Each provider has its own circuit breaker with three states:

Closed (normal) — requests flow through. Every failure increments a counter.

Open (tripped) — after the failure threshold is hit, the circuit opens. All requests skip this provider and go straight to the next one in the chain. No wasted time sending requests to a provider you know is down.

Half-open (recovery) — after the recovery timeout, one probe request is allowed through. If it succeeds, the circuit closes and the provider is back in the chain. If it fails, the circuit stays open.

This prevents the thundering herd problem where every request retries a broken provider before failing over. Once a provider is marked down, subsequent requests skip it instantly.
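The three states above map onto a small state machine. A minimal sketch, assuming a per-provider breaker; the class and method names are illustrative, not Stockyard's internals:

```python
import time

class CircuitBreaker:
    """Three-state breaker: closed, open, half-open (recovery probe)."""

    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: requests flow through
        if time.monotonic() - self.opened_at >= self.recovery_timeout:
            # half-open: let one probe through and restart the window so
            # concurrent requests keep skipping until the probe resolves
            self.opened_at = time.monotonic()
            return True
        return False  # open: skip this provider entirely

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # trip the circuit open
```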

Streaming failover

Streaming (SSE) requests have their own failover path. If the first provider fails before sending any chunks, Stockyard retries on the next provider transparently. Your client receives a single uninterrupted stream.

If the provider fails mid-stream (after chunks have already been sent to the client), Stockyard cannot transparently retry because the client has already received partial data. In that case, the stream ends with an error and your client handles it the same way it would handle any interrupted stream.
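The streaming rule reduces to one check: did any chunk reach the client before the failure? A hedged sketch, with the provider call and client sink as stand-ins:

```python
def stream_with_failover(providers, start_stream, send_to_client):
    """Retry on the next provider only if no chunks reached the client yet."""
    for provider in providers:
        chunks_sent = 0
        try:
            for chunk in start_stream(provider):
                send_to_client(chunk)
                chunks_sent += 1
            return  # stream completed normally
        except Exception:
            if chunks_sent > 0:
                raise  # mid-stream failure: client saw partial data, cannot retry
            # failed before the first chunk: fall through to the next provider
    raise RuntimeError("all providers failed before sending a chunk")
```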

Non-retryable errors (400 Bad Request, 401 Unauthorized, 403 Forbidden) are never retried. If your API key is wrong or your request is malformed, failover will not help and should not waste time trying.
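The classification can be sketched as a predicate on the HTTP status code. The exact retryable set is an assumption here; the listed client errors are from the text, and treating 429 and 5xx as failover-worthy is an inference from the rate-limit example earlier:

```python
# Client errors that failover cannot fix (wrong key, malformed request).
NON_RETRYABLE = {400, 401, 403}

def should_failover(status: int) -> bool:
    """Fail over on rate limits and server errors, never on client errors."""
    if status in NON_RETRYABLE:
        return False
    return status == 429 or status >= 500
```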

If a user passes their own provider API key in the Authorization header, Stockyard skips the failover chain entirely and routes directly to the detected provider. Failover only applies to requests using server-configured provider keys.

What you do not have to build

Without proxy-level failover, your application code typically needs to handle provider-specific error detection, retry logic with backoff, fallback model selection, circuit breaker state, streaming retry coordination, and provider health monitoring.

With Stockyard, your app sends one request to one endpoint. The proxy handles all of that. Your application code stays the same whether you have one provider or six.

```sh
# Your app code doesn't change.
# Same request whether failover is on or off.
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
```

Provider outages are inevitable. App downtime is not.

Model-aware failover with circuit breakers. Free tier. One binary.
