Failover
When a provider goes down, requests automatically route to backups.
Enable failover
Failover requires at least two providers configured. Enable it in your config:
    failover:
      enabled: true
      strategy: priority  # priority, round-robin, or latency
      providers:
        - openai
        - anthropic
        - groq
Or enable via the API:
    curl -X PUT http://localhost:4200/api/proxy/modules/failover \
      -d '{"enabled": true}'
Routing strategies
priority sends requests to the first provider in the list. If that provider fails, the request goes to the next one. This is the default, and it gives you predictable costs because you control which provider handles most traffic.
round-robin distributes requests evenly across providers. Good for spreading load, but costs are less predictable because providers charge different rates.
latency sends requests to the provider with the lowest recent latency. Good for optimizing response time.
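The three strategies can be sketched in a few lines of Python. This is only an illustration of the selection logic described above, not Stockyard's implementation: the function and class names, the `healthy` set, and the `recent_latency_ms` mapping are all hypothetical.

```python
def pick_priority(providers, healthy):
    """priority: return the first healthy provider in configured order."""
    for name in providers:
        if name in healthy:
            return name
    return None  # no healthy provider available


class RoundRobin:
    """round-robin: cycle through providers, skipping unhealthy ones."""

    def __init__(self, providers):
        self.providers = list(providers)
        self._next = 0  # index of the next provider to try

    def pick(self, healthy):
        # Try each provider at most once per call.
        for _ in range(len(self.providers)):
            name = self.providers[self._next % len(self.providers)]
            self._next += 1
            if name in healthy:
                return name
        return None


def pick_lowest_latency(recent_latency_ms, healthy):
    """latency: return the healthy provider with the lowest recent latency."""
    candidates = {n: ms for n, ms in recent_latency_ms.items() if n in healthy}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

With all providers healthy, `pick_priority` always returns the first entry, which is why priority keeps most traffic (and most cost) on a single provider.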
Circuit breaker
Stockyard tracks errors per provider. When a provider reaches the failure threshold (a number of consecutive failures), its circuit "opens" and the provider is temporarily removed from routing. After the recovery timeout, it is added back for a test request.
    failover:
      circuit_breaker:
        failure_threshold: 5    # consecutive failures before opening
        recovery_timeout: 30s   # wait before trying again
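To make the two settings concrete, here is a minimal circuit breaker sketch in Python. It assumes the behavior described above (open after N consecutive failures, allow traffic again once the timeout has elapsed). The `CircuitBreaker` class and its method names are hypothetical and are not part of Stockyard's API.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures; allows a retry after a timeout."""

    def __init__(self, failure_threshold=5, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds
        self._clock = clock          # injectable so the logic can be tested
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None       # None means the circuit is closed

    def allow_request(self):
        """Return True if the provider may receive a request right now."""
        if self._opened_at is None:
            return True
        # Open circuit: only allow traffic once the timeout has elapsed.
        # (This sketch does not limit the retry to a single probe request.)
        return self._clock() - self._opened_at >= self.recovery_timeout

    def record_success(self):
        """A success closes the circuit and resets the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        """Count the failure; open (or re-open) the circuit at the threshold."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

If the test request after the timeout fails, `record_failure` re-opens the circuit and the timeout starts over. If it succeeds, the provider returns to normal routing.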
Cross-provider model mapping
Failover works best with model aliasing. When failing over from OpenAI to Anthropic, you need to map models:
    aliases:
      - alias: gpt-4o
        model: claude-sonnet-4-5-20250929  # used when failing over to Anthropic
Without aliasing, a failover to Anthropic would fail because Anthropic does not have a model called gpt-4o.
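Conceptually, the mapping is a lookup applied just before the request is sent to the fallback provider. The Python sketch below illustrates that idea only. The per-provider `MODEL_MAP` table and the `resolve_model` helper are hypothetical, and Stockyard's actual alias resolution may work differently.

```python
# Hypothetical per-provider mapping: requested model -> equivalent model.
MODEL_MAP = {
    "anthropic": {"gpt-4o": "claude-sonnet-4-5-20250929"},
}


def resolve_model(requested_model, provider, model_map=MODEL_MAP):
    """Return the model name to send to `provider`.

    Falls back to the requested name when no mapping exists, which is
    the case that fails if the provider does not know that model.
    """
    return model_map.get(provider, {}).get(requested_model, requested_model)
```

For example, `resolve_model("gpt-4o", "anthropic")` returns the Claude model name, while `resolve_model("gpt-4o", "openai")` leaves the name unchanged.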
Monitoring failover
Check provider health and circuit breaker state:
    curl http://localhost:4200/api/diag/providers
    # Shows status, error count, circuit state per provider