Failover

When a provider goes down, Stockyard automatically routes requests to a backup provider.

Enable failover

Failover requires at least two configured providers. Enable it in your config:

failover:
  enabled: true
  strategy: priority    # priority, round-robin, or latency
  providers:
    - openai
    - anthropic
    - groq

Or enable via the API:

curl -X PUT http://localhost:4200/api/proxy/modules/failover \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'

Routing strategies

priority sends each request to the first provider in the list. If that provider fails, the request goes to the next one. This is the default, and it gives you predictable costs because you control which provider handles most of the traffic.

round-robin distributes requests evenly across providers. This is good for spreading load, but costs are less predictable because providers charge different rates.

latency sends each request to the provider with the lowest recent latency. Use it when response time matters more than cost.
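
The three strategies differ only in the order in which providers are tried. The sketch below illustrates that idea in Python; it is not Stockyard's implementation. The names priority_order, RoundRobin, and latency_order are hypothetical, and the fallback order shown for round-robin and latency is an assumption, since only priority's fallback behavior is described above.

```python
def priority_order(providers):
    """priority: always try providers in the configured order."""
    return list(providers)


class RoundRobin:
    """round-robin: rotate the starting provider on every request."""

    def __init__(self, providers):
        self.providers = list(providers)
        self._next = 0

    def order(self):
        start = self._next
        self._next = (self._next + 1) % len(self.providers)
        # Assumed fallback: continue around the ring from the starting point.
        return self.providers[start:] + self.providers[:start]


def latency_order(providers, recent_latency_ms):
    """latency: try the provider with the lowest recent latency first.

    Providers with no measurement yet are tried last (an assumption).
    """
    return sorted(providers, key=lambda p: recent_latency_ms.get(p, float("inf")))


providers = ["openai", "anthropic", "groq"]
rr = RoundRobin(providers)
first_choices = [rr.order()[0] for _ in range(4)]
fastest_first = latency_order(providers, {"openai": 420, "anthropic": 310, "groq": 95})
```

Here first_choices cycles through the list and wraps around, while fastest_first puts groq ahead of anthropic and openai for the made-up latencies shown.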

Circuit breaker

Stockyard tracks errors per provider. When a provider's consecutive failures reach the failure threshold, the circuit "opens" and that provider is temporarily removed from routing. After the recovery timeout elapses, the provider is added back to routing for a test request.

failover:
  circuit_breaker:
    failure_threshold: 5   # consecutive failures before opening
    recovery_timeout: 30s  # wait before trying again
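
To make the open and recover cycle concrete, here is a minimal Python sketch of a consecutive-failure circuit breaker using the two settings above. It is an illustration under stated assumptions, not Stockyard's code: the CircuitBreaker class and its method names are hypothetical, and it assumes that a successful test request closes the circuit while a failed one reopens it for another full timeout.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds
        self._clock = clock
        self._failures = 0       # consecutive failures seen so far
        self._opened_at = None   # None means the circuit is closed

    def allow_request(self):
        if self._opened_at is None:
            return True
        # Open: allow a test request only after the recovery timeout.
        # Simplification: this does not limit how many callers may test at once.
        return self._clock() - self._opened_at >= self.recovery_timeout

    def record_success(self):
        # Assumed: any success resets the count and closes the circuit.
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Open the circuit, or reopen it after a failed test request.
            self._opened_at = self._clock()
```

The clock parameter exists only so the timeout can be exercised without waiting; by default the sketch uses time.monotonic.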

Cross-provider model mapping

Failover works best with model aliasing. Each provider uses its own model names, so when failing over from OpenAI to Anthropic you need to map the requested model to an equivalent one:

aliases:
  - alias: gpt-4o
    model: claude-sonnet-4-5-20250929    # used when failing over to Anthropic

Without aliasing, a failover to Anthropic would fail because Anthropic does not have a model called gpt-4o.
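
Conceptually, aliasing is a lookup performed just before the request is sent to the fallback provider. The Python sketch below shows that lookup. The resolve_model function and the per-provider key in ALIASES are assumptions made for illustration; the config above does not say how Stockyard stores or scopes aliases internally.

```python
# Hypothetical alias table: (target provider, requested model) -> model to send.
ALIASES = {
    ("anthropic", "gpt-4o"): "claude-sonnet-4-5-20250929",
}


def resolve_model(provider, requested_model, aliases=ALIASES):
    """Return the model name to send to `provider`.

    Falls back to the requested name when no alias is defined.
    """
    return aliases.get((provider, requested_model), requested_model)


primary = resolve_model("openai", "gpt-4o")
fallback = resolve_model("anthropic", "gpt-4o")
```

With this table, a request for gpt-4o keeps its name on OpenAI and is rewritten to claude-sonnet-4-5-20250929 when it fails over to Anthropic.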

Monitoring failover

Check provider health and circuit breaker state:

curl http://localhost:4200/api/diag/providers
# Shows status, error count, circuit state per provider