Stockyard

Proxy & middleware. 76 modules, 16 providers, runtime toggles.

Overview

The proxy is the gateway layer. Every LLM request passes through a middleware chain of 76 toggleable modules. The proxy is OpenAI API-compatible — any SDK that talks to /v1/chat/completions works out of the box.

If you only need the proxy layer without tracing or audit, see proxy-only mode.

Modules

Modules are organized by category:

Category | Modules | Purpose
routing | fallbackrouter, modelswitch, regionroute, localsync, abrouter | Provider failover, model aliasing, geo routing
caching | cachelayer, embedcache, semanticcache | LLM cache for responses and embeddings
cost | costcap, tierdrop, idlekill, outputcap, usagepulse, rateshield | Spending limits, LLM rate limiting, usage reporting
safety | promptguard, toxicfilter, guardrail, agegate, hallucicheck, secretscan, agentguard | Content moderation, injection detection, PII
transform | promptslim, tokentrim, contextpack, chatmem, langbridge, voicebridge | Prompt compression, context management
validate | structuredshield, evalgate, codefence | JSON validation, quality gating
shims | anthrofit, geminishim | Use Claude/Gemini with OpenAI SDK
observe | llmtap, tracelink, alertpulse, driftwatch | Logging, tracing, alerting, drift detection

Every module is wrapped with toggle.Wrap and checks enabled state on every request. Disable a module and it’s bypassed instantly.
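The wrapping pattern can be sketched in a few lines of Python. This is illustrative only: `ToggleState` and `wrap` are made-up names standing in for Stockyard's actual toggle.Wrap mechanism.

```python
class ToggleState:
    """Shared enabled/disabled flags, mutable at runtime."""
    def __init__(self):
        self.enabled = {}

    def set(self, name, on):
        self.enabled[name] = on

def wrap(state, name, module):
    """Return a middleware that runs `module` only while its flag is on."""
    def middleware(request):
        if not state.enabled.get(name, False):
            return request  # module disabled: pass the request through untouched
        return module(request)
    return middleware

# Flipping the flag takes effect on the very next call, no restart.
state = ToggleState()
redact = wrap(state, "secretscan", lambda req: {**req, "redacted": True})
state.set("secretscan", True)
print(redact({"prompt": "hi"}))  # module runs
state.set("secretscan", False)
print(redact({"prompt": "hi"}))  # bypassed
```

Because the flag is read inside the middleware on every call, there is no per-toggle rebuild of the chain.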

See the interactive module visualization for the full dependency graph.

Providers

Stockyard supports 16 LLM providers out of the box. Set an environment variable and the provider is auto-configured on boot:

Provider | Env Var | Models
OpenAI | OPENAI_API_KEY | gpt-4o, gpt-4.1, o3-mini, etc.
Anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-5, claude-haiku-4-5
Google Gemini | GEMINI_API_KEY | gemini-2.5-pro, gemini-2.0-flash
Groq | GROQ_API_KEY | llama-3.3-70b, mixtral-8x7b
Mistral | MISTRAL_API_KEY | mistral-large, codestral
DeepSeek | DEEPSEEK_API_KEY | deepseek-chat, deepseek-reasoner
Together | TOGETHER_API_KEY | Llama 3.1, Qwen 2.5
Fireworks | FIREWORKS_API_KEY | Llama 3.3, Qwen 2.5
Perplexity | PERPLEXITY_API_KEY | sonar-pro, sonar
xAI | XAI_API_KEY | grok-3, grok-2
Cohere | COHERE_API_KEY | command-r-plus, command-a
OpenRouter | OPENROUTER_API_KEY | Any model via OpenRouter
Replicate | REPLICATE_API_TOKEN | Any model via Replicate
Azure OpenAI | AZURE_OPENAI_API_KEY | Azure-hosted models
Ollama | (auto at :11434) | Any local model
LM Studio | (auto at :1234) | Any local model

Any OpenAI-compatible API can be added as a custom provider via the user settings or the config file.

Routes

Routes map model patterns to providers. When a request comes in for gpt-4o, the router checks the routes table and sends it to the matching provider.

# List routes
curl http://localhost:4200/api/proxy/routes
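First-match glob routing can be sketched as follows. The route table shown is hypothetical, not Stockyard's actual schema; it only illustrates the pattern-to-provider lookup described above.

```python
from fnmatch import fnmatch

# Hypothetical routes table: first matching pattern wins.
ROUTES = [
    ("gpt-*", "openai"),
    ("o3-*", "openai"),
    ("claude-*", "anthropic"),
    ("gemini-*", "gemini"),
    ("*", "openrouter"),  # catch-all
]

def resolve(model):
    """Return the provider for the first pattern that matches `model`."""
    for pattern, provider in ROUTES:
        if fnmatch(model, pattern):
            return provider
    raise LookupError(f"no route for {model}")

print(resolve("gpt-4o"))             # openai
print(resolve("claude-sonnet-4-5"))  # anthropic
```

Ordering matters with first-match semantics: specific patterns go before the catch-all.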

Toggling Modules

Enable or disable any module at runtime without restart:

# Enable the response cache
curl -X PUT http://localhost:4200/api/proxy/modules/cachelayer \
  -H "Authorization: Bearer sy_admin_..." \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'

# Disable toxicity filtering
curl -X PUT http://localhost:4200/api/proxy/modules/toxicfilter \
  -H "Authorization: Bearer sy_admin_..." \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

Module state changes take effect on the next request. No restart needed.

Module Configuration

Modules accept configuration alongside the enabled flag:

# Configure costcap with a $10/day limit
curl -X PUT http://localhost:4200/api/proxy/modules/costcap \
  -H "Authorization: Bearer sy_admin_..." \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "config": {
      "daily_limit_usd": 10.00,
      "action": "block",
      "notify_at_pct": 80
    }
  }'
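The semantics of that config can be sketched as a daily accumulator that warns at a percentage and blocks at the limit. This is a sketch of the behavior described above, not Stockyard's implementation.

```python
class CostCap:
    """Track daily spend; notify at a percentage, block at the limit."""
    def __init__(self, daily_limit_usd, notify_at_pct=80):
        self.limit = daily_limit_usd
        self.notify_at = daily_limit_usd * notify_at_pct / 100
        self.spent = 0.0

    def check(self, request_cost_usd):
        """Return 'block', 'notify', or 'allow' for this request."""
        if self.spent + request_cost_usd > self.limit:
            return "block"  # with action: "block", the request is rejected
        self.spent += request_cost_usd
        if self.spent >= self.notify_at:
            return "notify"
        return "allow"

cap = CostCap(daily_limit_usd=10.00, notify_at_pct=80)
print(cap.check(5.00))  # allow
print(cap.check(3.50))  # notify (8.50 crosses the 80% mark)
print(cap.check(2.00))  # block  (would exceed 10.00)
```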

Provider Shims

Shim modules translate between the OpenAI API format and provider-native formats. Enable a shim and route requests to any provider using the standard OpenAI SDK:

# Route Claude requests through the anthrofit shim
curl http://localhost:4200/v1/chat/completions \
  -H "Authorization: Bearer sy_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{"role": "user", "content": "Hello from the OpenAI SDK!"}]
  }'

Available shims: anthrofit, geminishim, groqshim, ollamashim, bedrockshim, azureshim, mistralshim, cohereshim, togethershim, deepseekshim, fireworksshim, replicateshim, perplexityshim.
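The gist of what a shim like anthrofit does can be sketched as a body translation. Anthropic's Messages API takes the system prompt as a top-level field and requires max_tokens; this sketch is illustrative and not anthrofit's actual code, and the default of 1024 is an assumption.

```python
def openai_to_anthropic(body, default_max_tokens=1024):
    """Translate an OpenAI-style chat body toward Anthropic Messages shape:
    system messages move to a top-level field, and max_tokens is required."""
    system_parts = [m["content"] for m in body["messages"] if m["role"] == "system"]
    out = {
        "model": body["model"],
        "max_tokens": body.get("max_tokens", default_max_tokens),
        "messages": [m for m in body["messages"] if m["role"] != "system"],
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    return out

req = {
    "model": "claude-sonnet-4-5",
    "messages": [
        {"role": "system", "content": "Be terse."},
        {"role": "user", "content": "Hello!"},
    ],
}
print(openai_to_anthropic(req))
```

The response is translated back the other way, so the client only ever sees OpenAI-shaped JSON.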

Failover & Routing

Configure automatic provider failover with the fallbackrouter module:

# stockyard.yaml
modules:
  fallbackrouter:
    enabled: true
    config:
      chain:
        - provider: openai
          model: gpt-4o
        - provider: anthropic
          model: claude-sonnet-4-5
        - provider: groq
          model: llama-3.3-70b-versatile
      max_retries: 2

If OpenAI fails or is rate-limited, the request automatically falls back to Anthropic, then Groq.
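The fallback loop can be sketched as follows. This is a minimal sketch of the chain semantics described above; it assumes max_retries applies per provider, and the error type and dispatch function are placeholders.

```python
class ProviderError(Exception):
    pass

def call_with_fallback(chain, send, max_retries=2):
    """Try each (provider, model) in order; on failure or rate limit,
    retry, then fall through to the next entry in the chain."""
    last_err = None
    for provider, model in chain:
        for _attempt in range(max_retries + 1):
            try:
                return send(provider, model)
            except ProviderError as err:
                last_err = err
    raise last_err

CHAIN = [("openai", "gpt-4o"),
         ("anthropic", "claude-sonnet-4-5"),
         ("groq", "llama-3.3-70b-versatile")]

def flaky_send(provider, model):
    """Stand-in dispatcher that simulates OpenAI being rate-limited."""
    if provider == "openai":
        raise ProviderError("429 rate limited")
    return f"{provider}:{model}"

print(call_with_fallback(CHAIN, flaky_send))  # anthropic:claude-sonnet-4-5
```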

Streaming

Streaming works transparently. Set "stream": true in the request body:

curl http://localhost:4200/v1/chat/completions \
  -H "Authorization: Bearer sy_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream": true
  }'

The streamsnap module captures the full streamed response for logging without affecting delivery latency.
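The capture-without-delay idea behind streamsnap can be sketched as a pass-through generator that records each chunk as it yields it. Illustrative only, not the actual implementation.

```python
def tee_stream(chunks, captured):
    """Yield each chunk to the client immediately while appending a copy
    to `captured` for later logging."""
    for chunk in chunks:
        captured.append(chunk)
        yield chunk

log = []
for piece in tee_stream(iter(["Why ", "did ", "the ", "chicken..."]), log):
    pass  # in the proxy, each piece is flushed to the client here

print("".join(log))  # the full response is available for logging afterwards
```

Because the copy is an in-memory append on the same pass, the client sees each chunk as soon as the provider emits it.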

Embeddings

The proxy also handles embedding requests:

curl http://localhost:4200/v1/embeddings \
  -H "Authorization: Bearer sy_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Stockyard is an LLM proxy"
  }'

The embedcache module caches embedding results for identical inputs, saving both time and cost.
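The embedcache idea is plain memoization keyed on the input. A minimal sketch, assuming a key derived from hashing the model name and input together (the real cache key scheme is not specified here):

```python
import hashlib

class EmbedCache:
    """Memoize embedding results by a hash of (model, input)."""
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # the actual provider call
        self.store = {}
        self.hits = 0

    def get(self, model, text):
        key = hashlib.sha256(f"{model}\x00{text}".encode()).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        vec = self.embed_fn(model, text)
        self.store[key] = vec
        return vec

# Stand-in embedder so the sketch is self-contained.
cache = EmbedCache(lambda model, text: [float(len(text))])
cache.get("text-embedding-3-small", "Stockyard is an LLM proxy")
cache.get("text-embedding-3-small", "Stockyard is an LLM proxy")
print(cache.hits)  # 1 -- the second identical input never reached the provider
```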

Inspecting Module State

Query the full module list with current status and config:

curl http://localhost:4200/api/proxy/modules \
  -H "Authorization: Bearer sy_admin_..."
{
  "modules": [
    {"name": "fallbackrouter", "enabled": true, "category": "routing", "order": 1},
    {"name": "cachelayer", "enabled": true, "category": "caching", "order": 13},
    {"name": "costcap", "enabled": false, "category": "cost", "order": 16}
  ],
  "total": 70,
  "enabled": 24
}

Caching Patterns

Stockyard offers three caching layers, each suited to different use cases:

Module | Match Strategy | Best For
cachelayer | Exact SHA-256 hash | Identical repeated queries (chatbots, FAQ)
embedcache | Embedding vector hash | Embedding deduplication
semanticcache | Cosine similarity threshold | Similar but not identical queries

Configure cache TTL and similarity thresholds per module:

# stockyard.yaml
modules:
  cachelayer:
    enabled: true
    config:
      ttl: 3600
      max_entries: 10000
  semanticcache:
    enabled: true
    config:
      similarity_threshold: 0.92
      embedding_model: "text-embedding-3-small"

Safety Module Chain

For production deployments, enable the full safety chain:

# Enable all safety modules at once
for module in promptguard toxicfilter guardrail secretscan agentguard; do
  curl -X PUT http://localhost:4200/api/proxy/modules/$module \
    -H "Authorization: Bearer sy_admin_..." \
    -H "Content-Type: application/json" \
    -d '{"enabled": true}'
done

Performance: The full 76-module chain adds ~4ms of overhead. Safety modules are the heaviest at 0.06–0.35ms each, but still negligible compared to LLM provider latency (500ms–3s).

Rate Limiting

The rateshield module enforces per-user rate limits using a token bucket algorithm:

# stockyard.yaml
modules:
  rateshield:
    enabled: true
    config:
      rpm: 60           # Requests per minute
      burst: 10          # Allow burst of 10 above limit
      per: "api_key"     # Rate limit per API key

When rate-limited, the proxy returns HTTP 429 with a Retry-After header indicating when the client can retry.
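A token bucket with those parameters can be sketched as follows. This assumes `burst` means extra capacity above rpm, matching the comment in the config above; it is an illustration of the algorithm, not rateshield's code.

```python
class TokenBucket:
    """Token-bucket limiter: tokens refill at rpm per minute, up to a
    capacity of rpm + burst. Each request consumes one token."""
    def __init__(self, rpm, burst, now=0.0):
        self.rate = rpm / 60.0           # tokens per second
        self.capacity = rpm + burst
        self.tokens = float(self.capacity)
        self.last = now

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller answers 429 with Retry-After

    def retry_after(self):
        """Seconds until one token is available (for the Retry-After header)."""
        return max(0.0, (1.0 - self.tokens) / self.rate)

bucket = TokenBucket(rpm=60, burst=10)
allowed = sum(bucket.allow(now=0.0) for _ in range(80))
print(allowed)  # 70 -- capacity exhausted, the remaining 10 are rejected
```

With `per: "api_key"`, the proxy would keep one such bucket per API key.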

Explore: OpenAI-compatible · Model aliasing · Why SQLite