How a request flows through Stockyard
Click any module to learn what it does, or hit Play to watch a request animate through the full chain.
Every request enters through the standard OpenAI-compatible /v1/chat/completions endpoint. It flows through the middleware chain in order: routing decides which provider handles it, safety modules check for harmful content, cost modules enforce spending limits, the cache checks for a hit, transforms optimize the prompt, validators ensure output quality, and observe modules record everything. After the response, hooks write traces to Lookout and audit entries to Brand automatically.
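The ordering above can be sketched as a simple chain of stages applied in sequence. This is an illustrative Python sketch, not Stockyard's actual code: every function name and field below is a hypothetical stand-in for the module it's named after.

```python
# Hypothetical sketch of the middleware chain order described above.
# Each stage takes and returns a request dict; names are illustrative only.

def route(req):      req["provider"] = "openai"; return req      # pick a provider
def safety(req):     req.setdefault("checks", []).append("safety"); return req
def cost(req):       req.setdefault("checks", []).append("cost"); return req
def cache(req):      req["cache"] = "miss"; return req           # check for a hit
def transform(req):  req["prompt"] = req["prompt"].strip(); return req
def validate(req):   req.setdefault("checks", []).append("validate"); return req
def observe(req):    req["traced"] = True; return req            # record everything

CHAIN = [route, safety, cost, cache, transform, validate, observe]

def handle(req):
    for stage in CHAIN:          # stages run strictly in the order above
        req = stage(req)
    return req

result = handle({"prompt": "  Hello  "})
```

The key property the real chain shares with this sketch is that ordering is explicit and fixed: routing runs before safety, safety before cost, and so on down to observability.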
Every module is wrapped with toggle.Wrap. Disabled modules are bypassed with zero overhead. Toggle any module on or off at runtime via the API or the console — no restart required.
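A toggle wrapper of this kind can be sketched in a few lines: the wrapped module consults a shared flag and, when disabled, passes the request through untouched. This is a minimal Python sketch assuming a dict-based request; the `Toggle` class and `wrap` method are hypothetical names, not Stockyard's `toggle.Wrap` API.

```python
# Hypothetical sketch of runtime-toggleable module wrapping.
# Flipping the flag takes effect on the next request -- no restart needed.

class Toggle:
    def __init__(self):
        self.enabled = {}               # module name -> bool, mutable at runtime

    def wrap(self, name, fn):
        self.enabled.setdefault(name, True)
        def wrapped(req):
            if not self.enabled[name]:
                return req              # disabled: bypass, near-zero overhead
            return fn(req)
        return wrapped

toggles = Toggle()

def redact(req):
    req["text"] = req["text"].replace("secret", "[redacted]")
    return req

safe_redact = toggles.wrap("redact", redact)

on = safe_redact({"text": "a secret"})       # module enabled: text is redacted
toggles.enabled["redact"] = False            # flipped at runtime (e.g. via API)
off = safe_redact({"text": "a secret"})      # module bypassed: text untouched
```

The bypass path does nothing but a dictionary lookup, which is why disabling a module removes essentially all of its cost.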
The proxy is the foundation. On top of it, Stockyard runs integrated products:
Trading Post (config packs, one-click install)
The ~200 ms proxy overhead covers the full middleware chain (rate limiting, cost tracking, logging, failover, filtering). A typical LLM provider response takes 1-30 seconds, so for multi-second responses the proxy adds only a few percent to total request time: about 2% on a 10-second response, and under 1% on a 30-second one.
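The arithmetic behind that claim is straightforward; this small check uses only the numbers stated above.

```python
# Back-of-the-envelope check: the proxy's share of total request time is
# proxy_time / (proxy_time + provider_time). Figures are from the text above.

def overhead_pct(proxy_s, provider_s):
    return 100 * proxy_s / (proxy_s + provider_s)

fast = overhead_pct(0.2, 1.0)    # short 1 s response: overhead share is largest
mid  = overhead_pct(0.2, 10.0)   # ~2% on a 10 s response
slow = overhead_pct(0.2, 30.0)   # under 1% on a 30 s response
```

The share shrinks as provider latency grows, which is why fixed per-request overhead matters less for long-running LLM calls than it would for a fast key-value store.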