Provider

Route Ollama through Stockyard

Add cost tracking, caching, failover, and 76 middleware modules to your Ollama requests. One URL change, no SDK swap.

Environment variable
Auto-detected at localhost:11434
Models
Any model pulled in Ollama (Llama, Mistral, Phi, Gemma, etc.)
Failover to
OpenAI GPT-4o, Anthropic Claude, or Groq
API format
OpenAI-compatible

Why proxy Ollama?

Ollama runs open-source models locally. Proxying through Stockyard gives your local models the same infrastructure as cloud providers: request tracing, latency tracking, and middleware modules like safety guardrails and PII redaction.

The real power is local-plus-cloud failover. Route requests to Ollama first (free, fast for small models), and fall back to OpenAI or Anthropic when the local model cannot handle the request or when you need a more capable model.

Quick start

# Make sure Ollama is running with a model
ollama pull llama3.2
ollama serve

# Install and start Stockyard (auto-detects Ollama)
curl -fsSL stockyard.dev/install.sh | sh
stockyard
# Provider: ollama (auto-detected at localhost:11434)
# Proxy listening on :4200

# Send a request through the proxy
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2","messages":[{"role":"user","content":"hello"}]}'
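Because the proxy speaks the OpenAI chat-completions format, existing tooling works on the response unchanged. A minimal sketch of pulling out just the reply text with jq; the JSON below is an illustrative response body trimmed to the fields used, not a captured real response:

```shell
# Illustrative OpenAI-compatible response body (trimmed; real responses
# also carry id, model, usage, etc.)
response='{"choices":[{"message":{"role":"assistant","content":"Hi! How can I help?"}}]}'

# Extract only the assistant's reply text
echo "$response" | jq -r '.choices[0].message.content'
```

In practice you would pipe the curl call above straight into the same jq filter.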

Good to know

Ollama must be running before Stockyard starts. Stockyard auto-detects it at localhost:11434. Custom ports can be set via OLLAMA_BASE_URL.
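If Ollama listens somewhere other than the default port, point Stockyard at it before starting. A sketch using the OLLAMA_BASE_URL variable named above; the port value here is only an example:

```shell
# Ollama running on a non-default port (example value)
export OLLAMA_BASE_URL=http://localhost:11500
stockyard
```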

Local + cloud failover

To set this up, configure both providers: Stockyard routes requests to your local model first (free, fast), and automatically falls back to OpenAI or Anthropic when you need a more capable model or when the local machine is busy.

# Set both local and cloud providers
export OPENAI_API_KEY=sk-...
stockyard
# Provider: ollama (auto-detected at localhost:11434)
# Provider: openai (from OPENAI_API_KEY)

# Alias for automatic routing
curl -X PUT http://localhost:4200/v1/api/proxy/aliases \
  -d '{"alias":"default","model":"llama3.2","fallback":"gpt-4o-mini"}'

Requests go to Ollama first. If Ollama is down or returns an error, Stockyard retries on OpenAI. Your app sends requests to the default alias and never knows which provider handled them. See local + cloud fallback for the full setup.
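With the alias in place, the application-side request looks identical to the quick-start call, just with the alias as the model name. A sketch, assuming the default alias configured above and the proxy on its default port:

```shell
# The app targets the alias; Stockyard routes to llama3.2 on Ollama
# and falls back to gpt-4o-mini on OpenAI if needed
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"hello"}]}'
```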

Route Ollama through Stockyard in under 60 seconds.

Install Guide

All 16 providers · Proxy-only mode · What is an LLM proxy? · Best self-hosted proxy · One-binary proxy

Explore: OpenAI · Anthropic · Groq · DeepSeek