Provider

Route Ollama through Stockyard

Add cost tracking, caching, failover, and 76 middleware modules to your Ollama requests. One URL change, no SDK swap.

Environment variable
Auto-detected at localhost:11434
Models
Any model pulled in Ollama (Llama, Mistral, Phi, Gemma, etc.)
Failover to
OpenAI GPT-4o, Anthropic Claude, or Groq
API format
OpenAI-compatible

Why proxy Ollama?

Ollama runs open-source models locally. Proxying through Stockyard gives your local models the same infrastructure as cloud providers: request tracing, latency tracking, and middleware modules like safety guardrails and PII redaction.

The real power is local-plus-cloud failover. Route requests to Ollama first (free, fast for small models), and fall back to OpenAI or Anthropic when the local model cannot handle the request or when you need a more capable model.

Quick start

# Make sure Ollama is running with a model
ollama pull llama3.2
ollama serve

# Install and start Stockyard (auto-detects Ollama)
curl -fsSL stockyard.dev/install.sh | sh
stockyard
# Provider: ollama (auto-detected at localhost:11434)
# Proxy listening on :4200

# Send a request through the proxy
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2","messages":[{"role":"user","content":"hello"}]}'
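Because the proxy speaks the OpenAI chat-completions format, existing tooling works on the response unchanged. A minimal sketch of pulling out just the reply text with jq; the JSON below is an illustrative response body trimmed to the fields used, not a captured real response:

```shell
# Illustrative OpenAI-compatible response body (trimmed; real responses
# also carry id, model, usage, etc.)
response='{"choices":[{"message":{"role":"assistant","content":"Hi! How can I help?"}}]}'

# Extract only the assistant's reply text
echo "$response" | jq -r '.choices[0].message.content'
```

In practice you would pipe the curl call above straight into the same jq filter.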

Good to know

Ollama must be running before Stockyard starts. Stockyard auto-detects it at localhost:11434. Custom ports can be set via OLLAMA_BASE_URL.
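If Ollama listens somewhere other than the default port, point Stockyard at it before starting. A sketch using the OLLAMA_BASE_URL variable named above; the port value here is only an example:

```shell
# Ollama running on a non-default port (example value)
export OLLAMA_BASE_URL=http://localhost:11500
stockyard
```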

Local + cloud failover

To set this up, configure both providers: Stockyard routes requests to your local model first (free, fast), and automatically falls back to OpenAI or Anthropic when you need a more capable model or when the local machine is busy.

# Set both local and cloud providers
export OPENAI_API_KEY=sk-...
stockyard
# Provider: ollama (auto-detected at localhost:11434)
# Provider: openai (from OPENAI_API_KEY)

# Alias for automatic routing
curl -X PUT http://localhost:4200/v1/api/proxy/aliases \
  -d '{"alias":"default","model":"llama3.2","fallback":"gpt-4o-mini"}'

Requests go to Ollama first. If Ollama is down or returns an error, Stockyard retries on OpenAI. Your app sends requests to the default alias and never knows which provider handled them. See local + cloud fallback for the full setup.
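With the alias in place, the application-side request looks identical to the quick-start call, just with the alias as the model name. A sketch, assuming the default alias configured above and the proxy on its default port:

```shell
# The app targets the alias; Stockyard routes to llama3.2 on Ollama
# and falls back to gpt-4o-mini on OpenAI if needed
curl http://localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"hello"}]}'
```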

Route Ollama through Stockyard in under 60 seconds.

Install Guide

All 16 providers · Proxy-only mode · What is an LLM proxy? · Best self-hosted proxy · One-binary proxy

Explore: OpenAI · Anthropic · Groq · DeepSeek