Why One Binary: Building an LLM Gateway with Go and SQLite
Stockyard is an LLM proxy with tracing, cost tracking, and audit logs. It ships as one Go binary with embedded SQLite. No Redis. No Postgres. No Docker Compose. This post explains why.
The problem I was solving
Every LLM app I shipped needed the same infrastructure: a proxy to sit between my app and the LLM provider, request tracing so I could see what was happening, cost tracking so I didn’t get surprised by bills, and audit logs for compliance.
The standard approach is to assemble this from parts. LiteLLM for routing. Langfuse or Helicone for observability. Your own Postgres for audit logs. Redis for caching. Docker Compose to glue it together. Each tool has its own config format, its own failure modes, its own upgrade cycle.
I wanted one thing I could install and run.
Why Go over Python
Most LLM tooling is written in Python. That makes sense — the ML ecosystem is Python-native, and most LLM SDKs are Python-first.
But a proxy is not an ML workload. It’s a network service. It accepts HTTP requests, runs them through a chain of middleware, forwards them to an upstream provider, and pipes the response back. This is exactly what Go is designed for.
Specific advantages that mattered:
Static binary. CGO_ENABLED=0 go build produces a ~25MB binary with zero runtime dependencies. No virtualenv, no pip, no system Python version conflicts. Copy the binary to a server and run it.
Goroutine-per-request. Each proxied request gets its own goroutine. The 76-module middleware chain runs synchronously within that goroutine — no callback spaghetti, no async/await coloring. The runtime handles scheduling across cores.
Predictable performance. The full 76-module middleware chain adds ~400ns of overhead per request (benchmarked with go test -bench on Xeon Platinum). That’s noise compared to the 1–5 second LLM provider latency. No GC pauses that matter at this scale.
Single-port serving. Go’s net/http serves the proxy endpoint (/v1/*), the REST API (/api/*), the dashboard (/ui), and the marketing site (/) all on one port from one process. No nginx, no reverse proxy, no port mapping.
Why SQLite over Postgres
This is the decision people question most. “SQLite can’t handle production traffic.”
That depends on what you’re asking it to do. Stockyard uses SQLite for:
Request traces. Every proxied request writes one row: model, provider, tokens, cost, latency, timestamp. This is an append-heavy, single-writer workload — exactly what SQLite with WAL mode handles well.
Audit ledger. Each audit event is one row with a SHA-256 hash of the previous event. Append-only. Sequential writes. SQLite handles this trivially.
Configuration. Module settings, provider configs, prompt templates, workflow definitions. These are read-heavy, rarely-written. SQLite shines here.
Cost tracking. Daily spend rollups per project. One upsert per request. WAL mode means readers never block.
What SQLite doesn’t do well: high-concurrency writes from multiple processes, horizontal scaling across machines, replication. Stockyard doesn’t need any of these. It’s one process on one machine.
The operational benefit is enormous: no database server to run, no connection pooling to configure, no schema migrations to coordinate across services, no backup infrastructure. The entire database is one file. Back it up with cp.
The middleware chain
Stockyard has 76 middleware modules. Each one implements a simple interface: receive the request context, optionally modify it, call next. The chain runs synchronously — no event loops, no message queues.
Every module is toggleable at runtime via a PUT to /api/proxy/modules/{name}. This is stored in SQLite and checked on each request. The overhead of checking 76 module states is negligible (it’s an in-memory map lookup with periodic SQLite refresh).
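In sketch form, the chain might look like this. Ctx, Module, and the toggle map are shapes I'm assuming for illustration, not Stockyard's actual types:

```go
package main

import (
	"fmt"
	"sync"
)

// Ctx carries request state through the chain (illustrative fields).
type Ctx struct {
	Model string
	Trace []string
}

// Module is the middleware shape: inspect/modify ctx, then call next.
type Module struct {
	Name string
	Run  func(c *Ctx, next func())
}

// toggles mirrors the SQLite-backed module state as an in-memory map,
// refreshed periodically; a lookup here costs nanoseconds per request.
var toggles sync.Map // name -> bool

// runChain walks the modules synchronously in the request's goroutine.
func runChain(mods []Module, c *Ctx) {
	var step func(i int)
	step = func(i int) {
		if i == len(mods) {
			return // end of chain: forward to the provider here
		}
		if on, ok := toggles.Load(mods[i].Name); ok && !on.(bool) {
			step(i + 1) // module disabled at runtime: skip it
			return
		}
		mods[i].Run(c, func() { step(i + 1) })
	}
	step(0)
}

func main() {
	mods := []Module{
		{"logger", func(c *Ctx, next func()) { c.Trace = append(c.Trace, "logged"); next() }},
		{"cache", func(c *Ctx, next func()) { c.Trace = append(c.Trace, "cached"); next() }},
	}
	// A PUT to /api/proxy/modules/cache would flip this entry.
	toggles.Store("cache", false)
	c := &Ctx{Model: "gpt-4o-mini"}
	runChain(mods, c)
	fmt.Println(c.Trace) // [logged]
}
```

Because each request runs the whole chain in its own goroutine, there is no shared event loop to stall — a slow module only delays its own request.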
Some modules people ask about:
Cache. Hash the (model + messages + temperature) tuple. If it’s in SQLite, return it without hitting the provider. Simple but effective — most teams see 15–30% cache hit rates on real traffic.
Cost cap. Each request estimates cost from the model’s pricing table and running token count. If the daily/monthly cap is exceeded, the request is rejected before reaching the provider.
Failover. If the primary provider returns a 5xx or times out, retry on the next provider in the chain. No code changes in the calling app.
Prompt guard. Pattern-matching against known prompt injection patterns. Not bulletproof, but catches the obvious stuff before it reaches the LLM.
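The cache module's key derivation can be sketched as a SHA-256 over the JSON-encoded tuple. The field coverage here is a simplification — a real key would also fold in top_p, tools, and anything else that changes the response:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// cacheKey hashes the fields that determine the response.
// The exact tuple is illustrative, not Stockyard's actual key.
func cacheKey(model string, messages []map[string]string, temperature float64) string {
	h := sha256.New()
	enc := json.NewEncoder(h) // stream JSON straight into the hash
	enc.Encode(model)
	enc.Encode(messages)
	enc.Encode(temperature)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	msgs := []map[string]string{{"role": "user", "content": "hello"}}
	k1 := cacheKey("gpt-4o-mini", msgs, 0)
	k2 := cacheKey("gpt-4o-mini", msgs, 0)
	k3 := cacheKey("gpt-4o-mini", msgs, 0.7)
	fmt.Println(k1 == k2, k1 == k3) // true false
}
```

One detail that makes this reliable: encoding/json sorts map keys, so the same logical request always serializes — and therefore hashes — identically.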
The audit ledger
This is the piece I’m most proud of. Every request through Stockyard creates an audit event. Each event includes a SHA-256 hash of the previous event, forming a hash chain. Tampering with any event breaks the chain from that point forward.
Verification is one API call: GET /api/trust/ledger/verify. It walks the chain and confirms every hash matches. On the current production deployment, this verifies 1,085 events in milliseconds.
This isn’t blockchain. There’s no distributed consensus, no proof of work. It’s a hash-linked list in SQLite. Simple, fast, and tamper-evident.
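The chain mechanics fit in a few lines of Go. The Event fields and the exact hash input below are my assumptions, not Stockyard's on-disk format:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Event is one audit row; PrevHash links it to the event before it.
type Event struct {
	Payload  string
	PrevHash string
	Hash     string
}

// hashEvent binds an event's payload to its predecessor's hash.
func hashEvent(payload, prev string) string {
	sum := sha256.Sum256([]byte(prev + "|" + payload))
	return hex.EncodeToString(sum[:])
}

// appendEvent adds one event, chained to the current tail.
func appendEvent(ledger []Event, payload string) []Event {
	prev := ""
	if len(ledger) > 0 {
		prev = ledger[len(ledger)-1].Hash
	}
	return append(ledger, Event{payload, prev, hashEvent(payload, prev)})
}

// verify walks the chain; tampering with any event breaks every
// link from that point forward.
func verify(ledger []Event) bool {
	prev := ""
	for _, e := range ledger {
		if e.PrevHash != prev || e.Hash != hashEvent(e.Payload, prev) {
			return false
		}
		prev = e.Hash
	}
	return true
}

func main() {
	var ledger []Event
	for _, p := range []string{"req 1", "req 2", "req 3"} {
		ledger = appendEvent(ledger, p)
	}
	fmt.Println(verify(ledger)) // true
	ledger[1].Payload = "tampered"
	fmt.Println(verify(ledger)) // false
}
```

Verification is a single linear scan, which is why walking a thousand-event ledger takes milliseconds.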
What I’d do differently
If I were starting over:
Fewer apps at launch. Stockyard ships with 16 core apps and 29 total products. Chute, Lookout, and Brand are the core. The count grew because each product solves a real problem, but naming and presenting 29 things turned out to be the real challenge.
Streaming from day one. SSE streaming through a reverse proxy has edge cases I underestimated. Non-streaming works perfectly; streaming needs more work on certain hosting platforms.
Provider coverage. 16 providers is good, but LiteLLM supports 100+. For a routing proxy, breadth matters. I prioritized depth (the full middleware chain) over breadth (provider count). Whether that tradeoff was right depends on your use case.
Try it
```sh
# Install and run
curl -fsSL stockyard.dev/install.sh | sh
stockyard

# Make a request
curl localhost:4200/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'

# Check what happened
curl localhost:4200/api/lookout/traces?limit=1
curl localhost:4200/api/brand/ledger?limit=1
```
Source-available. Code on GitHub.
See how Stockyard compares
Stockyard vs LiteLLM · Stockyard vs Helicone · Stockyard vs Portkey