LLM Request Replay

Take any past request, re-run it against a different model, and compare the results. No code changes, no test harness.

Why replay matters

You want to switch from GPT-4o to Claude or DeepSeek. But will the output quality hold? The only way to know is to test with your actual production requests, not synthetic benchmarks.

Request replay takes a real request from your logs, sends it to a different model, and shows you the original and new response side by side. You compare cost, latency, token count, and output quality on your own data.

How it works in Stockyard

Every request through Stockyard is logged with the full prompt, response, model, tokens, cost, and latency. Lasso (Stockyard's replay engine) lets you pick any logged request and re-run it:

# Replay a request against a different model
curl -X POST http://localhost:4200/api/replay \
  -H "Content-Type: application/json" \
  -d '{"trace_id": "tr_a8f21c4e", "model": "claude-sonnet-4-5-20250929"}'

# Compare the results
curl http://localhost:4200/api/replay/compare/tr_a8f21c4e
# Original: gpt-4o, $0.0045, 1.2s, 342 tokens
# Replay:   claude-sonnet-4-5, $0.0038, 0.9s, 318 tokens
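
To read the two summary lines above in relative terms, you can compute the percentage change for each metric. This is a minimal sketch: `pct_change` is a hypothetical helper, not part of Stockyard, and the inputs are the sample values from the comparison output above.

```shell
# pct_change OLD NEW: print the relative change from OLD to NEW as a signed percentage.
# Hypothetical helper for illustration; not part of Stockyard.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

pct_change 0.0045 0.0038   # cost:    -15.6%
pct_change 1.2 0.9         # latency: -25.0%
pct_change 342 318         # tokens:  -7.0%
```

A negative value means the replay was cheaper, faster, or shorter than the original. Output quality still needs a human read of the two responses side by side.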

Use cases

Provider migration. Before switching from OpenAI to Anthropic, replay 100 production requests and compare quality. Make the decision with data, not guesswork.
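
A batch replay along these lines could be scripted as a loop over trace IDs. The sketch below only uses the `/api/replay` endpoint and request body shown earlier; the `STOCKYARD_URL` variable, the `replay_payload` and `replay_all` functions, and the file of trace IDs are assumptions made for illustration, not Stockyard features.

```shell
# Base URL taken from the example earlier on this page; override for other deployments.
STOCKYARD_URL="${STOCKYARD_URL:-http://localhost:4200}"

# replay_payload TRACE_ID MODEL: print the JSON body for POST /api/replay.
# Assumes neither value contains quotes or backslashes.
replay_payload() {
  printf '{"trace_id": "%s", "model": "%s"}' "$1" "$2"
}

# replay_all MODEL FILE: replay every trace ID listed in FILE (one per line)
# against MODEL. FILE is a hypothetical list you assemble yourself.
replay_all() {
  model="$1"
  while IFS= read -r trace_id || [ -n "$trace_id" ]; do
    [ -n "$trace_id" ] || continue   # skip blank lines
    curl -sS -X POST "$STOCKYARD_URL/api/replay" \
      -H "Content-Type: application/json" \
      -d "$(replay_payload "$trace_id" "$model")"
    echo                             # newline between responses
  done < "$2"
}
```

With a file such as `trace_ids.txt` holding 100 IDs, `replay_all claude-sonnet-4-5-20250929 trace_ids.txt` would send one replay per line. Each result can then be fetched from `/api/replay/compare/<trace_id>` as shown earlier.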

Cost optimization. Replay your most expensive requests against cheaper models. Find out which requests can safely use DeepSeek or Gemini Flash instead of GPT-4o.
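
Picking the most expensive requests can be as simple as sorting by logged cost. The sketch below assumes you have the request log as a CSV file with a header row and the columns `trace_id,cost_usd`; that file layout and the `top_costly` function are hypothetical and are not a documented Stockyard export format.

```shell
# top_costly FILE N: print the N trace IDs with the highest cost, most expensive first.
# FILE is a hypothetical CSV with a header row and columns trace_id,cost_usd.
top_costly() {
  tail -n +2 "$1" |        # drop the header row
    sort -t, -k2,2nr |     # numeric sort on the cost column, descending
    head -n "$2" |         # keep the top N rows
    cut -d, -f1            # print only the trace ID
}
```

For example, `top_costly requests.csv 20 > trace_ids.txt` would write the 20 costliest trace IDs to a file, ready to replay against a cheaper model.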

Regression testing. After changing prompts, replay historical requests to verify the new prompt produces equivalent or better output.

Shareable comparisons. Generate a share link for any replay comparison. Send it to your team to discuss whether the switch makes sense.

How this differs from benchmarks

Public benchmarks test models on standardized tasks. Replay tests models on your actual workload. A model that scores well on MMLU might perform poorly on your specific prompt patterns. Replay gives you the answer for your data.

Try Stockyard. One binary, 16 providers, under 60 seconds.
