Tack Room (Studio)

Version, test, and optimize every prompt in your stack.

Prompt Templates

# Create a versioned prompt template
curl -X POST http://localhost:4200/api/studio/templates \
  -H "Authorization: Bearer sy_admin_..." \
  -H "Content-Type: application/json" \
  -d '{"slug":"summarizer","name":"Summarizer","content":"Summarize: {{text}}"}'

# List all templates
curl http://localhost:4200/api/studio/templates \
  -H "Authorization: Bearer sy_admin_..."

Templates support {{variable}} interpolation. Every edit creates a new version automatically.
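The interpolation step can be sketched in a few lines. This is an illustrative client-side implementation, not Tack Room's own renderer; the `render` helper and its strict missing-variable behavior are assumptions:

```python
import re

def render(template: str, variables: dict) -> str:
    """Substitute {{name}} placeholders; raise if a variable is missing.

    Sketch of Mustache-style interpolation, not Tack Room's actual code.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return str(variables[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)

# Example: the quickstart's summarizer template
print(render("Summarize: {{text}}", {"text": "LLM routing basics"}))
# → Summarize: LLM routing basics
```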

A/B Experiments

# Run an A/B test across models
curl -X POST http://localhost:4200/api/studio/experiments/run \
  -H "Authorization: Bearer sy_admin_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "speed-vs-quality",
    "prompt": "Explain quantum computing in one paragraph",
    "models": ["gpt-4o","claude-sonnet-4-5-20250929","gemini-2.0-flash"],
    "runs": 3,
    "eval": "length"
  }'

Eval methods: length (longer = better), concise (shorter = better), json (output must be valid JSON), contains (keyword match via eval_arg), or omit eval entirely for a pure cost comparison.
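The built-in eval methods are simple heuristics. A rough sketch of how such scorers might work (the exact scoring Tack Room applies is not documented here, so treat the numeric scales as assumptions):

```python
import json

def score(output: str, method: str = "", eval_arg: str = "") -> float:
    """Heuristic scorers mirroring the documented eval methods.

    Assumed semantics: higher score = better; the real implementation
    may normalize differently.
    """
    if method == "length":          # longer = better
        return float(len(output))
    if method == "concise":         # shorter = better
        return 1.0 / (1 + len(output))
    if method == "json":            # valid JSON = 1, else 0
        try:
            json.loads(output)
            return 1.0
        except ValueError:
            return 0.0
    if method == "contains":        # keyword match = 1, else 0
        return 1.0 if eval_arg in output else 0.0
    return 0.0                      # no eval: rank by cost instead

assert score('{"ok": true}', "json") == 1.0
assert score("for i in range(15): ...", "contains", "for") == 1.0
```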

Benchmarks

# Run a multi-prompt benchmark
curl -X POST http://localhost:4200/api/studio/benchmarks/run \
  -H "Authorization: Bearer sy_admin_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "model-eval-q1",
    "models": ["gpt-4o-mini","deepseek-chat"],
    "prompts": [
      {"name":"summarize","prompt":"Summarize: ...","eval":"concise"},
      {"name":"code","prompt":"Write fizzbuzz","eval":"contains","eval_arg":"for"}
    ],
    "runs": 3
  }'

See the Studio product page and the Diff view (coming soon).

Prompt Templates

Create versioned prompt templates for reuse and A/B testing:

curl -X POST http://localhost:4200/api/studio/templates \
  -H "Authorization: Bearer sy_admin_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "customer-support-v1",
    "system_prompt": "You are a helpful customer support agent for {{company}}. Be concise and friendly.",
    "variables": ["company"],
    "model": "gpt-4o",
    "temperature": 0.3
  }'
{
  "id": "tpl_abc123",
  "name": "customer-support-v1",
  "version": 1,
  "created_at": "2026-02-28T12:00:00Z"
}

Listing Templates

curl http://localhost:4200/api/studio/templates \
  -H "Authorization: Bearer sy_admin_..."
{
  "templates": [
    {"id": "tpl_abc123", "name": "customer-support-v1", "version": 1, "model": "gpt-4o"},
    {"id": "tpl_def456", "name": "code-review-v2", "version": 2, "model": "claude-sonnet-4-5"}
  ],
  "total": 2
}

A/B Experiments

Run experiments to compare prompts, models, or temperatures:

curl -X POST http://localhost:4200/api/studio/experiments/run \
  -H "Authorization: Bearer sy_admin_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GPT-4o vs Claude on support",
    "variants": [
      {"model": "gpt-4o", "template": "customer-support-v1", "variables": {"company": "Acme"}},
      {"model": "claude-sonnet-4-5", "template": "customer-support-v1", "variables": {"company": "Acme"}}
    ],
    "test_inputs": [
      {"messages": [{"role": "user", "content": "How do I reset my password?"}]},
      {"messages": [{"role": "user", "content": "I want a refund"}]}
    ],
    "eval_criteria": ["helpfulness", "conciseness", "tone"]
  }'

Results include side-by-side comparisons and cost breakdowns:

curl http://localhost:4200/api/studio/experiments/exp_id \
  -H "Authorization: Bearer sy_admin_..."

Benchmark Runs

Run benchmark suites to compare model performance and cost:

curl -X POST http://localhost:4200/api/studio/benchmarks/run \
  -H "Authorization: Bearer sy_admin_..." \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["gpt-4o", "claude-sonnet-4-5", "llama-3.3-70b-versatile"],
    "prompts": ["Explain quantum computing in one paragraph"],
    "runs_per_model": 3
  }'
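With runs_per_model above 1, per-model results are typically averaged. A sketch of that aggregation on the client side (the raw-run record shape with latency_ms and cost_usd keys is a hypothetical, not the actual API response):

```python
from statistics import mean

def aggregate(results):
    """Group raw run records by model and average latency and cost.

    `results` is an assumed shape: one dict per run with "model",
    "latency_ms", and "cost_usd" keys.
    """
    by_model = {}
    for r in results:
        by_model.setdefault(r["model"], []).append(r)
    return {
        model: {
            "runs": len(rs),
            "avg_latency_ms": mean(r["latency_ms"] for r in rs),
            "avg_cost_usd": mean(r["cost_usd"] for r in rs),
        }
        for model, rs in by_model.items()
    }

raw = [
    {"model": "gpt-4o", "latency_ms": 900, "cost_usd": 0.004},
    {"model": "gpt-4o", "latency_ms": 1100, "cost_usd": 0.004},
    {"model": "llama-3.3-70b-versatile", "latency_ms": 400, "cost_usd": 0.001},
]
summary = aggregate(raw)
assert summary["gpt-4o"]["avg_latency_ms"] == 1000
```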

Tack Room Configuration

# stockyard.yaml
apps:
  studio:
    max_experiments: 100
    max_templates: 500
    default_eval_model: "gpt-4o"

Tip: Use the promptlint module to automatically validate prompt templates against common anti-patterns before they reach the LLM.

Tack Room Status

curl http://localhost:4200/api/studio/status \
  -H "Authorization: Bearer sy_admin_..."
{
  "templates": 12,
  "experiments_run": 47,
  "last_experiment": "2026-02-27T18:00:00Z"
}

Template Variables

Templates support Mustache-style variables ({{variable}}) that are substituted at runtime. This lets you reuse the same prompt across different contexts:

# Template with variables
{
  "name": "product-description",
  "system_prompt": "Write a {{tone}} product description for {{product}} targeting {{audience}}.",
  "variables": ["tone", "product", "audience"],
  "defaults": {"tone": "professional", "audience": "enterprise buyers"}
}

Variables without defaults are required at runtime. Variables with defaults can be overridden.
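Resolving variables against defaults can be sketched as follows (hypothetical helper; the runtime's actual merge behavior is assumed to be "provided values override defaults"):

```python
def resolve_variables(variables, defaults, provided):
    """Merge defaults with caller-provided values; error on missing required vars."""
    merged = {**defaults, **provided}
    missing = [v for v in variables if v not in merged]
    if missing:
        raise ValueError(f"missing required variables: {missing}")
    return {v: merged[v] for v in variables}

# "product" has no default, so it is required at runtime
vals = resolve_variables(
    ["tone", "product", "audience"],
    {"tone": "professional", "audience": "enterprise buyers"},
    {"product": "Tack Room"},
)
assert vals["tone"] == "professional"   # default kept
assert vals["product"] == "Tack Room"   # required value supplied
```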

Version History

Each template update creates a new version. Previous versions are preserved for rollback and A/B comparison. The promptpad module automatically resolves template references to the latest version unless a specific version is pinned.
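The resolution rule can be sketched like this (the list-of-dicts version history is an assumed data shape, mirroring rather than reproducing promptpad's logic):

```python
def resolve_template(versions, pinned=None):
    """Return a pinned version if given, else the latest.

    `versions` is assumed to be a list of dicts with a "version" key.
    """
    if pinned is not None:
        for v in versions:
            if v["version"] == pinned:
                return v
        raise LookupError(f"version {pinned} not found")
    return max(versions, key=lambda v: v["version"])

history = [
    {"version": 1, "system_prompt": "v1 prompt"},
    {"version": 2, "system_prompt": "v2 prompt"},
]
assert resolve_template(history)["version"] == 2            # latest by default
assert resolve_template(history, pinned=1)["version"] == 1  # explicit pin
```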

Evaluation Criteria

Experiments support custom evaluation criteria. Built-in criteria include: helpfulness, conciseness, tone, accuracy, and relevance. Define custom criteria as natural language descriptions that an evaluator model scores on a 1–5 scale.
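An evaluator prompt for the 1–5 scoring might be assembled like this. The rubric wording and the reply parser are illustrative assumptions; Tack Room's actual evaluator prompt is not shown in this doc:

```python
def build_eval_prompt(criterion: str, output: str) -> str:
    """Format a rubric prompt asking an evaluator model for a 1-5 score (assumed wording)."""
    return (
        f"Rate the following response for {criterion} on a scale of 1-5.\n"
        f"Reply with a single integer.\n\n"
        f"Response:\n{output}\n"
    )

def parse_score(reply: str) -> int:
    """Parse the evaluator's reply into an integer score, rejecting out-of-range values."""
    value = int(reply.strip().split()[0])
    if not 1 <= value <= 5:
        raise ValueError(f"score out of range: {value}")
    return value

prompt = build_eval_prompt("conciseness", "Reset your password under Settings > Security.")
assert "scale of 1-5" in prompt
assert parse_score("4") == 4
```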

Cost control: Use mockllm during prompt development to test template rendering and workflow logic without making real API calls.

Typical Tack Room Workflow

A typical prompt engineering workflow with Tack Room:

Step | Action | API
1. Draft | Create a prompt template with variables | POST /api/studio/templates
2. Lint | Validate with promptlint module | Automatic on proxy
3. Test | Run A/B experiment across models | POST /api/studio/experiments/run
4. Evaluate | Review results and scores | GET /api/studio/experiments/{id}
5. Benchmark | Measure latency and cost across providers | POST /api/studio/benchmarks/run
6. Deploy | Pin the winning template version in production | PUT /api/proxy/modules/promptpad

Integration: Tack Room templates can be referenced in Forge workflows, enabling versioned prompt management across complex multi-step pipelines.

API Summary

Method | Path | Description
GET | /api/studio/status | Studio app health and stats
GET | /api/studio/templates | List prompt templates
POST | /api/studio/templates | Create versioned template
POST | /api/studio/experiments/run | Run A/B experiment
GET | /api/studio/experiments/{id} | Get experiment results and scores
POST | /api/studio/benchmarks/run | Benchmark models on a prompt set

All Tack Room endpoints require admin authentication. Templates created via the API are immediately available for use through the promptpad proxy module.

Playground: The built-in playground at /playground provides a visual interface for testing templates and running experiments without curl commands.

Prompt Engineering Tips

When creating templates in Tack Room, keep these best practices in mind:

Practice | Why It Matters
Use system prompts for role | More consistent behavior across models
Keep variables focused | Smaller variable surface = fewer edge cases
Test with multiple models | Prompts that work on GPT-4o may need tweaks for Claude
Version every change | Tack Room auto-versions; never edit in place
Run experiments before deploy | A/B testing catches regressions that manual review misses

For the full Tack Room API reference, see API Reference: Tack Room.


Explore: Self-hosted proxy · OpenAI-compatible · Model aliasing