How I cut my Cursor API bill by 60% with a local proxy

March 31, 2026 · Michael

I started using Cursor full time in January. By the end of February my OpenAI bill was $380. That was more than double what I expected, and I had no idea where the money went.

The problem is that Cursor does not show per-request costs. OpenAI gives you a monthly total aggregated across everything: tab completions, chat, composer, inline edits. Each of those features sends tokens to the API, and you have no visibility into which one is expensive.

Step 1: see what Cursor is actually doing

I installed Stockyard and pointed Cursor at it. One config change in Cursor's settings:

# Cursor Settings > Models > OpenAI API Base
http://localhost:4200/v1

Immediately I could see every request. Stockyard's cost dashboard showed the breakdown: tab completions accounted for 73% of my spend. They fire constantly as you type, and each one sends the entire file as context. Chat and composer were only 27% of the bill.
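The dashboard does this aggregation for you, but the idea behind a per-feature breakdown is simple: tag each request with the feature that sent it, price its tokens, and sum per feature. Below is a minimal sketch using awk. It assumes a hypothetical log format of one request per line (`feature input_tokens output_tokens`) and placeholder prices per million tokens. Neither is Stockyard's actual log format nor any provider's real rate.

```shell
#!/bin/sh
# Sketch: per-feature share of spend from a request log.
# ASSUMPTIONS: hypothetical log format "feature input_tokens output_tokens",
# one request per line; prices are placeholders in dollars per 1M tokens.
IN_PRICE=2.50
OUT_PRICE=10.00

cost_breakdown() {
  awk -v inp="$IN_PRICE" -v outp="$OUT_PRICE" '
    {
      # Price this request and add it to its feature bucket.
      c = ($2 * inp + $3 * outp) / 1000000
      cost[$1] += c
      total += c
    }
    END {
      # Print each feature with its share of total spend.
      for (f in cost) printf "%s %.1f%%\n", f, 100 * cost[f] / total
    }
  ' "$@" | sort
}

# Example: three tab completions carrying a whole file, one chat request.
printf '%s\n' \
  'tab 4000 20' \
  'tab 4000 20' \
  'tab 4000 20' \
  'chat 2000 500' | cost_breakdown
```

With these placeholder numbers it prints `chat 24.6%` and `tab 75.4%`. Each tab request is cheap on its own, but every one carries thousands of input tokens of file context, so they dominate the total.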

Step 2: cache the repeated requests

Tab completions resend much of the same context as you edit a file. Type three characters and Cursor can fire three requests that differ only slightly. With Stockyard's cache enabled, any request that repeats one the proxy has already seen is served from the cache and costs nothing.

# Enable caching
curl -X PUT http://localhost:4200/api/proxy/modules/cache \
  -H "Content-Type: application/json" \
  -d '{"enabled": true}'

This alone cut my daily spend by about 35%. The cache hit rate for tab completions was around 40%, because so many of those requests repeat one another.
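At its core, a response cache in front of the API is a lookup keyed on the exact request body. The sketch below is not Stockyard's implementation. It assumes GNU coreutils `sha256sum` and uses a hypothetical `call_upstream` function as a stand-in for the real, billed API call. It also shows the limit of exact-match caching: a request that differs by even one byte gets a different key and goes to the API.

```shell
#!/bin/sh
# Sketch of an exact-match response cache, keyed on a hash of the request body.
# ASSUMPTIONS: GNU coreutils sha256sum is available; call_upstream is a
# hypothetical stand-in for the real API call (the part that costs money).
CACHE_DIR=$(mktemp -d)
trap 'rm -rf "$CACHE_DIR"' EXIT
UPSTREAM_CALLS=0

call_upstream() {
  UPSTREAM_CALLS=$((UPSTREAM_CALLS + 1))
  printf 'completion for: %s\n' "$1"
}

cached_request() {
  body=$1
  # Identical bodies hash to the same key; a one-byte difference changes it.
  key=$(printf '%s' "$body" | sha256sum | cut -d ' ' -f 1)
  if [ ! -f "$CACHE_DIR/$key" ]; then
    # Miss: pay for one upstream call and store the response.
    call_upstream "$body" > "$CACHE_DIR/$key"
  fi
  # Serve the response from the cache (a hit, or the miss just stored).
  cat "$CACHE_DIR/$key"
}

req='{"model":"gpt-4o-mini","prompt":"def add(a, b):"}'
cached_request "$req" > /dev/null       # miss
cached_request "$req" > /dev/null       # exact repeat: hit
cached_request "$req " > /dev/null      # one trailing space: miss
echo "upstream calls: $UPSTREAM_CALLS"  # prints "upstream calls: 2"
```

Hashing the body keeps keys short and fixed-length regardless of how large the file context is. The trade-off is that only byte-identical requests are served from the cache.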

Step 3: route tab completions to a cheaper model

Tab completions do not need GPT-4o. They need fast, cheap, good-enough completions. I set up model aliasing to route the default completion model to DeepSeek, which is roughly 20x cheaper per token:

# Route cheap completions to DeepSeek
curl -X PUT http://localhost:4200/api/proxy/aliases \
  -H "Content-Type: application/json" \
  -d '{"alias": "gpt-4o-mini", "model": "deepseek-chat"}'

Cursor still sends requests for gpt-4o-mini, but Stockyard routes them to DeepSeek. The completions feel the same, and the cost dropped by another 25%.
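Aliasing amounts to rewriting the `model` field before the request leaves the proxy. Below is a minimal sketch that uses a POSIX `case` table. The `resolve_model` and `rewrite_body` functions are hypothetical and do not reflect how Stockyard stores or applies aliases.

```shell
#!/bin/sh
# Sketch: resolve a requested model name through an alias table.
# ASSUMPTION: resolve_model/rewrite_body are hypothetical, not Stockyard's API.
resolve_model() {
  case "$1" in
    gpt-4o-mini) echo "deepseek-chat" ;;  # the alias from this step
    *) echo "$1" ;;                       # everything else passes through
  esac
}

# Rewrite the "model" field of a JSON body. This naive sed match assumes the
# compact form "model":"..." with no spaces; a real proxy would parse the JSON.
rewrite_body() {
  requested=$(printf '%s' "$1" | sed -n 's/.*"model":"\([^"]*\)".*/\1/p')
  target=$(resolve_model "$requested")
  printf '%s\n' "$1" | sed "s/\"model\":\"$requested\"/\"model\":\"$target\"/"
}

rewrite_body '{"model":"gpt-4o-mini","prompt":"x"}'
# prints {"model":"deepseek-chat","prompt":"x"}
rewrite_body '{"model":"gpt-4o","prompt":"x"}'
# prints {"model":"gpt-4o","prompt":"x"}
```

Because the rewrite happens in the proxy, the editor never needs to know the alias exists, which is why Cursor's own model setting can stay unchanged.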

Step 4: set a daily cap

I set a $15/day spending cap so I would never be surprised again. When spending reaches the cap, Stockyard returns a clear error and Cursor gracefully falls back to local completions. Since adding caching and model routing, I have never actually hit the cap.
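A spend cap reduces to a running total that is checked against a limit before each request is forwarded. The minimal sketch below tracks spend in integer cents to avoid floating-point drift. The function names, the error text, and the choice to reject any request that would cross the cap are assumptions, not Stockyard's documented behavior. The daily reset is also left out.

```shell
#!/bin/sh
# Sketch of a daily spend cap, tracked in integer cents.
# ASSUMPTIONS: check_cap/record_spend are hypothetical; CAP_CENTS mirrors
# the $15/day limit; resetting the total each day is not shown.
CAP_CENTS=1500
SPENT_CENTS=0

# Succeeds if a request estimated at $1 cents still fits under the cap.
check_cap() {
  if [ $((SPENT_CENTS + $1)) -gt "$CAP_CENTS" ]; then
    echo "daily cap of \$$((CAP_CENTS / 100)) reached" >&2
    return 1
  fi
}

# Adds the actual cost of a completed request to today's total.
record_spend() {
  SPENT_CENTS=$((SPENT_CENTS + $1))
}

SPENT_CENTS=1490
check_cap 5 && echo "forwarded"   # 1495 <= 1500: prints "forwarded"
record_spend 5
check_cap 10 || echo "rejected"   # 1505 > 1500: prints "rejected"
```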

The result

February: $380. March: $152. Same coding output, same Cursor workflow. The proxy added no noticeable latency (Stockyard's middleware chain adds about 400 nanoseconds).

Of the 60% reduction, roughly 35 percentage points came from caching repeated requests and roughly 25 from routing tab completions to a cheaper model. The daily cap did not contribute to the savings directly, but it removed the anxiety of runaway costs.
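The headline figure can be checked against the two monthly totals. The two contributions sum to 60% only when both are read as shares of the February bill. Applied one after the other as compounding cuts, a 35% reduction followed by a 25% reduction comes to about 51%:

```latex
\begin{aligned}
\text{overall cut} &= \frac{380 - 152}{380} = \frac{228}{380} = 0.60 \\
\text{shares of February bill} &: 0.35 + 0.25 = 0.60 \;\Rightarrow\; 380 \times (1 - 0.60) = 152 \\
\text{compounded cuts} &: 1 - (1 - 0.35)(1 - 0.25) = 1 - 0.4875 = 0.5125 \;\Rightarrow\; 380 \times 0.4875 = 185.25
\end{aligned}
```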

Try it yourself

Stockyard installs in 60 seconds. Set it as your Cursor API base and you immediately see where your money goes. The full Cursor setup guide has the details. If you are spending more than $100/month on API calls through an editor, this is probably worth 5 minutes of your time.

The 5 strategies for reducing LLM costs page covers additional techniques beyond what I used here.

— Michael