Kept the rate limiter in-memory until traffic justifies Upstash, Apoorv Mittal

Context

The portfolio's AI endpoints (/api/chat, /api/nl-sql) talk to Anthropic. Each call costs money. I wanted three guardrails before launching the chat publicly:

A daily global budget in cents — caps total spend regardless of what hits.
A per-IP daily cap — caps the blast radius of a single abuser so one IP can't drain the global budget alone.
Request-shape pre-checks (bot UA filter, origin guard, body size limit, Content-Length requirement) to reject the cheap-to-identify junk before it reaches the model.
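
A minimal sketch of what the request-shape pre-checks could look like as one pure function. Everything named here is an assumption for illustration: the preCheck function, the allowed origin, the 8 KiB body limit, and the bot pattern are placeholders, not the values used in the real routes.

```typescript
// Hypothetical pre-check sketch; thresholds, patterns, and origins are assumptions.
type HeaderBag = Record<string, string | undefined>; // keys are lower-cased header names

type PreCheckResult =
  | { ok: true }
  | { ok: false; status: number; reason: string };

const ALLOWED_ORIGINS = new Set<string>(["https://example.com"]); // placeholder origin
const MAX_BODY_BYTES = 8 * 1024; // assumed limit
const BOT_UA_PATTERN = /bot|crawler|spider|curl|wget|python-requests/i; // assumed pattern

function preCheck(headers: HeaderBag): PreCheckResult {
  // Bot UA filter: reject empty or obviously scripted user agents.
  const userAgent = headers["user-agent"] ?? "";
  if (userAgent === "" || BOT_UA_PATTERN.test(userAgent)) {
    return { ok: false, status: 403, reason: "bot_user_agent" };
  }
  // Origin guard: only the site's own origin may call the AI endpoints.
  const origin = headers["origin"];
  if (origin === undefined || !ALLOWED_ORIGINS.has(origin)) {
    return { ok: false, status: 403, reason: "origin_not_allowed" };
  }
  // Content-Length requirement: must be present and a plain non-negative integer.
  const contentLength = headers["content-length"];
  if (contentLength === undefined || !/^\d+$/.test(contentLength)) {
    return { ok: false, status: 411, reason: "content_length_required" };
  }
  // Body size limit, checked against the declared length before reading the body.
  if (Number(contentLength) > MAX_BODY_BYTES) {
    return { ok: false, status: 413, reason: "body_too_large" };
  }
  return { ok: true };
}
```

The checks run cheapest-first and never touch the body, so a rejected request costs no model tokens. A declared Content-Length can be wrong, so a real route would presumably still cap the bytes it actually reads.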

The straightforward implementation lives in src/lib/rate-limit.ts: two Map<string, ...> objects, one for per-IP buckets and one for the global counter, with reservation + commit semantics so the actual model cost — known only after streaming completes — reconciles against the reservation taken at request entry.
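
To make the reservation + commit idea concrete, here is a small sketch of how two Maps could carry it. It is not the contents of src/lib/rate-limit.ts: the Bucket shape, the function names, and the explicit cap parameter are assumptions chosen for illustration.

```typescript
// Hypothetical sketch of reservation + commit accounting; names and shapes are assumptions.
type Bucket = { day: string; spentCents: number; reservedCents: number };

const perIpBuckets = new Map<string, Bucket>(); // keyed by client IP
const globalBucket = new Map<string, Bucket>(); // holds a single "global" key

function dayKey(now: Date): string {
  return now.toISOString().slice(0, 10); // UTC date, e.g. "2026-05-27"
}

// Return today's bucket, replacing any bucket left over from an earlier day.
function bucketFor(store: Map<string, Bucket>, key: string, now: Date): Bucket {
  const day = dayKey(now);
  const existing = store.get(key);
  if (existing !== undefined && existing.day === day) return existing;
  const fresh: Bucket = { day, spentCents: 0, reservedCents: 0 };
  store.set(key, fresh);
  return fresh;
}

// Take a reservation at request entry; deny it if it would exceed the cap.
function reserve(
  store: Map<string, Bucket>,
  key: string,
  cents: number,
  capCents: number,
  now: Date = new Date(),
): boolean {
  const bucket = bucketFor(store, key, now);
  if (bucket.spentCents + bucket.reservedCents + cents > capCents) return false;
  bucket.reservedCents += cents;
  return true;
}

// After streaming completes, release the reservation and record the actual cost.
function commit(
  store: Map<string, Bucket>,
  key: string,
  reservedCents: number,
  actualCents: number,
  now: Date = new Date(),
): void {
  const bucket = bucketFor(store, key, now);
  bucket.reservedCents = Math.max(0, bucket.reservedCents - reservedCents);
  bucket.spentCents += actualCents;
}
```

A route would call reserve on both maps at entry and commit on both once the model reports the real cost. Because both Maps live in one process, every instance keeps its own copy of these counters.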

The audit (2026-05-27) correctly flagged the weakness: the in-memory store is not shared across Vercel function instances, so each instance keeps its own counters. Two requests landing on two warm Lambdas at once can each reserve up to the cap.

I decided to ship it anyway.

What I asked for before deciding

For an upgrade to a distributed store (Upstash Redis is the obvious fit on Vercel) I needed:

The actual traffic shape. A portfolio site gets, on average, ~0 simultaneous AI requests. The relevant scenario isn't normal traffic; it's "what happens if this site hits HN front page". That's a real failure mode but its frequency over a year is low.
The actual cost ceiling. The daily budget is set at $1/day. With N instances, the worst case is N × $1/day across simultaneous warm Lambdas. On Vercel a portfolio rarely keeps more than 1–2 warm at once. Worst plausible overshoot: $2–3 on a viral day. Acceptable as a one-off; would not be acceptable as a steady state.
The migration shape. What does the code look like with Upstash? How much of the current API surface changes?

What I found

Traffic — not yet a problem

Realistic load: a handful of recruiter visits per week, a few chat exchanges per visit. Multi-instance fan-out is a non-issue because a second instance rarely spins up at this volume. The audit-flagged bypass scenario needs concurrent traffic on multiple warm Lambdas, which doesn't happen organically here.

Cost — bounded by the daily cap

The daily cap is the actual circuit-breaker. The audit-flagged overshoot is bounded above by N × cap_per_day. Set cap_per_day to a budget I'd be willing to absorb on a viral day, and the overshoot is by definition tolerable. If a future business decision makes this site cost-sensitive, the cap is the lever, not the store.
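
The N × cap_per_day bound is simple enough to write down as arithmetic. A small sketch, with hypothetical function names, using the $1/day cap from above expressed in cents:

```typescript
// Worst case when every warm instance enforces the daily cap independently.
function worstCaseSpendCents(warmInstances: number, capCentsPerDay: number): number {
  if (!Number.isInteger(warmInstances) || warmInstances < 1) {
    throw new RangeError("warmInstances must be a positive integer");
  }
  return warmInstances * capCentsPerDay;
}

// Overshoot is whatever the extra instances can spend beyond the intended cap.
function worstCaseOvershootCents(warmInstances: number, capCentsPerDay: number): number {
  return worstCaseSpendCents(warmInstances, capCentsPerDay) - capCentsPerDay;
}

const CAP_CENTS_PER_DAY = 100; // the $1/day budget

const spendWithTwo = worstCaseSpendCents(2, CAP_CENTS_PER_DAY); // 200 cents
const overshootWithTwo = worstCaseOvershootCents(2, CAP_CENTS_PER_DAY); // 100 cents
```

With a single warm instance the overshoot is zero, which is the normal state of this site; the bound only grows when additional instances are warm at the same time.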

Migration shape — drafted now, not shipped

The cost of leaving this for "later" is the cost of doing the distributed work under pressure when traffic finally arrives. To absorb that pressure I drafted a RateLimitStore interface and a sketch of the Upstash REST-API adapter (no SDK dep needed — Upstash exposes plain HTTP):

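// src/lib/rate-limit-redis.ts (sketch)
// ReserveResult is not spelled out in this write-up; the shape below is an assumed placeholder.
export type ReserveResult = { ok: boolean };

export interface RateLimitStore {
  // Reserve an estimated cost at request entry.
  reserve(key: string, cents: number): Promise<ReserveResult>;
  // Reconcile against the actual cost once streaming completes.
  commit(key: string, actualCents: number): Promise<void>;
}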

The current in-memory store would adopt the same interface. Routes switch from synchronous reserveBudget(3) to await reserveBudget(3) behind a feature-flag env var; flip when needed; the rest of the code is unchanged.
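
For a feel of what the Redis side might involve, here is a hedged sketch of the calls an Upstash adapter could make. It assumes the Upstash REST API accepts a command as a JSON array in a POST body with a Bearer token and returns the value under a result field, which is my reading of their docs. The helper names (upstashCommand, reserveCents, settleCents), the key layout, and the increment-then-roll-back strategy are all hypothetical, not the drafted adapter.

```typescript
// Hypothetical sketch; helper names and the rollback strategy are assumptions.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

type RedisCommand = (command: Array<string | number>) => Promise<unknown>;

// Build a function that sends one Redis command over the REST endpoint.
function upstashCommand(baseUrl: string, token: string, fetchImpl: FetchLike): RedisCommand {
  return async (command) => {
    const res = await fetchImpl(baseUrl, {
      method: "POST",
      headers: { Authorization: `Bearer ${token}` },
      body: JSON.stringify(command),
    });
    if (!res.ok) throw new Error(`Upstash request failed with status ${res.status}`);
    const payload = (await res.json()) as { result?: unknown; error?: string };
    if (payload.error !== undefined) throw new Error(payload.error);
    return payload.result;
  };
}

// Reserve by incrementing first (INCRBY is atomic), then rolling back if over the cap.
// The key is assumed to include the day, e.g. "budget:2026-05-27".
async function reserveCents(
  run: RedisCommand,
  key: string,
  cents: number,
  capCents: number,
): Promise<boolean> {
  const total = Number(await run(["INCRBY", key, cents]));
  if (total <= capCents) return true;
  await run(["DECRBY", key, cents]); // undo the reservation that did not fit
  return false;
}

// Reconcile once the actual cost is known by applying only the difference.
async function settleCents(
  run: RedisCommand,
  key: string,
  reservedCents: number,
  actualCents: number,
): Promise<void> {
  const delta = actualCents - reservedCents;
  if (delta !== 0) await run(["INCRBY", key, delta]);
}
```

In a route this would be wired as upstashCommand(url, token, fetch), with the URL and token read from env vars. Incrementing before checking keeps the counter authoritative across instances; the cost is that two reservations racing near the cap can both be denied, which errs on the cheap side.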

What I decided

Keep the in-memory store. Ship the interface. Document the migration trigger.

Triggers to flip to Upstash:

Sustained concurrent traffic. Two or more warm instances serving AI requests at the same time, more days than not.
A viral incident. One spike where the daily-cap × instance-count overshoot was painful in the bank statement, not just on paper.
A second portfolio surface. If a future case study adds a third AI endpoint and the budget has to be shared across routes, the per-store-per-route accounting collapses.

Until then, the simpler code path wins:

No external service in the request path → fewer failure modes, no Upstash latency on the cold path.
No async refactor through three route handlers and the test suite.
The cost ceiling is the daily cap, which is set at a number I'd be willing to absorb even at the worst-case overshoot.

The boring-tech move is keeping the in-memory store and writing this decision down so the next maintainer (probably me, a year from now) doesn't have to re-derive why.

The trigger I'd watch for

A single Vercel Function Log line: event:"daily_budget_exhausted" firing more than once a week. That's the structured event the new src/lib/observability.ts emits when a reservation is denied because the global counter hit the cap. Once that fires on a recurring basis it's no longer a worst-case scenario, it's the median case, and the per-instance overshoot stops being a rounding error.
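
A sketch of how that event and the "more than once a week" check could be expressed. This is not the contents of src/lib/observability.ts: only the event name comes from above, while the extra fields, the function names, and the rolling seven-day window are assumptions.

```typescript
// Hypothetical sketch; field names beyond "event" and both helpers are assumptions.
type BudgetExhaustedEvent = {
  event: "daily_budget_exhausted";
  at: string; // ISO timestamp of the denied reservation
  capCents: number;
};

// Emit one structured JSON line; the caller supplies the sink (e.g. console.log).
function logBudgetExhausted(
  capCents: number,
  now: Date,
  write: (line: string) => void,
): BudgetExhaustedEvent {
  const entry: BudgetExhaustedEvent = {
    event: "daily_budget_exhausted",
    at: now.toISOString(),
    capCents,
  };
  write(JSON.stringify(entry));
  return entry;
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// True when the event fired more than once in the seven days ending at nowMs.
function firedMoreThanOnceThisWeek(eventTimesMs: number[], nowMs: number): boolean {
  const recent = eventTimesMs.filter((t) => t <= nowMs && nowMs - t < WEEK_MS);
  return recent.length > 1;
}
```

Passing the sink in keeps the logger testable; in a route it would simply be console.log, which Vercel captures into the function logs. The weekly check is the kind of thing that could run by hand against exported logs rather than in the request path.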