Context
The portfolio's AI endpoints (/api/chat, /api/nl-sql) talk to Anthropic.
Each call costs money. I wanted three guardrails before launching the
chat publicly:
- A daily global budget in cents — caps total spend regardless of what hits.
- A per-IP daily cap — caps the blast radius of a single abuser so one IP can't drain the global budget alone.
- Request-shape pre-checks (bot UA filter, origin guard, body size limit, Content-Length requirement) to reject the cheap-to-identify junk before it reaches the model.
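A minimal sketch of what the request-shape pre-checks could look like, assuming headers arrive as a plain map with lowercase names. The function name, the config fields, the bot pattern, and the status codes are illustrative assumptions, not the site's actual values.

```typescript
// Hypothetical names throughout; the pattern and limits are placeholders.
type PrecheckResult =
  | { ok: true }
  | { ok: false; status: number; reason: string };

interface PrecheckConfig {
  allowedOrigins: string[]; // assumed allow-list of origins
  maxBodyBytes: number;     // assumed body size limit
}

// Illustrative bot pattern only; a real filter would be tuned to observed traffic.
const BOT_UA = /bot|crawler|spider|curl|wget|python-requests/i;

// Runs the cheap checks in order and returns the first failure.
export function precheck(
  headers: Record<string, string | undefined>,
  cfg: PrecheckConfig,
): PrecheckResult {
  const ua = headers["user-agent"] ?? "";
  if (ua === "" || BOT_UA.test(ua)) {
    return { ok: false, status: 403, reason: "bot_user_agent" };
  }
  const origin = headers["origin"];
  if (origin === undefined || !cfg.allowedOrigins.includes(origin)) {
    return { ok: false, status: 403, reason: "bad_origin" };
  }
  const rawLength = headers["content-length"];
  if (rawLength === undefined || !/^\d+$/.test(rawLength)) {
    return { ok: false, status: 411, reason: "content_length_required" };
  }
  if (Number(rawLength) > cfg.maxBodyBytes) {
    return { ok: false, status: 413, reason: "body_too_large" };
  }
  return { ok: true };
}
```

All four checks read only headers, so they can run before the body is parsed and before any budget is reserved.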
The straightforward implementation lives in src/lib/rate-limit.ts:
two Map<string, ...> objects, one for per-IP buckets and one for the
global counter, with reservation + commit semantics so the actual model
cost — known only after streaming completes — reconciles against the
reservation taken at request entry.
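To make the reservation + commit idea concrete, here is a minimal sketch of a Map-backed daily bucket. It is not the contents of src/lib/rate-limit.ts: the class name DailyBudget, the UTC day rollover, and the method signatures are assumptions for illustration.

```typescript
// Hypothetical sketch of reserve/commit accounting over a Map.
interface Bucket {
  day: string;        // UTC date such as "2026-05-27"; the bucket resets when it changes
  spentCents: number; // committed spend plus outstanding reservations
}

export class DailyBudget {
  private buckets = new Map<string, Bucket>();
  private capCents: number;

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // Returns today's bucket for `key`, creating or resetting it as needed.
  private bucket(key: string, now: Date): Bucket {
    const day = now.toISOString().slice(0, 10);
    let b = this.buckets.get(key);
    if (b === undefined || b.day !== day) {
      b = { day, spentCents: 0 };
      this.buckets.set(key, b);
    }
    return b;
  }

  // Hold `cents` against the cap before calling the model; false means denied.
  reserve(key: string, cents: number, now = new Date()): boolean {
    const b = this.bucket(key, now);
    if (b.spentCents + cents > this.capCents) return false;
    b.spentCents += cents;
    return true;
  }

  // Swap the reservation for the actual cost once streaming has finished.
  commit(key: string, reservedCents: number, actualCents: number, now = new Date()): void {
    const b = this.bucket(key, now);
    b.spentCents = Math.max(0, b.spentCents - reservedCents + actualCents);
  }

  spent(key: string, now = new Date()): number {
    return this.bucket(key, now).spentCents;
  }
}
```

One instance keyed by IP and a second with a single global key would mirror the two-Map layout described above. Because the Map lives in module memory, every warm function instance has its own copy.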
The audit (2026-05-27) flagged this correctly: the in-memory store is not shared across Vercel function instances. Two requests landing on two warm Lambdas at once can each reserve up to the cap.
I decided to ship it anyway.
What I asked for before deciding
For an upgrade to a distributed store (Upstash Redis is the obvious fit on Vercel) I needed:
- The actual traffic shape. A portfolio site gets, on average, ~0 simultaneous AI requests. The relevant scenario isn't normal traffic; it's "what happens if this site hits the HN front page". That's a real failure mode, but its frequency over a year is low.
- The actual cost ceiling. The daily budget is set at $1/day, so the worst case is N × $1/day across N simultaneous warm Lambdas. On Vercel a portfolio rarely keeps more than 1–2 warm at once. Worst plausible overshoot: $2–3 on a viral day. Acceptable as a one-off; would not be acceptable as a steady state.
- The migration shape. What does the code look like with Upstash? How much of the current API surface changes?
What I found
Traffic — not yet a problem
Realistic load: a handful of recruiter visits per week, a few chat exchanges per visit. Multi-instance fanout is a non-issue because traffic this light never triggers scale-out to multiple instances. The audit-flagged bypass scenario needs concurrent traffic on multiple warm Lambdas, which doesn't happen organically here.
Cost — bounded by the daily cap
The daily cap is the actual circuit-breaker. The audit-flagged
overshoot is bounded above by N × cap_per_day. Set cap_per_day
to a budget I'd be willing to absorb on a viral day, and the
overshoot is by definition tolerable. If a future business decision
makes this site cost-sensitive, the cap is the lever, not the store.
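The bound can be written out as arithmetic. This is a small sketch with hypothetical function names. It also makes explicit that the spend beyond a single cap is (N − 1) × cap_per_day, which sits inside the N × cap_per_day bound stated above.

```typescript
// Total spend if every warm instance independently spends its full daily cap.
export function worstCaseDailySpendCents(warmInstances: number, capCentsPerDay: number): number {
  return warmInstances * capCentsPerDay;
}

// Portion of that spend beyond the single intended cap.
export function worstCaseOvershootCents(warmInstances: number, capCentsPerDay: number): number {
  return Math.max(0, warmInstances - 1) * capCentsPerDay;
}
```

With illustrative values of 3 warm instances and a 100-cent cap, the total is 300 cents and the overshoot is 200 cents. With a single instance the overshoot is zero, which is the everyday case for this site.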
Migration shape — drafted now, not shipped
The cost of leaving this for "later" is the cost of doing the
distributed work under pressure when traffic finally arrives. To
absorb that pressure I drafted a RateLimitStore interface and a
sketch of the Upstash REST-API adapter (no SDK dep needed — Upstash
exposes plain HTTP):
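// src/lib/rate-limit-redis.ts (sketch)
// ReserveResult is not shown in this sketch.
export interface RateLimitStore {
  // Hold `cents` against the budget for `key` at request entry.
  reserve(key: string, cents: number): Promise<ReserveResult>;
  // Reconcile the hold against the actual model cost once streaming completes.
  commit(key: string, actualCents: number): Promise<void>;
}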
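One way the reserve step could work against Redis is increment-then-check: INCRBY is atomic on the server, so concurrent instances cannot both slip under the cap, and a reservation that crosses the cap is rolled back with DECRBY. The sketch below is an assumption about the approach, not the drafted adapter. The transport is injected as a hypothetical RunCommand function. As I understand Upstash's REST API, a real transport would POST the command as a JSON array to the database's REST URL with a Bearer token and read the result field of the JSON response.

```typescript
// Hypothetical transport: sends one Redis command and resolves to its integer reply.
type RunCommand = (command: (string | number)[]) => Promise<number>;

// Increment-then-check reservation against a shared counter.
// The caller is assumed to put the current day in `key` so counters roll over daily.
export async function reserveCents(
  run: RunCommand,
  key: string,
  cents: number,
  capCents: number,
): Promise<boolean> {
  const total = await run(["INCRBY", key, cents]); // atomic on the Redis side
  if (total > capCents) {
    await run(["DECRBY", key, cents]); // undo the reservation that crossed the cap
    return false;
  }
  return true;
}
```

Between the INCRBY and the DECRBY other instances briefly see an inflated counter, which can only cause an extra denial, never an extra grant. Key expiry is left out of the sketch.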
The current in-memory store would adopt the same interface. Routes
switch from synchronous reserveBudget(3) to await reserveBudget(3)
behind a feature-flag env var; flip when needed; the rest of the code
is unchanged.
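A rough sketch of the in-memory store behind that interface, plus the flag-based selection. The ReserveResult shape, the InMemoryStore and selectStore names, and the "upstash" flag value are all assumptions. Since commit takes no reservation amount, this sketch also assumes the store remembers outstanding reservations per key and reconciles them oldest-first. Daily reset is omitted for brevity.

```typescript
// Assumed shape; the original sketch does not define ReserveResult.
type ReserveResult = { ok: true; remainingCents: number } | { ok: false };

interface RateLimitStore {
  reserve(key: string, cents: number): Promise<ReserveResult>;
  commit(key: string, actualCents: number): Promise<void>;
}

class InMemoryStore implements RateLimitStore {
  private spent = new Map<string, number>();
  private pending = new Map<string, number[]>(); // outstanding reservations per key
  private readonly capCents: number;

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  async reserve(key: string, cents: number): Promise<ReserveResult> {
    const current = this.spent.get(key) ?? 0;
    if (current + cents > this.capCents) return { ok: false };
    this.spent.set(key, current + cents);
    const queue = this.pending.get(key) ?? [];
    queue.push(cents);
    this.pending.set(key, queue);
    return { ok: true, remainingCents: this.capCents - current - cents };
  }

  async commit(key: string, actualCents: number): Promise<void> {
    // Replace the oldest outstanding reservation with the actual cost.
    const reserved = this.pending.get(key)?.shift() ?? 0;
    const current = this.spent.get(key) ?? 0;
    this.spent.set(key, Math.max(0, current - reserved + actualCents));
  }
}

// `flag` would come from an env var; its name and values are hypothetical.
export function selectStore(flag: string | undefined, capCents: number): RateLimitStore {
  if (flag === "upstash") {
    throw new Error("Upstash adapter is not wired into this sketch");
  }
  return new InMemoryStore(capCents);
}
```

The in-memory methods never actually wait on anything, but declaring them async means route handlers are written once against the Promise-returning interface and do not change when the flag flips.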
What I decided
Keep the in-memory store. Ship the interface. Document the migration trigger.
Triggers to flip to Upstash:
- Sustained concurrent traffic. Two or more warm instances serving AI requests at the same time, more days than not.
- A viral incident. One spike where the daily-cap × instance-count overshoot was painful in the bank statement, not just on paper.
- A second portfolio surface. If a future case study adds a third AI endpoint and the budget has to be shared across routes, the per-store-per-route accounting collapses.
Until then, the simpler code path wins:
- No external service in the request path → fewer failure modes, no Upstash latency on the cold path.
- No async refactor through three route handlers and the test suite.
- The cost ceiling is the daily cap, which is set at a number I'd be willing to absorb even at the worst-case overshoot.
The boring-tech move is keeping the in-memory store and writing this decision down so the next maintainer (probably me, a year from now) doesn't have to re-derive why.
The trigger I'd watch for
A single Vercel Function Log line:
event:"daily_budget_exhausted" firing more than once a week. That's
the structured event the new src/lib/observability.ts emits when a
reservation is denied because the global counter hit the cap. Once
that fires on a recurring basis it's no longer a worst-case scenario,
it's the median case, and the per-instance overshoot stops being a
rounding error.
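The "more than once a week" check can be sketched as a scan over JSON log lines. Only the event name comes from the text above. The ts field, the function names, and the rolling 7-day window are assumptions about how the event might be logged and counted.

```typescript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Builds a structured log line; every field other than `event` is an assumption.
export function budgetExhaustedLine(now: Date): string {
  return JSON.stringify({ event: "daily_budget_exhausted", ts: now.toISOString() });
}

// True when the event fired more than once in the 7 days ending at `now`.
export function migrationTriggerFired(lines: string[], now: Date): boolean {
  let count = 0;
  for (const line of lines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip non-JSON log lines
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const rec = parsed as { event?: unknown; ts?: unknown };
    if (rec.event !== "daily_budget_exhausted" || typeof rec.ts !== "string") continue;
    const ageMs = now.getTime() - Date.parse(rec.ts);
    if (ageMs >= 0 && ageMs < WEEK_MS) count += 1;
  }
  return count > 1;
}
```

A single exhaustion in a week stays below the trigger. Two or more inside the window is the signal that the overshoot has stopped being a rounding error.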