Context
A public GraphQL read API on a stack drifting out of support. The resolver layer had calcified around earlier framework idioms; adding a field had become a coordination exercise. Fifty-plus consumers, web, native iOS, native Android, partner integrations, and an internal marketplace product, none coordinating releases with us.
The team wanted to upgrade the framework. Every in-place plan I sketched ended in a freeze, a maintenance window, or both. The previous attempt (eighteen months earlier, before my time) had stalled for the same reason.
The two options I weighed
| Option | Risk profile | Cost |
|---|---|---|
| In-place upgrade with feature flags | High, every consumer is one rollback away from breaking; framework upgrade and resolver-shape changes are coupled | Lower in plumbing; very high in coordination |
| Strangler-fig: build v2 in a new repo, run in parallel, cut over per consumer behind toggles | Low at every step, v1 keeps serving traffic; cutover is reversible by toggle | High in plumbing, re-implement DataLoaders, context, validation by hand |
The in-place option had two non-obvious risks. First: the framework upgrade and the consumer-visible behaviour were coupled. Flipping the framework also re-shaped resolver outputs in subtle ways (serialisation order, error envelopes, default field handling). Every consumer was implicitly betting that v2's outputs matched v1's, with no way to verify before the flip. Second: every previous in-place upgrade in the team's history had stalled because someone needed a freeze that nobody could grant.
The decision
Build v2 in a separate repository, on a different framework
(graphql-yoga + hono), as a sibling service to v1. No shared
code with v1. Run in parallel for as long as it takes to validate.
Cut over per consumer cohort behind feature toggles, with diff metrics
as the gate.
I reached this by inverting the risk question. Instead of "how do we make an in-place upgrade safe?" I asked "what would have to be true for a parallel-build to be wrong?" The honest answer was only the plumbing cost. I could carry that.
What I gave up
Re-implementing every DataLoader, every context binding, every runtime validation, by hand, in a new codebase, while v1 kept serving traffic. Six weeks of resolver-by-resolver porting before any of it was production-shaped. A build budget that didn't translate to visible feature progress for a quarter.
The cleaner-looking alternative, share types or clients between v1 and v2 to "make the migration easier", is the version of strangler-fig that fails. Every time I've seen it tried, the old code's design constraints leak into the new code. I forced the discipline of zero-shared-code from day one.
What played out
The framework upgrade and the consumer migration decoupled cleanly.
v2 could be wrong in dozens of ways without any consumer noticing,
because v1 was still serving traffic. Once the
shadow-diff pipeline landed in month
three, the divergences it surfaced were the kind no parity test would
have caught: the adTargeting JSON-string key-ordering quirk, the
price-info field-shape variance, a handful of locale-specific edge
cases.
The plumbing cost was real. Phase 1 produced no user-visible feature work for three weeks. The kind of decision a less experienced team might have rolled back inside the first month. I'd defend it again.
What I'd do differently
Build the diff pipeline in week one, not week ten. The strangler-fig framing was correct, but the confidence engine (the v1↔v2 diff) should land before the resolvers, not after. Six weeks of resolver work without anything production-shaped to validate against meant the parity tests carried more weight than they deserved. Pointing the diff at shadow-shaped staging data from day one would have surfaced the worst divergences two months earlier.