Two codebases, no shared code, no shared domain, no shared tech stack. One is
a Next.js 14 Pages Router app serving millions of listings, Zustand state,
SCSS modules, Optimizely A/B tests, i18n managed via Phrase. The other is a
multi-tenant institutional ERP with 294 view components, 239 database tables,
a Hono backend with 74 route files, four languages including Bengali and
Gujarati, and a 14-role RBAC model where every query must carry a tenant id
or it doesn't ship.
I built a Claude Code layer for both. Same core idea: a set of skills, commands, hooks, and memory files that turn Claude Code from a capable assistant into something closer to a second engineer who already knows the codebase's rules. By the time I was working across both projects in parallel, I had 37 commands in the marketplace product, 47 skills in the ERP, and 14 hooks in each. This is what I learned sorting out which pieces travelled and which ones had to stay home.
What travelled without modification
The session-boundary hooks moved verbatim. A SessionStart hook that runs
git diff --name-only main and prints the changed files costs thirty lines of
shell. I wrote it once for the marketplace product and dropped it into the ERP the
same afternoon. Same with the Stop hook that runs tsc --noEmit, if the
session ends and the build is broken, I want to know before I close the
terminal. Neither hook knows what the project is about. They just enforce a
discipline that applies everywhere.
The memory structure pattern transferred completely. Both repos now have a
memories/repo/ directory with four files: modules.md (what each part of
the codebase owns), decisions.md (architecture calls and why they were made),
conventions.md (patterns discovered by reading, not documented anywhere),
and known-flakes.md (a registry of test failures that are noise, not signal).
Claude reads these at the start of relevant sessions. The structure is
identical; the contents are obviously not.
The linting post-hooks (Prettier, ESLint, Stylelint) moved with zero changes. Formatting a file after an edit is mechanical and project-agnostic. The point is that the lowest-level hooks, the ones closest to the file system are the most portable by a wide margin.
What needed a rewrite at the seam
The implement command in the marketplace product is a nine-phase orchestrator: read
the ticket, audit the affected area, plan, scaffold, implement, self-review
against the project rules, fix, typecheck, summarise. It grew out of watching
Claude take shortcuts on step three and spend an hour untangling the
consequences of skipping an audit. The command enforces the audit. It works
well when the rules fit in one place, the marketplace product's CLAUDE.md is 187
lines and covers everything.
The ERP's surface area is roughly six times larger. Trying to maintain one
CLAUDE.md for 239 tables and 58 architectural rules wasn't viable after month
two. I switched to 58 separate rule files in .claude/rules/, each covering
one concern (sql-safety.md, fee-ledger.md, mutation-wiring.md,
state-machines.md, and so on), with @-imports pulling in only the rules
relevant to the current task. The implement concept survived; the file
structure that supports it had to be rebuilt from scratch.
The same split happened with hooks. In the marketplace product, the post-tool-use
hooks are about code style: ESLint, Prettier, Stylelint, then tsc. In the
ERP, those run first, and then eight domain-validator hooks run after every
file write: check-ddl-safety.sh (no unsafe migration patterns),
validate-service-pattern.sh (every service file must implement both the local
PGlite adapter and the remote Hono adapter), check-seed-version-sync.sh
(SEED_VERSION must stay in sync after schema changes), detect-dead-buttons.sh
(no UI button left without a wired handler), and four more. You don't write
those for a marketplace product. The ERP's failure modes are different, a
misconfigured service adapter silently falls back to local data in production;
a dead button in a fee-payment form is a support ticket.
What I deleted and rebuilt as something better
The marketplace product has three A/B test commands: ab-test-kickoff.md,
ab-test-health.md, ab-test-wrapup.md. They know about the experiment
framework's IDs, the internal flag-naming convention, and the cleanup pattern
when an experiment concludes. They're genuinely useful on that codebase. They're
completely meaningless anywhere else.
I made the mistake of trying to generalise them. I spent an afternoon writing a "generic experiment workflow" skill that accepted the A/B framework as a parameter. Nobody used it because it had no opinions, and skills without opinions aren't useful, they just become a more verbose way to write a prompt. The right lesson was: keep the experiment skills in the marketplace product, accept that the ERP's equivalent problem (feature rollouts behind RBAC gates) needs its own dedicated tooling, and stop trying to make one thing do both jobs.
The marketplace product's scan-stores.md command is a similar story. It audits
Zustand store definitions for common problems: stale selectors, missing
shallow equality checks, state that belongs in a URL instead. It's a good
command. The ERP uses TanStack Query for server state and Zustand only for
ephemeral UI state. A wholesale port would have been useless. I wrote a much
narrower refactor-state.md skill that handles the patterns that actually come
up there, then moved on.
The one structural decision that changed everything
The marketplace product's skills are skills. The ERP's most important workflows are
subagents: bulk-module-scaffold.md, port-to-backend.md,
extract-queries.md. The distinction matters.
A skill is invoked by a human, runs a contained workflow, and stops. A subagent
is spawned by another agent, has a defined scope, uses a constrained tool set,
and reports back. port-to-backend.md creates a Hono route, wires up the
middleware stack, updates the service file with a dual-mode adapter, and
returns a summary. It runs inside a larger orchestration that might be porting
five service files in one session. Trying to do that with a skill and a human
in the loop between each step doesn't scale.
The marketplace product doesn't need subagents because its tasks don't compose that way. But in a project with 74 backend route files and a strict adapter pattern that every one of them must follow, the ability to delegate a well-defined sub-task to an agent with a limited tool allowlist is the difference between orchestrating a migration in two hours and babysitting it for two days.
The part I get wrong every time
I still don't write eval sets early enough. Neither project has an __evals__/
directory. The hooks catch regressions at the file level; nothing catches the
model drifting on a skill's reasoning over time after a version update. I know
this is the gap. I've written about it before. I'm still not fixing it fast
enough.
The rest of the tooling is earning its keep. The session-boundary hooks, the memory files, the domain-validator hooks in the ERP, these have all paid back the time it took to write them. The things that failed were either too generic to be useful, or too specific to one codebase to survive the move. The skill that travels is the one with strong opinions that happen to be correct for the problem in front of it. That's not a rule about AI tooling. That's just a rule.