37 commands, 47 skills, one lesson about portable AI tooling, Apoorv Mittal

Two codebases, no shared code, no shared domain, no shared tech stack. One is a Next.js 14 Pages Router app serving millions of listings, with Zustand state, SCSS modules, Optimizely A/B tests, and i18n managed via Phrase. The other is a multi-tenant institutional ERP with 294 view components, 239 database tables, a Hono backend with 74 route files, four languages including Bengali and Gujarati, and a 14-role RBAC model where every query must carry a tenant id or it doesn't ship.

I built a Claude Code layer for both. Same core idea: a set of skills, commands, hooks, and memory files that turn Claude Code from a capable assistant into something closer to a second engineer who already knows the codebase's rules. By the time I was working across both projects in parallel, I had 37 commands in the marketplace product, 47 skills in the ERP, and 14 hooks in each. This is what I learned sorting out which pieces travelled and which ones had to stay home.

What travelled without modification

The session-boundary hooks moved verbatim. A SessionStart hook that runs git diff --name-only main and prints the changed files costs thirty lines of shell. I wrote it once for the marketplace product and dropped it into the ERP the same afternoon. Same with the Stop hook that runs tsc --noEmit: if the session ends and the build is broken, I want to know before I close the terminal. Neither hook knows what the project is about. They just enforce a discipline that applies everywhere.
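A minimal sketch of what a SessionStart hook of this kind could look like. It is not the original script: the BASE_BRANCH variable, the function names, and the output wording are assumptions made for illustration. The only part taken from the description above is the git diff --name-only call against main.

```shell
#!/usr/bin/env bash
# Sketch of a SessionStart hook: print the files changed relative to a base branch.
# BASE_BRANCH and the output format are assumptions, not the original script.

BASE_BRANCH="${BASE_BRANCH:-main}"

# Turn a newline-separated file list on stdin into an indented summary.
format_changed_files() {
  local count=0
  local line
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    printf '  %s\n' "$line"
    count=$((count + 1))
  done
  printf '%d file(s) changed vs %s\n' "$count" "$BASE_BRANCH"
}

# List changed files. Outside a git repository this prints nothing and still
# succeeds, so the hook never prevents a session from starting.
changed_files() {
  git diff --name-only "$BASE_BRANCH" 2>/dev/null || true
}

changed_files | format_changed_files
```

Nothing in the script refers to a framework, a directory layout, or a domain, which is why a hook like this can be copied between repositories unchanged. In Claude Code it would be registered as a command under the SessionStart event in the project's settings file.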

The memory structure pattern transferred completely. Both repos now have a memories/repo/ directory with four files: modules.md (what each part of the codebase owns), decisions.md (architecture calls and why they were made), conventions.md (patterns discovered by reading, not documented anywhere), and known-flakes.md (a registry of test failures that are noise, not signal). Claude reads these at the start of relevant sessions. The structure is identical; the contents are obviously not.

The linting post-hooks (Prettier, ESLint, Stylelint) moved with zero changes. Formatting a file after an edit is mechanical and project-agnostic. The point is that the lowest-level hooks, the ones closest to the file system, are the most portable by a wide margin.
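As an illustration of why these hooks are project-agnostic, here is a sketch of a post-edit formatter. The mapping from file extension to tool is an assumption, as is the choice to pass the edited file's path as the first argument (a real hook would typically extract it from the JSON payload Claude Code sends on stdin).

```shell
#!/usr/bin/env bash
# Sketch of a post-edit formatting hook. Which tools run for which
# extensions is an assumption; the real hooks may differ.

# Map a file path to the space-separated tools that should run on it.
tools_for() {
  case "$1" in
    *.ts|*.tsx|*.js|*.jsx) echo "prettier eslint" ;;
    *.scss|*.css)          echo "prettier stylelint" ;;
    *.json|*.md)           echo "prettier" ;;
    *)                     echo "" ;;
  esac
}

# Run each selected tool in its write/fix mode on the edited file.
format_file() {
  local file="$1"
  local tool
  for tool in $(tools_for "$file"); do
    case "$tool" in
      prettier)  npx prettier --write "$file" ;;
      eslint)    npx eslint --fix "$file" ;;
      stylelint) npx stylelint --fix "$file" ;;
    esac
  done
}

# The edited file's path is expected as the first argument; no argument, no work.
if [ -n "${1:-}" ]; then
  format_file "$1"
fi
```

The only input is a file path and the only decision is made on its extension, so the same script fits any repository that has the three tools installed.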

What needed a rewrite at the seam

The implement command in the marketplace product is a nine-phase orchestrator: read the ticket, audit the affected area, plan, scaffold, implement, self-review against the project rules, fix, typecheck, summarise. It grew out of watching Claude take shortcuts on step three and spend an hour untangling the consequences of skipping an audit. The command enforces the audit. It works well when the rules fit in one place: the marketplace product's CLAUDE.md is 187 lines and covers everything.

The ERP's surface area is roughly six times larger. Trying to maintain one CLAUDE.md for 239 tables and 58 architectural rules wasn't viable after month two. I switched to 58 separate rule files in .claude/rules/, each covering one concern (sql-safety.md, fee-ledger.md, mutation-wiring.md, state-machines.md, and so on), with @-imports pulling in only the rules relevant to the current task. The implement concept survived; the file structure that supports it had to be rebuilt from scratch.
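To make the @-import idea concrete, the sketch below prints one import line per rule name, in the @path form that CLAUDE.md files accept. How rules are actually matched to a task is not described here, so the selection is left to the caller; the rule_imports function and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build @-import lines for a chosen subset of rule files.
# rule_imports is a hypothetical helper, not part of Claude Code.

# Usage: rule_imports <rules-dir> <rule-name>...
# Prints "@<rules-dir>/<rule-name>.md" for each rule file that exists and
# reports names without a matching file on stderr.
rule_imports() {
  local rules_dir="$1"
  local name
  shift
  for name in "$@"; do
    if [ -f "$rules_dir/$name.md" ]; then
      printf '@%s/%s.md\n' "$rules_dir" "$name"
    else
      printf 'missing rule file: %s\n' "$name" >&2
    fi
  done
}
```

Calling rule_imports .claude/rules sql-safety fee-ledger would print two import lines, assuming both files exist, so a task touching the fee ledger loads two rule files instead of all 58.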

The same split happened with hooks. In the marketplace product, the post-tool-use hooks are about code style: ESLint, Prettier, Stylelint, then tsc. In the ERP, those run first, and then eight domain-validator hooks run after every file write: check-ddl-safety.sh (no unsafe migration patterns), validate-service-pattern.sh (every service file must implement both the local PGlite adapter and the remote Hono adapter), check-seed-version-sync.sh (SEED_VERSION must stay in sync after schema changes), detect-dead-buttons.sh (no UI button left without a wired handler), and four more. You don't write those for a marketplace product. The ERP's failure modes are different: a misconfigured service adapter silently falls back to local data in production; a dead button in a fee-payment form is a support ticket.
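For contrast with the formatting hooks, here is a sketch of a domain validator in the spirit of check-ddl-safety.sh. The list of patterns treated as unsafe is entirely hypothetical, since the original rules are not given, and the file path is again assumed to arrive as the first argument.

```shell
#!/usr/bin/env bash
# Sketch of a domain-validator hook in the spirit of check-ddl-safety.sh.
# The pattern list is hypothetical; the original's rules are not shown.

UNSAFE_DDL='DROP[[:space:]]+TABLE|DROP[[:space:]]+COLUMN|TRUNCATE[[:space:]]'

# Print matching lines with line numbers and return 1 if the file contains
# a statement matching UNSAFE_DDL; return 0 otherwise.
check_ddl_safety() {
  local file="$1"
  if grep -n -i -E "$UNSAFE_DDL" "$file"; then
    return 1
  fi
  return 0
}

# Only .sql files are checked; every other file passes through untouched.
if [ -n "${1:-}" ]; then
  case "$1" in
    *.sql)
      if ! check_ddl_safety "$1" >&2; then
        echo "unsafe DDL in $1: review before shipping" >&2
        exit 2  # Claude Code treats exit code 2 from a hook as a blocking error
      fi
      ;;
  esac
fi
```

Unlike the formatter, everything interesting in this script is the pattern list, and that list only makes sense for a codebase with its own migration rules. That is the part that cannot travel.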

What I deleted and rebuilt as something better

The marketplace product has three A/B test commands: ab-test-kickoff.md, ab-test-health.md, ab-test-wrapup.md. They know about the experiment framework's IDs, the internal flag-naming convention, and the cleanup pattern when an experiment concludes. They're genuinely useful on that codebase. They're completely meaningless anywhere else.

I made the mistake of trying to generalise them. I spent an afternoon writing a "generic experiment workflow" skill that accepted the A/B framework as a parameter. Nobody used it because it had no opinions, and skills without opinions aren't useful; they just become a more verbose way to write a prompt. The right lesson was: keep the experiment skills in the marketplace product, accept that the ERP's equivalent problem (feature rollouts behind RBAC gates) needs its own dedicated tooling, and stop trying to make one thing do both jobs.

The marketplace product's scan-stores.md command is a similar story. It audits Zustand store definitions for common problems: stale selectors, missing shallow equality checks, state that belongs in a URL instead. It's a good command. The ERP uses TanStack Query for server state and Zustand only for ephemeral UI state. A wholesale port would have been useless. I wrote a much narrower refactor-state.md skill that handles the patterns that actually come up there, then moved on.

The one structural decision that changed everything

The marketplace product's skills are skills. The ERP's most important workflows are subagents: bulk-module-scaffold.md, port-to-backend.md, extract-queries.md. The distinction matters.

A skill is invoked by a human, runs a contained workflow, and stops. A subagent is spawned by another agent, has a defined scope, uses a constrained tool set, and reports back. port-to-backend.md creates a Hono route, wires up the middleware stack, updates the service file with a dual-mode adapter, and returns a summary. It runs inside a larger orchestration that might be porting five service files in one session. Trying to do that with a skill and a human in the loop between each step doesn't scale.
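A sketch of what a subagent definition with a constrained tool set might contain, written out by a small shell function so it can be scaffolded repeatably. The frontmatter fields (name, description, tools) follow Claude Code's subagent file format; the description text, the specific tool list, and the write_subagent helper are assumptions, not the contents of the real port-to-backend.md.

```shell
#!/usr/bin/env bash
# Sketch: scaffold a subagent definition with a restricted tool allowlist.
# write_subagent is a hypothetical helper; the file body is illustrative only.

# Usage: write_subagent <agents-dir>
write_subagent() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/port-to-backend.md" <<'EOF'
---
name: port-to-backend
description: Port one service file to a Hono route with a dual-mode adapter.
tools: Read, Edit, Write, Grep, Glob
---
Port exactly one service file per invocation.
1. Create the Hono route.
2. Wire up the middleware stack.
3. Update the service file with the dual-mode adapter.
Finish with a short summary of the files touched.
EOF
}
```

Running write_subagent .claude/agents would create the file where Claude Code looks for project subagents. Leaving Bash out of the tools line is the kind of constraint that makes delegation safe: the subagent can read and edit files but cannot run arbitrary commands, and the orchestrating agent only sees the summary it returns.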

The marketplace product doesn't need subagents because its tasks don't compose that way. But in a project with 74 backend route files and a strict adapter pattern that every one of them must follow, the ability to delegate a well-defined sub-task to an agent with a limited tool allowlist is the difference between orchestrating a migration in two hours and babysitting it for two days.

The part I get wrong every time

I still don't write eval sets early enough. Neither project has an __evals__/ directory. The hooks catch regressions at the file level; nothing catches the model drifting on a skill's reasoning over time after a version update. I know this is the gap. I've written about it before. I'm still not fixing it fast enough.

The rest of the tooling is earning its keep. The session-boundary hooks, the memory files, the domain-validator hooks in the ERP: these have all paid back the time it took to write them. The things that failed were either too generic to be useful, or too specific to one codebase to survive the move. The skill that travels is the one with strong opinions that happen to be correct for the problem in front of it. That's not a rule about AI tooling. That's just a rule.
