git-xray-tools: fourteen CLI tools that read your git history, Apoorv Mittal

Every engineering team carries invisible debt that ordinary metrics miss.

Coverage tells you how much code is tested. It says nothing about which files change every week and keep breaking. Linting enforces style. It cannot tell you that two supposedly independent services change together 85% of the time. Code review catches bugs. It cannot tell you that the file currently under review has been modified 300 times in two years and is the single most dangerous file in the repository.

git-xray-tools reads the one source of ground truth every team generates but almost no one analyses systematically: the commit history.

Fourteen CLI tools. Each answers one question with a defensible number you can show in a sprint planning meeting. No LLM in the loop. No server. No data leaves the machine. Apache-style discipline: every tool has a wizard, scripted mode, JSON output for pipelines, and a self-contained HTML report you can email to the team.

Available on npm as git-xray-tools and on GitHub at cloudhobbit/git-xray.

Why bash, not Python or Go

The thing every static-analysis tool quietly fails at is being installed. Pick Python and you're fighting pyenv on day one. Pick Go and you're shipping a 12 MB binary per arch. Pick Node and you've inherited a dependency tree the team will eventually have to audit.

I picked bash + jq. Bash is present on practically every developer machine that has git, and jq is a single small binary that most package managers carry. The install story is npm install -g git-xray-tools and the only runtime checks are for git and jq; the postinstall script prints clear instructions if either is missing.

The trade-off is real. Bash is not a pleasant language to write a 14-tool suite in. But the runtime is portable, the tools chain together naturally over pipes, and there's no version skew between the Python or Node the user has installed and the version the tool expects.

The fourteen tools, by question they answer

Tool	Question
`git-churn`	Which files change most often?
`git-cochange`	Which files always change together (hidden coupling)?
`git-orphans`	Which files haven't been touched in N days?
`git-hotspots`	Which files are both large and frequently changed?
`git-age`	How many days since each file was last committed?
`git-authors`	Which files have bus factor of 1?
`git-fixratio`	Which files attract the most bug-fix commits?
`git-velocity`	Has commit pace accelerated or decelerated?
`git-blame-summary`	Who owns the lines of code that exist today?
`git-commits`	How disciplined is commit hygiene?
`git-test-ratio`	Which source files have no test coverage proxy in git history?
`git-branches`	Which branches are ghost WIP or overloading specific authors?
`git-intel`	All of the above, fused into one HTML report
`git-healthscore`	A single 0–100 score + the top three actions to take

The same shape applies to every tool: wizard for first-run exploration, --yes for scripted CI, --format json for piping, --format html for an artifact the team can read.

What "team behaviour" actually looks like in git

A single example. git-cochange reads every commit's changed-file list, records which file pairs appeared together, and computes a coupling ratio: how often did file A and file B both appear in the same commit, out of all commits that touched either?

What this surfaces, in real codebases I've run it against:

auth/session.ts and payments/checkout.ts co-change 85% of the time. The architecture diagram says these are separate services. Git says they're one service split across two folders.
i18n/dictionaries/en.ts and i18n/dictionaries/de.ts co-change 100% of the time, by design. Add an --exclude pattern that matches the directory, such as --exclude '(^|/)i18n/dictionaries/', and the signal gets clean.
Two files in apps/admin/ co-change 70% of the time with three files in services/inventory/. Translation: "Admin" is downstream of "Inventory" in a way that no import graph captures, and every inventory schema change quietly forces an admin change.

The point isn't that git is the ultimate source of truth about architecture. The point is that git records what actually happened, not what was supposed to happen.
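The coupling ratio described above (commits that touched both files, divided by commits that touched either) can be sketched in a few lines. This is a minimal Python illustration, not the suite's bash + jq implementation; the function name coupling_ratios, the min_shared parameter, and the sample history are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def coupling_ratios(commits, min_shared=1):
    """Return {(file_a, file_b): ratio} for every pair seen together.

    ratio = commits touching both / commits touching either.
    `commits` is a list of changed-file lists, one per commit.
    """
    touched = Counter()   # file -> number of commits that changed it
    together = Counter()  # (a, b) -> number of commits that changed both
    for files in commits:
        unique = sorted(set(files))
        touched.update(unique)
        together.update(combinations(unique, 2))

    ratios = {}
    for (a, b), both in together.items():
        if both < min_shared:
            continue
        either = touched[a] + touched[b] - both
        ratios[(a, b)] = both / either
    return ratios


# Hypothetical history: four commits and the files each one changed.
history = [
    ["auth/session.ts", "payments/checkout.ts"],
    ["auth/session.ts", "payments/checkout.ts", "README.md"],
    ["auth/session.ts"],
    ["README.md"],
]
for pair, ratio in sorted(coupling_ratios(history).items()):
    print(pair, round(ratio, 2))
```

Dividing by "either" rather than by total commits keeps a rarely touched pair from looking weakly coupled just because the repository is busy elsewhere. Raising min_shared would filter out pairs that co-occurred only once or twice.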

One tool in detail: git-intel

git-intel is the unified report — the single command for a codebase you've never seen before. It runs ten analyses in parallel and combines them into one interactive HTML file with scatter plots, ranking tables, and cluster cards.

What it surfaces:

Blast Radius — hotspot score amplified by coupling weight. The files most dangerous to touch because they're also coupled to half the codebase.
Frozen Hotspots — files that churned heavily once, then went completely silent. Probable fear-driven abandonment: "nobody wants to touch the auth flow any more."
Coupling Clusters — connected-component analysis of the co-change graph. Surfaces real module groupings the architecture diagram doesn't show.
Knowledge Risk — bus factor per file, weighted by churn.
Bug Magnets — files where fix commits dominate the history, with a delta metric flagging the ones getting worse.
Orphan Candidates — files past --orphan-days (default 365).
Test Coverage Gaps — high-churn source files with no test commits.
Team Velocity — monthly histogram with accelerating / stable / decelerating verdict.
Line Ownership — opt-in via --include-blame; surviving-line ownership per author.
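To make the Coupling Clusters idea concrete, here is a small Python sketch of connected-component grouping over a co-change graph. It assumes pairwise ratios are already available and that an edge exists when the ratio meets a threshold; the 0.5 default, the function name, and the sample data are assumptions for illustration, not values taken from git-intel.

```python
def coupling_clusters(ratios, threshold=0.5):
    """Group files into clusters linked by pairs at or above `threshold`."""
    parent = {}  # union-find forest: file -> parent file

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Every strong pair merges the two files' components.
    for (a, b), ratio in ratios.items():
        if ratio >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for f in list(parent):
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise ratios.
sample = {
    ("a.ts", "b.ts"): 0.8,
    ("b.ts", "c.ts"): 0.6,
    ("c.ts", "d.ts"): 0.1,
    ("x.sh", "y.sh"): 0.9,
}
print(coupling_clusters(sample))
```

Here a.ts and c.ts land in the same cluster even though they were never compared directly, because both are strongly coupled to b.ts. Files with no strong edge, like d.ts, are left out entirely.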

The HTML report is self-contained: no CDN, no fonts loaded from elsewhere, no external CSS or JS. The team can email it, attach it to a JIRA ticket, archive it as a build artifact.

The bug I'm proudest of fixing

The first HTML template was a 1,300-line bash heredoc. Bash heredocs are syntactically awkward, escaping is fragile, and the template kept growing. Eventually I needed to render lists with hundreds of rows per section, and the obvious approach, innerHTML += in a for loop, turned out to be O(N²): every assignment serialises the element's existing content back to a string, appends the new row, and re-parses the whole thing.

On a real-world repo with 700 orphan candidates and 400 test gaps, the HTML report took 18 seconds to render in the browser. Useless as a sharable artifact.

The fix was two changes, neither glamorous:

Extract the template out of the bash heredoc into its own file, so the bash code stays readable and the HTML can be edited as HTML.
Replace innerHTML += with array.push() then a single el.innerHTML = arr.join('') at the end. The runtime went from 18 seconds to 280 milliseconds. The git-intel commit message reads: "perf(intel-html): replace O(N²) innerHTML += in OR/TG loops with array.join".

This is the kind of bug that doesn't appear in any code review until someone runs the report on a real production codebase. It's the kind of bug a co-change tool would predict, if you ran one on the bash script that produced it.
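The quadratic growth is easy to see with a rough cost model. The Python sketch below counts only how many characters the browser would have to parse under each strategy, assuming every row is the same length; the 200-character row length is an assumed figure, not a measurement, and the model says nothing about wall-clock time.

```python
def chars_parsed_append(rows, row_len):
    """Characters parsed when each row is added with `innerHTML +=`.

    Model: appending row i makes the browser re-parse all i rows so far.
    """
    return sum(i * row_len for i in range(1, rows + 1))


def chars_parsed_join(rows, row_len):
    """Characters parsed when all rows are joined and assigned once."""
    return rows * row_len


rows, row_len = 700, 200  # row_len is an assumed average, not a measurement
append_cost = chars_parsed_append(rows, row_len)
join_cost = chars_parsed_join(rows, row_len)
print(append_cost, join_cost, append_cost / join_cost)
```

Under this model the append loop parses (N + 1) / 2 times as many characters as the single join, so the gap widens linearly with the number of rows. Real timings also include layout and serialisation, which is why the measured speed-up will not match the model's ratio.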

What 14 tools needed in common, and how the suite stays composable

Every tool resolved into the same four phases:

Argument parsing — repo path, --since, --branch, --author, --exclude, output format, output path. Lives in lib/args-common.sh, shared by every tool.
Git log mining — the same handful of git log invocations, parametrised by filter flags. Lives in lib/git-helpers.sh.
Analysis — the tool-specific logic. The shortest is git-age at ~80 lines; the longest is git-intel at ~700.
Rendering — JSON, table, CSV, HTML. Shared rendering helpers in lib/render-*.sh for each format.

Result: adding a fifteenth tool is mostly writing the analysis phase. The shape of each tool — wizard, --yes, four output formats, JSON schema — was designed once and copied.
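As an illustration of how phases two to four fit together, here is a Python sketch of a churn count. It is not the suite's code: the real helpers are bash, and the sample text only approximates the shape that git log --name-only --pretty=format:'commit %h' prints. Hashes, paths, and function names are made up, and argument parsing (phase one) is left out.

```python
from collections import Counter

# Sample text, roughly the shape produced by
#   git log --name-only --pretty=format:'commit %h'
# (hashes and paths are made up for illustration).
SAMPLE_LOG = """\
commit a1b2c3d
src/app.ts
src/db.ts

commit e4f5a6b
src/app.ts

commit 0c1d2e3
src/app.ts
README.md
"""


def mine(log_text):
    """Phase 2: turn raw log text into one changed-file list per commit."""
    commits = []
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commits.append([])
        elif line.strip() and commits:
            commits[-1].append(line.strip())
    return commits


def analyse(commits):
    """Phase 3: churn, i.e. how many commits touched each file."""
    return Counter(f for files in commits for f in set(files))


def render_table(churn):
    """Phase 4: plain-text table, most-changed file first."""
    rows = sorted(churn.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{count:>5}  {path}" for path, count in rows)


print(render_table(analyse(mine(SAMPLE_LOG))))
```

Only analyse is specific to churn. Swapping it for a different function over the same per-commit file lists is the sense in which a new tool is mostly an analysis phase.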

Outcomes

Outcome	Note
14 tools shipped	npm `git-xray-tools` v0.1.0, MIT licence
Days from scaffold to release-ready	8 (May 2 → May 10, 2026), 51 commits
Runtime model calls	0. Deterministic analysis only
Output formats per tool	4: JSON, table, CSV, HTML
HTML report rendering time (700-orphan repo)	18 s → 280 ms after the O(N²) fix
Lines of bash extracted from the largest heredoc	~1,300, into a standalone template
Configurable filters	`--since`, `--branch`, `--path`, `--author`, `--exclude`
CI integration	`--yes` reads `.git-<tool>.json`, exits 1 on threshold

What I'd do differently

I'd write the JSON schema first. Three tools shipped before I realised the JSON output shapes were drifting. git-churn had files: [...], git-velocity had buckets: [...], git-cochange had pairs: [...]. Reasonable per-tool, but the user-facing surface is "pipe one tool into another via jq" — and a stable, documented schema should have been step zero, not a discovered retrofit.

I'd pick a TypeScript-on-Node rewrite path earlier. Bash works. Bash will keep working. But every contributor new to this codebase has the same first reaction ("this is bash?"), and the test runner I wrote is a smaller reimplementation of what every test framework already provides. The right call may be a clean rewrite in TypeScript once the API surface stabilises in v1.0, keeping the npm distribution story intact.

I'd ship the HTML report from week one. I shipped JSON and table first, on the assumption that "real users" wanted scripted output. The single most-loved feature, in conversations after I open-sourced the suite, has been the HTML report. People want artifacts they can email. Lesson: the artifact matters more than the format.
