A code janitor for AI coding agents. Point it at any repo and it draws an interactive architecture map, scores every module 0–100 for technical debt, and helps you pay down the cruft incrementally, one commit at a time.
Every codebase accumulates cruft over time: monkeypatches, silent fallbacks, dead "legacy" paths, half-finished stubs, copy-pasted duplication, god-files, and valueless glue. codemap surfaces that rot, ranks it, and hands an AI agent a clear punch-list to fix it. Fixes go through a regression-gated loop: a change is accepted only when an independent check shows your tests still pass.
codemap follows the open Agent Skills standard (a folder with a SKILL.md), now shared
by Claude Code, Codex, and Cursor. Clone it into the tool's skills/ folder and it
auto-discovers as /codemap, with no extra config:
# Claude Code: global skills folder
git clone https://github.com/Asixa/codemap-skill ~/.claude/skills/codemap
# OpenAI Codex: global skills folder
git clone https://github.com/Asixa/codemap-skill ~/.codex/skills/codemap
# Cursor: global skills folder
git clone https://github.com/Asixa/codemap-skill ~/.cursor/skills/codemap

Restart the tool (or start a new session) and type /codemap. All three also accept a
per-project install: drop the ~/ (e.g. .cursor/skills/codemap at the repo root).
Cursor additionally reads the Claude/Codex skills dirs, and Codex + Cursor also share a
tool-neutral ~/.agents/skills/, so one global install can cover several tools. For an agent
without Skills support, clone the repo anywhere and point the agent at its AGENTS.md.
The skill folder (the tool) is separate from each project's
<project>/.codemap/ folder, where codemap writes its output.

On Windows PowerShell, replace ~ with $env:USERPROFILE
(e.g. $env:USERPROFILE\.claude\skills\codemap).
Most "architecture diagram" tools draw files and imports. codemap is different:
- Functional modules, not files. It groups code into the capabilities that actually matter (a store, a handler group, a feature, a plugin) and lays them out along the real data-flow.
- It grades the rot. Every module gets a health score (0–100) and a grade (A–F) plus
concrete file:line findings. The audit hunts specifically for the smells that make code unmaintainable: monkeypatch, fallback, silent-except, legacy/dead code, stub, fake-output, dual-format, bloat, duplication, glue, god-component, …
- Independent, honest scoring. Each module is audited by a separate AI subagent against a fixed rubric, so no single pass rubber-stamps the whole repo.
- Incremental + git-aware. A per-module content hash plus the last-run commit means re-runs
only re-audit what changed, and /codemap update shows you the commits since the last run and which modules they touched.
- Regression-gated cleanup. /codemap fix runs a four-role loop: lock a test baseline → fix → an independent acceptance check must show the pre-fix tests still pass → re-score.
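To make the acceptance step concrete, here is a minimal Python sketch of the regression gate, assuming the baseline and post-fix test runs have already been collected as a mapping of test id to pass/fail. The function name accept_fix and that result format are hypothetical, not codemap's actual implementation.

```python
def accept_fix(baseline: dict[str, bool], after: dict[str, bool]) -> tuple[bool, list[str]]:
    """Hypothetical acceptance gate: every test that passed at baseline must still pass.

    Both arguments map a test id to True (passed) or False (failed or errored).
    A baseline-passing test missing from `after` counts as a regression.
    """
    regressions = sorted(
        test_id
        for test_id, passed in baseline.items()
        if passed and not after.get(test_id, False)
    )
    return (not regressions, regressions)


baseline = {"test_load": True, "test_save": True, "test_legacy": False}
after = {"test_load": True, "test_save": False, "test_legacy": True}
print(accept_fix(baseline, after))  # → (False, ['test_save'])
```

Only tests that passed at baseline are gated; a test that was already failing may change either way without blocking the fix. Treating a missing test as a regression stops a fixer from "passing" by deleting tests.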
It's the maintenance pass you never have time to do, turned into something an agent can run on a schedule.
What it is (and isn't). codemap is an agent-orchestration framework that makes the map and audit consistent and reviewable: deterministic scripts handle LoC, hashing, staleness, filtering, and rendering, and a fixed rubric forces
file:line evidence and an independent audit per module. The module decomposition and the scores, however, are model judgments, not the output of a deterministic static analyzer. Treat the map as a high-quality, reviewable starting point, and commit modules.json so every score is diffable in PRs.
Want to see it before installing? Open
examples/sample-project/codemap.html, a
fully rendered demo (the sample used for the screenshots).
Click any module to highlight what it calls (downstream) and what depends on it
(upstream), along with its score, smell tags, and file:line findings.
The Audit report summarizes averages, grade spread, worst offenders, smell-tag frequency, and cross-cutting themes.
- Health vs. coupling color modes: problems pop in amber/red, while healthy modules recede to a muted green (colorblind-friendly; the cue is saturation, not just hue).
- Filter by grade (≤ B/C/D/F) or by issue tag; jump straight to the worst offenders.
- Editable Standard page: change descriptions, add your own issue tags to capture
your definition of a problem, and Export to standard.json; future audits use it.
- i18n: English or Chinese UI (meta.lang); module names are never translated.
- Copy-fix button on each module: copies /codemap fix <module> so you can paste it into your agent.
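The grade filter and the worst-offenders view boil down to simple queries over the module list. A minimal sketch, assuming each module is a dict with hypothetical id, grade, score, and tags fields (the real schema is documented in reference/DATA_MODEL.md) and that "≤ C" means "grade C or worse":

```python
# Grades from best to worst; this ordering is an assumption for the sketch.
GRADE_ORDER = ["A", "B", "C", "D", "F"]


def at_or_below(modules: list[dict], threshold: str) -> list[str]:
    """Ids of modules graded `threshold` or worse, e.g. "C" keeps C, D and F."""
    cutoff = GRADE_ORDER.index(threshold)
    return [m["id"] for m in modules if GRADE_ORDER.index(m["grade"]) >= cutoff]


def with_tag(modules: list[dict], tag: str) -> list[str]:
    """Ids of modules that carry `tag` among their smell tags."""
    return [m["id"] for m in modules if tag in m.get("tags", [])]


def worst_offenders(modules: list[dict], n: int = 3) -> list[str]:
    """Ids of the n lowest-scoring modules, worst first."""
    return [m["id"] for m in sorted(modules, key=lambda m: m["score"])[:n]]


# Hypothetical sample data.
modules = [
    {"id": "store", "grade": "B", "score": 82, "tags": ["fallback"]},
    {"id": "handlers", "grade": "D", "score": 48, "tags": ["monkeypatch", "glue"]},
    {"id": "plugins", "grade": "F", "score": 21, "tags": ["stub"]},
    {"id": "ui", "grade": "C", "score": 65, "tags": []},
]
print(at_or_below(modules, "C"))      # → ['handlers', 'plugins', 'ui']
print(with_tag(modules, "glue"))      # → ['handlers']
print(worst_offenders(modules, 2))    # → ['plugins', 'handlers']
```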
Language-agnostic. LoC and hashing work on any text source, and paths are plain globs,
so it covers Python, TypeScript/JS, Rust, C#/.NET, C/C++, Go, Java, Swift, and more.
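Because LoC counting never parses the language, it reduces to reading text files matched by globs. A minimal sketch using a hypothetical count_loc helper, assuming blank lines are not counted (codemap's exact counting rule may differ):

```python
from pathlib import Path


def count_loc(root: Path, patterns: list[str]) -> int:
    """Count non-blank lines in every file under `root` matching any glob pattern.

    Works on any text source because it never parses the language.
    Files matched by several patterns are counted once.
    """
    seen: set[Path] = set()
    total = 0
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file() and path not in seen:
                seen.add(path)
                # errors="replace" keeps odd encodings from aborting the count.
                text = path.read_text(encoding="utf-8", errors="replace")
                total += sum(1 for line in text.splitlines() if line.strip())
    return total
```

For example, count_loc(Path("."), ["src/**/*.py", "src/**/*.rs"]) would total a mixed Python/Rust source tree.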
- Python 3, standard library only: no pip install, no external packages.
- An AI coding agent to drive the audit/fix/test steps: Claude Code, Codex, Cursor, or any agent that reads instructions and spawns sub-tasks (see Install).
- A browser to open the generated HTML. That's it.
Talk to your agent in plain language, or use the subcommands (shown as Claude Code slash
commands; say the same verb to any other agent). On the first run, codemap asks for your
preferences (UI language, output location, project title) and saves them to
<project>/.codemap/config.json. Everything it produces lives in <project>/.codemap/.
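Persisting those first-run preferences is plain JSON. A minimal sketch of saving and reloading them with defaults; the key names lang, outputDir, and title and the helpers save_config and load_config are hypothetical, chosen only to mirror the three preferences listed above:

```python
import json
from pathlib import Path

# Hypothetical defaults; the real key names in config.json may differ.
DEFAULTS = {"lang": "en", "outputDir": ".codemap", "title": None}


def save_config(project: Path, prefs: dict) -> Path:
    """Write the answers from the first-run questions to <project>/.codemap/config.json."""
    path = project / ".codemap" / "config.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-ASCII titles (e.g. Chinese) readable in the file.
    path.write_text(json.dumps(prefs, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return path


def load_config(project: Path) -> dict:
    """Saved preferences layered over the defaults; no file yet means defaults only."""
    config = dict(DEFAULTS)
    path = project / ".codemap" / "config.json"
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config
```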
| Command | Does |
|---|---|
| `/codemap init` | first build: ask prefs → decompose into modules → scan → audit every module → render |
| `/codemap check` | read-only: is the map stale? Shows commits since last run + drifted / new / deleted modules |
| `/codemap update` | incremental + git-aware: re-audit only changed modules, re-render |
| `/codemap test <module>` | generate a regression net of tests for a module |
| `/codemap fix <module>` | regression-gated cleanup: lock baseline → fix → independent acceptance → re-score |
modules.json ──scan.py──▶ + LoC, content hash & git diff   (stale = hash != auditedHash)
   │  (decomposition + module descriptions: authored by the agent)
   ├──apply_audit.py──  one INDEPENDENT subagent's score per module (fixed rubric)
   ├──query.py────────  token-cheap targeting (by grade / tag / severity / staleness)
   └──render.py───────▶ codemap.html + codemap.md
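The staleness rule in the diagram (stale = hash != auditedHash) can be sketched in a few lines of stdlib Python. This assumes SHA-256 over each module's files in sorted path order; the helper names and hashing details are illustrative, not scan.py's actual code:

```python
import hashlib
from pathlib import Path


def module_hash(root: Path, files: list[str]) -> str:
    """Hash a module's files so any edit, rename, addition, or deletion changes the result."""
    digest = hashlib.sha256()
    for rel in sorted(files):  # sorted so the hash does not depend on listing order
        name = rel.encode("utf-8")
        data = (root / rel).read_bytes()
        # Length-prefix each field so different file sets cannot collide by concatenation.
        for field in (name, data):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()


def is_stale(module: dict, current_hash: str) -> bool:
    """A module needs re-auditing when its content hash no longer matches the audited one."""
    return module.get("auditedHash") != current_hash
```

Including the path in the hash means a rename also marks the module stale, and a module that has never been audited (no auditedHash) is always stale.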
modules.json is the source of truth (commit it for an audit history); the HTML and
Markdown outputs are pure projections, regenerated by render.py. codemap uses four separate
subagent roles that are never merged: auditor (scores), test-author (writes tests), fixer
(changes code), and acceptance/verifier (proves no regression). Tests are the regression
net and are kept out of a module's own audit scope.
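The job of apply_audit.py (validate one subagent's audit, then merge it into the state) might look roughly like this. The audit shape (score, grade, findings with file/line/tag) and the names validate_audit and merge_audit are assumptions for illustration; the real schema lives in reference/DATA_MODEL.md.

```python
GRADES = {"A", "B", "C", "D", "F"}


def validate_audit(audit: dict, known_tags: set[str]) -> list[str]:
    """Return a list of problems with one audit; an empty list means it can be merged."""
    errors = []
    score = audit.get("score")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 0 <= score <= 100:
        errors.append("score must be an integer from 0 to 100")
    if audit.get("grade") not in GRADES:
        errors.append("grade must be one of A, B, C, D, F")
    for i, finding in enumerate(audit.get("findings", [])):
        # Every finding needs file:line evidence.
        if not finding.get("file") or not isinstance(finding.get("line"), int):
            errors.append(f"finding {i}: missing file:line evidence")
        if finding.get("tag") not in known_tags:
            errors.append(f"finding {i}: unknown tag {finding.get('tag')!r}")
    return errors


def merge_audit(modules: dict[str, dict], audit: dict, known_tags: set[str], current_hash: str) -> None:
    """Hypothetical merge: reject invalid audits, otherwise update the module in place."""
    errors = validate_audit(audit, known_tags)
    if errors:
        raise ValueError("; ".join(errors))
    module = modules[audit["module"]]
    module["score"] = audit["score"]
    module["grade"] = audit["grade"]
    module["findings"] = audit.get("findings", [])
    # Record which content was audited, so the module stays fresh until its code changes.
    module["auditedHash"] = current_hash
```

Rejecting the whole audit on any error, rather than merging its valid parts, keeps modules.json from mixing fresh and stale judgments for one module.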
The scoring standard is data, not code (reference/standard.json: rubric, severities,
coupling, and issue tags with descriptions). Open the Standard page in the map → Edit,
tweak descriptions, add your own tags, then Export to
<project>/.codemap/standard.json. Custom tags flow through the whole map and are used by
future audits. The prose version and the exact subagent prompts live in reference/STANDARDS.md.
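One way the project override could combine with the default standard, sketched under the assumption that tags are stored as a name → {description} mapping (the real standard.json layout may differ, and load_standard is hypothetical):

```python
import json
from pathlib import Path


def load_standard(skill_dir: Path, project: Path) -> dict:
    """Default standard from the skill, overlaid by the project's exported copy if present."""
    standard = json.loads((skill_dir / "reference" / "standard.json").read_text(encoding="utf-8"))
    override_path = project / ".codemap" / "standard.json"
    if override_path.is_file():
        override = json.loads(override_path.read_text(encoding="utf-8"))
        # Merge tags by name: built-ins survive, custom tags are added, edited descriptions win.
        tags = {**standard.get("tags", {}), **override.get("tags", {})}
        standard.update(override)
        standard["tags"] = tags
    return standard
```

Merging tags by name keeps the built-in tags available while letting a project's edited descriptions and custom tags take precedence.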
codemap/
SKILL.md # the orchestration the agent reads
AGENTS.md # entry point for agents without Skills support
README.md
LICENSE # MIT
reference/
STANDARDS.md # scoring rubric, smell taxonomy, severities, subagent prompts
DATA_MODEL.md # modules.json schema
standard.json # the machine-readable default standard (overridable per project)
scripts/ # deterministic, stdlib-only Python
scan.py # LoC + content hash + git diff + staleness
    query.py         # filter modules (grade/tag/severity/…) → ids/paths/findings
apply_audit.py # validate + merge one subagent's audit into the state
    render.py        # modules.json → HTML + report
assets/
template.html # the interactive map shell (data injected at render time)
tests/ # stdlib unittest golden tests for the scripts
examples/
  01-map.png …      # the screenshots above
sample-project/ # a fully rendered demo (modules.json + codemap.html/md)
.github/workflows/test.yml # CI: py_compile + unittest + render + JS syntax check
MIT © 2026 Xingyu Chen.
Keywords: code quality · technical debt · refactoring · code janitor · legacy code cleanup · architecture visualization · dependency graph · static analysis · code audit · Claude Code skill · Codex · AI agents · code rot · cruft · code smells.


