-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Background
This issue evaluates three tools proposed for code quality: Pylint (design smell checks), Radon/Xenon (complexity budgets), and mutmut (mutation testing). Based on the current state of the codebase, only the first two are recommended now. mutmut is explicitly deferred — see rationale below.
Depends on / complements #84 (ruff migration).
Current State
Coverage
- parser-core: 92% (gate: 91%) — solid but not near-complete
- Known weak spot:
exclusion_detector.py~59% (pdfplumber API dependency)
Existing lint tooling
black/isort/flake8/mypyin CI (being replaced by ruff + mypy in tooling: replace black/isort/flake8 with ruff in CI and dev dependencies #84)- No complexity analysis, no design smell checks, no mutation testing
- No
[tool.pylint],[tool.radon], or[tool.xenon]config exists anywhere
Complexity hotspots already visible in codebase
A branch-count scan of packages/parser-core/src surfaced these functions exceeding 8 branches:
| Branches | Args | Function | File |
|---|---|---|---|
| 27 | 2 | _detect_text_based_table() |
analysis/table_detector.py |
| 22 | 3 | init_directories() |
commands/init.py |
| 17 | 3 | get_detection_explanation() |
templates/template_detector.py |
| 16 | 2 | _detect_recurring_charges() |
services/expense_analysis.py |
| 15 | 1 | analyze() |
commands/analyze_pdf.py |
| 14 | 3 | detect_template() |
templates/template_detector.py |
| 14 | 4 | detect() |
templates/detectors/iban_detector.py |
| 14 | 3 | merge_continuation_lines() |
services/row_merger.py |
| 13 | 3 | _assign_column_names() |
analysis/column_analyzer.py |
| 13 | 2 | from_multiple_directories() |
templates/template_registry.py |
These are real refactor candidates — not hypothetical. A complexity gate in CI would prevent new functions from silently joining this list.
Tool 1 — Pylint (design checks only)
Recommendation: Add, scoped to R-category codes only
Pylint has broad overlap with Ruff for style and lint. Running the full suite alongside Ruff creates redundancy and noise. The non-overlapping value is its design metric checks (R-category / refactor codes):
| Pylint check | Default threshold | What it catches |
|---|---|---|
too-many-branches |
>12 | Functions like _detect_text_based_table (27 branches) |
too-many-statements |
>50 | Overly long functions |
too-many-locals |
>15 | Functions juggling too much state |
too-many-arguments |
>5 | Interfaces with too many parameters |
too-many-public-methods |
>20 | Classes doing too much |
too-many-instance-attributes |
>7 | Data classes masquerading as services |
Run mode: pylint src --disable=all --enable=R — disables all style/convention/warning codes that Ruff already covers; enables only design smell detection.
What is NOT being replaced
Pylint runs alongside Ruff for a different purpose:
- Ruff: style, formatting, imports, common lint patterns (fast, per-line)
- Pylint: structural design metrics (slower, per-function/class)
Impact
- No change to formatting or import behaviour
- Flags the 20+ functions already identified as complexity hotspots
- CI fails on new functions exceeding thresholds (prevents regressions)
- Existing violations need a one-time baseline suppression pass or refactor
Tool 2 — Radon + Xenon (complexity budgets)
Recommendation: Add both
Radon computes:
- Cyclomatic complexity (CC): number of linearly independent paths through a function. Grade A (1-5) to F (>25)
- Halstead metrics: volume, effort, estimated bug count based on operators/operands
- Maintainability Index (MI): composite score 0-100; below 25 = unmaintainable
Xenon wraps Radon with CI thresholds — exits non-zero when complexity exceeds configured budgets.
Proposed Xenon thresholds for CI
xenon --max-absolute C --max-modules B --max-average A packages/parser-core/src
Meaning:
- No single function worse than grade C (CC 11-15) — lenient enough to not retroactively fail
- No module average worse than grade B (CC 6-10)
- No project average worse than grade A (CC 1-5)
The threshold is intentionally set to catch new regressions, not fail on existing hotspots. Existing violations (grade D-F) are tracked separately for refactoring.
Radon in CI (reporting only, non-blocking)
- name: Radon complexity report (C-grade and worse)
run: radon cc packages/parser-core/src -s -n CProduces a human-readable report as a CI step summary — useful for tracking debt trend over time without blocking the build.
Tool 3 — mutmut (mutation testing)
Recommendation: Defer — not yet appropriate
Mutation testing works by injecting small code changes (mutations) and verifying the test suite detects them. It is most valuable when:
- Coverage is near-complete (98%+) — at 92%, untested lines produce trivially surviving mutants with no signal value
- The test suite is stable and fast — mutmut reruns the full suite for every mutation; 1,420 tests against 16,560 lines takes 30-90 minutes
- Known weak spots are closed —
exclusion_detector.pyat ~59% would generate large volumes of false-signal survivors
Revisit when:
- Coverage is consistently above 95% across all modules (not just the aggregate)
exclusion_detector.pycoverage gap is closed (feat: Docker integration test — run snapshot test against built image in CI #59 / pdfplumber dependency resolved)- CI run-time budget allows an optional long-running quality job
This is not a rejection — it is the right tool at the wrong time. A comment will be added to this issue when the preconditions are met.
What Is Not Being Replaced
| Tool | Status | Reason |
|---|---|---|
mypy |
Kept | Static type checker — different domain from design metrics; already in CI |
bandit / safety |
Kept | Security scanning — separate concern |
stubtest |
Kept | pdfplumber stub validation — no Radon/Pylint equivalent |
ruff |
Kept | Style/lint/format — see #84 |
Files to Change
| File | Change |
|---|---|
packages/parser-core/pyproject.toml |
Add [tool.pylint] config (R-codes only, custom thresholds); add pylint, radon, xenon to [dev] deps |
.github/workflows/ci.yml |
Add complexity job (radon report + xenon gate); extend lint-core with scoped pylint step; add complexity to ci-pass needs |
Makefile |
Add make complexity (radon + xenon) and make design-check (pylint R-codes) targets |
CONTRIBUTING.md |
Document complexity thresholds, how to run locally, and how to suppress legitimate violations |
Proposed CI additions
New complexity job:
complexity:
name: Complexity — radon + xenon
runs-on: ubuntu-latest
timeout-minutes: 5
needs: changes
if: needs.changes.outputs.core == 'true'
steps:
- uses: actions/checkout@v6
- uses: actions/setup-python@v6
with:
python-version: '3.12'
- run: pip install radon xenon
- name: Radon complexity report (C-grade and worse)
run: radon cc packages/parser-core/src -s -n C
- name: Xenon complexity gate
run: xenon --max-absolute C --max-modules B --max-average A packages/parser-core/srcExtension to lint-core:
- name: Pylint design checks (R-codes only)
run: pylint packages/parser-core/src --disable=all --enable=R --fail-under=8.0Test Plan
- Run
radon cc src -s -n Clocally — review all C-grade-and-worse functions, document known violations - Run
xenon --max-absolute C --max-modules B --max-average A srclocally — confirm passes with the proposed lenient initial thresholds - Run
pylint src --disable=all --enable=Rlocally — suppress legitimate exceptions with# pylint: disable=too-many-brancheswhere justified (e.g. dispatch tables), fix the rest - Run full test suite (
pytest) — confirm 1420+ tests still pass (no production code changes required for this issue) - Push to PR branch — confirm new CI jobs (
complexity, updatedlint-core) go green - Confirm
ci-passrequired status check still passes after addingcomplexityto itsneedslist - Update
CONTRIBUTING.mdwith complexity threshold documentation andmake complexityinstructions - Update
MEMORY.mdto document complexity tooling and the known hotspot files
Out of Scope
- Refactoring existing high-complexity functions — separate work; this issue only adds the gate
- mutmut — deferred; will be revisited when coverage exceeds 95% consistently
- Applying complexity checks to
parser-free— minimal logic, low value