tooling: add Pylint design checks + Radon/Xenon complexity budgets to CI

## Background

This issue evaluates three tools proposed for code quality: **Pylint** (design smell checks), **Radon/Xenon** (complexity budgets), and **mutmut** (mutation testing). Based on the current state of the codebase, only the first two are recommended now. mutmut is explicitly deferred — see rationale below.

Depends on / complements #84 (ruff migration).

---

## Current State

### Coverage
- parser-core: **92%** (gate: 91%) — solid but not near-complete
- Known weak spot: `exclusion_detector.py` ~59% (pdfplumber API dependency)

### Existing lint tooling
- `black` / `isort` / `flake8` / `mypy` in CI (being replaced by ruff + mypy in #84)
- No complexity analysis, no design smell checks, no mutation testing
- No `[tool.pylint]`, `[tool.radon]`, or `[tool.xenon]` config exists anywhere

### Complexity hotspots already visible in codebase

A branch-count scan of `packages/parser-core/src` surfaced these functions exceeding 8 branches:

| Branches | Args | Function | File |
|----------|------|----------|------|
| 27 | 2 | `_detect_text_based_table()` | `analysis/table_detector.py` |
| 22 | 3 | `init_directories()` | `commands/init.py` |
| 17 | 3 | `get_detection_explanation()` | `templates/template_detector.py` |
| 16 | 2 | `_detect_recurring_charges()` | `services/expense_analysis.py` |
| 15 | 1 | `analyze()` | `commands/analyze_pdf.py` |
| 14 | 3 | `detect_template()` | `templates/template_detector.py` |
| 14 | 4 | `detect()` | `templates/detectors/iban_detector.py` |
| 14 | 3 | `merge_continuation_lines()` | `services/row_merger.py` |
| 13 | 3 | `_assign_column_names()` | `analysis/column_analyzer.py` |
| 13 | 2 | `from_multiple_directories()` | `templates/template_registry.py` |

These are real refactor candidates — not hypothetical. A complexity gate in CI would prevent new functions from silently joining this list.

---

## Tool 1 — Pylint (design checks only)

### Recommendation: Add, scoped to R-category codes only

Pylint has broad overlap with Ruff for style and lint. Running the full suite alongside Ruff creates redundancy and noise. The non-overlapping value is its design metric checks (R-category / refactor codes):

| Pylint check | Default threshold | What it catches |
|---|---|---|
| `too-many-branches` | >12 | Functions like `_detect_text_based_table` (27 branches) |
| `too-many-statements` | >50 | Overly long functions |
| `too-many-locals` | >15 | Functions juggling too much state |
| `too-many-arguments` | >5 | Interfaces with too many parameters |
| `too-many-public-methods` | >20 | Classes doing too much |
| `too-many-instance-attributes` | >7 | Data classes masquerading as services |

**Run mode**: `pylint src --disable=all --enable=R` — disables all style/convention/warning codes that Ruff already covers; enables only design smell detection.

### What is NOT being replaced
Pylint runs alongside Ruff for a different purpose:
- Ruff: style, formatting, imports, common lint patterns (fast, per-line)
- Pylint: structural design metrics (slower, per-function/class)

### Impact
- No change to formatting or import behaviour
- Flags the 20+ functions already identified as complexity hotspots
- CI fails on new functions exceeding thresholds (prevents regressions)
- Existing violations need a one-time baseline suppression pass or refactor

---

## Tool 2 — Radon + Xenon (complexity budgets)

### Recommendation: Add both

**Radon** computes:
- **Cyclomatic complexity** (CC): number of linearly independent paths through a function. Grade A (1-5) to F (>25)
- **Halstead metrics**: volume, effort, estimated bug count based on operators/operands
- **Maintainability Index** (MI): composite score 0-100; below 25 = unmaintainable

**Xenon** wraps Radon with CI thresholds — exits non-zero when complexity exceeds configured budgets.

### Proposed Xenon thresholds for CI

```
xenon --max-absolute C --max-modules B --max-average A packages/parser-core/src
```

Meaning:
- No single function worse than grade C (CC 11-15) — lenient enough to not retroactively fail
- No module average worse than grade B (CC 6-10)
- No project average worse than grade A (CC 1-5)

The threshold is intentionally set to catch **new** regressions, not fail on existing hotspots. Existing violations (grade D-F) are tracked separately for refactoring.

### Radon in CI (reporting only, non-blocking)

```yaml
- name: Radon complexity report (C-grade and worse)
  run: radon cc packages/parser-core/src -s -n C
```

Produces a human-readable report as a CI step summary — useful for tracking debt trend over time without blocking the build.

---

## Tool 3 — mutmut (mutation testing)

### Recommendation: Defer — not yet appropriate

Mutation testing works by injecting small code changes (mutations) and verifying the test suite detects them. It is most valuable when:
1. Coverage is near-complete (98%+) — at 92%, untested lines produce trivially surviving mutants with no signal value
2. The test suite is stable and fast — mutmut reruns the full suite for every mutation; 1,420 tests against 16,560 lines takes 30-90 minutes
3. Known weak spots are closed — `exclusion_detector.py` at ~59% would generate large volumes of false-signal survivors

**Revisit when:**
- Coverage is consistently above 95% across all modules (not just the aggregate)
- `exclusion_detector.py` coverage gap is closed (#59 / pdfplumber dependency resolved)
- CI run-time budget allows an optional long-running quality job

This is not a rejection — it is the right tool at the wrong time. A comment will be added to this issue when the preconditions are met.

---

## What Is Not Being Replaced

| Tool | Status | Reason |
|------|--------|--------|
| `mypy` | Kept | Static type checker — different domain from design metrics; already in CI |
| `bandit` / `safety` | Kept | Security scanning — separate concern |
| `stubtest` | Kept | pdfplumber stub validation — no Radon/Pylint equivalent |
| `ruff` | Kept | Style/lint/format — see #84 |

---

## Files to Change

| File | Change |
|------|--------|
| `packages/parser-core/pyproject.toml` | Add `[tool.pylint]` config (R-codes only, custom thresholds); add `pylint`, `radon`, `xenon` to `[dev]` deps |
| `.github/workflows/ci.yml` | Add `complexity` job (radon report + xenon gate); extend `lint-core` with scoped pylint step; add `complexity` to `ci-pass` needs |
| `Makefile` | Add `make complexity` (radon + xenon) and `make design-check` (pylint R-codes) targets |
| `CONTRIBUTING.md` | Document complexity thresholds, how to run locally, and how to suppress legitimate violations |

---

## Proposed CI additions

New `complexity` job:

```yaml
complexity:
  name: Complexity — radon + xenon
  runs-on: ubuntu-latest
  timeout-minutes: 5
  needs: changes
  if: needs.changes.outputs.core == 'true'
  steps:
    - uses: actions/checkout@v6
    - uses: actions/setup-python@v6
      with:
        python-version: '3.12'
    - run: pip install radon xenon
    - name: Radon complexity report (C-grade and worse)
      run: radon cc packages/parser-core/src -s -n C
    - name: Xenon complexity gate
      run: xenon --max-absolute C --max-modules B --max-average A packages/parser-core/src
```

Extension to `lint-core`:

```yaml
- name: Pylint design checks (R-codes only)
  run: pylint packages/parser-core/src --disable=all --enable=R --fail-under=8.0
```

---

## Test Plan

- [ ] Run `radon cc src -s -n C` locally — review all C-grade-and-worse functions, document known violations
- [ ] Run `xenon --max-absolute C --max-modules B --max-average A src` locally — confirm passes with the proposed lenient initial thresholds
- [ ] Run `pylint src --disable=all --enable=R` locally — suppress legitimate exceptions with `# pylint: disable=too-many-branches` where justified (e.g. dispatch tables), fix the rest
- [ ] Run full test suite (`pytest`) — confirm 1420+ tests still pass (no production code changes required for this issue)
- [ ] Push to PR branch — confirm new CI jobs (`complexity`, updated `lint-core`) go green
- [ ] Confirm `ci-pass` required status check still passes after adding `complexity` to its `needs` list
- [ ] Update `CONTRIBUTING.md` with complexity threshold documentation and `make complexity` instructions
- [ ] Update `MEMORY.md` to document complexity tooling and the known hotspot files

---

## Out of Scope

- Refactoring existing high-complexity functions — separate work; this issue only adds the gate
- mutmut — deferred; will be revisited when coverage exceeds 95% consistently
- Applying complexity checks to `parser-free` — minimal logic, low value


Branches	Args	Function	File
27	2	`_detect_text_based_table()`	`analysis/table_detector.py`
22	3	`init_directories()`	`commands/init.py`
17	3	`get_detection_explanation()`	`templates/template_detector.py`
16	2	`_detect_recurring_charges()`	`services/expense_analysis.py`
15	1	`analyze()`	`commands/analyze_pdf.py`
14	3	`detect_template()`	`templates/template_detector.py`
14	4	`detect()`	`templates/detectors/iban_detector.py`
14	3	`merge_continuation_lines()`	`services/row_merger.py`
13	3	`_assign_column_names()`	`analysis/column_analyzer.py`
13	2	`from_multiple_directories()`	`templates/template_registry.py`

File	Change
`packages/parser-core/pyproject.toml`	Add `[tool.pylint]` config (R-codes only, custom thresholds); add `pylint`, `radon`, `xenon` to `[dev]` deps
`.github/workflows/ci.yml`	Add `complexity` job (radon report + xenon gate); extend `lint-core` with scoped pylint step; add `complexity` to `ci-pass` needs
`Makefile`	Add `make complexity` (radon + xenon) and `make design-check` (pylint R-codes) targets
`CONTRIBUTING.md`	Document complexity thresholds, how to run locally, and how to suppress legitimate violations

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tooling: add Pylint design checks + Radon/Xenon complexity budgets to CI #85

Background

Current State

Coverage

Existing lint tooling

Complexity hotspots already visible in codebase

Tool 1 — Pylint (design checks only)

Recommendation: Add, scoped to R-category codes only

What is NOT being replaced

Impact

Tool 2 — Radon + Xenon (complexity budgets)

Recommendation: Add both

Proposed Xenon thresholds for CI

Radon in CI (reporting only, non-blocking)

Tool 3 — mutmut (mutation testing)

Recommendation: Defer — not yet appropriate

What Is Not Being Replaced

Files to Change

Proposed CI additions

Test Plan

Out of Scope

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Pylint check	Default threshold	What it catches
`too-many-branches`	>12	Functions like `_detect_text_based_table` (27 branches)
`too-many-statements`	>50	Overly long functions
`too-many-locals`	>15	Functions juggling too much state
`too-many-arguments`	>5	Interfaces with too many parameters
`too-many-public-methods`	>20	Classes doing too much
`too-many-instance-attributes`	>7	Data classes masquerading as services

Tool	Status	Reason
`mypy`	Kept	Static type checker — different domain from design metrics; already in CI
`bandit` / `safety`	Kept	Security scanning — separate concern
`stubtest`	Kept	pdfplumber stub validation — no Radon/Pylint equivalent
`ruff`	Kept	Style/lint/format — see #84

tooling: add Pylint design checks + Radon/Xenon complexity budgets to CI #85

Description

Background

Current State

Coverage

Existing lint tooling

Complexity hotspots already visible in codebase

Tool 1 — Pylint (design checks only)

Recommendation: Add, scoped to R-category codes only

What is NOT being replaced

Impact

Tool 2 — Radon + Xenon (complexity budgets)

Recommendation: Add both

Proposed Xenon thresholds for CI

Radon in CI (reporting only, non-blocking)

Tool 3 — mutmut (mutation testing)

Recommendation: Defer — not yet appropriate

What Is Not Being Replaced

Files to Change

Proposed CI additions

Test Plan

Out of Scope

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions