Merged
11 changes: 8 additions & 3 deletions docs/CODE_HEALTH.md
@@ -63,7 +63,9 @@ reproduced by `local-stash/calibrate_health_weights.py` and documented in

The final score is clamped to `[1.0, 10.0]`. The three repo-level KPIs:

- **Hotspot Health** — NLOC-weighted average over the top-25 % hotspot files.
- **Hotspot Health** — NLOC-weighted average over the files classified as
hotspots by the git layer (high churn percentile plus minimum-activity
floors), not a fixed top-N slice.
- **Average Health** — NLOC-weighted average over all files.
- **Worst Performer** — single lowest-scoring file.
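
The three KPIs above can be sketched roughly as follows. This is a minimal illustration, not the layer's implementation: the `FileHealth` record, its field names, and the `is_hotspot` flag are assumptions made for the example, and the hotspot classification itself is taken as an input from the git layer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileHealth:
    # Hypothetical record; the field names are assumptions for this sketch.
    path: str
    score: float      # per-file health score
    nloc: int         # lines of code, used as the averaging weight
    is_hotspot: bool  # classification supplied by the git layer


def clamp_score(raw: float) -> float:
    """Clamp a raw score into the documented [1.0, 10.0] range."""
    return max(1.0, min(10.0, raw))


def nloc_weighted_average(files: list[FileHealth]) -> float | None:
    """Return the NLOC-weighted mean score, or None when there is no code."""
    total = sum(f.nloc for f in files)
    if total == 0:
        return None
    return sum(f.score * f.nloc for f in files) / total


def repo_kpis(files: list[FileHealth]) -> dict[str, object]:
    """Compute the three repo-level KPIs from per-file scores."""
    hotspots = [f for f in files if f.is_hotspot]
    worst = min(files, key=lambda f: f.score).path if files else None
    return {
        "hotspot_health": nloc_weighted_average(hotspots),
        "average_health": nloc_weighted_average(files),
        "worst_performer": worst,
    }
```

Weighting by NLOC means a large unhealthy file pulls the average down more than a small one with the same score.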

@@ -318,17 +320,20 @@ Per-file overrides live in `.repowise/health-rules.json`:
  "disabled_biomarkers": ["primitive_obsession"],
  "rules": [
    {
      "glob": "tests/**/*.py",
      "path": "tests/**/*.py",
      "disabled_biomarkers": ["large_method", "complex_method"]
    },
    {
      "glob": "src/legacy/**",
      "path": "src/legacy/**",
      "disabled_biomarkers": ["dry_violation"]
    }
  ]
}
```

`path` holds an fnmatch-style glob over the repo-relative POSIX path
(`path_glob` and `glob` are accepted aliases).
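
A minimal sketch of how the canonical key and its aliases could be resolved and matched. The key precedence mirrors the `config.py` change in this PR. The helper names `rule_glob` and `rule_matches` are hypothetical, and using `fnmatchcase` (rather than another fnmatch variant) is an assumption.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def rule_glob(entry: dict) -> str | None:
    """Return the rule's glob: ``path`` first, then the accepted aliases."""
    glob = entry.get("path") or entry.get("path_glob") or entry.get("glob")
    return glob if isinstance(glob, str) and glob else None


def rule_matches(entry: dict, repo_relative_path: str) -> bool:
    """Check a repo-relative POSIX path against the rule's glob."""
    glob = rule_glob(entry)
    # fnmatch-style matching: ``*`` also matches ``/``.
    return glob is not None and fnmatchcase(repo_relative_path, glob)
```

Because fnmatch treats `/` like any other character, a pattern such as `tests/**` covers files at any depth under `tests/`.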

## Incremental updates

`repowise update` only re-scores the changed files. Findings and metrics for
84 changes: 53 additions & 31 deletions docs/COMMERCIAL.md
@@ -119,18 +119,18 @@ the items that matter most to you can be prioritized.
| Local dashboard (incl. local security pattern scan) | ✅ | ✅ |
| Auto-sync (hooks, watcher, webhooks) | ✅ | ✅ |
| Auto-generated CLAUDE.md | ✅ | ✅ |
| Graph-aware enhanced security scanning | — | ✅ *(dev)* |
| Graph-aware enhanced security scanning | — | ✅ *(GA on hosted)* |
| Language-specific security rulesets | — | ✅ *(dev)* |
| CVE-aware dependency analysis (KEV / EPSS / priority-scored) | — | ✅ *(GA on hosted)* |
| Usage-aware CVE triage (imports × dead code) | — | ✅ *(GA on hosted)* |
| Function-level reachability triage | — | ✅ *(planned)* |
| Function-level reachability triage | — | ✅ *(GA on hosted — per-language coverage)* |
| Secret detection across full git history | — | ✅ *(GA on hosted)* |
| SBOM generation (CycloneDX) + diffs | — | ✅ *(GA on hosted)* |
| Compliance reporting (PCI-DSS / SOC 2) | — | ✅ *(planned)* |
| Audit trail (in-product + JSON / CSV export) | — | ✅ *(GA on hosted — security surface)* |
| SBOM generation (CycloneDX) + VEX export + diffs | — | ✅ *(GA on hosted)* |
| Compliance reporting (PCI-DSS / SOC 2) | — | ✅ *(GA on hosted — Teams)* |
| Audit trail (in-product + JSON / CSV export + webhook stream) | — | ✅ *(GA on hosted — security surface)* |
| Jira / Confluence integration | — | ✅ *(rolling out)* |
| GitHub Enterprise / Azure DevOps / GitLab / Bitbucket | — | ✅ *(rolling out)* |
| Slack / Teams alerting | — | ✅ *(rolling out)* |
| Slack / Teams security alerting (signed webhooks) | — | ✅ *(GA on hosted — Teams)* |
| SAML / OIDC SSO + SCIM | — | ✅ *(rolling out)* |
| RBAC + multi-tenant | — | ✅ *(planned)* |
| Air-gapped install bundle | — | ✅ *(planned)* |
@@ -146,13 +146,15 @@ the items that matter most to you can be prioritized.

### 5.1 Security & Compliance

- **Security scanning layer** *(GA: local pattern scan; dev: graph-aware
enrichment)* — pattern-based detection for dangerous APIs (`eval`/`exec`,
`pickle.loads`, `shell=True`, `os.system`, hardcoded secrets, concat / f-string
SQL, `verify=False`, weak hashes) runs locally today in the dashboard's Security
view. Graph-aware enrichment — linking findings to graph nodes and surfacing them
through `get_risk` so AI agents see security context before modifying a file — is
in development.
- **Security scanning layer** *(GA: local pattern scan; GA on hosted:
graph-aware enrichment)* — pattern-based detection for dangerous APIs
(`eval`/`exec`, `pickle.loads`, `shell=True`, `os.system`, hardcoded secrets,
concat / f-string SQL, `verify=False`, weak hashes) runs locally today in the
dashboard's Security view. On the hosted platform, findings are graph-aware:
every vulnerable import site carries hotspot and centrality context from the
code graph (feeding a bounded priority bump), and AI agents see security
state before modifying a file, through the hosted `get_security` MCP tool and
the security section that `get_risk` attaches.
- **Language-specific security rulesets** *(dev)* — rulesets built on top of the
per-language dynamic-hint extractors and framework edges. For .NET, planned checks
include `[Authorize]` coverage on controllers and Minimal API endpoints,
@@ -172,24 +174,39 @@ the items that matter most to you can be prioritized.
sites as clickable evidence), cross-referencing the existing parse and
dead-code layers. Unmappable packages are labeled `unknown` honestly — never
guessed.
- **Function-level reachability triage** *(planned)* — the next precision step:
classifying CVEs by whether the vulnerable *function* is reachable through the
resolved call graph. Precision is language- and pattern-dependent; we report it
honestly per language rather than quoting one global number.
- **SBOM generation** *(available on the hosted platform, Pro+)* — CycloneDX 1.6
output per snapshot with per-dependency license detection and license-risk
classification, downloadable in-product, plus dependency diffs between any two
snapshots. SPDX and cross-format conversion on the extended roadmap.
- **Compliance reporting** *(planned)* — framework-mapping reports tying findings
back to specific files, owners, and decisions. Initial scope: **PCI-DSS** and
**SOC 2** control coverage. ISO 27001 Annex A and GDPR / data-residency mappings on
the extended roadmap — we'd rather ship two solid mappings than four shallow ones.
- **Function-level reachability triage** *(available on the hosted platform,
Pro+)* — classifies CVEs by whether the advisory's affected packages or
symbols are actually imported, crossing OSV symbol data with the per-import
names the indexer captures. Coverage is per-ecosystem and reported honestly
in-product: Go is import-path-reliable (the Go vulndb lists affected
packages per advisory), PyPI / npm / cargo are assessed only when both the
advisory and the code name symbols, and other ecosystems stay at
package-level triage. Provably-unreachable findings are discounted, never
hidden; nothing is ever claimed without evidence on both sides.
- **SBOM generation + VEX export** *(available on the hosted platform, Pro+)* —
CycloneDX 1.6 SBOM per snapshot with per-dependency license detection and
license-risk classification, downloadable in-product, plus dependency diffs
between any two snapshots — and a CycloneDX 1.6 **VEX** document that maps
the platform's triage, reachability, and human status decisions onto the
standard impact-analysis states, generated fresh at download so it always
reflects current triage. SPDX and cross-format conversion on the extended
roadmap.
- **Compliance reporting** *(available on the hosted platform, Teams+)* —
**PCI-DSS 4.0** and **SOC 2** control-coverage reports derived from the live
security findings, with per-control evidence drill-ins and JSON / Markdown
export. Framed honestly in-product and in every export: coverage signals,
not an audit or certification — controls automated findings cannot evidence
are marked for manual attestation rather than silently passed. ISO 27001
Annex A and GDPR / data-residency mappings on the extended roadmap — we'd
rather ship two solid mappings than four shallow ones.
- **Audit trail** *(available on the hosted platform for the security surface,
Teams+)* — security reads and actions (scans triggered, vulnerability and
secret views, SBOM exports, finding-status changes) logged insert-only with
user, IP, and timestamp, queryable in-product and exportable to JSON / CSV.
Coverage beyond the security surface (decisions, overrides) is in development;
streaming export to SIEM (Splunk / Datadog / Elastic / syslog) on the roadmap.
secret views, SBOM / VEX exports, compliance views, finding-status changes,
and MCP reads by AI agents) logged insert-only with user, IP, and timestamp,
queryable in-product and exportable to JSON / CSV — plus an opt-in
SIEM-lite stream that forwards audit events to any HTTPS endpoint as signed
webhooks. Coverage beyond the security surface (decisions, overrides) is in
development; native Splunk / Datadog / Elastic connectors on the roadmap.
- **Secret-in-code detection** *(available on the hosted platform, Pro+)* —
scanning across full git history (not just `HEAD`), with live-at-`HEAD`
flagging and incremental re-scans. Only a fingerprint and a redacted preview
@@ -211,8 +228,13 @@ integrations beyond this list are available on request.
PR-comment bot that posts blast-radius and reviewer suggestions, and a
branch-protection check that blocks merges touching hotspots without a reviewer
from the ownership list.
- **Slack & Microsoft Teams** — alerts on hotspot drift, bus-factor warnings,
decision staleness, and security findings, routed by ownership.
- **Slack & Microsoft Teams** — security alerting is available today on the
hosted platform (Teams+) as HMAC-signed webhooks with a Slack-compatible
format (works with Slack, Microsoft Teams, and Mattermost inbound
webhooks): new critical CVEs, live secrets, failed scans, and
rotation-overdue reminders, plus the opt-in audit-event stream. Alerts on
hotspot drift, bus-factor warnings, and decision staleness are rolling out
on the same plumbing, routed by ownership.
- **SAML / OIDC SSO** — Okta, Entra ID, Auth0, Google Workspace, generic SAML 2.0.
- **SCIM provisioning** — automatic user / group lifecycle.
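
For receivers of the HMAC-signed webhooks mentioned above, verification usually looks like the sketch below. The hash algorithm (SHA-256), the hex encoding, and how the signature reaches the receiver are all assumptions here, not the hosted platform's documented contract; check the platform's webhook reference for the real scheme.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode(), received.encode())
```

The signature must be computed over the raw bytes as received, before any JSON parsing, and compared with `hmac.compare_digest` rather than `==` to avoid timing leaks.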

25 changes: 17 additions & 8 deletions docs/architecture/code-health.md
@@ -93,6 +93,8 @@ analysis/health/
├── base.py # Biomarker Protocol + FileContext + BiomarkerResult
├── registry.py # detector list + detect_all()
├── brain_method.py
├── low_cohesion.py
├── god_class.py
├── nested_complexity.py
├── bumpy_road.py
├── complex_method.py
@@ -101,6 +103,7 @@ analysis/health/
├── dry_violation.py
├── untested_hotspot.py
├── coverage_gap.py
├── coverage_gradient.py
├── developer_congestion.py
├── knowledge_loss.py
├── hidden_coupling.py
@@ -111,6 +114,9 @@ analysis/health/
├── churn_risk.py
├── change_entropy.py
├── co_change_scatter.py
├── prior_defect.py
├── large_assertion_block.py
├── duplicated_assertion_block.py
└── error_handling.py
```

@@ -749,19 +755,21 @@ User-authored (the **only** JSON file in the layer). Loaded by
  "disabled_biomarkers": ["primitive_obsession"],
  "rules": [
    {
      "glob": "tests/**/*.py",
      "path": "tests/**/*.py",
      "disabled_biomarkers": ["large_method", "complex_method"]
    },
    {
      "glob": "src/legacy/**",
      "path": "src/legacy/**",
      "disabled_biomarkers": ["dry_violation"]
    }
  ]
}
```

`to_analyzer_config(file_paths)` resolves globs to per-file disabled
sets, which the engine honors in `_evaluate_file()`.
`path` is an fnmatch glob over the repo-relative POSIX path; `path_glob`
and `glob` are accepted aliases. `to_analyzer_config(file_paths)` resolves
globs to per-file disabled sets, which the engine honors in
`_evaluate_file()`.
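
As a sketch of that resolution step, the function below turns a list of `(glob, disabled_biomarkers)` rules into per-file disabled sets. The function name, the tuple shape of the rules, and the choice to omit files that match no rule are assumptions for illustration; the real `to_analyzer_config()` may differ.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def resolve_per_file_disabled(
    rules: list[tuple[str, list[str]]],
    file_paths: list[str],
) -> dict[str, set[str]]:
    """Map each matching repo-relative path to its disabled biomarkers."""
    resolved: dict[str, set[str]] = {}
    for path in file_paths:
        for glob, biomarkers in rules:
            if fnmatchcase(path, glob):
                # A file matched by several rules gets the union of their sets.
                resolved.setdefault(path, set()).update(biomarkers)
    return resolved
```

Resolving once up front means the per-file evaluation only needs a dictionary lookup, not a glob match per biomarker.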

---

@@ -859,10 +867,11 @@ phases may revisit; the constraints kept v1 shippable.
- **No `complexity_estimate` propagation backfill.** The walker writes
the field as a side effect during the current run; old indexes don't
get touched until a re-index.
- **No PR-mode delta in v1.** `get_risk(changed_files=...)` returns the
current health score, not before/after. Phase 5.
- **No predictive ML.** `Predicted Decline` is a 3-snapshot direction
check, not a model. Phase 5.
- **No predictive ML on trends.** `Predicted Decline` is a 3-snapshot
direction check, not a model. (Commit-level change risk is a separate,
shipped surface: the `analysis/change_risk/` package behind
`repowise risk` scores a commit or base..head range with a calibrated
logistic model.)
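
To make "a 3-snapshot direction check" concrete, one plausible reading is sketched below: flag a decline when the score drops strictly across the three most recent snapshots. The exact rule (strict versus non-strict drops, any minimum delta) is an assumption, and `predicted_decline` is a hypothetical name.

```python
from __future__ import annotations


def predicted_decline(scores: list[float]) -> bool:
    """Return True when the last three snapshot scores strictly decrease."""
    if len(scores) < 3:
        return False  # not enough history to infer a direction
    oldest, middle, newest = scores[-3:]
    return oldest > middle > newest
```

This is a pure direction test over stored scores, with no fitting or extrapolation.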

---

10 changes: 6 additions & 4 deletions packages/core/src/repowise/core/analysis/health/README.md
@@ -1,9 +1,9 @@
# Code Health analysis layer

Fifth intelligence layer alongside Graph, Git, Docs, and Decisions. Computes a
per-file health score (1.0–10.0) from twelve deterministic biomarkers, ingests
test-coverage data, tracks repo-level KPIs over time, and surfaces refactoring
targets ranked by impact-per-effort.
per-file health score (1.0–10.0) from twenty-six deterministic biomarkers,
ingests test-coverage data, tracks repo-level KPIs over time, and surfaces
refactoring targets ranked by impact-per-effort.

**Zero LLM calls.** Pure Python over tree-sitter + git data. Designed to finish
in under 30 s on a 3 000-file repo (see `tests/integration/test_health_perf_benchmark.py`).
@@ -87,7 +87,9 @@ expose NLOC-weighted module aggregates and accept `module:foo` targets.
- `duplication/` — Rabin–Karp over tree-sitter tokens. Co-change correlation
via `git_meta_map[path]["co_change_partners_json"]`.
- `biomarkers/` — one detector per file. Implements the `Biomarker`
Protocol from `biomarkers/base.py`. Twelve total in v1.
Protocol from `biomarkers/base.py`. Twenty-six registered (see
`biomarkers/registry.py` and `biomarkers/README.md` for the full list),
plus three governance findings written by a separate additive pass.

Each sub-package has its own `README.md` covering inputs, outputs, and
extension points.
5 changes: 4 additions & 1 deletion packages/core/src/repowise/core/analysis/health/config.py
@@ -80,7 +80,10 @@ def from_dict(cls, raw: object) -> HealthConfig:
        for entry in raw.get("rules") or []:
            if not isinstance(entry, dict):
                continue
            glob = entry.get("path") or entry.get("path_glob")
            # ``path`` is canonical; ``path_glob`` and ``glob`` are accepted
            # aliases (the docs showed ``glob`` for a while, so a silent
            # rejection here would invalidate working configs users copied).
            glob = entry.get("path") or entry.get("path_glob") or entry.get("glob")
            if not isinstance(glob, str) or not glob:
                continue
            disabled_for = [
18 changes: 18 additions & 0 deletions tests/unit/health/test_health_config.py
@@ -75,6 +75,24 @@ def test_to_analyzer_config_shape(tmp_path: Path):
    assert "test/bar.py" not in pfd


def test_glob_and_path_glob_aliases_accepted(tmp_path: Path):
    """``glob`` (shown in older docs) and ``path_glob`` work like ``path``."""
    _write(
        tmp_path,
        {
            "rules": [
                {"glob": "tests/**", "disabled_biomarkers": ["large_method"]},
                {"path_glob": "src/legacy/*", "disabled_biomarkers": ["dry_violation"]},
            ]
        },
    )
    cfg = HealthConfig.load(tmp_path)
    assert [r.path_glob for r in cfg.rules] == ["tests/**", "src/legacy/*"]
    pfd = cfg.per_file_disabled(["tests/unit/test_x.py", "src/legacy/old.py"])
    assert pfd["tests/unit/test_x.py"] == {"large_method"}
    assert pfd["src/legacy/old.py"] == {"dry_violation"}


def test_malformed_file_falls_back_silently(tmp_path: Path):
    repowise = tmp_path / ".repowise"
    repowise.mkdir()