From 8eba77a1d3e675296bfb774347289b1f6ef3e324 Mon Sep 17 00:00:00 2001
From: RaghavChamadiya
Date: Thu, 11 Jun 2026 16:00:59 +0530
Subject: [PATCH 1/2] fix(health): accept glob alias in health-rules and correct stale docs

- config.py: accept "glob" (and keep "path_glob") as aliases for the
  canonical "path" key in .repowise/health-rules.json rules. Both doc
  examples showed "glob", which the parser silently ignored, so configs
  copied from the docs never applied; now they work, and the examples
  use the canonical key with the aliases documented. Test added.
- analysis/health/README.md: "twelve biomarkers" was stale; the registry
  holds 26, plus 3 additive governance findings.
- docs/architecture/code-health.md: add the 6 biomarker files missing
  from the layer file tree; drop the stale "no PR-mode delta in v1"
  non-goal (the change_risk package and repowise risk shipped).
- docs/CODE_HEALTH.md: hotspot health averages over files the git layer
  classifies as hotspots, not a fixed top-25% slice.

No behavior change other than the new key aliases.
---
 docs/CODE_HEALTH.md                                | 11 +++++---
 docs/architecture/code-health.md                   | 25 +++++++++++++------
 .../repowise/core/analysis/health/README.md        | 10 +++++---
 .../repowise/core/analysis/health/config.py        |  5 +++-
 tests/unit/health/test_health_config.py            | 18 +++++++++++++
 5 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/docs/CODE_HEALTH.md b/docs/CODE_HEALTH.md
index 588aa063f..2bd1407aa 100644
--- a/docs/CODE_HEALTH.md
+++ b/docs/CODE_HEALTH.md
@@ -63,7 +63,9 @@ reproduced by `local-stash/calibrate_health_weights.py` and documented in
 
 The final score is clamped to `[1.0, 10.0]`. The three repo-level KPIs:
 
-- **Hotspot Health** — NLOC-weighted average over the top-25 % hotspot files.
+- **Hotspot Health** — NLOC-weighted average over the files classified as
+  hotspots by the git layer (high churn percentile plus minimum-activity
+  floors), not a fixed top-N slice.
 - **Average Health** — NLOC-weighted average over all files.
 - **Worst Performer** — single lowest-scoring file.
 
@@ -318,17 +320,20 @@ Per-file overrides live in `.repowise/health-rules.json`:
   "disabled_biomarkers": ["primitive_obsession"],
   "rules": [
     {
-      "glob": "tests/**/*.py",
+      "path": "tests/**/*.py",
       "disabled_biomarkers": ["large_method", "complex_method"]
     },
     {
-      "glob": "src/legacy/**",
+      "path": "src/legacy/**",
       "disabled_biomarkers": ["dry_violation"]
     }
   ]
 }
 ```
 
+`path` holds an fnmatch-style glob over the repo-relative POSIX path
+(`path_glob` and `glob` are accepted aliases).
+
 ## Incremental updates
 
 `repowise update` only re-scores the changed files. Findings and metrics for
diff --git a/docs/architecture/code-health.md b/docs/architecture/code-health.md
index 44507a025..6faca45b7 100644
--- a/docs/architecture/code-health.md
+++ b/docs/architecture/code-health.md
@@ -93,6 +93,8 @@ analysis/health/
     ├── base.py  # Biomarker Protocol + FileContext + BiomarkerResult
     ├── registry.py  # detector list + detect_all()
     ├── brain_method.py
+    ├── low_cohesion.py
+    ├── god_class.py
     ├── nested_complexity.py
     ├── bumpy_road.py
     ├── complex_method.py
@@ -101,6 +103,7 @@ analysis/health/
     ├── dry_violation.py
     ├── untested_hotspot.py
     ├── coverage_gap.py
+    ├── coverage_gradient.py
     ├── developer_congestion.py
     ├── knowledge_loss.py
     ├── hidden_coupling.py
@@ -111,6 +114,9 @@ analysis/health/
     ├── churn_risk.py
     ├── change_entropy.py
     ├── co_change_scatter.py
+    ├── prior_defect.py
+    ├── large_assertion_block.py
+    ├── duplicated_assertion_block.py
     └── error_handling.py
 ```
 
@@ -749,19 +755,21 @@ User-authored (the **only** JSON file in the layer). Loaded by
   "disabled_biomarkers": ["primitive_obsession"],
   "rules": [
     {
-      "glob": "tests/**/*.py",
+      "path": "tests/**/*.py",
       "disabled_biomarkers": ["large_method", "complex_method"]
     },
     {
-      "glob": "src/legacy/**",
+      "path": "src/legacy/**",
       "disabled_biomarkers": ["dry_violation"]
     }
   ]
 }
 ```
 
-`to_analyzer_config(file_paths)` resolves globs to per-file disabled
-sets, which the engine honors in `_evaluate_file()`.
+`path` is an fnmatch glob over the repo-relative POSIX path; `path_glob`
+and `glob` are accepted aliases. `to_analyzer_config(file_paths)` resolves
+globs to per-file disabled sets, which the engine honors in
+`_evaluate_file()`.
 
 ---
 
@@ -859,10 +867,11 @@ phases may revisit; the constraints kept v1 shippable.
 - **No `complexity_estimate` propagation backfill.** The walker writes
   the field as a side effect during the current run; old indexes don't
   get touched until a re-index.
-- **No PR-mode delta in v1.** `get_risk(changed_files=...)` returns the
-  current health score, not before/after. Phase 5.
-- **No predictive ML.** `Predicted Decline` is a 3-snapshot direction
-  check, not a model. Phase 5.
+- **No predictive ML on trends.** `Predicted Decline` is a 3-snapshot
+  direction check, not a model. (Commit-level change risk is a separate,
+  shipped surface: the `analysis/change_risk/` package behind
+  `repowise risk` scores a commit or base..head range with a calibrated
+  logistic model.)
 
 ---
 
diff --git a/packages/core/src/repowise/core/analysis/health/README.md b/packages/core/src/repowise/core/analysis/health/README.md
index 9c0204cbe..593e0f186 100644
--- a/packages/core/src/repowise/core/analysis/health/README.md
+++ b/packages/core/src/repowise/core/analysis/health/README.md
@@ -1,9 +1,9 @@
 # Code Health analysis layer
 
 Fifth intelligence layer alongside Graph, Git, Docs, and Decisions. Computes a
-per-file health score (1.0–10.0) from twelve deterministic biomarkers, ingests
-test-coverage data, tracks repo-level KPIs over time, and surfaces refactoring
-targets ranked by impact-per-effort.
+per-file health score (1.0–10.0) from twenty-six deterministic biomarkers,
+ingests test-coverage data, tracks repo-level KPIs over time, and surfaces
+refactoring targets ranked by impact-per-effort.
 
 **Zero LLM calls.** Pure Python over tree-sitter + git data. Designed to finish
 in under 30 s on a 3 000-file repo (see `tests/integration/test_health_perf_benchmark.py`).
@@ -87,7 +87,9 @@ expose NLOC-weighted module aggregates and accept `module:foo` targets.
 - `duplication/` — Rabin–Karp over tree-sitter tokens. Co-change correlation
   via `git_meta_map[path]["co_change_partners_json"]`.
 - `biomarkers/` — one detector per file. Implements the `Biomarker`
-  Protocol from `biomarkers/base.py`. Twelve total in v1.
+  Protocol from `biomarkers/base.py`. Twenty-six registered (see
+  `biomarkers/registry.py` and `biomarkers/README.md` for the full list),
+  plus three governance findings written by a separate additive pass.
 
 Each sub-package has its own `README.md` covering inputs, outputs, and extension
 points.
diff --git a/packages/core/src/repowise/core/analysis/health/config.py b/packages/core/src/repowise/core/analysis/health/config.py
index 5f9fc0462..eeeafcb3c 100644
--- a/packages/core/src/repowise/core/analysis/health/config.py
+++ b/packages/core/src/repowise/core/analysis/health/config.py
@@ -80,7 +80,10 @@ def from_dict(cls, raw: object) -> HealthConfig:
         for entry in raw.get("rules") or []:
             if not isinstance(entry, dict):
                 continue
-            glob = entry.get("path") or entry.get("path_glob")
+            # ``path`` is canonical; ``path_glob`` and ``glob`` are accepted
+            # aliases (the docs showed ``glob`` for a while, so a silent
+            # rejection here would invalidate working configs users copied).
+            glob = entry.get("path") or entry.get("path_glob") or entry.get("glob")
             if not isinstance(glob, str) or not glob:
                 continue
             disabled_for = [
diff --git a/tests/unit/health/test_health_config.py b/tests/unit/health/test_health_config.py
index 79f6fee20..a225a043f 100644
--- a/tests/unit/health/test_health_config.py
+++ b/tests/unit/health/test_health_config.py
@@ -75,6 +75,24 @@ def test_to_analyzer_config_shape(tmp_path: Path):
     assert "test/bar.py" not in pfd
 
 
+def test_glob_and_path_glob_aliases_accepted(tmp_path: Path):
+    """``glob`` (shown in older docs) and ``path_glob`` work like ``path``."""
+    _write(
+        tmp_path,
+        {
+            "rules": [
+                {"glob": "tests/**", "disabled_biomarkers": ["large_method"]},
+                {"path_glob": "src/legacy/*", "disabled_biomarkers": ["dry_violation"]},
+            ]
+        },
+    )
+    cfg = HealthConfig.load(tmp_path)
+    assert [r.path_glob for r in cfg.rules] == ["tests/**", "src/legacy/*"]
+    pfd = cfg.per_file_disabled(["tests/unit/test_x.py", "src/legacy/old.py"])
+    assert pfd["tests/unit/test_x.py"] == {"large_method"}
+    assert pfd["src/legacy/old.py"] == {"dry_violation"}
+
+
 def test_malformed_file_falls_back_silently(tmp_path: Path):
     repowise = tmp_path / ".repowise"
     repowise.mkdir()

From 54747630fe614c312e36ce0f0d276a194250e919 Mon Sep 17 00:00:00 2001
From: RaghavChamadiya
Date: Fri, 12 Jun 2026 09:11:53 +0530
Subject: [PATCH 2/2] docs(commercial): security layer v2 statuses

Graph-aware scanning, function-level reachability (per-ecosystem
coverage), VEX export, PCI-DSS and SOC 2 compliance reporting, signed
Slack-compatible security webhooks, and the audit-event stream are now
live on the hosted platform; matrix rows and section 5 prose updated to
match.
---
 docs/COMMERCIAL.md | 84 +++++++++++++++++++++++++++++-----------------
 1 file changed, 53 insertions(+), 31 deletions(-)

diff --git a/docs/COMMERCIAL.md b/docs/COMMERCIAL.md
index d538a2a8a..7b9e53b93 100644
--- a/docs/COMMERCIAL.md
+++ b/docs/COMMERCIAL.md
@@ -119,18 +119,18 @@ the items that matter most to you can be prioritized.
 | Local dashboard (incl. local security pattern scan) | ✅ | ✅ |
 | Auto-sync (hooks, watcher, webhooks) | ✅ | ✅ |
 | Auto-generated CLAUDE.md | ✅ | ✅ |
-| Graph-aware enhanced security scanning | — | ✅ *(dev)* |
+| Graph-aware enhanced security scanning | — | ✅ *(GA on hosted)* |
 | Language-specific security rulesets | — | ✅ *(dev)* |
 | CVE-aware dependency analysis (KEV / EPSS / priority-scored) | — | ✅ *(GA on hosted)* |
 | Usage-aware CVE triage (imports × dead code) | — | ✅ *(GA on hosted)* |
-| Function-level reachability triage | — | ✅ *(planned)* |
+| Function-level reachability triage | — | ✅ *(GA on hosted — per-language coverage)* |
 | Secret detection across full git history | — | ✅ *(GA on hosted)* |
-| SBOM generation (CycloneDX) + diffs | — | ✅ *(GA on hosted)* |
-| Compliance reporting (PCI-DSS / SOC 2) | — | ✅ *(planned)* |
-| Audit trail (in-product + JSON / CSV export) | — | ✅ *(GA on hosted — security surface)* |
+| SBOM generation (CycloneDX) + VEX export + diffs | — | ✅ *(GA on hosted)* |
+| Compliance reporting (PCI-DSS / SOC 2) | — | ✅ *(GA on hosted — Teams)* |
+| Audit trail (in-product + JSON / CSV export + webhook stream) | — | ✅ *(GA on hosted — security surface)* |
 | Jira / Confluence integration | — | ✅ *(rolling out)* |
 | GitHub Enterprise / Azure DevOps / GitLab / Bitbucket | — | ✅ *(rolling out)* |
-| Slack / Teams alerting | — | ✅ *(rolling out)* |
+| Slack / Teams security alerting (signed webhooks) | — | ✅ *(GA on hosted — Teams)* |
 | SAML / OIDC SSO + SCIM | — | ✅ *(rolling out)* |
 | RBAC + multi-tenant | — | ✅ *(planned)* |
 | Air-gapped install bundle | — | ✅ *(planned)* |
@@ -146,13 +146,15 @@ the items that matter most to you can be prioritized.
 
 ### 5.1 Security & Compliance
 
-- **Security scanning layer** *(GA: local pattern scan; dev: graph-aware
-  enrichment)* — pattern-based detection for dangerous APIs (`eval`/`exec`,
-  `pickle.loads`, `shell=True`, `os.system`, hardcoded secrets, concat / f-string
-  SQL, `verify=False`, weak hashes) runs locally today in the dashboard's Security
-  view. Graph-aware enrichment — linking findings to graph nodes and surfacing them
-  through `get_risk` so AI agents see security context before modifying a file — is
-  in development.
+- **Security scanning layer** *(GA: local pattern scan; GA on hosted:
+  graph-aware enrichment)* — pattern-based detection for dangerous APIs
+  (`eval`/`exec`, `pickle.loads`, `shell=True`, `os.system`, hardcoded secrets,
+  concat / f-string SQL, `verify=False`, weak hashes) runs locally today in the
+  dashboard's Security view. On the hosted platform, findings are graph-aware:
+  every vulnerable import site carries hotspot and centrality context from the
+  code graph (feeding a bounded priority bump), and AI agents see security
+  state before modifying a file through the hosted `get_security` MCP tool and
+  the security section `get_risk` attaches.
 - **Language-specific security rulesets** *(dev)* — rulesets built on top of the
   per-language dynamic-hint extractors and framework edges. For .NET, planned
   checks include `[Authorize]` coverage on controllers and Minimal API endpoints,
@@ -172,24 +174,39 @@ the items that matter most to you can be prioritized.
   sites as clickable evidence), cross-referencing the existing parse and
   dead-code layers. Unmappable packages are labeled `unknown` honestly — never
   guessed.
-- **Function-level reachability triage** *(planned)* — the next precision step:
-  classifying CVEs by whether the vulnerable *function* is reachable through the
-  resolved call graph. Precision is language- and pattern-dependent; we report it
-  honestly per language rather than quoting one global number.
-- **SBOM generation** *(available on the hosted platform, Pro+)* — CycloneDX 1.6
-  output per snapshot with per-dependency license detection and license-risk
-  classification, downloadable in-product, plus dependency diffs between any two
-  snapshots. SPDX and cross-format conversion on the extended roadmap.
-- **Compliance reporting** *(planned)* — framework-mapping reports tying findings
-  back to specific files, owners, and decisions. Initial scope: **PCI-DSS** and
-  **SOC 2** control coverage. ISO 27001 Annex A and GDPR / data-residency mappings on
-  the extended roadmap — we'd rather ship two solid mappings than four shallow ones.
+- **Function-level reachability triage** *(available on the hosted platform,
+  Pro+)* — classifies CVEs by whether the advisory's affected packages or
+  symbols are actually imported, crossing OSV symbol data with the per-import
+  names the indexer captures. Coverage is per-ecosystem and reported honestly
+  in-product: Go is import-path-reliable (the Go vulndb lists affected
+  packages per advisory), PyPI / npm / cargo are assessed only when both the
+  advisory and the code name symbols, and other ecosystems stay at
+  package-level triage. Provably-unreachable findings are discounted, never
+  hidden; nothing is ever claimed without evidence on both sides.
+- **SBOM generation + VEX export** *(available on the hosted platform, Pro+)* —
+  CycloneDX 1.6 SBOM per snapshot with per-dependency license detection and
+  license-risk classification, downloadable in-product, plus dependency diffs
+  between any two snapshots — and a CycloneDX 1.6 **VEX** document that maps
+  the platform's triage, reachability, and human status decisions onto the
+  standard impact-analysis states, generated fresh at download so it always
+  reflects current triage. SPDX and cross-format conversion on the extended
+  roadmap.
+- **Compliance reporting** *(available on the hosted platform, Teams+)* —
+  **PCI-DSS 4.0** and **SOC 2** control-coverage reports derived from the live
+  security findings, with per-control evidence drill-ins and JSON / Markdown
+  export. Framed honestly in-product and in every export: coverage signals,
+  not an audit or certification — controls automated findings cannot evidence
+  are marked for manual attestation rather than silently passed. ISO 27001
+  Annex A and GDPR / data-residency mappings on the extended roadmap — we'd
+  rather ship two solid mappings than four shallow ones.
 - **Audit trail** *(available on the hosted platform for the security surface,
   Teams+)* — security reads and actions (scans triggered, vulnerability and
-  secret views, SBOM exports, finding-status changes) logged insert-only with
-  user, IP, and timestamp, queryable in-product and exportable to JSON / CSV.
-  Coverage beyond the security surface (decisions, overrides) is in development;
-  streaming export to SIEM (Splunk / Datadog / Elastic / syslog) on the roadmap.
+  secret views, SBOM / VEX exports, compliance views, finding-status changes,
+  and MCP reads by AI agents) logged insert-only with user, IP, and timestamp,
+  queryable in-product and exportable to JSON / CSV — plus an opt-in
+  SIEM-lite stream that forwards audit events to any HTTPS endpoint as signed
+  webhooks. Coverage beyond the security surface (decisions, overrides) is in
+  development; native Splunk / Datadog / Elastic connectors on the roadmap.
 - **Secret-in-code detection** *(available on the hosted platform, Pro+)* —
   scanning across full git history (not just `HEAD`), with live-at-`HEAD`
   flagging and incremental re-scans. Only a fingerprint and a redacted preview
@@ -211,8 +228,13 @@ integrations beyond this list are available on request.
   PR-comment bot that posts blast-radius and reviewer suggestions, and a
   branch-protection check that blocks merges touching hotspots without a
   reviewer from the ownership list.
-- **Slack & Microsoft Teams** — alerts on hotspot drift, bus-factor warnings,
-  decision staleness, and security findings, routed by ownership.
+- **Slack & Microsoft Teams** — security alerting is available today on the
+  hosted platform (Teams+) as HMAC-signed webhooks with a Slack-compatible
+  format (works with Slack, Microsoft Teams, and Mattermost inbound
+  webhooks): new critical CVEs, live secrets, failed scans, and
+  rotation-overdue reminders, plus the opt-in audit-event stream. Alerts on
+  hotspot drift, bus-factor warnings, and decision staleness are rolling out
+  on the same plumbing, routed by ownership.
 - **SAML / OIDC SSO** — Okta, Entra ID, Auth0, Google Workspace, generic SAML
   2.0.
 - **SCIM provisioning** — automatic user / group lifecycle.
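
Note on PATCH 1/2: the alias lookup added to `config.py` and the per-file resolution described in the docs can be sketched in isolation as below. This is an illustrative sketch, not the repowise implementation. The helper names `rule_glob` and `per_file_disabled` (as a free function) are hypothetical, and the use of `fnmatchcase` is an assumption based on the docs calling the pattern an "fnmatch-style glob"; only the key precedence (`path`, then `path_glob`, then `glob`) is taken from the patched line.

```python
from fnmatch import fnmatchcase


def rule_glob(entry):
    """Return the rule's glob string, or None if the rule should be skipped."""
    if not isinstance(entry, dict):
        return None
    # Same precedence as the patched line: path, then path_glob, then glob.
    glob = entry.get("path") or entry.get("path_glob") or entry.get("glob")
    if not isinstance(glob, str) or not glob:
        return None
    return glob


def per_file_disabled(rules, file_paths):
    """Map each matching path to the union of biomarkers disabled for it.

    Hypothetical stand-in for the real resolution step. fnmatchcase is an
    assumption: it keeps matching case-sensitive and platform-independent.
    """
    disabled = {}
    for entry in rules:
        glob = rule_glob(entry)
        if glob is None:
            continue
        names = [
            name
            for name in entry.get("disabled_biomarkers") or []
            if isinstance(name, str)
        ]
        for path in file_paths:
            if fnmatchcase(path, glob):
                disabled.setdefault(path, set()).update(names)
    return disabled


rules = [
    {"glob": "tests/**", "disabled_biomarkers": ["large_method"]},
    {"path_glob": "src/legacy/*", "disabled_biomarkers": ["dry_violation"]},
    {"pattern": "docs/*", "disabled_biomarkers": ["dry_violation"]},  # unknown key: skipped
]
pfd = per_file_disabled(
    rules, ["tests/unit/test_x.py", "src/legacy/old.py", "src/app.py"]
)
```

Because fnmatch-style `*` also matches `/`, `tests/**` covers nested test files here, which is consistent with the expectations in the new unit test. A rule that uses none of the three keys is skipped without an error in this sketch, matching the `continue` in the patched loop.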