feat: substrate-backed red-team grounding checker#2214

Merged
Aureliolo merged 5 commits into
mainfrom
feat/redteam-grounding-checker
Jun 4, 2026

Conversation

@Aureliolo
Owner

Summary

Adds KnowledgeSubstrateGroundingChecker for the red-team gate's GROUNDING surface. It extracts assertive factual claims from a deliverable, resolves each against the project-scoped knowledge corpus via LLM claim-extraction plus semantic entailment (not bare lexical search), and flags unsupported claims as UngroundedClaim(source="knowledge_substrate", ...) across the [SUBSTRATE_DROP_FLOOR, 1.0] confidence band.

  • Pluggable: protocol + factory + RedTeamConfig.grounding_checker_kind discriminator ("heuristic" default | "knowledge_substrate"). HeuristicGroundingChecker stays the safe default.
  • Escalates correctly: substrate findings map confidence to severity (>=0.85 HIGH, 0.65-0.85 MEDIUM, 0.45-0.65 LOW, <0.45 dropped), capped at HIGH (never CRITICAL). A HIGH finding BLOCKs at every autonomy level; MEDIUM blocks only under LOCKED/SUPERVISED.
  • Lazy resolution: the checker is built before the knowledge substrate wires, so a GroundingSubstrateResolver closure (threaded from workers/_red_team_runtime.py) resolves the live KnowledgeService + provider at check time and degrades to the heuristic when the substrate is absent.
  • Precision-first: empty corpus never flags; entailment biased toward supported/uncertain; per-claim search/entailment failures skip that claim (fail-soft); provider/substrate absence degrades to heuristic rather than blocking on no evidence.
  • Ghost-wiring: ENFORCED line added to scripts/_ghost_wiring_manifest.txt; the construction chain (config -> builder -> factory -> resolver -> boot) is reachable and gate-verified.
  • SEC-1 prompt safety: deliverable/claim wrapped via wrap_untrusted(TAG_TASK_DATA, ...), evidence via TAG_UNTRUSTED_ARTIFACT; the LLM's free-text reason is never surfaced; extracted claims are control-char-stripped and length-capped.

Test plan

  • Unit: precision (supported/uncertain/below-floor/empty-corpus not flagged), escalation (unsupported high-confidence flagged + project-scoped search), degradation (resolver None / knowledge service None / extraction failure -> heuristic), per-claim resilience (search & entailment failures skip only that claim, unparseable verdict skips, multi-survivor), drop-floor boundary (0.45 emits / 0.44 not), source-aware claim_to_finding, routing band boundaries, and the production resolver closure against a fake app_state.
  • Direct parser tests (test_grounding_llm.py): bool-as-numeric guard, confidence clamping, control-char strip, MAX_CLAIMS/dedup caps, tool-refusal paths, and SEC-1 fence-tag presence.
  • Integration (simulation harness): a planted ungrounded claim BLOCKs the gate even when the agent files a clean report; a grounded claim PASSes; a MEDIUM-band claim blocks under SUPERVISED but not FULL autonomy.
  • uv run ruff check, uv run mypy, and all convention gates (ghost-wiring, frozen-model, magic-number, module-size, dependency-inversion, import-linter, no-review-origin, doc-count) pass.

Review coverage

Pre-reviewed by 20 agents (15 specialist reviewers + 5 audit mini-pass). Findings addressed this round: expanded behavioural test coverage (parsers, per-claim failure paths, autonomy-gated routing), documentation/comment accuracy now that the substrate checker ships (design spec, config docs, event-domain inventory, stale "future/stub" docstrings), and observability hardening (build-time degrade logged at WARNING, provider-fallback warning, distinct degrade reasons, extraction-empty / unparseable-verdict / truncation debug events, search_limit guard, resolver try/except). Two behaviour-change suggestions (TaskGroup parallelisation, RetryExhaustedError propagation) were reviewed and declined with reasons: the gate is not a hot path and the degrade-to-heuristic / fail-open contract is intentional.

Closes #2206

@github-actions
Contributor

github-actions Bot commented Jun 3, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@gemini-code-assist
Contributor

Summary of Changes

This pull request adds an opt-in, substrate-backed grounding checker for the red-team gate alongside the existing deterministic heuristic, which remains the default. The new checker uses LLM claim extraction and semantic entailment against a project-scoped knowledge corpus to identify ungrounded claims, and maps their confidence to a severity of up to HIGH. It degrades to the heuristic when substrate services are unavailable, so the gate keeps operating, and the change adds observability events and tests for the new path.

Highlights

  • New Grounding Checker: Added the KnowledgeSubstrateGroundingChecker to perform semantic entailment checks against a project-scoped knowledge corpus.
  • Pluggable Architecture: Implemented a pluggable grounding checker system that allows switching between heuristic and substrate-backed implementations via configuration.
  • Severity Escalation: Configured dynamic severity escalation for findings based on confidence scores, allowing substrate findings to block deliverables.
  • Lazy Dependency Resolution: Introduced a lazy resolver for substrate services to ensure compatibility with the boot-time construction of the red-team runtime.
  • Resilience and Safety: Added fail-soft mechanisms that degrade to the heuristic checker if substrate services are unavailable, alongside prompt safety hardening.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview June 3, 2026 22:07 — with GitHub Actions Inactive
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a substrate-backed grounding checker (KnowledgeSubstrateGroundingChecker) that uses LLM-based claim extraction and semantic entailment to verify deliverable claims against a project-scoped knowledge corpus. It also introduces a lazy dependency resolver to handle late-wired services, updates routing logic to map claim confidence to finding severity, and adds comprehensive tests. The review feedback recommends adding defensive checks to prevent potential runtime crashes: specifically, guarding against a TypeError when unpacking tool call arguments in the LLM helper, and preventing an IndexError in the worker runtime if the provider registry is empty.

Comment on lines +274 to +276
    for call in response.tool_calls:
        if call.name == tool_name:
            return {**call.arguments}
Contributor

medium

Unpacking call.arguments using {**call.arguments} can raise a TypeError if call.arguments is None or not a dictionary. Since LLM tool calls can occasionally return malformed or empty arguments under certain provider/parsing failures, it is safer to defensively check if call.arguments is a dictionary before converting it.

Suggested change
    - for call in response.tool_calls:
    -     if call.name == tool_name:
    -         return {**call.arguments}
    + for call in response.tool_calls:
    +     if call.name == tool_name:
    +         return dict(call.arguments) if isinstance(call.arguments, dict) else None

Owner Author

Not applying this one: ToolCall.arguments is typed dict[str, JsonValue] with default_factory=dict on a frozen Pydantic model (extra="forbid"), so it is guaranteed to be a dict at runtime — {**call.arguments} cannot raise TypeError. A malformed provider payload fails validation at ToolCall construction, upstream of this helper. The else None branch would also be statically unreachable, which trips our warn_unreachable = true mypy gate. Leaving as-is.

Comment on lines +73 to +77
    available = registry.list_providers()
    if provider_name in available:
        name = provider_name
    else:
        name = available[0]
Contributor

medium

If registry.list_providers() returns an empty list, accessing available[0] will raise an IndexError. Adding a defensive check to ensure available is not empty before indexing prevents potential runtime crashes.

        available = registry.list_providers()
        if not available:
            return None
        if provider_name in available:
            name = provider_name
        else:
            name = available[0]

Owner Author

Not applying this one: available[0] cannot IndexError here. Line 71 already returns early on registry is None or len(registry) == 0, and ProviderRegistry.__len__ and list_providers() both derive from the same self._drivers (consistent cardinality, no await between them), so available is non-empty by the time available[0] runs. An explicit if not available: return None guard would be statically unreachable, which trips our warn_unreachable = true mypy gate. Leaving as-is.

@coderabbitai
Contributor

coderabbitai Bot commented Jun 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 313d86e7-9986-40fd-b30e-d31349b06591

📥 Commits

Reviewing files that changed from the base of the PR and between f4548d0 and 48b01af.

📒 Files selected for processing (4)
  • data/codebase_map.json
  • data/feature_index.json
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/workers/_red_team_runtime.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: Build Backend
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test Integration (shard 2)
  • GitHub Check: Test Integration (shard 4)
  • GitHub Check: Test Integration (shard 1)
  • GitHub Check: Test Integration (shard 3)
  • GitHub Check: Test Unit (shard 4)
  • GitHub Check: Test Unit (shard 3)
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit (shard 1)
  • GitHub Check: Test Unit (shard 2)
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Runtime Stats Freshness Gate
  • GitHub Check: Lighthouse Site
  • GitHub Check: pyright (advisory)
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Non-persistence modules must not import sqlite/psycopg or emit raw SQL; use repository protocols from persistence/_generics.py

Files:

  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/gate.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Use Configuration Precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env; no os.environ.get outside startup
Numerics must live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants; enforced by scripts/check_no_magic_numbers.py
Module-size budget per # module-kind: header: controller 400, service/orchestrator 600, complex_service 1100, repository 500, adapter/integration 700, feature 100, code 500, tests 800, declarative exempt, generated glob-exempt; enforced by check_module_size_budget.py and check_no_growth_in_god_modules.py
Use declarative .importlinter contracts (forbidden-only, direct-imports, blessed back-edges; NO total-order layers); enforced by lint-imports (pre-push + CI) alongside 3 custom AST gates
Graph-level smells (fan-in >=30, LCOM4, budget-pressure within 20% of tier cap) gated by check_architecture_drift.py vs data/architecture_report.json; regenerate via scripts/architecture_report.py
No from __future__ import annotations (3.14 has PEP 649); use PEP 758 syntax for exception handling; type-only imports at module level (not in if TYPE_CHECKING: except for genuine import-cycle breakers)
Type hints required on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines
Define errors as <Domain><Condition>Error inheriting from DomainError, never inherit Exception/RuntimeError directly; enforced by check_domain_error_hierarchy.py
Use Pydantic v2 frozen + extra="forbid" on every frozen model in src/synthorg/ AND tests/; gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt; NotBlankStr for identifiers
Use model_copy(update=...) or copy.deepcopy() for immutability; deepcopy at system boundaries
Use asyncio.TaskGroup fo...

Files:

  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/gate.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/gate.py
🧠 Learnings (16)
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/api/**/*.py : Runtime services: use ONE provider-present switch returning `RuntimeServices` pair (worker execution service + multi-agent coordinator) from shared boot `AgentEngine`; empty company rejects task creation (AgentRuntimeNotConfiguredError, 4014) and 503s on /coordinate; `swap_*` functions hold locks

Applied to files:

  • src/synthorg/workers/_red_team_runtime.py
📚 Learning: 2026-04-03T11:58:31.103Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T11:58:31.103Z
Learning: Applies to src/synthorg/providers/**/*.py : Project uses src/synthorg/providers/ for LLM provider abstraction, presets, model auto-discovery, capabilities, runtime CRUD, provider families, discovery SSRF allowlist, health tracking, active health probing, routing/

Applied to files:

  • src/synthorg/workers/_red_team_runtime.py
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/gate.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/gate.py
📚 Learning: 2026-05-29T08:50:58.380Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2160
File: src/synthorg/persistence/sqlite/escalation_repo.py:370-370
Timestamp: 2026-05-29T08:50:58.380Z
Learning: In this repo, Ruff flake8-unused-arguments (ARG002) already suppresses unused-argument warnings on parameters of methods decorated with `override` (from `typing`). Therefore, if you see `# noqa: ARG002` (or equivalent) on parameters of an `override`-decorated method, treat it as stale/unused and remove it. Do not recommend re-adding `# noqa: ARG002` in these cases, because Ruff will flag the redundant directive (RUF100) and fail the Ruff CI gate.

Applied to files:

  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/gate.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/gate.py
📚 Learning: 2026-05-31T18:00:32.445Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2180
File: src/synthorg/engine/intervention/models.py:182-203
Timestamp: 2026-05-31T18:00:32.445Z
Learning: In this repository, `NotBlankStr` is a Pydantic `Annotated[str, ...]` type alias (defined in `synthorg/core/types.py`). At runtime, calling `NotBlankStr(value)` acts like an identity/cast to `str(value)` and does not execute the `StringConstraints` or `AfterValidator(...)`. Therefore, during code review, do not treat `NotBlankStr(x)` used inside non-Pydantic model methods as a place that would raise `ValidationError`; it won’t. Similarly, when `tuple[NotBlankStr, ...]` values are involved, `NotBlankStr` erases to `str` at runtime, so membership tests/comparisons can be done with raw `str` values.

Applied to files:

  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/gate.py
📚 Learning: 2026-06-03T11:43:13.104Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2200
File: tests/unit/engine/artifacts/test_service.py:42-45
Timestamp: 2026-06-03T11:43:13.104Z
Learning: For the D7 protocol method `save_returning_outcome(artifact: Artifact) -> bool` defined in `src/synthorg/persistence/artifact_protocol.py`, any implementation—including fake/stub/test doubles—must use the exact same parameter name `artifact` (i.e., `save_returning_outcome(self, artifact=...)` / `save_returning_outcome(self, artifact: Artifact)`), not `entity`. This name must match for typeguard positional-or-keyword name conformance. Do not suggest renaming the protocol method’s parameter to `entity`.

Applied to files:

  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/gate.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/security/**/*.py : Security package (security/): SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)

Applied to files:

  • src/synthorg/security/redteam/gate.py
📚 Learning: 2026-05-31T17:59:40.191Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2180
File: data/codebase_map.json:523-524
Timestamp: 2026-05-31T17:59:40.191Z
Learning: In this repository, `data/codebase_map.json` is a generated artifact regenerated by `scripts/generate_feature_index.py`; it mirrors source/module inventory and should not be hand-edited as the source of truth for module size fixes.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-02T05:42:22.269Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2196
File: data/codebase_map.json:14985-14988
Timestamp: 2026-06-02T05:42:22.269Z
Learning: For `src/synthorg/providers/drivers/litellm_driver.py`, module-size reviews must account for the approved baseline in `scripts/_module_size_baseline.json`. The file is an approved baselined offender at 740 LOC, so a PR that keeps it below that baseline or net-shrinks it can pass ADR-0006 even if `data/codebase_map.json` shows `loc` above the generic `loc_cap` of 700.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-02T05:42:24.286Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2196
File: data/codebase_map.json:14929-14931
Timestamp: 2026-06-02T05:42:24.286Z
Learning: For module-size reviews in this repository, `src/synthorg/providers/cassette/provider.py` is an approved existing offender with a baseline of 516 LOC in `scripts/_module_size_baseline.json`; under ADR-0006, PRs are acceptable when this file does not grow past that baseline and preferably net-shrinks, even if `data/codebase_map.json` shows `loc` above the nominal `loc_cap` of 500.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-05-31T17:59:40.191Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2180
File: data/codebase_map.json:523-524
Timestamp: 2026-05-31T17:59:40.191Z
Learning: In this repository, module-size enforcement is performed by `check_module_size_budget.py` and `check_no_growth_in_god_modules.py`, which honor pre-existing offenders baselined in `scripts/_module_size_baseline.json`; raw `loc`/`loc_cap` pairs in generated `data/codebase_map.json` do not encode those baseline exemptions and can over-flag generated-map refreshes.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/**/*.py : Comments WHY only; no reviewer citations / issue back-refs / migration framing; enforced by `check_no_review_origin_in_code.py` + `check_no_migration_framing.py`

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to {pyproject.toml,src/synthorg/__init__.py} : Update version in `pyproject.toml` (`[tool.commitizen].version`) and `src/synthorg/__init__.py` (`__version__`)

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.290Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.290Z
Learning: Applies to src/synthorg/**/*.py : Module-size budget per `# module-kind:` header: `controller` 400, `service`/`orchestrator` 600, `complex_service` 1100, `repository` 500, `adapter`/`integration` 700, `feature` 100, `code` 500, `tests` 800, `declarative` exempt, `generated` glob-exempt; enforced by `check_module_size_budget.py` and `check_no_growth_in_god_modules.py`

Applied to files:

  • data/codebase_map.json
🔇 Additional comments (4)
src/synthorg/security/redteam/gate.py (1)

210-210: LGTM!

Also applies to: 367-371

src/synthorg/workers/_red_team_runtime.py (1)

71-75: LGTM!

data/feature_index.json (1)

1229-1229: LGTM!

data/codebase_map.json (1)

18667-18667: LGTM!


Walkthrough

This PR adds a knowledge-substrate grounding checker and supporting plumbing: a lazy GroundingSubstrateResolver and context model, LLM extraction and entailment helpers, KnowledgeSubstrateGroundingChecker that searches a project-scoped corpus and flags unsupported claims above a drop-floor, confidence→severity mapping, factory swap with heuristic fallback, runtime wiring of the resolver, gate integration that merges grounding findings with agent findings, expanded observability events and ghost-wiring manifest, documentation updates, and comprehensive unit/integration tests exercising precision, escalation, degradation, and per-claim resilience.

Suggested labels

prio:medium, type:tech-debt

@codspeed-hq

codspeed-hq Bot commented Jun 3, 2026

Merging this PR will not alter performance

✅ 33 untouched benchmarks
⏩ 21 skipped benchmarks (see footnote 1)


Comparing feat/redteam-grounding-checker (48b01af) with main (02ae05c)

Footnotes

  1. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@coderabbitai coderabbitai Bot added prio:medium Should do, but not blocking type:tech-debt labels Jun 3, 2026
@codecov

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 97.47899% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.87%. Comparing base (02ae05c) to head (48b01af).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...c/synthorg/security/redteam/grounding/substrate.py | 92.95% | 5 Missing ⚠️ |
| src/synthorg/security/redteam/builder.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2214      +/-   ##
==========================================
+ Coverage   88.85%   88.87%   +0.01%     
==========================================
  Files        2584     2587       +3     
  Lines      138757   138980     +223     
==========================================
+ Hits       123295   123512     +217     
- Misses      15462    15468       +6     

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/security/redteam/_grounding_findings.py`:
- Around line 81-85: The current branch for claim.source ==
"knowledge_substrate" forces a non-None severity by using "or
HEURISTIC_GROUNDING_MAX_SEVERITY", which converts below-floor substrate
severities into LOW; change this to return the raw
substrate_severity_for_confidence(claim.confidence) (i.e., remove the "or
HEURISTIC_GROUNDING_MAX_SEVERITY" fallback) and ensure callers that build
RedTeamFinding objects (or the code path that aggregates findings) explicitly
filter out None severities before constructing RedTeamFinding so below-floor
substrate claims are dropped.

In `@src/synthorg/security/redteam/grounding/substrate.py`:
- Around line 177-184: Change serial awaits in
KnowledgeSubstrateGroundingChecker.check to fan out per-claim evaluations using
an asyncio.TaskGroup bounded to MAX_CLAIMS: create an asyncio.TaskGroup, use an
asyncio.Semaphore(MAX_CLAIMS) (or an async-limited task starter) so at most
MAX_CLAIMS tasks run concurrently, schedule _evaluate_claim(context, claim,
execution_id, project_id) for each claim as a task, have each task catch and
swallow all exceptions except asyncio.CancelledError (preserving the current
fail-soft behavior), collect task results when the group finishes, filter out
None results and return them as a tuple (replacing the current flagged list
append loop). Ensure the unique symbols referenced are
KnowledgeSubstrateGroundingChecker.check, _evaluate_claim, and MAX_CLAIMS.

In `@tests/unit/security/redteam/test_grounding_substrate.py`:
- Around lines 118-121: The _checker helper currently hardcodes the production
heuristic fallback, which couples the tests to heuristic logic that may change.
Change its signature to _checker(context: GroundingSubstrateContext | None,
fallback: GroundingChecker | None = None) -> KnowledgeSubstrateGroundingChecker
and pass the argument through as
KnowledgeSubstrateGroundingChecker(resolver=lambda: context, fallback=fallback
or existingDefaultFallback), where existingDefaultFallback is the fallback that
is hardcoded today. Tests can then inject a deterministic stub (e.g.
_StubFallbackChecker) and assert exact tuples on the degradation paths (such as
the expectations around _NUMERIC_DELIVERABLE) instead of relying on the
production heuristic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 0845d1b8-3f4f-4aff-8c51-31e8f3040699

📥 Commits

Reviewing files that changed from the base of the PR and between b010264 and 5b9d4c9.

📒 Files selected for processing (32)
  • data/codebase_map.json
  • data/feature_index.json
  • docs/design/security.md
  • docs/reference/conventions.md
  • scripts/_ghost_wiring_manifest.txt
  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/config.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/workers/_red_team_runtime.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • tests/unit/security/redteam/test_models.py
  • tests/unit/security/redteam/test_routing.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test Integration (shard 4)
  • GitHub Check: Test Integration (shard 3)
  • GitHub Check: Test Integration (shard 1)
  • GitHub Check: Test Integration (shard 2)
  • GitHub Check: Lighthouse Site
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture / nested containers, mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200 (Dark Mauve), D2 CLI pinned to v0.7.1 in CI

Files:

  • docs/reference/conventions.md
  • docs/design/security.md
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Non-persistence modules must not import sqlite/psycopg or emit raw SQL; use repository protocols from persistence/_generics.py

Files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/grounding/_llm.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Use Configuration Precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env; no os.environ.get outside startup
Numerics must live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants; enforced by scripts/check_no_magic_numbers.py
Module-size budget per # module-kind: header: controller 400, service/orchestrator 600, complex_service 1100, repository 500, adapter/integration 700, feature 100, code 500, tests 800, declarative exempt, generated glob-exempt; enforced by check_module_size_budget.py and check_no_growth_in_god_modules.py
Use declarative .importlinter contracts (forbidden-only, direct-imports, blessed back-edges; NO total-order layers); enforced by lint-imports (pre-push + CI) alongside 3 custom AST gates
Graph-level smells (fan-in >=30, LCOM4, budget-pressure within 20% of tier cap) gated by check_architecture_drift.py vs data/architecture_report.json; regenerate via scripts/architecture_report.py
No from __future__ import annotations (3.14 has PEP 649); use PEP 758 syntax for exception handling; type-only imports at module level (not in if TYPE_CHECKING: except for genuine import-cycle breakers)
Type hints required on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines
Define errors as <Domain><Condition>Error inheriting from DomainError, never inherit Exception/RuntimeError directly; enforced by check_domain_error_hierarchy.py
Use Pydantic v2 frozen + extra="forbid" on every frozen model in src/synthorg/ AND tests/; gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt; NotBlankStr for identifiers
Use model_copy(update=...) or copy.deepcopy() for immutability; deepcopy at system boundaries
Use asyncio.TaskGroup fo...

Files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/grounding/_llm.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/grounding/_llm.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
Test doubles: use ladder in conventions.md section 12.1; FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline)
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
API test client: HTTP tests use async_test_client fixture (LoopAsyncClient, portal-free); WebSocket tests use sync ws_test_client (TestClient); Windows socket.socketpair retry wrapper in tests/conftest.py is PERMANENT guard for CPython 122797
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)); never skip/xfail flaky tests, fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_routing.py
  • tests/unit/security/redteam/test_models.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_routing.py
  • tests/unit/security/redteam/test_models.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
src/synthorg/observability/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Telemetry: opt-in, off by default; every event property must be in _ALLOWED_PROPERTIES per telemetry.md

Files:

  • src/synthorg/observability/events/red_team.py
🧠 Learnings (38)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/security/**/*.py : Security package (security/): SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)

Applied to files:

  • docs/reference/conventions.md
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/gate.py
  • data/codebase_map.json
  • scripts/_ghost_wiring_manifest.txt
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/security/**/*.py : Security module includes SecOps agent, rule engine (soft-allow/hard-deny), audit log, output scanner, risk classifier, autonomy levels (4 strategies), timeout policies.

Applied to files:

  • docs/reference/conventions.md
  • src/synthorg/security/redteam/__init__.py
  • data/codebase_map.json
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).

Applied to files:

  • docs/reference/conventions.md
  • src/synthorg/security/feature.py
  • scripts/_ghost_wiring_manifest.txt
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • docs/reference/conventions.md
  • data/codebase_map.json
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • docs/reference/conventions.md
  • docs/design/security.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

  • docs/reference/conventions.md
  • docs/design/security.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

  • docs/reference/conventions.md
  • docs/design/security.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

  • docs/reference/conventions.md
  • docs/design/security.md
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/config.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_routing.py
  • tests/unit/security/redteam/test_models.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/config.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_routing.py
  • tests/unit/security/redteam/test_models.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
📚 Learning: 2026-05-29T08:50:58.380Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2160
File: src/synthorg/persistence/sqlite/escalation_repo.py:370-370
Timestamp: 2026-05-29T08:50:58.380Z
Learning: In this repo, Ruff flake8-unused-arguments (ARG002) already suppresses unused-argument warnings on parameters of methods decorated with `override` (from `typing`). Therefore, if you see `# noqa: ARG002` (or equivalent) on parameters of an `override`-decorated method, treat it as stale/unused and remove it. Do not recommend re-adding `# noqa: ARG002` in these cases, because Ruff will flag the redundant directive (RUF100) and fail the Ruff CI gate.

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/config.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_routing.py
  • tests/unit/security/redteam/test_models.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/grounding/_llm.py
📚 Learning: 2026-05-31T18:00:32.445Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2180
File: src/synthorg/engine/intervention/models.py:182-203
Timestamp: 2026-05-31T18:00:32.445Z
Learning: In this repository, `NotBlankStr` is a Pydantic `Annotated[str, ...]` type alias (defined in `synthorg/core/types.py`). At runtime, calling `NotBlankStr(value)` acts like an identity/cast to `str(value)` and does not execute the `StringConstraints` or `AfterValidator(...)`. Therefore, during code review, do not treat `NotBlankStr(x)` used inside non-Pydantic model methods as a place that would raise `ValidationError`; it won’t. Similarly, when `tuple[NotBlankStr, ...]` values are involved, `NotBlankStr` erases to `str` at runtime, so membership tests/comparisons can be done with raw `str` values.

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/grounding/_llm.py
📚 Learning: 2026-06-03T11:43:13.104Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2200
File: tests/unit/engine/artifacts/test_service.py:42-45
Timestamp: 2026-06-03T11:43:13.104Z
Learning: For the D7 protocol method `save_returning_outcome(artifact: Artifact) -> bool` defined in `src/synthorg/persistence/artifact_protocol.py`, any implementation—including fake/stub/test doubles—must use the exact same parameter name `artifact` (i.e., `save_returning_outcome(self, artifact=...)` / `save_returning_outcome(self, artifact: Artifact)`), not `entity`. This name must match for typeguard positional-or-keyword name conformance. Do not suggest renaming the protocol method’s parameter to `entity`.

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/config.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_routing.py
  • tests/unit/security/redteam/test_models.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/**/*.py : No `from __future__ import annotations` (3.14 has PEP 649); use PEP 758 syntax for exception handling; type-only imports at module level (not in `if TYPE_CHECKING:` except for genuine import-cycle breakers)

Applied to files:

  • src/synthorg/security/redteam/__init__.py
📚 Learning: 2026-05-23T12:24:00.128Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2080
File: tests/_shared/test_postgres_proxy.py:19-48
Timestamp: 2026-05-23T12:24:00.128Z
Learning: When creating test doubles for Python typing.Protocols in tests, prefer a hand-written Protocol fake (a concrete class that explicitly implements the Protocol) over `mock_of[T]` if the Protocol only defines annotation-only attributes (e.g., `username: str`, `password: str`, `dbname: str`) with no class-level values/assignments. This is because `mock_of[T]` relies on `create_autospec(..., spec_set=True)`, which enumerates members via `dir(spec)`; annotation-only attributes are not included, so `mock_of`’s kwarg-based attribute setting can raise `AttributeError: attribute not present on spec type`. In that annotation-only case, don’t recommend `mock_of[T]`—use an explicit fake class instead.

Applied to files:

  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_routing.py
  • tests/unit/security/redteam/test_models.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
📚 Learning: 2026-05-19T22:00:55.710Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 0
File: :0-0
Timestamp: 2026-05-19T22:00:55.710Z
Learning: In this codebase, the established SEC-1 pattern for prompt-injection protection is "wrap-at-LLM-prompt-site, not wrap-at-ingestion." `wrap_untrusted(tag: str, content: str) -> str` (defined in `src/synthorg/engine/prompt_safety.py:126`) is always called at prompt-construction time with a `TAG_*` constant as the first argument. Wrapping at ingestion/adapter boundaries (e.g., inside `IntakeEntryAdapter._build_raw_intent`) would cause double-wrapping because downstream LLM sites (`agent_intake.py:203–207`, `llm_prompt.py:202`) already fence the same field. Never suggest adding `wrap_untrusted` at ingestion boundaries in this project.

Applied to files:

  • tests/unit/security/redteam/test_grounding_llm.py
📚 Learning: 2026-05-19T22:00:55.710Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 0
File: :0-0
Timestamp: 2026-05-19T22:00:55.710Z
Learning: In Aureliolo/synthorg, the SEC-1 untrusted-content protection pattern is "wrap-at-LLM-prompt-site, never at ingestion boundary." `wrap_untrusted` (defined in `src/synthorg/engine/prompt_safety.py:126` as `wrap_untrusted(tag: str, content: str) -> str`) is always called at the prompt-construction site with a `TAG_*` constant as the first argument. Both LLM-reachable sites downstream of `WorkItem.raw_intent` — `src/synthorg/engine/intake/strategies/agent_intake.py:203–207` and `src/synthorg/engine/decomposition/llm_prompt.py:202` — already wrap the field. Adding `wrap_untrusted` at ingestion adapters like `IntakeEntryAdapter._build_raw_intent` would double-wrap content (nested fences) and is explicitly incorrect for this codebase. Never suggest wrapping at ingestion boundaries in this project.

Applied to files:

  • tests/unit/security/redteam/test_grounding_llm.py
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to tests/unit/api/conftest.py : Boundary `suppress_type_checks` on `api.app.create_app` at `tests/unit/api/conftest.py` due to import cycles; typeguard is a pure test dep

Applied to files:

  • tests/unit/security/redteam/test_grounding_factory.py
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/api/**/*.py : Runtime services: use ONE provider-present switch returning `RuntimeServices` pair (worker execution service + multi-agent coordinator) from shared boot `AgentEngine`; empty company rejects task creation (AgentRuntimeNotConfiguredError, 4014) and 503s on /coordinate; `swap_*` functions hold locks

Applied to files:

  • src/synthorg/workers/_red_team_runtime.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/security/redteam/models.py
📚 Learning: 2026-05-31T17:59:40.191Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2180
File: data/codebase_map.json:523-524
Timestamp: 2026-05-31T17:59:40.191Z
Learning: In this repository, `data/codebase_map.json` is a generated artifact regenerated by `scripts/generate_feature_index.py`; it mirrors source/module inventory and should not be hand-edited as the source of truth for module size fixes.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-02T05:42:22.269Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2196
File: data/codebase_map.json:14985-14988
Timestamp: 2026-06-02T05:42:22.269Z
Learning: For `src/synthorg/providers/drivers/litellm_driver.py`, module-size reviews must account for the approved baseline in `scripts/_module_size_baseline.json`. The file is an approved baselined offender at 740 LOC, so a PR that keeps it below that baseline or net-shrinks it can pass ADR-0006 even if `data/codebase_map.json` shows `loc` above the generic `loc_cap` of 700.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-02T05:42:24.286Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2196
File: data/codebase_map.json:14929-14931
Timestamp: 2026-06-02T05:42:24.286Z
Learning: For module-size reviews in this repository, `src/synthorg/providers/cassette/provider.py` is an approved existing offender with a baseline of 516 LOC in `scripts/_module_size_baseline.json`; under ADR-0006, PRs are acceptable when this file does not grow past that baseline and preferably net-shrinks, even if `data/codebase_map.json` shows `loc` above the nominal `loc_cap` of 500.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-05-31T17:59:40.191Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2180
File: data/codebase_map.json:523-524
Timestamp: 2026-05-31T17:59:40.191Z
Learning: In this repository, module-size enforcement is performed by `check_module_size_budget.py` and `check_no_growth_in_god_modules.py`, which honor pre-existing offenders baselined in `scripts/_module_size_baseline.json`; raw `loc`/`loc_cap` pairs in generated `data/codebase_map.json` do not encode those baseline exemptions and can over-flag generated-map refreshes.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/**/*.py : Comments WHY only; no reviewer citations / issue back-refs / migration framing; enforced by `check_no_review_origin_in_code.py` + `check_no_migration_framing.py`

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to {pyproject.toml,src/synthorg/__init__.py} : Update version in `pyproject.toml` (`[tool.commitizen].version`) and `src/synthorg/__init__.py` (`__version__`)

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/**/*.py : Graph-level smells (fan-in >=30, LCOM4, budget-pressure within 20% of tier cap) gated by `check_architecture_drift.py` vs `data/architecture_report.json`; regenerate via `scripts/architecture_report.py`

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-05-21T22:55:24.387Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:24.387Z
Learning: In the Aureliolo/synthorg repo (Python 3.14+), `scripts/check_no_magic_numbers.py` explicitly excludes raw call-site arguments from its numeric-literal scan — it only checks module-level assignments and function/method parameter defaults. Therefore, numeric literals used as arguments to Pydantic `Field()` calls (e.g., `Field(ge=0, le=100)`, `Field(ge=1, le=50)`) are not flagged by the magic-number gate and should NOT be flagged in code review. This is an established idiom across the repo for Pydantic validator bounds.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/observability/**/*.py : Telemetry: opt-in, off by default; every event property must be in `_ALLOWED_PROPERTIES` per telemetry.md

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/**/*.py : Type hints required on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.290Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.290Z
Learning: Applies to src/synthorg/**/*.py : Module-size budget per `# module-kind:` header: `controller` 400, `service`/`orchestrator` 600, `complex_service` 1100, `repository` 500, `adapter`/`integration` 700, `feature` 100, `code` 500, `tests` 800, `declarative` exempt, `generated` glob-exempt; enforced by `check_module_size_budget.py` and `check_no_growth_in_god_modules.py`

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-04-03T11:58:31.103Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T11:58:31.103Z
Learning: Applies to src/synthorg/core/**/*.py : Project uses src/synthorg/core/ for shared domain models, base classes, resilience config

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-04-03T11:58:31.103Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T11:58:31.103Z
Learning: Applies to src/synthorg/engine/**/*.py : Project uses src/synthorg/engine/ for orchestration, execution loops, task engine, coordination, checkpoint recovery, approval/review gates, stagnation detection, context budget, compaction, hybrid loop, workspace/, workflow/

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-04-03T11:58:31.103Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T11:58:31.103Z
Learning: Applies to src/synthorg/providers/**/*.py : Project uses src/synthorg/providers/ for LLM provider abstraction, presets, model auto-discovery, capabilities, runtime CRUD, provider families, discovery SSRF allowlist, health tracking, active health probing, routing/

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Applied to files:

  • scripts/_ghost_wiring_manifest.txt
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/api/**/*.py : API startup has two phases: construction (wire synchronous services) then on_startup (wire services needing persistence); follow ordering invariants for `agent_registry`, `auto_wire_meetings`, `tunnel_provider`, `SettingsService`, `WorkflowExecutionObserver`, `OntologyService`, cost-dial services, knowledge substrate, `EnvironmentService`, mid-flight steering (INBOX vs SERVICE split), and red-team report repo

Applied to files:

  • scripts/_ghost_wiring_manifest.txt
🔇 Additional comments (17)
data/codebase_map.json (1)

6067-6067: LGTM!

Also applies to: 12549-12549, 16014-16014, 16098-16098, 16105-16105, 16119-16119, 16133-16133, 16140-16140, 16147-16150, 16154-16154, 16161-16161, 16168-16168, 16175-16175, 16181-16185, 16188-16192, 16196-16196, 16224-16224, 18667-18667

src/synthorg/security/redteam/grounding/_llm.py (1)

253-259: ⚡ Quick win

Rejecting non-finite confidences here isn’t necessary—NaN/±inf are dropped earlier.

parse_grounding_verdict() clamps raw_confidence, but the provider tool-call ingestion path already rejects non-finite floats: src/synthorg/providers/drivers/mappers.py:_parse_arguments() and src/synthorg/providers/drivers/litellm_tool_accumulator.py:_ToolCallAccumulator.build() both require json.dumps(..., allow_nan=False) to succeed, and UngroundedClaim also has allow_inf_nan=False. As a result, float("nan")/±inf should not reach this clamp, so the proposed “turn malformed into valid confidence” correctness issue doesn’t apply. [optional] Add a math.isfinite() check only if you want defense-in-depth against any non-standard tool-call construction paths.

> Likely an incorrect or invalid review comment.
src/synthorg/security/redteam/__init__.py (1)

45-49: LGTM!

Also applies to: 81-86

src/synthorg/security/redteam/grounding/__init__.py (1)

8-13: LGTM!

Also applies to: 16-19, 23-29, 33-37

src/synthorg/security/feature.py (1)

36-36: LGTM!

src/synthorg/observability/events/red_team.py (1)

30-31: LGTM!

Also applies to: 43-49, 51-91

docs/design/security.md (1)

739-753: LGTM!

Also applies to: 762-768, 787-793

docs/reference/conventions.md (1)

423-423: LGTM!

scripts/_ghost_wiring_manifest.txt (1)

73-73: LGTM!

data/feature_index.json (1)

1094-1094: LGTM!

Also applies to: 1229-1229

tests/integration/security/redteam/test_gate_substrate_grounding.py (1)

1-262: LGTM!

tests/unit/security/redteam/test_grounding_factory.py (1)

6-60: LGTM!

tests/unit/security/redteam/test_grounding_findings.py (1)

1-79: LGTM!

tests/unit/security/redteam/test_grounding_llm.py (1)

1-215: LGTM!

tests/unit/security/redteam/test_models.py (1)

209-231: LGTM!

tests/unit/security/redteam/test_routing.py (1)

16-24: LGTM!

Also applies to: 161-199

tests/unit/workers/test_red_team_runtime_resolver.py (1)

1-137: LGTM!

Comment thread src/synthorg/security/redteam/_grounding_findings.py Outdated
Comment on lines +177 to +184
flagged: list[UngroundedClaim] = []
for claim in claims:
    ungrounded = await self._evaluate_claim(
        context, claim, execution_id, project_id
    )
    if ungrounded is not None:
        flagged.append(ungrounded)
return tuple(flagged)

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/synthorg/security/redteam/grounding/substrate.py"

# Show surrounding lines around 140-240, including loop and any TaskGroup usage
nl -ba "$FILE" | sed -n '120,240p'

# Search for TaskGroup usage in file
rg -n "TaskGroup" "$FILE" || true

# Show definitions of _evaluate_claim and check methods to see semantics and error handling
rg -n "def check|async def check|_evaluate_claim" "$FILE"

Repository: Aureliolo/synthorg

Length of output: 105


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/synthorg/security/redteam/grounding/substrate.py"

echo "== File context 120-240 (with line numbers via awk) =="
awk 'NR>=120 && NR<=240 {printf "%d\t%s\n", NR, $0}' "$FILE"

echo
echo "== TaskGroup usage =="
rg -n "TaskGroup" "$FILE" || true

echo
echo "== check / _evaluate_claim definitions =="
rg -n "def check|async def check|_evaluate_claim" "$FILE"

echo
echo "== _evaluate_claim body (best-effort range) =="
# Print a smaller window around any _evaluate_claim match
rg -n "_evaluate_claim" "$FILE" | while IFS=: read -r ln _; do
  start=$((ln-40)); end=$((ln+120));
  echo "--- around line $ln ($start-$end) ---"
  awk -v s="$start" -v e="$end" 'NR>=s && NR<=e {printf "%d\t%s\n", NR, $0}' "$FILE"
done

Repository: Aureliolo/synthorg

Length of output: 17828


Fan out per-claim evaluation with a bounded asyncio.TaskGroup instead of serial awaits.

KnowledgeSubstrateGroundingChecker.check currently runs for claim in claims: await self._evaluate_claim(...), so per-claim search and entailment execute sequentially for up to MAX_CLAIMS claims. Worst-case latency therefore scales linearly with claim count, and later claims risk going unevaluated under timeout pressure. Switch to a bounded asyncio.TaskGroup fan-out/fan-in while preserving the existing fail-soft behavior of _evaluate_claim (only asyncio.CancelledError propagates).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/security/redteam/grounding/substrate.py` around lines 177 - 184,
Change serial awaits in KnowledgeSubstrateGroundingChecker.check to fan out
per-claim evaluations using an asyncio.TaskGroup bounded to MAX_CLAIMS: create
an asyncio.TaskGroup, use an asyncio.Semaphore(MAX_CLAIMS) (or an async-limited
task starter) so at most MAX_CLAIMS tasks run concurrently, schedule
_evaluate_claim(context, claim, execution_id, project_id) for each claim as a
task, have each task catch and swallow all exceptions except
asyncio.CancelledError (preserving the current fail-soft behavior), collect task
results when the group finishes, filter out None results and return them as a
tuple (replacing the current flagged list append loop). Ensure the unique
symbols referenced are KnowledgeSubstrateGroundingChecker.check,
_evaluate_claim, and MAX_CLAIMS.

Comment thread tests/unit/security/redteam/test_grounding_substrate.py Outdated
Aureliolo added 4 commits June 4, 2026 00:44
CI: regenerate runtime_stats.yaml (test count crossed 35,000+) for the Runtime Stats Freshness Gate; raise the aiohttp override floor to 3.14.0 (patches request-smuggling CVE-2026-34993/47265) for the Python Security Audit gate.

CodeRabbit: drop below-floor substrate claims instead of coercing them to LOW (severity helpers return None, gate filters); fan out per-claim search+entailment in substrate.check via a bounded asyncio.TaskGroup, preserving order and fail-soft semantics; inject a deterministic fallback into the degradation tests and make the per-claim resilience mocks order-independent for the concurrent fan-out.

Gemini: both guards target states the types and early-returns already preclude and would be unreachable under warn_unreachable; not applied, replied on-thread.
@Aureliolo Aureliolo force-pushed the feat/redteam-grounding-checker branch from 5b9d4c9 to f4548d0 Compare June 3, 2026 23:05
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview June 3, 2026 23:07 — with GitHub Actions Inactive

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/security/redteam/gate.py (1)

208-211: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale docstring: now merges grounding, not just heuristic, findings.

This PR generalized the wording from "heuristic" to "grounding" everywhere else (see lines 13-18). The _archive_report merged_report arg doc still says agent + heuristic findings.

📝 Proposed doc fix
-            merged_report: The merged report (agent + heuristic findings).
+            merged_report: The merged report (agent + grounding findings).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/security/redteam/gate.py` around lines 208 - 211, Update the
stale docstring for the _archive_report function: change the merged_report
argument description from "agent + heuristic findings" to "agent + grounding
findings" (or similar wording matching the rest of the PR that uses "grounding")
so the docstring accurately reflects that merged_report contains agent and
grounding findings; update only the text in the _archive_report docstring to use
the new terminology.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/workers/_red_team_runtime.py`:
- Around line 69-83: The _resolve() function assumes registry.list_providers()
returns a non-empty list; add a defensive check after calling
registry.list_providers() to handle an empty returned list (e.g., if not
available: log a warning using RED_TEAM_GROUNDING_SUBSTRATE_DEGRADED and return
None or otherwise gracefully degrade). Specifically, in _resolve() after
registry = app_state.slice(ProvidersStateSlice).registry and available =
registry.list_providers(), check for an empty available and bail out before
referencing available[0]; keep existing behavior for when provider_name is in
available and use logger.warning/provider_name fallback only when available is
non-empty.

---

Outside diff comments:
In `@src/synthorg/security/redteam/gate.py`:
- Around line 208-211: Update the stale docstring for the _archive_report
function: change the merged_report argument description from "agent + heuristic
findings" to "agent + grounding findings" (or similar wording matching the rest
of the PR that uses "grounding") so the docstring accurately reflects that
merged_report contains agent and grounding findings; update only the text in the
_archive_report docstring to use the new terminology.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: d1e047f5-1c11-415b-ad09-14afc3219913

📥 Commits

Reviewing files that changed from the base of the PR and between 5b9d4c9 and f4548d0.

📒 Files selected for processing (33)
  • data/codebase_map.json
  • data/feature_index.json
  • data/runtime_stats.yaml
  • docs/design/security.md
  • docs/reference/conventions.md
  • scripts/_ghost_wiring_manifest.txt
  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/config.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • src/synthorg/security/redteam/grounding/factory.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/workers/_red_team_runtime.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • tests/unit/security/redteam/test_models.py
  • tests/unit/security/redteam/test_routing.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: Build Backend
  • GitHub Check: Test Unit (shard 2)
  • GitHub Check: Test Integration (shard 1)
  • GitHub Check: Test Unit (shard 3)
  • GitHub Check: Test Integration (shard 2)
  • GitHub Check: Test Unit (shard 4)
  • GitHub Check: Test Integration (shard 4)
  • GitHub Check: Test Integration (shard 3)
  • GitHub Check: Test Unit (shard 1)
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Test E2E
  • GitHub Check: Runtime Stats Freshness Gate
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Lighthouse Site
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Build Preview
  • GitHub Check: pyright (advisory)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture / nested containers, mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200 (Dark Mauve), D2 CLI pinned to v0.7.1 in CI

Files:

  • docs/reference/conventions.md
  • docs/design/security.md
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Non-persistence modules must not import sqlite/psycopg or emit raw SQL; use repository protocols from persistence/_generics.py

Files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • src/synthorg/security/redteam/grounding/factory.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Use Configuration Precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env; no os.environ.get outside startup
Numerics must live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants; enforced by scripts/check_no_magic_numbers.py
Module-size budget per # module-kind: header: controller 400, service/orchestrator 600, complex_service 1100, repository 500, adapter/integration 700, feature 100, code 500, tests 800, declarative exempt, generated glob-exempt; enforced by check_module_size_budget.py and check_no_growth_in_god_modules.py
Use declarative .importlinter contracts (forbidden-only, direct-imports, blessed back-edges; NO total-order layers); enforced by lint-imports (pre-push + CI) alongside 3 custom AST gates
Graph-level smells (fan-in >=30, LCOM4, budget-pressure within 20% of tier cap) gated by check_architecture_drift.py vs data/architecture_report.json; regenerate via scripts/architecture_report.py
No from __future__ import annotations (3.14 has PEP 649); use PEP 758 syntax for exception handling; type-only imports at module level (not in if TYPE_CHECKING: except for genuine import-cycle breakers)
Type hints required on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines
Define errors as <Domain><Condition>Error inheriting from DomainError, never inherit Exception/RuntimeError directly; enforced by check_domain_error_hierarchy.py
Use Pydantic v2 frozen + extra="forbid" on every frozen model in src/synthorg/ AND tests/; gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt; NotBlankStr for identifiers
Use model_copy(update=...) or copy.deepcopy() for immutability; deepcopy at system boundaries
Use asyncio.TaskGroup fo...

Files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • src/synthorg/security/redteam/grounding/factory.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • src/synthorg/security/redteam/grounding/factory.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
Test doubles: use ladder in conventions.md section 12.1; FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline)
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
API test client: HTTP tests use async_test_client fixture (LoopAsyncClient, portal-free); WebSocket tests use sync ws_test_client (TestClient); Windows socket.socketpair retry wrapper in tests/conftest.py is PERMANENT guard for CPython 122797
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)); never skip/xfail flaky tests, fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/security/redteam/test_models.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • tests/unit/security/redteam/test_routing.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
  • tests/unit/security/redteam/test_grounding_substrate.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/security/redteam/test_models.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • tests/unit/security/redteam/test_routing.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
  • tests/unit/security/redteam/test_grounding_substrate.py
src/synthorg/observability/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Telemetry: opt-in, off by default; every event property must be in _ALLOWED_PROPERTIES per telemetry.md

Files:

  • src/synthorg/observability/events/red_team.py
🧠 Learnings (44)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/security/**/*.py : Security package (security/): SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)

Applied to files:

  • docs/reference/conventions.md
  • scripts/_ghost_wiring_manifest.txt
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/security/redteam/builder.py
  • data/codebase_map.json
  • src/synthorg/security/redteam/gate.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/security/**/*.py : Security module includes SecOps agent, rule engine (soft-allow/hard-deny), audit log, output scanner, risk classifier, autonomy levels (4 strategies), timeout policies.

Applied to files:

  • docs/reference/conventions.md
  • data/codebase_map.json
  • src/synthorg/security/redteam/gate.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).

Applied to files:

  • docs/reference/conventions.md
  • src/synthorg/security/feature.py
  • scripts/_ghost_wiring_manifest.txt
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • docs/reference/conventions.md
  • data/codebase_map.json
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • docs/reference/conventions.md
  • docs/design/security.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

  • docs/reference/conventions.md
  • docs/design/security.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

  • docs/reference/conventions.md
  • docs/design/security.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

  • docs/reference/conventions.md
  • docs/design/security.md
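
The gate behavior these learnings describe (fenced blocks skipped, digits flagged only next to the five stat nouns) can be sketched roughly as below. This is an illustration under assumptions, not the contents of `scripts/check_doc_numeric_macros.py`: the name `find_unmarked_stats`, the regex, and the reading of "adjacent" as "digits immediately followed by a stat noun" are all hypothetical, and the sketch ignores `<!--RS:...-->` markers and the scoped-file allowlist.

```python
import re

# Stat nouns named in the learnings; the regex itself is an assumption.
STAT_NOUNS = ("tests", "providers", "agents", "stars", "releases")
_STAT_RE = re.compile(r"\b\d[\d,]*\s+(?:" + "|".join(STAT_NOUNS) + r")\b")

# Built at runtime so this example never contains a literal fence line.
FENCE = "`" * 3


def find_unmarked_stats(markdown: str) -> list[str]:
    """Return digit+noun matches found outside fenced code blocks."""
    hits: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on every fence delimiter
            continue
        if not in_fence:
            hits.extend(m.group(0) for m in _STAT_RE.finditer(line))
    return hits
```

Under this reading, a prose mention such as "Python 3.14" and a `--num-workers=4` flag inside a fenced block both pass, while "5 tests" in running prose is reported.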
📚 Learning: 2026-05-30T13:52:57.602Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2170
File: docs/reference/cli-commands.md:32-32
Timestamp: 2026-05-30T13:52:57.602Z
Learning: In Aureliolo/synthorg, data/runtime_stats.yaml is auto-generated by generate_runtime_stats.py from code scans. Hand-adding keys to this file would fail the check_runtime_stats_freshness gate. Therefore, never suggest adding RS marker keys to data/runtime_stats.yaml for values that are external defaults (e.g., Docker Compose's own built-in 10s shutdown timeout) — those are not SynthOrg-generated runtime stats and cannot be tracked this way.

Applied to files:

  • data/runtime_stats.yaml
📚 Learning: 2026-06-03T18:57:05.290Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.290Z
Learning: Applies to README.md : Numerics in README and public docs must be sourced from `data/runtime_stats.yaml` via `<!--RS:NAME-->` markers per data/README.md

Applied to files:

  • data/runtime_stats.yaml
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • tests/unit/security/redteam/test_models.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/observability/events/red_team.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • tests/unit/security/redteam/test_routing.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/models.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • src/synthorg/security/redteam/grounding/factory.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • tests/unit/security/redteam/test_models.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/observability/events/red_team.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • tests/unit/security/redteam/test_routing.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/models.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • src/synthorg/security/redteam/grounding/factory.py
📚 Learning: 2026-05-29T08:50:58.380Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2160
File: src/synthorg/persistence/sqlite/escalation_repo.py:370-370
Timestamp: 2026-05-29T08:50:58.380Z
Learning: In this repo, Ruff flake8-unused-arguments (ARG002) already suppresses unused-argument warnings on parameters of methods decorated with `override` (from `typing`). Therefore, if you see `# noqa: ARG002` (or equivalent) on parameters of an `override`-decorated method, treat it as stale/unused and remove it. Do not recommend re-adding `# noqa: ARG002` in these cases, because Ruff will flag the redundant directive (RUF100) and fail the Ruff CI gate.

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • tests/unit/security/redteam/test_models.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/observability/events/red_team.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • tests/unit/security/redteam/test_routing.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/models.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • src/synthorg/security/redteam/grounding/factory.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • src/synthorg/security/redteam/grounding/factory.py
📚 Learning: 2026-05-31T18:00:32.445Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2180
File: src/synthorg/engine/intervention/models.py:182-203
Timestamp: 2026-05-31T18:00:32.445Z
Learning: In this repository, `NotBlankStr` is a Pydantic `Annotated[str, ...]` type alias (defined in `synthorg/core/types.py`). At runtime, calling `NotBlankStr(value)` acts like an identity/cast to `str(value)` and does not execute the `StringConstraints` or `AfterValidator(...)`. Therefore, during code review, do not treat `NotBlankStr(x)` used inside non-Pydantic model methods as a place that would raise `ValidationError`; it won’t. Similarly, when `tuple[NotBlankStr, ...]` values are involved, `NotBlankStr` erases to `str` at runtime, so membership tests/comparisons can be done with raw `str` values.

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/observability/events/red_team.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/models.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • src/synthorg/security/redteam/grounding/factory.py
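
The runtime behavior described in the `NotBlankStr` learning can be reproduced with the standard library alone. The sketch below uses a stand-in alias: the `_NotBlank` marker class is hypothetical and takes the place of the Pydantic `StringConstraints` / `AfterValidator` metadata that the real alias in `synthorg/core/types.py` carries.

```python
from typing import Annotated


class _NotBlank:
    """Hypothetical marker standing in for Pydantic validator metadata."""


# Stand-in for the repo's alias; the real one carries Pydantic metadata.
NotBlankStr = Annotated[str, _NotBlank()]

# Calling the alias constructs the origin type (str). The metadata is
# never consulted, so a blank value passes through untouched.
value = NotBlankStr("   ")

# Elements of tuple[NotBlankStr, ...] are plain str at runtime, so
# membership tests against raw str values behave as expected.
names: tuple[NotBlankStr, ...] = ("alpha", "beta")
is_member = "alpha" in names
```

Validation only happens when a Pydantic model (or `TypeAdapter`) processes the annotation, which is why the call form should not be read as a validation point in review.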
📚 Learning: 2026-06-03T11:43:13.104Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2200
File: tests/unit/engine/artifacts/test_service.py:42-45
Timestamp: 2026-06-03T11:43:13.104Z
Learning: For the D7 protocol method `save_returning_outcome(artifact: Artifact) -> bool` defined in `src/synthorg/persistence/artifact_protocol.py`, any implementation—including fake/stub/test doubles—must use the exact same parameter name `artifact` (i.e., `save_returning_outcome(self, artifact=...)` / `save_returning_outcome(self, artifact: Artifact)`), not `entity`. This name must match for typeguard positional-or-keyword name conformance. Do not suggest renaming the protocol method’s parameter to `entity`.

Applied to files:

  • src/synthorg/engine/review_gate_inputs.py
  • src/synthorg/security/feature.py
  • src/synthorg/security/redteam/grounding/protocol.py
  • src/synthorg/security/redteam/grounding/heuristic.py
  • tests/unit/security/redteam/test_models.py
  • src/synthorg/security/redteam/grounding/resolver.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • src/synthorg/security/redteam/__init__.py
  • src/synthorg/workers/_red_team_runtime.py
  • src/synthorg/security/redteam/grounding/models.py
  • src/synthorg/observability/events/red_team.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • src/synthorg/security/redteam/routing.py
  • src/synthorg/security/redteam/_grounding_findings.py
  • tests/unit/security/redteam/test_routing.py
  • src/synthorg/security/redteam/builder.py
  • src/synthorg/security/redteam/models.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
  • src/synthorg/security/redteam/grounding/substrate.py
  • src/synthorg/security/config.py
  • src/synthorg/security/redteam/gate.py
  • src/synthorg/security/redteam/grounding/__init__.py
  • src/synthorg/security/redteam/grounding/_llm.py
  • tests/unit/security/redteam/test_grounding_substrate.py
  • src/synthorg/security/redteam/grounding/factory.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Applied to files:

  • scripts/_ghost_wiring_manifest.txt
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/api/**/*.py : API startup has two phases: construction (wire synchronous services) then on_startup (wire services needing persistence); follow ordering invariants for `agent_registry`, `auto_wire_meetings`, `tunnel_provider`, `SettingsService`, `WorkflowExecutionObserver`, `OntologyService`, cost-dial services, knowledge substrate, `EnvironmentService`, mid-flight steering (INBOX vs SERVICE split), and red-team report repo

Applied to files:

  • scripts/_ghost_wiring_manifest.txt
📚 Learning: 2026-05-23T12:24:00.128Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2080
File: tests/_shared/test_postgres_proxy.py:19-48
Timestamp: 2026-05-23T12:24:00.128Z
Learning: When creating test doubles for Python typing.Protocols in tests, prefer a hand-written Protocol fake (a concrete class that explicitly implements the Protocol) over `mock_of[T]` if the Protocol only defines annotation-only attributes (e.g., `username: str`, `password: str`, `dbname: str`) with no class-level values/assignments. This is because `mock_of[T]` relies on `create_autospec(..., spec_set=True)`, which enumerates members via `dir(spec)`; annotation-only attributes are not included, so `mock_of`’s kwarg-based attribute setting can raise `AttributeError: attribute not present on spec type`. In that annotation-only case, don’t recommend `mock_of[T]`—use an explicit fake class instead.

Applied to files:

  • tests/unit/security/redteam/test_models.py
  • tests/unit/security/redteam/test_grounding_factory.py
  • tests/unit/security/redteam/test_grounding_llm.py
  • tests/unit/security/redteam/test_grounding_findings.py
  • tests/unit/workers/test_red_team_runtime_resolver.py
  • tests/unit/security/redteam/test_routing.py
  • tests/integration/security/redteam/test_gate_substrate_grounding.py
  • tests/unit/security/redteam/test_grounding_substrate.py
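
The reason annotation-only Protocol members are invisible to a `dir()`-based spec can be shown without `mock_of` itself. The sketch below is a stdlib illustration only: the `Credentials` protocol and `FakeCredentials` class are hypothetical examples, and nothing here exercises the repo's `mock_of[T]` helper.

```python
from typing import Protocol


class Credentials(Protocol):
    # Annotation-only members: declared for type checkers, but no
    # class-level value is assigned, so they never become attributes.
    username: str
    password: str


class FakeCredentials:
    """Hand-written fake that structurally satisfies Credentials."""

    def __init__(self, username: str = "user", password: str = "pw") -> None:
        self.username = username
        self.password = password


declared = set(Credentials.__annotations__)
visible = set(dir(Credentials))

# Names that any spec built from dir(Credentials) would not know about.
missing_from_spec = declared - visible
```

Because both declared names are missing from `dir()`, a spec that enumerates members that way has nothing to attach the overrides to, whereas the explicit fake simply sets the attributes in `__init__`.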
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to tests/unit/api/conftest.py : Boundary `suppress_type_checks` on `api.app.create_app` at `tests/unit/api/conftest.py` due to import cycles; typeguard is a pure test dep

Applied to files:

  • tests/unit/security/redteam/test_grounding_factory.py
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/**/*.py : No `from __future__ import annotations` (3.14 has PEP 649); use PEP 758 syntax for exception handling; type-only imports at module level (not in `if TYPE_CHECKING:` except for genuine import-cycle breakers)

Applied to files:

  • src/synthorg/security/redteam/__init__.py
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/api/**/*.py : Runtime services: use ONE provider-present switch returning `RuntimeServices` pair (worker execution service + multi-agent coordinator) from shared boot `AgentEngine`; empty company rejects task creation (AgentRuntimeNotConfiguredError, 4014) and 503s on /coordinate; `swap_*` functions hold locks

Applied to files:

  • src/synthorg/workers/_red_team_runtime.py
📚 Learning: 2026-05-19T22:00:55.710Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 0
File: :0-0
Timestamp: 2026-05-19T22:00:55.710Z
Learning: In this codebase, the established SEC-1 pattern for prompt-injection protection is "wrap-at-LLM-prompt-site, not wrap-at-ingestion." `wrap_untrusted(tag: str, content: str) -> str` (defined in `src/synthorg/engine/prompt_safety.py:126`) is always called at prompt-construction time with a `TAG_*` constant as the first argument. Wrapping at ingestion/adapter boundaries (e.g., inside `IntakeEntryAdapter._build_raw_intent`) would cause double-wrapping because downstream LLM sites (`agent_intake.py:203–207`, `llm_prompt.py:202`) already fence the same field. Never suggest adding `wrap_untrusted` at ingestion boundaries in this project.

Applied to files:

  • tests/unit/security/redteam/test_grounding_llm.py
📚 Learning: 2026-05-19T22:00:55.710Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 0
File: :0-0
Timestamp: 2026-05-19T22:00:55.710Z
Learning: In Aureliolo/synthorg, the SEC-1 untrusted-content protection pattern is "wrap-at-LLM-prompt-site, never at ingestion boundary." `wrap_untrusted` (defined in `src/synthorg/engine/prompt_safety.py:126` as `wrap_untrusted(tag: str, content: str) -> str`) is always called at the prompt-construction site with a `TAG_*` constant as the first argument. Both LLM-reachable sites downstream of `WorkItem.raw_intent` — `src/synthorg/engine/intake/strategies/agent_intake.py:203–207` and `src/synthorg/engine/decomposition/llm_prompt.py:202` — already wrap the field. Adding `wrap_untrusted` at ingestion adapters like `IntakeEntryAdapter._build_raw_intent` would double-wrap content (nested fences) and is explicitly incorrect for this codebase. Never suggest wrapping at ingestion boundaries in this project.

Applied to files:

  • tests/unit/security/redteam/test_grounding_llm.py
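
The double-wrapping problem these two learnings warn about can be sketched as follows. Only the signature `wrap_untrusted(tag: str, content: str) -> str` comes from the learnings; the body of `wrap_untrusted`, the fence format, the value of `TAG_TASK_DATA`, and the `build_prompt` helper are all assumptions made for illustration.

```python
# Stand-in for the repo helper. The tag value and the fence format are
# assumptions; the real implementation lives in engine/prompt_safety.py.
TAG_TASK_DATA = "task-data"


def wrap_untrusted(tag: str, content: str) -> str:
    return f"<{tag}>\n{content}\n</{tag}>"


def build_prompt(raw_intent: str) -> str:
    """Wrap exactly once, at the site where the prompt is assembled."""
    fenced = wrap_untrusted(TAG_TASK_DATA, raw_intent)
    return "Summarize the task below.\n" + fenced


raw = "ship the report"
good = build_prompt(raw)

# Anti-pattern: an ingestion adapter wraps as well, so the prompt site
# ends up fencing content that is already fenced.
pre_wrapped = wrap_untrusted(TAG_TASK_DATA, raw)
bad = build_prompt(pre_wrapped)
```

In this sketch `good` contains one opening fence and `bad` contains two nested ones, which is the outcome the "wrap at the prompt site, never at ingestion" rule is meant to prevent.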
📚 Learning: 2026-05-31T17:59:40.191Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2180
File: data/codebase_map.json:523-524
Timestamp: 2026-05-31T17:59:40.191Z
Learning: In this repository, `data/codebase_map.json` is a generated artifact regenerated by `scripts/generate_feature_index.py`; it mirrors source/module inventory and should not be hand-edited as the source of truth for module size fixes.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-02T05:42:22.269Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2196
File: data/codebase_map.json:14985-14988
Timestamp: 2026-06-02T05:42:22.269Z
Learning: For `src/synthorg/providers/drivers/litellm_driver.py`, module-size reviews must account for the approved baseline in `scripts/_module_size_baseline.json`. The file is an approved baselined offender at 740 LOC, so a PR that keeps it below that baseline or net-shrinks it can pass ADR-0006 even if `data/codebase_map.json` shows `loc` above the generic `loc_cap` of 700.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-02T05:42:24.286Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2196
File: data/codebase_map.json:14929-14931
Timestamp: 2026-06-02T05:42:24.286Z
Learning: For module-size reviews in this repository, `src/synthorg/providers/cassette/provider.py` is an approved existing offender with a baseline of 516 LOC in `scripts/_module_size_baseline.json`; under ADR-0006, PRs are acceptable when this file does not grow past that baseline and preferably net-shrinks, even if `data/codebase_map.json` shows `loc` above the nominal `loc_cap` of 500.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-05-31T17:59:40.191Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2180
File: data/codebase_map.json:523-524
Timestamp: 2026-05-31T17:59:40.191Z
Learning: In this repository, module-size enforcement is performed by `check_module_size_budget.py` and `check_no_growth_in_god_modules.py`, which honor pre-existing offenders baselined in `scripts/_module_size_baseline.json`; raw `loc`/`loc_cap` pairs in generated `data/codebase_map.json` do not encode those baseline exemptions and can over-flag generated-map refreshes.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/**/*.py : Comments WHY only; no reviewer citations / issue back-refs / migration framing; enforced by `check_no_review_origin_in_code.py` + `check_no_migration_framing.py`

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to {pyproject.toml,src/synthorg/__init__.py} : Update version in `pyproject.toml` (`[tool.commitizen].version`) and `src/synthorg/__init__.py` (`__version__`)

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/**/*.py : Graph-level smells (fan-in >=30, LCOM4, budget-pressure within 20% of tier cap) gated by `check_architecture_drift.py` vs `data/architecture_report.json`; regenerate via `scripts/architecture_report.py`

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-05-21T22:55:24.387Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:24.387Z
Learning: In the Aureliolo/synthorg repo (Python 3.14+), `scripts/check_no_magic_numbers.py` explicitly excludes raw call-site arguments from its numeric-literal scan — it only checks module-level assignments and function/method parameter defaults. Therefore, numeric literals used as arguments to Pydantic `Field()` calls (e.g., `Field(ge=0, le=100)`, `Field(ge=1, le=50)`) are not flagged by the magic-number gate and should NOT be flagged in code review. This is an established idiom across the repo for Pydantic validator bounds.

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/observability/**/*.py : Telemetry: opt-in, off by default; every event property must be in `_ALLOWED_PROPERTIES` per telemetry.md

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/**/*.py : Type hints required on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-06-03T18:57:05.290Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.290Z
Learning: Applies to src/synthorg/**/*.py : Module-size budget per `# module-kind:` header: `controller` 400, `service`/`orchestrator` 600, `complex_service` 1100, `repository` 500, `adapter`/`integration` 700, `feature` 100, `code` 500, `tests` 800, `declarative` exempt, `generated` glob-exempt; enforced by `check_module_size_budget.py` and `check_no_growth_in_god_modules.py`

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-04-03T11:58:31.103Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T11:58:31.103Z
Learning: Applies to src/synthorg/core/**/*.py : Project uses src/synthorg/core/ for shared domain models, base classes, resilience config

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-04-03T11:58:31.103Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T11:58:31.103Z
Learning: Applies to src/synthorg/engine/**/*.py : Project uses src/synthorg/engine/ for orchestration, execution loops, task engine, coordination, checkpoint recovery, approval/review gates, stagnation detection, context budget, compaction, hybrid loop, workspace/, workflow/

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-04-03T11:58:31.103Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-03T11:58:31.103Z
Learning: Applies to src/synthorg/providers/**/*.py : Project uses src/synthorg/providers/ for LLM provider abstraction, presets, model auto-discovery, capabilities, runtime CRUD, provider families, discovery SSRF allowlist, health tracking, active health probing, routing/

Applied to files:

  • data/codebase_map.json
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.

Applied to files:

  • src/synthorg/security/redteam/grounding/substrate.py
📚 Learning: 2026-06-03T18:57:05.291Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-06-03T18:57:05.291Z
Learning: Applies to src/synthorg/**/*.py : Use `asyncio.TaskGroup` for fan-out/fan-in; helpers catch `Exception` (re-raise `MemoryError`/`RecursionError`)

Applied to files:

  • src/synthorg/security/redteam/grounding/substrate.py
📚 Learning: 2026-04-01T09:39:21.584Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:39:21.584Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`

Applied to files:

  • src/synthorg/security/redteam/grounding/substrate.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.

Applied to files:

  • src/synthorg/security/redteam/grounding/substrate.py
📚 Learning: 2026-04-08T19:14:11.270Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-08T19:14:11.270Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code

Applied to files:

  • src/synthorg/security/redteam/grounding/substrate.py
🔇 Additional comments (24)
tests/unit/security/redteam/test_grounding_factory.py (1)

1-61: LGTM!

tests/unit/security/redteam/test_grounding_findings.py (1)

1-82: LGTM!

tests/unit/security/redteam/test_grounding_llm.py (1)

1-215: LGTM!

tests/unit/security/redteam/test_models.py (1)

219-231: LGTM!

tests/unit/security/redteam/test_routing.py (1)

16-23: LGTM!

Also applies to: 161-199

tests/unit/security/redteam/test_grounding_substrate.py (1)

1-489: LGTM!

tests/unit/workers/test_red_team_runtime_resolver.py (1)

1-138: LGTM!

tests/integration/security/redteam/test_gate_substrate_grounding.py (1)

1-262: LGTM!

src/synthorg/security/redteam/routing.py (1)

70-122: LGTM!

src/synthorg/security/redteam/_grounding_findings.py (1)

68-128: LGTM! Below-floor substrate claims now correctly propagate None and are dropped rather than coerced into LOW findings, resolving the earlier routing-contract concern.
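
A rough sketch of the mapping this note refers to, using the bands from the PR summary (>=0.85 HIGH, 0.65-0.85 MEDIUM, 0.45-0.65 LOW, <0.45 dropped). The function name, the `Severity` enum, the numeric value of `SUBSTRATE_DROP_FLOOR`, and the handling of exact boundary values are assumptions for illustration, not the contents of `_grounding_findings.py`.

```python
from __future__ import annotations

from enum import Enum

# Thresholds inferred from the PR summary; the constant values are assumptions.
SUBSTRATE_DROP_FLOOR = 0.45
MEDIUM_FLOOR = 0.65
HIGH_FLOOR = 0.85


class Severity(Enum):
    """Illustrative severity levels; CRITICAL is deliberately absent."""

    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def severity_for_confidence(confidence: float) -> Severity | None:
    """Return the severity for a confidence, or None to drop the claim."""
    if confidence < SUBSTRATE_DROP_FLOOR:
        return None  # below the floor: propagate None, do not coerce to LOW
    if confidence < MEDIUM_FLOOR:
        return Severity.LOW
    if confidence < HIGH_FLOOR:
        return Severity.MEDIUM
    return Severity.HIGH  # capped at HIGH


def kept_severities(confidences: list[float]) -> list[Severity]:
    """Keep only the claims whose confidence clears the drop floor."""
    return [
        severity
        for confidence in confidences
        if (severity := severity_for_confidence(confidence)) is not None
    ]
```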

src/synthorg/security/redteam/gate.py (2)

155-162: LGTM!


367-371: Confirm project_id is wired end-to-end (input → grounding checker).

RedTeamReviewInput defines project_id: NotBlankStr | None, and GroundingChecker.check accepts project_id: NotBlankStr | None = None as a keyword-only argument, so project_id=review_input.project_id matches the protocol signature and should typecheck under mypy strict.
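
The shape being confirmed can be sketched as follows. Only the keyword-only `project_id` parameter with a `None` default comes from the comment above; the `deliverable` parameter, the return type, the synchronous signature, `ReviewInput`, `EchoChecker`, and `run_gate` are hypothetical, and plain `str` stands in for `NotBlankStr`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class GroundingChecker(Protocol):
    """Illustrative protocol: project_id is keyword-only and optional."""

    def check(
        self, deliverable: str, *, project_id: str | None = None
    ) -> tuple[str, ...]: ...


@dataclass(frozen=True)
class ReviewInput:
    """Hypothetical stand-in for RedTeamReviewInput."""

    deliverable: str
    project_id: str | None = None


class EchoChecker:
    """Toy checker that reports the scope it was called with."""

    def check(
        self, deliverable: str, *, project_id: str | None = None
    ) -> tuple[str, ...]:
        scope = project_id or "<unscoped>"
        return (f"{scope}:{len(deliverable)}",)


def run_gate(checker: GroundingChecker, review_input: ReviewInput) -> tuple[str, ...]:
    """Thread project_id from the review input to the checker by keyword."""
    return checker.check(
        review_input.deliverable, project_id=review_input.project_id
    )
```

Making the parameter keyword-only means a call site that forgets the name fails with a `TypeError` instead of silently binding the project scope to the wrong argument.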

src/synthorg/security/redteam/grounding/factory.py (1)

1-81: LGTM!

src/synthorg/security/redteam/grounding/heuristic.py (1)

126-157: LGTM!

src/synthorg/security/redteam/grounding/__init__.py (1)

1-41: LGTM!

src/synthorg/security/redteam/__init__.py (1)

43-52: LGTM!

Also applies to: 76-113

src/synthorg/observability/events/red_team.py (1)

27-91: LGTM!

docs/design/security.md (1)

739-793: LGTM!

docs/reference/conventions.md (1)

423-423: LGTM!

src/synthorg/security/redteam/builder.py (1)

42-44: LGTM!

Also applies to: 128-128, 153-158, 193-196

src/synthorg/workers/_red_team_runtime.py (1)

8-17: LGTM!

Also applies to: 45-50, 84-91, 116-118, 131-131, 136-139

src/synthorg/security/feature.py (1)

36-36: LGTM!

scripts/_ghost_wiring_manifest.txt (1)

73-73: LGTM!

data/codebase_map.json (1)

6067-6067: LGTM!

Also applies to: 12549-12549, 16014-16014, 16098-16098, 16105-16105, 16119-16119, 16133-16133, 16140-16140, 16147-16150, 16154-16154, 16161-16161, 16168-16168, 16175-16175, 16182-16185, 16189-16192, 16196-16196, 16224-16224, 18667-18667

Comment thread src/synthorg/workers/_red_team_runtime.py
CodeRabbit: guard the substrate resolver on the provider list directly (`if not available`) instead of the `len(registry)` proxy, so it degrades cleanly even if a future registry implementation lets `len` diverge from `list_providers`; fix the stale `_archive_report` docstring ("agent + heuristic findings" to "agent + grounding findings") missed by the heuristic-to-grounding generalisation. Regenerate the feature index for the changed source.
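
A minimal sketch of why the guard moved, assuming a registry whose `__len__` counts entries that `list_providers` does not return. `ProviderRegistry` and `resolve_provider` are hypothetical, and the return type of `list_providers` and the choice of the first provider are assumptions, not the code in `_red_team_runtime.py`.

```python
from __future__ import annotations


class ProviderRegistry:
    """Hypothetical registry where len() and list_providers() can diverge."""

    def __init__(self, providers: dict[str, bool]) -> None:
        self._providers = providers  # provider name -> currently usable

    def __len__(self) -> int:
        # Counts every registered entry, usable or not.
        return len(self._providers)

    def list_providers(self) -> tuple[str, ...]:
        # Returns only the usable providers, in a stable order.
        return tuple(sorted(n for n, ok in self._providers.items() if ok))


def resolve_provider(registry: ProviderRegistry) -> str | None:
    """Return a provider name, or None so the caller degrades to heuristic."""
    available = registry.list_providers()
    if not available:  # guard on the list itself, not on len(registry)
        return None
    return available[0]
```

With a registry such as `ProviderRegistry({"a": False})`, `len(registry)` is 1 while `list_providers()` is empty, so a `len`-based guard would proceed and then index into an empty tuple.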
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview June 3, 2026 23:34 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit dbb50db into main Jun 4, 2026
91 checks passed
@Aureliolo Aureliolo deleted the feat/redteam-grounding-checker branch June 4, 2026 05:43
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview June 4, 2026 05:43 — with GitHub Actions Inactive
Labels

prio:medium Should do, but not blocking type:tech-debt

Development

Successfully merging this pull request may close these issues.

feat: substrate-backed red-team grounding checker

1 participant