feat(map-review-codex): port map-review skill to the Codex provider#322
Conversation
…nfigs Port predictor.md.jinja and evaluator.md.jinja to 3-key Codex TOML schema, condensed while preserving tiered-triage/risk enums and the seven-dimension weighted rubric with minimality-gated deletion lens.
… config.toml Add [agents.predictor] and [agents.evaluator] entries so spawn_agent can reference them; matches the existing decomposer/monitor/researcher entry shape.
Skeleton with name+description frontmatter, $map-review invocation, and mode-routing hooks (normal/adversarial/cross-ai) pointing at review-reference.md and adversarial-reference.md, which land in ST-006 and ST-005 respectively. Zero Claude-only API tokens.
…p-review skill Positive test only for map-review (not a generic Claude->Codex parity gate); skipped until review-reference.md/adversarial-reference.md land via ST-005/ST-006.
…ss-ai wiring adversarial-reference.md.jinja ports Blind Hunter / Edge Case Hunter / Acceptance Auditor as sequential in-session passes (no spawn_agent()), with an explicit unverified-assumption note about ad-hoc agent spawning. SKILL.md.jinja gains --cross-ai handling that reuses run_cross_ai_review verbatim (no reimplemented secret-scan/injection logic), preserves fall-through-never-hard-stop, and keeps --ci/--auto/--detached plus the --compare-orderings/--shuffle-sections mutual exclusion.
…dispatch review-reference.md.jinja ports section rubrics, compare-orderings, and the minimality-gated what-to-delete lens. SKILL.md.jinja gains the normal-mode 4-section walkthrough (## Architecture/Code Quality/Tests/ Performance) with spawn_agent(agent_type=monitor|predictor|evaluator) dispatch, a single PROCEED|REVISE|BLOCK Final Verdict, and verbatim reuse of the 15 provider-neutral review CLI verbs. Also refreshes the tests/fixtures/codex/config.toml golden fixture that went stale after ST-001/ST-002 registered predictor/evaluator.
Add $map-review row to the Codex AGENTS.md skill table, a CHANGELOG entry for the port, and correct the stale RELEASING.md upgrade note that referenced a nonexistent .codex/skills/ path (Codex skills install under .agents/skills/).
Records three new lessons from the 7-subtask /map-efficient run: Monitor scope-correction via same-thread re-argument for forward- reference false positives, three-way spec/source/output header text drift resolution, and shared golden-fixture staleness surfacing at a later subtask than its origin.
|
Warning Review limit reached
Next review available in: 26 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (10)
📝 WalkthroughWalkthroughThis PR ports the ChangesMap-review Codex Port
Estimated code review effort: 3 (Moderate) | ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (1)
src/mapify_cli/templates/codex/agents/evaluator.toml (1)
230-231: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winScoring floor contradicts calibration text.
Line 231 states scores are "integers 1-10 (never 0, ...)" but the per-dimension rubric repeatedly defines a "0-1" bucket (e.g. Lines 46, 68, 87, 96, 107) implying 0 is a valid conceptual score. This ambiguity could cause the evaluator to either violate the "never 0" rule when a dimension is genuinely non-functional, or clamp a legitimately terrible score up to 1 inconsistently with the rubric text.
✏️ Suggested clarification
Apply the same floor clarification to the other "0-1" bucket descriptions (Lines 68, 87, 96, 107).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/mapify_cli/templates/codex/agents/evaluator.toml` around lines 230 - 231, The scoring rules in evaluator.toml are inconsistent: the validation text says dimension scores are integers 1-10 and never 0, while the rubric buckets still describe a 0-1 range. Update the validation/rubric language so the floor is unambiguous, and make the bucket descriptions in the affected scoring sections (including the ones with the 0-1 bucket) match the intended minimum score. Keep the terminology consistent across the relevant rubric entries and the overall validation rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.agents/skills/map-review/adversarial-reference.md:
- Around line 105-108: The validation step for truncated adversarial output is
using the wrong monitor identifier, so it may not exercise the documented
reviewer monitor path. Update the `map_step_runner.py
detect_truncated_agent_output` invocation to use `monitor` instead of
`review-monitor`, matching the expected monitor kind used elsewhere in the
`map-review` flow.
In @.agents/skills/map-review/SKILL.md:
- Around line 181-184: The truncation check is invoking
detect_truncated_agent_output with the wrong agent identifier, so update the
SKILL.md command to use the documented monitor path via the existing
map_step_runner call. Keep the pipeline structure unchanged, but replace the
review-monitor argument with the correct monitor value so the gate uses the
intended parsing behavior.
In @.codex/agents/evaluator.toml:
- Around line 40-114: The scoring rubric is inconsistent because each dimension
includes a 0-1 bucket while the output rule says scores must be integers 1-10.
Update the rubric in evaluator.toml so the lowest bucket starts at 1-2 or
otherwise aligns with the 1-10 constraint, and make sure the summary text for
each dimension (like Functionality, Completeness, Security, etc.) no longer
implies 0 is valid.
In
`@src/mapify_cli/templates_src/codex/skills/map-review/review-reference.md.jinja`:
- Around line 20-27: The review-mode JSON is being built by interpolating
SIBLING_HINT and REVIEW_MODE directly into an echo string, so unescaped
commit-message text can break parsing. Update the review-mode.json generation
step in the review-reference template to serialize the payload safely the same
way the other workflow steps do, instead of hand-assembling JSON. Keep the
existing SIBLING_HINT extraction from git log, but ensure the final write to
.map/$BRANCH/review-mode.json escapes quotes, backslashes, and newlines
correctly.
- Around line 92-121: The dispatch guidance in the review reference is
inconsistent with the documented full vs lightweight mode split, and the bare
code fence also triggers markdownlint. Update the section around the
`spawn_agent(...)` examples so it clearly marks `monitor` as always run and
`predictor`/`evaluator` as full-mode only, matching the `Dispatch` prose and the
`COMPLEXITY_LENS_ENABLED` conditional. Also add a language tag to the fenced
block so the markdown stays lint-clean while preserving the referenced agent
symbols and output-contract wording.
In `@src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja`:
- Around line 55-68: Clarify the execution rule around launchReviewerAgents so
it does not prohibit the required truncated-output retry in Step A.2b. Update
the wording in the map-review SKILL template to state that reviewer agents are
launched once per review run initially, but a single retry is still allowed when
truncated output is detected. Keep the order and intent of the phase list intact
while making the retry exception explicit.
- Around line 141-168: The dispatch example in SKILL.md.jinja contradicts the
lightweight/full mode description, so update the example to show mode-dependent
branching: monitor only for lightweight mode, and monitor plus predictor plus
evaluator for full mode, using the same spawn_agent(agent_type=...) symbols
already referenced. Keep the complexity_lens dispatch under its existing
COMPLEXITY_LENS_ENABLED guard, and add a language identifier to the fenced code
block to match the markdownlint expectation.
In `@tests/test_mapify_cli.py`:
- Around line 1728-1778: The AC number for this map-review skill existence test
is reused and conflicts with the existing AC-9 section, making failures
ambiguous. Update the criterion identifier in
test_ac09_codex_map_review_skill_exists to a distinct AC number that is not
already used elsewhere in tests/test_mapify_cli.py, and keep the assertion logic
for the templates_dir and agents_skills_dir checks unchanged.
---
Nitpick comments:
In `@src/mapify_cli/templates/codex/agents/evaluator.toml`:
- Around line 230-231: The scoring rules in evaluator.toml are inconsistent: the
validation text says dimension scores are integers 1-10 and never 0, while the
rubric buckets still describe a 0-1 range. Update the validation/rubric language
so the floor is unambiguous, and make the bucket descriptions in the affected
scoring sections (including the ones with the 0-1 bucket) match the intended
minimum score. Keep the terminology consistent across the relevant rubric
entries and the overall validation rules.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8eb5d994-901a-4952-8b1c-d79e795542be
📒 Files selected for processing (27)
.agents/skills/map-review/SKILL.md.agents/skills/map-review/adversarial-reference.md.agents/skills/map-review/review-reference.md.claude/rules/learned/architecture-patterns.md.claude/rules/learned/testing-strategies.md.codex/AGENTS.md.codex/agents/evaluator.toml.codex/agents/predictor.toml.codex/config.tomlCHANGELOG.mdRELEASING.mdsrc/mapify_cli/templates/codex/AGENTS.mdsrc/mapify_cli/templates/codex/agents/evaluator.tomlsrc/mapify_cli/templates/codex/agents/predictor.tomlsrc/mapify_cli/templates/codex/config.tomlsrc/mapify_cli/templates/codex/skills/map-review/SKILL.mdsrc/mapify_cli/templates/codex/skills/map-review/adversarial-reference.mdsrc/mapify_cli/templates/codex/skills/map-review/review-reference.mdsrc/mapify_cli/templates_src/codex/AGENTS.md.jinjasrc/mapify_cli/templates_src/codex/agents/evaluator.toml.jinjasrc/mapify_cli/templates_src/codex/agents/predictor.toml.jinjasrc/mapify_cli/templates_src/codex/config.toml.jinjasrc/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinjasrc/mapify_cli/templates_src/codex/skills/map-review/adversarial-reference.md.jinjasrc/mapify_cli/templates_src/codex/skills/map-review/review-reference.md.jinjatests/fixtures/codex/config.tomltests/test_mapify_cli.py
- Serialize review-mode.json via python3 -c (env-var passed, no shell string interpolation) instead of hand-built echo JSON, so a commit message containing a quote/backslash in SIBLING_HINT can no longer corrupt the file. Avoid nested double-quotes inside an f-string expression (Python 3.11 rejects that syntax; PEP 701 relaxed it only in 3.12+) by binding the branch to a local variable first. - Make the full/lightweight mode split explicit on the spawn_agent dispatch examples in SKILL.md.jinja and review-reference.md.jinja (predictor/evaluator calls now carry a "full mode only" comment), add a language tag to the previously-bare fence, and restore the dropped full/lightweight qualifier + truncation-retry exception to Execution Rule 6 that ST-003's condensation had silently narrowed to "exactly once". - Note evaluator.toml.jinja's "never 0" scoring rule against its "0-1" rubric bucket floor (inherited from the Claude source's identical wording; scoped a cheap clarifying note to this file only). - Disambiguate the AC-9 label on the new map-review existence test from the pre-existing map-plan AC-9 test in the same file. `--agent review-monitor` (also flagged by the bot) is intentional, not a bug: map_step_runner.py.jinja defines it as a distinct, richer schema from `monitor`, and the Claude source (SKILL.md:342) uses the same value for the same purpose — left unchanged.
Summary
map-reviewskill to the Codex provider ($map-review), reaching feature parity withmap-plan/map-efficient: normal mode dispatchesmonitor/predictor/evaluatorviaspawn_agent(agent_type=...), adversarial mode runs three sequential in-session passes (no new agent-dispatch primitive), and--cross-ai <runtime>reusesrun_cross_ai_reviewverbatim (secret-scan/injection-detection trust boundary untouched).predictor.toml,evaluator.toml, condensed from the canonical Claude prompts) and registers them inconfig.toml; ships the skill asSKILL.md+review-reference.md+adversarial-reference.md, mirroring themap-efficientsplit-reference pattern.map_step_runner.py) is reused unmodified — zero logic changes, verified viagit diffacross the whole change range.$map-reviewrow in CodexAGENTS.md, a CHANGELOG entry, and a correctedRELEASING.mdupgrade note (the Codex skill install path is.agents/skills/, not.codex/skills/)..claude/rules/learned/(Monitor scope-correction via same-thread re-argument, three-way spec/source/output header drift, shared-fixture staleness surfacing at a later subtask than its origin).Test plan
make check-render— zero diff across all generated treespytest tests/test_mapify_cli.py::TestCodexProvider::test_ac10_no_claude_refs_anywhere -q— no Claude-only tokens leaked into Codex outputpytest tests/test_mapify_cli.py -k map_review -v— scoped existence test green (no longer skip-gated)pytest -q— 3236 passed, 0 failed, 3 pre-existing skips (no regressions vs pre-plan baseline)git diffconfirmssrc/mapify_cli/templates_src/map/scripts/map_step_runner.py.jinjauntouched across the whole changeSummary by CodeRabbit
New Features
$map-reviewskill to Codex, including interactive review, adversarial review, and cross-review support.Documentation
Tests
map-reviewskill files are available in both templates and initialized projects.