feat(map-review-codex): port map-review skill to the Codex provider by azalio · Pull Request #322 · azalio/map-framework

azalio · 2026-07-02T12:51:04Z

Summary

Ports the Claude-only map-review skill to the Codex provider ($map-review), reaching feature parity with map-plan/map-efficient: normal mode dispatches monitor/predictor/evaluator via spawn_agent(agent_type=...), adversarial mode runs three sequential in-session passes (no new agent-dispatch primitive), and --cross-ai <runtime> reuses run_cross_ai_review verbatim (secret-scan/injection-detection trust boundary untouched).
Adds two new Codex agent configs (predictor.toml, evaluator.toml, condensed from the canonical Claude prompts) and registers them in config.toml; ships the skill as SKILL.md + review-reference.md + adversarial-reference.md, mirroring the map-efficient split-reference pattern.
The shared provider-neutral runtime (map_step_runner.py) is reused unmodified — zero logic changes, verified via git diff across the whole change range.
Docs: $map-review row in Codex AGENTS.md, a CHANGELOG entry, and a corrected RELEASING.md upgrade note (the Codex skill install path is .agents/skills/, not .codex/skills/).
Captures three new lessons from this workflow into .claude/rules/learned/ (Monitor scope-correction via same-thread re-argument, three-way spec/source/output header drift, shared-fixture staleness surfacing at a later subtask than its origin).

Test plan

make check-render — zero diff across all generated trees
pytest tests/test_mapify_cli.py::TestCodexProvider::test_ac10_no_claude_refs_anywhere -q — no Claude-only tokens leaked into Codex output
pytest tests/test_mapify_cli.py -k map_review -v — scoped existence test green (no longer skip-gated)
Full suite: pytest -q — 3236 passed, 0 failed, 3 pre-existing skips (no regressions vs pre-plan baseline)
git diff confirms src/mapify_cli/templates_src/map/scripts/map_step_runner.py.jinja untouched across the whole change

Summary by CodeRabbit

New Features
- Added the $map-review skill to Codex, including interactive review, adversarial review, and cross-review support.
- Added new reviewer and evaluator capabilities to the Codex setup.
Documentation
- Added detailed user-facing guidance for review modes, flags, workflow steps, and troubleshooting.
- Updated release and upgrade instructions to point to the new skill layout.
Tests
- Added an acceptance test to verify the new map-review skill files are available in both templates and initialized projects.

…nfigs Port predictor.md.jinja and evaluator.md.jinja to 3-key Codex TOML schema, condensed while preserving tiered-triage/risk enums and the seven-dimension weighted rubric with minimality-gated deletion lens.

… config.toml Add [agents.predictor] and [agents.evaluator] entries so spawn_agent can reference them; matches the existing decomposer/monitor/researcher entry shape.

Skeleton with name+description frontmatter, $map-review invocation, and mode-routing hooks (normal/adversarial/cross-ai) pointing at review-reference.md and adversarial-reference.md, which land in ST-006 and ST-005 respectively. Zero Claude-only API tokens.

…p-review skill Positive test only for map-review (not a generic Claude->Codex parity gate); skipped until review-reference.md/adversarial-reference.md land via ST-005/ST-006.

…ss-ai wiring adversarial-reference.md.jinja ports Blind Hunter / Edge Case Hunter / Acceptance Auditor as sequential in-session passes (no spawn_agent()), with an explicit unverified-assumption note about ad-hoc agent spawning. SKILL.md.jinja gains --cross-ai handling that reuses run_cross_ai_review verbatim (no reimplemented secret-scan/injection logic), preserves fall-through-never-hard-stop, and keeps --ci/--auto/--detached plus the --compare-orderings/--shuffle-sections mutual exclusion.

…dispatch review-reference.md.jinja ports section rubrics, compare-orderings, and the minimality-gated what-to-delete lens. SKILL.md.jinja gains the normal-mode 4-section walkthrough (## Architecture/Code Quality/Tests/ Performance) with spawn_agent(agent_type=monitor|predictor|evaluator) dispatch, a single PROCEED|REVISE|BLOCK Final Verdict, and verbatim reuse of the 15 provider-neutral review CLI verbs. Also refreshes the tests/fixtures/codex/config.toml golden fixture that went stale after ST-001/ST-002 registered predictor/evaluator.

Add $map-review row to the Codex AGENTS.md skill table, a CHANGELOG entry for the port, and correct the stale RELEASING.md upgrade note that referenced a nonexistent .codex/skills/ path (Codex skills install under .agents/skills/).

Records three new lessons from the 7-subtask /map-efficient run: Monitor scope-correction via same-thread re-argument for forward- reference false positives, three-way spec/source/output header text drift resolution, and shared golden-fixture staleness surfacing at a later subtask than its origin.

coderabbitai · 2026-07-02T12:51:48Z

Warning

Review limit reached

@azalio, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 26 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cb6c53ca-fd08-45bb-9eb0-62c85e5eb83e

📥 Commits

Reviewing files that changed from the base of the PR and between 5e15b3a and 522db38.

📒 Files selected for processing (10)

.agents/skills/map-review/SKILL.md
.agents/skills/map-review/review-reference.md
.codex/agents/evaluator.toml
src/mapify_cli/templates/codex/agents/evaluator.toml
src/mapify_cli/templates/codex/skills/map-review/SKILL.md
src/mapify_cli/templates/codex/skills/map-review/review-reference.md
src/mapify_cli/templates_src/codex/agents/evaluator.toml.jinja
src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja
src/mapify_cli/templates_src/codex/skills/map-review/review-reference.md.jinja
tests/test_mapify_cli.py

📝 Walkthrough

Walkthrough

This PR ports the /map-review skill to the Codex provider. It adds SKILL.md, review-reference.md, and adversarial-reference.md documentation, new predictor and evaluator agent TOML configs, wires them into config.toml/AGENTS.md across canonical, templates, and templates_src sources, plus an acceptance test and doc/rule updates.

Changes

Map-review Codex Port

Layer / File(s)	Summary
SKILL.md workflow definition `.agents/skills/map-review/SKILL.md`, `src/mapify_cli/templates/codex/skills/map-review/SKILL.md`, `src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja`	Defines flag parsing, precheck, bundle/prompt building, agent dispatch, truncation/verification gates, cross-AI/adversarial branching, presentation, and handoff artifacts.
Adversarial reference doc `.agents/skills/map-review/adversarial-reference.md`, `src/mapify_cli/templates/codex/skills/map-review/adversarial-reference.md`, `src/mapify_cli/templates_src/codex/skills/map-review/adversarial-reference.md.jinja`	Documents the three-pass sequential adversarial workflow, JSON validation/retry, aggregation, unified report, and verdict rules.
Review reference doc `.agents/skills/map-review/review-reference.md`, `src/mapify_cli/templates/codex/skills/map-review/review-reference.md`, `src/mapify_cli/templates_src/codex/skills/map-review/review-reference.md.jinja`	Documents review modes, flag parsing, agent dispatch, verification gate, section ordering, What-To-Delete Lens, compare-orderings, and handoff artifacts.
Evaluator agent config `.codex/agents/evaluator.toml`, `src/mapify_cli/templates/codex/agents/evaluator.toml`, `src/mapify_cli/templates_src/codex/agents/evaluator.toml.jinja`	Defines seven-dimension scoring rubric, auto-fail rules, decision tree, complexity lens mode, and JSON output contract.
Predictor agent config `.codex/agents/predictor.toml`, `src/mapify_cli/templates/codex/agents/predictor.toml`, `src/mapify_cli/templates_src/codex/agents/predictor.toml.jinja`	Defines impact-analysis workflow, risk rubric, edge-case checklist, confidence scoring, and required JSON output.
Agent wiring and skill listing `.codex/config.toml`, `src/mapify_cli/templates/codex/config.toml`, `src/mapify_cli/templates_src/codex/config.toml.jinja`, `tests/fixtures/codex/config.toml`, `.codex/AGENTS.md`, `src/mapify_cli/templates/codex/AGENTS.md`, `src/mapify_cli/templates_src/codex/AGENTS.md.jinja`	Registers predictor/evaluator agents and lists the `$map-review` skill.
Tests, changelog, release docs, learned rules `tests/test_mapify_cli.py`, `CHANGELOG.md`, `RELEASING.md`, `.claude/rules/learned/*`	Adds acceptance test for skill files, documents the port, updates upgrade instructions, and appends learned rules.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Poem

A rabbit hops through Codex's den,
Porting skills with a copy-paste pen 🐇
Monitor, Predictor, Evaluator too,
Adversarial hunters chase what's askew,
Three passes done, verdict in sight—
Ship it, revise it, or block it tonight!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly states the main change: porting the map-review skill to the Codex provider.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/map-review-codex-port

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (1)

src/mapify_cli/templates/codex/agents/evaluator.toml (1)
230-231: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Scoring floor contradicts calibration text.

Line 231 states scores are "integers 1-10 (never 0, ...)" but the per-dimension rubric repeatedly defines a "0-1" bucket (e.g. Lines 46, 68, 87, 96, 107) implying 0 is a valid conceptual score. This ambiguity could cause the evaluator to either violate the "never 0" rule when a dimension is genuinely non-functional, or clamp a legitimately terrible score up to 1 inconsistently with the rubric text.
✏️ Suggested clarification
Apply the same floor clarification to the other "0-1" bucket descriptions (Lines 68, 87, 96, 107).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mapify_cli/templates/codex/agents/evaluator.toml` around lines 230 - 231,
The scoring rules in evaluator.toml are inconsistent: the validation text says
dimension scores are integers 1-10 and never 0, while the rubric buckets still
describe a 0-1 range. Update the validation/rubric language so the floor is
unambiguous, and make the bucket descriptions in the affected scoring sections
(including the ones with the 0-1 bucket) match the intended minimum score. Keep
the terminology consistent across the relevant rubric entries and the overall
validation rules.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/map-review/adversarial-reference.md:
- Around line 105-108: The validation step for truncated adversarial output is
using the wrong monitor identifier, so it may not exercise the documented
reviewer monitor path. Update the `map_step_runner.py
detect_truncated_agent_output` invocation to use `monitor` instead of
`review-monitor`, matching the expected monitor kind used elsewhere in the
`map-review` flow.

In @.agents/skills/map-review/SKILL.md:
- Around line 181-184: The truncation check is invoking
detect_truncated_agent_output with the wrong agent identifier, so update the
SKILL.md command to use the documented monitor path via the existing
map_step_runner call. Keep the pipeline structure unchanged, but replace the
review-monitor argument with the correct monitor value so the gate uses the
intended parsing behavior.

In @.codex/agents/evaluator.toml:
- Around line 40-114: The scoring rubric is inconsistent because each dimension
includes a 0-1 bucket while the output rule says scores must be integers 1-10.
Update the rubric in evaluator.toml so the lowest bucket starts at 1-2 or
otherwise aligns with the 1-10 constraint, and make sure the summary text for
each dimension (like Functionality, Completeness, Security, etc.) no longer
implies 0 is valid.

In
`@src/mapify_cli/templates_src/codex/skills/map-review/review-reference.md.jinja`:
- Around line 20-27: The review-mode JSON is being built by interpolating
SIBLING_HINT and REVIEW_MODE directly into an echo string, so unescaped
commit-message text can break parsing. Update the review-mode.json generation
step in the review-reference template to serialize the payload safely the same
way the other workflow steps do, instead of hand-assembling JSON. Keep the
existing SIBLING_HINT extraction from git log, but ensure the final write to
.map/$BRANCH/review-mode.json escapes quotes, backslashes, and newlines
correctly.
- Around line 92-121: The dispatch guidance in the review reference is
inconsistent with the documented full vs lightweight mode split, and the bare
code fence also triggers markdownlint. Update the section around the
`spawn_agent(...)` examples so it clearly marks `monitor` as always run and
`predictor`/`evaluator` as full-mode only, matching the `Dispatch` prose and the
`COMPLEXITY_LENS_ENABLED` conditional. Also add a language tag to the fenced
block so the markdown stays lint-clean while preserving the referenced agent
symbols and output-contract wording.

In `@src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja`:
- Around line 55-68: Clarify the execution rule around launchReviewerAgents so
it does not prohibit the required truncated-output retry in Step A.2b. Update
the wording in the map-review SKILL template to state that reviewer agents are
launched once per review run initially, but a single retry is still allowed when
truncated output is detected. Keep the order and intent of the phase list intact
while making the retry exception explicit.
- Around line 141-168: The dispatch example in SKILL.md.jinja contradicts the
lightweight/full mode description, so update the example to show mode-dependent
branching: monitor only for lightweight mode, and monitor plus predictor plus
evaluator for full mode, using the same spawn_agent(agent_type=...) symbols
already referenced. Keep the complexity_lens dispatch under its existing
COMPLEXITY_LENS_ENABLED guard, and add a language identifier to the fenced code
block to match the markdownlint expectation.

In `@tests/test_mapify_cli.py`:
- Around line 1728-1778: The AC number for this map-review skill existence test
is reused and conflicts with the existing AC-9 section, making failures
ambiguous. Update the criterion identifier in
test_ac09_codex_map_review_skill_exists to a distinct AC number that is not
already used elsewhere in tests/test_mapify_cli.py, and keep the assertion logic
for the templates_dir and agents_skills_dir checks unchanged.

---

Nitpick comments:
In `@src/mapify_cli/templates/codex/agents/evaluator.toml`:
- Around line 230-231: The scoring rules in evaluator.toml are inconsistent: the
validation text says dimension scores are integers 1-10 and never 0, while the
rubric buckets still describe a 0-1 range. Update the validation/rubric language
so the floor is unambiguous, and make the bucket descriptions in the affected
scoring sections (including the ones with the 0-1 bucket) match the intended
minimum score. Keep the terminology consistent across the relevant rubric
entries and the overall validation rules.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8eb5d994-901a-4952-8b1c-d79e795542be

📥 Commits

Reviewing files that changed from the base of the PR and between 74148ec and 5e15b3a.

📒 Files selected for processing (27)

.agents/skills/map-review/SKILL.md
.agents/skills/map-review/adversarial-reference.md
.agents/skills/map-review/review-reference.md
.claude/rules/learned/architecture-patterns.md
.claude/rules/learned/testing-strategies.md
.codex/AGENTS.md
.codex/agents/evaluator.toml
.codex/agents/predictor.toml
.codex/config.toml
CHANGELOG.md
RELEASING.md
src/mapify_cli/templates/codex/AGENTS.md
src/mapify_cli/templates/codex/agents/evaluator.toml
src/mapify_cli/templates/codex/agents/predictor.toml
src/mapify_cli/templates/codex/config.toml
src/mapify_cli/templates/codex/skills/map-review/SKILL.md
src/mapify_cli/templates/codex/skills/map-review/adversarial-reference.md
src/mapify_cli/templates/codex/skills/map-review/review-reference.md
src/mapify_cli/templates_src/codex/AGENTS.md.jinja
src/mapify_cli/templates_src/codex/agents/evaluator.toml.jinja
src/mapify_cli/templates_src/codex/agents/predictor.toml.jinja
src/mapify_cli/templates_src/codex/config.toml.jinja
src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja
src/mapify_cli/templates_src/codex/skills/map-review/adversarial-reference.md.jinja
src/mapify_cli/templates_src/codex/skills/map-review/review-reference.md.jinja
tests/fixtures/codex/config.toml
tests/test_mapify_cli.py

- Serialize review-mode.json via python3 -c (env-var passed, no shell string interpolation) instead of hand-built echo JSON, so a commit message containing a quote/backslash in SIBLING_HINT can no longer corrupt the file. Avoid nested double-quotes inside an f-string expression (Python 3.11 rejects that syntax; PEP 701 relaxed it only in 3.12+) by binding the branch to a local variable first. - Make the full/lightweight mode split explicit on the spawn_agent dispatch examples in SKILL.md.jinja and review-reference.md.jinja (predictor/evaluator calls now carry a "full mode only" comment), add a language tag to the previously-bare fence, and restore the dropped full/lightweight qualifier + truncation-retry exception to Execution Rule 6 that ST-003's condensation had silently narrowed to "exactly once". - Note evaluator.toml.jinja's "never 0" scoring rule against its "0-1" rubric bucket floor (inherited from the Claude source's identical wording; scoped a cheap clarifying note to this file only). - Disambiguate the AC-9 label on the new map-review existence test from the pre-existing map-plan AC-9 test in the same file. `--agent review-monitor` (also flagged by the bot) is intentional, not a bug: map_step_runner.py.jinja defines it as a distinct, richer schema from `monitor`, and the Claude source (SKILL.md:342) uses the same value for the same purpose — left unchanged.

azalio added 8 commits July 2, 2026 14:05

feat(map-review-codex): ST-001 add predictor/evaluator Codex agent co…

31938ba

…nfigs Port predictor.md.jinja and evaluator.md.jinja to 3-key Codex TOML schema, condensed while preserving tiered-triage/risk enums and the seven-dimension weighted rubric with minimality-gated deletion lens.

feat(map-review-codex): ST-002 register predictor/evaluator agents in…

25c2fa8

… config.toml Add [agents.predictor] and [agents.evaluator] entries so spawn_agent can reference them; matches the existing decomposer/monitor/researcher entry shape.

test(map-review-codex): ST-004 add scoped existence test for Codex ma…

836ed38

…p-review skill Positive test only for map-review (not a generic Claude->Codex parity gate); skipped until review-reference.md/adversarial-reference.md land via ST-005/ST-006.

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

azalio merged commit 6442317 into main Jul 2, 2026
7 checks passed

azalio mentioned this pull request Jul 2, 2026

fix: flaky git-mutation-check test false-positives on pytest tmp paths #323

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(map-review-codex): port map-review skill to the Codex provider#322

feat(map-review-codex): port map-review skill to the Codex provider#322
azalio merged 9 commits into
mainfrom
feat/map-review-codex-port

azalio commented Jul 2, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

azalio commented Jul 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

azalio commented Jul 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading