
Superpowers-style TDD enforcement: delete code written before test, adversarial spec compliance #285

Description

@azalio

Summary

map-framework ships map-tdd as a workflow variant, but it is optional and nothing enforces it. Superpowers treats TDD as a mandatory discipline with concrete consequences: code written before a failing test is literally deleted. Every task also goes through a mandatory two-stage review, spec compliance first and code quality second, in strict order.

What Superpowers does

The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

Enforcement mechanism

  1. Physical deletion: "Write code before the test? Delete it. Start over." No exceptions: not "keep as reference," not "adapt it while writing tests."
  2. RED-GREEN-REFACTOR cycle enforced:
    • RED: Write a failing test → VERIFY it fails correctly (MANDATORY, never skip)
    • GREEN: Write minimal code → VERIFY it passes (MANDATORY)
    • REFACTOR: Clean up while keeping green
  3. Anticipated excuses countered in an 11-item rationalization table, for example:
    • "Too simple to test" → "Simple code breaks. Test takes 30 seconds."
    • "Deleting X hours is wasteful" → "Sunk cost fallacy."
    • "TDD is dogmatic" → "TDD IS pragmatic."
  4. Red Flags for self-detection (14 patterns): "Code before test," "Rationalizing just this once," "This is different because..." → STOP and start over.
  5. Verification checklist (8 checkboxes): if any box cannot be checked, "You skipped TDD. Start over."
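The RED and GREEN verification steps can be made concrete with a small check. The sketch below is not taken from Superpowers; it is a hypothetical Python illustration of what "verify it fails correctly" means: a RED test must fail on an assertion, not on an unrelated error such as a typo or a missing import. The names run_test, check_red, check_green, make_test and the slugify example are invented for illustration.

```python
from typing import Callable, Optional

TestFn = Callable[[], None]


def run_test(test: TestFn) -> Optional[Exception]:
    """Run a zero-argument test; return the exception it raised, or None if it passed."""
    try:
        test()
    except Exception as exc:  # capture any failure so it can be classified
        return exc
    return None


def check_red(test: TestFn) -> str:
    """RED: the new test must fail, and fail for the right reason (an assertion)."""
    exc = run_test(test)
    if exc is None:
        return "invalid: test passed before the feature exists"
    if not isinstance(exc, AssertionError):
        # e.g. a NameError from a typo means the test is broken, not correctly RED
        return f"invalid: test errored with {type(exc).__name__}"
    return "red"


def check_green(test: TestFn) -> str:
    """GREEN: after the minimal implementation, the same test must pass."""
    exc = run_test(test)
    return "green" if exc is None else f"not green: {type(exc).__name__}"


# Usage with a hypothetical slugify feature.
def slugify_stub(text: str) -> str:
    return text  # stub: behaviour not implemented yet


def slugify(text: str) -> str:
    return "-".join(text.lower().split())  # minimal code to make the test pass


def make_test(fn: Callable[[str], str]) -> TestFn:
    def test() -> None:
        assert fn("Hello World") == "hello-world"
    return test


print(check_red(make_test(slugify_stub)))  # red
print(check_green(make_test(slugify)))     # green
```

Running the test against a stub rather than against nothing is what lets the check distinguish "fails on the assertion" from "fails because the code does not exist".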

Two-stage per-task review (mandatory)

  1. Spec Compliance Review (MUST come first):
    • Adversarial framing: "The implementer finished suspiciously quickly. DO NOT trust the report."
    • Read actual code, compare to requirements line by line
    • Check for: missing requirements, extra/unneeded work, misunderstandings
    • Output: ✅ or ❌ with file:line references
  2. Code Quality Review (only after spec passes):
    • Uses the named agent superpowers:code-reviewer
    • Checks: file responsibility, unit decomposition, plan conformance, size discipline
    • Output: Strengths, Issues (Critical/Important/Minor), Assessment
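The strict ordering above (spec compliance before code quality, never swapped) can be expressed as a small gate object. This is a hypothetical sketch; ReviewGate, ReviewOrderError and the reviewer callables are illustrative names, not part of Superpowers or map-framework.

```python
from typing import Callable, TypeVar

R = TypeVar("R")


class ReviewOrderError(RuntimeError):
    """Raised when code quality review is attempted before spec compliance passed."""


class ReviewGate:
    """Tracks one task's review state; quality review stays locked until spec passes."""

    def __init__(self) -> None:
        self.spec_passed = False

    def record_spec(self, passed: bool) -> None:
        # Every spec verdict (including a re-check after fixes) overwrites the last one,
        # so a later failing spec review locks quality review again.
        self.spec_passed = passed

    def run_quality(self, reviewer: Callable[[str], R], diff: str) -> R:
        if not self.spec_passed:
            raise ReviewOrderError(
                "spec compliance review must pass before code quality review"
            )
        return reviewer(diff)


gate = ReviewGate()
try:
    gate.run_quality(lambda d: "reviewed", "diff")
except ReviewOrderError as err:
    print("blocked:", err)
gate.record_spec(passed=True)
print(gate.run_quality(lambda d: "reviewed", "diff"))  # reviewed
```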

Rationalization defense

  • Skills use language tested against real agent loophole-finding behavior
  • "Violating the letter is violating the spirit" closes an entire class of rationalizations
  • "This is different because..." listed as a Red Flag itself

Current map-framework state

  • map-tdd exists as a workflow variant (write tests first, then implement)
  • No enforcement: the agent can skip TDD simply by not loading the skill
  • No code-before-test deletion mechanism
  • No adversarial review: Monitor is evaluative, not adversarial
  • map-review is interactive and post-hoc, not a mandatory per-task gate
  • No rationalization defense in prompt language

Proposed design

map-tdd upgrade: from variant to enforcement

  1. Config flag: .map/config.yaml key tdd.enforce: true
  2. Actor constraint: when tdd.enforce is true, the Actor MUST follow RED-GREEN-REFACTOR
  3. Monitor gate: After Actor writes code, Monitor checks:
    • Does a test exist that was written BEFORE the implementation?
    • Does the test fail without the implementation?
    • Is the test actually testing the feature (not passing trivially)?
    • If any check fails → verdict: tdd_violation, Actor must restart
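A minimal sketch of the proposed Monitor gate, assuming the Monitor can obtain the write order of test and implementation (for example, the position of each file write in the Actor's tool-call log) and the result of a test run with the implementation reverted. TddEvidence and monitor_tdd_gate are hypothetical names, the config dict stands in for an already-parsed .map/config.yaml, and counting assertions is only a crude stand-in for the "not passing trivially" check.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class TddEvidence:
    test_written_at: int      # hypothetical: ordinal of the first write of the test file
    impl_written_at: int      # hypothetical: ordinal of the first write of the implementation
    fails_without_impl: bool  # test result with the implementation reverted or stubbed
    assertion_count: int      # crude proxy for "actually tests the feature"


def monitor_tdd_gate(config: Dict[str, Any], ev: TddEvidence) -> Dict[str, Any]:
    """Return {'verdict': 'skipped' | 'pass' | 'tdd_violation', 'reasons': [...]}."""
    if not config.get("tdd", {}).get("enforce", False):
        return {"verdict": "skipped", "reasons": []}  # TDD not enforced for this project

    reasons = []
    if ev.test_written_at >= ev.impl_written_at:
        reasons.append("test was not written before the implementation")
    if not ev.fails_without_impl:
        reasons.append("test does not fail without the implementation")
    if ev.assertion_count == 0:
        reasons.append("test has no assertions and passes trivially")

    if reasons:
        return {"verdict": "tdd_violation", "reasons": reasons}  # Actor must restart
    return {"verdict": "pass", "reasons": []}


cfg = {"tdd": {"enforce": True}}  # parsed form of `tdd: {enforce: true}`
print(monitor_tdd_gate(cfg, TddEvidence(1, 2, True, 3))["verdict"])   # pass
print(monitor_tdd_gate(cfg, TddEvidence(5, 2, False, 0))["verdict"])  # tdd_violation
```

Collecting every failed check, rather than stopping at the first, gives the Actor the full list of reasons for the restart.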

Spec compliance review

  1. After each subtask: Spec compliance reviewer (adversarial subagent)
    • Receives: the subtask spec and the actual diff, NOT the Actor's summary
    • Checks: does the implementation match the spec exactly?
    • Output: ✅ spec-compliant or ❌ with file:line gaps
  2. Gate: Cannot proceed to code quality review until spec passes
  3. Fix loop: Actor fixes → reviewer re-checks → repeat
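The fix loop can be sketched as follows. The reviewer callable receives only the spec and the diff, mirroring the rule that it never sees the Actor's summary. The max_rounds bound and escalating when it is exhausted are assumptions added for illustration; the proposal itself only says "repeat". spec_fix_loop, toy_review and toy_fix are hypothetical names.

```python
from typing import Callable, List, Tuple

# (spec, diff) -> list of gaps, ideally "file:line: what is missing"; [] means compliant.
SpecReviewer = Callable[[str, str], List[str]]
# (diff, gaps) -> revised diff produced by the Actor.
ActorFix = Callable[[str, List[str]], str]


def spec_fix_loop(spec: str, diff: str, review: SpecReviewer, fix: ActorFix,
                  max_rounds: int = 3) -> Tuple[bool, str, int]:
    """Review, fix, re-review until compliant or until max_rounds reviews have run."""
    for round_no in range(1, max_rounds + 1):
        # The reviewer sees the spec and the diff only, never the Actor's own report.
        gaps = review(spec, diff)
        if not gaps:
            return True, diff, round_no
        if round_no < max_rounds:
            diff = fix(diff, gaps)  # no fix after the last review: it would go unchecked
    return False, diff, max_rounds  # hypothetical policy: caller escalates to the user


# Toy stand-ins: the "spec" is a comma-separated list of required items.
def toy_review(spec: str, diff: str) -> List[str]:
    return [f"missing: {req}" for req in spec.split(",") if req not in diff]


def toy_fix(diff: str, gaps: List[str]) -> str:
    return diff + "\n" + gaps[0].split(": ", 1)[1]  # address one gap per round


print(spec_fix_loop("validate_input,handle_empty", "", toy_review, toy_fix))
```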

Code quality review

  1. Separate reviewer: Different prompt, different persona
  2. Checks: error handling, type safety, naming, separation of concerns, plan conformance
  3. Output: Strengths, Issues (Critical/Important/Minor), Assessment
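One possible shape for the quality reviewer's output, assuming each issue carries a file:line location and one of the three severities. QualityReport, Issue and the assessment policy (any Critical issue blocks the task) are hypothetical; the proposal does not define how the Assessment is derived.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    # Definition order is used as display order: most severe first.
    CRITICAL = "Critical"
    IMPORTANT = "Important"
    MINOR = "Minor"


@dataclass
class Issue:
    severity: Severity
    location: str  # "file:line"
    message: str


@dataclass
class QualityReport:
    strengths: List[str] = field(default_factory=list)
    issues: List[Issue] = field(default_factory=list)

    def assessment(self) -> str:
        # Hypothetical policy: any Critical issue blocks; other issues become follow-ups.
        if any(i.severity is Severity.CRITICAL for i in self.issues):
            return "Not ready: critical issues must be fixed"
        if self.issues:
            return "Ready with follow-ups"
        return "Ready"

    def render(self) -> str:
        lines = ["Strengths:"] + [f"  - {s}" for s in self.strengths]
        lines.append("Issues:")
        for sev in Severity:  # group issues by severity, Critical first
            for issue in self.issues:
                if issue.severity is sev:
                    lines.append(f"  [{sev.value}] {issue.location}: {issue.message}")
        lines.append(f"Assessment: {self.assessment()}")
        return "\n".join(lines)


report = QualityReport(
    strengths=["clear function names"],
    issues=[Issue(Severity.MINOR, "src/app.py:40", "function is long")],
)
print(report.render())
```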

Rationalization table (Monitor additions)

  • Add TDD-avoidance detection patterns to Monitor
  • Red Flags list for Actor self-awareness
  • "This is different because..." → escalate to user
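A rough sketch of TDD-avoidance detection over Actor output, using phrases quoted earlier in this issue. The regexes and scan_for_rationalizations are hypothetical; a Monitor built on an LLM would probably judge this from the Red Flags list in its prompt, so treat keyword matching as, at most, a cheap pre-filter.

```python
import re
from typing import List, Tuple

# Patterns derived from the rationalization table and Red Flags quoted above;
# the exact regexes are illustrative assumptions, not Superpowers' wording.
RED_FLAGS = {
    "too_simple": re.compile(r"\btoo simple to test\b", re.IGNORECASE),
    "just_this_once": re.compile(r"\bjust this once\b", re.IGNORECASE),
    "tdd_dogmatic": re.compile(r"\bTDD is dogmatic\b", re.IGNORECASE),
    "sunk_cost": re.compile(r"\bdeleting\b.*\bwasteful\b", re.IGNORECASE),
    "keep_as_reference": re.compile(r"\bkeep (it )?as (a )?reference\b", re.IGNORECASE),
    "this_is_different": re.compile(r"\bthis is different because\b", re.IGNORECASE),
}


def scan_for_rationalizations(text: str) -> Tuple[str, List[str]]:
    """Return (action, matched pattern names) for a chunk of Actor output."""
    hits = [name for name, pattern in RED_FLAGS.items() if pattern.search(text)]
    if "this_is_different" in hits:
        return "escalate_to_user", hits  # per the proposal, this phrase goes to the user
    if hits:
        return "tdd_violation_suspected", hits
    return "ok", hits


print(scan_for_rationalizations("This is different because the helper is trivial"))
```

The "This is different because" check takes precedence over the others, so that phrase always routes to the user even when other red flags match too.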

References

  • Superpowers: skills/test-driven-development/SKILL.md (371 lines), skills/subagent-driven-development/SKILL.md (277 lines)
  • Spec reviewer prompt: skills/subagent-driven-development/spec-reviewer-prompt.md
  • Code quality reviewer: skills/subagent-driven-development/code-quality-reviewer-prompt.md
  • Testing anti-patterns: skills/test-driven-development/testing-anti-patterns.md

Acceptance criteria

  • .map/config.yaml key tdd.enforce
  • RED-GREEN-REFACTOR enforcement in Actor when tdd.enforce=true
  • Monitor TDD violation detection (test-before-code check)
  • Spec compliance reviewer subagent (adversarial, per-subtask)
  • Code quality reviewer subagent (separate from spec reviewer)
  • Sequential gate: spec-first, code-second, cannot swap
  • Rationalization table in Monitor prompt

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests