QA Validation Loop

Analysis Date: 2026-02-12

Overview

The QA Validation Loop is an automated quality assurance system that validates an implementation against its acceptance criteria through iterative review-and-fix cycles. AI agents identify issues, apply fixes, and re-check the result before the code is merged.

Key Benefits:

Automated validation against acceptance criteria
Iterative issue detection and resolution
Recurring issue detection with human escalation
Support for projects with and without test suites
Integration with Linear for progress tracking
E2E testing capabilities via Electron MCP for frontend changes

Architecture

The QA system is organized into modular components:

qa/
├── loop.py           # Main orchestration loop
├── reviewer.py       # QA reviewer agent session
├── fixer.py          # QA fixer agent session
├── report.py         # Issue tracking, reporting, escalation
├── criteria.py       # Acceptance criteria and status management
├── coverage_validator.py   # Test coverage validation
├── pattern_validator.py    # Code pattern validation
└── recovery_metrics.py     # QA performance metrics

Component Responsibilities:

loop.py - Coordinates reviewer/fixer sessions, manages iteration logic
reviewer.py - Runs QA reviewer agent to validate acceptance criteria
fixer.py - Runs QA fixer agent to resolve reported issues
report.py - Tracks iteration history, detects recurring issues
criteria.py - Manages QA signoff status, validates completion

Workflow

1. Initial Validation

After all subtasks are completed, the QA loop automatically starts:

# Build completion triggers QA
python run.py --spec 001

# Or manually trigger QA
python run.py --spec 001 --qa

Preconditions:

All implementation subtasks marked "completed"
implementation_plan.json exists
Build completion check passes
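
The first two preconditions could be checked along the following lines. This is a minimal sketch, not the project's actual check: the function name `qa_preconditions_met` and the flat `subtasks` list with `id` and `status` fields are assumptions about the layout of implementation_plan.json, and the build completion check is not modeled.

```python
import json
from pathlib import Path


def qa_preconditions_met(spec_dir: Path) -> tuple[bool, str]:
    """Return (ok, reason). The plan layout used here is hypothetical."""
    plan_path = spec_dir / "implementation_plan.json"
    if not plan_path.exists():
        return False, "implementation_plan.json is missing"

    plan = json.loads(plan_path.read_text(encoding="utf-8"))
    subtasks = plan.get("subtasks", [])  # assumed key, not confirmed by the source
    if not subtasks:
        return False, "no subtasks found in plan"

    # Every subtask must be marked "completed" before QA may start.
    pending = [s.get("id", "?") for s in subtasks if s.get("status") != "completed"]
    if pending:
        return False, f"subtasks not completed: {pending}"
    return True, "ready for QA"
```

Returning a reason alongside the boolean makes it easy to tell the user why QA did not start.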

2. QA Reviewer Session

The QA reviewer agent validates the implementation:

Input:

spec.md - Feature specification and requirements
implementation_plan.json - Subtasks and acceptance criteria
Project code - Implemented changes

Validation Process:

Reads acceptance criteria from spec
Inspects implemented code and tests
Runs test commands (if specified)
Performs E2E testing (for Electron apps with MCP enabled)
Checks test coverage requirements
Validates code patterns and conventions

Output:

Status: approved or rejected
Issues found (if rejected)
Test results
QA report: qa_report.md

Example approval:

{
  "status": "approved",
  "qa_session": 1,
  "timestamp": "2026-02-12T10:30:00Z",
  "tests_passed": {
    "unit": true,
    "integration": true,
    "e2e": true
  }
}

Example rejection:

{
  "status": "rejected",
  "qa_session": 1,
  "timestamp": "2026-02-12T10:30:00Z",
  "issues_found": [
    {
      "title": "Missing error handling for network failures",
      "type": "acceptance_criteria",
      "file": "src/api/client.py",
      "line": 42,
      "severity": "high"
    },
    {
      "title": "Unit test coverage below 80%",
      "type": "coverage",
      "severity": "medium"
    }
  ]
}

3. QA Fixer Session (if rejected)

When QA rejects the build, the fixer agent resolves issues:

Input:

QA_FIX_REQUEST.md - Detailed issue report
Current code implementation
Original acceptance criteria

Fix Process:

Analyzes each issue in QA_FIX_REQUEST.md
Creates fix plan with subtasks
Implements fixes
Verifies fixes resolve issues
Updates status to fixes_applied

Output:

Fixed code
Updated implementation_plan.json with fixes_applied status

4. Re-validation Loop

After fixes are applied, QA reviewer runs again:

┌─────────────────┐
│ Build Complete  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ QA Reviewer     │─── Approved ──→ Merge
└────────┬────────┘
         │
         │ Rejected
         ▼
┌─────────────────┐
│ QA Fixer        │
└────────┬────────┘
         │
         │ Fixes Applied
         ▼
         (Loop back to QA Reviewer)

Termination Conditions:

QA approves (success)
Max iterations reached (MAX_QA_ITERATIONS = 50)
Recurring issues detected (escalate to human)
Consecutive errors (MAX_CONSECUTIVE_ERRORS = 3)
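
The four termination conditions can be sketched as a single loop. This is an illustration under stated assumptions, not the code in loop.py: the callables `review`, `fix`, and `has_recurring_issues`, the "error" status, and the returned outcome strings are all hypothetical.

```python
from typing import Callable

MAX_QA_ITERATIONS = 50
MAX_CONSECUTIVE_ERRORS = 3


def run_qa_loop(
    review: Callable[[int], str],
    fix: Callable[[int], None],
    has_recurring_issues: Callable[[], bool],
) -> str:
    """Drive review/fix cycles until one termination condition is hit.

    `review` returns "approved", "rejected", or "error" (assumed contract).
    """
    consecutive_errors = 0
    for iteration in range(1, MAX_QA_ITERATIONS + 1):
        status = review(iteration)
        if status == "approved":
            return "approved"
        if status == "error":
            consecutive_errors += 1
            if consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
                return "too_many_errors"
            continue  # retry the review without running the fixer
        consecutive_errors = 0  # a completed review resets the error streak
        if has_recurring_issues():
            return "human_escalation"
        fix(iteration)
    return "max_iterations"
```

Checking for recurring issues before calling the fixer avoids spending another fix session on a problem that has already resisted several attempts.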

Configuration

Environment Variables

Variable	Type	Default	Description
`GRAPHITI_ENABLED`	boolean	`false`	Enable Graphiti memory for QA context
`ELECTRON_MCP_ENABLED`	boolean	`false`	Enable E2E testing for Electron apps
`ELECTRON_DEBUG_PORT`	integer	`9222`	Chrome DevTools Protocol port
`LINEAR_API_KEY`	string	-	Linear API key for progress tracking
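
As an illustration of how these variables and their defaults might be read, here is a small sketch. The `QAEnvConfig` class and the set of accepted truthy spellings are assumptions for this example, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Accepted spellings are an assumption; the source only shows true/false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class QAEnvConfig:
    graphiti_enabled: bool
    electron_mcp_enabled: bool
    electron_debug_port: int
    linear_api_key: Optional[str]

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "QAEnvConfig":
        # Defaults mirror the table above.
        return cls(
            graphiti_enabled=_as_bool(env.get("GRAPHITI_ENABLED", "false")),
            electron_mcp_enabled=_as_bool(env.get("ELECTRON_MCP_ENABLED", "false")),
            electron_debug_port=int(env.get("ELECTRON_DEBUG_PORT", "9222")),
            linear_api_key=env.get("LINEAR_API_KEY") or None,
        )
```

Passing a plain dict instead of `os.environ` keeps the parsing easy to test.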

Constants

Constant	Value	Description
`MAX_QA_ITERATIONS`	`50`	Maximum QA review cycles
`MAX_CONSECUTIVE_ERRORS`	`3`	Stop after N consecutive errors
`RECURRING_ISSUE_THRESHOLD`	`3`	Escalate if issue appears N times
`ISSUE_SIMILARITY_THRESHOLD`	`0.8`	Similarity score for recurring issues

Verification Strategy

The QA loop adapts its approach based on the verification strategy defined in implementation_plan.json:

{
  "verification_strategy": {
    "risk_level": "critical",
    "acceptance_criteria": [...],
    "unit_tests": {
      "required": true,
      "minimum_coverage": 80
    },
    "integration_tests": {
      "required": true
    },
    "e2e_tests": {
      "required": true,
      "flows": ["user_login", "checkout"]
    },
    "browser_verification": {
      "required": true,
      "pages": ["/dashboard", "/settings"]
    }
  }
}

Risk Levels:

trivial - No automated testing required
low - Basic smoke tests
medium - Unit + integration tests
high - Full test suite + coverage requirements
critical - All tests + E2E + manual verification
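
One way to picture the risk levels is as a lookup from level to required checks. The mapping below is a sketch only: the check labels, the reading of "full test suite" as unit plus integration tests, and the function name `required_checks` are assumptions made for illustration.

```python
# Hypothetical mapping from risk level to the checks QA would require.
REQUIRED_CHECKS: dict[str, tuple[str, ...]] = {
    "trivial": (),
    "low": ("smoke",),
    "medium": ("unit", "integration"),
    "high": ("unit", "integration", "coverage"),
    "critical": ("unit", "integration", "coverage", "e2e", "manual"),
}


def required_checks(plan: dict) -> tuple[str, ...]:
    """Look up the checks implied by verification_strategy.risk_level."""
    level = plan["verification_strategy"]["risk_level"]
    if level not in REQUIRED_CHECKS:
        raise ValueError(f"unknown risk level: {level!r}")
    return REQUIRED_CHECKS[level]
```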

Agent Types

QA Reviewer Agent

Location: apps/backend/qa/reviewer.py

Purpose: Validates implementation against acceptance criteria

Model Selection:

# Uses higher thinking budget for thorough analysis
model = get_phase_model("qa")
thinking_budget = get_phase_thinking_budget("qa")  # Typically 16000 tokens

Tools Available:

Bash - Run test commands, linting, type checking
Read - Inspect code, tests, configuration files
Write/Edit - Create test files if needed
Electron MCP (if enabled) - E2E testing for Electron apps
- mcp__electron__take_screenshot
- mcp__electron__send_command_to_electron
- mcp__electron__get_electron_window_info
- mcp__electron__read_electron_logs

Validation Steps:

Acceptance Criteria Check
- Reads spec.md for acceptance criteria
- Verifies each criterion is met
- Documents any missing requirements
Test Execution
- Runs unit tests: pytest tests/
- Runs integration tests (if required)
- Validates test coverage (if minimum specified)
Code Quality
- Checks for leftover console.log calls and other debugging statements
- Validates error handling
- Reviews code patterns
E2E Testing (if Electron app)
- Takes screenshots to verify UI
- Interacts with UI elements (click, fill forms)
- Validates user flows

Output: qa_report.md with detailed findings

QA Fixer Agent

Location: apps/backend/qa/fixer.py

Purpose: Resolves issues reported by QA reviewer

Model Selection:

# Standard thinking budget for focused fixes
model = get_phase_model("coder")  # Reuses coder model
thinking_budget = get_phase_thinking_budget("coder")

Tools Available: Same as coder agent (Bash, Read, Write, Edit)

Fix Process:

Read QA_FIX_REQUEST.md
- Parse issue list
- Understand acceptance criteria gaps
- Identify files to modify
Plan Fixes
- Break fixes into small steps
- Identify which files need changes
- Plan test additions if needed
Implement Fixes
- Edit code files
- Add/update tests
- Remove debug statements
- Apply code patterns
Verify Fixes
- Run tests to ensure fixes work
- Check that new issues aren't introduced
- Validate all acceptance criteria are met

Output: Fixed code + status update

Issue Types

The QA system categorizes issues into types:

Type	Description	Example
`acceptance_criteria`	Missing or incomplete requirement	Feature doesn't match spec
`unit_test`	Missing or failing unit tests	No test for error case
`integration_test`	Integration test failures	API endpoint not tested
`e2e_test`	End-to-end test failures	User flow broken
`coverage`	Insufficient test coverage	Coverage below 80%
`code_quality`	Code pattern violations	Debug statements left in
`error_handling`	Missing error handling	No try/catch for network calls
`security`	Security concerns	Unvalidated user input
`performance`	Performance issues	Inefficient query

Recurring Issue Detection

The system detects when the same issue appears across multiple iterations:

How It Works

Normalize Issues
- Combines title + file + line into a key
- Removes common prefixes ("error:", "issue:")
- Lowercases and strips whitespace
Similarity Matching
- Uses SequenceMatcher for fuzzy matching
- Considers issues "same" if similarity >= 0.8
Threshold Check
- Tracks occurrences across iterations
- Escalates to human if issue appears >= 3 times
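
The three steps above can be sketched with the standard library's `difflib.SequenceMatcher`. The key format (`title|file|line`), the prefix regex, and the function names are assumptions for this example; the real logic in report.py may normalize differently.

```python
import re
from difflib import SequenceMatcher

ISSUE_SIMILARITY_THRESHOLD = 0.8
RECURRING_ISSUE_THRESHOLD = 3

# Strip a leading "error:" or "issue:" prefix, case-insensitively.
_PREFIX = re.compile(r"^(error|issue)\s*:\s*", re.IGNORECASE)


def issue_key(issue: dict) -> str:
    """Normalize an issue into a comparable key (assumed format)."""
    title = _PREFIX.sub("", issue.get("title", "").strip()).lower()
    return f"{title}|{issue.get('file', '')}|{issue.get('line', '')}"


def is_same_issue(a: dict, b: dict) -> bool:
    ratio = SequenceMatcher(None, issue_key(a), issue_key(b)).ratio()
    return ratio >= ISSUE_SIMILARITY_THRESHOLD


def find_recurring(history: list[list[dict]]) -> list[dict]:
    """`history` holds one issue list per iteration, oldest first."""
    groups: list[list] = []  # each entry: [representative issue, occurrence count]
    for issues in history:
        for issue in issues:
            for group in groups:
                if is_same_issue(group[0], issue):
                    group[1] += 1
                    break
            else:
                groups.append([issue, 1])
    return [rep for rep, count in groups if count >= RECURRING_ISSUE_THRESHOLD]
```

How much of the title survives normalization drives the score: in this sketch, a heavily reworded title can fall below the threshold even when the file and line agree.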

Example

Iteration 1:

{"title": "Missing error handling", "file": "api.py", "line": 42}

Iteration 2:

{"title": "Error: Missing error handling", "file": "api.py", "line": 42}

Iteration 3:

{"title": "No error handling for network failures", "file": "api.py", "line": 42}

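Result: Iterations 1 and 2 normalize to the same key once the "Error:" prefix is stripped. If iteration 3 also scores at or above the 0.8 similarity threshold, the issue has appeared 3 times and is escalated to a human.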

Human Escalation

When recurring issues are detected:

Creates QA_HUMAN_ESCALATION.md with:
- Recurring issue summary
- All iterations where issue appeared
- Suggested actions for human

Updates implementation_plan.json:

{
  "qa_signoff": {
    "status": "human_escalation",
    "recurring_issues": [...],
    "escalation_reason": "Issue appeared 3+ times"
  }
}

The user must:
- Review QA_HUMAN_ESCALATION.md
- Manually fix the recurring issue
- Delete QA_FIX_REQUEST.md to resume QA

No-Test Projects

The QA system handles projects without test suites:

Detection

Checks for test directory existence
Runs test discovery command
If no tests are found, marks the project as "no_test_project"

Manual Test Plan

When no tests exist, QA creates a manual test plan:

# Manual Test Plan

## Test Cases

1. **User Registration Flow**
   - Navigate to /register
   - Fill valid email and password
   - Submit form
   - Verify redirect to dashboard

2. **Login Validation**
   - Navigate to /login
   - Submit empty form
   - Verify error messages appear

...

Validation Criteria

For no-test projects, QA validates:

Acceptance criteria through code inspection
Manual testing steps are documented
Code quality and patterns
Error handling coverage

E2E Testing with Electron MCP

For Electron frontend projects, QA can perform automated E2E testing:

Setup

Start Electron with debugging:

npm run dev  # Already configured with --remote-debugging-port=9222

Enable Electron MCP in .env:

ELECTRON_MCP_ENABLED=true
ELECTRON_DEBUG_PORT=9222

Run QA:
```
python run.py --spec 001 --qa
```

Available E2E Tools

When ELECTRON_MCP_ENABLED=true, QA agents automatically get access to the following Electron MCP tools:

Tool	Purpose
`mcp__electron__take_screenshot`	Capture screenshots for visual verification
`mcp__electron__get_electron_window_info`	Get info about running windows
`mcp__electron__send_command_to_electron`	Interact with the app
`mcp__electron__read_electron_logs`	Read console logs for debugging

Interaction Commands

The send_command_to_electron tool supports:

UI Interaction:

click_by_text - Click buttons/links by visible text
click_by_selector - Click elements by CSS selector
fill_input - Fill form fields by placeholder/selector
select_option - Select dropdown options

Navigation:

navigate_to_hash - Navigate to hash routes (#settings, #create)

Input:

send_keyboard_shortcut - Send shortcuts (Enter, Ctrl+N, etc.)

Inspection:

get_page_structure - Get organized overview of page elements
debug_elements - Get debugging info about buttons/forms
verify_form_state - Check form validation state

Execution:

eval - Execute custom JavaScript code

Example E2E Test Flow

# 1. Take initial screenshot
agent: "Take a screenshot to see the current UI"
# Uses: mcp__electron__take_screenshot

# 2. Inspect page structure
agent: "Get page structure to find available buttons"
# Uses: mcp__electron__send_command_to_electron (command: "get_page_structure")

# 3. Click button to navigate
agent: "Click the 'Create New Spec' button"
# Uses: mcp__electron__send_command_to_electron (command: "click_by_text", args: {text: "Create New Spec"})

# 4. Fill out form
agent: "Fill the task description field"
# Uses: mcp__electron__send_command_to_electron (command: "fill_input", args: {placeholder: "Describe your task", value: "Add login feature"})

# 5. Submit and verify
agent: "Click Submit and verify success"
# Uses: click_by_text → take_screenshot → verify result

When E2E Testing Is Used

Bug Fixes - Reproduce the bug, apply fix, verify it's resolved
New Features - Implement feature, test the UI flow end-to-end
UI Changes - Verify visual changes and interactions work correctly
Form Validation - Test form submission, validation, error handling

Linear Integration

The QA system integrates with Linear for progress tracking:

Status Updates

# When QA starts
linear_qa_started(spec_id, iteration)

# When QA rejects
linear_qa_rejected(spec_id, iteration, issues_count)

# When QA approves
linear_qa_approved(spec_id, total_iterations, duration_seconds)

# When max iterations reached
linear_qa_max_iterations(spec_id, iteration)

Configuration

Set the environment variables:

LINEAR_API_KEY=lin_api_...
LINEAR_TEAM_ID=YOUR_TEAM_ID

State Management

QA Signoff States

pending → rejected → fixes_applied → rejected → ... → approved
                    ↑               │
                    └───────────────┘ (loop)

State Descriptions:

pending - QA not yet run
rejected - QA found issues, needs fixes
fixes_applied - Fixer applied changes, ready for re-validation
approved - All acceptance criteria met
human_escalation - Recurring issues detected, human intervention required
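
The states above can be read as a small state machine. The transition table below is a sketch inferred from the diagram and descriptions; in particular, allowing `pending` to go straight to `approved` and letting `human_escalation` return to review after a manual fix are assumptions.

```python
# Allowed next states for each QA signoff state (inferred, not authoritative).
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "pending": {"approved", "rejected"},
    "rejected": {"fixes_applied", "human_escalation"},
    "fixes_applied": {"approved", "rejected"},
    "human_escalation": {"approved", "rejected"},  # QA resumes after a manual fix
    "approved": set(),
}


def transition(current: str, new: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"invalid QA transition: {current} -> {new}")
    return new
```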

Data Storage

implementation_plan.json:

{
  "qa_signoff": {
    "status": "approved",
    "qa_session": 2,
    "timestamp": "2026-02-12T10:30:00Z",
    "tests_passed": {
      "unit": true,
      "integration": true,
      "e2e": false
    }
  },
  "qa_iteration_history": [
    {
      "iteration": 1,
      "status": "rejected",
      "timestamp": "2026-02-12T09:00:00Z",
      "issues": [
        {"title": "Missing test", "type": "unit_test"}
      ]
    },
    {
      "iteration": 2,
      "status": "approved",
      "timestamp": "2026-02-12T10:30:00Z",
      "issues": []
    }
  ],
  "qa_stats": {
    "total_iterations": 2,
    "last_iteration": 2,
    "last_status": "approved",
    "issues_by_type": {
      "unit_test": 1
    }
  }
}
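
The `qa_stats` block can be derived from `qa_iteration_history`. The helper below is a sketch of that derivation; the function name `summarize_history` and the fallback values for an empty history are assumptions.

```python
from collections import Counter


def summarize_history(history: list[dict]) -> dict:
    """Build a qa_stats-style summary from the iteration history."""
    issues_by_type = Counter(
        issue.get("type", "unknown")
        for entry in history
        for issue in entry.get("issues", [])
    )
    last = history[-1] if history else {}
    return {
        "total_iterations": len(history),
        "last_iteration": last.get("iteration", 0),
        "last_status": last.get("status", "pending"),
        "issues_by_type": dict(issues_by_type),
    }
```

Applied to the two-entry history shown above, this yields the same values as the `qa_stats` object in the example.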

CLI Commands

Run QA Validation

# Automatic (triggered by build completion)
python run.py --spec 001

# Manual trigger
python run.py --spec 001 --qa

# With verbose output
python run.py --spec 001 --qa --verbose

# Use specific model
python run.py --spec 001 --qa --model claude-sonnet-4-5-20250929

Check QA Status

python run.py --spec 001 --qa-status

Output:

Spec: 001-feature

QA Status: REJECTED
QA Sessions: 2
Last Check: 2026-02-12T10:30:00Z

Issues Found:
- Missing error handling (acceptance_criteria, high)
- Test coverage 65% (coverage, medium)

Iteration History:
  Iteration 1 (09:00): Rejected - 2 issues
  Iteration 2 (10:30): Rejected - 1 issue

Customization

Custom Acceptance Criteria

Define project-specific acceptance criteria in spec.md:

## Acceptance Criteria

- [ ] User can register with email/password
- [ ] Password must be 8+ characters with special char
- [ ] Account created after email verification
- [ ] Login with valid credentials works
- [ ] Login with invalid credentials shows error

## Verification

**Unit Tests:**
- `pytest tests/test_auth.py` - All pass
- Coverage: >= 80%

**Integration Tests:**
- `pytest tests/integration/test_api.py` - All pass

**E2E Tests:**
- Manual: Register → Verify email → Login → Logout

Custom QA Prompts

Modify QA behavior by editing agent prompts:

apps/backend/prompts/qa_reviewer.md:

You are a QA reviewer. Focus on:
1. Security vulnerabilities
2. Error handling completeness
3. Test coverage for edge cases
4. Performance considerations

apps/backend/prompts/qa_fixer.md:

You are a QA fixer. When fixing issues:
1. Add comprehensive tests
2. Follow project code patterns
3. Document complex logic
4. Remove all debug statements

Performance Characteristics

Typical Duration:

Simple feature QA: 2-5 minutes
Standard feature QA: 5-15 minutes
Complex feature QA: 15-30 minutes
With E2E testing: Add 5-10 minutes

Resource Usage:

Memory: ~500MB per QA session
CPU: Moderate during test execution
API calls: 10-50 per QA iteration (depends on test count)

Iteration Statistics:

Median iterations to approval: 2
95th percentile: 5 iterations
Max allowed: 50 iterations

Best Practices

For Users

Write Clear Acceptance Criteria
- Be specific about requirements
- Include edge cases
- Define measurable outcomes

Provide Test Commands

{
  "verification_strategy": {
    "unit_tests": {
      "required": true,
      "commands": ["pytest tests/test_feature.py -v"]
    }
  }
}

Set Realistic Coverage Targets
- 80% is a good default
- 100% is rarely practical
- Focus on critical paths
Review QA Reports
- Read qa_report.md after each iteration
- Understand why issues were found
- Learn patterns to avoid future issues

For Developers Extending QA

Add New Issue Types
- Define in report.py
- Add validation logic
- Document in acceptance criteria

Create Custom Validators

# qa/custom_validator.py
def validate_custom_criteria(code: str) -> list[dict]:
    issues = []
    # Your validation logic
    return issues

Integrate Additional Test Frameworks
- Add test runner in reviewer.py
- Parse output format
- Report failures as issues
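
To make the custom validator item more concrete, here is one possible validator that flags leftover debug statements. It is a heuristic sketch: the regex, the extra `file` parameter, and the chosen severity are assumptions, and the issue fields simply mirror the rejection example earlier in this document.

```python
import re

# Heuristic only: legitimate print() calls will also be flagged.
DEBUG_PATTERN = re.compile(r"\bconsole\.log\(|\bprint\(|\bdebugger\b")


def validate_no_debug_statements(code: str, file: str = "<unknown>") -> list[dict]:
    """Return one code_quality issue per line that contains a debug statement."""
    issues = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        if DEBUG_PATTERN.search(line):
            issues.append({
                "title": "Debug statement left in code",
                "type": "code_quality",
                "file": file,
                "line": lineno,
                "severity": "low",
            })
    return issues
```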

Troubleshooting

QA Stuck in Loop

Problem: QA rejects with same issue repeatedly

Solutions:

Check for recurring issues - should auto-escalate
Review QA_FIX_REQUEST.md - is issue clear?
Manually fix and delete QA_FIX_REQUEST.md
Lower ISSUE_SIMILARITY_THRESHOLD if reworded versions of the same issue are not being matched as recurring

Tests Not Running

Problem: QA reports that tests exist but does not run them

Solutions:

Verify test commands in verification strategy
Check test files exist in worktree
Ensure dependencies are installed in .venv
Check test framework compatibility (pytest, vitest, etc.)

E2E Testing Not Working

Problem: Electron MCP tools not available

Solutions:

Ensure ELECTRON_MCP_ENABLED=true in .env
Start Electron with npm run dev (includes debug port)
Verify ELECTRON_DEBUG_PORT (default 9222) matches the debugging port the app was started with
Check Electron app is running before starting QA

QA Approves Too Easily

Problem: QA approves without thorough validation

Solutions:

Review qa_reviewer.md prompt - make it stricter
Add more acceptance criteria to spec
Set higher coverage requirements
Enable E2E testing for UI changes
Review qa_report.md to see what was checked

Max Iterations Reached

Problem: QA hit 50 iterations without approval

Solutions:

Review qa_iteration_history for patterns
Manually fix recurring issues
Simplify acceptance criteria if too strict
Break feature into smaller specs
Increase MAX_QA_ITERATIONS if needed (rare)

Testing

Unit Tests

Location: tests/test_qa_loop.py

Run:

pytest tests/test_qa_loop.py -v

Coverage:

QA status management
Iteration tracking
Issue detection logic
State machine transitions

Integration Tests

Test QA system with real projects:

# Create test spec
python spec_runner.py --task "Test feature"

# Run build
python run.py --spec test-feature

# Verify QA runs
python run.py --spec test-feature --qa-status
