Analysis Date: 2026-02-12
The QA Validation Loop is an automated quality assurance system that validates implementation against acceptance criteria through iterative review and fix cycles. It uses AI agents to identify issues, apply fixes, and ensure code quality before merging.
Key Benefits:
- Automated validation against acceptance criteria
- Iterative issue detection and resolution
- Recurring issue detection with human escalation
- Support for projects with and without test suites
- Integration with Linear for progress tracking
- E2E testing capabilities via Electron MCP for frontend changes
The QA system is organized into modular components:
qa/
├── loop.py # Main orchestration loop
├── reviewer.py # QA reviewer agent session
├── fixer.py # QA fixer agent session
├── report.py # Issue tracking, reporting, escalation
├── criteria.py # Acceptance criteria and status management
├── coverage_validator.py # Test coverage validation
├── pattern_validator.py # Code pattern validation
└── recovery_metrics.py # QA performance metrics
Component Responsibilities:
- loop.py - Coordinates reviewer/fixer sessions, manages iteration logic
- reviewer.py - Runs QA reviewer agent to validate acceptance criteria
- fixer.py - Runs QA fixer agent to resolve reported issues
- report.py - Tracks iteration history, detects recurring issues
- criteria.py - Manages QA signoff status, validates completion
After all subtasks are completed, the QA loop automatically starts:
# Build completion triggers QA
python run.py --spec 001
# Or manually trigger QA
python run.py --spec 001 --qa
Preconditions:
- All implementation subtasks marked "completed"
- implementation_plan.json exists
- Build completion check passes
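The preconditions above can be sketched as a small gate function. This is a minimal illustration, not the project's actual check: the plan layout (a top-level `subtasks` list whose entries carry `id` and `status`) and the function name `qa_preconditions_met` are assumptions made for the example.

```python
import json
from pathlib import Path


def qa_preconditions_met(spec_dir: Path) -> tuple[bool, str]:
    """Return (ok, reason) for the preconditions listed above.

    Assumes a hypothetical plan layout:
    {"subtasks": [{"id": "...", "status": "completed"}, ...]}
    """
    plan_path = spec_dir / "implementation_plan.json"
    if not plan_path.exists():
        return False, "implementation_plan.json is missing"

    plan = json.loads(plan_path.read_text(encoding="utf-8"))
    subtasks = plan.get("subtasks", [])
    if not subtasks:
        return False, "plan contains no subtasks"

    # Collect every subtask that is not yet marked "completed".
    pending = [s.get("id", "?") for s in subtasks if s.get("status") != "completed"]
    if pending:
        return False, f"subtasks not completed: {pending}"
    return True, "ready for QA"
```

Returning a reason string alongside the boolean keeps the caller free to log or display why QA did not start.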
The QA reviewer agent validates the implementation:
Input:
- spec.md - Feature specification and requirements
- implementation_plan.json - Subtasks and acceptance criteria
- Project code - Implemented changes
Validation Process:
- Reads acceptance criteria from spec
- Inspects implemented code and tests
- Runs test commands (if specified)
- Performs E2E testing (for Electron apps with MCP enabled)
- Checks test coverage requirements
- Validates code patterns and conventions
Output:
- Status: approved or rejected
- Issues found (if rejected)
- Test results
- QA report: qa_report.md
Example approval:
{
"status": "approved",
"qa_session": 1,
"timestamp": "2026-02-12T10:30:00Z",
"tests_passed": {
"unit": true,
"integration": true,
"e2e": true
}
}
Example rejection:
{
"status": "rejected",
"qa_session": 1,
"timestamp": "2026-02-12T10:30:00Z",
"issues_found": [
{
"title": "Missing error handling for network failures",
"type": "acceptance_criteria",
"file": "src/api/client.py",
"line": 42,
"severity": "high"
},
{
"title": "Unit test coverage below 80%",
"type": "coverage",
"severity": "medium"
}
]
}
When QA rejects the build, the fixer agent resolves the reported issues:
Input:
- QA_FIX_REQUEST.md - Detailed issue report
- Current code implementation
- Original acceptance criteria
Fix Process:
- Analyzes each issue in QA_FIX_REQUEST.md
- Creates fix plan with subtasks
- Implements fixes
- Verifies fixes resolve issues
- Updates status to fixes_applied
Output:
- Fixed code
- Updated implementation_plan.json with fixes_applied status
After fixes are applied, QA reviewer runs again:
┌─────────────────┐
│ Build Complete  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   QA Reviewer   │─── Approved ──→ Merge
└────────┬────────┘
         │
         │ Rejected
         ▼
┌─────────────────┐
│    QA Fixer     │
└────────┬────────┘
         │
         │ Fixes Applied
         ▼
(Loop back to QA Reviewer)
Termination Conditions:
- QA approves (success)
- Max iterations reached (MAX_QA_ITERATIONS = 50)
- Recurring issues detected (escalate to human)
- Consecutive errors (MAX_CONSECUTIVE_ERRORS = 3)
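The four termination conditions can be sketched as a single control loop. This is an illustrative outline rather than the contents of loop.py: the callback signatures, the `"error"` review status, and the return labels `"error_limit"` and `"max_iterations"` are assumptions introduced for the example. Only the two constants and the approved/rejected/human_escalation statuses come from this document.

```python
from typing import Callable

MAX_QA_ITERATIONS = 50
MAX_CONSECUTIVE_ERRORS = 3


def run_qa_loop(
    review: Callable[[int], str],
    fix: Callable[[int], None],
    has_recurring_issues: Callable[[], bool],
) -> str:
    """Drive review/fix cycles until one termination condition is hit.

    `review` is assumed to return "approved", "rejected", or "error".
    """
    consecutive_errors = 0
    for iteration in range(1, MAX_QA_ITERATIONS + 1):
        status = review(iteration)
        if status == "approved":
            return "approved"
        if status == "error":
            consecutive_errors += 1
            if consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
                return "error_limit"
            continue  # retry the review without running the fixer
        consecutive_errors = 0  # a completed review resets the error streak
        if has_recurring_issues():
            return "human_escalation"
        fix(iteration)
    return "max_iterations"
```

In this sketch a failed review still consumes an iteration, so the overall cap bounds total work even when errors and rejections alternate.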
| Variable | Type | Default | Description |
|---|---|---|---|
| GRAPHITI_ENABLED | boolean | false | Enable Graphiti memory for QA context |
| ELECTRON_MCP_ENABLED | boolean | false | Enable E2E testing for Electron apps |
| ELECTRON_DEBUG_PORT | integer | 9222 | Chrome DevTools Protocol port |
| LINEAR_API_KEY | string | - | Linear API key for progress tracking |
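The environment variables in the table above could be read along these lines. This is a sketch under stated assumptions: the `QAEnv` dataclass, the `load_qa_env` function, and the set of accepted truthy spellings are hypothetical. Only the variable names and defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Optional


def _env_bool(name: str, default: bool = False) -> bool:
    """Interpret common truthy spellings; an unset variable uses the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class QAEnv:
    graphiti_enabled: bool
    electron_mcp_enabled: bool
    electron_debug_port: int
    linear_api_key: Optional[str]


def load_qa_env() -> QAEnv:
    """Build a QAEnv from the process environment, applying the table defaults."""
    return QAEnv(
        graphiti_enabled=_env_bool("GRAPHITI_ENABLED"),
        electron_mcp_enabled=_env_bool("ELECTRON_MCP_ENABLED"),
        electron_debug_port=int(os.environ.get("ELECTRON_DEBUG_PORT", "9222")),
        linear_api_key=os.environ.get("LINEAR_API_KEY") or None,
    )
```

Parsing everything once into a frozen object avoids scattering string comparisons such as `== "true"` across the QA code.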
| Constant | Value | Description |
|---|---|---|
| MAX_QA_ITERATIONS | 50 | Maximum QA review cycles |
| MAX_CONSECUTIVE_ERRORS | 3 | Stop after N consecutive errors |
| RECURRING_ISSUE_THRESHOLD | 3 | Escalate if issue appears N times |
| ISSUE_SIMILARITY_THRESHOLD | 0.8 | Similarity score for recurring issues |
The QA loop adapts its approach based on the verification strategy defined in implementation_plan.json:
{
"verification_strategy": {
"risk_level": "critical",
"acceptance_criteria": [...],
"unit_tests": {
"required": true,
"minimum_coverage": 80
},
"integration_tests": {
"required": true
},
"e2e_tests": {
"required": true,
"flows": ["user_login", "checkout"]
},
"browser_verification": {
"required": true,
"pages": ["/dashboard", "/settings"]
}
}
}
Risk Levels:
- trivial - No automated testing required
- low - Basic smoke tests
- medium - Unit + integration tests
- high - Full test suite + coverage requirements
- critical - All tests + E2E + manual verification
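One way to picture how a risk level selects checks is a lookup table. The sketch below is an interpretation, not the project's code: the check labels (`"smoke"`, `"unit"`, and so on) and the reading of "full test suite" as unit plus integration tests are assumptions made for the example.

```python
RISK_LEVEL_CHECKS: dict[str, frozenset[str]] = {
    "trivial": frozenset(),
    "low": frozenset({"smoke"}),
    "medium": frozenset({"unit", "integration"}),
    # Assumption: "full test suite" is read here as unit + integration tests.
    "high": frozenset({"unit", "integration", "coverage"}),
    "critical": frozenset({"unit", "integration", "coverage", "e2e", "manual"}),
}


def required_checks(risk_level: str) -> frozenset[str]:
    """Return the set of checks a given risk level calls for."""
    try:
        return RISK_LEVEL_CHECKS[risk_level]
    except KeyError:
        raise ValueError(f"unknown risk level: {risk_level!r}") from None
```

Failing loudly on an unknown level is a deliberate choice in this sketch: a typo in implementation_plan.json should not silently downgrade verification.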
Location: apps/backend/qa/reviewer.py
Purpose: Validates implementation against acceptance criteria
Model Selection:
# Uses higher thinking budget for thorough analysis
model = get_phase_model("qa")
thinking_budget = get_phase_thinking_budget("qa") # Typically 16000 tokens
Tools Available:
- Bash - Run test commands, linting, type checking
- Read - Inspect code, tests, configuration files
- Write/Edit - Create test files if needed
- Electron MCP (if enabled) - E2E testing for Electron apps
  - mcp__electron__take_screenshot
  - mcp__electron__send_command_to_electron
  - mcp__electron__get_electron_window_info
Validation Steps:
- Acceptance Criteria Check
  - Reads spec.md for acceptance criteria
  - Verifies each criterion is met
  - Documents any missing requirements
- Test Execution
  - Runs unit tests: pytest tests/
  - Runs integration tests (if required)
  - Validates test coverage (if minimum specified)
- Code Quality
  - Checks for console.log/debugging statements
  - Validates error handling
  - Reviews code patterns
- E2E Testing (if Electron app)
  - Takes screenshots to verify UI
  - Interacts with UI elements (click, fill forms)
  - Validates user flows
Output: qa_report.md with detailed findings
Location: apps/backend/qa/fixer.py
Purpose: Resolves issues reported by QA reviewer
Model Selection:
# Standard thinking budget for focused fixes
model = get_phase_model("coder") # Reuses coder model
thinking_budget = get_phase_thinking_budget("coder")
Tools Available: Same as coder agent (Bash, Read, Write, Edit)
Fix Process:
- Read QA_FIX_REQUEST.md
  - Parse issue list
  - Understand acceptance criteria gaps
  - Identify files to modify
- Plan Fixes
  - Break fixes into small steps
  - Identify which files need changes
  - Plan test additions if needed
- Implement Fixes
  - Edit code files
  - Add/update tests
  - Remove debug statements
  - Apply code patterns
- Verify Fixes
  - Run tests to ensure fixes work
  - Check that new issues aren't introduced
  - Validate all acceptance criteria are met
Output: Fixed code + status update
The QA system categorizes issues into types:
| Type | Description | Example |
|---|---|---|
| acceptance_criteria | Missing or incomplete requirement | Feature doesn't match spec |
| unit_test | Missing or failing unit tests | No test for error case |
| integration_test | Integration test failures | API endpoint not tested |
| e2e_test | End-to-end test failures | User flow broken |
| coverage | Insufficient test coverage | Coverage below 80% |
| code_quality | Code pattern violations | Debug statements left in |
| error_handling | Missing error handling | No try/catch for network calls |
| security | Security concerns | Unvalidated user input |
| performance | Performance issues | Inefficient query |
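The issue types in the table lend themselves to an enumeration, which also makes it straightforward to tally issues per type. The `IssueType` class and `issues_by_type` helper below are hypothetical names for illustration. The string values are the ones listed in the table.

```python
from collections import Counter
from enum import Enum


class IssueType(str, Enum):
    ACCEPTANCE_CRITERIA = "acceptance_criteria"
    UNIT_TEST = "unit_test"
    INTEGRATION_TEST = "integration_test"
    E2E_TEST = "e2e_test"
    COVERAGE = "coverage"
    CODE_QUALITY = "code_quality"
    ERROR_HANDLING = "error_handling"
    SECURITY = "security"
    PERFORMANCE = "performance"


def issues_by_type(issues: list[dict]) -> dict[str, int]:
    """Count issues per type; a type outside the table raises ValueError."""
    counts = Counter(IssueType(issue["type"]).value for issue in issues)
    return dict(counts)
```

Validating through the enum means a misspelled type reported by an agent surfaces immediately instead of creating a stray category in the statistics.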
The system detects when the same issue appears across multiple iterations:
- Normalize Issues
  - Combines title + file + line into a key
  - Removes common prefixes ("error:", "issue:")
  - Lowercases and strips whitespace
- Similarity Matching
  - Uses SequenceMatcher for fuzzy matching
  - Considers issues "same" if similarity >= 0.8
- Threshold Check
  - Tracks occurrences across iterations
  - Escalates to human if issue appears >= 3 times
Iteration 1:
{"title": "Missing error handling", "file": "api.py", "line": 42}
Iteration 2:
{"title": "Error: Missing error handling", "file": "api.py", "line": 42}
Iteration 3:
{"title": "No error handling for network failures", "file": "api.py", "line": 42}
Result: All three match (similarity > 0.8), escalate to human.
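The normalize, match, and threshold steps can be sketched with the standard library's `difflib.SequenceMatcher`. This is a minimal illustration and not the code in report.py: the key format (fields joined with `|`), the prefix regex, and the helper names are assumptions. The two thresholds are the documented constants.

```python
import re
from difflib import SequenceMatcher

ISSUE_SIMILARITY_THRESHOLD = 0.8
RECURRING_ISSUE_THRESHOLD = 3
_PREFIX = re.compile(r"^(error|issue)\s*:\s*", re.IGNORECASE)


def issue_key(issue: dict) -> str:
    """Normalize an issue into a comparable key: title|file|line."""
    title = _PREFIX.sub("", issue.get("title", "").strip()).lower()
    return f"{title}|{issue.get('file', '')}|{issue.get('line', '')}"


def is_same_issue(a: dict, b: dict) -> bool:
    """Fuzzy-compare two issues by the similarity ratio of their keys."""
    ratio = SequenceMatcher(None, issue_key(a), issue_key(b)).ratio()
    return ratio >= ISSUE_SIMILARITY_THRESHOLD


def is_recurring(current: dict, history: list[list[dict]]) -> bool:
    """True once `current` has been seen in enough iterations, counting this one."""
    earlier = sum(any(is_same_issue(current, old) for old in past) for past in history)
    return earlier + 1 >= RECURRING_ISSUE_THRESHOLD
```

With this deliberately simple key, the Iteration 1 and Iteration 2 examples normalize to the same string. A heavily reworded title such as the Iteration 3 example scores below 0.8 in this sketch, so matching it would need a looser threshold or a richer normalization than the one assumed here.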
When recurring issues are detected:
- Creates QA_HUMAN_ESCALATION.md with:
  - Recurring issue summary
  - All iterations where issue appeared
  - Suggested actions for human
- Updates implementation_plan.json:
  {
    "qa_signoff": {
      "status": "human_escalation",
      "recurring_issues": [...],
      "escalation_reason": "Issue appeared 3+ times"
    }
  }
- User must:
  - Review QA_HUMAN_ESCALATION.md
  - Manually fix the recurring issue
  - Delete QA_FIX_REQUEST.md to resume QA
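The two file updates in the escalation steps could look roughly like this. It is a sketch only: the function name `escalate`, the layout of the generated Markdown, and the assumption that both files live in the same spec directory are illustrative choices. The `qa_signoff` fields mirror the JSON shown above.

```python
import json
from pathlib import Path


def escalate(spec_dir: Path, recurring_issues: list[dict]) -> dict:
    """Write the escalation report and mark the plan as human_escalation."""
    lines = ["# QA Human Escalation", "", "## Recurring issues", ""]
    for issue in recurring_issues:
        location = f"{issue.get('file', '?')}:{issue.get('line', '?')}"
        lines.append(f"- {issue['title']} ({location})")
    report = spec_dir / "QA_HUMAN_ESCALATION.md"
    report.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Preserve the rest of the plan and replace only the signoff block.
    plan_path = spec_dir / "implementation_plan.json"
    plan = json.loads(plan_path.read_text(encoding="utf-8"))
    plan["qa_signoff"] = {
        "status": "human_escalation",
        "recurring_issues": recurring_issues,
        "escalation_reason": "Issue appeared 3+ times",
    }
    plan_path.write_text(json.dumps(plan, indent=2), encoding="utf-8")
    return plan["qa_signoff"]
```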
The QA system handles projects without test suites:
- Checks for test directory existence
- Runs test discovery command
- If no tests found, marks as "no_test_project"
When no tests exist, QA creates a manual test plan:
# Manual Test Plan
## Test Cases
1. **User Registration Flow**
- Navigate to /register
- Fill valid email and password
- Submit form
- Verify redirect to dashboard
2. **Login Validation**
- Navigate to /login
- Submit empty form
- Verify error messages appear
...
For no-test projects, QA validates:
- Acceptance criteria through code inspection
- Manual testing steps are documented
- Code quality and patterns
- Error handling coverage
For Electron frontend projects, QA can perform automated E2E testing:
- Start Electron with debugging:
  npm run dev # Already configured with --remote-debugging-port=9222
- Enable Electron MCP in .env:
  ELECTRON_MCP_ENABLED=true
  ELECTRON_DEBUG_PORT=9222
- Run QA:
  python run.py --spec 001 --qa
QA agents automatically get access to Electron MCP tools:
| Tool | Purpose |
|---|---|
| mcp__electron__take_screenshot | Capture screenshots for visual verification |
| mcp__electron__get_electron_window_info | Get info about running windows |
| mcp__electron__send_command_to_electron | Interact with the app |
| mcp__electron__read_electron_logs | Read console logs for debugging |
The send_command_to_electron tool supports:
UI Interaction:
- click_by_text - Click buttons/links by visible text
- click_by_selector - Click elements by CSS selector
- fill_input - Fill form fields by placeholder/selector
- select_option - Select dropdown options
Navigation:
- navigate_to_hash - Navigate to hash routes (#settings, #create)
Input:
- send_keyboard_shortcut - Send shortcuts (Enter, Ctrl+N, etc.)
Inspection:
- get_page_structure - Get organized overview of page elements
- debug_elements - Get debugging info about buttons/forms
- verify_form_state - Check form validation state
Execution:
- eval - Execute custom JavaScript code
# 1. Take initial screenshot
agent: "Take a screenshot to see the current UI"
# Uses: mcp__electron__take_screenshot
# 2. Inspect page structure
agent: "Get page structure to find available buttons"
# Uses: mcp__electron__send_command_to_electron (command: "get_page_structure")
# 3. Click button to navigate
agent: "Click the 'Create New Spec' button"
# Uses: mcp__electron__send_command_to_electron (command: "click_by_text", args: {text: "Create New Spec"})
# 4. Fill out form
agent: "Fill the task description field"
# Uses: mcp__electron__send_command_to_electron (command: "fill_input", args: {placeholder: "Describe your task", value: "Add login feature"})
# 5. Submit and verify
agent: "Click Submit and verify success"
# Uses: click_by_text → take_screenshot → verify result
- Bug Fixes - Reproduce the bug, apply fix, verify it's resolved
- New Features - Implement feature, test the UI flow end-to-end
- UI Changes - Verify visual changes and interactions work correctly
- Form Validation - Test form submission, validation, error handling
The QA system integrates with Linear for progress tracking:
# When QA starts
linear_qa_started(spec_id, iteration)
# When QA rejects
linear_qa_rejected(spec_id, iteration, issues_count)
# When QA approves
linear_qa_approved(spec_id, total_iterations, duration_seconds)
# When max iterations reached
linear_qa_max_iterations(spec_id, iteration)
Set environment variables:
LINEAR_API_KEY=lin_api_...
LINEAR_TEAM_ID=YOUR_TEAM_ID
pending → rejected → fixes_applied → rejected → ... → approved
            ↑               │
            └───────────────┘ (loop)
State Descriptions:
- pending - QA not yet run
- rejected - QA found issues, needs fixes
- fixes_applied - Fixer applied changes, ready for re-validation
- approved - All acceptance criteria met
- human_escalation - Recurring issues detected, human intervention required
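The states above can be guarded with an explicit transition table. The sketch below is one plausible reading of the diagram, not the logic in criteria.py: it assumes the first review may approve directly from pending, that escalation happens from rejected, and that approved and human_escalation are terminal within a single run.

```python
ALLOWED_TRANSITIONS: dict[str, frozenset[str]] = {
    "pending": frozenset({"approved", "rejected"}),
    "rejected": frozenset({"fixes_applied", "human_escalation"}),
    "fixes_applied": frozenset({"approved", "rejected"}),
    # Assumed terminal within one QA run; a human restarts QA manually.
    "approved": frozenset(),
    "human_escalation": frozenset(),
}


def transition(current: str, new: str) -> str:
    """Return `new` if the move is allowed, otherwise raise ValueError."""
    if new not in ALLOWED_TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal QA transition: {current} -> {new}")
    return new
```

Centralizing the allowed moves in one table makes illegal jumps, such as pending straight to fixes_applied, fail at the point of the status update.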
implementation_plan.json:
{
"qa_signoff": {
"status": "approved",
"qa_session": 2,
"timestamp": "2026-02-12T10:30:00Z",
"tests_passed": {
"unit": true,
"integration": true,
"e2e": false
}
},
"qa_iteration_history": [
{
"iteration": 1,
"status": "rejected",
"timestamp": "2026-02-12T09:00:00Z",
"issues": [
{"title": "Missing test", "type": "unit_test"}
]
},
{
"iteration": 2,
"status": "approved",
"timestamp": "2026-02-12T10:30:00Z",
"issues": []
}
],
"qa_stats": {
"total_iterations": 2,
"last_iteration": 2,
"last_status": "approved",
"issues_by_type": {
"unit_test": 1
}
}
}
# Automatic (triggered by build completion)
python run.py --spec 001
# Manual trigger
python run.py --spec 001 --qa
# With verbose output
python run.py --spec 001 --qa --verbose
# Use specific model
python run.py --spec 001 --qa --model claude-sonnet-4-5-20250929
python run.py --spec 001 --qa-status
Output:
Spec: 001-feature
QA Status: REJECTED
QA Sessions: 2
Last Check: 2026-02-12T10:30:00Z
Issues Found:
- Missing error handling (acceptance_criteria, high)
- Test coverage 65% (coverage, medium)
Iteration History:
Iteration 1 (09:00): Rejected - 2 issues
Iteration 2 (10:30): Rejected - 1 issue
Define project-specific acceptance criteria in spec.md:
## Acceptance Criteria
- [ ] User can register with email/password
- [ ] Password must be 8+ characters with special char
- [ ] Account created after email verification
- [ ] Login with valid credentials works
- [ ] Login with invalid credentials shows error
## Verification
**Unit Tests:**
- `pytest tests/test_auth.py` - All pass
- Coverage: >= 80%
**Integration Tests:**
- `pytest tests/integration/test_api.py` - All pass
**E2E Tests:**
- Manual: Register → Verify email → Login → Logout
Modify QA behavior by editing agent prompts:
apps/backend/prompts/qa_reviewer.md:
You are a QA reviewer. Focus on:
1. Security vulnerabilities
2. Error handling completeness
3. Test coverage for edge cases
4. Performance considerations
apps/backend/prompts/qa_fixer.md:
You are a QA fixer. When fixing issues:
1. Add comprehensive tests
2. Follow project code patterns
3. Document complex logic
4. Remove all debug statements
Typical Duration:
- Simple feature QA: 2-5 minutes
- Standard feature QA: 5-15 minutes
- Complex feature QA: 15-30 minutes
- With E2E testing: Add 5-10 minutes
Resource Usage:
- Memory: ~500MB per QA session
- CPU: Moderate during test execution
- API calls: 10-50 per QA iteration (depends on test count)
Iteration Statistics:
- Median iterations to approval: 2
- 95th percentile: 5 iterations
- Max allowed: 50 iterations
- Write Clear Acceptance Criteria
  - Be specific about requirements
  - Include edge cases
  - Define measurable outcomes
- Provide Test Commands
  {
    "verification_strategy": {
      "unit_tests": {
        "required": true,
        "commands": ["pytest tests/test_feature.py -v"]
      }
    }
  }
- Set Realistic Coverage Targets
  - 80% is a good default
  - 100% is rarely practical
  - Focus on critical paths
- Review QA Reports
  - Read qa_report.md after each iteration
  - Understand why issues were found
  - Learn patterns to avoid future issues
- Add New Issue Types
  - Define in report.py
  - Add validation logic
  - Document in acceptance criteria
- Create Custom Validators
  # qa/custom_validator.py
  def validate_custom_criteria(code: str) -> list[dict]:
      issues = []
      # Your validation logic
      return issues
- Integrate Additional Test Frameworks
  - Add test runner in reviewer.py
  - Parse output format
  - Report failures as issues
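As a concrete instance of a custom validator, the sketch below flags leftover debug statements and reports them in the issue shape used by the rejection example earlier in this document. The function name, the regex patterns, and the `"low"` severity value are assumptions for illustration. How a validator is registered with the QA system is not shown here.

```python
import re

# Hypothetical patterns; extend for the languages a project actually uses.
_DEBUG_PATTERNS = {
    "console.log": re.compile(r"\bconsole\.log\s*\("),
    "print": re.compile(r"^\s*print\s*\("),
    "breakpoint": re.compile(r"\bbreakpoint\s*\(\s*\)"),
}


def validate_no_debug_statements(code: str, file: str = "<unknown>") -> list[dict]:
    """Return one code_quality issue per debug statement found in `code`."""
    issues = []
    for line_no, line in enumerate(code.splitlines(), start=1):
        for name, pattern in _DEBUG_PATTERNS.items():
            if pattern.search(line):
                issues.append({
                    "title": f"Debug statement left in code: {name}",
                    "type": "code_quality",
                    "file": file,
                    "line": line_no,
                    "severity": "low",  # assumed value
                })
    return issues
```

Line-based regex matching is crude (it will also flag intentional `print` calls in CLI code), so a validator like this is better treated as a prompt for review than as a hard failure.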
Problem: QA rejects with same issue repeatedly
Solutions:
- Check for recurring issues - should auto-escalate
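- Review QA_FIX_REQUEST.md - is the issue clearly described?
- Manually fix and delete QA_FIX_REQUEST.md
- Adjust ISSUE_SIMILARITY_THRESHOLD: lower it if reworded versions of the same issue are not being matched, raise it if unrelated issues are being grouped together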
Problem: QA reports tests but doesn't run them
Solutions:
- Verify test commands in verification strategy
- Check test files exist in worktree
- Ensure dependencies are installed in .venv
- Check test framework compatibility (pytest, vitest, etc.)
Problem: Electron MCP tools not available
Solutions:
- Ensure ELECTRON_MCP_ENABLED=true in .env
- Start Electron with npm run dev (includes debug port)
- Verify ELECTRON_DEBUG_PORT=9222 matches the app's debug port
- Check Electron app is running before starting QA
Problem: QA approves without thorough validation
Solutions:
- Review qa_reviewer.md prompt - make it stricter
- Add more acceptance criteria to spec
- Set higher coverage requirements
- Enable E2E testing for UI changes
- Review qa_report.md to see what was checked
Problem: QA hit 50 iterations without approval
Solutions:
- Review qa_iteration_history for patterns
- Manually fix recurring issues
- Simplify acceptance criteria if too strict
- Break feature into smaller specs
- Increase MAX_QA_ITERATIONS if needed (rare)
Location: tests/test_qa_loop.py
Run:
pytest tests/test_qa_loop.py -v
Coverage:
- QA status management
- Iteration tracking
- Issue detection logic
- State machine transitions
Test QA system with real projects:
# Create test spec
python spec_runner.py --task "Test feature"
# Run build
python run.py --spec test-feature
# Verify QA runs
python run.py --spec test-feature --qa-status
- Testing Patterns - Test framework and patterns
- Architecture Overview - System architecture
- Multi-Agent Pipeline - Agent orchestration
- Advanced Usage - QA customization
- Agent Customization - Modifying QA prompts
QA Loop documentation: 2026-02-12