This document describes the different CI test profiles used in the xchecker project. Each profile is designed for specific testing scenarios and environments.
The Local-Green profile is the primary CI profile for fast, reliable testing across all platforms. It runs the majority of the test suite without requiring any external dependencies, network access, or additional binaries.
- Test Coverage: 791 tests (92.7% of total test suite)
- Duration: ~30 seconds
- Platforms: All (Linux, macOS, Windows)
- Dependencies: None (no external services, APIs, or binaries required)
- Network: No network calls
- Stability: High - designed to be deterministic and always pass
cargo test --tests -- \
--skip requires_claude_stub \
--skip requires_real_claude \
--skip requires_xchecker_binary \
--skip requires_future_phase \
--skip requires_future_api \
--skip requires_refactoring \
--skip windows_ci_only
The Local-Green profile skips the following test categories:
- requires_claude_stub: Tests requiring Claude API stub/mock setup
- requires_real_claude: Tests requiring actual Claude API access
- requires_xchecker_binary: Tests requiring a compiled xchecker binary
- requires_future_phase: Tests for features planned for future phases
- requires_future_api: Tests for API features not yet implemented
- requires_refactoring: Tests that need code refactoring before they can pass
- windows_ci_only: Tests that should only run on Windows CI environments
Here's the recommended GitHub Actions job configuration for the Local-Green profile:
name: Local-Green Tests
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  tests-local-green:
    name: Local-Green (${{ matrix.os }})
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Run Local-Green tests
        shell: bash # Windows runners default to PowerShell, which rejects "\" line continuations
        run: |
          cargo test --tests -- \
            --skip requires_claude_stub \
            --skip requires_real_claude \
            --skip requires_xchecker_binary \
            --skip requires_future_phase \
            --skip requires_future_api \
            --skip requires_refactoring \
            --skip windows_ci_only
The Local-Green profile is designed to:
- Always pass on all platforms (Linux, macOS, Windows)
- Run quickly (~30 seconds) to provide fast feedback
- Require no setup beyond Rust toolchain installation
- Work offline with no network connectivity required
- Be deterministic with no flaky tests or race conditions
Use the Local-Green profile when:
- Running pre-commit checks locally
- Setting up CI for pull requests
- Verifying cross-platform compatibility
- Running tests in environments without external service access
- Needing fast feedback during development
If Local-Green tests fail:
- Check platform-specific issues: Ensure the failure isn't due to a platform-specific bug
- Verify no external dependencies: Confirm the test doesn't accidentally depend on external resources
- Review test isolation: Ensure tests are properly isolated and don't interfere with each other
- Check for timing issues: Verify there are no race conditions or timing-dependent assertions
When adding new tests to the project:
- By default, new tests should be included in the Local-Green profile
- If a test requires external dependencies, mark it with the appropriate skip tag
- Keep the Local-Green profile fast - tests taking >5 seconds individually should be reviewed
- Ensure new tests are deterministic and platform-agnostic unless specifically tagged
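As a rough illustration of the tagging rule above, the sketch below pairs one untagged test with one tagged test. The helper normalize_spec_id and both test names are hypothetical and do not come from the xchecker codebase; only the #[ignore = "requires_claude_stub"] marker is taken from this document.

```rust
/// Hypothetical helper, present only to give the tests something to check.
fn normalize_spec_id(raw: &str) -> String {
    raw.trim().to_lowercase().replace(' ', "-")
}

// No marker: a plain `cargo test` run executes this test, so it belongs to
// Local-Green. It needs no network, binary, or external service.
#[test]
fn normalizes_spec_id_offline() {
    assert_eq!(normalize_spec_id("  My Spec "), "my-spec");
}

// Marked: an ignored test is not executed by a plain `cargo test` run.
// It runs only when `--ignored` or `--include-ignored` is passed after `--`.
#[test]
#[ignore = "requires_claude_stub"]
fn runs_phase_against_stub() {
    // Hypothetical body: a real test would drive the claude-stub binary here.
    assert_eq!(normalize_spec_id("stub spec"), "stub-spec");
}
```

The ignore reason string is what shows up in the test output next to the ignored test, which makes it easy to see why a test was left out of a given run.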
The Doc Validation profile validates documentation accuracy, code examples, and schema conformance. This profile ensures that all documentation examples compile and work correctly, and that JSON schema examples match their schemas.
- Test Coverage: Doctests + schema validation tests
- Duration: Fast (~5 seconds)
- Platforms: All (Linux, macOS, Windows)
- Dependencies: None (no external services or APIs required)
- Network: No network calls
- Stability: High - deterministic validation tests
# Run doctests (tests embedded in /// doc comments)
cargo test --doc
# Run schema example validation tests
cargo test schema_examples_tests
- Doctests: Tests embedded in source code documentation comments
  - Config API examples
  - Usage patterns in doc comments
  - Code snippets throughout src/
  - Ensures examples compile and execute correctly
- Schema Examples: Tests in tests/doc_validation/schema_examples_tests.rs
  - Validates receipt/status/doctor schema examples
  - Ensures examples conform to JSON schemas
  - Verifies example generation functions work correctly
  - Checks array sorting and determinism
The doc validation profile is part of the local-green baseline and must remain green at all times. These tests are critical for:
- Documentation accuracy: Ensures examples in docs actually work
- Schema conformance: Validates JSON outputs match their schemas
- API contract validation: Proves documented APIs behave as specified
Include in your CI pipeline as part of the standard test suite:
- name: Run doc validation
  run: |
    cargo test --doc
    cargo test schema_examples_tests
The Doc Validation profile is designed to:
- Catch outdated examples: Documentation examples that no longer compile
- Validate schema compliance: JSON examples that don't match schemas
- Ensure API accuracy: Documented APIs match implementation
- Be fast: Complete in ~5 seconds for quick feedback
Use the Doc Validation profile when:
- Making changes to public APIs
- Updating documentation or examples
- Modifying JSON schemas
- Adding new example code to docs
- Running full CI validation
When updating code that affects documentation:
- Run cargo test --doc to verify examples still work
- Run schema validation tests if changing JSON output
- Update examples in doc comments to reflect API changes
- Ensure new public APIs include working examples
The Stub Suite profile extends Local-Green by including integration tests that use the claude-stub binary to mock Claude API responses. This provides comprehensive integration testing without incurring API costs or requiring network access.
- Test Coverage: 840 tests (98.5% of total test suite)
- Duration: ~2 minutes
- Platforms: All (Linux, macOS, Windows)
- Dependencies: claude-stub binary (must be built first)
- Network: No network calls (mocked responses)
- Stability: High - deterministic mocked responses
The Stub Suite requires building the claude-stub binary before running tests:
# Build claude-stub first
cargo build --bin claude-stub
# Run all tests except real Claude
# (--include-ignored is a test-harness flag, so it must come after "--")
cargo test --tests -- --include-ignored \
  --skip requires_real_claude \
  --skip requires_xchecker_binary \
  --skip requires_future_phase \
  --skip requires_future_api \
  --skip requires_refactoring \
  --skip windows_ci_only
The Stub Suite includes 49 additional tests marked with #[ignore = "requires_claude_stub"]:
- M1 gate integration tests (7 tests)
- M1 gate simple validation tests (8 tests)
- M3/M4 gate validation tests
- Golden pipeline tests (7 tests)
- End-to-end workflow tests with mocked LLM (6 tests)
Status: Not currently automated (optional/manual)
Recommended configuration if you want to enable it:
stub-suite:
  name: Stub Suite (${{ matrix.os }})
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      os: [ubuntu-latest, macos-latest, windows-latest]
  steps:
    - name: Checkout code
      uses: actions/checkout@v4
    - name: Setup Rust
      uses: dtolnay/rust-toolchain@stable
    - name: Build claude-stub
      run: cargo build --bin claude-stub
    - name: Run Stub Suite
      shell: bash # Windows runners default to PowerShell, which rejects "\" line continuations
      run: |
        cargo test --tests -- --include-ignored \
          --skip requires_real_claude \
          --skip requires_xchecker_binary \
          --skip requires_future_phase \
          --skip requires_future_api \
          --skip requires_refactoring \
          --skip windows_ci_only
Use the Stub Suite when:
- Testing LLM interaction logic without API costs
- Validating phase transitions and orchestration flows
- Pre-merge integration validation (more thorough than Local-Green)
- Local development when you need comprehensive coverage
- Debugging integration issues without hitting real APIs
The Stub Suite still skips:
- requires_real_claude (4 tests): Real Claude API smoke tests
- requires_xchecker_binary (2 tests): Binary integration tests
- requires_future_phase/api (4 tests): Unimplemented features
- requires_refactoring (2 tests): Tests needing code updates
- windows_ci_only (1 test): Platform-specific tests
The Firehose profile runs all 853 tests including those that make real Claude API calls. This is the most comprehensive test suite but also the most expensive and time-consuming. It should only be used for pre-release validation or manual investigation of real-world issues.
- Test Coverage: 853 tests (100% of all tests)
- Duration: ~5-10 minutes
- Platforms: Linux only (for cost control)
- Dependencies: Real Claude API access, all binaries, secrets
- Network: Required (real API calls)
- Stability: Low - network-dependent, can be flaky
The Firehose profile requires:
- ✅ Real Claude API access (Anthropic API key)
- ✅ claude CLI binary available and configured, OR
- ✅ ANTHROPIC_API_KEY environment variable set
- ✅ xchecker binary compiled (cargo build --release)
- ✅ claude-stub binary compiled (cargo build --bin claude-stub)
- ✅ Environment variable: XCHECKER_ENABLE_REAL_CLAUDE=1
XCHECKER_ENABLE_REAL_CLAUDE=1 cargo test --tests -- --include-ignored
Trigger: MANUAL or NIGHTLY only (NOT on every PR/push)
Recommended configuration:
name: Firehose (Real Claude API)
on:
  # Manual trigger only
  workflow_dispatch:
  # OR nightly schedule (choose one)
  # schedule:
  #   - cron: '0 2 * * *' # 2 AM UTC daily
jobs:
  firehose:
    name: Firehose - All Tests (Real Claude)
    runs-on: ubuntu-latest # Linux only for cost control
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Build xchecker binary
        run: cargo build --release
      - name: Build claude-stub binary
        run: cargo build --bin claude-stub
      - name: Run Firehose test suite
        run: XCHECKER_ENABLE_REAL_CLAUDE=1 cargo test --tests -- --include-ignored
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        continue-on-error: true # Don't fail build on flaky network issues
      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: firehose-test-results
          path: |
            target/debug/test-results/
            firehose.log
Required GitHub Secrets:
- ANTHROPIC_API_KEY - Your Anthropic Claude API key
Platform: Linux only (for cost control)
Cost:
- Makes real API calls to Claude
- Each test run incurs API usage charges
- Estimated cost: $0.01 - $0.05 per run (based on 10-15 API calls)
- Rate limits may apply depending on your API tier
Reliability:
- Network dependency (internet connectivity required)
- API rate limits may cause intermittent failures
- External service availability affects test stability
- Timeouts possible under high load or slow connections
Performance:
- Full suite runs ~5-10 minutes (vs. 30s for Local-Green)
- Not suitable for fast feedback loops
- Blocks on network I/O and API response times
- Serial execution may be required to avoid rate limits
Why Firehose is not used for routine CI:
- Too slow for PR validation
- Too expensive for every commit
- Too unreliable for blocking CI gates
- Use Local-Green or Stub Suite for routine checks
✅ Appropriate scenarios:
- Before major releases (v1.0, v2.0, etc.)
- Investigating real-world integration issues
- Validating Claude API compatibility after LLM updates
- Pre-deployment smoke testing in staging environments
- Manual validation of critical bug fixes
- Nightly regression testing (scheduled, non-blocking)
❌ Inappropriate scenarios:
- Every pull request (use Local-Green instead)
- Every commit to main (use Stub Suite instead)
- Local development loops (use Local-Green)
- Blocking CI gates (too slow and flaky)
The following tests require real Claude API access (marked with #[ignore = "requires_real_claude"]):
| Test File | Test Function | Purpose |
|---|---|---|
| tests/smoke.rs | test_real_claude_basic_interaction | Verify real Claude CLI works |
| tests/smoke.rs | test_real_claude_streaming_response | Verify streaming API works |
| tests/test_exit_alignment.rs | test_xchecker_exit_code_success | Verify exit codes with real binary |
| tests/test_exit_alignment.rs | test_xchecker_exit_code_failure | Verify error exit codes |
API Calls per Run:
- Real Claude tests: 4 test functions
- Each test may make 1-3 API calls
- Estimated: ~10-15 API calls per Firehose run
Monthly Cost Scenarios (rough estimates):
| Frequency | Runs/Month | Est. Cost/Month |
|---|---|---|
| Manual only | 5-10 | $0.05 - $0.50 |
| Nightly | 30 | $0.30 - $1.50 |
| Per-commit | 100+ | $1.00 - $5.00+ |
Recommendation: Manual or nightly only, NOT per-commit or per-PR
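The monthly figures in the table follow directly from the per-run estimate: the low end is the fewest runs at $0.01 each, the high end is the most runs at $0.05 each. A minimal sketch of that arithmetic, working in whole cents to avoid floating-point rounding (the function and constant names are illustrative, not part of xchecker):

```rust
/// Per-run cost range in cents, from the $0.01 - $0.05 estimate above.
const MIN_CENTS_PER_RUN: u32 = 1;
const MAX_CENTS_PER_RUN: u32 = 5;

/// Returns the (low, high) monthly cost in cents for a range of runs per month.
fn monthly_cost_cents(min_runs: u32, max_runs: u32) -> (u32, u32) {
    (min_runs * MIN_CENTS_PER_RUN, max_runs * MAX_CENTS_PER_RUN)
}

/// Formats a cent amount as dollars, e.g. 150 becomes "$1.50".
fn dollars(cents: u32) -> String {
    format!("${}.{:02}", cents / 100, cents % 100)
}
```

For example, monthly_cost_cents(30, 30) yields (30, 150), which matches the nightly row of $0.30 - $1.50.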
| Profile | Test Count | Duration | Cost | Network | Use Case |
|---|---|---|---|---|---|
| Local-Green | 791 (92.7%) | ~30s | Free | No | Default CI, PR validation |
| Stub Suite | 840 (98.5%) | ~2min | Free | No | Integration testing |
| Firehose | 853 (100%) | ~5-10min | $$ | Yes | Pre-release, real-world validation |
| Feature | Local-Green | Stub Suite | Firehose |
|---|---|---|---|
| Unit tests | ✅ | ✅ | ✅ |
| Dry-run integration | ✅ | ✅ | ✅ |
| Stub-based integration | ❌ | ✅ | ✅ |
| Real Claude API | ❌ | ❌ | ✅ |
| Binary integration | ❌ | ❌ | ✅ |
| Network required | ❌ | ❌ | ✅ |
| API costs | ❌ | ❌ | ✅ |
| Metric | Local-Green | Stub Suite | Firehose |
|---|---|---|---|
| Test count | 791 (92.7%) | 840 (98.5%) | 853 (100%) |
| Duration | ~30s | ~2min | ~5-10min |
| Parallelizable | ✅ | ✅ | |
| Deterministic | ✅ | ✅ | ❌ (network) |
| Cacheable | ✅ | ✅ | ❌ |
| Scenario | Recommended Profile | Rationale |
|---|---|---|
| PR validation | Local-Green | Fast feedback, no flakiness |
| Merge to main | Local-Green | Sufficient coverage for routine changes |
| Pre-release | Firehose | Comprehensive validation before shipping |
| Nightly | Firehose | Catch real-world integration issues |
| Manual testing | Stub Suite or Firehose | Depends on what you're validating |
| Local dev | Local-Green | Fast iteration cycle |
All tests use standardized #[ignore = "reason"] attributes for consistency.
| Marker | Count | Description | Included In |
|---|---|---|---|
| requires_claude_stub | 49 | Needs claude-stub binary | Stub Suite, Firehose |
| requires_real_claude | 4 | Real Claude CLI + API | Firehose only |
| requires_xchecker_binary | 2 | Compiled xchecker binary | Firehose only |
| requires_future_phase | 2 | Unimplemented phase (Review, Final) | None (will fail) |
| requires_future_api | 2 | API not yet wired | None (will fail) |
| requires_refactoring | 2 | Needs code refactoring | None (will fail) |
| windows_ci_only | 1 | Windows-specific test | Platform-specific |
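The marker counts reconcile with the profile totals quoted elsewhere in this document: 791 unmarked tests plus the 49 stub tests give the Stub Suite's 840, and adding the remaining markers gives 853. The sketch below re-derives those numbers and the percentages; it is a consistency check on the figures in this document, not code from the project, and all names in it are illustrative.

```rust
/// (marker, count) pairs copied from the marker table.
const MARKERS: [(&str, u32); 7] = [
    ("requires_claude_stub", 49),
    ("requires_real_claude", 4),
    ("requires_xchecker_binary", 2),
    ("requires_future_phase", 2),
    ("requires_future_api", 2),
    ("requires_refactoring", 2),
    ("windows_ci_only", 1),
];

/// Tests that carry no marker and therefore run in Local-Green.
const LOCAL_GREEN: u32 = 791;

/// Looks up the count for one marker, or 0 if the marker is unknown.
fn marker_count(name: &str) -> u32 {
    MARKERS
        .iter()
        .find(|(m, _)| *m == name)
        .map(|(_, n)| *n)
        .unwrap_or(0)
}

/// Every test in the suite: unmarked tests plus all marked ones.
fn total_tests() -> u32 {
    LOCAL_GREEN + MARKERS.iter().map(|(_, n)| n).sum::<u32>()
}

/// Stub Suite size: Local-Green plus the claude-stub tests.
fn stub_suite_tests() -> u32 {
    LOCAL_GREEN + marker_count("requires_claude_stub")
}

/// Share of the suite as a percentage rounded to one decimal place.
fn percent_one_decimal(part: u32, total: u32) -> f64 {
    (part as f64 / total as f64 * 1000.0).round() / 10.0
}
```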
# Run ONLY tests with a specific marker
cargo test --tests -- --ignored --test requires_real_claude
# Skip tests with a specific marker
cargo test --tests -- --skip requires_real_claude
# Run all ignored tests (Firehose mode)
cargo test --tests -- --include-ignored
XCHECKER_ENABLE_REAL_CLAUDE controls whether tests should attempt real Claude API calls.
Values:
- 1 or true - Enable real Claude API calls (Firehose mode)
- Unset or 0 - Disable real Claude API calls (default)
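A test helper could interpret the variable along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, and treating every value other than 1 or true (for example TRUE or yes) as disabled is an assumption, since this document only specifies the values listed above.

```rust
/// Decides whether real Claude calls are enabled for a given raw value.
/// Only "1" and "true" enable them; unset, "0", and anything else do not
/// (the "anything else" case is an assumption, not documented behavior).
fn real_claude_enabled(value: Option<&str>) -> bool {
    matches!(value, Some("1") | Some("true"))
}

/// Reads XCHECKER_ENABLE_REAL_CLAUDE from the process environment.
fn real_claude_enabled_from_env() -> bool {
    real_claude_enabled(std::env::var("XCHECKER_ENABLE_REAL_CLAUDE").ok().as_deref())
}
```

Keeping the decision in a pure function that takes the raw value makes it testable without mutating the process environment, which matters when tests run in parallel.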
Usage:
# Enable for Firehose
XCHECKER_ENABLE_REAL_CLAUDE=1 cargo test --tests -- --include-ignored
# Disable (default)
cargo test --tests
ANTHROPIC_API_KEY is required for real Claude API calls (Firehose profile).
Set in GitHub Actions:
env:
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
Set locally:
export ANTHROPIC_API_KEY="sk-ant-..."
XCHECKER_ENABLE_REAL_CLAUDE=1 cargo test --tests -- --include-ignored
Local development:
- Use Local-Green for fast iteration: cargo test --lib
- Use Stub Suite before pushing: Build stub, then run with -- --include-ignored --skip requires_real_claude
- Only use Firehose when debugging real API issues
Pull requests:
- Ensure Local-Green passes before creating PR
- Do NOT run Firehose in CI for every PR
- Add appropriate #[ignore = "..."] markers to new tests
Pre-release:
- Run Stub Suite to catch integration issues
- Run Firehose manually to validate real API compatibility
- Document any Firehose failures in release notes
Required gates (fast, reliable):
- ✅ Local-Green on all PRs
- ✅ Lint and format checks
- ✅ Schema validation
- ✅ Secret scanning
Optional gates (slower, comprehensive):
- ⚠️ Stub Suite on main branch
- ⚠️ Parallel tests (non-blocking, validate stability)
Manual/Nightly only (expensive, slow, flaky):
- ❌ Firehose - NOT for routine CI
This table describes the current GitHub Actions jobs and their required/optional status:
| Job | Workflow | When it runs | Required for PRs? | Description |
|---|---|---|---|---|
| lint | ci.yml | PRs, main | ✅ Yes | Format + clippy checks |
| test-serial | ci.yml | PRs, main | ✅ Yes | Serial tests on all 3 OS |
| test-parallel | ci.yml | PRs, main | ❌ No | Parallel tests (non-blocking, validating stability) |
| schema-validation | ci.yml | PRs, main | ✅ Yes | JSON schema compliance |
| secret-scanning | ci.yml | PRs, main | ✅ Yes | Secret detection tests |
| docs-conformance | ci.yml | PRs, main | ✅ Yes | Documentation validation |
| gate-validation | ci.yml | PRs, main | ✅ Yes | Gate command tests |
| test-real | ci.yml | main only | ❌ No | Real Claude API (requires secret) |
| test-fast | test.yml | PRs only | ✅ Yes | Quick unit tests (~30s) |
| test-full | test.yml | main, nightly | ❌ No | Comprehensive tests |
| property-tests | test.yml | main, nightly | ❌ No | Property-based tests with high case count |
| stub-tests | test.yml | PRs, main, nightly | ❌ No | Integration tests with claude-stub (non-blocking) |
| example-validation | test.yml | All events | ✅ Yes | Validate showcase examples |
| walkthrough-validation | test.yml | All events | ✅ Yes | Validate walkthrough snippets |
The stub-tests job currently runs on PRs but is not required in branch protection. This provides:
- Visibility: PR authors see stub-dependent test results before merge
- Non-blocking: Failures don't block merges while we validate stability
- Path to required: After 3 consecutive stable weeks, consider making this required
To promote stub-tests to required:
- Monitor stability in PR feedback for 3+ weeks
- If consistently green, add to branch protection required checks
- Update this documentation when promoting
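The promotion criterion can be stated as a small predicate: the job qualifies once its three most recent weeks were all green. The sketch below assumes one boolean per week, oldest first; how weekly stability is actually recorded is not specified in this document, so the input shape and names are hypothetical.

```rust
/// Number of consecutive stable weeks required before promotion.
const REQUIRED_STABLE_WEEKS: usize = 3;

/// Returns true when the most recent REQUIRED_STABLE_WEEKS entries are all green.
/// `weekly_green` holds one entry per week, oldest first.
fn ready_to_promote(weekly_green: &[bool]) -> bool {
    weekly_green.len() >= REQUIRED_STABLE_WEEKS
        && weekly_green
            .iter()
            .rev()
            .take(REQUIRED_STABLE_WEEKS)
            .all(|&green| green)
}
```

A single red week resets the count, so a history of green, red, green, green does not qualify yet.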
Symptom: Firehose tests fail with network errors
Possible causes:
- No internet connectivity
- API rate limits exceeded
- ANTHROPIC_API_KEY not set or invalid
- Claude CLI not installed
Solutions:
- Check network: curl https://api.anthropic.com
- Wait for rate limit reset (check API dashboard)
- Verify API key: echo $ANTHROPIC_API_KEY
- Install Claude CLI or set API key directly
Symptom: Firehose tests are slow
Possible causes:
- Network latency
- API response times
- Rate limiting backoff
Solutions:
- Run with --test-threads=1 to avoid rate limits
- Run subset: cargo test --test smoke -- --ignored
- Use Stub Suite for faster feedback
Symptom: Tests fail with "claude-stub not found"
Possible causes:
- claude-stub binary not built
- Wrong PATH configuration
Solutions:
# Build claude-stub first
cargo build --bin claude-stub
# Verify it's built
ls target/debug/claude-stub # Unix
dir target\debug\claude-stub.exe # Windows
# Run tests
cargo test --tests -- --include-ignored --skip requires_real_claude
xchecker provides a gate command for enforcing spec completion policies in CI. There are two patterns for integrating the gate into your workflow:
The Smoke Gate pattern validates that the gate command and JSON output work correctly, but does NOT fail the CI job when the spec doesn't meet requirements. This is useful for:
- Initial integration testing
- Demonstrating gate functionality
- Non-blocking informational checks
Behavior: Always exits 0 if the gate command runs successfully, regardless of passed status.
- name: Run gate check (smoke)
  run: |
    set +e
    ./target/release/xchecker gate my-spec --min-phase tasks --json > gate-result.json
    GATE_STATUS=$?
    # Validate JSON structure
    PASSED=$(cat gate-result.json | jq -r '.passed')
    if [ "$PASSED" = "true" ]; then
      echo "Gate PASSED"
    else
      echo "Gate returned passed=false (informational)"
      echo "Failure reasons:"
      cat gate-result.json | jq -r '.failure_reasons[]'
    fi
    # Always exit 0 for smoke test
    exit 0
The Strict Gate pattern enforces spec policies as blocking CI checks. When the gate returns passed=false, the CI job fails and blocks the merge. This is the recommended pattern for production use.
Behavior: Exits non-zero when passed=false, blocking the PR/merge.
- name: Run gate check (strict)
  run: |
    # Run gate and capture exit code. The "|| ..." form keeps the default
    # "bash -e" shell from aborting before the diagnostics below are printed.
    GATE_STATUS=0
    ./target/release/xchecker gate my-spec \
      --min-phase tasks \
      --fail-on-pending-fixups \
      --json > gate-result.json || GATE_STATUS=$?
    # Display result
    echo "Gate exit code: $GATE_STATUS"
    cat gate-result.json | jq .
    # Check if gate passed
    PASSED=$(cat gate-result.json | jq -r '.passed')
    if [ "$PASSED" = "true" ]; then
      echo "✓ Gate PASSED - spec meets all policy requirements"
      exit 0
    else
      echo "✗ Gate FAILED - spec does not meet policy requirements"
      echo ""
      echo "Failure reasons:"
      cat gate-result.json | jq -r '.failure_reasons[]'
      echo ""
      echo "To resolve:"
      echo "  1. Run 'xchecker status my-spec' to see current progress"
      echo "  2. Complete required phases: 'xchecker spec my-spec --phase <phase>'"
      echo "  3. Address any pending fixups"
      exit 1
    fi
To convert from the smoke pattern to strict enforcement:
- Remove the set +e that suppresses errors
- Remove the exit 0 at the end
- Add explicit exit 1 when passed=false
- Configure the job as a required status check in GitHub settings
GitHub Repository Settings:
- Go to Settings → Branches
- Add/edit branch protection rule for main
- Enable "Require status checks to pass before merging"
- Add "Gate Check" as a required status check
| Code | Meaning |
|---|---|
| 0 | Gate passed - all policy conditions met |
| 1 | Gate failed - one or more policy violations |
| 2+ | Runtime error (config, I/O, etc.) |
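A caller that wraps the gate command can map these exit codes onto a small enum. The sketch below is illustrative only: GateOutcome, classify_exit, and run_gate are hypothetical names, the xchecker binary is assumed to be on PATH, and treating every code other than 0 and 1 as a runtime error is an assumption that extends the "2+" row of the table.

```rust
use std::process::Command;

/// Outcome of a gate run, following the exit code table above.
#[derive(Debug, PartialEq)]
enum GateOutcome {
    /// Exit code 0: all policy conditions met.
    Passed,
    /// Exit code 1: one or more policy violations.
    Failed,
    /// Any other exit code: config, I/O, or similar runtime error.
    RuntimeError(i32),
}

/// Maps a process exit code to a gate outcome.
fn classify_exit(code: i32) -> GateOutcome {
    match code {
        0 => GateOutcome::Passed,
        1 => GateOutcome::Failed,
        other => GateOutcome::RuntimeError(other),
    }
}

/// Runs `xchecker gate <spec_id> --min-phase tasks --json` and classifies the result.
/// Returns Ok(None) when the process ended without an exit code (e.g. killed by a signal).
fn run_gate(spec_id: &str) -> std::io::Result<Option<GateOutcome>> {
    let status = Command::new("xchecker")
        .args(["gate", spec_id, "--min-phase", "tasks", "--json"])
        .status()?;
    Ok(status.code().map(classify_exit))
}
```

Separating Failed from RuntimeError lets CI distinguish a spec that does not meet policy from a broken gate invocation, which usually call for different follow-up.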
xchecker gate <spec-id> [OPTIONS]
Options:
  --policy <path>              Load gate policy from a TOML file
                               Defaults to .xchecker/policy.toml or ~/.config/xchecker/policy.toml
  --min-phase <phase>          Require at least this phase completed
                               Values: requirements, design, tasks, review, fixup, final
  --fail-on-pending-fixups     Fail if any pending fixups exist
  --max-phase-age <duration>   Fail if latest success is older than threshold
                               Format: 7d (days), 24h (hours), 30m (minutes)
  --json                       Output structured JSON for CI parsing
Different policies for different environments:
jobs:
  gate-development:
    # Lenient policy for feature branches
    runs-on: ubuntu-latest
    steps:
      - run: xchecker gate $SPEC --min-phase requirements --json
  gate-staging:
    # Moderate policy for staging
    runs-on: ubuntu-latest
    steps:
      - run: xchecker gate $SPEC --min-phase design --max-phase-age 7d --json
  gate-production:
    # Strict policy for production
    runs-on: ubuntu-latest
    steps:
      - run: xchecker gate $SPEC --min-phase tasks --fail-on-pending-fixups --max-phase-age 24h --json
- TEST_MATRIX.md - Detailed test inventory and statistics
- claude-stub.md - Test harness documentation
- CONFIGURATION.md - Runtime configuration options
- INDEX.md - Documentation index
- .github/workflows/ci.yml - Current CI configuration
- .github/workflows/xchecker-gate.yml - Gate workflow example
2025-12-06 - CI Jobs Reference and stub stance
- Added CI Jobs Reference table documenting all workflow jobs
- Documented stub-tests as non-blocking on PRs with path to required
- Updated test.yml to run stub-tests on PRs for visibility
2025-12-02 - Initial comprehensive CI profiles documentation
- Documented Local-Green profile (existing)
- Added Doc Validation profile
- Added Stub Suite profile specification
- Added Firehose profile with detailed warnings and cost analysis
- Added GitHub Actions specifications for manual/nightly triggers
- Added comparison matrix and best practices
- Added troubleshooting guide