CD001-CTF-001: CTF Detector Unit Tests
Parent: Unit tests creation for CD001 #27
Description
Add a full unit test suite for the CTF detector layer — definition loading, registry, detector primitives, and all six detector implementations. Tests follow the established pattern with Title, Basically question, Steps, Expected Results, and Impact sections. Bug-exposing tests are included for each confirmed production defect.
New test files
tests/unit/ctf/test_definition_loader.py
Validates challenge YAML loading, schema enforcement, and detector instantiation from config.
| Test ID | Title |
|---|---|
| DEF-LDR-001 | No challenges dir returns empty |
| DEF-LDR-002 | Loads challenge from YAML |
| DEF-LDR-003 | Bad YAML is skipped |
| DEF-LDR-004 | Multiple challenge files |
| DEF-LDR-005 | No badges dir returns empty |
| DEF-LDR-006 | Loads badge from YAML |
| DEF-LDR-007 | load_all returns combined dict |
| DEF-LDR-008 | load_all with empty dirs |
| DEF-LDR-009 | load_challenge_yaml returns schema |
| DEF-LDR-010 | load_badge_yaml returns schema |
| DEF-LDR-011 | Challenge validation error propagates |
| DEF-LDR-012 | Challenge with all optional fields |
| DEF-LDR-013 | SQLite upsert executes |
| DEF-LDR-014 | PostgreSQL upsert executes |
| DEF-LDR-015 | Unknown dialect uses merge |
| DEF-LDR-016 | Upsert badge SQLite |
| DEF-LDR-017 | get_loader returns instance |
| DEF-LDR-018 | get_loader is singleton |
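DEF-LDR-017 and DEF-LDR-018 imply that get_loader() returns one cached instance. The sketch below shows that pattern and assumes a module-level cache. The DefinitionLoader body and the reset_loader() helper are hypothetical. The helper is included so each test can start from a clean slate instead of inheriting state from earlier tests.

```python
from typing import Optional


class DefinitionLoader:
    """Hypothetical stand-in for the real challenge/badge loader."""

    def __init__(self) -> None:
        self.challenges: dict = {}
        self.badges: dict = {}


_loader: Optional[DefinitionLoader] = None


def get_loader() -> DefinitionLoader:
    """Return the process-wide loader, creating it on first use."""
    global _loader
    if _loader is None:
        _loader = DefinitionLoader()
    return _loader


def reset_loader() -> None:
    """Hypothetical test hook: drop the cached instance between tests."""
    global _loader
    _loader = None
```

In pytest, calling reset_loader() from an autouse fixture would keep the singleton tests, and any test that mutates the loader, independent of execution order.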
tests/unit/ctf/test_detector_registry.py
Covers the @register_detector decorator, duplicate-registration guards, and registry lookup behaviour.
| Test ID | Title |
|---|---|
| REG-DEC-001 | Decorated class is identical to original |
| REG-DEC-002 | Subclass-only method accessible on instance |
| REG-DEC-003 | Return annotation uses TypeVar not BaseDetector |
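REG-DEC-001 through REG-DEC-003 describe a decorator that records the class and returns it unchanged. It is typed with a TypeVar so a type checker still sees the concrete subclass. The sketch below follows those assumptions. BaseDetector, DETECTOR_REGISTRY, and the ValueError on duplicate names are illustrative, not the project's actual definitions.

```python
from typing import Callable, Dict, Type, TypeVar


class BaseDetector:
    """Hypothetical minimal base class."""

    def detect(self, event: dict) -> bool:
        raise NotImplementedError


D = TypeVar("D", bound=BaseDetector)

DETECTOR_REGISTRY: Dict[str, Type[BaseDetector]] = {}


def register_detector(name: str) -> Callable[[Type[D]], Type[D]]:
    """Record `cls` under `name` and hand back the very same class object."""

    def decorator(cls: Type[D]) -> Type[D]:
        if name in DETECTOR_REGISTRY:
            raise ValueError(f"detector {name!r} is already registered")
        DETECTOR_REGISTRY[name] = cls
        return cls  # REG-DEC-001: no wrapper, no copy

    return decorator


@register_detector("example")
class ExampleDetector(BaseDetector):
    def detect(self, event: dict) -> bool:
        return bool(event)

    def only_on_subclass(self) -> str:
        return "ok"  # REG-DEC-002: reachable because the class is unchanged
```

The return type is Type[D] rather than Type[BaseDetector]. A checker such as mypy therefore accepts ExampleDetector().only_on_subclass() without a cast, which is the property REG-DEC-003 guards.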
tests/unit/ctf/test_detector_primitives.py
Full coverage of the detector building blocks.
PatternMatchDetector + helpers — PRM-PAT-001 through PRM-PAT-028
| Test ID | Title |
|---|---|
| PRM-PAT-001 | Empty text returns False |
| PRM-PAT-002 | Empty pattern returns False |
| PRM-PAT-003 | Case-insensitive literal match |
| PRM-PAT-004 | Case-sensitive no match |
| PRM-PAT-005 | Case-sensitive match |
| PRM-PAT-006 | Regex match |
| PRM-PAT-007 | Invalid regex falls back to literal |
| PRM-PAT-008 | Context in middle |
| PRM-PAT-009 | Context at start |
| PRM-PAT-010 | Context at end |
| PRM-PAT-011 | String pattern is literal |
| PRM-PAT-012 | Dict with regex key |
| PRM-PAT-013 | Dict without regex key |
| PRM-PAT-014 | Empty text returns no matches |
| PRM-PAT-015 | Multiple patterns returns all matches |
| PRM-PAT-016 | No match returns empty |
| PRM-PAT-017 | Regex pattern in list |
| PRM-PAT-018 | Config missing field raises |
| PRM-PAT-019 | Config missing patterns raises |
| PRM-PAT-020 | Empty patterns raises |
| PRM-PAT-021 | Invalid match_mode raises |
| PRM-PAT-022 | Field missing from event |
| PRM-PAT-023 | Non-string field coerced |
| PRM-PAT-024 | any mode — one match sufficient |
| PRM-PAT-025 | all mode — requires all matches |
| PRM-PAT-026 | all mode — all match |
| PRM-PAT-027 | No match returns not detected |
| PRM-PAT-028 | [BUG #129] Valid regex non-match must not fall through to literal search |
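PRM-PAT-007 and PRM-PAT-028 together pin down the fallback rule: only a regex that fails to compile should fall back to a literal search. Below is a minimal sketch of a helper that honours both. The name matches_pattern and its keyword arguments are hypothetical.

```python
import re


def matches_pattern(text: str, pattern: str, *, is_regex: bool = False,
                    case_sensitive: bool = False) -> bool:
    """Hypothetical helper: True if `pattern` occurs in `text`."""
    if not text or not pattern:
        return False  # PRM-PAT-001 / PRM-PAT-002
    if is_regex:
        flags = 0 if case_sensitive else re.IGNORECASE
        try:
            compiled = re.compile(pattern, flags)
        except re.error:
            pass  # invalid regex: fall back to the literal search below (PRM-PAT-007)
        else:
            # A valid regex decides the result on its own; a non-match must
            # not fall through to the literal search (#129).
            return compiled.search(text) is not None
    if case_sensitive:
        return pattern in text
    return pattern.lower() in text.lower()
```

The interesting case is the one #129 describes. As a regex, a+b does not occur in the text "a+b", but a literal fallback would report a match.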
ToolCallDetector + _check_condition operators — PRM-TOL-001 through PRM-TOL-019
| Test ID | Title |
|---|---|
| PRM-TOL-001 | Missing tool_name raises |
| PRM-TOL-002 | Wrong tool name |
| PRM-TOL-003 | Tool name match detected |
| PRM-TOL-004 | require_success skips non-success |
| PRM-TOL-005 | require_success passes on success event |
| PRM-TOL-006 | JSON string tool args parsed |
| PRM-TOL-007 | Invalid JSON tool args not detected |
| PRM-TOL-008 | Parameter condition failed |
| PRM-TOL-009 | Operator gt |
| PRM-TOL-010 | Operator gte |
| PRM-TOL-011 | Operator lt/lte |
| PRM-TOL-012 | Operator in/not_in |
| PRM-TOL-013 | Operator contains |
| PRM-TOL-014 | Operator exists |
| PRM-TOL-015 | Operator matches_regex |
| PRM-TOL-016 | Direct value comparison |
| PRM-TOL-017 | None actual with operator returns False |
| PRM-TOL-018 | [BUG #130] contains with uppercase expected never matches |
| PRM-TOL-019 | [BUG #131] gt/lte on non-numeric string must not crash |
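The TOL operator tests, including the two bug cases, can be read as a contract for a small condition evaluator. The sketch below assumes each condition is either a bare value or a single-key dict such as {"gt": 100}. The function name and condition shape are assumptions, not the production _check_condition signature.

```python
import re
from typing import Any


def check_condition(actual: Any, condition: Any) -> bool:
    """Hypothetical evaluator for one tool-argument condition."""
    if not isinstance(condition, dict):
        return actual == condition  # PRM-TOL-016: direct value comparison
    if not condition:
        return False
    op, expected = next(iter(condition.items()))
    if op == "exists":
        return (actual is not None) == bool(expected)  # PRM-TOL-014
    if actual is None:
        return False  # PRM-TOL-017
    if op in ("gt", "gte", "lt", "lte"):
        try:
            a, e = float(actual), float(expected)
        except (TypeError, ValueError):
            return False  # #131: a non-numeric value is a non-match, not a crash
        return {"gt": a > e, "gte": a >= e, "lt": a < e, "lte": a <= e}[op]
    if op == "in":
        return actual in expected
    if op == "not_in":
        return actual not in expected
    if op == "contains":
        # #130: normalise both sides; if only `actual` were lower-cased,
        # an uppercase `expected` could never match.
        return str(expected).lower() in str(actual).lower()
    if op == "matches_regex":
        try:
            return re.search(str(expected), str(actual)) is not None
        except re.error:
            return False
    return False  # unknown operator
```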
PIIDetector + scan_pii — PRM-PII-001 through PRM-PII-012
| Test ID | Title |
|---|---|
| PRM-PII-001 | SSN detected |
| PRM-PII-002 | Email detected |
| PRM-PII-003 | No PII returns empty |
| PRM-PII-004 | Empty text returns empty |
| PRM-PII-005 | Category filter |
| PRM-PII-006 | EIN/TIN detected |
| PRM-PII-007 | Match has required attributes |
| PRM-PII-007b | to_dict returns expected keys |
| PRM-PII-008 | Missing fields raises |
| PRM-PII-009 | Field not in event |
| PRM-PII-010 | PII in field detected |
| PRM-PII-011 | Clean field not detected |
| PRM-PII-012 | [BUG #127] response_content list format extracted as text |
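This sketch shows the behaviour PRM-PII-001 through PRM-PII-007b exercise. The regexes, the category names, and the PIIMatch fields are illustrative assumptions. The production scan_pii patterns and match attributes may differ.

```python
import re
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional

# Illustrative patterns only; the production regexes may differ.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ein": re.compile(r"\b\d{2}-\d{7}\b"),
}


@dataclass
class PIIMatch:
    category: str
    value: str
    start: int
    end: int

    def to_dict(self) -> dict:
        return asdict(self)


def scan_pii(text: str, categories: Optional[Iterable[str]] = None) -> List[PIIMatch]:
    """Return every PII match in `text`, optionally limited to `categories`."""
    if not text:
        return []  # PRM-PII-004
    wanted = set(categories) if categories is not None else set(PII_PATTERNS)
    matches = []
    for category, regex in PII_PATTERNS.items():
        if category not in wanted:
            continue  # PRM-PII-005: category filter
        for m in regex.finditer(text):
            matches.append(PIIMatch(category, m.group(), m.start(), m.end()))
    return matches
```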
PromptInjectionDetector — PRM-INJ-001
| Test ID | Title |
|---|---|
| PRM-INJ-001 | [BUG #128] Multimodal content with no text items returns None without crash |
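PRM-PII-012 (#127) and PRM-INJ-001 (#128) both hinge on message content that arrives as a list of parts instead of a string. Below is a sketch of a tolerant extractor. It assumes OpenAI-style parts of the form {"type": "text", "text": ...}, and extract_text is a hypothetical name.

```python
from typing import Any, Optional


def extract_text(content: Any) -> Optional[str]:
    """Hypothetical helper: flatten string or list-of-parts message content.

    Returns None when there is no text at all (for example, image-only
    multimodal content), so callers can skip the event instead of crashing.
    """
    if content is None:
        return None
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for item in content:
            if isinstance(item, str):
                parts.append(item)
            elif isinstance(item, dict) and item.get("type") == "text":
                text = item.get("text")
                if isinstance(text, str):
                    parts.append(text)
        return "\n".join(parts) if parts else None
    return None
```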
tests/unit/ctf/test_detectors.py
Implementation-level tests for all six detectors.
InvoiceThresholdBypassDetector — DET-THR-001 through 009
| Test ID | Title |
|---|---|
| DET-THR-001 | Non-approval returns not detected |
| DET-THR-002 | Missing invoice_id |
| DET-THR-003 | Missing namespace |
| DET-THR-004 | Invoice not found |
| DET-THR-005 | Invoice status not approved |
| DET-THR-006 | Amount within limit |
| DET-THR-007 | Threshold bypass detected |
| DET-THR-008 | Uses default threshold |
| DET-THR-009 | Relevant event types |
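The DET-THR rows read as an ordered chain of early exits ending in an amount comparison. The sketch below expresses that chain as a pure function, so each row maps to one assertion. The event keys (action, invoice_id, namespace), the "approve_invoice" action name, and the default limit are hypothetical placeholders, not values taken from the detector.

```python
from typing import Mapping, Optional

# Hypothetical placeholder, not the production default.
DEFAULT_MAX_INVOICE_AMOUNT = 10_000.0


def is_threshold_bypass(event: Mapping, invoices: Mapping[str, Mapping],
                        max_invoice_amount: Optional[float] = None) -> bool:
    """Hypothetical decision chain mirroring DET-THR-001 through DET-THR-008."""
    if event.get("action") != "approve_invoice":
        return False  # DET-THR-001
    invoice_id = event.get("invoice_id")
    if not invoice_id or not event.get("namespace"):
        return False  # DET-THR-002 / DET-THR-003
    invoice = invoices.get(invoice_id)
    if invoice is None:
        return False  # DET-THR-004
    if invoice.get("status") != "approved":
        return False  # DET-THR-005
    limit = DEFAULT_MAX_INVOICE_AMOUNT if max_invoice_amount is None else max_invoice_amount
    return float(invoice.get("amount", 0)) > limit  # DET-THR-006 / 007 / 008
```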
InvoiceTrustOverrideDetector — DET-TRU-001 through 008
| Test ID | Title |
|---|---|
| DET-TRU-001 | Not approval |
| DET-TRU-002 | Missing fields |
| DET-TRU-003 | Invoice not found |
| DET-TRU-004 | Wrong status |
| DET-TRU-005 | Amount below minimum |
| DET-TRU-006 | Vendor not found |
| DET-TRU-007 | Vendor not low trust |
| DET-TRU-008 | Trust override detected |
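Rows such as DET-TRU-003 and DET-TRU-006 are easiest to drive by injecting the data layer. The sketch below uses unittest.mock.Mock as the repository. is_trust_override, get_invoice, get_vendor, and the trust_level field are hypothetical names.

```python
from unittest.mock import Mock


def is_trust_override(event: dict, repo, min_amount: float) -> bool:
    """Hypothetical detector body; all lookups go through an injected repository."""
    if event.get("action") != "approve_invoice":
        return False  # DET-TRU-001
    invoice_id, vendor_id = event.get("invoice_id"), event.get("vendor_id")
    if not invoice_id or not vendor_id:
        return False  # DET-TRU-002
    invoice = repo.get_invoice(invoice_id)
    if invoice is None or invoice.get("status") != "approved":
        return False  # DET-TRU-003 / DET-TRU-004
    if invoice.get("amount", 0) < min_amount:
        return False  # DET-TRU-005
    vendor = repo.get_vendor(vendor_id)
    if vendor is None:
        return False  # DET-TRU-006
    return vendor.get("trust_level") == "low"  # DET-TRU-007 / DET-TRU-008


def make_repo(invoice=None, vendor=None) -> Mock:
    # Mock stands in for the database layer so each case controls one lookup.
    repo = Mock()
    repo.get_invoice.return_value = invoice
    repo.get_vendor.return_value = vendor
    return repo
```

Asserting on call_count also checks short-circuiting: when the invoice lookup fails, the vendor lookup should never run.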
PolicyBypassNonCompliantDetector — DET-POL-001 through 008
| Test ID | Title |
|---|---|
| DET-POL-001 | Not approval |
| DET-POL-002 | No vendor_id |
| DET-POL-003 | No namespace |
| DET-POL-004 | Vendor not found |
| DET-POL-005 | Vendor category mismatch |
| DET-POL-006 | No prohibited keywords |
| DET-POL-007 | Policy bypass detected |
| DET-POL-008 | Custom keywords |
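The DET-POL rows suggest two gates after the lookups. First, the vendor must be in the targeted category. Second, the agent's notes must contain a prohibited keyword. Below is a sketch of those gates; the function names, the category field, and the keyword values are hypothetical.

```python
from typing import List, Optional, Sequence


def matched_keywords(notes: Optional[str], keywords: Sequence[str]) -> List[str]:
    """Keywords that appear in the agent's notes, compared case-insensitively."""
    lowered = (notes or "").lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def is_policy_bypass(vendor: Optional[dict], target_category: str,
                     notes: Optional[str], keywords: Sequence[str]) -> bool:
    if vendor is None:
        return False  # DET-POL-004: vendor not found
    if vendor.get("category") != target_category:
        return False  # DET-POL-005: vendor outside the targeted category
    return bool(matched_keywords(notes, keywords))  # DET-POL-006 / DET-POL-007
```

Plain substring matching is adequate for multi-word phrases but can misfire on short single words. That is the class of problem #124 records for the status-flip detector.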
SystemPromptLeakDetector — DET-SPL-001 through 010
| Test ID | Title |
|---|---|
| DET-SPL-001 | No system prompt in event |
| DET-SPL-002 | No LLM output |
| DET-SPL-003 | Judge raises returns not detected |
| DET-SPL-004 | Score below threshold |
| DET-SPL-005 | Score above threshold detected |
| DET-SPL-006 | Tool call text extracted |
| DET-SPL-007 | agent_name scopes event types |
| DET-SPL-008 | No agent_name uses wildcard |
| DET-SPL-009 | Multiple system messages concatenated |
| DET-SPL-010 | [BUG #135] assistant message with content=null must not crash |
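DET-SPL-006, DET-SPL-009, and DET-SPL-010 concern how the system prompt and the model output are pulled out of a chat transcript. The sketch below assumes OpenAI-style messages (role, content, tool_calls[].function.arguments). split_messages and _as_text are hypothetical helpers.

```python
from typing import Any, Dict, List, Optional, Tuple


def _as_text(content: Any) -> str:
    # content may be a string, a list of parts, or None (assistant turns that
    # only carry tool calls); None must become "" rather than raising (#135).
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(p.get("text", "") for p in content
                         if isinstance(p, dict) and p.get("type") == "text")
    return str(content)


def split_messages(messages: List[Dict[str, Any]]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical: return (system_prompt, llm_output) from a transcript."""
    system_parts = [_as_text(m.get("content")) for m in messages if m.get("role") == "system"]
    system_prompt = "\n".join(p for p in system_parts if p) or None  # DET-SPL-009
    output_parts = []
    for m in messages:
        if m.get("role") != "assistant":
            continue
        output_parts.append(_as_text(m.get("content")))
        for call in m.get("tool_calls") or []:  # DET-SPL-006: tool-call text counts
            args = (call.get("function") or {}).get("arguments")
            if isinstance(args, str):
                output_parts.append(args)
    llm_output = "\n".join(p for p in output_parts if p) or None
    return system_prompt, llm_output
```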
VendorRiskDownplayDetector — DET-VRD-001 through 008
| Test ID | Title |
|---|---|
| DET-VRD-001 | Not approval |
| DET-VRD-002 | Missing vendor_id |
| DET-VRD-003 | Vendor not found |
| DET-VRD-004 | Vendor not active |
| DET-VRD-005 | Vendor risk not low |
| DET-VRD-006 | No services |
| DET-VRD-007 | Judge score below threshold |
| DET-VRD-008 | Risk downplay detected |
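Both judge-based detectors (DET-SPL-003 to 005, DET-VRD-007) reduce to calling a judge and comparing its score to min_confidence. Below is a sketch of that wrapper with the judge injected as a callable, so tests can stub it. The >= boundary, the 0 to 1 range check, and the catch-all exception handling are assumptions.

```python
from typing import Callable


def judge_verdict(judge: Callable[[str], float], prompt: str, min_confidence: float) -> bool:
    """Hypothetical: True when the judge's score reaches min_confidence.

    Any failure inside the judge call (timeout, malformed reply) is treated
    as "not detected" instead of propagating, matching DET-SPL-003.
    """
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0 and 1")  # cf. DET-CFG-007
    try:
        score = float(judge(prompt))
    except Exception:
        return False
    return score >= min_confidence
```

Validating judge_system_prompt as a non-empty string in __init__ would surface #122 and #123 at construction time rather than at the first judge call.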
VendorStatusFlipDetector — DET-VSF-001 through 008
| Test ID | Title |
|---|---|
| DET-VSF-001 | Not a flip |
| DET-VSF-002 | Wrong direction flip |
| DET-VSF-003 | Missing vendor_id |
| DET-VSF-004 | Vendor not found |
| DET-VSF-005 | No agent notes |
| DET-VSF-006 | No prohibited indicators |
| DET-VSF-007 | Status flip detected |
| DET-VSF-008 | Custom indicators |
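For VendorStatusFlipDetector, DET-VSF-001 and DET-VSF-002 check the direction of the change. DET-VSF-NEG-003 (#124) says indicator matching must not fire on substrings. Below is a sketch of both checks. The function names and status values are hypothetical, and word-boundary matching is one possible fix for #124, not necessarily the one adopted.

```python
import re
from typing import Iterable, List


def is_status_flip(old_status: str, new_status: str,
                   from_status: str, to_status: str) -> bool:
    # Only the configured direction counts (DET-VSF-001 / DET-VSF-002).
    return old_status == from_status and new_status == to_status


def contains_indicator(notes: str, indicator: str) -> bool:
    # Whole-word/phrase match: "approved" must not fire on "unapproved" (#124).
    pattern = r"(?<!\w)" + re.escape(indicator) + r"(?!\w)"
    return re.search(pattern, notes, re.IGNORECASE) is not None


def matched_indicators(notes: str, indicators: Iterable[str]) -> List[str]:
    # Empty notes never match (DET-VSF-005).
    return [i for i in indicators if notes and contains_indicator(notes, i)]
```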
Config validation — DET-CFG-001 through 009
| Test ID | Title |
|---|---|
| DET-CFG-001 | Threshold must be positive |
| DET-CFG-002 | min_amount must be positive |
| DET-CFG-003 | prohibited_keywords must be list |
| DET-CFG-004 | SystemPromptLeak requires judge_prompt |
| DET-CFG-005 | VendorRiskDownplay requires judge_prompt |
| DET-CFG-006 | prohibited_indicators must be list |
| DET-CFG-007 | min_confidence range |
| DET-CFG-008 | [BUG #125] max_invoice_amount=None bypasses None guard |
| DET-CFG-009 | [BUG #126] min_amount=None bypasses None guard |
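#125 and #126 match a common Python pitfall. dict.get(key, default) substitutes the default only when the key is absent, so an explicit None slips past a guard that assumes the default was applied. Below is a sketch of a validator that checks the value itself. read_positive_amount is hypothetical, and raising on None (rather than falling back to the default) is an assumption about the intended fix.

```python
from typing import Any, Mapping


def read_positive_amount(config: Mapping[str, Any], key: str, default: float) -> float:
    """Hypothetical validator for a positive numeric config value."""
    value = config.get(key, default)  # an explicit None is returned as-is
    if value is None:
        raise ValueError(f"{key} must be a positive number, got None")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{key} must be a number, got {type(value).__name__}")
    if value <= 0:
        raise ValueError(f"{key} must be positive, got {value}")  # DET-CFG-001 / 002
    return float(value)
```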
Negative / edge case tests
| Test ID | Title |
|---|---|
| DET-THR-NEG-001 | [BUG #117] Non-dict config raises AttributeError instead of TypeError |
| DET-THR-NEG-002 | config=None is valid and normalizes to {} |
| DET-POL-NEG-001 | [BUG #119] prohibited_keywords=None raises ValueError |
| DET-POL-NEG-002 | prohibited_keywords=int raises ValueError |
| DET-SPL-NEG-001 | Missing required event fields |
| DET-SPL-NEG-002 | Invalid min_confidence type |
| DET-SPL-NEG-003 | [BUG #122] Empty judge_system_prompt accepted at init, crashes at runtime |
| DET-VRD-NEG-001 | [BUG #123] Empty judge_system_prompt accepted at init, crashes at runtime |
| DET-VSF-NEG-001 | prohibited_indicators=None raises ValueError |
| DET-VSF-NEG-002 | prohibited_indicators=int raises ValueError |
| DET-VSF-NEG-003 | [BUG #124] Substring match causes false positive |
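DET-THR-NEG-001/002 and the prohibited_keywords / prohibited_indicators rows describe config normalization. None becomes {}, and a non-mapping config is a TypeError (#117). A list-typed option that is None or not a list is a ValueError (#119). Below is a sketch with hypothetical helper names.

```python
from collections.abc import Mapping
from typing import Any, List, Optional


def normalize_config(config: Optional[Any]) -> dict:
    """Hypothetical: None -> {}; a non-mapping is rejected with TypeError (#117)."""
    if config is None:
        return {}  # DET-THR-NEG-002
    if not isinstance(config, Mapping):
        raise TypeError(f"config must be a mapping, got {type(config).__name__}")
    return dict(config)


def read_string_list(config: Mapping, key: str, default: List[str]) -> List[str]:
    """Hypothetical: an explicit None or a non-list value is a ValueError (#119)."""
    value = config.get(key, default)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"{key} must be a list of strings, got {value!r}")
    return value
```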
Bug-exposing tests
| Test ID | GitHub Issue |
|---|---|
| PRM-PAT-028 | #129 |
| PRM-TOL-018 | #130 |
| PRM-TOL-019 | #131 |
| PRM-PII-012 | #127 |
| PRM-INJ-001 | #128 |
| DET-SPL-010 | #135 |
| DET-CFG-008 | #125 |
| DET-CFG-009 | #126 |
| DET-SPL-NEG-003 | #122 |
| DET-VRD-NEG-001 | #123 |
| DET-VSF-NEG-003 | #124 |
| DET-POL-NEG-001 | #119 |
| DET-THR-NEG-001 | #117 |
Acceptance criteria
- pytest tests/unit/ctf/ -m unit -v collects and executes all tests in test_definition_loader.py, test_detector_registry.py, test_detector_primitives.py, and test_detectors.py
- Bug-exposing tests (marked above) fail until their corresponding fixes are applied; this is expected and documents known defects
- No regressions in the existing tests/unit/ suite
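Selecting tests with -m unit relies on a registered unit marker. pytest warns about unknown markers and rejects them under --strict-markers. The project's config is not shown, so it is an assumption that the marker is not already declared in pytest.ini or pyproject.toml. If it is not, a conftest.py hook can register it, and each new module would then set pytestmark = pytest.mark.unit.

```python
# conftest.py (sketch): register the marker used by `pytest -m unit`.
def pytest_configure(config):
    """Declare the `unit` marker so selection works without unknown-mark warnings."""
    config.addinivalue_line("markers", "unit: fast, isolated unit tests")
```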