CD001-CTF-001: CTF Detector Unit Tests #137

@steadhac

Parent: Unit tests creation for CD001 #27

Description

Add a full unit test suite for the CTF detector layer: definition loading, registry, detector primitives, and all six detector implementations. Tests follow the established pattern with Title, Basically question, Steps, Expected Results, and Impact sections. Each confirmed production defect has a bug-exposing test.


New test files

tests/unit/ctf/test_definition_loader.py

Validates challenge YAML loading, schema enforcement, and detector instantiation from config.

Test ID Title
DEF-LDR-001 No challenges dir returns empty
DEF-LDR-002 Loads challenge from YAML
DEF-LDR-003 Bad YAML is skipped
DEF-LDR-004 Multiple challenge files
DEF-LDR-005 No badges dir returns empty
DEF-LDR-006 Loads badge from YAML
DEF-LDR-007 load_all returns combined dict
DEF-LDR-008 load_all with empty dirs
DEF-LDR-009 load_challenge_yaml returns schema
DEF-LDR-010 load_badge_yaml returns schema
DEF-LDR-011 Challenge validation error propagates
DEF-LDR-012 Challenge with all optional fields
DEF-LDR-013 SQLite upsert executes
DEF-LDR-014 PostgreSQL upsert executes
DEF-LDR-015 Unknown dialect uses merge
DEF-LDR-016 Upsert badge SQLite
DEF-LDR-017 get_loader returns instance
DEF-LDR-018 get_loader is singleton

tests/unit/ctf/test_detector_registry.py

Covers @register_detector decorator, duplicate registration guards, and registry lookup behaviour.

Test ID Title
REG-DEC-001 Decorated class is identical to original
REG-DEC-002 Subclass-only method accessible on instance
REG-DEC-003 Return annotation uses TypeVar not BaseDetector

tests/unit/ctf/test_detector_primitives.py

Full coverage of the detector building blocks.

PatternMatchDetector + helpers — PRM-PAT-001 through PRM-PAT-028

Test ID Title
PRM-PAT-001 Empty text returns False
PRM-PAT-002 Empty pattern returns False
PRM-PAT-003 Case-insensitive literal match
PRM-PAT-004 Case-sensitive no match
PRM-PAT-005 Case-sensitive match
PRM-PAT-006 Regex match
PRM-PAT-007 Invalid regex falls back to literal
PRM-PAT-008 Context in middle
PRM-PAT-009 Context at start
PRM-PAT-010 Context at end
PRM-PAT-011 String pattern is literal
PRM-PAT-012 Dict with regex key
PRM-PAT-013 Dict without regex key
PRM-PAT-014 Empty text returns no matches
PRM-PAT-015 Multiple patterns returns all matches
PRM-PAT-016 No match returns empty
PRM-PAT-017 Regex pattern in list
PRM-PAT-018 Config missing field raises
PRM-PAT-019 Config missing patterns raises
PRM-PAT-020 Empty patterns raises
PRM-PAT-021 Invalid match_mode raises
PRM-PAT-022 Field missing from event
PRM-PAT-023 Non-string field coerced
PRM-PAT-024 any mode — one match sufficient
PRM-PAT-025 all mode — requires all matches
PRM-PAT-026 all mode — all match
PRM-PAT-027 No match returns not detected
PRM-PAT-028 [BUG #129] Valid regex non-match must not fall through to literal search
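
PRM-PAT-007 and PRM-PAT-028 together describe two different fallback situations. The sketch below is one way to satisfy both, assuming a hypothetical helper named `match_pattern` with `is_regex` and `case_sensitive` flags; it is not the production matcher, whose signature is not shown here.

```python
import re


def match_pattern(text: str, pattern: str, is_regex: bool = False,
                  case_sensitive: bool = False) -> bool:
    """Return True when `pattern` is found in `text` (hypothetical helper)."""
    if not text or not pattern:
        return False
    if is_regex:
        flags = 0 if case_sensitive else re.IGNORECASE
        try:
            compiled = re.compile(pattern, flags)
        except re.error:
            # Only an invalid regex falls through to the literal search below.
            pass
        else:
            # A valid regex is authoritative: a non-match stays a non-match.
            return compiled.search(text) is not None
    if case_sensitive:
        return pattern in text
    return pattern.lower() in text.lower()
```

With this shape, `match_pattern("a ^foo", "^foo", is_regex=True)` is `False` even though the literal characters `^foo` occur in the text, which is the behaviour PRM-PAT-028 asks for.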

ToolCallDetector + _check_condition operators — PRM-TOL-001 through PRM-TOL-019

Test ID Title
PRM-TOL-001 Missing tool_name raises
PRM-TOL-002 Wrong tool name
PRM-TOL-003 Tool name match detected
PRM-TOL-004 require_success skips non-success
PRM-TOL-005 require_success passes on success event
PRM-TOL-006 JSON string tool args parsed
PRM-TOL-007 Invalid JSON tool args not detected
PRM-TOL-008 Parameter condition failed
PRM-TOL-009 Operator gt
PRM-TOL-010 Operator gte
PRM-TOL-011 Operator lt/lte
PRM-TOL-012 Operator in/not_in
PRM-TOL-013 Operator contains
PRM-TOL-014 Operator exists
PRM-TOL-015 Operator matches_regex
PRM-TOL-016 Direct value comparison
PRM-TOL-017 None actual with operator returns False
PRM-TOL-018 [BUG #130] contains with uppercase expected never matches
PRM-TOL-019 [BUG #131] gt/lte on non-numeric string must not crash
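
The operator list above can be read as a small condition evaluator. The following is a minimal sketch of such an evaluator, assuming conditions are either a plain value (direct comparison) or a dict of `operator: expected` pairs; the function name `check_condition` and the choice to treat non-numeric operands as "condition not met" are assumptions, not the production `_check_condition`.

```python
import re
from typing import Any, Optional


def _to_number(value: Any) -> Optional[float]:
    """Return `value` as a float, or None when it is not numeric."""
    if isinstance(value, bool):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def check_condition(actual: Any, condition: Any) -> bool:
    # A non-dict condition is a direct equality comparison.
    if not isinstance(condition, dict):
        return actual == condition
    for op, expected in condition.items():
        if op == "exists":
            if (actual is not None) != bool(expected):
                return False
            continue
        if actual is None:
            return False  # every other operator needs a value
        if op in ("gt", "gte", "lt", "lte"):
            a, e = _to_number(actual), _to_number(expected)
            if a is None or e is None:
                return False  # non-numeric operand: no crash, no match
            ok = {"gt": a > e, "gte": a >= e, "lt": a < e, "lte": a <= e}[op]
        elif op == "in":
            ok = actual in expected
        elif op == "not_in":
            ok = actual not in expected
        elif op == "contains":
            # Lowercase both sides so an uppercase `expected` can match.
            ok = str(expected).lower() in str(actual).lower()
        elif op == "matches_regex":
            ok = re.search(expected, str(actual)) is not None
        else:
            ok = False  # unknown operator: treat as not satisfied
        if not ok:
            return False
    return True
```

Lowercasing both operands in `contains` and routing numeric operators through `_to_number` are the two choices that correspond to PRM-TOL-018 and PRM-TOL-019.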

PIIDetector + scan_pii — PRM-PII-001 through PRM-PII-012

Test ID Title
PRM-PII-001 SSN detected
PRM-PII-002 Email detected
PRM-PII-003 No PII returns empty
PRM-PII-004 Empty text returns empty
PRM-PII-005 Category filter
PRM-PII-006 EIN/TIN detected
PRM-PII-007 Match has required attributes
PRM-PII-007b to_dict returns expected keys
PRM-PII-008 Missing fields raises
PRM-PII-009 Field not in event
PRM-PII-010 PII in field detected
PRM-PII-011 Clean field not detected
PRM-PII-012 [BUG #127] response_content list format extracted as text
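
For orientation, a regex-based scanner of the kind these tests exercise might look like the sketch below. The patterns, the `PIIMatch` attribute names, and the `categories` parameter are illustrative assumptions; the real `scan_pii` may use different categories, patterns, and result fields.

```python
import re
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional

# Simplified patterns for illustration only.
_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ein": re.compile(r"\b\d{2}-\d{7}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


@dataclass
class PIIMatch:
    category: str
    value: str
    start: int
    end: int

    def to_dict(self) -> dict:
        return asdict(self)


def scan_pii(text: str,
             categories: Optional[Iterable[str]] = None) -> List[PIIMatch]:
    """Return every PII match in `text`, optionally limited to `categories`."""
    if not text:
        return []
    wanted = set(categories) if categories is not None else set(_PATTERNS)
    matches: List[PIIMatch] = []
    for category, pattern in _PATTERNS.items():
        if category not in wanted:
            continue
        for m in pattern.finditer(text):
            matches.append(PIIMatch(category, m.group(), m.start(), m.end()))
    return matches
```

A call such as `scan_pii("SSN 123-45-6789", categories=["email"])` returns an empty list, which is the category-filter behaviour PRM-PII-005 checks.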

PromptInjectionDetector — PRM-INJ-001

Test ID Title
PRM-INJ-001 [BUG #128] Multimodal content with no text items returns None without crash
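
Bugs #127 and #128 both concern message content that arrives as a list of parts rather than a plain string. A defensive extraction helper could be sketched as follows, assuming parts shaped like `{"type": "text", "text": "..."}`; that shape and the name `extract_text` are assumptions, since the event schema is not shown in this issue.

```python
from typing import Any, Optional


def extract_text(content: Any) -> Optional[str]:
    """Return the text carried by `content`, or None when there is none."""
    if isinstance(content, str):
        return content or None
    if isinstance(content, list):
        # Keep only well-formed text parts; skip images and other items.
        parts = [
            item["text"]
            for item in content
            if isinstance(item, dict)
            and item.get("type") == "text"
            and isinstance(item.get("text"), str)
        ]
        return "\n".join(parts) if parts else None
    return None  # None, numbers, dicts: nothing to scan
```

A caller can then skip detection when the result is `None` instead of indexing into an empty list.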

tests/unit/ctf/test_detectors.py

Implementation-level tests for all six detectors.

InvoiceThresholdBypassDetector — DET-THR-001 through 009

Test ID Title
DET-THR-001 Non-approval returns not detected
DET-THR-002 Missing invoice_id
DET-THR-003 Missing namespace
DET-THR-004 Invoice not found
DET-THR-005 Invoice status not approved
DET-THR-006 Amount within limit
DET-THR-007 Threshold bypass detected
DET-THR-008 Uses default threshold
DET-THR-009 Relevant event types
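
Read top to bottom, DET-THR-001 through DET-THR-008 trace a chain of early exits followed by one comparison. The sketch below mirrors that order under several assumptions: the event type string, the field names, the `get_invoice` lookup callable, and the default limit are all hypothetical stand-ins for the real detector's inputs.

```python
from typing import Any, Callable, Dict, Optional

DEFAULT_MAX_INVOICE_AMOUNT = 10_000.0  # hypothetical default threshold

Invoice = Dict[str, Any]


def detect_threshold_bypass(
    event: Dict[str, Any],
    get_invoice: Callable[[str, str], Optional[Invoice]],
    max_invoice_amount: Optional[float] = None,
) -> bool:
    """Return True when an approved invoice exceeds the configured limit."""
    if event.get("type") != "invoice_approval":  # hypothetical event type
        return False
    invoice_id = event.get("invoice_id")
    namespace = event.get("namespace")
    if not invoice_id or not namespace:
        return False
    invoice = get_invoice(namespace, invoice_id)
    if invoice is None:
        return False
    if invoice.get("status") != "approved":
        return False
    limit = (DEFAULT_MAX_INVOICE_AMOUNT
             if max_invoice_amount is None else max_invoice_amount)
    return invoice.get("amount", 0) > limit
```

Passing the lookup as a callable keeps the sketch testable with an in-memory dict, which is roughly what a unit test would do with a mocked data layer.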

InvoiceTrustOverrideDetector — DET-TRU-001 through 008

Test ID Title
DET-TRU-001 Not approval
DET-TRU-002 Missing fields
DET-TRU-003 Invoice not found
DET-TRU-004 Wrong status
DET-TRU-005 Amount below minimum
DET-TRU-006 Vendor not found
DET-TRU-007 Vendor not low trust
DET-TRU-008 Trust override detected

PolicyBypassNonCompliantDetector — DET-POL-001 through 008

Test ID Title
DET-POL-001 Not approval
DET-POL-002 No vendor_id
DET-POL-003 No namespace
DET-POL-004 Vendor not found
DET-POL-005 Vendor category mismatch
DET-POL-006 No prohibited keywords
DET-POL-007 Policy bypass detected
DET-POL-008 Custom keywords

SystemPromptLeakDetector — DET-SPL-001 through 010

Test ID Title
DET-SPL-001 No system prompt in event
DET-SPL-002 No LLM output
DET-SPL-003 Judge raises returns not detected
DET-SPL-004 Score below threshold
DET-SPL-005 Score above threshold detected
DET-SPL-006 Tool call text extracted
DET-SPL-007 agent_name scopes event types
DET-SPL-008 No agent_name uses wildcard
DET-SPL-009 Multiple system messages concatenated
DET-SPL-010 [BUG #135] assistant message with content=null must not crash
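
DET-SPL-006, DET-SPL-009, and DET-SPL-010 all depend on how messages are walked before the judge is called. One tolerant way to do that walk is sketched below, assuming chat-style messages with `role`, `content`, and optional `tool_calls` entries whose arguments sit under `function.arguments`; those field names and the helper name `split_messages` are assumptions.

```python
from typing import Any, Dict, List, Tuple


def split_messages(messages: List[Dict[str, Any]]) -> Tuple[str, str]:
    """Return (system_prompt, llm_output) built from a message list."""
    system_parts: List[str] = []
    output_parts: List[str] = []
    for msg in messages:
        content = msg.get("content")
        # content may be None, e.g. on an assistant turn with only tool calls.
        text = content if isinstance(content, str) else ""
        role = msg.get("role")
        if role == "system":
            if text:
                system_parts.append(text)
        elif role == "assistant":
            if text:
                output_parts.append(text)
            for call in msg.get("tool_calls") or []:
                args = (call.get("function") or {}).get("arguments")
                if args:
                    output_parts.append(str(args))
    return "\n".join(system_parts), "\n".join(output_parts)
```

An assistant message with `content=None` contributes only its tool-call arguments, and an empty message list yields two empty strings rather than an exception.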

VendorRiskDownplayDetector — DET-VRD-001 through 008

Test ID Title
DET-VRD-001 Not approval
DET-VRD-002 Missing vendor_id
DET-VRD-003 Vendor not found
DET-VRD-004 Vendor not active
DET-VRD-005 Vendor risk not low
DET-VRD-006 No services
DET-VRD-007 Judge score below threshold
DET-VRD-008 Risk downplay detected

VendorStatusFlipDetector — DET-VSF-001 through 008

Test ID Title
DET-VSF-001 Not a flip
DET-VSF-002 Wrong direction flip
DET-VSF-003 Missing vendor_id
DET-VSF-004 Vendor not found
DET-VSF-005 No agent notes
DET-VSF-006 No prohibited indicators
DET-VSF-007 Status flip detected
DET-VSF-008 Custom indicators

Config validation — DET-CFG-001 through 009

Test ID Title
DET-CFG-001 Threshold must be positive
DET-CFG-002 min_amount must be positive
DET-CFG-003 prohibited_keywords must be list
DET-CFG-004 SystemPromptLeak requires judge_prompt
DET-CFG-005 VendorRiskDownplay requires judge_prompt
DET-CFG-006 prohibited_indicators must be list
DET-CFG-007 min_confidence range
DET-CFG-008 [BUG #125] max_invoice_amount=None bypasses None guard
DET-CFG-009 [BUG #126] min_amount=None bypasses None guard
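
DET-CFG-008 and DET-CFG-009 involve a config key that is present with the value `None`. In Python, `dict.get(key, default)` returns `None` in that case rather than the default, which is one common way a `None` slips past validation. The sketch below shows a guard that handles it, assuming that an explicit `None` should mean "use the default"; whether the production detectors should instead reject `None` is not stated in this issue, and `resolve_threshold` and the default value are hypothetical.

```python
from typing import Any, Dict, Optional

DEFAULT_MAX_INVOICE_AMOUNT = 10_000.0  # hypothetical default


def resolve_threshold(config: Optional[Dict[str, Any]]) -> float:
    """Validate and return `max_invoice_amount` from a detector config."""
    if config is None:
        config = {}  # a missing config normalizes to an empty dict
    if not isinstance(config, dict):
        raise TypeError("config must be a dict or None")
    value = config.get("max_invoice_amount")
    if value is None:
        # Covers both a missing key and an explicit None value.
        value = DEFAULT_MAX_INVOICE_AMOUNT
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("max_invoice_amount must be a number")
    if value <= 0:
        raise ValueError("max_invoice_amount must be positive")
    return float(value)
```

The same shape applies to `min_amount`; only the key name and default differ.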

Negative / edge case tests

Test ID Title
DET-THR-NEG-001 [BUG #117] Non-dict config raises AttributeError instead of TypeError
DET-THR-NEG-002 config=None is valid and normalizes to {}
DET-POL-NEG-001 [BUG #119] prohibited_keywords=None raises ValueError
DET-POL-NEG-002 prohibited_keywords=int raises ValueError
DET-SPL-NEG-001 Missing required event fields
DET-SPL-NEG-002 Invalid min_confidence type
DET-SPL-NEG-003 [BUG #122] Empty judge_system_prompt accepted at init, crashes at runtime
DET-VRD-NEG-001 [BUG #123] Empty judge_system_prompt accepted at init, crashes at runtime
DET-VSF-NEG-001 prohibited_indicators=None raises ValueError
DET-VSF-NEG-002 prohibited_indicators=int raises ValueError
DET-VSF-NEG-003 [BUG #124] Substring match causes false positive
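
The false positive in DET-VSF-NEG-003 is the usual hazard of plain substring checks: a short indicator can occur inside a longer, unrelated word. A whole-word match avoids it, as in the sketch below. The helper name `find_indicators` and the example indicator strings are hypothetical and are not taken from the detector's real indicator list.

```python
import re
from typing import Iterable, List


def find_indicators(notes: str, indicators: Iterable[str]) -> List[str]:
    """Return the indicators that appear in `notes` as whole words or phrases."""
    found: List[str] = []
    for indicator in indicators:
        # Lookarounds reject matches that are embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(indicator) + r"(?!\w)"
        if re.search(pattern, notes, re.IGNORECASE):
            found.append(indicator)
    return found
```

For example, the substring test `"verified" in "Vendor is unverified"` is true, while `find_indicators("Vendor is unverified", ["verified"])` returns an empty list.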

Bug-exposing tests

Test ID GitHub Issue
PRM-PAT-028 #129
PRM-TOL-018 #130
PRM-TOL-019 #131
PRM-PII-012 #127
PRM-INJ-001 #128
DET-SPL-010 #135
DET-CFG-008 #125
DET-CFG-009 #126
DET-SPL-NEG-003 #122
DET-VRD-NEG-001 #123
DET-VSF-NEG-003 #124
DET-POL-NEG-001 #119
DET-THR-NEG-001 #117

Acceptance criteria

  • pytest tests/unit/ctf/ -m unit -v collects and executes all tests in test_definition_loader.py, test_detector_registry.py, test_detector_primitives.py, and test_detectors.py
  • Bug-exposing tests (listed above) fail until their corresponding fixes are applied; this is expected and documents known defects
  • No regressions in the existing tests/unit/ suite
