# CD001-CTF-001: CTF Detector Unit Tests

**Parent:** Unit tests creation for CD001 #27

## Description

Add a full unit test suite for the CTF detector layer: definition loading, the detector registry, detector primitives, and all six detector implementations. Each test follows the established pattern with Title, Basically question, Steps, Expected Results, and Impact sections. Bug-exposing tests are included for each confirmed production defect.
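
As a rough illustration of the documented pattern, a test docstring might be laid out as follows. This is only a sketch: `is_positive` and the `XYZ-001` test ID are hypothetical placeholders, not code or IDs from this suite.

```python
def is_positive(value):
    """Hypothetical function under test; it stands in for real detector code."""
    return value > 0


def test_xyz_001_zero_is_not_positive():
    """
    Title: Zero is not positive
    Basically question: Does is_positive reject the boundary value 0?
    Steps:
        1. Call is_positive(0).
    Expected Results:
        The call returns False.
    Impact: A wrong boundary check would let zero-value thresholds through.
    """
    assert is_positive(0) is False
```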

---

## New test files

### `tests/unit/ctf/test_definition_loader.py`

Validates challenge YAML loading, schema enforcement, and detector instantiation from config.

| Test ID | Title |
|---------|-------|
| DEF-LDR-001 | No challenges dir returns empty |
| DEF-LDR-002 | Loads challenge from YAML |
| DEF-LDR-003 | Bad YAML is skipped |
| DEF-LDR-004 | Multiple challenge files |
| DEF-LDR-005 | No badges dir returns empty |
| DEF-LDR-006 | Loads badge from YAML |
| DEF-LDR-007 | load_all returns combined dict |
| DEF-LDR-008 | load_all with empty dirs |
| DEF-LDR-009 | load_challenge_yaml returns schema |
| DEF-LDR-010 | load_badge_yaml returns schema |
| DEF-LDR-011 | Challenge validation error propagates |
| DEF-LDR-012 | Challenge with all optional fields |
| DEF-LDR-013 | SQLite upsert executes |
| DEF-LDR-014 | PostgreSQL upsert executes |
| DEF-LDR-015 | Unknown dialect uses merge |
| DEF-LDR-016 | Upsert badge SQLite |
| DEF-LDR-017 | get_loader returns instance |
| DEF-LDR-018 | get_loader is singleton |
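
The directory and singleton cases (DEF-LDR-001, DEF-LDR-017, DEF-LDR-018) could be exercised against a loader shaped roughly like the one below. This is a minimal sketch under assumptions: the `DefinitionLoader` class, its `challenge_files` method, the `definitions` base path, and the use of `lru_cache` for the singleton are all hypothetical; only the name `get_loader` comes from the test titles.

```python
from functools import lru_cache
from pathlib import Path
from typing import List


class DefinitionLoader:
    """Hypothetical stand-in for the real loader class."""

    def __init__(self, base_dir: Path) -> None:
        self.base_dir = base_dir

    def challenge_files(self) -> List[Path]:
        challenges_dir = self.base_dir / "challenges"
        if not challenges_dir.is_dir():
            return []  # a missing directory is treated as "no challenges"
        return sorted(challenges_dir.glob("*.yaml"))


@lru_cache(maxsize=1)
def get_loader() -> DefinitionLoader:
    """Return one shared loader instance per process (assumed behaviour)."""
    return DefinitionLoader(Path("definitions"))
```

A singleton test then reduces to `assert get_loader() is get_loader()`, and the empty-directory test only needs a temporary directory without a `challenges` subfolder.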

---

### `tests/unit/ctf/test_detector_registry.py`

Covers `@register_detector` decorator, duplicate registration guards, and registry lookup behaviour.

| Test ID | Title |
|---------|-------|
| REG-DEC-001 | Decorated class is identical to original |
| REG-DEC-002 | Subclass-only method accessible on instance |
| REG-DEC-003 | Return annotation uses TypeVar not BaseDetector |

---

### `tests/unit/ctf/test_detector_primitives.py`

Full coverage of the detector building blocks.

#### PatternMatchDetector + helpers — PRM-PAT-001 through PRM-PAT-028

| Test ID | Title |
|---------|-------|
| PRM-PAT-001 | Empty text returns False |
| PRM-PAT-002 | Empty pattern returns False |
| PRM-PAT-003 | Case-insensitive literal match |
| PRM-PAT-004 | Case-sensitive no match |
| PRM-PAT-005 | Case-sensitive match |
| PRM-PAT-006 | Regex match |
| PRM-PAT-007 | Invalid regex falls back to literal |
| PRM-PAT-008 | Context in middle |
| PRM-PAT-009 | Context at start |
| PRM-PAT-010 | Context at end |
| PRM-PAT-011 | String pattern is literal |
| PRM-PAT-012 | Dict with regex key |
| PRM-PAT-013 | Dict without regex key |
| PRM-PAT-014 | Empty text returns no matches |
| PRM-PAT-015 | Multiple patterns returns all matches |
| PRM-PAT-016 | No match returns empty |
| PRM-PAT-017 | Regex pattern in list |
| PRM-PAT-018 | Config missing field raises |
| PRM-PAT-019 | Config missing patterns raises |
| PRM-PAT-020 | Empty patterns raises |
| PRM-PAT-021 | Invalid match_mode raises |
| PRM-PAT-022 | Field missing from event |
| PRM-PAT-023 | Non-string field coerced |
| PRM-PAT-024 | any mode — one match sufficient |
| PRM-PAT-025 | all mode — requires all matches |
| PRM-PAT-026 | all mode — all match |
| PRM-PAT-027 | No match returns not detected |
| PRM-PAT-028 | **[BUG #129]** Valid regex non-match must not fall through to literal search |
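
The behaviour PRM-PAT-007 and PRM-PAT-028 distinguish can be sketched as follows: an invalid regex falls back to a literal search, while a valid regex that does not match is final. The function name `match_pattern` and its parameters are hypothetical; this is not the production helper.

```python
import re


def match_pattern(text: str, pattern: str, *, is_regex: bool = False,
                  case_sensitive: bool = False) -> bool:
    """Hypothetical matcher illustrating the intended fallback rules."""
    if not text or not pattern:
        return False
    flags = 0 if case_sensitive else re.IGNORECASE
    if is_regex:
        try:
            compiled = re.compile(pattern, flags)
        except re.error:
            pass  # invalid regex: fall through to the literal search below
        else:
            # Valid regex: its verdict is final, even when it is a non-match.
            return compiled.search(text) is not None
    if case_sensitive:
        return pattern in text
    return pattern.lower() in text.lower()
```

For example, the pattern `^secret` is a valid regex that does not match `"the ^secret is out"`, although the same characters do occur literally in the text. Under the rule above the result is a non-match.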

#### ToolCallDetector + `_check_condition` operators — PRM-TOL-001 through PRM-TOL-019

| Test ID | Title |
|---------|-------|
| PRM-TOL-001 | Missing tool_name raises |
| PRM-TOL-002 | Wrong tool name |
| PRM-TOL-003 | Tool name match detected |
| PRM-TOL-004 | require_success skips non-success |
| PRM-TOL-005 | require_success passes on success event |
| PRM-TOL-006 | JSON string tool args parsed |
| PRM-TOL-007 | Invalid JSON tool args not detected |
| PRM-TOL-008 | Parameter condition failed |
| PRM-TOL-009 | Operator gt |
| PRM-TOL-010 | Operator gte |
| PRM-TOL-011 | Operator lt/lte |
| PRM-TOL-012 | Operator in/not_in |
| PRM-TOL-013 | Operator contains |
| PRM-TOL-014 | Operator exists |
| PRM-TOL-015 | Operator matches_regex |
| PRM-TOL-016 | Direct value comparison |
| PRM-TOL-017 | None actual with operator returns False |
| PRM-TOL-018 | **[BUG #130]** contains with uppercase expected never matches |
| PRM-TOL-019 | **[BUG #131]** gt/lte on non-numeric string must not crash |
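
A condition checker that satisfies the two bug-exposing tests here (case-insensitive `contains`, and numeric operators that tolerate non-numeric input) might look like this sketch. It is written from the test titles only: the function name `check_condition`, its argument order, and the handling of `exists` before the `None` check are assumptions, not the production `_check_condition`.

```python
import re
from typing import Any, Optional


def _to_number(value: Any) -> Optional[float]:
    """Convert to float, or return None when the value is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def check_condition(actual: Any, operator: str, expected: Any) -> bool:
    if operator == "exists":
        # Assumption: "exists" is the one operator that may accept None.
        return (actual is not None) == bool(expected)
    if actual is None:
        return False
    if operator in {"gt", "gte", "lt", "lte"}:
        left, right = _to_number(actual), _to_number(expected)
        if left is None or right is None:
            return False  # non-numeric input: no match, no exception
        return {"gt": left > right, "gte": left >= right,
                "lt": left < right, "lte": left <= right}[operator]
    if operator == "contains":
        # Lower-case both sides so an upper-case expected value can match.
        return str(expected).lower() in str(actual).lower()
    if operator == "in":
        return actual in expected
    if operator == "not_in":
        return actual not in expected
    if operator == "matches_regex":
        return re.search(str(expected), str(actual)) is not None
    raise ValueError(f"unknown operator: {operator}")
```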

#### PIIDetector + scan_pii — PRM-PII-001 through PRM-PII-012

| Test ID | Title |
|---------|-------|
| PRM-PII-001 | SSN detected |
| PRM-PII-002 | Email detected |
| PRM-PII-003 | No PII returns empty |
| PRM-PII-004 | Empty text returns empty |
| PRM-PII-005 | Category filter |
| PRM-PII-006 | EIN/TIN detected |
| PRM-PII-007 | Match has required attributes |
| PRM-PII-007b | to_dict returns expected keys |
| PRM-PII-008 | Missing fields raises |
| PRM-PII-009 | Field not in event |
| PRM-PII-010 | PII in field detected |
| PRM-PII-011 | Clean field not detected |
| PRM-PII-012 | **[BUG #127]** response_content list format extracted as text |
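
To make the `scan_pii` cases concrete, the sketch below shows one possible shape for the scanner and its match object. Only the name `scan_pii` and the `to_dict` method appear in the test titles; the `PIIMatch` fields, the `categories` parameter, and the simplified regexes are illustrative assumptions and are much looser than production-grade PII patterns.

```python
import re
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional

# Simplified, hypothetical patterns for illustration only.
_PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ein": re.compile(r"\b\d{2}-\d{7}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


@dataclass(frozen=True)
class PIIMatch:
    category: str
    value: str
    start: int
    end: int

    def to_dict(self) -> dict:
        return asdict(self)


def scan_pii(text: str,
             categories: Optional[Iterable[str]] = None) -> List[PIIMatch]:
    """Return every PII match in `text`, optionally limited to categories."""
    if not text:
        return []
    wanted = set(categories) if categories is not None else set(_PII_PATTERNS)
    matches = []
    for category, regex in _PII_PATTERNS.items():
        if category not in wanted:
            continue
        for m in regex.finditer(text):
            matches.append(PIIMatch(category, m.group(), m.start(), m.end()))
    return matches
```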

#### PromptInjectionDetector — PRM-INJ-001

| Test ID | Title |
|---------|-------|
| PRM-INJ-001 | **[BUG #128]** Multimodal content with no text items returns None without crash |
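
The multimodal case can be illustrated with a small text-extraction helper that returns `None` when a content list holds no text items. The helper name `extract_text` and the `{"type": "text", "text": ...}` item shape are assumptions modelled on common chat-message formats, not taken from the production code.

```python
from typing import Any, Optional


def extract_text(content: Any) -> Optional[str]:
    """Hypothetical helper: flatten message content to plain text."""
    if content is None:
        return None
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = [
            item["text"]
            for item in content
            if isinstance(item, dict)
            and item.get("type") == "text"
            and isinstance(item.get("text"), str)
        ]
        # No text items at all (for example, images only): report "no text".
        return "\n".join(parts) if parts else None
    return str(content)
```

A detector built on such a helper can skip the event when the result is `None` instead of calling string methods on a missing value.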

---

### `tests/unit/ctf/test_detectors.py`

Implementation-level tests for all six detectors.

#### InvoiceThresholdBypassDetector — DET-THR-001 through 009

| Test ID | Title |
|---------|-------|
| DET-THR-001 | Non-approval returns not detected |
| DET-THR-002 | Missing invoice_id |
| DET-THR-003 | Missing namespace |
| DET-THR-004 | Invoice not found |
| DET-THR-005 | Invoice status not approved |
| DET-THR-006 | Amount within limit |
| DET-THR-007 | Threshold bypass detected |
| DET-THR-008 | Uses default threshold |
| DET-THR-009 | Relevant event types |

#### InvoiceTrustOverrideDetector — DET-TRU-001 through 008

| Test ID | Title |
|---------|-------|
| DET-TRU-001 | Not approval |
| DET-TRU-002 | Missing fields |
| DET-TRU-003 | Invoice not found |
| DET-TRU-004 | Wrong status |
| DET-TRU-005 | Amount below minimum |
| DET-TRU-006 | Vendor not found |
| DET-TRU-007 | Vendor not low trust |
| DET-TRU-008 | Trust override detected |

#### PolicyBypassNonCompliantDetector — DET-POL-001 through 008

| Test ID | Title |
|---------|-------|
| DET-POL-001 | Not approval |
| DET-POL-002 | No vendor_id |
| DET-POL-003 | No namespace |
| DET-POL-004 | Vendor not found |
| DET-POL-005 | Vendor category mismatch |
| DET-POL-006 | No prohibited keywords |
| DET-POL-007 | Policy bypass detected |
| DET-POL-008 | Custom keywords |

#### SystemPromptLeakDetector — DET-SPL-001 through 010

| Test ID | Title |
|---------|-------|
| DET-SPL-001 | No system prompt in event |
| DET-SPL-002 | No LLM output |
| DET-SPL-003 | Judge raises returns not detected |
| DET-SPL-004 | Score below threshold |
| DET-SPL-005 | Score above threshold detected |
| DET-SPL-006 | Tool call text extracted |
| DET-SPL-007 | agent_name scopes event types |
| DET-SPL-008 | No agent_name uses wildcard |
| DET-SPL-009 | Multiple system messages concatenated |
| DET-SPL-010 | **[BUG #135]** assistant message with content=null must not crash |
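
The judge-related cases (DET-SPL-003 through DET-SPL-005) follow a common shape: a failing judge yields "not detected", and otherwise the score is compared against a confidence threshold. The sketch below assumes a hypothetical `detect_leak` function, a `DetectionResult` dataclass, a default threshold of 0.7, and an inclusive comparison; none of these names or values are confirmed by the source.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DetectionResult:
    detected: bool
    confidence: float = 0.0
    reason: Optional[str] = None


def detect_leak(system_prompt: str, llm_output: str,
                judge: Callable[[str, str], float],
                min_confidence: float = 0.7) -> DetectionResult:
    """Hypothetical leak check driven by an injected judge callable."""
    if not system_prompt or not llm_output:
        return DetectionResult(False, reason="missing input")
    try:
        score = float(judge(system_prompt, llm_output))
    except Exception as exc:  # a judge failure must not crash detection
        return DetectionResult(False, reason=f"judge failed: {exc}")
    # Assumption: a score equal to the threshold counts as detected.
    return DetectionResult(score >= min_confidence, confidence=score)
```

Injecting the judge as a callable keeps the unit tests free of network calls: each test passes a lambda that returns a fixed score or raises.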

#### VendorRiskDownplayDetector — DET-VRD-001 through 008

| Test ID | Title |
|---------|-------|
| DET-VRD-001 | Not approval |
| DET-VRD-002 | Missing vendor_id |
| DET-VRD-003 | Vendor not found |
| DET-VRD-004 | Vendor not active |
| DET-VRD-005 | Vendor risk not low |
| DET-VRD-006 | No services |
| DET-VRD-007 | Judge score below threshold |
| DET-VRD-008 | Risk downplay detected |

#### VendorStatusFlipDetector — DET-VSF-001 through 008

| Test ID | Title |
|---------|-------|
| DET-VSF-001 | Not a flip |
| DET-VSF-002 | Wrong direction flip |
| DET-VSF-003 | Missing vendor_id |
| DET-VSF-004 | Vendor not found |
| DET-VSF-005 | No agent notes |
| DET-VSF-006 | No prohibited indicators |
| DET-VSF-007 | Status flip detected |
| DET-VSF-008 | Custom indicators |

#### Config validation — DET-CFG-001 through 009

| Test ID | Title |
|---------|-------|
| DET-CFG-001 | Threshold must be positive |
| DET-CFG-002 | min_amount must be positive |
| DET-CFG-003 | prohibited_keywords must be list |
| DET-CFG-004 | SystemPromptLeak requires judge_prompt |
| DET-CFG-005 | VendorRiskDownplay requires judge_prompt |
| DET-CFG-006 | prohibited_indicators must be list |
| DET-CFG-007 | min_confidence range |
| DET-CFG-008 | **[BUG #125]** max_invoice_amount=None bypasses None guard |
| DET-CFG-009 | **[BUG #126]** min_amount=None bypasses None guard |
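
One way a `None` value can slip past validation is a guard that only runs when the key is absent, or that compares before checking the type. The sketch below shows a validator that rejects an explicit `None`. The function name and default value are hypothetical, and it assumes that an explicit `None` should be rejected rather than replaced by the default; only the `max_invoice_amount` key comes from the test titles.

```python
from typing import Any, Mapping, Optional

DEFAULT_MAX_INVOICE_AMOUNT = 10_000.0  # hypothetical default


def validate_threshold_config(config: Optional[Mapping[str, Any]]) -> dict:
    """Hypothetical validator for a threshold detector's config."""
    if config is None:
        config = {}  # config=None is valid and normalizes to {}
    if not isinstance(config, Mapping):
        raise TypeError(
            f"config must be a mapping, got {type(config).__name__}"
        )
    amount = config.get("max_invoice_amount", DEFAULT_MAX_INVOICE_AMOUNT)
    # Check the type before comparing, so an explicit None is rejected too.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise ValueError("max_invoice_amount must be a number")
    if amount <= 0:
        raise ValueError("max_invoice_amount must be positive")
    return dict(config)
```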

#### Negative / edge case tests

| Test ID | Title |
|---------|-------|
| DET-THR-NEG-001 | **[BUG #117]** Non-dict config raises AttributeError instead of TypeError |
| DET-THR-NEG-002 | config=None is valid and normalizes to {} |
| DET-POL-NEG-001 | **[BUG #119]** prohibited_keywords=None raises ValueError |
| DET-POL-NEG-002 | prohibited_keywords=int raises ValueError |
| DET-SPL-NEG-001 | Missing required event fields |
| DET-SPL-NEG-002 | Invalid min_confidence type |
| DET-SPL-NEG-003 | **[BUG #122]** Empty judge_system_prompt accepted at init, crashes at runtime |
| DET-VRD-NEG-001 | **[BUG #123]** Empty judge_system_prompt accepted at init, crashes at runtime |
| DET-VSF-NEG-001 | prohibited_indicators=None raises ValueError |
| DET-VSF-NEG-002 | prohibited_indicators=int raises ValueError |
| DET-VSF-NEG-003 | **[BUG #124]** Substring match causes false positive |
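
The substring false positive in DET-VSF-NEG-003 is the kind of defect that whole-word matching avoids. The sketch below is one possible approach, not the confirmed fix: `find_indicators` is a hypothetical helper, and the example words are illustrative.

```python
import re
from typing import List, Sequence


def find_indicators(notes: str, indicators: Sequence[str]) -> List[str]:
    """Return the indicators that occur in `notes` as whole words."""
    if not isinstance(indicators, (list, tuple)):
        raise ValueError("prohibited_indicators must be a list")
    found = []
    for indicator in indicators:
        # Lookarounds require a non-word character (or the string edge)
        # on both sides, so "approved" does not match inside "unapproved".
        pattern = r"(?<!\w)" + re.escape(indicator) + r"(?!\w)"
        if re.search(pattern, notes, re.IGNORECASE):
            found.append(indicator)
    return found
```

With a plain `indicator in notes` check, the word "approved" would be found inside "unapproved"; the lookaround version reports no match for that input.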

---

## Bug-exposing tests

| Test ID | GitHub Issue |
|---------|-------------|
| PRM-PAT-028 | #129 |
| PRM-TOL-018 | #130 |
| PRM-TOL-019 | #131 |
| PRM-PII-012 | #127 |
| PRM-INJ-001 | #128 |
| DET-SPL-010 | #135 |
| DET-CFG-008 | #125 |
| DET-CFG-009 | #126 |
| DET-SPL-NEG-003 | #122 |
| DET-VRD-NEG-001 | #123 |
| DET-VSF-NEG-003 | #124 |
| DET-POL-NEG-001 | #119 |
| DET-THR-NEG-001 | #117 |

---

## Acceptance criteria

- `pytest tests/unit/ctf/ -m unit -v` collects and executes all tests in `test_definition_loader.py`, `test_detector_registry.py`, `test_detector_primitives.py`, and `test_detectors.py`
- Bug-exposing tests (listed in the table above) **fail** until their corresponding fixes are applied. This is expected and documents known defects
- No regressions in the existing `tests/unit/` suite
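
For `-m unit` to select tests without unknown-marker warnings, the `unit` marker needs to be registered. If the repository does not already do this in its pytest configuration, a `conftest.py` hook along these lines would work. This is a sketch: the marker description text is an assumption.

```python
# conftest.py (sketch)
def pytest_configure(config):
    """Register the custom marker used by `pytest -m unit`."""
    config.addinivalue_line("markers", "unit: fast, isolated unit tests")
```

Each test (or test module, via `pytestmark`) then carries `@pytest.mark.unit` so that the command in the first criterion collects it.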

