auto: cycle 0 (prompt-injection) — add ii_css_font_injection pattern …#64

Merged
killertcell428 merged 1 commit into master from claude/eloquent-davinci-vQ7Y3 on May 18, 2026

@killertcell428
Owner

…(arxiv:2505.16957)

Adds detection for CSS @font-face rules with remote HTTP(S) font sources in retrieved or external web content. Malicious fonts remap the glyphs shown for ASCII characters, so human readers see innocuous text while the LLM tokenises the underlying adversarial injection instructions. The attack was demonstrated against MCP-enabled agents in two scenarios (content relay and data exfiltration via tool calls), with safety-filter bypass (arxiv:2505.16957, May 2026).

Files changed:

  • aigis/filters/patterns.py: add ii_css_font_injection to INDIRECT_INJECTION_PATTERNS
  • tests/test_prompt_injection_cycle0_pass4.py: 16 new tests (all pass)
  • auto-improvement/: research, changes, pending, INDEX, ROTATION updates
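
To make the detection idea concrete, here is a minimal sketch of a rule that flags an @font-face block whose `src` points at a remote HTTP(S) URL. The regex and the names `CSS_REMOTE_FONT_FACE` and `has_remote_font_face` are assumptions for illustration only; the actual `ii_css_font_injection` entry in `aigis/filters/patterns.py` is not shown in this description and may differ.

```python
import re

# Hypothetical stand-in for the ii_css_font_injection rule. The real regex in
# aigis/filters/patterns.py is not reproduced in this PR description.
CSS_REMOTE_FONT_FACE = re.compile(
    # @font-face { ... src: ... url( "http(s)://   (all inside one rule block)
    r"@font-face\s*\{[^}]*?\bsrc\s*:[^}]*?url\(\s*['\"]?https?://",
    re.IGNORECASE,
)


def has_remote_font_face(text: str) -> bool:
    """Return True if text has an @font-face rule with a remote HTTP(S) src."""
    return CSS_REMOTE_FONT_FACE.search(text) is not None


if __name__ == "__main__":
    sample = (
        '<style>@font-face { font-family: "Remap"; '
        'src: url("https://fonts.example/remap.woff2"); }</style>'
    )
    print(has_remote_font_face(sample))
```

Restricting the match to a single `{ ... }` block (via `[^}]`) keeps an unrelated remote `url(...)` elsewhere in the stylesheet from triggering the rule, and local or relative font sources are left alone.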

Summary

Closes #

Changes

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • New detection pattern
  • Breaking change (fix or feature that would cause existing behaviour to change)
  • Documentation update
  • Refactor / performance improvement

Testing

  • pytest tests/ -v passes locally
  • New tests added for the change
  • Existing tests updated if needed (explain why)

For new detection patterns, confirm both:

  • Positive test — the pattern correctly detects a malicious input
  • Negative test — the pattern does NOT fire on legitimate input
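
The positive/negative pair above could be sketched as two table-driven tests. Everything here is hypothetical: `detect` is a simplified stand-in detector, and the sample strings are not taken from `tests/test_prompt_injection_cycle0_pass4.py`, whose contents are not shown in this PR.

```python
import re

# Simplified, hypothetical detector used only to make the tests self-contained.
_REMOTE_FONT = re.compile(
    r"@font-face\s*\{[^}]*url\(\s*['\"]?https?://", re.IGNORECASE
)


def detect(text: str) -> bool:
    """Return True when an @font-face block loads a font over HTTP(S)."""
    return _REMOTE_FONT.search(text) is not None


# Inputs the pattern is expected to flag.
MALICIOUS = [
    '@font-face { font-family: "R"; src: url("https://evil.example/remap.woff2") format("woff2"); }',
    "@FONT-FACE{src:url(http://evil.example/r.ttf)}",
]

# Legitimate inputs the pattern must not flag.
BENIGN = [
    '@font-face { font-family: "Inter"; src: url("/static/inter.woff2"); }',
    '@font-face { font-family: "Sys"; src: local("Helvetica"); }',
    '@font-face { src: local("A"); } .x { background: url(https://cdn.example/bg.png); }',
    "a { color: red; } /* see https://example.com */",
]


def test_positive_cases() -> None:
    for sample in MALICIOUS:
        assert detect(sample), sample


def test_negative_cases() -> None:
    for sample in BENIGN:
        assert not detect(sample), sample
```

Both functions follow pytest's `test_` naming convention, so they would be collected by `pytest tests/ -v` without any extra setup.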

Checklist

  • Code follows the style of the project (ruff check passes)
  • Type annotations are correct (mypy aigis/ passes)
  • Public API changes are reflected in docs/api-reference.md
  • CHANGELOG.md updated under [Unreleased]
  • I have read CONTRIBUTING.md

Screenshots / output

Signed-off-by: killertcell428 <killertcell428@gmail.com>
killertcell428 force-pushed the claude/eloquent-davinci-vQ7Y3 branch from eebe87e to 5964d68 on May 18, 2026 16:05
killertcell428 merged commit a23c417 into master on May 18, 2026
14 checks passed
killertcell428 deleted the claude/eloquent-davinci-vQ7Y3 branch on May 18, 2026 16:48