Skip to content

fix: sanitize memory content before injecting into system prompt#5662

Open
NIK-TIGER-BILL wants to merge 2 commits intocrewAIInc:mainfrom
NIK-TIGER-BILL:fix/memory-sanitization
Open

fix: sanitize memory content before injecting into system prompt#5662
NIK-TIGER-BILL wants to merge 2 commits intocrewAIInc:mainfrom
NIK-TIGER-BILL:fix/memory-sanitization

Conversation

@NIK-TIGER-BILL
Copy link
Copy Markdown
Contributor

Fixes #5057

Summary

LiteAgent._inject_memory() was concatenating raw memory content directly into the system prompt without any sanitization. Since memories can originate from tool outputs or previous interactions, a poisoned tool response could persist as a memory entry and later be injected as instructions.

Changes

  • Introduced a _sanitize_memory() helper in _inject_memory() that:
    • Removes backticks, braces, and other characters commonly used to delimit instructions in LLM prompts.
    • Strips null bytes and truncates entries to 512 characters to prevent context-overflow tricks.
  • Added a regression test (test_lite_agent_memory_sanitization_injects_no_backticks_or_braces) that verifies poisoned memory records do not appear verbatim in the system prompt.

AI usage disclosure

I used AI assistance for:

  • Code generation
  • Test generation
  • Research and understanding

Memory records are now stripped of backticks, braces, and truncated to\n512 chars before being concatenated into the system prompt. This reduces\nthe surface for indirect prompt injection via poisoned memory entries.\n\nFixes crewAIInc#5057

Signed-off-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>
@caoyc
Copy link
Copy Markdown

caoyc commented May 4, 2026

Thanks for tackling this — memory injection is a real attack surface that most agent frameworks overlook.

A few architectural concerns with the current approach:

1. Truncation at 512 chars silently destroys context

Many legitimate memory entries exceed 512 characters (e.g., conversation summaries, tool outputs, structured observations). Hard truncation would break these silently, which is worse than no sanitization because the agent would operate on incomplete information without any signal that data was lost.

Suggestion: Use a token-based limit aligned with your context window budget, and when an entry exceeds it, either (a) skip it with a warning log, or (b) use the LLM to compress it rather than truncate mid-sentence.

2. Stripping backticks and braces breaks legitimate content

Memory entries that contain code snippets, JSON configurations, or structured data would be corrupted by removing these characters. In our memory system we found that the right approach is prompt-level isolation, not character-level sanitization:

  • Wrap memory injection in unambiguous boundary markers (e.g., xml-style tags)
  • Escape special characters rather than removing them
  • Never interpolate memory content as part of the instruction — always treat it as opaque data within a delimited section

3. The real defense is prompt boundary discipline

The root issue is that memory content is being concatenated into the system prompt as if it were trusted instruction text. The fix should be at the prompt architecture level:

This way, even if a memory entry contains injection attempts, the LLM knows to treat the entire section as data, not instructions. This is more robust than any character-level filtering.

Happy to elaborate on any of these points. We have been running a production memory system and have dealt with similar injection concerns.

@NIK-TIGER-BILL
Copy link
Copy Markdown
Contributor Author

@caoyc Thank you for the thorough and thoughtful review — these are excellent points from real production experience.

Re: 1. Truncation at 512 chars
You are absolutely right that hard truncation silently destroys context. The 512 limit was intended as a conservative stopgap, not a long-term solution. I agree that a token-based budget with warning logs (or compression) is the correct approach.

Re: 2. Stripping backticks/braces
Valid concern — legitimate code snippets and JSON get corrupted. I will update the PR to escape rather than strip, preserving data integrity.

Re: 3. Prompt boundary discipline
This is the strongest point. Character-level filtering is a band-aid; the real fix is architectural — wrapping memory in unambiguous delimiters so the LLM treats it as opaque data.

Next steps:
I will revise this PR to:

  1. Remove hard truncation and replace with escape-based sanitization
  2. Add XML-style boundary markers around memory injection
  3. Keep the guard-rail but move the defense closer to the prompt architecture

If you prefer, I am also happy to close this and open a follow-up PR with the boundary-marker approach as the primary fix rather than a secondary layer. Let me know what the maintainers prefer.

…oundaries

Per review feedback from @caoyc:
- Remove hard 512-char truncation that silently destroyed context.
- Replace character stripping (backticks/braces) with XML escaping.
- Wrap memory injection in <memories>/<memory> boundary markers so
  the LLM treats the block as opaque data rather than instructions.
- Update regression test to verify boundary markers and escaping.

Signed-off-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>
@NIK-TIGER-BILL
Copy link
Copy Markdown
Contributor Author

@caoyc Thanks again for the excellent review — the architectural feedback was spot on.

I have pushed a revised commit that:

  1. Removed the 512-char hard truncation — no more silent data loss.
  2. Replaced character stripping with XML escaping (&, <, >) so backticks, braces, and code snippets remain intact.
  3. Wrapped memory injection in <memories>/<memory> boundary markers — the LLM now sees the block as opaque data rather than part of the instruction text.

This aligns with your recommendation to fix the issue at the prompt-architecture level rather than with character-level filtering. Let me know if you see anything else!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Security] Memory content injected into system prompt without sanitization enables indirect prompt injection

2 participants