fix: sanitize memory content before injecting into system prompt#5662
fix: sanitize memory content before injecting into system prompt#5662NIK-TIGER-BILL wants to merge 2 commits intocrewAIInc:mainfrom
Conversation
Memory records are now stripped of backticks, braces, and truncated to\n512 chars before being concatenated into the system prompt. This reduces\nthe surface for indirect prompt injection via poisoned memory entries.\n\nFixes crewAIInc#5057 Signed-off-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>
|
Thanks for tackling this — memory injection is a real attack surface that most agent frameworks overlook. A few architectural concerns with the current approach: 1. Truncation at 512 chars silently destroys context Many legitimate memory entries exceed 512 characters (e.g., conversation summaries, tool outputs, structured observations). Hard truncation would break these silently, which is worse than no sanitization because the agent would operate on incomplete information without any signal that data was lost. Suggestion: Use a token-based limit aligned with your context window budget, and when an entry exceeds it, either (a) skip it with a warning log, or (b) use the LLM to compress it rather than truncate mid-sentence. 2. Stripping backticks and braces breaks legitimate content Memory entries that contain code snippets, JSON configurations, or structured data would be corrupted by removing these characters. In our memory system we found that the right approach is prompt-level isolation, not character-level sanitization:
3. The real defense is prompt boundary discipline The root issue is that memory content is being concatenated into the system prompt as if it were trusted instruction text. The fix should be at the prompt architecture level: This way, even if a memory entry contains injection attempts, the LLM knows to treat the entire section as data, not instructions. This is more robust than any character-level filtering. Happy to elaborate on any of these points. We have been running a production memory system and have dealt with similar injection concerns. |
|
@caoyc Thank you for the thorough and thoughtful review — these are excellent points from real production experience. Re: 1. Truncation at 512 chars Re: 2. Stripping backticks/braces Re: 3. Prompt boundary discipline Next steps:
If you prefer, I am also happy to close this and open a follow-up PR with the boundary-marker approach as the primary fix rather than a secondary layer. Let me know what the maintainers prefer. |
…oundaries Per review feedback from @caoyc: - Remove hard 512-char truncation that silently destroyed context. - Replace character stripping (backticks/braces) with XML escaping. - Wrap memory injection in <memories>/<memory> boundary markers so the LLM treats the block as opaque data rather than instructions. - Update regression test to verify boundary markers and escaping. Signed-off-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>
|
@caoyc Thanks again for the excellent review — the architectural feedback was spot on. I have pushed a revised commit that:
This aligns with your recommendation to fix the issue at the prompt-architecture level rather than with character-level filtering. Let me know if you see anything else! |
Fixes #5057
Summary
LiteAgent._inject_memory()was concatenating raw memory content directly into the system prompt without any sanitization. Since memories can originate from tool outputs or previous interactions, a poisoned tool response could persist as a memory entry and later be injected as instructions.Changes
_sanitize_memory()helper in_inject_memory()that:test_lite_agent_memory_sanitization_injects_no_backticks_or_braces) that verifies poisoned memory records do not appear verbatim in the system prompt.AI usage disclosure
I used AI assistance for: