[Feature] [Advanced] Implement Context Compression for Long-Term Memory #3

@JStaRFilms

Description

Problem:
The conversation history is limited by a fixed maxlen (e.g., 50 messages). Once this limit is reached, the oldest interactions are permanently lost, leading to a loss of long-term context.
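A minimal illustration of the problem, assuming the history is a `collections.deque` created with `maxlen` (the steps below refer to a deque; the small `maxlen` of 4 is chosen only to keep the example short):

```python
from collections import deque

# Small maxlen for illustration; the issue mentions e.g. 50 messages.
history = deque(maxlen=4)

for i in range(6):
    history.append({"role": "user", "content": f"message {i}"})

# Once the deque is full, each append silently discards the oldest entry.
remaining = [m["content"] for m in history]
print(remaining)  # ['message 2', 'message 3', 'message 4', 'message 5']
```

Messages 0 and 1 are gone without any trace, which is the loss of long-term context described above.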

Proposed Solution:
Implement an advanced context compression mechanism. When the history buffer is full, instead of just dropping the oldest conversation turn, the system will use an internal LLM call to summarize that turn and replace it with the condensed summary.

Implementation Details:

  1. In history_manager.py, modify the add_message() method.
  2. Before adding a new message, check if len(self.history) >= self.history.maxlen.
  3. If the buffer is full:
    a. Pop the two oldest messages (the first user/assistant pair) from the deque.
    b. Make a separate, non-streamed, internal call to the LLM via llm_interface.py.
    c. The prompt for this call will be specialized for summarization, e.g., "Condense the key information from this user/assistant turn into one sentence: USER: [old user message] ASSISTANT: [old assistant message]".
    d. Take the LLM's summarized response.
    e. Prepend a new "system" message to the start of the history deque containing the summary, e.g., {"role": "system", "content": "[SUMMARIZED CONTEXT]: [summary text]"}.
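
The steps above could be sketched roughly as follows. This is a literal reading of steps 1 to 3e, not the project's actual code: the class name `HistoryManager`, the helper `build_summary_prompt`, and the injected `summarize` callable (a stand-in for the non-streamed call in llm_interface.py, whose real signature is not given here) are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable

SUMMARY_PREFIX = "[SUMMARIZED CONTEXT]: "


def build_summary_prompt(user_text: str, assistant_text: str) -> str:
    """Build the summarization prompt from step 3c (hypothetical helper)."""
    return (
        "Condense the key information from this user/assistant turn "
        f"into one sentence: USER: {user_text} ASSISTANT: {assistant_text}"
    )


class HistoryManager:
    """Hypothetical shape of the class in history_manager.py."""

    def __init__(self, summarize: Callable[[str], str], maxlen: int = 50):
        self.history = deque(maxlen=maxlen)
        # Assumption: summarize(prompt) wraps a non-streamed LLM call
        # and returns the model's text response.
        self._summarize = summarize

    def add_message(self, role: str, content: str) -> None:
        # Step 2: check for a full buffer before appending.
        if len(self.history) >= self.history.maxlen:
            self._compress_oldest_turn()
        self.history.append({"role": role, "content": content})

    def _compress_oldest_turn(self) -> None:
        # Step 3a: pop the two oldest messages (assumed to be a
        # user/assistant pair, as in the issue text).
        old_user = self.history.popleft()
        old_assistant = self.history.popleft()
        # Steps 3b-3d: ask the LLM for a one-sentence summary.
        prompt = build_summary_prompt(old_user["content"], old_assistant["content"])
        summary = self._summarize(prompt)
        # Step 3e: prepend the summary as a system message.
        self.history.appendleft({"role": "system", "content": SUMMARY_PREFIX + summary})
```

Passing the summarizer in as a callable keeps the sketch testable without a live model. Note that after the first compression the oldest entry in the deque is itself a summary message, so the next compression would pop that summary together with one real message. The steps above do not say whether earlier summaries should be skipped, merged, or re-summarized, and this sketch does not decide that either.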

Acceptance Criteria:

  • When the conversation history reaches its maxlen and a new message arrives, the oldest turn is replaced by a single summarized system message at the beginning of the context.
  • The effective memory of the conversation is extended, allowing the AI to reference topics from much earlier in the chat.
  • The compression process is seamless and does not noticeably delay the user's current query.
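
One possible way to approach the last criterion, which the issue does not prescribe, is to run the summarization call on a worker thread so the user's current query is not blocked while it completes. The sketch below only demonstrates that submitting the work returns immediately; `slow_summarize` is a hypothetical stand-in that sleeps instead of calling a real LLM, and merging the result back into the history (with appropriate locking) is left out.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_summarize(prompt: str) -> str:
    """Hypothetical stand-in for the internal LLM call; sleeps to mimic latency."""
    time.sleep(0.2)
    return "summary of: " + prompt


executor = ThreadPoolExecutor(max_workers=1)

start = time.perf_counter()
# submit() schedules the call on a worker thread and returns a Future at once.
future = executor.submit(slow_summarize, "USER: hi ASSISTANT: hello")
submit_elapsed = time.perf_counter() - start

# The caller could now handle the user's current query; the summary is
# collected later, when it is actually needed.
summary = future.result()
executor.shutdown()
```

Here `submit_elapsed` is far smaller than the simulated 0.2 s summarization time, since only the scheduling happens on the calling thread.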
