Description:
Problem:
The conversation history is limited by a fixed maxlen (e.g., 50 messages). Once this limit is reached, the oldest interactions are permanently lost, leading to a loss of long-term context.
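The loss described above can be reproduced in a few lines. This is a minimal sketch that assumes the history is a `collections.deque` created with `maxlen`, as the Implementation Details suggest. The small `maxlen=3` and the message contents are illustrative only.

```python
from collections import deque

# A bounded history: once full, each append discards the item at the other end.
history = deque(maxlen=3)

for i in range(1, 5):
    history.append({"role": "user", "content": f"message {i}"})

# "message 1" was discarded silently when "message 4" arrived.
remaining = [m["content"] for m in history]
print(remaining)  # ['message 2', 'message 3', 'message 4']
```

No exception or warning is raised when the oldest entry is dropped, which is why the lost context goes unnoticed.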
Proposed Solution:
Implement an advanced context compression mechanism. When the history buffer is full, instead of just dropping the oldest conversation turn, the system will use an internal LLM call to summarize that turn and replace it with the condensed summary.
Implementation Details:
- In history_manager.py, modify the add_message() method.
- Before adding a new message, check if len(self.history) >= self.history.maxlen.
- If the buffer is full:
  a. Pop the two oldest messages (the first user/assistant pair) from the deque.
  b. Make a separate, non-streamed, internal call to the LLM via llm_interface.py.
  c. The prompt for this call will be specialized for summarization, e.g., "Condense the key information from this user/assistant turn into one sentence: USER: [old user message] ASSISTANT: [old assistant message]".
  d. Take the LLM's summarized response.
  e. Prepend a new "system" message to the start of the history deque containing the summary, e.g., {"role": "system", "content": "[SUMMARIZED CONTEXT]: [summary text]"}.
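The steps above could be sketched roughly as follows. This is not the project's actual code: the `HistoryManager` class, the `summarize` callable (a stand-in for the internal call through llm_interface.py, whose real signature is not given here), and the `SUMMARY_PREFIX` constant are all hypothetical names. The sketch also makes two assumptions beyond the listed steps: summaries already sitting at the front of the deque are set aside so that a later compression does not mistake one for a user message, and they are kept in oldest-first order. It further assumes the history strictly alternates user and assistant messages.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Message = Dict[str, str]
SUMMARY_PREFIX = "[SUMMARIZED CONTEXT]: "  # hypothetical constant


def is_summary(message: Message) -> bool:
    """True for system messages produced by an earlier compression."""
    return message["role"] == "system" and message["content"].startswith(SUMMARY_PREFIX)


class HistoryManager:
    """Hypothetical stand-in for the class in history_manager.py."""

    def __init__(self, summarize: Callable[[str], str], maxlen: int = 50) -> None:
        # `summarize` stands in for the non-streamed call made through
        # llm_interface.py; it takes a prompt and returns the summary text.
        self.history: Deque[Message] = deque(maxlen=maxlen)
        self._summarize = summarize

    def add_message(self, role: str, content: str) -> None:
        # Compress before appending so the deque never drops a turn silently.
        if len(self.history) >= self.history.maxlen:
            self._compress_oldest_turn()
        self.history.append({"role": role, "content": content})

    def _compress_oldest_turn(self) -> None:
        # Assumption: set aside summaries already at the front so they are
        # not treated as part of the oldest user/assistant pair.
        summaries: List[Message] = []
        while self.history and is_summary(self.history[0]):
            summaries.append(self.history.popleft())

        if len(self.history) >= 2:
            # Step a: pop the oldest user/assistant pair.
            old_user = self.history.popleft()
            old_assistant = self.history.popleft()
            # Steps b-d: build the summarization prompt and call the LLM.
            prompt = (
                "Condense the key information from this user/assistant turn "
                f"into one sentence: USER: {old_user['content']} "
                f"ASSISTANT: {old_assistant['content']}"
            )
            summary = self._summarize(prompt)
            # Step e: the summary becomes a system message.
            summaries.append({"role": "system", "content": SUMMARY_PREFIX + summary})
        else:
            # No complete turn left to summarize: drop the oldest summary
            # instead, so that one slot is freed for the new message.
            summaries = summaries[1:]

        # extendleft() inserts items one at a time at the left end, so feed
        # them in reverse to keep the summaries oldest-first at the front.
        self.history.extendleft(reversed(summaries))


if __name__ == "__main__":
    manager = HistoryManager(summarize=lambda prompt: "placeholder summary", maxlen=4)
    for role, text in [("user", "u1"), ("assistant", "a1"), ("user", "u2"),
                       ("assistant", "a2"), ("user", "u3")]:
        manager.add_message(role, text)
    for message in manager.history:
        print(message)
```

Each compression removes two messages and adds one summary, so over a long session the buffer gradually fills with summaries. Whether those should eventually be merged or dropped is left open here; the fallback branch above simply discards the oldest summary.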
Acceptance Criteria:
- When the history reaches maxlen, the oldest turn is replaced by a single summarized system message at the beginning of the context.