feat(realtime): conversation compaction (summarize-then-drop) + OpenAI item.delete/truncate/clear #10446

Merged: mudler merged 17 commits into master from feat/realtime-conversation-compaction on Jun 22, 2026

@localai-bot

Collaborator

Summary

Adds server-side conversation compaction to the realtime (voice) API so long sessions stay cheap on CPU without forgetting earlier context. Today a realtime session either feeds the whole growing buffer to the LLM (latency death-spiral on CPU) or, with max_history_items, silently drops old turns and forgets them. This change lets the server fold aged-out turns into a rolling summary instead.

Two layers:

1. OpenAI-parity conversation events

The realtime endpoint was missing client-side history management that the OpenAI Realtime API relies on. Now implemented:

  • conversation.item.delete — was a not_implemented stub; now removes the item and emits conversation.item.deleted.
  • conversation.item.truncate — clears an assistant item's text/transcript at a content index (discard an interrupted/barge-in tail).
  • input_audio_buffer.clear — resets pending input audio.
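
The delete and truncate semantics above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Item` and `ContentPart` types are hypothetical simplifications, and the signatures of `deleteItem` and `truncateAssistantText` are assumed (only the helper names appear in the PR).

```go
package main

import "fmt"

// Hypothetical, simplified types; the real conversation structures differ.
type ContentPart struct {
	Text       string
	Transcript string
}

type Item struct {
	ID      string
	Role    string
	Content []ContentPart
}

// deleteItem returns a copy of items without the item whose ID matches,
// and reports whether such an item was found. The input slice is not mutated.
func deleteItem(items []Item, id string) ([]Item, bool) {
	for i, it := range items {
		if it.ID == id {
			// The full slice expression caps capacity at i, so append
			// allocates a new backing array instead of overwriting items.
			return append(items[:i:i], items[i+1:]...), true
		}
	}
	return items, false
}

// truncateAssistantText clears both Text and Transcript of one content part
// of an assistant item. It returns false (and changes nothing) for unknown
// IDs, non-assistant items, or an out-of-range contentIndex.
func truncateAssistantText(items []Item, id string, contentIndex int) bool {
	for i := range items {
		if items[i].ID != id {
			continue
		}
		if items[i].Role != "assistant" || contentIndex < 0 || contentIndex >= len(items[i].Content) {
			return false
		}
		items[i].Content[contentIndex].Text = ""
		items[i].Content[contentIndex].Transcript = ""
		return true
	}
	return false
}

func main() {
	items := []Item{
		{ID: "u1", Role: "user", Content: []ContentPart{{Text: "hello"}}},
		{ID: "a1", Role: "assistant", Content: []ContentPart{{Transcript: "hi there, how can I"}}},
	}
	fmt.Println(truncateAssistantText(items, "a1", 0), items[1].Content[0].Transcript == "")
	items, ok := deleteItem(items, "u1")
	fmt.Println(ok, len(items))
}
```

In this sketch the handler would emit conversation.item.deleted only when `deleteItem` reports true; how the real handler reports a missing item is not stated in the PR.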

2. Summarize-then-drop compactor

  • A rolling Conversation.Memory summary, kept out of Items so trimRealtimeItems can't drop it, injected into the prompt right after the instructions.
  • Filled by an async, post-turn compactor using a snapshot → summarize → commit pattern that never holds conv.Lock across the summarizer LLM call. Commit re-validates the head (prefixMatches) so a concurrent item.delete can't cause lost/misdropped data. Function-call/output pairs are never split across the boundary.
  • A failed/empty summary leaves the buffer untouched — items are never evicted without a summary replacing them.
  • This resolves the long-standing in-code TODO at the Conversation creation site.
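
As a rough sketch of the snapshot → summarize → commit pattern described above (an illustration under assumptions, not the PR's implementation): `Conversation`, `Item`, the `mu` field, the `errStale` value, and the signature of `compact` are all hypothetical here; only the names `compact`, `prefixMatches`, and `Memory` come from the PR. The summarizer is passed in as a plain function so the example needs no model.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// Hypothetical, simplified types for illustration only.
type Item struct{ ID, Text string }

type Conversation struct {
	mu     sync.Mutex
	Items  []Item
	Memory string // rolling summary, kept outside Items
}

var errStale = errors.New("head changed during summarization")

// prefixMatches reports whether live still begins with exactly the snapshot items.
func prefixMatches(live, snap []Item) bool {
	if len(live) < len(snap) {
		return false
	}
	for i := range snap {
		if live[i].ID != snap[i].ID {
			return false
		}
	}
	return true
}

// compact folds the oldest n items into Memory. The lock is held only while
// copying the head and while committing, never during the summarize call.
func compact(c *Conversation, n int, summarize func(prev string, items []Item) (string, error)) error {
	// 1. Snapshot under the lock.
	c.mu.Lock()
	if n <= 0 || n > len(c.Items) {
		c.mu.Unlock()
		return nil
	}
	snap := append([]Item(nil), c.Items[:n]...)
	prev := c.Memory
	c.mu.Unlock()

	// 2. Summarize without the lock (this is the slow LLM call).
	summary, err := summarize(prev, snap)
	if err != nil {
		return err // buffer untouched
	}
	if strings.TrimSpace(summary) == "" {
		return errors.New("empty summary") // buffer untouched
	}

	// 3. Commit: re-validate the head, then swap in the summary and drop the items.
	c.mu.Lock()
	defer c.mu.Unlock()
	if !prefixMatches(c.Items, snap) {
		return errStale // e.g. a concurrent item.delete changed the head
	}
	c.Memory = summary
	c.Items = append([]Item(nil), c.Items[len(snap):]...)
	return nil
}

func main() {
	c := &Conversation{Items: []Item{{ID: "1", Text: "a"}, {ID: "2", Text: "b"}, {ID: "3", Text: "c"}}}
	err := compact(c, 2, func(prev string, items []Item) (string, error) {
		return fmt.Sprintf("%d earlier items summarized", len(items)), nil
	})
	fmt.Println(err, c.Memory, len(c.Items))
}
```

Aborting on a stale head, rather than trying to merge, keeps the invariant simple in this sketch: items leave the buffer only in the same critical section that stores the summary covering them.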

Config (two-number model)

pipeline:
  max_history_items: 6        # live window — recent turns kept verbatim
  compaction:
    enabled: true
    trigger_items: 12         # high-water mark; summarize overflow back down to max_history_items
    summary_model: ""         # optional small/cheap model for the summary (CPU); default = pipeline LLM
    max_summary_tokens: 512

max_history_items is the live window; compaction.trigger_items is the high-water mark and must exceed max_history_items. A summary call runs roughly every (trigger_items - max_history_items) turns, i.e. about every 6 with the values above. The summarizer model is resolved lazily, inside the compaction goroutine (off the response path). With compaction absent or disabled, behavior is byte-for-byte unchanged.
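
To make the two-number model concrete, here is a hedged sketch of how a cut point might be chosen. The `Item` type, the signature of `compactionCut`, the `splitsPair` helper, the "at or above trigger" comparison, and the choice to move the cut earlier (keeping a function-call pair in the live window) are all assumptions for illustration; the PR only states that pairs are never split across the boundary.

```go
package main

import "fmt"

// Hypothetical item shape: Type is "message", "function_call",
// or "function_call_output"; CallID links a call to its output.
type Item struct {
	Type   string
	CallID string
}

// splitsPair reports whether cutting at index cut would leave a
// function_call in items[:cut] while its output sits in items[cut:].
func splitsPair(items []Item, cut int) bool {
	calls := map[string]bool{}
	for _, it := range items[:cut] {
		if it.Type == "function_call" {
			calls[it.CallID] = true
		}
	}
	for _, it := range items[cut:] {
		if it.Type == "function_call_output" && calls[it.CallID] {
			return true
		}
	}
	return false
}

// compactionCut returns how many of the oldest items to summarize.
// It returns 0 when the config is invalid (trigger must exceed keep)
// or the buffer has not yet reached the high-water mark.
func compactionCut(items []Item, keep, trigger int) int {
	if keep < 0 || trigger <= keep || len(items) < trigger {
		return 0
	}
	cut := len(items) - keep
	// Assumption: when the boundary would split a call/output pair,
	// move it earlier so the whole pair stays in the live window.
	for cut > 0 && splitsPair(items, cut) {
		cut--
	}
	return cut
}

func main() {
	items := make([]Item, 12)
	for i := range items {
		items[i] = Item{Type: "message"}
	}
	// max_history_items: 6, trigger_items: 12, as in the config above.
	fmt.Println(compactionCut(items, 6, 12))
}
```

With 12 plain message items and the values from the config, this sketch summarizes the 6 oldest and keeps the 6 newest verbatim, which matches the "roughly every (trigger_items - max_history_items)" cadence.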

Testing

  • New Ginkgo/Gomega specs for every pure helper (resolveCompaction, itemID, deleteItem, truncateAssistantText, clearInputAudio, compactionCut, withMemory, renderItemsTranscript, buildSummaryMessages, prefixMatches, compact, summarizerModel), incl. the commit/abort and summarizer-error paths.
  • go test -race ./core/http/endpoints/openai/... clean (the compactor spawns a goroutine).
  • make lint reports 0 issues on the feature files.
  • Folded into the existing TestOpenAI suite (single RunSpecs).

Notes / follow-ups (out of scope)

  • max_summary_tokens is advisory (fed to the prompt) in this PR; a hard Predict-level cap is a follow-up.
  • A small, fast summary_model (intended for CPU) is loaded via the realtime pipeline path and falls back to the pipeline LLM if it can't be constructed.
  • Docs: new "Conversation compaction" section in docs/content/features/openai-realtime.md.

🤖 Generated with Claude Code

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

mudler added 17 commits June 22, 2026 17:03
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…ai suite

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Add a handler for the input_audio_buffer.clear client event that discards
a partially-captured utterance (raw PCM + buffered Opus frames) via a
unit-tested clearInputAudio helper, then acks with input_audio_buffer.cleared.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Clears both .Text and .Transcript of the assistant content part at
contentIndex so barge-in truncation also works for audio turns whose
spoken words live in .Transcript.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…avoids panic)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…y, off-path)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…ary stripping

Replace the bespoke <think> regex in the compactor with the shared
pkg/reasoning extractor (via spokenReasoningConfig), matching the rest of
the realtime path and covering all reasoning tag families, not just <think>.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
TestAllFieldsHaveRegistryEntries requires every ModelConfig field to have
a UI/meta registry entry; add the four pipeline.compaction.* leaves so they
render with proper labels/descriptions instead of the reflection fallback.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler force-pushed the feat/realtime-conversation-compaction branch from aee1a53 to f6edc05 on June 22, 2026 17:03
mudler merged commit fdf475e into master on Jun 22, 2026
60 checks passed
mudler deleted the feat/realtime-conversation-compaction branch on June 22, 2026 19:28