feat(realtime): conversation compaction (summarize-then-drop) + OpenAI item.delete/truncate/clear by localai-bot · Pull Request #10446 · mudler/LocalAI

localai-bot · 2026-06-22T14:35:13Z

Summary

Adds server-side conversation compaction to the realtime (voice) API so long sessions stay cheap on CPU without forgetting earlier context. Today a realtime session either feeds the whole growing buffer to the LLM (latency death-spiral on CPU) or, with max_history_items, silently drops old turns and forgets them. This change lets the server fold aged-out turns into a rolling summary instead.

Two layers:

1. OpenAI-parity conversation events

The realtime endpoint was missing client-side history management that the OpenAI Realtime API relies on. Now implemented:

conversation.item.delete — was a not_implemented stub; now removes the item and emits conversation.item.deleted.
conversation.item.truncate — clears an assistant item's text/transcript at a content index (discard an interrupted/barge-in tail).
input_audio_buffer.clear — resets pending input audio.
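As a rough illustration of the first two events, the sketch below shows how pure helpers in the spirit of deleteItem and truncateAssistantText could behave. The Item and ContentPart shapes and both signatures are assumptions made for this example; the PR description does not show the real types.

```go
package main

// ContentPart and Item are hypothetical, simplified shapes used only for
// this sketch; the actual LocalAI realtime types may differ.
type ContentPart struct {
	Text       string
	Transcript string
}

type Item struct {
	ID      string
	Role    string
	Content []ContentPart
}

// deleteItem returns a copy of items without the item whose ID matches,
// and reports whether such an item was found. A handler could use the
// boolean to decide between emitting conversation.item.deleted and an error.
func deleteItem(items []Item, id string) ([]Item, bool) {
	for i, it := range items {
		if it.ID == id {
			out := make([]Item, 0, len(items)-1)
			out = append(out, items[:i]...)
			out = append(out, items[i+1:]...)
			return out, true
		}
	}
	return items, false
}

// truncateAssistantText clears both Text and Transcript of the content part
// at contentIndex, but only for assistant items. It reports whether anything
// was cleared. Out-of-range indexes and non-assistant items are rejected.
func truncateAssistantText(items []Item, id string, contentIndex int) bool {
	for i := range items {
		it := &items[i]
		if it.ID != id {
			continue
		}
		if it.Role != "assistant" || contentIndex < 0 || contentIndex >= len(it.Content) {
			return false
		}
		it.Content[contentIndex].Text = ""
		it.Content[contentIndex].Transcript = ""
		return true
	}
	return false
}
```

Clearing Transcript as well as Text matters for audio turns, where the spoken words live in the transcript rather than in the text field.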

2. Summarize-then-drop compactor

A rolling Conversation.Memory summary is kept out of Items so trimRealtimeItems can't drop it, and is injected into the prompt right after the instructions.
The summary is filled by an async, post-turn compactor using a snapshot → summarize → commit pattern that never holds conv.Lock across the summarizer LLM call. The commit step re-validates the head (prefixMatches) so a concurrent item.delete can't cause items to be lost or wrongly dropped. Function-call/output pairs are never split across the boundary.
A failed/empty summary leaves the buffer untouched — items are never evicted without a summary replacing them.
This resolves the long-standing in-code TODO at the Conversation creation site.
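The snapshot → summarize → commit pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the Item and Conversation shapes, the signatures, the summarize callback, and the exact pair-safety rule in compactionCut (here the cut moves toward the head until it no longer separates a call from its output) are all hypothetical.

```go
package main

import (
	"errors"
	"strings"
	"sync"
)

// Item and Conversation are simplified, hypothetical shapes for this sketch.
type Item struct {
	ID   string
	Type string // "message", "function_call", or "function_call_output"
}

type Conversation struct {
	Lock   sync.Mutex
	Items  []Item
	Memory string
}

// compactionCut returns how many head items to fold into the summary so that
// about keep items stay live. It returns 0 for keep <= 0 (no cap) or when
// nothing overflows. The cut is moved earlier while it would split a
// function call from its output (an assumed pair-safety rule).
func compactionCut(items []Item, keep int) int {
	if keep <= 0 || len(items) <= keep {
		return 0
	}
	cut := len(items) - keep
	for cut > 0 && (items[cut].Type == "function_call_output" || items[cut-1].Type == "function_call") {
		cut--
	}
	return cut
}

// prefixMatches reports whether items still starts with the snapshot, by ID.
func prefixMatches(items, snapshot []Item) bool {
	if len(items) < len(snapshot) {
		return false
	}
	for i := range snapshot {
		if items[i].ID != snapshot[i].ID {
			return false
		}
	}
	return true
}

// compact snapshots the overflow under the lock, summarizes without the
// lock, then commits only if the head is unchanged and the summary is
// non-empty. On any failure the buffer is left untouched.
func compact(conv *Conversation, keep int, summarize func(memory string, items []Item) (string, error)) error {
	conv.Lock.Lock()
	cut := compactionCut(conv.Items, keep)
	if cut == 0 {
		conv.Lock.Unlock()
		return nil
	}
	snapshot := append([]Item(nil), conv.Items[:cut]...)
	memory := conv.Memory
	conv.Lock.Unlock()

	// Slow LLM call: the lock is deliberately not held here.
	summary, err := summarize(memory, snapshot)
	if err != nil {
		return err
	}
	if strings.TrimSpace(summary) == "" {
		return errors.New("empty summary; nothing evicted")
	}

	conv.Lock.Lock()
	defer conv.Lock.Unlock()
	if !prefixMatches(conv.Items, snapshot) {
		return errors.New("head changed during summarization; commit aborted")
	}
	conv.Memory = summary
	conv.Items = append([]Item(nil), conv.Items[cut:]...)
	return nil
}
```

Because the lock is released before the summarizer runs, a client event such as item.delete can modify the buffer in the meantime; the prefix check at commit time turns that race into a skipped compaction rather than a wrong eviction.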

Config (two-number model)

pipeline:
  max_history_items: 6        # live window — recent turns kept verbatim
  compaction:
    enabled: true
    trigger_items: 12         # high-water mark; summarize overflow back down to max_history_items
    summary_model: ""         # optional small/cheap model for the summary (CPU); default = pipeline LLM
    max_summary_tokens: 512

max_history_items is the live window; compaction.trigger_items is the high-water mark and must exceed max_history_items. A summary call runs roughly every (trigger_items - max_history_items) turns. The summarizer model is resolved lazily, inside the compaction goroutine (off the response path). With compaction absent or disabled, behavior is byte-for-byte unchanged.
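To make the two-number model concrete, here is a small sketch of how the settings could be resolved and checked. The struct fields mirror the YAML keys above, but the type names, the resolved shape, and the choice to return an error on an invalid pair are assumptions for illustration; the real resolveCompaction may handle invalid values differently.

```go
package main

import "fmt"

// CompactionConfig mirrors the pipeline.compaction keys; the Go field names
// are assumed for this sketch.
type CompactionConfig struct {
	Enabled          bool
	TriggerItems     int
	SummaryModel     string
	MaxSummaryTokens int
}

// resolved is a hypothetical result type: Keep is the live window, Trigger
// the high-water mark, and Batch the approximate number of items folded
// into the summary per compaction.
type resolved struct {
	Enabled bool
	Keep    int
	Trigger int
	Batch   int
}

// resolveCompaction returns the zero value when compaction is absent or
// disabled, and rejects configurations where the high-water mark does not
// exceed the live window.
func resolveCompaction(maxHistoryItems int, c *CompactionConfig) (resolved, error) {
	if c == nil || !c.Enabled {
		return resolved{}, nil
	}
	if maxHistoryItems <= 0 {
		return resolved{}, fmt.Errorf("max_history_items must be positive, got %d", maxHistoryItems)
	}
	if c.TriggerItems <= maxHistoryItems {
		return resolved{}, fmt.Errorf("trigger_items (%d) must exceed max_history_items (%d)",
			c.TriggerItems, maxHistoryItems)
	}
	return resolved{
		Enabled: true,
		Keep:    maxHistoryItems,
		Trigger: c.TriggerItems,
		Batch:   c.TriggerItems - maxHistoryItems,
	}, nil
}
```

With the example values above (max_history_items: 6, trigger_items: 12), Batch is 6, which matches the stated cadence of roughly one summary call per (trigger_items - max_history_items) turns.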

Testing

New Ginkgo/Gomega specs for every pure helper (resolveCompaction, itemID, deleteItem, truncateAssistantText, clearInputAudio, compactionCut, withMemory, renderItemsTranscript, buildSummaryMessages, prefixMatches, compact, summarizerModel), including the commit/abort and summarizer-error paths.
go test -race ./core/http/endpoints/openai/... is clean (the compactor spawns a goroutine).
make lint reports 0 issues on the feature files.
Folded into the existing TestOpenAI suite (single RunSpecs).

Notes / follow-ups (out of scope)

max_summary_tokens is advisory (fed to the prompt) in this PR; a hard Predict-level cap is a follow-up.
A small, fast summary_model for CPU loads via the realtime pipeline path; if it can't be constructed, compaction falls back to the pipeline LLM.
Docs: new "Conversation compaction" section in docs/content/features/openai-realtime.md.

🤖 Generated with Claude Code

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Add a handler for the input_audio_buffer.clear client event that discards a partially-captured utterance (raw PCM + buffered Opus frames) via a unit-tested clearInputAudio helper, then acks with input_audio_buffer.cleared. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Clears both .Text and .Transcript of the assistant content part at contentIndex so barge-in truncation also works for audio turns whose spoken words live in .Transcript. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

…ary stripping Replace the bespoke <think> regex in the compactor with the shared pkg/reasoning extractor (via spokenReasoningConfig), matching the rest of the realtime path and covering all reasoning tag families, not just <think>. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

TestAllFieldsHaveRegistryEntries requires every ModelConfig field to have a UI/meta registry entry; add the four pipeline.compaction.* leaves so they render with proper labels/descriptions instead of the reflection fallback. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler added 17 commits June 22, 2026 17:03

feat(realtime): add pipeline.compaction config + resolution

f14e0db

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

refactor(realtime): extract itemID helper, reuse in item.retrieve

ae1adab

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

test(realtime): drop duplicate Ginkgo bootstrap, fold specs into open…

237b701

…ai suite Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

feat(realtime): implement conversation.item.delete

13f6880

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

feat(realtime): add Conversation.Memory + pair-safe compactionCut

25decca

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

fix(realtime): compactionCut returns 0 for keep<=0 (no-cap sentinel, …

582b7e6

…avoids panic) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

style(realtime): gofmt compaction test helper closures

2a84d00

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

feat(realtime): inject rolling memory into the prompt + summary builders

63bf24f

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

feat(realtime): server-side summarize-then-drop compactor

29e1176

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

test(realtime): unit-test prefixMatches eviction-safety predicate

c967c2f

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

feat(realtime): resolve summarizer model + schedule compaction per turn

09909c1

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

docs(realtime): document conversation compaction + new item events

6c567dd

Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

fix(realtime): resolve summary model inside compaction goroutine (laz…

487868e

…y, off-path) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler force-pushed the feat/realtime-conversation-compaction branch from aee1a53 to f6edc05 June 22, 2026 17:03

mudler merged commit fdf475e into master Jun 22, 2026
60 checks passed

mudler deleted the feat/realtime-conversation-compaction branch June 22, 2026 19:28
