
Fix memory system: lower thresholds, add dedup, store from free-mode #8

Draft
AaronGoldsmith wants to merge 2 commits into main from fix/memory-thresholds-and-dedup

Conversation

@AaronGoldsmith
Owner

  • Lower similarity thresholds from 0.9/0.7 to 0.5/0.3 so memory actually influences agent selection (real similarities are 0.3-0.5)
  • Add dedup check in memory.store() to prevent duplicate task+winner entries
  • Store memory entries from free-mode matches via record_verdict.py so /mobius-run competitions feed back into the selection system

This pull request improves the verdict recording script and memory management in the Mobius project. The main changes are: storing match outcomes in vector memory for future retrieval, preventing duplicate memory entries, and lowering similarity thresholds so memory retrieval is more permissive.

Memory storage and retrieval improvements:

  • The record_verdict.py script now stores the winning agent and task description in vector memory after each verdict, so future matches can benefit from past outcomes. It uses the embed function and the Memory class to create and store a MemoryEntry; errors during storage are caught and logged as warnings.
  • The Memory.store method now checks for duplicates before inserting a new memory entry, preventing redundant data storage for the same agent and task combination.

Configuration changes:

  • The similarity thresholds for specialist and ensemble memory retrieval have been lowered (similarity_specialist_threshold from 0.9 to 0.5, and similarity_ensemble_threshold from 0.7 to 0.3) to make memory lookups more permissive and likely to return results.
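As a rough sketch of what the threshold change amounts to (the field names come from the PR description, but the dataclass shape and the `retrieval_mode` helper are assumptions for illustration, not the actual contents of src/mobius/config.py):

```python
from dataclasses import dataclass


@dataclass
class MemoryConfig:
    # Assumed field names, taken from the PR description. Real embedding
    # similarities for related tasks tend to land around 0.3-0.5, so the
    # old 0.9/0.7 thresholds almost never matched anything.
    similarity_specialist_threshold: float = 0.5  # was 0.9
    similarity_ensemble_threshold: float = 0.3    # was 0.7


def retrieval_mode(similarity: float, cfg: MemoryConfig) -> str:
    """Decide how a retrieved memory entry influences agent selection."""
    if similarity >= cfg.similarity_specialist_threshold:
        return "specialist"  # strong match: favor the past winner directly
    if similarity >= cfg.similarity_ensemble_threshold:
        return "ensemble"    # weak match: let memory bias the ensemble
    return "ignore"          # too dissimilar to be informative
```

With the old 0.9/0.7 values, a typical real similarity of 0.45 would have fallen into "ignore"; with the new thresholds it contributes as an ensemble signal.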

- Lower similarity thresholds from 0.9/0.7 to 0.5/0.3 so memory
  actually influences agent selection (real similarities are 0.3-0.5)
- Add dedup check in memory.store() to prevent duplicate task+winner entries
- Store memory entries from free-mode matches via record_verdict.py
  so /mobius-run competitions feed back into the selection system

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 15, 2026 23:32

Copilot AI left a comment

Pull request overview

This PR improves Mobius’s vector-memory feedback loop so past match outcomes influence future agent selection more reliably, including when verdicts are recorded via the free-mode record_verdict.py script.

Changes:

  • Lowered strategy selection similarity thresholds to make memory retrieval more permissive.
  • Added duplicate prevention in Memory.store() to avoid repeated task+winner entries.
  • Extended record_verdict.py to store match outcomes into vector memory after recording a verdict.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Reviewed files:

  • src/mobius/memory.py: Adds a duplicate check before inserting new memory entries.
  • src/mobius/config.py: Lowers memory similarity thresholds used by the selection strategy.
  • .claude/skills/mobius-judge/scripts/record_verdict.py: Stores winner + task into memory after verdict recording.

Comment on lines +39 to +43

    existing = self.conn.execute(
        "SELECT id FROM memory WHERE task_text = ? AND winning_agent_id = ?",
        (entry.task_text, entry.winning_agent_id),
    ).fetchone()
    if existing:
Comment on lines +133 to +140

    task_vec = embed(task_text, config)
    memory_entry = MemoryEntry(
        task_embedding=vec_to_blob(task_vec),
        task_text=task_text,
        winning_agent_id=full_winner_id,
        score=max(scores.values()) if scores else 0.0,
    )
    memory.store(memory_entry)
Comment on lines +129 to +133

    # Store in vector memory so future selections benefit
    task_text = match.get("task_description", "")
    if task_text and full_winner_id:
        try:
            task_vec = embed(task_text, config)
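Putting the two excerpts together, the storage path in record_verdict.py roughly follows the shape below. The `embed` and `vec_to_blob` bodies here are stand-ins (the real script uses the configured embedding provider and the project's own serialization), and `store_outcome` is a hypothetical wrapper name, not a function from the PR:

```python
import logging
import struct

logger = logging.getLogger("record_verdict")


def embed(text: str) -> list[float]:
    # Stand-in embedder: word lengths as a toy vector. The real script
    # calls the configured embedding provider.
    return [float(len(w)) for w in text.split()][:8]


def vec_to_blob(vec: list[float]) -> bytes:
    # Pack floats into bytes for storage as a SQLite BLOB.
    return struct.pack(f"{len(vec)}f", *vec)


def store_outcome(memory, match: dict, full_winner_id: str, scores: dict) -> None:
    """Store the match outcome so future selections can retrieve it.

    Storage failures are logged as warnings, never raised: by this point
    the verdict itself has already been recorded.
    """
    task_text = match.get("task_description", "")
    if not (task_text and full_winner_id):
        return
    try:
        task_vec = embed(task_text)
        memory.store(
            task_text=task_text,
            winning_agent_id=full_winner_id,
            task_embedding=vec_to_blob(task_vec),
            score=max(scores.values()) if scores else 0.0,
        )
    except Exception as exc:
        logger.warning("memory store failed: %s", exc)
```

The key design point, visible in both excerpts, is that memory storage is best-effort: a broken embedder or a locked database degrades the feedback loop but never loses the verdict.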
All four providers (Anthropic, Google, OpenAI, OpenRouter) were hardcoded to max_tokens=4096, causing output truncation on complex tasks. Models support much higher limits (Haiku 8k, Sonnet/GPT-4o 16k, Gemini 65k). Bump to 16384, which is safe for all models; each API naturally caps output at the model's actual limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>