
Fix memory system: lower thresholds, add dedup, store from free-mode #8

Draft
AaronGoldsmith wants to merge 2 commits into main from fix/memory-thresholds-and-dedup

Conversation

@AaronGoldsmith
Owner

  • Lower similarity thresholds from 0.9/0.7 to 0.5/0.3 so memory actually influences agent selection (real similarities are 0.3-0.5)
  • Add dedup check in memory.store() to prevent duplicate task+winner entries
  • Store memory entries from free-mode matches via record_verdict.py so /mobius-run competitions feed back into the selection system

This pull request improves the verdict recording script and memory management in the Mobius project. The main changes are: storing match outcomes in vector memory for future retrieval, preventing duplicate memory entries, and lowering similarity thresholds so memory retrieval is more permissive.

Memory storage and retrieval improvements:

  • The record_verdict.py script now stores the winning agent and task description in vector memory after each verdict, so future matches can benefit from past outcomes. It uses the embed function and the Memory class to create and store a MemoryEntry; errors during storage are caught and logged as warnings.
  • The Memory.store method now checks for duplicates before inserting a new memory entry, preventing redundant data storage for the same agent and task combination.

Configuration changes:

  • The similarity thresholds for specialist and ensemble memory retrieval have been lowered (similarity_specialist_threshold from 0.9 to 0.5, and similarity_ensemble_threshold from 0.7 to 0.3) to make memory lookups more permissive and likely to return results.
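As a rough sketch of what the threshold change amounts to (the field names come from the PR description, but the dataclass shape and the `retrieval_mode` helper are assumptions for illustration, not the actual contents of src/mobius/config.py):

```python
from dataclasses import dataclass


@dataclass
class MemoryConfig:
    # Assumed field names, taken from the PR description. Real embedding
    # similarities for related tasks tend to land around 0.3-0.5, so the
    # old 0.9/0.7 thresholds almost never matched anything.
    similarity_specialist_threshold: float = 0.5  # was 0.9
    similarity_ensemble_threshold: float = 0.3    # was 0.7


def retrieval_mode(similarity: float, cfg: MemoryConfig) -> str:
    """Decide how a retrieved memory entry influences agent selection."""
    if similarity >= cfg.similarity_specialist_threshold:
        return "specialist"  # strong match: favor the past winner directly
    if similarity >= cfg.similarity_ensemble_threshold:
        return "ensemble"    # weak match: let memory bias the ensemble
    return "ignore"          # too dissimilar to be informative
```

With the old 0.9/0.7 values, a typical real similarity of 0.45 would have fallen into "ignore"; with the new thresholds it contributes as an ensemble signal.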

- Lower similarity thresholds from 0.9/0.7 to 0.5/0.3 so memory
  actually influences agent selection (real similarities are 0.3-0.5)
- Add dedup check in memory.store() to prevent duplicate task+winner entries
- Store memory entries from free-mode matches via record_verdict.py
  so /mobius-run competitions feed back into the selection system

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 15, 2026 23:32

Copilot AI left a comment

Pull request overview

This PR improves Mobius’s vector-memory feedback loop so past match outcomes influence future agent selection more reliably, including when verdicts are recorded via the free-mode record_verdict.py script.

Changes:

  • Lowered strategy selection similarity thresholds to make memory retrieval more permissive.
  • Added duplicate prevention in Memory.store() to avoid repeated task+winner entries.
  • Extended record_verdict.py to store match outcomes into vector memory after recording a verdict.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Reviewed files:

  • src/mobius/memory.py: Adds a duplicate check before inserting new memory entries.
  • src/mobius/config.py: Lowers memory similarity thresholds used by the selection strategy.
  • .claude/skills/mobius-judge/scripts/record_verdict.py: Stores winner + task into memory after verdict recording.

Comment on lines +39 to +43

    existing = self.conn.execute(
        "SELECT id FROM memory WHERE task_text = ? AND winning_agent_id = ?",
        (entry.task_text, entry.winning_agent_id),
    ).fetchone()
    if existing:
Comment on lines +133 to +140

    task_vec = embed(task_text, config)
    memory_entry = MemoryEntry(
        task_embedding=vec_to_blob(task_vec),
        task_text=task_text,
        winning_agent_id=full_winner_id,
        score=max(scores.values()) if scores else 0.0,
    )
    memory.store(memory_entry)
Comment on lines +129 to +133

    # Store in vector memory so future selections benefit
    task_text = match.get("task_description", "")
    if task_text and full_winner_id:
        try:
            task_vec = embed(task_text, config)
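Putting the two excerpts together, the storage path in record_verdict.py roughly follows the shape below. The `embed` and `vec_to_blob` bodies here are stand-ins (the real script uses the configured embedding provider and the project's own serialization), and `store_outcome` is a hypothetical wrapper name, not a function from the PR:

```python
import logging
import struct

logger = logging.getLogger("record_verdict")


def embed(text: str) -> list[float]:
    # Stand-in embedder: word lengths as a toy vector. The real script
    # calls the configured embedding provider.
    return [float(len(w)) for w in text.split()][:8]


def vec_to_blob(vec: list[float]) -> bytes:
    # Pack floats into bytes for storage as a SQLite BLOB.
    return struct.pack(f"{len(vec)}f", *vec)


def store_outcome(memory, match: dict, full_winner_id: str, scores: dict) -> None:
    """Store the match outcome so future selections can retrieve it.

    Storage failures are logged as warnings, never raised: by this point
    the verdict itself has already been recorded.
    """
    task_text = match.get("task_description", "")
    if not (task_text and full_winner_id):
        return
    try:
        task_vec = embed(task_text)
        memory.store(
            task_text=task_text,
            winning_agent_id=full_winner_id,
            task_embedding=vec_to_blob(task_vec),
            score=max(scores.values()) if scores else 0.0,
        )
    except Exception as exc:
        logger.warning("memory store failed: %s", exc)
```

The key design point, visible in both excerpts, is that memory storage is best-effort: a broken embedder or a locked database degrades the feedback loop but never loses the verdict.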
All four providers (Anthropic, Google, OpenAI, OpenRouter) were hardcoded to max_tokens=4096, causing output truncation on complex tasks. Models support much higher limits (Haiku 8k, Sonnet/GPT-4o 16k, Gemini 65k). Bump to 16384, which is safe for all models; each API naturally caps output at the model's actual limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>