Fix memory system: lower thresholds, add dedup, store from free-mode #8
Draft
AaronGoldsmith wants to merge 2 commits into main
- Lower similarity thresholds from 0.9/0.7 to 0.5/0.3 so memory actually influences agent selection (real similarities are 0.3-0.5)
- Add dedup check in memory.store() to prevent duplicate task+winner entries
- Store memory entries from free-mode matches via record_verdict.py so /mobius-run competitions feed back into the selection system

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR improves Mobius’s vector-memory feedback loop so past match outcomes influence future agent selection more reliably, including when verdicts are recorded via the free-mode record_verdict.py script.
Changes:
- Lowered strategy selection similarity thresholds to make memory retrieval more permissive.
- Added duplicate prevention in `Memory.store()` to avoid repeated task+winner entries.
- Extended `record_verdict.py` to store match outcomes into vector memory after recording a verdict.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/mobius/memory.py` | Adds a duplicate check before inserting new memory entries. |
| `src/mobius/config.py` | Lowers memory similarity thresholds used by the selection strategy. |
| `.claude/skills/mobius-judge/scripts/record_verdict.py` | Stores winner + task into memory after verdict recording. |
Comment on lines +39 to +43

    existing = self.conn.execute(
        "SELECT id FROM memory WHERE task_text = ? AND winning_agent_id = ?",
        (entry.task_text, entry.winning_agent_id),
    ).fetchone()
    if existing:
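The duplicate check quoted above can be tried in isolation. The following is a minimal sketch, not the project's actual `Memory` class: the table schema, the trimmed-down `MemoryEntry`, the boolean return value, and the `count_rows` helper are all assumptions made for illustration. Only the `SELECT` query is taken from the diff.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    # Hypothetical, trimmed-down stand-in for the project's MemoryEntry.
    task_text: str
    winning_agent_id: str
    score: float = 0.0


class Memory:
    """Illustrative store with a task+winner duplicate check."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Assumed schema; the real table likely has more columns.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "task_text TEXT NOT NULL, "
            "winning_agent_id TEXT NOT NULL, "
            "score REAL NOT NULL)"
        )

    def store(self, entry: MemoryEntry) -> bool:
        """Insert the entry unless the same task+winner pair already exists.

        Returns True if a row was inserted, False if it was skipped.
        """
        existing = self.conn.execute(
            "SELECT id FROM memory WHERE task_text = ? AND winning_agent_id = ?",
            (entry.task_text, entry.winning_agent_id),
        ).fetchone()
        if existing:
            return False
        self.conn.execute(
            "INSERT INTO memory (task_text, winning_agent_id, score) "
            "VALUES (?, ?, ?)",
            (entry.task_text, entry.winning_agent_id, entry.score),
        )
        self.conn.commit()
        return True


def count_rows(conn: sqlite3.Connection) -> int:
    """Helper for the usage example: number of rows in the memory table."""
    return conn.execute("SELECT COUNT(*) FROM memory").fetchone()[0]
```

With an in-memory database, storing the same task and winner twice leaves a single row, while the same task with a different winner is still inserted. Note that this check matches on exact task text, so a reworded task would count as a new entry.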
Comment on lines +133 to +140

    task_vec = embed(task_text, config)
    memory_entry = MemoryEntry(
        task_embedding=vec_to_blob(task_vec),
        task_text=task_text,
        winning_agent_id=full_winner_id,
        score=max(scores.values()) if scores else 0.0,
    )
    memory.store(memory_entry)
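The snippet above passes the embedding through `vec_to_blob` before storing it. That helper's implementation is not part of this diff, so the following is only a hypothetical equivalent, assuming the vector is packed as 32-bit floats. The names `floats_to_blob` and `blob_to_floats` are invented for this sketch.

```python
from array import array
from typing import List, Sequence


def floats_to_blob(vec: Sequence[float]) -> bytes:
    """Pack a float vector into bytes (32-bit floats, native byte order)."""
    return array("f", vec).tobytes()


def blob_to_floats(blob: bytes) -> List[float]:
    """Unpack bytes produced by floats_to_blob back into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return out.tolist()
```

Packing as 32-bit floats halves the storage of Python's native doubles at the cost of some precision, which is usually acceptable for similarity search.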
Comment on lines +129 to +133

    # Store in vector memory so future selections benefit
    task_text = match.get("task_description", "")
    if task_text and full_winner_id:
        try:
            task_vec = embed(task_text, config)
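The guarded write that begins in the snippet above can be sketched as a best-effort helper: skip when the task text or winner is missing, and downgrade any embedding or storage failure to a warning so that verdict recording itself is not affected. The function name `store_outcome`, the injected `embed_fn` and `store_fn` callables, and the dictionary payload are assumptions for illustration; the real script uses `embed`, `MemoryEntry`, and `Memory` directly.

```python
import logging
from typing import Callable, Dict, Sequence

logger = logging.getLogger("record_verdict_sketch")


def store_outcome(
    task_text: str,
    winner_id: str,
    scores: Dict[str, float],
    embed_fn: Callable[[str], Sequence[float]],
    store_fn: Callable[[dict], None],
) -> bool:
    """Best-effort memory write; returns True only if the entry was stored."""
    # Nothing useful to store without both a task and a winner.
    if not task_text or not winner_id:
        return False
    try:
        vector = list(embed_fn(task_text))
        store_fn(
            {
                "task_text": task_text,
                "winning_agent_id": winner_id,
                "task_embedding": vector,
                # Same rule as the diff: highest score, or 0.0 if none given.
                "score": max(scores.values()) if scores else 0.0,
            }
        )
    except Exception as exc:
        # A memory failure must not break verdict recording; log and move on.
        logger.warning("Could not store match outcome in memory: %s", exc)
        return False
    return True
```

Passing the embedding and storage functions as parameters keeps the sketch testable without a model or database.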
All four providers (Anthropic, Google, OpenAI, OpenRouter) were hardcoded to max_tokens=4096, causing output truncation on complex tasks. Models support much higher limits (Haiku 8k, Sonnet/GPT-4o 16k, Gemini 65k). Bump to 16384, which is safe for all models: APIs naturally cap at each model's actual limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
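The commit relies on each API capping the request at the model's own limit. If explicit client-side capping were ever preferred, it might look like the sketch below. The limits are the approximate figures quoted in the commit message, and the family labels, the `MODEL_OUTPUT_LIMITS` table, and `effective_max_tokens` are hypothetical names, not part of the project.

```python
from typing import Dict

DEFAULT_MAX_TOKENS = 16384  # value chosen in this commit

# Approximate output limits as quoted in the commit message.
# The keys are illustrative labels, not real model identifiers.
MODEL_OUTPUT_LIMITS: Dict[str, int] = {
    "haiku": 8192,
    "sonnet": 16384,
    "gpt-4o": 16384,
    "gemini": 65536,
}


def effective_max_tokens(model_family: str, requested: int = DEFAULT_MAX_TOKENS) -> int:
    """Return the requested budget, capped at the family's known limit."""
    limit = MODEL_OUTPUT_LIMITS.get(model_family)
    # Unknown families pass the requested value through unchanged.
    return requested if limit is None else min(requested, limit)
```

Under these assumed limits, a Haiku-class model would be sent 8192 while the other families would be sent the full 16384.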
This pull request enhances the verdict recording script and memory management in the Mobius project. The main improvements involve storing match outcomes in vector memory for future retrieval, preventing duplicate memory entries, and adjusting similarity thresholds to make memory retrieval more permissive.
Memory storage and retrieval improvements:
- The `record_verdict.py` script now stores the winning agent and task description in vector memory after each verdict, enabling future matches to benefit from past outcomes. This uses the `embed` function and the `Memory` class to create and store a `MemoryEntry`. Errors during storage are caught and logged as warnings.
- The `Memory.store` method now checks for duplicates before inserting a new memory entry, preventing redundant data storage for the same agent and task combination.

Configuration changes:
- Lowered the similarity thresholds (`similarity_specialist_threshold` from 0.9 to 0.5, and `similarity_ensemble_threshold` from 0.7 to 0.3) to make memory lookups more permissive and more likely to return results.
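To illustrate why the threshold change matters, here is a minimal sketch of threshold-based strategy selection. It assumes cosine similarity and a three-way split in which the best match at or above the specialist threshold selects a "specialist" strategy, at or above the ensemble threshold selects an "ensemble", and anything lower falls back to "explore". The strategy labels and the `pick_strategy` function are assumptions; only the two threshold names and values come from this PR.

```python
import math
from typing import Sequence

SIMILARITY_SPECIALIST_THRESHOLD = 0.5  # was 0.9 before this PR
SIMILARITY_ENSEMBLE_THRESHOLD = 0.3  # was 0.7 before this PR


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_strategy(
    best_similarity: float,
    specialist: float = SIMILARITY_SPECIALIST_THRESHOLD,
    ensemble: float = SIMILARITY_ENSEMBLE_THRESHOLD,
) -> str:
    """Map the best memory match to a (hypothetical) selection strategy."""
    if best_similarity >= specialist:
        return "specialist"
    if best_similarity >= ensemble:
        return "ensemble"
    return "explore"
```

With observed similarities in the 0.3-0.5 range, a match of 0.4 would never clear the old 0.9/0.7 thresholds, so memory would have no effect on selection. Under the new values the same match reaches the ensemble tier.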