fix(retrieve): don't let length normalization suppress long-document recall #26
Merged
…recall
stage_length_normalization divided each result's fused score by (1 + log2(content_len/500)).max(1.0), penalizing long memories by up to ~4x (a 4 KB note is divided by about 4). Cosine similarity is already length-invariant, so this double-penalized length and buried detailed runbooks and inventories under short, less-relevant entries: the ourmem#1 vector hit (0.658) was demoted to ~0.16 and dropped from the top-k. Disable it: length must not suppress recall in a memory store.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI here is red only because of pre-existing …
Merged and deployed. Agreed that double-penalizing length makes no sense for a recall store. Long detailed runbooks should rank higher, not lower. 👍
Problem
stage_length_normalization divides each fused result's score by (1 + log2(content_len / 500)).max(1.0): roughly ÷2 at 1 KB, ÷3 at 2 KB, ÷4 at 4 KB. But the vector half of the fused score is cosine similarity, which is already length-invariant, so this double-penalizes length. A long, highly relevant memory can be the top vector hit yet be demoted below short, weakly relevant ones and fall out of the top-k entirely.
Observed in practice: a ~4 KB document was the #1 vector match (cosine ≈ 0.66) but, after the ÷4 penalty, dropped out of the results entirely. Long, detailed notes (runbooks, inventories, writeups) were systematically unrecallable while short notes worked fine.
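To make the arithmetic concrete, here is a minimal sketch of the divisor described above. Only the formula comes from this PR; the function name `length_penalty`, the `usize`/`f64` types, and the assumption that `content_len` is measured in bytes are illustrative, and the 0.658 score is simply the value reported for the top vector hit.

```rust
/// Divisor applied by the removed stage, following the formula quoted in
/// this PR: (1 + log2(content_len / 500)).max(1.0).
/// The name and signature are hypothetical, not the project's actual code.
fn length_penalty(content_len: usize) -> f64 {
    // Below 500 the log term is negative, so `.max(1.0)` clamps the divisor
    // to 1.0 and short content is left unpenalized.
    (1.0 + (content_len as f64 / 500.0).log2()).max(1.0)
}

fn main() {
    // 0.658 is the score of the top vector hit reported in this PR.
    let score = 0.658_f64;
    for len in [200_usize, 500, 1024, 2048, 4096] {
        let penalty = length_penalty(len);
        println!(
            "{:>5} bytes: divisor {:.2}, score {:.3} -> {:.3}",
            len,
            penalty,
            score,
            score / penalty
        );
    }
}
```

Under these assumptions the divisor stays at 1.0 up to 500 bytes and grows by about one for every doubling of length after that, which is how a 0.658 hit on a 4 KB note ends up near 0.16.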
Fix
Disable the length penalty. Cosine already normalizes for length, and BM25's own length normalization is internal to the FTS scorer; an extra global divide-by-log(length) on the fused score just suppresses recall of long content.
Test
Updated test_length_normalization to assert length-invariance (a long memory keeps the same fused score as a short one).
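A length-invariance check along these lines might look like the sketch below. The `Scored` struct and the pass-through signature of `stage_length_normalization` are assumptions made for illustration; the PR does not show the real types or how the stage was disabled.

```rust
/// Hypothetical stand-in for a fused retrieval result; the real type is not
/// shown in this PR.
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    content: String,
    score: f64,
}

/// Assumed shape of the stage once the penalty is disabled: results pass
/// through with their fused scores untouched.
fn stage_length_normalization(results: Vec<Scored>) -> Vec<Scored> {
    results
}

fn main() {
    // Two memories with the same fused score but very different lengths.
    let short = Scored { content: "x".repeat(100), score: 0.658 };
    let long = Scored { content: "x".repeat(4096), score: 0.658 };

    let out = stage_length_normalization(vec![short, long]);

    // Length-invariance: equal input scores stay equal whatever the length.
    assert_eq!(out[0].score, out[1].score);
    assert_eq!(out[1].score, 0.658);
    println!("short = {}, long = {}", out[0].score, out[1].score);
}
```

Comparing a short and a long entry that start with the same score keeps the assertion independent of how the upstream fusion computes that score.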