fix(retrieve): don't let length normalization suppress long-document recall by doctatortot · Pull Request #26 · ourmem/omem

doctatortot · 2026-05-21T21:12:35Z

Problem

stage_length_normalization divides each fused result's score by (1 + log2(content_len / 500)).max(1.0) — roughly ÷2 at 1 KB, ÷3 at 2 KB, ÷4 at 4 KB. But the vector half of the fused score is cosine similarity, which is already length-invariant, so this double-penalizes length. A long, highly-relevant memory can be the top vector hit yet be demoted below short, weakly-relevant ones and fall out of the top-k entirely.

Observed in practice: a ~4 KB document was the #1 vector match (cosine ≈ 0.66) but, after the ÷4 penalty, dropped out of results entirely — long, detailed notes (runbooks, inventories, writeups) were systematically unrecallable while short notes worked fine.

Fix

Disable the length penalty. Cosine already normalizes for length, and BM25's own length normalization is internal to the FTS scorer; an extra global divide-by-log(length) on the fused score just suppresses recall of long content.

Test

Updated test_length_normalization to assert length-invariance (a long memory keeps the same fused score as a short one).

…recall stage_length_normalization divided each result's fused score by (1 + log2(content_len/500)).max(1.0), penalizing long memories up to ~4x (a 4KB note /4). Cosine similarity is already length-invariant, so this double-penalized length and buried detailed runbooks/inventories under short, less-relevant entries: the ourmem#1 vector hit (0.658) was demoted to ~0.16 and dropped from the top-k. Disable it — length must not suppress recall in a memory store. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

doctatortot · 2026-05-21T21:22:27Z

CI here is red only because of pre-existing rustfmt drift on main (the fmt step fails before build/clippy/test can run). The files changed in this PR are fmt-clean. #27 fixes that drift — once it merges, a rebase onto main turns this green.

yhyyz · 2026-05-22T00:43:49Z

Merged and deployed — agreed that double-penalizing length makes no sense for a recall store. Long detailed runbooks should rank higher, not lower. 👍

yhyyz merged commit 4025e2a into ourmem:main May 22, 2026
1 check failed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(retrieve): don't let length normalization suppress long-document recall#26

fix(retrieve): don't let length normalization suppress long-document recall#26
yhyyz merged 1 commit into
ourmem:mainfrom
doctatortot:pr/length-norm

doctatortot commented May 21, 2026 •

edited

Loading

Uh oh!

doctatortot commented May 21, 2026

Uh oh!

Uh oh!

yhyyz commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

doctatortot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Test

Uh oh!

doctatortot commented May 21, 2026

Uh oh!

Uh oh!

yhyyz commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

doctatortot commented May 21, 2026 •

edited

Loading