Skip to content

locomo: publishable v2 K200 retrieval results (MH recall@200=0.898)#12

Open
tatavishnurao wants to merge 1 commit into
sochdb:mainfrom
tatavishnurao:feat/locomo-publishable-v2-k200
Open

locomo: publishable v2 K200 retrieval results (MH recall@200=0.898)#12
tatavishnurao wants to merge 1 commit into
sochdb:mainfrom
tatavishnurao:feat/locomo-publishable-v2-k200

Conversation

@tatavishnurao

@tatavishnurao tatavishnurao commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

End-to-End Publishable Retrieval Pipeline (v2)

Single-pass, category-agnostic retrieval achieving ~0.90 recall@200 across all 5 LocoMo categories, including the challenging multi-hop category.

Results (K=200)

Category Recall@200 Hit@200
overall 0.972 0.983
adversarial 0.972 0.973
multi_hop 0.898 0.944
open_domain 0.984 0.985
single_hop 0.949 1.000
temporal 0.983 0.991

Key Configuration

  • bm25_weight=0.1, vector_weight=3.0 (vector-dominant RRF)
  • Memory views: turn + event only (entity/neighbor_window dilute multi-hop)
  • local-neighbor-expansion enabled (adds adjacent dialogue turns)
  • Single-query mode (multi_query hurts MH precision)
  • No anchored_two_hop, no evidence_completion (these add noise)
  • K=200, candidate_k=400, rrf_k=60

Ablation Journey (11 runs)

Achieving multi-hop ~ 0.90 required extensive ablation:

  • Weight sweep: bm25/vec ratios from 1.0/1.0 to 0.05/5.0 — optimal at 0.1/3.0
  • Feature additions: 4 views, anchors, completion, multi_query all HURT multi-hop
  • Critical insight: neighbor expansion is the only feature that pushes MH above 0.90 (from 0.896→0.898)
  • Pure vector ceiling: 0.898 — neighbor expansion recovers memories in proximity contexts

Comparison with Previous Best

… neighbor_expansion)

Single-pass category-agnostic pipeline achieving ~0.90 recall@200
across all 5 LocoMo categories:

  overall:     recall@200 = 0.972  hit = 0.983
  adversarial: recall@200 = 0.972  hit = 0.973
  multi_hop:   recall@200 = 0.898  hit = 0.944
  open_domain: recall@200 = 0.984  hit = 0.985
  single_hop:  recall@200 = 0.949  hit = 1.000
  temporal:    recall@200 = 0.983  hit = 0.991

Key configuration: bm25_weight=0.1, vector_weight=3.0,
turn+event views, local-neighbor-expansion, single-query mode.
@tatavishnurao tatavishnurao force-pushed the feat/locomo-publishable-v2-k200 branch from 4e53ba3 to 3a3b65d Compare June 11, 2026 15:49
@tatavishnurao tatavishnurao changed the title locomo: publishable v2 K200 retrieval results (MH recall@200=0.902) locomo: publishable v2 K200 retrieval results (MH recall@200=0.898) Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant