fix(search): resolve zero results bug and improve UX #20

prosdev · 2025-11-23T09:36:07Z

🐛 Problem

Search was indexing 566 documents but returning 0 results for all queries, making the semantic search feature completely non-functional.

Root Cause

LanceDB returns L2 distance (~1.0 for similar vectors), but our code incorrectly calculated:

score = 1 - distance = 1 - 1.0 = 0

Because typical distances sit near 1.0, every result scored at or near 0 and fell below the threshold, so nothing was returned.
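To make the failure concrete, here is a minimal sketch of the arithmetic. The names `oldScore` and `filterByThreshold` are hypothetical and not taken from the codebase, the sample distances are illustrative, and the `>=` comparison is an assumption about how the threshold is applied.

```typescript
// Hypothetical reconstruction of the buggy conversion: it treats an
// L2 distance as if it could never exceed 1.
function oldScore(distance: number): number {
  return 1 - distance;
}

// Keep only the scores at or above the threshold (assumed comparison).
function filterByThreshold(scores: number[], threshold: number): number[] {
  return scores.filter((s) => s >= threshold);
}

// Illustrative L2 distances close to 1.0, as described above.
const distances = [0.95, 1.0, 1.05];
const scores = distances.map(oldScore);
const kept = filterByThreshold(scores, 0.7);

console.log(kept.length); // 0
```

Note that a distance above 1.0 even produces a negative score under the old formula, so lowering the threshold alone would not have been a reliable workaround.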

✅ Solution

1. Fix Distance-to-Similarity Conversion

Use exponential decay: score = e^(-distance²)
Provides proper 0-1 range with good score distribution
Exact matches now score 70-90%, semantic matches 25-60%
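The conversion itself is a one-liner. The sketch below assumes a standalone helper called `distanceToSimilarity` (a hypothetical name; the actual change lives in `LanceDBVectorStore` and may be structured differently).

```typescript
// Convert a non-negative L2 distance into a similarity score in the 0-1 range.
// Formula from this PR: score = e^(-distance^2).
// The function name is hypothetical.
function distanceToSimilarity(distance: number): number {
  return Math.exp(-(distance * distance));
}

console.log(distanceToSimilarity(0)); // 1
console.log(distanceToSimilarity(1).toFixed(3)); // "0.368"
```

With this mapping, a distance of 1.0, which scored 0 under the old formula, scores about 0.37, and scores only approach 0 as the distance grows large.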

2. Fix CLI Metadata Access

Changed metadata.file → metadata.path
Prevents undefined errors in search output

3. Add Comprehensive Tests

11 new integration tests for search functionality
Tests cover: stats, thresholds, limits, sorting, score ranges
Protects against regression

4. Update Documentation

Real working examples from testing
Threshold recommendations (0.7=precise, 0.25=exploratory)
Natural language query examples
Actual output with real scores

🧪 Verification

Before:

$ dev search "coordinator" --threshold 0.7
✖ Found 0 result(s)

After:

$ dev search "coordinator" --threshold 0.3
1. CoordinatorLogger (42.6% match)
2. Coordinator - The Central Nervous System (42.4% match)
3. CoordinatorLogger.info (35.6% match)
✔ Found 3 result(s)
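How `--threshold` interacts with these scores can be sketched as follows. The `Hit` type and `search` function are hypothetical stand-ins, not the CLI's actual code; the three scores are copied from the output above.

```typescript
// Hypothetical result shape for illustration only.
interface Hit {
  name: string;
  score: number;
}

// Scores copied from the "After" output above.
const hits: Hit[] = [
  { name: "CoordinatorLogger", score: 0.426 },
  { name: "Coordinator - The Central Nervous System", score: 0.424 },
  { name: "CoordinatorLogger.info", score: 0.356 },
];

// Drop hits below the threshold, sort best-first, then apply the limit.
function search(all: Hit[], threshold: number, limit: number): Hit[] {
  return all
    .filter((h) => h.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

console.log(search(hits, 0.7, 10).length); // 0
console.log(search(hits, 0.3, 10).length); // 3
```

For this query, a threshold of 0.7 still returns nothing even with the corrected scores, which is why the exploratory range suits semantic queries better.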

Test Results:

✅ 11/11 integration tests passing
✅ Natural language queries working ("how do agents communicate" → 51.9% match)
✅ Exact term matching ("RepositoryIndexer" → 85.7% match)
✅ Technical concepts ("vector embeddings" → 58.5% match)

📊 Impact

Search Quality Examples:

| Query | Top Result | Score |
|---|---|---|
| `RepositoryIndexer` | RepositoryIndexer.index | 85.7% |
| `vector embeddings` | EmbeddingProvider | 58.5% |
| `how do agents communicate` | Message Architecture | 51.9% |
| `error handling` | Handle Errors Gracefully | 39.3% |

Score Interpretation:

70-90%: Exact matches, highly relevant
40-60%: Strong semantic matches
25-40%: Related concepts, exploratory
<20%: Weak matches

🔗 Commits (Atomic)

fix(search): Correct distance-to-similarity calculation
test(search): Add 11 comprehensive integration tests
chore: Update gitignore, remove tsx dependency
docs: Update READMEs with real examples

Each commit builds and tests independently.

🚀 Next Steps

Fix the `explore similar` command, which currently searches for the filename as text instead of using the file's content
Consider adding score calibration based on query length
Add search result caching for frequently used queries

Dogfooded on dev-agent itself: All examples are real searches on this repository! 🐕

- Fix L2 distance conversion in LanceDBVectorStore
  * Use exponential decay: score = e^(-distance^2)
  * Provides scores 0-1 range with better distribution
  * Fixes issue where all results were filtered out
- Fix CLI metadata field reference
  * Changed metadata.file to metadata.path
  * Prevents 'undefined' errors in search output

Fixes search returning 0 results despite indexed data.

- Add 11 integration tests for search functionality
- Tests cover: stats, semantic search, thresholds, limits, sorting
- Tests validate score ranges and metadata structure
- Uses real indexed data from dev-agent repository

Provides regression protection for search bug fixes.

- Add .dev-agent.json and .dev-agent/ to gitignore
- Remove tsx devDependency (was only for debug script)
- Update pnpm-lock.yaml

- Add practical Quick Start with npm link instructions
- Include real search output from testing
- Document threshold recommendations (0.7=precise, 0.25=exploratory)
- Add explore command documentation
- Show actual semantic search scores and results
- Include pro tips for scripting and workflows

Makes documentation reflect actual working functionality.

- Add 9 unit tests for distance-to-similarity conversion
  * Tests the core bug fix: score = e^(-distance²)
  * Validates score ranges, monotonic decrease, edge cases
  * Fast (<2ms), deterministic, always run in CI
- Make integration tests skip in CI by default
  * Set RUN_INTEGRATION=true to run in CI if needed
  * Require pre-indexed data (.dev-agent/)
  * Run locally after `dev index .`

Hybrid approach: Unit tests catch logic bugs, integration tests validate real behavior when available.
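The properties named in this commit message (score range, monotonic decrease, edge cases) can be checked with a few lines. This is only a sketch of the idea, not the actual test file: `distanceToSimilarity` and `checkConversion` are hypothetical names, and the sample distances are arbitrary.

```typescript
// Hypothetical helper name; the formula is the one stated in this PR.
function distanceToSimilarity(distance: number): number {
  return Math.exp(-(distance * distance));
}

// Returns a description of every violated property (empty means all pass).
function checkConversion(distances: number[]): string[] {
  const failures: string[] = [];
  const sorted = [...distances].sort((a, b) => a - b);
  let previous = Infinity;
  for (const d of sorted) {
    const s = distanceToSimilarity(d);
    // Score range: every score must lie in [0, 1].
    if (s < 0 || s > 1) failures.push(`score out of range at distance ${d}`);
    // Monotonic decrease: a larger distance must never score higher.
    if (s > previous) failures.push(`score increased at distance ${d}`);
    previous = s;
  }
  // Edge case: identical vectors (distance 0) score exactly 1.
  if (distanceToSimilarity(0) !== 1) failures.push("zero distance should score 1");
  return failures;
}

console.log(checkConversion([0, 0.25, 0.5, 1, 2, 10])); // []
```

Checks like these need no index or embedding model, which is what lets them run on every CI build while the integration tests stay opt-in.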

prosdev added 5 commits November 22, 2025 19:58

chore: update gitignore and remove tsx dependency

882feaa


prosdev merged commit 4f9df34 into main Nov 23, 2025
1 check passed
