Highlight sentence in results #10
refact(settings): extension options are generated by a settings method
chore(settings):
- default chunk_size equals the model context window
- increase FTS weight
Pull Request Overview
This PR implements sentence-level highlighting in search results by splitting chunks into sentences, generating embeddings for them, and using semantic search to identify the most relevant sentences within matching chunks. This provides more precise result snippets and better context for users.
Key changes:
- Added sentence splitting functionality with offset tracking for chunk text positioning
- Extended database schema and processing pipeline to store and search sentence embeddings
- Updated search results to include top-ranked sentences with their offsets for highlighting
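The first key change, sentence splitting with offset tracking, can be illustrated with a minimal sketch. This is not the code from `src/sqlite_rag/sentence_splitter.py`; the `SentenceSpan` class, the `split_sentences` function, and the regex-based splitting rule are assumptions chosen only to show how each sentence can carry its start and end position within the chunk text.

```python
import re
from dataclasses import dataclass


@dataclass
class SentenceSpan:
    """Hypothetical container: a sentence plus its position in the chunk."""
    text: str
    start: int  # inclusive character offset in the chunk
    end: int    # exclusive character offset in the chunk


# Assumed rule: a sentence starts at a non-space character and runs up to
# and including the next run of '.', '!' or '?'.
_SENTENCE_RE = re.compile(r"\S[^.!?]*[.!?]*")


def split_sentences(chunk: str) -> list[SentenceSpan]:
    spans = []
    for match in _SENTENCE_RE.finditer(chunk):
        # Drop trailing whitespace so the offsets cover only the sentence.
        text = match.group().rstrip()
        start = match.start()
        spans.append(SentenceSpan(text=text, start=start, end=start + len(text)))
    return spans


chunk = "Hello world. How are you? Fine"
sentences = split_sentences(chunk)
for s in sentences:
    # The offsets always slice back to the sentence text.
    assert chunk[s.start:s.end] == s.text
```

Storing offsets rather than only the sentence text lets a formatter highlight the sentence inside the original chunk without searching for it again.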
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_settings.py | Updated test assertions to use renamed other_model_options setting field |
| tests/test_sentence_splitter.py | Added comprehensive tests for new sentence splitting functionality |
| tests/test_engine.py | Updated Engine tests with sentence_splitter dependency and added sentence processing tests |
| tests/test_chunker.py | Fixed variable naming from excinfo to exc_info for consistency |
| tests/integration/test_engine.py | Moved search tests to integration suite and added sentence search test cases |
| tests/conftest.py | Updated engine fixture to include SentenceSplitter dependency |
| src/sqlite_rag/sqliterag.py | Integrated sentence splitting and search into main search workflow |
| src/sqlite_rag/settings.py | Refactored settings with renamed fields, new methods for context/vector options, and sentence configuration |
| src/sqlite_rag/sentence_splitter.py | New module implementing sentence splitting with offset tracking |
| src/sqlite_rag/repository.py | Extended to persist sentence embeddings to database |
| src/sqlite_rag/models/sentence_result.py | New model for sentence search results |
| src/sqlite_rag/models/sentence.py | New model representing a sentence with embedding and offsets |
| src/sqlite_rag/models/document_result.py | Added sentences field to include sentence results |
| src/sqlite_rag/models/document.py | Fixed type hint from string literal to direct Chunk reference |
| src/sqlite_rag/models/chunk.py | Added sentences field and improved comment clarity |
| src/sqlite_rag/formatters.py | Implemented sentence-based preview generation and display formatting |
| src/sqlite_rag/engine.py | Added sentence processing, search_sentences method, and quantization support |
| src/sqlite_rag/database.py | Added sentences table and vector initialization |
| src/sqlite_rag/cli.py | Updated search command defaults and help text |
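How the stored offsets can drive highlighting once the sentences of a matching chunk are ranked is sketched below. It is an illustration under stated assumptions, not the logic in `engine.py` or `formatters.py`: the `cosine`, `top_sentences`, and `highlight` helpers, the tuple layout, the bracket markers, and the toy two-dimensional embeddings are all hypothetical, and ranking is done in plain Python rather than through the database's vector search.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_sentences(query_vec, sentences, k=1):
    """Return the k sentences most similar to the query.

    Each sentence is a hypothetical (start, end, embedding) tuple.
    """
    ranked = sorted(sentences, key=lambda s: cosine(query_vec, s[2]), reverse=True)
    return ranked[:k]


def highlight(chunk, spans, open_mark="[", close_mark="]"):
    """Wrap each (start, end) span of the chunk in the given markers."""
    parts, pos = [], 0
    for start, end in sorted(spans):
        parts.append(chunk[pos:start])
        parts.append(open_mark + chunk[start:end] + close_mark)
        pos = end
    parts.append(chunk[pos:])
    return "".join(parts)


chunk = "Cats sleep. Dogs bark."
sentences = [(0, 11, [1.0, 0.0]), (12, 22, [0.0, 1.0])]  # toy embeddings
best = top_sentences([0.1, 0.9], sentences, k=1)
preview = highlight(chunk, [(start, end) for start, end, _ in best])
```

With these toy vectors the query is closest to the second sentence, so only that span is wrapped in markers.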
Pull Request Overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
aee25fd to 8dbae68
Avoid fetching the entire chunk to extract the content
8601e8a to 50430d7
50430d7 to c9ee5dd