Open
Conversation
Collaborator
Author
|
@greptile |
Collaborator
Author
|
@greptile |
chinaux
reviewed
Mar 23, 2026
| auto collection = result.value(); | ||
| uint64_t doc_count{collection->Stats().value().doc_count}; | ||
| ASSERT_EQ(doc_count, (num_batches + 500) * batch_size); | ||
| for (uint64_t doc_id = 0; doc_id < doc_count; doc_id++) { |
Collaborator
There was a problem hiding this comment.
上面也有类似的逻辑,建议抽取为辅助函数或者加点注释
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Collaborator
Author
|
@greptile |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Greptile Summary
This PR implements crash recovery for collection optimization by cleaning up stale index files left from a previous interrupted run, alongside a suite of typo/rename fixes and a new end-to-end crash-during-optimize test.
Key changes:
segment.cc): Before building a vector index file, the code now checks whether the file already exists (as crash residue from a prior incomplete optimization) and removes it before re-creating it. This preventsmerge_vector_indexerfrom operating on a potentially partial/corrupt file after recovery.FileExists+RemoveFileguard is applied to theQuantizeType::UNDEFINEDbranch and the flat-vector sub-branch of the quantized path, but is missing from the non-RABITQ quantized index path (MakeQuantizeVectorIndexPath, line ~1715) and the RABITQ quantized index path (line ~1787). A crash during quantized index construction would leave residue that is silently handed tomerge_vector_indexeron the next recovery run.CreateComapctTask→CreateCompactTask,new_segmnet_meta→new_segment_meta,indice→index,create→create_if_missing, and a log message typo fix — all consistent and correct.collection_optimizerbinary andoptimize_recovery_test.ccprovide an end-to-end crash-during-optimize scenario. The test itself has several robustness issues flagged in earlier review threads (flakyWIFSIGNALEDassertion, copy-paste insert loop instead of verify loop at line 185).Doc::ValueEqual(100× forfloat, 1000× fordouble) affects all production callers ofDoc::operator==, not only tests; no explanatory comment is present.Confidence Score: 2/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[create_vector_index called on recovery] --> B{QuantizeType == UNDEFINED?} B -- Yes --> C[allocate_block_id\nMakeVectorIndexPath] C --> D{FileExists?\n✅ GUARDED} D -- Yes --> E[LOG_WARN\nRemoveFile] D -- No --> F[merge_vector_indexer] E --> F B -- No --> G{metric_type changed\nOR indexers count != 1?} G -- Yes --> H[allocate_block_id\nMakeVectorIndexPath\nfor flat/raw] H --> I{FileExists?\n✅ GUARDED} I -- Yes --> J[LOG_WARN\nRemoveFile] I -- No --> K[merge_vector_indexer\nfor flat/raw] J --> K G -- No --> L[reuse existing\nraw_vector_provider] K --> M{QuantizeType == RABITQ?} L --> M M -- No --> N[allocate_block_id\nMakeQuantizeVectorIndexPath] N --> O{FileExists?\n❌ NOT GUARDED} O --> P[merge_vector_indexer\nfor quantized index] M -- Yes --> Q[Train RabitQ converter] Q --> R[allocate_block_id\nMakeQuantizeVectorIndexPath] R --> S{FileExists?\n❌ NOT GUARDED} S --> T[merge_vector_indexer\nfor RABITQ index] style D fill:#90EE90 style I fill:#90EE90 style O fill:#FFB6C1 style S fill:#FFB6C1Comments Outside Diff (6)
tests/db/crash_recovery/write_recovery_test.cc, line 129 (link)SetUp/TearDownreference stale path after renamedir_path_was renamed from"crash_test_db"to"write_recovery_test_db"in this PR, but bothSetUp(line 129) andTearDown(line 134) still attempt to delete the old"./crash_test_db"path. As a result:write_recovery_test_dbis never cleaned up between test runs, causing test pollution and potential false-positives on repeated runs.crash_test_dbis a no-op since that directory no longer exists.tests/db/crash_recovery/write_recovery_test.cc, line 134 (link)TearDownafter renameSame rename miss as in
SetUp. TheTearDownmust also reference the updated directory name.src/db/index/segment/segment.cc, line 1715-1718 (link)The PR adds
FileExists+RemoveFileguards before building vector index files at lines 1634 and 1678 (forMakeVectorIndexPath), but the corresponding quantized index files built viaMakeQuantizeVectorIndexPath— at this location (non-RabitQ, line ~1715) and the RabitQ path (~line 1787) — receive no such guard.The same crash scenario applies: if a crash occurs after the flat index block ID is committed to
segment_meta_but during the quantized index build,block_id_allocator_will be reset tomax_block_id + 1from the persisted meta on recovery, producing the samequant_block_idand therefore the same file path. The orphaned (and possibly corrupted) quantized index file will then be silently re-opened bymerge_vector_indexerrather than being detected and cleaned up.std::string index_file_path = FileHelper::MakeQuantizeVectorIndexPath( path_, column, segment_meta_->id(), quant_block_id); + if (FileHelper::FileExists(index_file_path)) { + LOG_WARN( + "Quantize index file[%s] already exists (possible crash residue); " + "cleaning and overwriting.", + index_file_path.c_str()); + FileHelper::RemoveFile(index_file_path); + } auto vector_indexer = merge_vector_indexer( index_file_path, column, *field_with_new_index_params, concurrency);The same fix should be applied to the RabitQ quantized index path at line ~1787.
src/db/index/common/doc.cc, line 1153-1166 (link)The tolerance for
floatvectors is widened 100× (from1e-6fto1e-4f) and fordoublevectors 1000× (from1e-9to1e-6).Doc::operator==— and by extension the crash-recovery integration tests — now accepts vectors whose components differ by up to ~0.0001 as "equal".While some precision loss during compaction/reconstruction is expected (e.g., quantisation round-trips, format conversions), a 100× relaxation is very coarse for
float32. Any data corruption that introduces per-element errors below1e-4fwill be silently accepted by the equality check, defeating the purpose of the data-integrity assertions in the new test.Consider tightening the tolerance back towards the original values, or — if a specific lossy path is known to cause up to this level of error — adding a comment explaining the maximum expected error and why it is acceptable, so future readers do not loosen it further.
src/db/index/segment/segment.cc, line 1715-1718 (link)The PR adds
FileHelper::FileExists/FileHelper::RemoveFileguards before creating the flat vector index at lines 1634 and 1678, but the analogous guard is absent forMakeQuantizeVectorIndexPathin the non-RABITQ quantize path here.If the optimizer crashes after
quant_block_idhas been allocated and the flat index file has been fully written, but during writing of the quantize index file, the quant file will be a partial crash residue. On the next recovery run,allocate_block_id()will produce the samequant_block_id(block counter recovered from persistent state), constructing the same path — butmerge_vector_indexeris called against the pre-existing partial file with no cleanup, potentially causing corruption or an incorrect index.The flat index path at line 1678 is protected; the quantize path should be too:
std::string index_file_path = FileHelper::MakeQuantizeVectorIndexPath( path_, column, segment_meta_->id(), quant_block_id); + if (FileHelper::FileExists(index_file_path)) { + LOG_WARN( + "Quantize index file[%s] already exists (possible crash residue); cleaning " + "and overwriting.", + index_file_path.c_str()); + FileHelper::RemoveFile(index_file_path); + } auto vector_indexer = merge_vector_indexer( index_file_path, column, *field_with_new_index_params, concurrency);src/db/index/segment/segment.cc, line 1715-1718 (link)The PR adds a
FileHelper::FileExists+FileHelper::RemoveFileguard before building index files in theQuantizeType::UNDEFINEDbranch (lines ~1634–1639) and the flat-vector sub-branch of the quantized case (lines ~1678–1683). However, the same guard is not applied here when building the quantized index file viaMakeQuantizeVectorIndexPath.If optimization crashes after the flat/raw vector index is successfully written but before this quantized index file is committed, the next recovery run will reallocate the same
quant_block_id, construct the same file path, and callmerge_vector_indexeron an already-existing (potentially partial) file. The resulting behavior—silently overwriting, failing, or producing a corrupt index—is undefined.The RABITQ path at line ~1787 has the same gap:
std::string index_file_path = FileHelper::MakeQuantizeVectorIndexPath( path_, column, segment_meta_->id(), quant_block_id); if (FileHelper::FileExists(index_file_path)) { LOG_WARN( "Index file[%s] already exists (possible crash residue); cleaning " "and overwriting.", index_file_path.c_str()); FileHelper::RemoveFile(index_file_path); } auto vector_indexer = merge_vector_indexer( index_file_path, column, *field_with_new_index_params, concurrency);Apply the same pattern at both the non-RABITQ quantized path (lines 1715–1718) and the RABITQ path (lines 1787–1790).
Reviews (4): Last reviewed commit: "Apply suggestion from @greptile-apps[bot..." | Re-trigger Greptile