sort: fails to reorder binary lines containing NUL bytes #9306
…line tuning Add support for filtering non-printing and non-dictionary characters in sort keys, along with options to ignore case. Implement dynamic buffer size normalization and pipeline depth tuning based on user settings and file size for improved performance. Add new fields to LineData for caching filtered lines and UTF-8 data. This improves sort accuracy and efficiency for large file sorting scenarios.
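The key filtering described in this commit can be pictured with a minimal sketch. It is not the PR's code: `KeyFilter` and `filter_key` are hypothetical names, and the sketch assumes ASCII-only semantics modeled on `sort`'s dictionary-order, ignore-nonprinting, and ignore-case options.

```rust
/// Hypothetical option set; the field names are illustrative, not taken from the PR.
#[derive(Clone, Copy, Default)]
struct KeyFilter {
    dictionary_order: bool,    // keep only blanks and alphanumerics
    ignore_non_printing: bool, // keep only printable ASCII (0x20..=0x7E)
    ignore_case: bool,         // fold lowercase to uppercase
}

/// Build the filtered form of a sort key (ASCII-only assumption).
fn filter_key(key: &[u8], opts: KeyFilter) -> Vec<u8> {
    key.iter()
        .copied()
        .filter(|&b| {
            !opts.dictionary_order || b.is_ascii_alphanumeric() || b == b' ' || b == b'\t'
        })
        .filter(|&b| !opts.ignore_non_printing || (b' '..=b'~').contains(&b))
        .map(|b| if opts.ignore_case { b.to_ascii_uppercase() } else { b })
        .collect()
}
```

With all three options enabled, a key such as `a\0-B c` would be compared as `AB C`; with none enabled, the key passes through unchanged.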
Use utf8_cache to retrieve precomputed UTF-8 strings in fast lexicographic mode, falling back to standard from_utf8 conversion if not available. This reduces redundant UTF-8 validations and improves performance for repeated comparisons.
…function Format long chained method calls across multiple lines for better code clarity and maintainability.
…cations Refactor LineData to store filtered lines in a single Vec<u8> (filtered_lines_data) plus a Vec<(usize, usize)> of ranges, instead of a Vec<Vec<u8>>. Replace build_filtered_line with append_filtered_line_to, which appends into the shared buffer. This reduces per-line allocations and improves memory efficiency in sorting operations.
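A minimal sketch of the flat-buffer layout this commit describes, under stated assumptions: `FilteredLines`, `append_filtered_line`, and `get` are hypothetical names, and the ranges are assumed to be `(start, end)` byte offsets, which the commit message does not specify.

```rust
/// Hypothetical container: every filtered line lives in one shared buffer.
#[derive(Default)]
struct FilteredLines {
    data: Vec<u8>,               // concatenated filtered bytes of all lines
    ranges: Vec<(usize, usize)>, // assumed (start, end) offsets into `data`
}

impl FilteredLines {
    /// Append `line`, dropping bytes for which `keep` returns false,
    /// and record where the filtered bytes landed in the shared buffer.
    fn append_filtered_line(&mut self, line: &[u8], keep: impl Fn(u8) -> bool) {
        let start = self.data.len();
        self.data.extend(line.iter().copied().filter(|&b| keep(b)));
        self.ranges.push((start, self.data.len()));
    }

    /// Borrow the filtered bytes of the line at `index`.
    fn get(&self, index: usize) -> &[u8] {
        let (start, end) = self.ranges[index];
        &self.data[start..end]
    }
}
```

Compared with a `Vec<Vec<u8>>`, this layout grows one buffer instead of allocating once per line, at the cost of an extra range lookup on each access.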
… impl Use '_' instead of 'a' for the lifetime parameter in the impl block to simplify and modernize the code without changing behavior.
GNU testsuite comparison:
CodSpeed Performance Report
Merging #9306 will degrade performance by 3.67%.
…ache Remove the unused utf8_cache field from LineData struct and related code for caching UTF-8 strings during line parsing. This simplifies the lexicographic comparison logic in compare_by to always perform byte-level comparison, reducing code complexity and potential maintenance overhead without affecting sorting functionality. The previous cache was intended for faster lexical sorting but is no longer needed.
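To illustrate why byte-level comparison handles lines containing NUL bytes, here is a minimal sketch. `compare_lines` and `sort_lines` are hypothetical helpers, not the PR's `compare_by`; the sketch assumes newline-terminated input and plain lexicographic ordering of raw bytes.

```rust
use std::cmp::Ordering;

/// Compare two lines as raw bytes. No UTF-8 validation is involved,
/// so a NUL byte is ordered like any other byte value.
fn compare_lines(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

/// Split `input` on newlines and sort the lines bytewise.
fn sort_lines(input: &[u8]) -> Vec<&[u8]> {
    if input.is_empty() {
        return Vec::new();
    }
    // Ignore a single trailing newline so it does not yield an empty last line.
    let body = input.strip_suffix(b"\n").unwrap_or(input);
    let mut lines: Vec<&[u8]> = body.split(|&b| b == b'\n').collect();
    lines.sort_by(|a, b| compare_lines(a, b));
    lines
}
```

Because slices of `u8` compare lexicographically by byte value, the lines `b\0z`, `a\0y`, and `b\0a` would come out as `a\0y`, `b\0a`, `b\0z`, with the bytes after each NUL still taking part in the comparison.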
8a9e3fa to d8c228b
…utils into sort_fix_rebased
fix #9264