⚡️ Speed up function _uniquity_file by 27%
#18 · +5 −2
📄 27% (0.27x) speedup for `_uniquity_file` in `unstructured/metrics/utils.py`

⏱️ Runtime: 8.23 milliseconds → 6.48 milliseconds (best of 250 runs)

📝 Explanation and details
The optimization achieves a 26% speedup through two changes to regex handling in `_uniquity_file`.

What was optimized:

- Changed `re.match(pattern, f)` to `pattern = re.compile(...)` followed by `pattern.match(f)`, compiling the regex once upfront instead of re-resolving it for every file in the list.
- Split `sorted([f for f in file_list if re.match(pattern, f)], key=_sorting_key)` into two steps: first filter with a list comprehension, then sort the filtered result.

Why this is faster: calling `re.match` with a string pattern incurs a regex-cache lookup and argument handling on every call, whereas a precompiled pattern object skips that per-call overhead.
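A minimal sketch of the before/after shape of the change (the pattern string, sample file names, and the `_sorting_key` stub below are placeholders for illustration; the real definitions live in `unstructured/metrics/utils.py` and are not shown in this PR summary):

```python
import re

def _sorting_key(name):
    # Placeholder sort key; the real helper is defined in utils.py.
    return name

def uniquity_before(file_list, pattern_str=r".+\.txt$"):
    # Original shape: the string pattern is re-resolved on every re.match call.
    return sorted([f for f in file_list if re.match(pattern_str, f)], key=_sorting_key)

def uniquity_after(file_list, pattern_str=r".+\.txt$"):
    # Optimized shape: compile once, then filter and sort as two separate steps.
    pattern = re.compile(pattern_str)
    matched = [f for f in file_list if pattern.match(f)]
    return sorted(matched, key=_sorting_key)

files = ["b.txt", "a.txt", "c.md"]
assert uniquity_before(files) == uniquity_after(files) == ["a.txt", "b.txt"]
```

Both versions return the same filtered, sorted list; only the per-file regex cost changes.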
Performance impact analysis:
From the line profiler, the critical line went from 31.9ms (75.3% of total time) to 16.1ms + 2.9ms = 19.0ms total (54.8% of total time), a 40% improvement on the hottest code path.
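A quick way to reproduce this kind of comparison locally with `timeit` (the file names and pattern here are invented for the benchmark and are not taken from the PR):

```python
import re
import timeit

# Synthetic directory listing: half the names match, half do not.
files = [f"doc_{i}.txt" for i in range(5000)] + [f"img_{i}.png" for i in range(5000)]
pat_str = r"doc_.*\.txt$"

def uncompiled():
    # String pattern: re.match resolves it through the module cache each call.
    return [f for f in files if re.match(pat_str, f)]

compiled_pat = re.compile(pat_str)

def precompiled():
    # Pattern object: the per-call cache lookup is avoided.
    return [f for f in files if compiled_pat.match(f)]

assert uncompiled() == precompiled()
print("uncompiled: ", timeit.timeit(uncompiled, number=20))
print("precompiled:", timeit.timeit(precompiled, number=20))
```

The absolute timings depend on the machine, but the precompiled variant should consistently come out ahead on lists this size.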
When this optimization matters most:
Based on the annotated tests, the biggest gains occur with:
Function context impact:
Since `_uniquity_file` is called by `_get_non_duplicated_filename`, which processes directory listings, this optimization will significantly improve performance for directories containing many files, making file deduplication operations much faster in real-world scenarios.

✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
`metrics/test_utils.py::test_uniquity_file`

🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-_uniquity_file-mjckqvak` and push.