⚡️ Speed up function is_inline_element by 303%
#7
📄 303% (3.03x) speedup for is_inline_element in unstructured/partition/html/transformations.py
⏱️ Runtime: 999 microseconds → 248 microseconds (best of 120 runs)
📝 Explanation and details
The optimization achieves a 303% speedup by moving the creation of constant collections out of the function and using more efficient Python operations.
Key optimizations applied:
Moved constants to module level: The inline_classes and inline_categories collections are now defined once at module load time as tuples, eliminating the overhead of recreating these collections on every function call. The line profiler shows this eliminated ~1.1ms of overhead (31.9% + 39.7% of original runtime).
Replaced any() with direct isinstance(): Instead of using a generator expression with any() to check class membership, the code now uses isinstance(ontology_element, inline_classes) directly with a tuple of classes. This is more efficient because isinstance() natively accepts a tuple of classes as its second argument.
Replaced any() with the in operator: The second check now uses ontology_element.elementType in inline_categories instead of a generator expression with any(). The in operator on tuples is implemented at the C level and avoids the per-item overhead of generator-based iteration.
Used tuples instead of lists: Tuples are immutable and slightly more memory-efficient than lists, which suits module-level constants. For collections this small, membership-test speed is roughly the same for both; the main gain comes from not rebuilding the collection on each call.
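The four changes above can be sketched side by side. This is a minimal illustration, not the code from transformations.py: the classes Hyperlink, Checkbox, Paragraph and Note, the category strings, and the _before/_after function names are all hypothetical stand-ins chosen for the example.

```python
# Hypothetical stand-ins for the ontology classes; the real classes and
# category values in the library may differ.
class OntologyElement:
    elementType = "default"


class Hyperlink(OntologyElement):
    pass


class Checkbox(OntologyElement):
    pass


class Paragraph(OntologyElement):
    pass


class Note(OntologyElement):
    elementType = "annotation"


def is_inline_element_before(ontology_element):
    # Both collections are rebuilt on every call.
    inline_classes = [Hyperlink, Checkbox]
    inline_categories = ["specialized_text", "annotation"]
    if any(isinstance(ontology_element, cls) for cls in inline_classes):
        return True
    if any(ontology_element.elementType == cat for cat in inline_categories):
        return True
    return False


# Built once at module load time, as tuples.
_INLINE_CLASSES = (Hyperlink, Checkbox)
_INLINE_CATEGORIES = ("specialized_text", "annotation")


def is_inline_element_after(ontology_element):
    # isinstance() accepts a tuple of classes; `in` scans the tuple in C.
    return (
        isinstance(ontology_element, _INLINE_CLASSES)
        or ontology_element.elementType in _INLINE_CATEGORIES
    )
```

Both versions return the same result for any element; only the per-call work differs.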
Performance impact in context:
Based on the function reference, is_inline_element() is called within a loop in can_unstructured_elements_be_merged() when processing HTML elements. Since HTML parsing often involves checking many elements, this optimization provides substantial benefits in document processing pipelines where this function may be called hundreds or thousands of times.
Test case insights:
The optimization shows consistent 150-340% speedups across all test scenarios, with particularly strong performance on large-scale tests (295-338% faster) where the constant overhead elimination compounds. Both basic element type checking and class inheritance checking benefit significantly, making this optimization valuable for diverse HTML parsing workloads.
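To check figures like these on a given machine, a micro-benchmark along the following lines could be used. It is a rough sketch built on the standard-library timeit module; the function names and the category values are illustrative assumptions and are not taken from the PR's generated tests.

```python
import timeit

# Hypothetical category values, defined once at import time.
CATEGORIES = ("specialized_text", "annotation", "metadata")


def rebuild_each_call(value):
    # Rebuilds the list and drives a generator on every call.
    categories = ["specialized_text", "annotation", "metadata"]
    return any(value == cat for cat in categories)


def module_constant(value):
    # Reuses the module-level tuple and the C-level membership test.
    return value in CATEGORIES


def time_both(number=100_000):
    """Return (seconds_for_rebuild, seconds_for_constant) over `number` calls."""
    slow = timeit.timeit(lambda: rebuild_each_call("metadata"), number=number)
    fast = timeit.timeit(lambda: module_constant("metadata"), number=number)
    return slow, fast


if __name__ == "__main__":
    slow, fast = time_both()
    print(f"rebuild each call: {slow:.4f}s, module constant: {fast:.4f}s")
```

Absolute timings depend on the interpreter version and hardware, so the ratio between the two numbers is the more meaningful result.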
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run git checkout codeflash/optimize-is_inline_element-mjccmo8h and push.