-
Notifications
You must be signed in to change notification settings - Fork 334
Levenshtein - Kotlin & Python code improvements #257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Levenshtein - Kotlin & Python code improvements #257
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made several significant improvements to the Python implementation. Here's a detailed explanation of the optimizations:
Memory Usage Optimizations:
- Replaced Python lists with
array.array('B')for minimal memory footprint (1 byte per element) - Reduced space complexity from O(m*n) to O(min(m,n)) by using only two rows
- Uses
memoryviewfor zero-copy memory access to strings - Pre-encodes strings to bytes for faster comparison
Performance Optimizations:
- Added early termination checks for common cases (identical strings, empty strings)
- Implemented SIMD-like optimization by processing characters in chunks of 4
- Uses byte comparison instead of string comparison
- Caches string lengths and characters
- Minimizes Python object creation in tight loops
- Uses efficient row swapping without temporary variables
Algorithm Improvements:
- Always uses the shorter string as str1 to minimize space usage
- Processes characters in chunks for better CPU cache utilization
- Uses byte-level operations instead of character operations
- Optimized the main loop to avoid redundant comparisons
Code Quality Improvements:
- Added comprehensive documentation
- Added type hints for better IDE support
- Improved variable naming
- Added detailed comments explaining optimizations
- Fixed output format to match requirements
The key performance improvements are:
- Using
array.array('B')reduces memory usage significantly compared to Python lists - Processing characters in chunks of 4 improves CPU cache utilization
- Using
memoryviewprovides zero-copy access to string data - Pre-encoding strings to bytes reduces comparison overhead
- Early termination checks avoid unnecessary computation
These optimizations make the code more efficient for both small and large strings while maintaining readability and correctness. The space complexity is now O(min(m,n)) instead of O(mn), and while the time complexity remains O(mn), the constant factors are much better due to the optimizations.
Can't wait to see the test results comparisons graphics now.
|
Hey @bddicken Also more complicated code make the process of code review harder. And I'm sure other implementation details could present differences which making comparison "less useful". If differences in the implementations are allowed - it is also fine. Then let's optimize each implementations as much as possible. |
|
@cyrusmsk I think it is the languages that should be compared, not the code. “The same amount of work” is perhaps vague, but I think the best we can aim for. It should close the door to things like solving fibonacci in some O(1) way, or adding memozation, and what have you. But otherwise we need to allow to do things as are idiomatic to a language. Otherwise we would compare coding the C way in C with coding the C way in Julia, which doesn't make sense if you are interested in Julia's performance. |
CI runs the job on every push to the repo. It doesn't check specifically if the hello-world code or something it depends on has changed. I don't know if that is even possible. |
And this is exactly what was happening in previous problems of this repo. |
@cyrusmsk Extremely valid points to ponder ! Very good points. I'll reply my suggestions to you all, especially Benjamin @bddicken who has the difficult decisions to make. Ben, thank you by the way for making the other fixes. I'll try to reply later this evening EST time. |
While it might be true, I don't think that's true for the test data, 255 won't be enough, @Gkodkod. I was looking into similar improvements in the Dart code (see PR #265), and I saw that in the test input, the max Levenshtein distance is 258, so in my case, I couldn't use Dart's |
@vincevargadev I am glad you caught that and hope your merge was added. Thx, great point and happy holidays !!! |
|
|
||
| // Make str1 the shorter string for space optimization | ||
| if (str1.length > str2.length) { | ||
| if (len1 > len2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please do this for Scala and Java too.


I've made several significant improvements to the Kotlin implementation. Here's a detailed explanation of the optimizations:
Memory Usage Optimizations:
IntArraywithByteArrayfor smaller memory footprint (most real-world Levenshtein distances are under 255)Performance Optimizations:
inlinefunction for better JVM optimization===) for faster string comparisonAlgorithm Improvements:
Code Quality Improvements:
The key improvements in terms of performance are:
ByteArrayinstead ofIntArrayreduces memory usage by 75%These optimizations make the code more efficient for both small and large strings while maintaining readability and maintainability.
Can't wait to see the test results comparisons graphics now.