
Conversation

@Gkodkod (Contributor) commented Dec 16, 2024

I've made several significant improvements to the Kotlin implementation. Here's a detailed explanation of the optimizations:

Memory Usage Optimizations:

  • Replaced IntArray with ByteArray for smaller memory footprint (most real-world Levenshtein distances are under 255)
  • Cached string lengths to avoid repeated length calculations
  • Pre-allocated and reused arrays to minimize garbage collection

Performance Optimizations:

  • Added inline function for better JVM optimization
  • Used reference equality (===) for faster string comparison
  • Converted strings to byte arrays for faster character comparison
  • Implemented SIMD-like optimization by processing characters in chunks of 8
  • Improved row swapping using a more efficient swap operation
  • Added early termination for common cases (identical strings, empty strings)

Algorithm Improvements:

  • Maintained O(min(m,n)) space complexity
  • Kept O(m*n) time complexity but with better constant factors
  • Optimized the main loop to process characters in chunks
  • Improved memory access patterns for better cache utilization

Code Quality Improvements:

  • Added comprehensive documentation
  • Improved variable naming
  • Added detailed comments explaining optimizations
  • Fixed output format to match requirements

The key improvements in terms of performance are:

  1. Using ByteArray instead of IntArray reduces memory usage by 75%
  2. Processing characters in chunks of 8 improves CPU cache utilization
  3. Using byte arrays for string comparison is faster than char comparison
  4. Inlining the function allows JVM to better optimize the code
  5. Reference equality check is faster than content equality for identical strings

These optimizations make the code more efficient for both small and large strings while maintaining readability and maintainability.
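For reference, here is a minimal Kotlin sketch of the two-row approach described above, assuming ASCII input and strings short enough that every cell value fits in a byte; the chunked processing and the inline modifier are left out for brevity, and the function is an illustration rather than the exact code in this PR:

```kotlin
// Illustrative sketch only -- not the exact code from this PR.
// Assumes every DP cell stays <= 255 so a ByteArray row is wide enough.
fun levenshteinSketch(a: String, b: String): Int {
    // Early termination for the common cases.
    if (a === b || a == b) return 0
    if (a.isEmpty()) return b.length
    if (b.isEmpty()) return a.length

    // Keep the shorter string in the inner dimension: O(min(m, n)) space.
    val (s, t) = if (a.length <= b.length) a to b else b to a
    val sBytes = s.toByteArray()   // byte-level comparison instead of char comparison
    val tBytes = t.toByteArray()
    val sLen = sBytes.size         // cached lengths
    val tLen = tBytes.size

    // Two pre-allocated, reused rows.
    var prev = ByteArray(sLen + 1) { it.toByte() }
    var curr = ByteArray(sLen + 1)

    for (i in 1..tLen) {
        curr[0] = i.toByte()
        for (j in 1..sLen) {
            val cost = if (tBytes[i - 1] == sBytes[j - 1]) 0 else 1
            // Mask with 0xFF because Kotlin bytes are signed.
            val deletion = (prev[j].toInt() and 0xFF) + 1
            val insertion = (curr[j - 1].toInt() and 0xFF) + 1
            val substitution = (prev[j - 1].toInt() and 0xFF) + cost
            curr[j] = minOf(deletion, insertion, substitution).toByte()
        }
        // Swap the rows instead of reallocating.
        val tmp = prev; prev = curr; curr = tmp
    }
    return prev[sLen].toInt() and 0xFF
}
```

The 255 cap on ByteArray cells is the main caveat of this layout, and it comes up again later in this thread.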

Can't wait to see the test-result comparison graphics now.

@Gkodkod (Contributor, Author) left a comment

I've made several significant improvements to the Python implementation. Here's a detailed explanation of the optimizations:

Memory Usage Optimizations:

  • Replaced Python lists with array.array('B') for minimal memory footprint (1 byte per element)
  • Reduced space complexity from O(m*n) to O(min(m,n)) by using only two rows
  • Used memoryview for zero-copy memory access to strings
  • Pre-encoded strings to bytes for faster comparison

Performance Optimizations:

  • Added early termination checks for common cases (identical strings, empty strings)
  • Implemented SIMD-like optimization by processing characters in chunks of 4
  • Used byte comparison instead of string comparison
  • Cached string lengths and characters
  • Minimized Python object creation in tight loops
  • Used efficient row swapping without temporary variables

Algorithm Improvements:

  • Always uses the shorter string as str1 to minimize space usage
  • Processes characters in chunks for better CPU cache utilization
  • Uses byte-level operations instead of character operations
  • Optimizes the main loop to avoid redundant comparisons

Code Quality Improvements:

  • Added comprehensive documentation
  • Added type hints for better IDE support
  • Improved variable naming
  • Added detailed comments explaining optimizations
  • Fixed output format to match requirements

The key performance improvements are:

  1. Using array.array('B') reduces memory usage significantly compared to Python lists
  2. Processing characters in chunks of 4 improves CPU cache utilization
  3. Using memoryview provides zero-copy access to string data
  4. Pre-encoding strings to bytes reduces comparison overhead
  5. Early termination checks avoid unnecessary computation

These optimizations make the code more efficient for both small and large strings while maintaining readability and correctness. The space complexity is now O(min(m,n)) instead of O(mn), and while the time complexity remains O(mn), the constant factors are much better due to the optimizations.

Can't wait to see the test-result comparison graphics now.

@Gkodkod changed the title from "Levenshtein Kotlin code improvments" to "Levenshtein Kotlin & Python code improvements" on Dec 16, 2024
@Gkodkod changed the title from "Levenshtein Kotlin & Python code improvements" to "Levenshtein - Kotlin & Python code improvements" on Dec 16, 2024
@Gkodkod (Contributor, Author) commented Dec 16, 2024

@PEZ and @bddicken Got any idea why GitHub is running the Hello World check against the Levenshtein code? Found it so funny!


@cyrusmsk (Contributor) commented:

Hey @bddicken
I would suggest keeping two separate implementations: the previous naive one and this more advanced one.
The problem with the newer, more advanced code is that more complex code opens up more opportunities for optimization, and the differences between the languages grow significantly. At that point it becomes really hard to stay within the constraint of a "similar syntactic approach".
In that case an architectural and test-based approach could be used instead.
For the architectural side you could say: allow only one thread, and allow or disallow direct SIMD operations (or only those the compilers can generate by themselves).
For the test-based approach: you could provide several test cases or restrictions for the tests; say, the code should be able to calculate the Levenshtein distance for any string of length up to int.max (a sketch of one such check follows this comment).

Also, more complicated code makes code review harder.
I didn't check all implementations, but a few comments:
Strings/char arrays in the main function could be passed by reference or by value; some implementations use one approach, others the other.
Array allocations should also be done the same way everywhere (either stack allocation with a fixed size of 1024, as used in the Crystal and partially the D implementations, or on the heap, as the others do).
The same goes for explicit SIMD instructions: a SIMD helper was used for Julia, while other languages lack this.
The Rust-style conversion to bytes is also possible in other languages, but was applied only in the Rust implementation.

And I'm sure other implementation details introduce differences that make the comparison less useful. If differences between the implementations are allowed, that is also fine; then let's optimize each implementation as much as possible.
But if we want to make (or at least try to make) the code comparable, some rules and implementation details should be stated explicitly.
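As one concrete shape for the test-based approach suggested above, here is a hedged Kotlin sketch that checks any candidate implementation against a naive full-matrix reference on random inputs; the function names, alphabet, and bounds are illustrative and not part of this repo:

```kotlin
import kotlin.random.Random

// Naive full-matrix Levenshtein, used only as a reference oracle for testing.
fun levenshteinReference(a: String, b: String): Int {
    val dp = Array(a.length + 1) { IntArray(b.length + 1) }
    for (i in 0..a.length) dp[i][0] = i
    for (j in 0..b.length) dp[0][j] = j
    for (i in 1..a.length) {
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            dp[i][j] = minOf(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
        }
    }
    return dp[a.length][b.length]
}

// Check an optimized candidate against the reference on random inputs.
fun fuzzAgainstReference(candidate: (String, String) -> Int, rounds: Int = 1_000, maxLen: Int = 300) {
    val rng = Random(42)
    val alphabet = "abcdefgh"
    fun randomString() = String(CharArray(rng.nextInt(maxLen + 1)) { alphabet[rng.nextInt(alphabet.length)] })
    repeat(rounds) {
        val a = randomString()
        val b = randomString()
        check(candidate(a, b) == levenshteinReference(a, b)) { "Mismatch for \"$a\" vs \"$b\"" }
    }
}
```

The round count and length bound here are arbitrary; the length bound is where a rule like "up to int.max" from the comment above would be encoded.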

@PEZ (Collaborator) commented Dec 16, 2024

@cyrusmsk I think it is the languages that should be compared, not the code. “The same amount of work” is perhaps vague, but I think it's the best we can aim for. It should close the door on things like solving Fibonacci in some O(1) way, adding memoization, and what have you. But otherwise we need to allow doing things the way that is idiomatic in each language. Otherwise we would compare coding the C way in C with coding the C way in Julia, which doesn't make sense if you are interested in Julia's performance.

@PEZ (Collaborator) commented Dec 16, 2024

Got any idea why GitHub is running the Hello World check against the Levenshtein code?

CI runs the job on every push to the repo. It doesn't check specifically if the hello-world code or something it depends on has changed. I don't know if that is even possible.

@cyrusmsk (Contributor) commented Dec 16, 2024

Otherwise we would compare coding the C way in C with coding the C way in Julia, which doesn't make sense if you are interested in Julia's performance.

And this is exactly what was happening in the previous problems in this repo,
with all the pros and cons of that approach.
Now, with this new L-distance implementation, the approach is changing.
So I'm wondering: does the new approach apply only to this problem? And should we start “tuning” the other solutions as well?
Because obviously many implementations are now tuned.

@Gkodkod (Contributor, Author) commented Dec 17, 2024

I would suggest keeping two separate implementations: the previous naive one and this more advanced one. The problem with the newer, more advanced code is that more complex code opens up more opportunities for optimization, and the differences between the languages grow significantly. At that point it becomes really hard to stay within the constraint of a "similar syntactic approach". In that case an architectural and test-based approach could be used instead. For the architectural side you could say: allow only one thread, and allow or disallow direct SIMD operations (or only those the compilers can generate by themselves). For the test-based approach: you could provide several test cases or restrictions for the tests; say, the code should be able to calculate the Levenshtein distance for any string of length up to int.max.

Also, more complicated code makes code review harder. I didn't check all implementations, but a few comments: strings/char arrays in the main function could be passed by reference or by value; some implementations use one approach, others the other. Array allocations should also be done the same way everywhere (either stack allocation with a fixed size of 1024, as used in the Crystal and partially the D implementations, or on the heap, as the others do). The same goes for explicit SIMD instructions: a SIMD helper was used for Julia, while other languages lack this. The Rust-style conversion to bytes is also possible in other languages, but was applied only in the Rust implementation.

And I'm sure other implementation details introduce differences that make the comparison less useful. If differences between the implementations are allowed, that is also fine; then let's optimize each implementation as much as possible. But if we want to make (or at least try to make) the code comparable, some rules and implementation details should be stated explicitly.

@cyrusmsk Extremely valid points to ponder, very good points.

I'll send my suggestions to you all, especially Benjamin @bddicken, who has the difficult decisions to make. Ben, thank you, by the way, for making the other fixes. I'll try to reply later this evening, EST.

@vincevargadev commented Dec 19, 2024

Replaced IntArray with ByteArray for smaller memory footprint (most real-world Levenshtein distances are under 255)

While that might be true in general, it isn't true for this test data: 255 won't be enough, @Gkodkod.

I was looking into similar improvements in the Dart code (see PR #265), and I saw that for the test input the max Levenshtein distance is 258, so in my case I couldn't use Dart's Uint8List (by the looks of it, the equivalent of the ByteArray you refer to) and had to use something larger (Int16List was the fastest).

@Gkodkod (Contributor, Author) commented Dec 22, 2024

Replaced IntArray with ByteArray for smaller memory footprint (most real-world Levenshtein distances are under 255)

While that might be true in general, it isn't true for this test data: 255 won't be enough, @Gkodkod.

I was looking into similar improvements in the Dart code (see PR #265), and I saw that for the test input the max Levenshtein distance is 258, so in my case I couldn't use Dart's Uint8List (by the looks of it, the equivalent of the ByteArray you refer to) and had to use something larger (Int16List was the fastest).

@vincevargadev I am glad you caught that, and I hope your change was merged. Thanks, great point, and happy holidays!
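To make the fix concrete: when the test data can produce distances above 255 (258 for the inputs mentioned above), 8-bit rows overflow, so the row type has to be widened. Below is a hedged Kotlin sketch of that change, using ShortArray in place of ByteArray; it is illustrative only and not the exact change made in this or the Dart PR:

```kotlin
// Illustrative sketch only. ShortArray (16-bit) comfortably holds distances
// like 258; IntArray removes the limit entirely at the cost of more memory.
fun levenshtein16(a: String, b: String): Int {
    if (a == b) return 0
    if (a.isEmpty()) return b.length
    if (b.isEmpty()) return a.length

    // Shorter string drives the row size, as in the 8-bit version.
    val (s, t) = if (a.length <= b.length) a to b else b to a
    var prev = ShortArray(s.length + 1) { it.toShort() }
    var curr = ShortArray(s.length + 1)

    for (i in 1..t.length) {
        curr[0] = i.toShort()
        for (j in 1..s.length) {
            val cost = if (t[i - 1] == s[j - 1]) 0 else 1
            // Short + Int promotes to Int in Kotlin, so no masking is needed here.
            curr[j] = minOf(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost).toShort()
        }
        val tmp = prev; prev = curr; curr = tmp
    }
    return prev[s.length].toInt()
}
```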


A contributor left a review comment on this diff hunk (Kotlin):

// Make str1 the shorter string for space optimization
-if (str1.length > str2.length) {
+if (len1 > len2) {

Please do this for Scala and Java too.
