Dramatically optimize algorithm in the common case by excluding match…#89
Dramatically optimize algorithm in the common case by excluding match…#89epatey wants to merge 2 commits intojflinter:masterfrom
Conversation
…ing heads and tails before using LCS. For example, in the case of single insert, the algorithm changes from O(m*n) to O(m+n). When the arrays contain 1,000 entries, for example, this change reduces the number of comparisons ~1,000,000 to ~2,000 and the size of the table used by the algorithm from ~1,000,000 to 2.
|
I haven't had a chance to actually measure the perf difference, but you could replace with something like: if it makes a perf difference. |
|
So I did the perf test. After I added the proper I still don't have an instinct around what operations drop you out of the lazy domain. |
|
|
||
| internal static func matchingEndsInfo<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> (Int, ArraySlice<Value>, ArraySlice<Value>) { | ||
| let minTotalCount = min(lhs.count, rhs.count) | ||
| let matchingHeadCount = zip(lhs, rhs).lazy.prefix() { $0.0 == $0.1 }.count() |
There was a problem hiding this comment.
These two lines feel a bit too clever for their own good - it's hard for me to understand what they're doing at a quick glance. I think using a couple of plain old for loops would be preferable here.
| /// Namespace for the `diff` and `apply` functions. | ||
| public enum Dwifft { | ||
|
|
||
| internal static func matchingEndsInfo<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> (Int, ArraySlice<Value>, ArraySlice<Value>) { |
There was a problem hiding this comment.
Even though it's an internal function, can you please add a docstring to this method to help with future debugging etc?
| /// - rhs: another, uh, array | ||
| /// - Returns: the series of transformations that, when applied to `lhs`, will yield `rhs`. | ||
| public static func diff<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> [DiffStep<Value>] { | ||
| let (matchingHeadCount, lhs, rhs) = matchingEndsInfo(lhs, rhs) |
There was a problem hiding this comment.
Nitpicking, but can you not shadow the lhs and rhs variable names here? Made it harder for me to understand how this compiled at first glance...
…ing heads and tails before using LCS. For example, in the case of single insert, the algorithm changes from O(m*n) to O(m+n). When the arrays contain 1,000 entries, for example, this change reduces the number of comparisons ~1,000,000 to ~2,000 and the size of the table used by the algorithm from ~1,000,000 to 2.