-
Notifications
You must be signed in to change notification settings - Fork 219
Assorted optimizations #85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The existing benchmark is dominated by the string to []rune conversion. It's good to have a benchmark that includes this, since it is part of the exposed API. However, during a diff, the conversion cost has been paid, and it is the core of the implementation that matters.
commonPrefixLength is mainly simplification. It might seem like it'd be better to determine up front which string is shorter, so as to eliminate a conditional from the inner loop. But the compiler isn't currently smart enough to do bounds check elimination in that case, so we end up with a conditional anyway. And these branches are highly predictable anyway, and many calls to commonPrefixLength are for short shared prefixes, so opt for simpler code. The commonSuffixLength improvements come mainly from eliminating the subtraction in the inner loop. name old time/op new time/op delta DiffCommonPrefix-8 146ns ± 1% 145ns ± 3% -0.77% (p=0.011 n=15+15) DiffCommonSuffix-8 159ns ± 2% 153ns ± 2% -3.49% (p=0.000 n=15+15) CommonLength/prefix/empty-8 4.03ns ± 2% 3.74ns ± 4% -7.11% (p=0.000 n=14+15) CommonLength/prefix/short-8 5.29ns ± 1% 4.69ns ± 2% -11.25% (p=0.000 n=14+14) CommonLength/prefix/long-8 603ns ± 2% 608ns ± 2% ~ (p=0.050 n=15+15) CommonLength/suffix/empty-8 3.82ns ± 2% 3.66ns ± 3% -4.22% (p=0.000 n=14+15) CommonLength/suffix/short-8 6.36ns ± 2% 5.90ns ± 2% -7.21% (p=0.000 n=15+14) CommonLength/suffix/long-8 1.14µs ± 3% 0.90µs ± 2% -20.79% (p=0.000 n=15+15)
The old code was a cute one-liner, but it allocated needlessly, quite a lot. Replace it with slightly more careful code that only allocates and copies as necessary. name old time/op new time/op delta DiffCommonPrefix-8 135ns ± 2% 133ns ± 1% ~ (p=0.213 n=10+9) DiffCommonSuffix-8 142ns ± 3% 141ns ± 2% ~ (p=0.173 n=10+9) DiffHalfMatch-8 107µs ± 0% 107µs ± 0% ~ (p=0.400 n=9+9) DiffCleanupSemantic-8 11.4ms ± 1% 0.9ms ± 1% -91.72% (p=0.000 n=10+9) DiffMain-8 1.01s ± 0% 1.01s ± 0% ~ (p=0.780 n=10+9) DiffMainLarge-8 134ms ± 1% 101ms ± 4% -24.45% (p=0.000 n=9+9) DiffMainRunesLargeLines-8 707µs ± 0% 681µs ± 2% -3.61% (p=0.000 n=9+10) name old alloc/op new alloc/op delta DiffCommonPrefix-8 0.00B 0.00B ~ (all equal) DiffCommonSuffix-8 0.00B 0.00B ~ (all equal) DiffHalfMatch-8 106kB ± 0% 106kB ± 0% ~ (all equal) DiffCleanupSemantic-8 17.7MB ± 0% 0.3MB ± 0% -98.56% (p=0.000 n=9+9) DiffMain-8 16.4MB ± 0% 16.4MB ± 0% -0.00% (p=0.000 n=10+10) DiffMainLarge-8 63.8MB ± 0% 4.8MB ± 0% -92.42% (p=0.000 n=9+10) DiffMainRunesLargeLines-8 209kB ± 0% 175kB ± 0% -16.54% (p=0.000 n=10+10) name old allocs/op new allocs/op delta DiffCommonPrefix-8 0.00 0.00 ~ (all equal) DiffCommonSuffix-8 0.00 0.00 ~ (all equal) DiffHalfMatch-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) DiffCleanupSemantic-8 11.4k ± 0% 3.1k ± 0% -72.46% (p=0.000 n=10+10) DiffMain-8 89.0 ± 0% 83.0 ± 0% -6.74% (p=0.000 n=10+10) DiffMainLarge-8 55.3k ± 0% 46.5k ± 0% -15.94% (p=0.000 n=10+10) DiffMainRunesLargeLines-8 1.19k ± 0% 1.09k ± 0% -8.60% (p=0.000 n=10+8)
Nested appends generally lead to unnecessary allocation and copying. Unwind them in diffCompute. Replace them in DiffCleanupSemantics with splice. DiffHalfMatch-8 107µs ± 0% 108µs ± 1% +0.70% (p=0.000 n=10+10) DiffCleanupSemantic-8 1.00ms ± 2% 0.97ms ± 0% -2.71% (p=0.000 n=10+8) DiffMain-8 1.01s ± 0% 1.01s ± 0% ~ (p=0.236 n=8+9) DiffMainLarge-8 110ms ± 1% 110ms ± 1% ~ (p=1.000 n=9+8) DiffMainRunesLargeLines-8 692µs ± 1% 693µs ± 1% ~ (p=0.762 n=10+8) name old alloc/op new alloc/op delta DiffHalfMatch-8 106kB ± 0% 106kB ± 0% ~ (all equal) DiffCleanupSemantic-8 255kB ± 0% 177kB ± 0% -30.76% (p=0.000 n=9+10) DiffMain-8 16.4MB ± 0% 16.4MB ± 0% ~ (all equal) DiffMainLarge-8 4.84MB ± 0% 4.81MB ± 0% -0.57% (p=0.000 n=10+10) DiffMainRunesLargeLines-8 175kB ± 0% 174kB ± 0% -0.34% (p=0.000 n=10+10) name old allocs/op new allocs/op delta DiffHalfMatch-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) DiffCleanupSemantic-8 3.13k ± 0% 3.12k ± 0% -0.06% (p=0.000 n=10+10) DiffMain-8 83.0 ± 0% 83.0 ± 0% ~ (all equal) DiffMainLarge-8 46.5k ± 0% 46.3k ± 0% -0.41% (p=0.000 n=10+10) DiffMainRunesLargeLines-8 1.09k ± 0% 1.08k ± 0% -0.83% (p=0.000 n=9+10)
Calling time.Now() is a bit expensive to do in a tight loop. Check it only every 16th time. This results in an average 0.16% time overrun when the deadline is hit, which seems a small price to pay for up to a 25% speedup on the actual work. name old time/op new time/op delta DiffHalfMatch-8 108µs ± 1% 108µs ± 0% ~ (p=0.211 n=10+9) DiffCleanupSemantic-8 973µs ± 0% 971µs ± 1% ~ (p=0.673 n=8+9) DiffMain-8 1.01s ± 0% 1.01s ± 0% +0.16% (p=0.003 n=9+10) DiffMainLarge-8 110ms ± 1% 102ms ± 3% -7.44% (p=0.000 n=8+10) DiffMainRunesLargeLines-8 693µs ± 1% 515µs ± 1% -25.70% (p=0.000 n=8+9) name old alloc/op new alloc/op delta DiffHalfMatch-8 106kB ± 0% 106kB ± 0% ~ (all equal) DiffCleanupSemantic-8 177kB ± 0% 177kB ± 0% ~ (all equal) DiffMain-8 16.4MB ± 0% 16.4MB ± 0% ~ (all equal) DiffMainLarge-8 4.81MB ± 0% 4.81MB ± 0% ~ (p=0.764 n=10+9) DiffMainRunesLargeLines-8 174kB ± 0% 174kB ± 0% +0.01% (p=0.014 n=10+10) name old allocs/op new allocs/op delta DiffHalfMatch-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) DiffCleanupSemantic-8 3.12k ± 0% 3.12k ± 0% ~ (all equal) DiffMain-8 83.0 ± 0% 83.0 ± 0% ~ (all equal) DiffMainLarge-8 46.3k ± 0% 46.3k ± 0% ~ (p=1.000 n=10+10) DiffMainRunesLargeLines-8 1.08k ± 0% 1.08k ± 0% ~ (p=0.211 n=10+10)
This is easier to follow and more idiomatic. It also offers a minor overall performance boost. name old time/op new time/op delta DiffHalfMatch-8 107µs ± 1% 108µs ± 0% +0.91% (p=0.000 n=9+9) DiffCleanupSemantic-8 968µs ± 1% 921µs ± 1% -4.87% (p=0.000 n=9+10) DiffMain-8 1.01s ± 0% 1.01s ± 0% ~ (p=0.353 n=10+10) DiffMainLarge-8 102ms ± 2% 101ms ± 1% ~ (p=0.842 n=10+9) DiffMainRunesLargeLines-8 515µs ± 1% 515µs ± 1% ~ (p=0.400 n=9+10) name old alloc/op new alloc/op delta DiffHalfMatch-8 106kB ± 0% 106kB ± 0% ~ (all equal) DiffCleanupSemantic-8 177kB ± 0% 163kB ± 0% -7.81% (p=0.000 n=10+10) DiffMain-8 16.4MB ± 0% 16.4MB ± 0% +0.00% (p=0.000 n=9+10) DiffMainLarge-8 4.81MB ± 0% 4.81MB ± 0% ~ (p=1.000 n=10+10) DiffMainRunesLargeLines-8 174kB ± 0% 174kB ± 0% ~ (p=0.810 n=10+10) name old allocs/op new allocs/op delta DiffHalfMatch-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) DiffCleanupSemantic-8 3.12k ± 0% 1.11k ± 0% -64.48% (p=0.000 n=10+10) DiffMain-8 83.0 ± 0% 84.0 ± 0% +1.20% (p=0.000 n=9+10) DiffMainLarge-8 46.3k ± 0% 46.3k ± 0% -0.08% (p=0.000 n=10+10) DiffMainRunesLargeLines-8 1.08k ± 0% 1.08k ± 0% ~ (all equal)
Hi @josharian, Thanks so much for this PR, this is great work. My apologies for getting back to you so late, but better late than never! |
sergi
approved these changes
Jan 25, 2018
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great optimizations!
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Overall performance impact from this PR:
There's still some fairly low hanging fruit remaining after this, but I'm out of time for the moment. Hopefully this is enough of an impact nevertheless to attract a reviewer.
Please note that I may be slow (like days or weeks) to respond to review comments, particularly if they are substantive.
This PR is designed to be read and reviewed commit-by-commit.
cc @maksimov @zimmski