Skip to content

Assorted optimizations #85

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 25, 2018
Merged

Assorted optimizations #85

merged 7 commits into from
Jan 25, 2018

Conversation

josharian
Copy link
Contributor

Overall performance impact from this PR:

name                         old time/op    new time/op    delta
DiffCommonPrefix-8              147ns ± 1%     146ns ± 1%   -0.59%  (p=0.037 n=8+8)
DiffCommonSuffix-8              161ns ± 1%     155ns ± 1%   -3.23%  (p=0.000 n=9+8)
CommonLength/prefix/empty-8    4.08ns ± 2%    3.77ns ± 1%   -7.80%  (p=0.000 n=9+10)
CommonLength/prefix/short-8    5.29ns ± 2%    5.01ns ± 1%   -5.30%  (p=0.000 n=10+9)
CommonLength/prefix/long-8      604ns ± 2%     613ns ± 2%   +1.42%  (p=0.005 n=10+10)
CommonLength/suffix/empty-8    3.82ns ± 1%    3.68ns ± 1%   -3.82%  (p=0.000 n=10+10)
CommonLength/suffix/short-8    6.36ns ± 1%    5.90ns ± 1%   -7.15%  (p=0.000 n=9+10)
CommonLength/suffix/long-8     1.13µs ± 1%    0.91µs ± 1%  -19.72%  (p=0.000 n=9+10)
DiffHalfMatch-8                 107µs ± 1%     108µs ± 0%   +0.70%  (p=0.000 n=10+10)
DiffCleanupSemantic-8          11.7ms ± 2%     0.9ms ± 1%  -92.15%  (p=0.000 n=10+10)
DiffMain-8                      1.01s ± 0%     1.01s ± 0%   +0.19%  (p=0.000 n=9+9)
DiffMainLarge-8                 138ms ± 1%     102ms ± 2%  -26.20%  (p=0.000 n=8+10)
DiffMainRunesLargeLines-8       721µs ± 1%     515µs ± 1%  -28.56%  (p=0.000 n=9+10)

name                         old alloc/op   new alloc/op   delta
DiffCommonPrefix-8              0.00B          0.00B          ~     (all equal)
DiffCommonSuffix-8              0.00B          0.00B          ~     (all equal)
CommonLength/prefix/empty-8     0.00B          0.00B          ~     (all equal)
CommonLength/prefix/short-8     0.00B          0.00B          ~     (all equal)
CommonLength/prefix/long-8      0.00B          0.00B          ~     (all equal)
CommonLength/suffix/empty-8     0.00B          0.00B          ~     (all equal)
CommonLength/suffix/short-8     0.00B          0.00B          ~     (all equal)
CommonLength/suffix/long-8      0.00B          0.00B          ~     (all equal)
DiffHalfMatch-8                 106kB ± 0%     106kB ± 0%     ~     (all equal)
DiffCleanupSemantic-8          17.7MB ± 0%     0.2MB ± 0%  -99.08%  (p=0.000 n=6+9)
DiffMain-8                     16.4MB ± 0%    16.4MB ± 0%   -0.00%  (p=0.000 n=10+10)
DiffMainLarge-8                63.8MB ± 0%     4.8MB ± 0%  -92.46%  (p=0.000 n=10+7)
DiffMainRunesLargeLines-8       209kB ± 0%     174kB ± 0%  -16.82%  (p=0.000 n=10+10)

name                         old allocs/op  new allocs/op  delta
DiffCommonPrefix-8               0.00           0.00          ~     (all equal)
DiffCommonSuffix-8               0.00           0.00          ~     (all equal)
CommonLength/prefix/empty-8      0.00           0.00          ~     (all equal)
CommonLength/prefix/short-8      0.00           0.00          ~     (all equal)
CommonLength/prefix/long-8       0.00           0.00          ~     (all equal)
CommonLength/suffix/empty-8      0.00           0.00          ~     (all equal)
CommonLength/suffix/short-8      0.00           0.00          ~     (all equal)
CommonLength/suffix/long-8       0.00           0.00          ~     (all equal)
DiffHalfMatch-8                  2.00 ± 0%      2.00 ± 0%     ~     (all equal)
DiffCleanupSemantic-8           11.4k ± 0%      1.1k ± 0%  -90.22%  (p=0.000 n=10+10)
DiffMain-8                       89.0 ± 0%      84.0 ± 0%   -5.62%  (p=0.000 n=10+10)
DiffMainLarge-8                 55.3k ± 0%     46.3k ± 0%  -16.35%  (p=0.000 n=10+8)
DiffMainRunesLargeLines-8       1.19k ± 0%     1.08k ± 0%   -9.33%  (p=0.001 n=8+9)

There's still some fairly low hanging fruit remaining after this, but I'm out of time for the moment. Hopefully this is enough of an impact nevertheless to attract a reviewer.

Please note that I may be slow (like days or weeks) to respond to review comments, particularly if they are substantive.

This PR is designed to be read and reviewed commit-by-commit.

cc @maksimov @zimmski

The existing benchmark is dominated by the string to []rune conversion.
It's good to have a benchmark that includes this, since it is part
of the exposed API. However, during a diff, the conversion cost
has been paid, and it is the core of the implementation that matters.
commonPrefixLength is mainly simplification.
It might seem like it'd be better to determine
up front which string is shorter, so as to
eliminate a conditional from the inner loop.
But the compiler isn't currently smart enough
to do bounds check elimination in that case,
so we end up with a conditional anyway.
And these branches are highly predictable anyway,
and many calls to commonPrefixLength are for
short shared prefixes, so opt for simpler code.

The commonSuffixLength improvements come mainly
from eliminating the subtraction in the inner loop.

name                         old time/op  new time/op  delta
DiffCommonPrefix-8            146ns ± 1%   145ns ± 3%   -0.77%  (p=0.011 n=15+15)
DiffCommonSuffix-8            159ns ± 2%   153ns ± 2%   -3.49%  (p=0.000 n=15+15)
CommonLength/prefix/empty-8  4.03ns ± 2%  3.74ns ± 4%   -7.11%  (p=0.000 n=14+15)
CommonLength/prefix/short-8  5.29ns ± 1%  4.69ns ± 2%  -11.25%  (p=0.000 n=14+14)
CommonLength/prefix/long-8    603ns ± 2%   608ns ± 2%     ~     (p=0.050 n=15+15)
CommonLength/suffix/empty-8  3.82ns ± 2%  3.66ns ± 3%   -4.22%  (p=0.000 n=14+15)
CommonLength/suffix/short-8  6.36ns ± 2%  5.90ns ± 2%   -7.21%  (p=0.000 n=15+14)
CommonLength/suffix/long-8   1.14µs ± 3%  0.90µs ± 2%  -20.79%  (p=0.000 n=15+15)
The old code was a cute one-liner,
but it allocated needlessly, quite a lot.
Replace it with slightly more careful code
that only allocates and copies as necessary.


name                       old time/op    new time/op    delta
DiffCommonPrefix-8            135ns ± 2%     133ns ± 1%     ~     (p=0.213 n=10+9)
DiffCommonSuffix-8            142ns ± 3%     141ns ± 2%     ~     (p=0.173 n=10+9)
DiffHalfMatch-8               107µs ± 0%     107µs ± 0%     ~     (p=0.400 n=9+9)
DiffCleanupSemantic-8        11.4ms ± 1%     0.9ms ± 1%  -91.72%  (p=0.000 n=10+9)
DiffMain-8                    1.01s ± 0%     1.01s ± 0%     ~     (p=0.780 n=10+9)
DiffMainLarge-8               134ms ± 1%     101ms ± 4%  -24.45%  (p=0.000 n=9+9)
DiffMainRunesLargeLines-8     707µs ± 0%     681µs ± 2%   -3.61%  (p=0.000 n=9+10)

name                       old alloc/op   new alloc/op   delta
DiffCommonPrefix-8            0.00B          0.00B          ~     (all equal)
DiffCommonSuffix-8            0.00B          0.00B          ~     (all equal)
DiffHalfMatch-8               106kB ± 0%     106kB ± 0%     ~     (all equal)
DiffCleanupSemantic-8        17.7MB ± 0%     0.3MB ± 0%  -98.56%  (p=0.000 n=9+9)
DiffMain-8                   16.4MB ± 0%    16.4MB ± 0%   -0.00%  (p=0.000 n=10+10)
DiffMainLarge-8              63.8MB ± 0%     4.8MB ± 0%  -92.42%  (p=0.000 n=9+10)
DiffMainRunesLargeLines-8     209kB ± 0%     175kB ± 0%  -16.54%  (p=0.000 n=10+10)

name                       old allocs/op  new allocs/op  delta
DiffCommonPrefix-8             0.00           0.00          ~     (all equal)
DiffCommonSuffix-8             0.00           0.00          ~     (all equal)
DiffHalfMatch-8                2.00 ± 0%      2.00 ± 0%     ~     (all equal)
DiffCleanupSemantic-8         11.4k ± 0%      3.1k ± 0%  -72.46%  (p=0.000 n=10+10)
DiffMain-8                     89.0 ± 0%      83.0 ± 0%   -6.74%  (p=0.000 n=10+10)
DiffMainLarge-8               55.3k ± 0%     46.5k ± 0%  -15.94%  (p=0.000 n=10+10)
DiffMainRunesLargeLines-8     1.19k ± 0%     1.09k ± 0%   -8.60%  (p=0.000 n=10+8)
Nested appends generally lead to
unnecessary allocation and copying.

Unwind them in diffCompute.

Replace them in DiffCleanupSemantics with splice.

DiffHalfMatch-8               107µs ± 0%     108µs ± 1%   +0.70%  (p=0.000 n=10+10)
DiffCleanupSemantic-8        1.00ms ± 2%    0.97ms ± 0%   -2.71%  (p=0.000 n=10+8)
DiffMain-8                    1.01s ± 0%     1.01s ± 0%     ~     (p=0.236 n=8+9)
DiffMainLarge-8               110ms ± 1%     110ms ± 1%     ~     (p=1.000 n=9+8)
DiffMainRunesLargeLines-8     692µs ± 1%     693µs ± 1%     ~     (p=0.762 n=10+8)

name                       old alloc/op   new alloc/op   delta
DiffHalfMatch-8               106kB ± 0%     106kB ± 0%     ~     (all equal)
DiffCleanupSemantic-8         255kB ± 0%     177kB ± 0%  -30.76%  (p=0.000 n=9+10)
DiffMain-8                   16.4MB ± 0%    16.4MB ± 0%     ~     (all equal)
DiffMainLarge-8              4.84MB ± 0%    4.81MB ± 0%   -0.57%  (p=0.000 n=10+10)
DiffMainRunesLargeLines-8     175kB ± 0%     174kB ± 0%   -0.34%  (p=0.000 n=10+10)

name                       old allocs/op  new allocs/op  delta
DiffHalfMatch-8                2.00 ± 0%      2.00 ± 0%     ~     (all equal)
DiffCleanupSemantic-8         3.13k ± 0%     3.12k ± 0%   -0.06%  (p=0.000 n=10+10)
DiffMain-8                     83.0 ± 0%      83.0 ± 0%     ~     (all equal)
DiffMainLarge-8               46.5k ± 0%     46.3k ± 0%   -0.41%  (p=0.000 n=10+10)
DiffMainRunesLargeLines-8     1.09k ± 0%     1.08k ± 0%   -0.83%  (p=0.000 n=9+10)
Calling time.Now() is a bit expensive to do in a tight loop.
Check it only every 16th time.
This results in an average 0.16% time overrun when the deadline is hit,
which seems a small price to pay for up to a 25% speedup on the
actual work.

name                       old time/op    new time/op    delta
DiffHalfMatch-8               108µs ± 1%     108µs ± 0%     ~     (p=0.211 n=10+9)
DiffCleanupSemantic-8         973µs ± 0%     971µs ± 1%     ~     (p=0.673 n=8+9)
DiffMain-8                    1.01s ± 0%     1.01s ± 0%   +0.16%  (p=0.003 n=9+10)
DiffMainLarge-8               110ms ± 1%     102ms ± 3%   -7.44%  (p=0.000 n=8+10)
DiffMainRunesLargeLines-8     693µs ± 1%     515µs ± 1%  -25.70%  (p=0.000 n=8+9)

name                       old alloc/op   new alloc/op   delta
DiffHalfMatch-8               106kB ± 0%     106kB ± 0%     ~     (all equal)
DiffCleanupSemantic-8         177kB ± 0%     177kB ± 0%     ~     (all equal)
DiffMain-8                   16.4MB ± 0%    16.4MB ± 0%     ~     (all equal)
DiffMainLarge-8              4.81MB ± 0%    4.81MB ± 0%     ~     (p=0.764 n=10+9)
DiffMainRunesLargeLines-8     174kB ± 0%     174kB ± 0%   +0.01%  (p=0.014 n=10+10)

name                       old allocs/op  new allocs/op  delta
DiffHalfMatch-8                2.00 ± 0%      2.00 ± 0%     ~     (all equal)
DiffCleanupSemantic-8         3.12k ± 0%     3.12k ± 0%     ~     (all equal)
DiffMain-8                     83.0 ± 0%      83.0 ± 0%     ~     (all equal)
DiffMainLarge-8               46.3k ± 0%     46.3k ± 0%     ~     (p=1.000 n=10+10)
DiffMainRunesLargeLines-8     1.08k ± 0%     1.08k ± 0%     ~     (p=0.211 n=10+10)
This is easier to follow and more idiomatic.
It also offers a minor overall performance boost.

name                       old time/op    new time/op    delta
DiffHalfMatch-8               107µs ± 1%     108µs ± 0%   +0.91%  (p=0.000 n=9+9)
DiffCleanupSemantic-8         968µs ± 1%     921µs ± 1%   -4.87%  (p=0.000 n=9+10)
DiffMain-8                    1.01s ± 0%     1.01s ± 0%     ~     (p=0.353 n=10+10)
DiffMainLarge-8               102ms ± 2%     101ms ± 1%     ~     (p=0.842 n=10+9)
DiffMainRunesLargeLines-8     515µs ± 1%     515µs ± 1%     ~     (p=0.400 n=9+10)

name                       old alloc/op   new alloc/op   delta
DiffHalfMatch-8               106kB ± 0%     106kB ± 0%     ~     (all equal)
DiffCleanupSemantic-8         177kB ± 0%     163kB ± 0%   -7.81%  (p=0.000 n=10+10)
DiffMain-8                   16.4MB ± 0%    16.4MB ± 0%   +0.00%  (p=0.000 n=9+10)
DiffMainLarge-8              4.81MB ± 0%    4.81MB ± 0%     ~     (p=1.000 n=10+10)
DiffMainRunesLargeLines-8     174kB ± 0%     174kB ± 0%     ~     (p=0.810 n=10+10)

name                       old allocs/op  new allocs/op  delta
DiffHalfMatch-8                2.00 ± 0%      2.00 ± 0%     ~     (all equal)
DiffCleanupSemantic-8         3.12k ± 0%     1.11k ± 0%  -64.48%  (p=0.000 n=10+10)
DiffMain-8                     83.0 ± 0%      84.0 ± 0%   +1.20%  (p=0.000 n=9+10)
DiffMainLarge-8               46.3k ± 0%     46.3k ± 0%   -0.08%  (p=0.000 n=10+10)
DiffMainRunesLargeLines-8     1.08k ± 0%     1.08k ± 0%     ~     (all equal)
@sergi
Copy link
Owner

sergi commented Jan 25, 2018

Hi @josharian,

Thanks so much for this PR, this is great work. My apologies for getting back to you so late, but better late than never!

@sergi sergi self-assigned this Jan 25, 2018
Copy link
Owner

@sergi sergi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great optimizations!

@sergi sergi merged commit f3948f6 into sergi:master Jan 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants