Bug 834913 - dev-util/hdiffpatch - fast binary diff/patch tool with multi-threading (alternative to dev-util/xdelta)
Summary: dev-util/hdiffpatch - fast binary diff/patch tool with multi-threading (alter...
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: All Linux
Importance: Normal normal
Assignee: Default Assignee for New Packages
URL: https://github.com/sisong/HDiffPatch
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-10 12:00 UTC by Patrick
Modified: 2022-06-05 09:41 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Description Patrick 2022-03-10 12:00:22 UTC
https://github.com/sisong/HDiffPatch

I've compared the tool to xdelta, and it's light years ahead. Would be great to have it in Gentoo :)
Comment 1 Jonas Stein gentoo-dev 2022-03-10 23:06:20 UTC
Patrick, could you explain/list what is so much better? 

Note: 
* No other distribution has this package for now.
  https://repology.org/projects/?search=hdiffpatch

* Upstream handles its tickets in Chinese:
  https://github.com/sisong/HDiffPatch/issues
Comment 2 Patrick 2022-03-11 08:31:46 UTC
Hi Jonas, happy to :)

I wanted to create a diff from one large Docker image (22 GB) to another (23 GB), and expected a result in the low single-digit GB range, ideally even <1 GB, because I know lots of layers had been rearranged, but the actual data wasn't all that different.

So I tried xdelta and got the following results:

Command line                                       Diff (GB)    Time (h:mm:ss)
xdelta -e                                               8       0:50:29
xdelta -e -B524288000 -W16777216 -I0 -9                 7.7     1:22:03

Not happy with the result or the processing time, and seeing that it used only 1/8 of my cores, I went looking for better binary diff tools and found hdiffpatch, which produced the following results:

Command line                                       Diff (GB)    Time (h:mm:ss)
hdiffz -s-64m -p-8 -c-pzlib                             6       0:05:34
hdiffz -s-64k -p-8 -c-pzlib                             2.9     0:03:53
hdiffz -s-64 -p-8 -c-pzlib                              2.4     0:13:06  (>6GB RAM usage)
hdiffz -s-4k -p-8 -c-lzma2                              1.8     0:13:48  (~1.2 GB RAM usage)

I was quite happy with the last result, and also with the time it took to create the diff and to apply it (applying took 2m15s).
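
To put the two tables side by side, here is a rough sketch that expresses every run relative to the plain `xdelta -e` run. The sizes and times are copied from the tables above; the helper names (`to_seconds`, `compare`) are made up for this illustration and are not part of either tool.

```python
# Sketch only: compares the runs reported above against a baseline row.
# Helper names are hypothetical; the data is copied from the two tables.

def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# (command line, diff size in GB, wall time as h:mm:ss)
RESULTS = [
    ("xdelta -e", 8.0, "0:50:29"),
    ("xdelta -e -B524288000 -W16777216 -I0 -9", 7.7, "1:22:03"),
    ("hdiffz -s-64m -p-8 -c-pzlib", 6.0, "0:05:34"),
    ("hdiffz -s-64k -p-8 -c-pzlib", 2.9, "0:03:53"),
    ("hdiffz -s-64 -p-8 -c-pzlib", 2.4, "0:13:06"),
    ("hdiffz -s-4k -p-8 -c-lzma2", 1.8, "0:13:48"),
]

def compare(results, baseline_index=0):
    """Return (command, size ratio, speedup) for each row vs. the baseline row."""
    _, base_gb, base_time = results[baseline_index]
    base_s = to_seconds(base_time)
    return [(cmd, gb / base_gb, base_s / to_seconds(t)) for cmd, gb, t in results]

for cmd, size_ratio, speedup in compare(RESULTS):
    print(f"{cmd:<42} size x{size_ratio:.2f}  speed x{speedup:.1f}")
```

A size ratio below 1 means a smaller diff than the baseline, and a speed factor above 1 means a faster run. This only rearranges the numbers already listed; it does not account for RAM usage or for the time needed to apply the diff.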

I also found it peculiar that the author has lots of code comments in Chinese and maintains his own tickets in Chinese as well. He's probably not very comfortable in English, but still seems to be making an effort (README, help text, etc. are all in decent English, and when someone opens a ticket in English, he replies in English as well).

Given that the first commits are from 2013, I was also surprised not to find it packaged in any distro yet. But then again, he provides pre-built binaries that just work without any deps, so I guess whoever needed the tool just took his binaries directly...

I tried to build from source (plain Makefile) and hit an issue that was easy to fix. I've just reported it; see https://github.com/sisong/HDiffPatch/issues/279
Comment 3 Patrick 2022-03-11 13:41:21 UTC
(The problem I had building it was due to bzip2 lib + headers not being available on my Ubuntu box, btw.)