Bug 550384 - sys-fs/duperemove fails to deduplicate
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Michał Górny
Reported: 2015-05-25 22:00 UTC by junkmail
Modified: 2020-10-31 06:20 UTC
CC List: 1 user

Attachments
Output of emerge --info (info.txt,16.10 KB, text/plain)
2015-05-25 22:00 UTC, junkmail

Description junkmail 2015-05-25 22:00:50 UTC
Created attachment 403956
Output of emerge --info

duperemove doesn't appear to deduplicate files as expected.

I made a small loop device to test btrfs deduplication, mounted at /mnt/gentoo.

I formatted my /tmp partition with btrfs to test with a few files for deduplication.

It looks like duperemove is correctly identifying the files to deduplicate, but not actually doing it:

# duperemove -drh /tmp/test
Using 128K blocks
Using hash: SHA256  
Using 4 threads for file hashing phase
csum: /tmp/test/1/video.wmv     [1/3]
csum: /tmp/test/2/video.wmv     [2/3]
csum: /tmp/test/video.wmv       [3/3]
Hashed 84 blocks, resulting in 28 unique hashes. Calculating duplicate extents - this may take some time.
[########################################]
Search completed with no errors.             
Simple read and compare of file data found 1 instances of extents that might benefit from deduplication.
Start           Length          Filename (3 extents)
0.0     3.5M    "/tmp/test/video.wmv"
0.0     3.5M    "/tmp/test/1/video.wmv"
0.0     3.5M    "/tmp/test/2/video.wmv"
Dedupe 2 extents with target: (0.0, 3.5M), "/tmp/test/video.wmv"
Kernel processed data (excludes target files): 7.0M
Comparison of extent info shows a net change in shared extents of: 0.0

I've also tried formatting a loop device with btrfs, and got the same results.

Using duperemove version 0.09.3 and gentoo-sources 4.0.2.
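
For readers trying to interpret the "Hashed 84 blocks, resulting in 28 unique hashes" line in the output above, here is a minimal Python sketch of fixed-size block hashing. It is not duperemove's actual implementation; it only assumes what the output states (128K blocks, SHA256) and uses a synthetic 3.5 MiB stand-in for video.wmv. The names block_hashes, make_sample and summarize are hypothetical.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # matches "Using 128K blocks" in the output above


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return the SHA256 digest of each fixed-size block of data."""
    return [
        hashlib.sha256(data[off:off + block_size]).digest()
        for off in range(0, len(data), block_size)
    ]


def make_sample(num_blocks=28, block_size=BLOCK_SIZE):
    """Synthetic stand-in for video.wmv (assumption): num_blocks distinct blocks."""
    return b"".join(
        hashlib.sha256(str(i).encode()).digest() * (block_size // 32)
        for i in range(num_blocks)
    )


def summarize(files):
    """Return (total blocks hashed, number of unique block hashes)."""
    all_hashes = [h for data in files for h in block_hashes(data)]
    return len(all_hashes), len(set(all_hashes))


if __name__ == "__main__":
    sample = make_sample()  # 28 * 128 KiB = 3.5 MiB
    total, unique = summarize([sample, sample, sample])
    print(f"Hashed {total} blocks, resulting in {unique} unique hashes.")
```

Three identical 3.5 MiB files give 84 blocks but only 28 unique hashes, which is consistent with the tool having recognised all three copies as duplicates. The hashing phase therefore looks fine; the question is why the final line reports a net change in shared extents of 0.0.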
Comment 1 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2015-05-29 06:13:15 UTC
Sorry for asking.. but are you sure those files weren't deduped already by the fs? :)
Comment 2 junkmail 2015-05-31 08:21:12 UTC
Yes, I'm sure. I can fill up the filesystem with copies of the same file, and after deduplication it's still full.
Comment 3 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2017-09-02 09:21:46 UTC
I'm sorry for not being able to deal with this. Could you check whether the current version works?
Comment 4 junkmail 2017-09-16 18:57:30 UTC
I tried it again with duperemove-0.11_beta4 on kernel 4.13.2 and it sort of works, but not as I would expect.

I filled a 1000MiB loop device with one small video file (approx 25MB), and the free space as reported by KDiskFree went from 93.7MiB free to 185.8MiB free. There was nothing else on this device.
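
To put the free-space numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It is an estimate under loud assumptions: btrfs metadata overhead is ignored, the "approx 25MB" file is treated as 25 MiB, and an ideal dedupe is assumed to leave exactly one copy of the data. The function name dedupe_report is hypothetical.

```python
def dedupe_report(device_mib, file_mib, free_before_mib, free_after_mib):
    """Compare the space actually reclaimed with an idealised full dedupe.

    Assumptions (not from the tool): no metadata overhead, and a perfect
    dedupe keeps exactly one copy of the file.
    """
    used_before = device_mib - free_before_mib
    reclaimed = free_after_mib - free_before_mib
    ideal_reclaim = used_before - file_mib
    return {
        "used_before_mib": used_before,
        "approx_copies": used_before / file_mib,
        "reclaimed_mib": reclaimed,
        "ideal_reclaim_mib": ideal_reclaim,
        "fraction_of_ideal": reclaimed / ideal_reclaim,
    }


if __name__ == "__main__":
    report = dedupe_report(1000, 25, 93.7, 185.8)
    for key, value in report.items():
        print(f"{key}: {value:.2f}")
```

With the figures reported here, about 92 MiB was reclaimed out of roughly 880 MiB that an ideal dedupe of about 36 copies could have freed, i.e. around a tenth. That matches the "sort of works" description.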
Comment 5 Kent Fredric (IRC: kent\n) (RETIRED) gentoo-dev 2020-09-21 03:48:09 UTC
(In reply to junkmail from comment #4)
> I tried it again duperemove-0.11_beta4 on kernel 4.13.2 and it sort of
> works, but not as I would expect.
> 
> I filled a 1000MiB loop device with one small video file (approx 25MB), and
> the free space as reported by KDiskFree went from 93.7MiB free to 185.8MiB
> free. There was nothing else on this device.

I'd suggest first using a smaller test target with lower net entropy.

High entropy seems like the sort of thing that would thwart anything block-oriented.

That is, I suspect it "partially works", but your test case is pushing its limits.

/me wants to try it himself, but currently blocked by bug #707792
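
One possible way to build the smaller, lower-entropy test target suggested above is sketched below in Python. Everything in it is an assumption for illustration: the make_test_tree helper, the sample.bin file name, the directory layout and the repeating pattern are all hypothetical. The script only creates files; running duperemove on the result is left as a manual step.

```python
import os
import tempfile


def make_test_tree(root, copies=3, size=1024 * 1024,
                   pattern=b"duperemove-test\n"):
    """Write `copies` identical files of `size` bytes under root/<n>/sample.bin.

    The content is a short repeating pattern, so it has very low entropy.
    """
    repeats = size // len(pattern) + 1
    data = (pattern * repeats)[:size]
    paths = []
    for i in range(copies):
        subdir = os.path.join(root, str(i))
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, "sample.bin")
        with open(path, "wb") as fh:
            fh.write(data)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # A temporary directory is used here; for a real test, point root at a
    # directory on the btrfs filesystem under test instead.
    with tempfile.TemporaryDirectory() as root:
        for path in make_test_tree(root):
            print(path, os.path.getsize(path))
```

Note that with a 16-byte repeating pattern, every 128K block inside a single file is also identical to the others, so a block-level tool may report duplicates within each file as well as across the copies.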