Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 635002 - sys-fs/zfs-kmod: ZFS on Linux reflink potentially causes empty/sparse files to be installed
Summary: sys-fs/zfs-kmod: ZFS on Linux reflink potentially causes empty/sparse files t...
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Richard Yao (RETIRED)
URL: https://github.com/zfsonlinux/zfs/iss...
Whiteboard:
Keywords:
Depends on:
Blocks: 635020
  Show dependency tree
 
Reported: 2017-10-21 15:50 UTC by Chris Henhawke
Modified: 2020-01-17 04:44 UTC (History)
3 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Henhawke 2017-10-21 15:50:50 UTC
Hello, I've been asked by ryao to post a gentoo bug for this issue which was reported by myself and a few others on the ZFS on Linux github.  It appears that there may be an issue with reflinks during package installation, which can lead to empty/sparse files being written when the package is installed.  My personal experience is that this can occur very randomly, however other reporters can trigger this more frequently.

Reproducible: Sometimes

Steps to Reproduce:
1. Install a gentoo system on ZFS with / and /var/tmp/portage on separate datasets
2. Install/upgrade packages

Actual Results:  
Things break because files wind up being filled with null bytes

Expected Results:  
Regular system operation

Users have reported that this appeared in the 0.7.x releases of ZFS, unsure if there were any changes to portage recently which may exacerbate this issue.
Comment 1 Zac Medico gentoo-dev 2017-10-21 20:48:07 UTC
Please strace it, in order to check whether it's using copy_file_range or sendfile. For example, see bug 621994, comment #5.
Comment 2 Chris Henhawke 2017-10-22 00:21:09 UTC
Looks like sendfile, last few lines...

strace python -c 'from portage.util.file_copy import copyfile; copyfile("/var/tmp/portage/test.file", "/usr/lib64/test.file")' &> strace.log

> lseek(4, 0, SEEK_DATA)                  = 0
> lseek(4, 0, SEEK_HOLE)                  = 6
> lseek(4, 0, SEEK_SET)                   = 0
> sendfile(5, 4, [0] => [6], 6)           = 6
> lseek(4, 6, SEEK_DATA)                  = -1 ENXIO (No such device or address)
> lseek(4, 0, SEEK_END)                   = 6
> ftruncate(5, 6)                         = 0
> close(5)
> close(4)
> rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fbc1a40eea0}, {sa_handler=0x7fbc1a76cf80, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fbc1a40eea0}, 8) = 0
> close(3)
> exit_group(0)
Comment 3 Zac Medico gentoo-dev 2017-10-23 00:54:47 UTC
We haven't had any issue involving sendfile with other filesystems, and the sendfile code is probably used by more users than anything else, since copy_file_range will return EXDEV for the most common configurations.

I'm guessing that there's a bug somewhere in the ZFS implementation of sendfile.
Comment 4 Zac Medico gentoo-dev 2017-10-23 01:06:53 UTC
It might be worth looking into how ZFS sendfile handles the *offset parameter in the even of an EINTR error (triggered by SIGSTOP/SIGCONT). If it increments *offset without actually copying the bytes, then that would cause the output file to contain null bytes for the range that was supposed to have been copied.
Comment 5 Zac Medico gentoo-dev 2017-10-23 01:18:42 UTC
Looking and the sendfile documentation, it looks like portage's interpretation of the *offset parameter is incorrect, since that variable represents the input file offset rather than the output file offset. I'll fix it to use sendfile
s return value to measure bytes written to the output file.
Comment 6 Zac Medico gentoo-dev 2017-10-23 02:24:42 UTC
You can try this patch from bug 635126:

https://patch-diff.githubusercontent.com/raw/gentoo/portage/pull/223.patch

Save that patch in /etc/portage/patches/sys-apps/portage/, and make sure you build portage with USE=native-extensions enabled.
Comment 7 Chris Henhawke 2017-10-23 23:58:49 UTC
A couple of the reporters said this didn't seem to help for them.  I can't tell either way.  Will keep testing.
Comment 8 Zac Medico gentoo-dev 2017-10-24 10:03:52 UTC
I have updated the patch to also use lseek to ensure that the output file is positioned at the correct offset.
Comment 9 Zac Medico gentoo-dev 2017-10-24 18:43:59 UTC
I've update the patch again, so that it no longer relies on the file offset of fd_in (eliminating some lseek calls).
Comment 10 George Diamantopoulos 2017-10-26 12:25:18 UTC
Unfortunately I'm still getting sparse files with v5 of Zac Medico's patch (/var/tmp/portage on separate ZFS dataset).
Comment 11 Zac Medico gentoo-dev 2017-10-27 20:53:34 UTC
(In reply to George Diamantopoulos from comment #10)
> Unfortunately I'm still getting sparse files with v5 of Zac Medico's patch
> (/var/tmp/portage on separate ZFS dataset).

A problem in the lseek SEEK_DATA/SEEK_HOLE implementation might cause this.
Comment 12 George Diamantopoulos 2017-11-01 12:48:33 UTC
I does appear to be a ZFS issue after all, there's a patch that fixes this for me at https://github.com/zfsonlinux/zfs/issues/3125#issuecomment-339747744 (applies cleanly as a user patch against zfs-kmod-0.7.2).
Comment 13 Chris Henhawke 2017-11-02 13:07:45 UTC
I think the ZFS folks have a patch that should be sufficient in conjunction with the patch you're proposing for portage.
Comment 14 Chris Henhawke 2017-11-15 18:22:33 UTC
https://github.com/zfsonlinux/zfs/commit/454365bbaacc153f98d2a3adaf33b13a6183d45d has been merged to the ZOL code base, we should be good now.
Comment 15 Chris Henhawke 2017-12-18 05:59:20 UTC
Hi, did this patch ever get added to portage?
Comment 16 Zac Medico gentoo-dev 2017-12-18 06:21:03 UTC
The portage patch for bug 635126 involving sendfile was released in portage-2.3.13.

It doesn't look like there's been an zfs-kmod version bump including the patch from https://github.com/zfsonlinux/zfs/issues/3125.
Comment 17 Chris Henhawke 2017-12-18 06:48:17 UTC
Fair enough, thanks for the confirmation.
Comment 18 Chris Henhawke 2019-05-26 06:17:50 UTC
Possible repeat on zfs 0.8.0 https://github.com/zfsonlinux/zfs/issues/8816

Disabling native-extensions seems to fix the issue

I haven't had the issue on my systems, but I haven't upgraded to kernel 5.x
Comment 19 Larry the Git Cow gentoo-dev 2019-05-29 22:15:03 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=abf3bbd8488b7eb5177cac9898ccc8ea6d963429

commit abf3bbd8488b7eb5177cac9898ccc8ea6d963429
Author:     Georgy Yakovlev <gyakovlev@gentoo.org>
AuthorDate: 2019-05-29 22:12:55 +0000
Commit:     Georgy Yakovlev <gyakovlev@gentoo.org>
CommitDate: 2019-05-29 22:14:29 +0000

    sys-fs/zfs-kmod: revbump 0.8.0 with critical patches
    
    Issue: https://github.com/zfsonlinux/zfs/issues/8816
    Issue: https://github.com/zfsonlinux/zfs/issues/8778
    Bug: https://bugs.gentoo.org/635002
    Package-Manager: Portage-2.3.67, Repoman-2.3.12
    Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org>

 ....0_Fix_integer_overflow_in_get_next_chunk.patch |  32 ++++
 .../zfs-kmod/files/0.8.0_revert_Report_holes.patch |  53 ++++++
 sys-fs/zfs-kmod/zfs-kmod-0.8.0-r1.ebuild           | 178 +++++++++++++++++++++
 3 files changed, 263 insertions(+)
Comment 20 Georgy Yakovlev archtester gentoo-dev 2019-06-15 08:58:31 UTC
We no longer have affected zfs version in the tree.