Description
terinjokes@gmail.com
2023-11-12 02:42:12 UTC
Reporter mentioned on IRC they're using ZFS.

Few immediate questions:
* What version? zfs -V output please
* Did you enable the new block cloning pool feature in ZFS 2.2?
* What version of sys-apps/coreutils? (cp --version as well please)
* Please try to grab the build.log by running e.g. PORTAGE_LOGDIR="/var/log/portage" emerge -v1 dev-lang/go. It should put a log in /var/log/portage/build or so when it's done. (You have to do this for Portage to save a "successful" build log.)

Workaround:
* It's likely that setting USE="-native-extensions" on sys-apps/portage will work.

Notes:
* If this is what I think it is, this is _not_ a Portage or Go bug, but is instead another version of an insidious ZFS bug which pops up every so often.
* See also https://wiki.gentoo.org/wiki/User:Sam/Memorable_bugs_I_like_to_reference#Bugs_found_by_Portage.27s_native_file_copying.
* You can see https://github.com/openzfs/zfs/issues/11900#issuecomment-927568640 onwards for one previous incarnation where it manifested *extremely* similarly (with chunks of Go being replaced by zeroes).

Created attachment 874611 [details]
failing emerge with normal settings
Created attachment 874612 [details]
failing emerge without native-extensions
Thanks for taking a look. Your theory seems like it might be correct.

* $ zfs --version
  zfs-2.2.0-r0-gentoo
  zfs-kmod-2.2.0-r0-gentoo
* Yes, the zpool has been upgraded.
* sys-apps/coreutils-9.3-r3
* $ cp --version
  cp (GNU coreutils) 9.3
  Packaged by Gentoo (9.3-r3 (p0))
* Rebuilding portage with USE="-native-extensions" still results in corrupted files. Portage log attached.
* Building with portage's TMPDIR on tmpfs does not exhibit corruption on multiple test runs.

OK, that's consistent then (there's _two_ points of failure: 1) when the go build system runs `cp` within PORTAGE_TMPDIR, and 2) when Portage itself merges to the live filesystem from tmpdir (affected by native-extensions)).

Given you can consistently hit this, and the machine I previously used to consistently hit the previous problem is not running ZFS right now, would you mind reporting it upstream? I'm happy to help with grabbing needed info and such but it's important we get it addressed, especially with someone who can easily reproduce it involved.

Sure. Is there a good way to narrow it down to one of those two possibilities before I do so?

(In reply to terinjokes@gmail.com from comment #6)
> Sure. Is there a good way to narrow it down to one of those two
> possibilities before I do so?

(We discussed it on IRC after, ftr.)

Already a bunch of people seeing the same behaviour on your bug as well at https://github.com/openzfs/zfs/issues/15526...

just a +1

> * What version? zfs -V output please
zfs-2.2.0-rc3
zfs-kmod-2.2.0-rc3

# uname -r
6.1.42-serv

> * Did you enable the new block cloning pool feature in ZFS 2.2?
# zpool get feature@block_cloning
NAME  PROPERTY               VALUE    SOURCE
B100  feature@block_cloning  active   local   <- we are here
B102  feature@block_cloning  enabled  local

> * What version of sys-apps/coreutils? (cp --version as well please)
sys-apps/coreutils-9.4::gentoo USE="acl caps openssl xattr -gmp -hostname -kill -multicall (-nls) (-selinux) (-split-usr) -static -test -vanilla -verify-sig"
cp (GNU coreutils) 9.4
Packaged by Gentoo (9.4 (p0))

> * Please try to grab the build.log by running e.g. PORTAGE_LOGDIR="/var/log/portage" emerge -v1 dev-lang/go. It should put a log in /var/log/portage/build or so when it's done. (You have to do this for Portage to save a "successful" build log.)
> Workaround:
> * It's likely that setting USE="-native-extensions" on sys-apps/portage will work.
sys-apps/portage-3.0.55::gentoo USE="(ipc) xattr -apidoc -build -doc -gentoo-dev -native-extensions -rsync-verify (-selinux) -test" PYTHON_TARGETS="python3_11 -pypy3 -python3_10 -python3_12"

# file /usr/lib/go/pkg/tool/linux_amd64/* | grep data
/usr/lib/go/pkg/tool/linux_amd64/asm: data
/usr/lib/go/pkg/tool/linux_amd64/cgo: data
/usr/lib/go/pkg/tool/linux_amd64/compile: data
/usr/lib/go/pkg/tool/linux_amd64/covdata: ELF 64-bit LSB executable, x86-64, version [...] not stripped
/usr/lib/go/pkg/tool/linux_amd64/cover: data
/usr/lib/go/pkg/tool/linux_amd64/link: data
/usr/lib/go/pkg/tool/linux_amd64/vet: data

Pretty sure this is the third time that me using a tmpfs-backed PORTAGE_TMPDIR has saved me from a random silent-data-corruption bug in ZFS.

Is block_cloning a toggle somewhere? I know there's the pool feature flag, but is there a way to turn it off so it's not used anymore? I dug around in /proc and /sys, but I am not seeing a parameter file or anything that appears to control it. My FreeBSD systems have a sysctl tunable that controls whether block_cloning is used or not, even if the feature flag is enabled on the pool.

(In reply to Joshua Kinard from comment #9)
> Pretty sure this is the third time that me using a tmpfs-backed
> PORTAGE_TMPDIR has saved me from a random silent-data-corruption bug in ZFS.
>
> Is block_cloning a toggle somewhere? I know there's the pool feature flag,
> but is there a way to turn it off so it's not used anymore? I dug around in
> /proc and /sys, but I am not seeing a parameter file or anything that
> appears to control it. My FreeBSD systems have a sysctl tunable that
> controls whether block_cloning is used or not, even if the feature flag is
> enabled on the pool.

A bit too tired, so can't find the issues rn. Pretty sure I read it was decided against a toggle in Linux for 2.2.0. Might be introduced in 2.2.1.

One workaround is downgrading to coreutils-8.32.

(In reply to Joshua Kinard from comment #9)
> Pretty sure this is the third time that me using a tmpfs-backed
> PORTAGE_TMPDIR has saved me from a random silent-data-corruption bug in ZFS.
>

This won't fully help because copy_file_range (c_f_r) might be used when merging from PORTAGE_TMPDIR->live filesystem, but also, it could happen with anything else using c_f_r anyway.

> Is block_cloning a toggle somewhere? I know there's the pool feature flag,
> but is there a way to turn it off so it's not used anymore? I dug around in
> /proc and /sys, but I am not seeing a parameter file or anything that
> appears to control it. My FreeBSD systems have a sysctl tunable that
> controls whether block_cloning is used or not, even if the feature flag is
> enabled on the pool.

There is a new toggle being added at https://github.com/openzfs/zfs/pull/15529 but note that this _isn't_ sufficient to prevent the corruption.

See https://github.com/openzfs/zfs/issues/15526#issuecomment-1815457739 but also note that both this and the previous bug are ultimately to do with freshness of state in memory vs on disk and race conditions. The code is broken anyway and users can and will hit it.
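
The `file ... | grep data` check quoted earlier in this thread can be turned into a small reusable helper. What follows is only a sketch: the function name `list_non_elf` is made up for illustration, and it assumes every regular file in the given directory is supposed to be an ELF binary (which looks true for the Go tool directory listed above on Linux). It flags files whose first four bytes are not the ELF magic number, so it would miss zeroed chunks further into a file.

```shell
#!/bin/sh
# Print every regular file in the given directory whose first four bytes
# are not the ELF magic number (0x7f 'E' 'L' 'F').
# Assumption (not from the thread): all regular files in the directory
# are expected to be ELF binaries.
list_non_elf() {
    dir=$1
    for f in "$dir"/*; do
        # Skip directories, dangling symlinks and the unexpanded glob.
        [ -f "$f" ] || continue
        # Dump the first 4 bytes as hex and strip spaces/newlines,
        # giving e.g. "7f454c46" for an ELF file.
        magic=$(od -An -tx1 -N4 -- "$f" | tr -d ' \n')
        if [ "$magic" != "7f454c46" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example call (path taken from the comment above):
#   list_non_elf /usr/lib/go/pkg/tool/linux_amd64
```

An empty result only means the file headers look sane; it says nothing about the rest of each file.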
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=a6c49ddd0067b6e4a272a9b9c1f9ade21da535d9

commit a6c49ddd0067b6e4a272a9b9c1f9ade21da535d9
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-11-22 10:42:26 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-11-22 10:43:00 +0000

    sys-fs/zfs: add 2.2.1

    Note that it may not fix the issues reported entirely as the race still exists.

    Bug: https://bugs.gentoo.org/917224
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-fs/zfs/Manifest         |   2 +
 sys-fs/zfs/zfs-2.2.1.ebuild | 306 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 308 insertions(+)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e798aa5a89a092be0a82ed2302ada3d1b7951c21

commit e798aa5a89a092be0a82ed2302ada3d1b7951c21
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-11-22 10:41:57 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-11-22 10:43:00 +0000

    sys-fs/zfs-kmod: add 2.2.1

    Note that it may not fix the issues reported entirely as the race still exists.

    Bug: https://bugs.gentoo.org/917224
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-fs/zfs-kmod/Manifest              |   2 +
 sys-fs/zfs-kmod/zfs-kmod-2.2.1.ebuild | 217 ++++++++++++++++++++++++++++++++++
 sys-fs/zfs-kmod/zfs-kmod-9999.ebuild  |   2 +-
 3 files changed, 220 insertions(+), 1 deletion(-)

See https://github.com/openzfs/zfs/issues/15554 and in particular:
* https://github.com/openzfs/zfs/issues/15554#issuecomment-1822154030
* https://github.com/openzfs/zfs/issues/15554#issuecomment-1822435731
as a summary of where we are.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=d6a9c7f40ffb7f393a707b6d0face1c2f39d3901

commit d6a9c7f40ffb7f393a707b6d0face1c2f39d3901
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-11-22 19:12:13 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-11-22 19:15:00 +0000

    profiles: mask buggy zfs-2.2.0

    Further bugs with CoW via copy_file_range (bug #917224, https://github.com/openzfs/zfs/issues/15526). The issue is very similar to bug #815469.

    ZFS 2.2.1 has a workaround but if you haven't already upgraded your pool to use the new block cloning feature, consider using <zfs-2.2 for now.

    Bug: https://github.com/openzfs/zfs/issues/15526
    Bug: https://bugs.gentoo.org/815469
    Bug: https://bugs.gentoo.org/91722
    Signed-off-by: Sam James <sam@gentoo.org>

 profiles/package.mask | 8 ++++++++
 1 file changed, 8 insertions(+)

Is it possible to detect which files/folders/etc. are damaged by executing zdb?

(In reply to Mike from comment #15)
> Is it possible to detect which files/folders/etc. are damaged by executing
> zdb?

Update: the zdb pull request is still not merged:

https://github.com/openzfs/zfs/pull/15541

It may be very helpful.

(In reply to Mike from comment #16)
> (In reply to Mike from comment #15)
> > Is it possible to detect which files/folders/etc. are damaged by executing
> > zdb?
>
> Update: zdb pull request is still not merged:
>
> https://github.com/openzfs/zfs/pull/15541
>
> It may be very helpful.

Useful script: https://github.com/0x5c/zfs-bclonecheck

I suggest publishing a Gentoo news item (eselect news list) once the zdb pull request is merged. All Gentoo users running zfs 2.2.* should receive it. They should then run zfs-bclonecheck and manually re-create (e.g., re-emerge) all the corrupted files.

(In reply to Mike from comment #17)
Note that it's not comprehensive and there may be corrupted files not returned by it, as corruption can happen outside of cloning, it appears.
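
Since the clone-based check is not comprehensive, one independent (and also only partial) way to look for damage is to compare files against checksums recorded from a known-good copy. The sketch below is not something the thread prescribes. It assumes GNU `sha256sum` is available and that a manifest in its usual output format was created somewhere trustworthy (for example from a build done on tmpfs). `list_checksum_failures` is an illustrative name, not an existing tool.

```shell
#!/bin/sh
# Print the paths from a sha256sum-style manifest whose current contents
# no longer match the recorded checksum.
# $1: manifest created earlier with `sha256sum FILE... > manifest`
# Relative paths in the manifest are resolved against the current directory.
list_checksum_failures() {
    manifest=$1
    # LC_ALL=C keeps the "FAILED" marker untranslated;
    # --quiet suppresses the "OK" line for each file that still matches.
    LC_ALL=C sha256sum --quiet -c "$manifest" 2>/dev/null |
        sed -n 's/: FAILED$//p'
}

# Example (hypothetical manifest path):
#   list_checksum_failures /root/go-tools.sha256
```

Files that have gone missing are reported by `sha256sum` with a different message and are deliberately not listed here. Anything the function prints would need to be re-created, e.g. by re-emerging the owning package.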
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=ea74809fc56791c2f45fc46815a7d5a8fd462961

commit ea74809fc56791c2f45fc46815a7d5a8fd462961
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-11-24 21:48:39 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-11-24 21:51:35 +0000

    sys-fs/zfs-kmod: disable zfs_dmu_offset_next_sync tunable by default

    As a mitigation until more is understood and fixes are tested & reviewed, change the default of zfs_dmu_offset_next_sync from 1 to 0, as it was before 05b3eb6d232009db247882a39d518e7282630753 upstream.

    There are no reported cases of The Bug being hit with zfs_dmu_offset_next_sync=1: that does not mean this is a cure or a real fix, but it _appears_ to be at least effective in reducing the chances of it happening. By itself, it's a safe change anyway, so it feels worth us doing while we wait.

    Note that The Bug has been reproduced on 2.1.x as well, hence we do it for both 2.1.13 and 2.2.1.

    Bug: https://github.com/openzfs/zfs/issues/11900
    Bug: https://github.com/openzfs/zfs/issues/15526
    Bug: https://bugs.gentoo.org/917224
    Signed-off-by: Sam James <sam@gentoo.org>

 ...s_dmu_offset_next_sync-tunable-by-default.patch |  40 ++++
 ...s_dmu_offset_next_sync-tunable-by-default.patch |  43 ++++
 sys-fs/zfs-kmod/zfs-kmod-2.1.13-r1.ebuild          | 178 +++++++++++++++++
 sys-fs/zfs-kmod/zfs-kmod-2.2.1-r1.ebuild           | 218 +++++++++++++++++++++
 sys-fs/zfs-kmod/zfs-kmod-9999.ebuild               |   1 +
 5 files changed, 480 insertions(+)

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4301b22c2a2b3909bea574678b160ed4161c9009

commit 4301b22c2a2b3909bea574678b160ed4161c9009
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-11-24 22:13:18 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-11-24 22:13:18 +0000

    sys-fs/zfs-kmod: stabilize 2.1.13-r1 for amd64, arm64, ppc64

    Bug: https://bugs.gentoo.org/917224
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-fs/zfs-kmod/zfs-kmod-2.1.13-r1.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Adding a link to an article by The Register which gives a good overview of the bug, so I think it's quite suitable for the URL field.

See also: zfs-2.2.2 patchset by tonyhutter · Pull Request #15602 · openzfs/zfs

(The URL, which I can not yet post, includes 15602.)

Patchset in the oven for OpenZFS 2.2.2: https://github.com/openzfs/zfs/pull/15602

2.1.14 and 2.2.2 are out now. They will be in tree shortly.
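
For anyone wanting to confirm whether the zfs_dmu_offset_next_sync mitigation from the commit above is in effect on a running system, a small check along these lines may help. It is a sketch under stated assumptions: it relies on the usual Linux convention of module parameters appearing under /sys/module/zfs/parameters, and the helper names `zfs_param` and `mitigation_active` are invented for illustration. As the commit message says, a value of 0 only appears to reduce the chance of hitting the bug and is not a fix.

```shell
#!/bin/sh
# Print the current value of a ZFS kernel module parameter.
# $1: parameter name, e.g. zfs_dmu_offset_next_sync
# $2: parameter directory (optional; defaults to the usual sysfs location)
zfs_param() {
    name=$1
    dir=${2:-/sys/module/zfs/parameters}
    if [ -r "$dir/$name" ]; then
        cat "$dir/$name"
    else
        echo "parameter $name not readable under $dir (zfs module not loaded?)" >&2
        return 1
    fi
}

# Succeed only if zfs_dmu_offset_next_sync is currently 0.
# An optional directory argument is passed through to zfs_param.
mitigation_active() {
    [ "$(zfs_param zfs_dmu_offset_next_sync "$@" 2>/dev/null)" = "0" ]
}

# Example:
#   if mitigation_active; then echo "zfs_dmu_offset_next_sync=0"; fi
```

On a kernel module without the patched default, root can normally write 0 to the same parameter file, and an `options zfs zfs_dmu_offset_next_sync=0` line in a file under /etc/modprobe.d/ should keep the setting across module loads.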
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4caaee5dcb723d594ceae8fe4dc2f889ca13d0b0

commit 4caaee5dcb723d594ceae8fe4dc2f889ca13d0b0
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-12-01 03:25:33 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-12-01 03:25:33 +0000

    sys-fs/zfs-kmod: add 2.2.2

    Bug: https://bugs.gentoo.org/917224
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-fs/zfs-kmod/Manifest              |   2 +
 sys-fs/zfs-kmod/zfs-kmod-2.2.2.ebuild | 217 ++++++++++++++++++++++++++++++++++
 2 files changed, 219 insertions(+)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=f514cb6977d2532915365753e4be976b994acc4c

commit f514cb6977d2532915365753e4be976b994acc4c
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-12-01 03:24:49 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-12-01 03:24:49 +0000

    sys-fs/zfs: add 2.2.2

    Bug: https://bugs.gentoo.org/917224
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-fs/zfs/Manifest         |   2 +
 sys-fs/zfs/zfs-2.2.2.ebuild | 306 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 308 insertions(+)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=44de969fbb5705ebb658700ee0d5cc2da361a107

commit 44de969fbb5705ebb658700ee0d5cc2da361a107
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-12-01 03:21:52 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-12-01 03:21:52 +0000

    sys-fs/zfs: add 2.1.14

    Bug: https://bugs.gentoo.org/917224
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-fs/zfs/Manifest          |   2 +
 sys-fs/zfs/zfs-2.1.14.ebuild | 311 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 313 insertions(+)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e29451c2bdb20d489bff977e1892fdf4f0582c6b

commit e29451c2bdb20d489bff977e1892fdf4f0582c6b
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-12-01 03:21:34 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-12-01 03:21:44 +0000

    sys-fs/zfs-kmod: add 2.1.14

    Bug: https://bugs.gentoo.org/917224
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-fs/zfs-kmod/Manifest               |   2 +
 sys-fs/zfs-kmod/zfs-kmod-2.1.14.ebuild | 177 +++++++++++++++++++++++++++++++++
 2 files changed, 179 insertions(+)

I've been unable to reproduce this after upgrading to zfs-2.2.2

I think we're all done here.

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=33c9e9ebd07e6d158e6064a267db3876f3a1f130

commit 33c9e9ebd07e6d158e6064a267db3876f3a1f130
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-05-03 04:52:30 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-05-03 04:53:35 +0000

    sys-fs/zfs-kmod: add 2.2.4

    This release contains a fix for https://github.com/openzfs/zfs/issues/15933.

    Closes: https://bugs.gentoo.org/928518
    Bug: https://bugs.gentoo.org/815469
    Bug: https://bugs.gentoo.org/917224
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-fs/zfs-kmod/Manifest              |   2 +
 sys-fs/zfs-kmod/zfs-kmod-2.2.4.ebuild | 217 ++++++++++++++++++++++++++++++++++
 2 files changed, 219 insertions(+)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=54cd669fae4c965242510044481ac5df0beaaba0

commit 54cd669fae4c965242510044481ac5df0beaaba0
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-05-03 04:49:50 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-05-03 04:53:34 +0000

    sys-fs/zfs: add 2.2.4

    This release contains a fix for https://github.com/openzfs/zfs/issues/15933.

    Closes: https://bugs.gentoo.org/928518
    Bug: https://bugs.gentoo.org/815469
    Bug: https://bugs.gentoo.org/917224
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-fs/zfs/Manifest         |   2 +
 sys-fs/zfs/zfs-2.2.4.ebuild | 308 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 310 insertions(+)