|Summary:||sys-apps/coreutils-8.10 copy silently fails on btrfs with compress on amd64 with <=linux-2.6.37|
|Product:||Gentoo Linux||Reporter:||Zac Medico <zmedico>|
|Component:||[OLD] Core system||Assignee:||Gentoo's Team for Core System packages <base-system>|
|Severity:||critical||CC:||bug, cochet.tina, dberkholz, hiyuh.root, jasiupsota, jlec, kernel, m.debruijne, mattm, mschiff, nirbheek, pchrist, rhill, travisghansen, vivo75|
|Package list:||Runtime testing required:||---|
Description Zac Medico 2011-02-06 22:09:54 UTC
With coreutils-8.10, the internal copy() function, which is used for critical tasks such as the 'install' command, fails silently with btrfs on amd64. This issue was previously mentioned in bug 353783, comment #5. It results in the installation of broken packages, which is an extremely serious problem.

The tests/cp/fiemap-2 test included with coreutils-8.10 may be useful for detecting this issue. On the same system, this test fails when run on btrfs but succeeds when run on tmpfs.

Portage 2.2.0_alpha20 (default/linux/amd64/10.0/desktop, gcc-4.5.2, glibc-2.13-r0, 2.6.37 x86_64)
=================================================================
System uname: Linux-2.6.37-x86_64-Intel-R-_Core-TM-2_Duo_CPU_T9300_@_2.50GHz-with-gentoo-2.0.1
Timestamp of tree: Sun, 06 Feb 2011 01:45:01 +0000
ccache version 3.1.4 [disabled]
app-shells/bash: 4.1_p9
dev-java/java-config: 2.1.11-r3
dev-lang/python: 2.6.6-r1, 3.1.3
dev-util/ccache: 3.1.4
dev-util/cmake: 2.8.3-r1
sys-apps/baselayout: 2.0.1-r1
sys-apps/openrc: 0.7.0
sys-apps/sandbox: 2.4
sys-devel/autoconf: 2.68
sys-devel/automake: 1.9.6-r3, 1.11.1
sys-devel/binutils: 2.20.1-r1
sys-devel/gcc: 4.4.5, 4.5.2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool: 2.4-r1
sys-devel/make: 3.82
virtual/os-headers: 18.104.22.168 (sys-kernel/linux-headers)
Comment 1 Zac Medico 2011-02-06 22:17:01 UTC
Created attachment 261685 [details] log of tests/cp/fiemap-2 failure on btrfs with amd64 linux-2.6.37
Comment 2 Zac Medico 2011-02-06 22:32:19 UTC
Also, tests/cp/fiemap-2 succeeds with btrfs on the same system/kernel when built and executed in a 32-bit i686 chroot.
Comment 3 SpanKY 2011-02-06 22:43:32 UTC
try using --sparse=never when running `cp` ...
Comment 4 Zac Medico 2011-02-06 23:44:49 UTC
If I modify the test like "for i in never; do", it still fails like this:

+ printf x
+ dd bs=1k seek=128 of=k
0+0 records in
0+0 records out
0 bytes (0 B) copied, 1.5923e-05 s, 0.0 kB/s
+ for append in no yes
+ test no = yes
+ for i in never
+ cp --sparse=never k k2
+ cmp k k2
k k2 differ: byte 1, line 1
+ fail=1
+ for append in no yes
+ test yes = yes
+ printf y
+ for i in never
+ cp --sparse=never k k2
+ cmp k k2
+ rm -f k
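The core of what the trace above exercises can be sketched as a small shell function: create a file with one byte of data followed by a hole, copy it with cp, and compare the two files. This is only an illustrative sketch, not the actual tests/cp/fiemap-2 script; the function name check_sparse_copy and its arguments are hypothetical, and it assumes GNU cp (for --sparse) and GNU dd.

```shell
#!/bin/sh
# check_sparse_copy DIR MODE
# Hypothetical helper (not part of coreutils): builds a file with a hole in
# DIR, copies it with "cp --sparse=MODE", and compares source and copy.
# Returns 0 if they match, 1 if they differ, 2 if a setup step failed.
check_sparse_copy() {
    dir=$1      # directory on the filesystem under test
    mode=$2     # value passed to cp --sparse= (auto, always, or never)
    src="$dir/k"
    dst="$dir/k2"
    rm -f "$src" "$dst"
    # One byte of real data at the start of the file.
    printf x > "$src" || return 2
    # With empty input, dd copies nothing but extends the file to 128 KiB,
    # leaving a hole after the first byte.
    dd bs=1k seek=128 of="$src" < /dev/null 2> /dev/null || return 2
    cp --sparse="$mode" "$src" "$dst" || return 2
    # cmp -s prints nothing and exits non-zero when the files differ.
    if cmp -s "$src" "$dst"; then rc=0; else rc=1; fi
    rm -f "$src" "$dst"
    return $rc
}
```

For example, `check_sparse_copy /mnt/btrfs_1g never` would report through its exit status whether a copy on that mount came out intact. Since the failure reported here is intermittent, a single passing run does not prove much.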
Comment 5 SpanKY 2011-02-07 01:09:53 UTC
seems to work fine for me on a small btrfs mount of mine. i only have ext4 fs's everywhere, so i had to create a small btrfs to test with.

dd if=/dev/zero of=f count=1 seek=1000000
losetup /dev/loop7 f
mkfs.btrfs /dev/loop7
mount /dev/loop7 /mnt/tmp/
cd /mnt/tmp
<copy over fiemap-2>
while ./fiemap-2 ; do :; done

doesn't fail for me
Comment 6 Zac Medico 2011-02-07 06:46:12 UTC
Now I've experimented with a variety of btrfs filesystems, and it turns out that I can only reproduce this for filesystems that are on logical volumes created by lvm2, and it only happens with particular volume groups on particular disks. I'm going to try recreating the physical volumes and volume groups on these disks, in order to see if it resolves the issue.
Comment 7 Zac Medico 2011-02-07 07:03:12 UTC
Actually, it's not just logical volumes. It happens with normal partitions too. However, the test can succeed in one run and fail in the next, so it's important to test multiple times. I've only seen it happen with the "compress" mount option enabled. When I've remounted the same partition with the "compress" option disabled, the test doesn't fail anymore.
Comment 8 Zac Medico 2011-02-07 07:19:25 UTC
I can reproduce it using a 1G btrfs filesystem created on a loopback device, when mounted with the "compress" option. The steps I use are like this:

dd if=/dev/zero of=/dev/shm/btrfs.img bs=1M count=0 seek=1024
mkfs.btrfs /dev/shm/btrfs.img
mount -t btrfs -o compress /dev/shm/btrfs.img /mnt/btrfs_1g
cp -a /var/tmp/portage/sys-apps/work/coreutils-8.10 /mnt/btrfs_1g
cd /mnt/btrfs_1g/coreutils-8.10/tests
cp/fiemap-2
cp/fiemap-2
cp/fiemap-2

Make sure to run cp/fiemap-2 multiple times, because failure is intermittent (though it seems to fail most of the time).
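Because the failure is intermittent, it can help to run the test a fixed number of times and count the failures instead of repeating the command by hand. The following is a minimal sketch; count_failures is a hypothetical helper name, not something shipped with coreutils or Portage.

```shell
#!/bin/sh
# count_failures RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD the given number of times, discards its
# output, and prints how many of the runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" > /dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example, run from the coreutils tests directory on the btrfs mount:
#   count_failures 20 cp/fiemap-2
```

A result of 0 after many runs is a reasonable sign that the filesystem is not affected; any non-zero count reproduces the problem.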
Comment 9 SpanKY 2011-02-07 21:38:55 UTC
thanks, i'll try that. and for the record, i'm running fiemap-2 in a while loop, so it runs many many times before i ctrl+c to kill it.
Comment 10 Travis Hansen 2011-02-08 05:34:42 UTC
If it helps at all my system is md raided and then that raid device shoved into an lvm.
Comment 11 Francesco Riosa 2011-02-09 20:58:17 UTC
just a "me too"

/ is on ext4
PORTAGE_TMPDIR is on /dev/md3 on /srv type btrfs (rw,noatime,compress,nodatasum)
md3 : active raid10 sdd7 sdc7 sdb7 sda7 sde7(S)
kernel is: 2.6.37-vs22.214.171.124-rc2
Comment 12 markus 2011-02-13 15:48:51 UTC
Hi,

shouldn't we mask coreutils-8.10, since it can definitely harm a system when, e.g., Portage's compile space is on btrfs? I just had some of Portage's own files filled with \x0s after an emerge portage, compiled on an md-raided, lvm2ed btrfs on an amd64 box.

greetings, markus
Comment 13 Diego Elio Pettenò (RETIRED) 2011-02-13 15:57:53 UTC
Not really, btrfs doesn't look like the kind of stuff we should mask packages for on a global level.
Comment 14 SpanKY 2011-02-13 19:08:33 UTC
what Diego said. a bit ironic coming from someone who is using a fs clearly labeled "unstable disk format". simply avoid the compress option for now.
Comment 15 SpanKY 2011-02-23 04:15:42 UTC
so they've found at least one bug in btrfs which causes this problem ...
Comment 17 Matthew Marlowe 2011-02-25 11:44:15 UTC
Quick comment: should this bug summary also be updated to note that LWN claims ext4 is also impacted, not just btrfs? And are we certain that 8.10 is the first coreutils that triggers it?
Comment 18 Jan Psota 2011-02-25 11:57:17 UTC
(In reply to comment #17)
> [...] And, are we certain that 8.10
> is the first coreutils that triggers it?
>

1. I use btrfs with -ocompress and identified such behaviour on 8.10. So I reverted to 8.9, which worked and still works correctly.
2. From NEWS of coreutils-8.10:

** New features

cp now copies sparse files efficiently on file systems with FIEMAP support (ext4, btrfs, xfs, ocfs2) [...]
Comment 19 Tina 2011-03-02 09:32:10 UTC
I just mentioned this in the btrfs IRC channel, and they say it should be fixed in the 2.6.38 kernel. Maybe someone can test and confirm this.
Comment 20 Tina 2011-03-02 09:33:55 UTC
In #btrfs they say this should be fixed with the 2.6.38 kernel. Maybe someone can confirm this?
Comment 21 Ryan Hill (RETIRED) 2011-03-02 19:41:00 UTC
It's in 2.6.38-rc7.

commit 4660ba63f1c4e07c20a435e084f12ba48a82bd2b
Merge: 958ede7 ec29ed5
Author: Linus Torvalds <firstname.lastname@example.org>
Date: Fri Feb 25 14:03:39 2011 -0800

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

    * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
      Btrfs: fix fiemap bugs with delalloc
      Btrfs: set FMODE_EXCL in btrfs_device->mode
      Btrfs: make btrfs_rm_device() fail gracefully
      Btrfs: Avoid accessing unmapped kernel address
      Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl
      Btrfs: allow balance to explicitly allocate chunks as it relocates
      Btrfs: put ENOSPC debugging under a mount option
Comment 22 Ryan Hill (RETIRED) 2011-03-02 19:42:26 UTC
ext4 patch is here, hasn't been pulled yet: http://www.spinics.net/lists/linux-ext4/msg23430.html
Comment 23 SpanKY 2011-03-17 01:53:30 UTC
people should upgrade to 2.6.38 and see if the issue is fixed for them. because it should be. i could add an `elog` when the active kernel version is before 2.6.38, but otherwise upstream doesnt seem too keen on trying to handle this in cp. any other suggestions ?
Comment 24 Donnie Berkholz (RETIRED) 2011-03-17 03:35:48 UTC
That seems reasonable; might even couple the .38 check with one for mounted btrfs/ext4 filesystems. Ever since the patch went in during a .38 rc, things have been fine for me.
Comment 25 SpanKY 2011-03-17 04:06:02 UTC
i havent had any problems with ext4, and that's what i run on my systems now. i guess i could do `grep -qs btrfs /etc/fstab /proc/mounts`. http://sources.gentoo.org/sys-apps/coreutils/coreutils-8.10.ebuild?r1=1.2&r2=1.3
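The check discussed in comments 23 to 25 (warn when the running kernel predates 2.6.38 and btrfs appears in /etc/fstab or /proc/mounts) could look roughly like the sketch below. It is not the code from the linked ebuild revision: the function names kernel_older_than and warn_if_affected are hypothetical, plain echo stands in for the ebuild's `elog`, and the comparison relies on `sort -V` from GNU coreutils. Version suffixes such as "-rc7" or "-gentoo" are stripped before comparing, so a release candidate is treated like its final release, which is a simplification.

```shell
#!/bin/sh
# kernel_older_than VERSION THRESHOLD
# Hypothetical helper: succeeds when VERSION sorts strictly before THRESHOLD.
# Everything from the first "-" onwards is ignored (e.g. "-rc7", "-gentoo").
kernel_older_than() {
    v=${1%%-*}
    t=${2%%-*}
    # sort -V orders dotted version numbers numerically, field by field.
    lowest=$(printf '%s\n%s\n' "$v" "$t" | sort -V | head -n 1)
    [ "$v" != "$t" ] && [ "$lowest" = "$v" ]
}

# warn_if_affected
# Hypothetical wrapper: prints a warning when the running kernel is older
# than 2.6.38 and btrfs is mentioned in /etc/fstab or /proc/mounts.
warn_if_affected() {
    if kernel_older_than "$(uname -r)" 2.6.38 &&
       grep -qs btrfs /etc/fstab /proc/mounts; then
        echo "Kernel older than 2.6.38 with btrfs in use: cp may corrupt files."
    fi
    return 0
}
```

Calling `warn_if_affected` prints nothing on an unaffected system. Whether to merely warn or to abort is the policy question raised later in the thread.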
Comment 26 Matthew Marlowe 2011-03-17 06:10:04 UTC
I'd be wary of unmasking anything that could cause corruption w/ ext4 filesystems if the latest gentoo hardened and sources kernel aren't patched to fix. Is there an urgent need to unmask this version of coreutils? I assume 2.6.38 didn't include the ext4 patch? Perhaps we can also look at having the gentoo-sources/gentoo-hardened kernel include the ext4 patch. Hopefully the above makes sense.
Comment 27 SpanKY 2011-03-17 07:12:21 UTC
coreutils isnt masked, nor are there plans to mask it
Comment 28 Francesco Riosa 2011-03-19 13:00:13 UTC
(In reply to comment #25)
> i havent had any problems with ext4, and that's what i run on my systems now.
>
> i guess i could do `grep -qs btrfs /etc/fstab /proc/mounts`.
>
> http://sources.gentoo.org/sys-apps/coreutils/coreutils-8.10.ebuild?r1=1.2&r2=1.3

I like it, but I would prefer the ebuild to die. Just my 0.02€
Comment 29 SpanKY 2011-04-05 00:56:42 UTC
Created attachment 268535 [details, diff]
100_all_coreutils-no-linux-sparse.patch

could people see if this makes things work for them with <2.6.39 ?
Comment 30 Zac Medico 2011-04-08 03:07:24 UTC
(In reply to comment #29)
> Created attachment 268535 [details, diff]
> 100_all_coreutils-no-linux-sparse.patch
>
> could people see if this makes things work for them with <2.6.39 ?

I've tested this patch with 126.96.36.199 and the cp/fiemap-2 test still fails. With 188.8.131.52, cp/fiemap-2 succeeds even without this patch.
Comment 31 SpanKY 2011-04-15 08:24:17 UTC
Comment on attachment 268535 [details, diff]
100_all_coreutils-no-linux-sparse.patch

you could give 8.11 a try ... it's in the tree
Comment 32 SpanKY 2011-04-26 20:01:28 UTC
coreutils-8.12 is in the tree now with even more changes related to this
Comment 33 SpanKY 2012-01-06 19:55:24 UTC
everything should have shaken itself out at this point