Bug 353907

Summary: sys-apps/coreutils-8.10 copy silently fails on btrfs with compress on amd64 with <=linux-2.6.37
Product: Gentoo Linux Reporter: Zac Medico <zmedico>
Component: [OLD] Core system   Assignee: Gentoo's Team for Core System packages <base-system>
Status: RESOLVED FIXED    
Severity: critical CC: bug, cochet.tina, dberkholz, hiyuh.root, jasiupsota, jlec, kernel, m.debruijne, mattm, mschiff, nirbheek, pchrist, rhill, travisghansen, vivo75
Priority: High    
Version: unspecified   
Hardware: All   
OS: Linux   
URL: http://lists.gnu.org/archive/html/bug-coreutils/2011-02/msg00045.html
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: log of tests/cp/fiemap-2 failure on btrfs with amd64 linux-2.6.37
100_all_coreutils-no-linux-sparse.patch

Description Zac Medico gentoo-dev 2011-02-06 22:09:54 UTC
With coreutils-8.10, the internal copy() function, which is used for critical tasks such as the 'install' command, fails silently with btrfs on amd64. This issue was previously mentioned in bug 353783, comment #5. It results in the installation of broken packages, which is an extremely serious problem.

Maybe tests/cp/fiemap-2, included with coreutils-8.10, is useful for detecting this issue. On the same system this test fails when run on btrfs, but succeeds when run on tmpfs.

Portage 2.2.0_alpha20 (default/linux/amd64/10.0/desktop, gcc-4.5.2, glibc-2.13-r0, 2.6.37 x86_64)
=================================================================
System uname: Linux-2.6.37-x86_64-Intel-R-_Core-TM-2_Duo_CPU_T9300_@_2.50GHz-with-gentoo-2.0.1
Timestamp of tree: Sun, 06 Feb 2011 01:45:01 +0000
ccache version 3.1.4 [disabled]
app-shells/bash:     4.1_p9
dev-java/java-config: 2.1.11-r3
dev-lang/python:     2.6.6-r1, 3.1.3
dev-util/ccache:     3.1.4
dev-util/cmake:      2.8.3-r1
sys-apps/baselayout: 2.0.1-r1
sys-apps/openrc:     0.7.0
sys-apps/sandbox:    2.4
sys-devel/autoconf:  2.68
sys-devel/automake:  1.9.6-r3, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.4.5, 4.5.2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.4-r1
sys-devel/make:      3.82
virtual/os-headers:  2.6.36.1 (sys-kernel/linux-headers)
Comment 1 Zac Medico gentoo-dev 2011-02-06 22:17:01 UTC
Created attachment 261685
log of tests/cp/fiemap-2 failure on btrfs with amd64 linux-2.6.37
Comment 2 Zac Medico gentoo-dev 2011-02-06 22:32:19 UTC
Also, tests/cp/fiemap-2 succeeds with btrfs on the same system/kernel when built and executed in a 32-bit i686 chroot.
Comment 3 SpanKY gentoo-dev 2011-02-06 22:43:32 UTC
try using --sparse=never when running `cp` ...
Comment 4 Zac Medico gentoo-dev 2011-02-06 23:44:49 UTC
If I modify the test's loop to "for i in never; do", so that only --sparse=never is exercised, it still fails like this:

+ printf x
+ dd bs=1k seek=128 of=k
0+0 records in
0+0 records out
0 bytes (0 B) copied, 1.5923e-05 s, 0.0 kB/s
+ for append in no yes
+ test no = yes
+ for i in never
+ cp --sparse=never k k2
+ cmp k k2
k k2 differ: byte 1, line 1
+ fail=1
+ for append in no yes
+ test yes = yes
+ printf y
+ for i in never
+ cp --sparse=never k k2
+ cmp k k2
+ rm -f k
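For anyone without the coreutils build tree at hand, the check visible in the trace above can be approximated by a standalone function. This is a hedged sketch reconstructed only from that xtrace output, not the upstream tests/cp/fiemap-2 script; the function name check_sparse_copy and the list of --sparse modes it tries are assumptions.

```shell
# check_sparse_copy DIR
# Hypothetical helper approximating the steps shown in the trace above;
# it is NOT the upstream tests/cp/fiemap-2 script.
# Returns 0 if every copy matched its source, 1 on a mismatch, 2 if DIR
# cannot be entered.
check_sparse_copy() {
  ( cd "$1" || exit 2
    fail=0
    rm -f k k2
    # One data byte at offset 0; dd reads nothing from /dev/null but still
    # extends the file to 128 KiB, leaving a hole after the first byte.
    printf x > k
    dd bs=1k seek=128 of=k < /dev/null 2> /dev/null
    for append in no yes; do
      # Second pass: append one byte so the file ends in data, not a hole.
      if [ "$append" = yes ]; then printf y >> k; fi
      for mode in always never auto; do
        cp --sparse="$mode" k k2 || fail=1
        if ! cmp -s k k2; then
          echo "mismatch: append=$append sparse=$mode" >&2
          fail=1
        fi
      done
    done
    rm -f k k2
    exit "$fail" )
}
```

For example, `check_sparse_copy /mnt/btrfs_1g` (the mount point used in comment 8) would run it on that file system; the subshell keeps the `cd` from affecting the caller.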
Comment 5 SpanKY gentoo-dev 2011-02-07 01:09:53 UTC
seems to work fine for me on a small btrfs mount of mine.  i only have ext4 fs's everywhere, so i had to create a small btrfs to test with.

dd if=/dev/zero of=f count=1 seek=1000000
losetup /dev/loop7 f
mkfs.btrfs /dev/loop7 
mount /dev/loop7 /mnt/tmp/
cd /mnt/tmp
<copy over fiemap-2>
while ./fiemap-2 ; do :; done

doesn't fail for me
Comment 6 Zac Medico gentoo-dev 2011-02-07 06:46:12 UTC
Now I've experimented with a variety of btrfs filesystems, and it turns out that I can only reproduce this for filesystems that are on logical volumes created by lvm2, and it only happens with particular volume groups on particular disks. I'm going to try recreating the physical volumes and volume groups on these disks, in order to see if it resolves the issue.
Comment 7 Zac Medico gentoo-dev 2011-02-07 07:03:12 UTC
Actually, it's not just logical volumes. It happens with normal partitions too. However, the test can succeed in one run and fail in the next, so it's important to test multiple times. I've only seen it happen with the "compress" mount option enabled. When I've remounted the same partition with the "compress" option disabled, the test doesn't fail anymore.
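Since the failure has so far only been seen with the "compress" mount option, it may help to list which btrfs mounts actually have it enabled. A minimal sketch, assuming the usual /proc/mounts layout (device, mount point, type, options, ...); the function name is hypothetical, and any option beginning with "compress" (for example compress-force) is treated as enabling compression.

```shell
# btrfs_compress_mounts [MOUNTS_FILE]
# Hypothetical helper: print the mount point of every btrfs file system
# whose option list contains an option starting with "compress".
# Reads /proc/mounts unless another file in the same format is given.
btrfs_compress_mounts() {
  awk '$3 == "btrfs" {
         # Wrap the options in commas so each option is matched whole.
         if (("," $4 ",") ~ /,compress[^,]*,/) print $2
       }' "${1:-/proc/mounts}"
}
```

Run against the mount shown in comment 11 (/dev/md3 on /srv with rw,noatime,compress,nodatasum), it would report /srv.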
Comment 8 Zac Medico gentoo-dev 2011-02-07 07:19:25 UTC
I can reproduce it using a 1G btrfs filesystem created on a loopback device, when mounted with the "compress" option. The steps I use are like this:

dd if=/dev/zero of=/dev/shm/btrfs.img bs=1M count=0 seek=1024
mkfs.btrfs /dev/shm/btrfs.img
mount -t btrfs -o compress /dev/shm/btrfs.img /mnt/btrfs_1g
cp -a /var/tmp/portage/sys-apps/coreutils-8.10/work/coreutils-8.10 /mnt/btrfs_1g
cd /mnt/btrfs_1g/coreutils-8.10/tests
cp/fiemap-2
cp/fiemap-2
cp/fiemap-2

Make sure to run cp/fiemap-2 multiple times, because failure is intermittent (though it seems to fail most of the time).
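Because the failure is intermittent, an unbounded `while ./fiemap-2 ; do :; done` loop (as in comment 5) only stops once it fails. A bounded variant that reports which run failed may be more convenient; this is a sketch, and the helper name and run count are arbitrary.

```shell
# repeat_until_fail COUNT COMMAND [ARGS...]
# Hypothetical helper: run COMMAND up to COUNT times. Stops at the first
# failure, reports the run number on stderr and returns 1; returns 0 if
# every run succeeded.
repeat_until_fail() {
  _count=$1
  shift
  _run=1
  while [ "$_run" -le "$_count" ]; do
    if ! "$@"; then
      echo "failed on run $_run of $_count: $*" >&2
      return 1
    fi
    _run=$((_run + 1))
  done
  echo "all $_count runs passed: $*" >&2
  return 0
}
```

For example, from /mnt/btrfs_1g/coreutils-8.10/tests, `repeat_until_fail 50 cp/fiemap-2` would run the test up to 50 times.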
Comment 9 SpanKY gentoo-dev 2011-02-07 21:38:55 UTC
thanks, i'll try that.  and for the record, i'm running fiemap-2 in a while loop, so it runs many many times before i ctrl+c to kill it.
Comment 10 Travis Hansen 2011-02-08 05:34:42 UTC
If it helps at all, my system is md-RAIDed, and that RAID device is then put under LVM.
Comment 11 Francesco Riosa 2011-02-09 20:58:17 UTC
just a "me too"

/ is on ext4

PORTAGE_TMPDIR is on 
/dev/md3 on /srv type btrfs (rw,noatime,compress,nodatasum)
md3 : active raid10 sdd7[5] sdc7[7] sdb7[6] sda7[4] sde7[8](S)

kernel is: 2.6.37-vs2.3.0.37-rc2
Comment 12 markus 2011-02-13 15:48:51 UTC
Hi, 
shouldn't we mask coreutils-8.10, since it can definitely harm a system when, e.g., Portage's compile space is on btrfs?

I just had some of Portage's own files filled with \x00 bytes after an emerge of portage, which was compiled on an md-RAIDed, LVM2-backed btrfs on an amd64 box.

greetings,
markus
Comment 13 Diego Elio Pettenò (RETIRED) gentoo-dev 2011-02-13 15:57:53 UTC
Not really, btrfs doesn't look like the kind of stuff we should mask packages for on a global level.
Comment 14 SpanKY gentoo-dev 2011-02-13 19:08:33 UTC
what Diego said.  a bit ironic coming from someone who is using a fs clearly labeled "unstable disk format".  simply avoid the compress option for now.
Comment 15 SpanKY gentoo-dev 2011-02-23 04:15:42 UTC
so they've found at least one bug in btrfs which causes this problem ...
Comment 16 Ryan Hill (RETIRED) gentoo-dev 2011-02-25 08:16:58 UTC
http://lwn.net/Articles/429345/ ?
Comment 17 Matthew Marlowe (RETIRED) gentoo-dev 2011-02-25 11:44:15 UTC
Quick comment -- should this bug summary also be updated to note that LWN claims ext4 is also impacted, not just btrfs? And are we certain that 8.10 is the first coreutils version that triggers it?
Comment 18 Jan Psota 2011-02-25 11:57:17 UTC
(In reply to comment #17)
> [...]  And, are we certain that 8.10
> is the first coreutils that triggers it?   
> 
1. I use btrfs with -ocompress and noticed this behaviour with 8.10.
   So I reverted to 8.9, which worked and still works correctly.

2. From NEWS of coreutils-8.10:
** New features
cp now copies sparse files efficiently on file systems with FIEMAP
support (ext4, btrfs, xfs, ocfs2) [...]
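The 8.10 feature quoted above concerns sparse files, i.e. files whose apparent size exceeds the space actually allocated to them. One rough way to check whether a given file has holes is to compare those two numbers; this is a minimal sketch assuming GNU stat's `-c` directives `%s` (size), `%b` (allocated blocks) and `%B` (bytes per block). The function name is hypothetical, and since it only looks at allocation it is a heuristic, not a FIEMAP query.

```shell
# is_sparse FILE
# Hypothetical helper: succeed if FILE has fewer bytes allocated on disk
# than its apparent size (i.e. it likely contains holes).
# Returns 2 if stat fails.
is_sparse() {
  _info=$(stat -c '%s %b %B' -- "$1") || return 2
  # Split "apparent-size allocated-blocks bytes-per-block" into $1 $2 $3.
  set -- $_info
  [ $(( $2 * $3 )) -lt "$1" ]
}
```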
Comment 19 Tina 2011-03-02 09:32:10 UTC
I just mentioned this in the btrfs IRC channel, and they say it should be fixed in the 2.6.38 kernel; maybe someone can test and confirm this.
Comment 20 Tina 2011-03-02 09:33:55 UTC
In #btrfs they say this should be fixed in the 2.6.38 kernel; maybe someone can confirm this?
Comment 21 Ryan Hill (RETIRED) gentoo-dev 2011-03-02 19:41:00 UTC
It's in 2.6.38-rc7.

commit 4660ba63f1c4e07c20a435e084f12ba48a82bd2b
Merge: 958ede7 ec29ed5
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Feb 25 14:03:39 2011 -0800

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
    
    * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
      Btrfs: fix fiemap bugs with delalloc
      Btrfs: set FMODE_EXCL in btrfs_device->mode
      Btrfs: make btrfs_rm_device() fail gracefully
      Btrfs: Avoid accessing unmapped kernel address
      Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl
      Btrfs: allow balance to explicitly allocate chunks as it relocates
      Btrfs: put ENOSPC debugging under a mount option
Comment 22 Ryan Hill (RETIRED) gentoo-dev 2011-03-02 19:42:26 UTC
ext4 patch is here, hasn't been pulled yet:

http://www.spinics.net/lists/linux-ext4/msg23430.html
Comment 23 SpanKY gentoo-dev 2011-03-17 01:53:30 UTC
people should upgrade to 2.6.38 and see if the issue is fixed for them, because it should be.

i could add an `elog` when the active kernel version is before 2.6.38, but otherwise upstream doesn't seem too keen on trying to handle this in cp.

any other suggestions?
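A kernel-version gate like the one suggested here needs a numeric comparison of the running kernel against 2.6.38. A minimal POSIX-shell sketch, assuming only the first three dot-separated numbers matter; everything from the first '-' is dropped, so 2.6.38-rc7 counts as 2.6.38, which fits comment 21's note that the btrfs fix is in -rc7. The function name is hypothetical.

```shell
# kernel_older_than REFERENCE [RELEASE]
# Hypothetical helper: succeed (return 0) if RELEASE, defaulting to the
# running kernel from `uname -r`, is numerically older than REFERENCE.
# Only the first three dot-separated numbers are compared, and any suffix
# from the first '-' on (e.g. "-gentoo", "-rc7") is ignored.
kernel_older_than() {
  _rel=${2:-$(uname -r)}
  awk -v a="${_rel%%-*}" -v b="${1%%-*}" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      # Missing components compare as 0 ("3.0" behaves like "3.0.0").
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1  # equal versions are not older
  }'
}
```

For example, `kernel_older_than 2.6.38 2.6.37.4` succeeds while `kernel_older_than 2.6.38 2.6.38.2` fails, matching the two kernels tested in comment 30.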
Comment 24 Donnie Berkholz (RETIRED) gentoo-dev 2011-03-17 03:35:48 UTC
That seems reasonable; might even couple the .38 check with one for mounted btrfs/ext4 filesystems. Ever since the patch went in during a .38 rc, things have been fine for me.
Comment 25 SpanKY gentoo-dev 2011-03-17 04:06:02 UTC
i haven't had any problems with ext4, and that's what i run on my systems now.

i guess i could do `grep -qs btrfs /etc/fstab /proc/mounts`.

http://sources.gentoo.org/sys-apps/coreutils/coreutils-8.10.ebuild?r1=1.2&r2=1.3
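`grep -qs btrfs` matches the string anywhere in either file, including a commented-out fstab line or a mount point merely named after btrfs. A slightly stricter sketch, assuming both files keep the file-system type in the third whitespace-separated field (as the fstab and /proc/mounts formats do); the function name is hypothetical.

```shell
# uses_btrfs [FILE...]
# Hypothetical helper: succeed if any non-comment line of the given files
# (default: /etc/fstab and /proc/mounts) has "btrfs" as its third field,
# i.e. as the file-system type. Unreadable files are skipped silently.
uses_btrfs() {
  [ "$#" -gt 0 ] || set -- /etc/fstab /proc/mounts
  for _f in "$@"; do
    [ -r "$_f" ] || continue
    awk '$1 !~ /^#/ && $3 == "btrfs" { found = 1 }
         END { exit !found }' "$_f" && return 0
  done
  return 1
}
```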
Comment 26 Matthew Marlowe (RETIRED) gentoo-dev 2011-03-17 06:10:04 UTC
I'd be wary of unmasking anything that could cause corruption on ext4 filesystems if the latest gentoo-sources and hardened-sources kernels aren't patched to fix it. Is there an urgent need to unmask this version of coreutils? I assume 2.6.38 didn't include the ext4 patch? Perhaps we could also have the gentoo-sources/hardened-sources kernels include the ext4 patch. Hopefully the above makes sense.
Comment 27 SpanKY gentoo-dev 2011-03-17 07:12:21 UTC
coreutils isn't masked, nor are there plans to mask it
Comment 28 Francesco Riosa 2011-03-19 13:00:13 UTC
(In reply to comment #25)
> i havent had any problems with ext4, and that's what i run on my systems now.
> 
> i guess i could do `grep -qs btrfs /etc/fstab /proc/mounts`.
> 
> http://sources.gentoo.org/sys-apps/coreutils/coreutils-8.10.ebuild?r1=1.2&r2=1.3

I like it, but I would prefer the ebuild to die. Just my 0.02€
Comment 29 SpanKY gentoo-dev 2011-04-05 00:56:42 UTC
Created attachment 268535
100_all_coreutils-no-linux-sparse.patch

could people see if this makes things work for them with <2.6.39 ?
Comment 30 Zac Medico gentoo-dev 2011-04-08 03:07:24 UTC
(In reply to comment #29)
> Created attachment 268535
> 100_all_coreutils-no-linux-sparse.patch
> 
> could people see if this makes things work for them with <2.6.39 ?

I've tested this patch with 2.6.37.4 and the cp/fiemap-2 test still fails. With 2.6.38.2, cp/fiemap-2 succeeds even without this patch.
Comment 31 SpanKY gentoo-dev 2011-04-15 08:24:17 UTC
Comment on attachment 268535
100_all_coreutils-no-linux-sparse.patch

you could give 8.11 a try ... it's in the tree
Comment 32 SpanKY gentoo-dev 2011-04-26 20:01:28 UTC
coreutils-8.12 is in the tree now with even more changes related to this
Comment 33 SpanKY gentoo-dev 2012-01-06 19:55:24 UTC
everything should have shaken itself out at this point