Bug 412703 - sys-kernel/gentoo-sources-3.3.2 btrfs oops
Summary: sys-kernel/gentoo-sources-3.3.2 btrfs oops
Status: RESOLVED OBSOLETE
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-19 22:04 UTC by László Szalma
Modified: 2012-05-21 11:06 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (file_412703.txt,4.46 KB, text/plain)
2012-04-21 18:07 UTC, László Szalma
emerge --info 2 (intel930) (file_412703.txt,5.13 KB, text/plain)
2012-04-21 18:27 UTC, László Szalma

Description László Szalma 2012-04-19 22:04:38 UTC
I have a filesystem used for incremental backups with snapshots. It worked for a few weeks, but now running

btrfs subvolume delete /mnt/bak/xyz

dies with a segmentation fault. dmesg prints:

[  820.902800] ------------[ cut here ]------------
[  820.902830] kernel BUG at fs/btrfs/inode.c:2951!
[  820.902850] invalid opcode: 0000 [#1] SMP 
[  820.902877] CPU 1 
[  820.902884] Modules linked in: tun ext2
[  820.902915] 
[  820.902922] Pid: 4109, comm: btrfs Not tainted 3.3.2-gentoo #1 MSI 8480000/MS-9161
[  820.902965] RIP: 0010:[<ffffffff8122bc51>]  [<ffffffff8122bc51>] btrfs_unlink_subvol+0x1ae/0x1d0
[  820.903009] RSP: 0018:ffff8802f1a23d08  EFLAGS: 00010286
[  820.903033] RAX: 00000000ffffffe4 RBX: ffff8802fa308590 RCX: ffff8802fa4b6a38
[  820.903064] RDX: ffff8802fa4b6a38 RSI: 0000000000000000 RDI: ffff8801ef526108
[  820.903094] RBP: ffff8802f1a23d88 R08: 0000000000000050 R09: ffffffff8125e5ce
[  820.903126] R10: 0000000000015136 R11: 00000000000150e8 R12: ffff8801f763e090
[  820.903157] R13: ffff8802f9751000 R14: 000000000000002c R15: ffff8801f7625000
[  820.903188] FS:  00007f25bd22b740(0000) GS:ffff8801f7d00000(0000) knlGS:0000000000000000
[  820.903223] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  820.903248] CR2: 00007ff3d0dfcff2 CR3: 00000001efcb4000 CR4: 00000000000006e0
[  820.903279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  820.903548] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  820.903816] Process btrfs (pid: 4109, threadinfo ffff8802f1a22000, task ffff8802f1996ea0)
[  820.904320] Stack:
[  820.904564]  ffff8801f77af878 ffffffff00000016 00000000000001ba ffff8801f77af878
[  820.905073]  0000000000000908 0000000000000100 00000000000001ba ffffffffffffff84
[  820.905590]  00000000000001ff 00000000000000c7 ffff8801f77af840 0000000000000000
[  820.906100] Call Trace:
[  820.906356]  [<ffffffff81248e99>] btrfs_ioctl_snap_destroy+0x2fd/0x405
[  820.906543]  [<ffffffff8124abb4>] btrfs_ioctl+0x521/0xfd9
[  820.906859]  [<ffffffff8127a81f>] ? inode_has_perm.clone.22+0x2e/0x30
[  820.906859]  [<ffffffff8127a8b5>] ? file_has_perm+0x94/0xa2
[  820.906859]  [<ffffffff810f6b23>] do_vfs_ioctl+0x40e/0x44f
[  820.906859]  [<ffffffff810f6bb5>] sys_ioctl+0x51/0x74
[  820.906859]  [<ffffffff8161f5e2>] system_call_fastpath+0x16/0x1b
[  820.906859] Code: 48 89 43 c8 e8 5c 13 e3 ff 4c 89 ee 48 89 53 78 48 89 53 68 48 89 43 70 48 89 43 60 48 89 da 4c 89 ff e8 95 d8 ff ff 85 c0 74 02 <0f> 0b 4c 89 e7 e8 af e4 fd ff 31 c0 eb 05 b8 f4 ff ff ff 48 83 
[  820.906859] RIP  [<ffffffff8122bc51>] btrfs_unlink_subvol+0x1ae/0x1d0
[  820.906859]  RSP <ffff8802f1a23d08>
[  820.910236] ---[ end trace 885c6c07449d20cd ]---
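
A note on reading the register dump above. On x86-64 a function's return value is passed in RAX, and the bytes just before the trapping instruction in the Code: line (85 c0 74 02 <0f> 0b) decode to test %eax,%eax; je; ud2, which is what a BUG_ON on a nonzero return value typically compiles to. Assuming RAX still holds that return value at the time of the trap (an assumption, the trace itself does not say so), its low 32 bits can be read as a signed errno. A minimal Python sketch; decode_errno is a hypothetical helper name:

```python
import errno

def decode_errno(reg_hex):
    """Interpret the low 32 bits of a register dump as a signed int.

    Returns (value, errno_name); errno_name is None when the value is
    not a negative errno known to this platform.
    """
    low32 = int(reg_hex, 16) & 0xFFFFFFFF
    value = low32 - (1 << 32) if low32 & 0x80000000 else low32
    name = errno.errorcode.get(-value) if value < 0 else None
    return value, name

# RAX from the oops above; assumed (not confirmed) to be the return
# value of the call that tripped the BUG.
print(decode_errno("00000000ffffffe4"))
```

On Linux this yields -28, which is -ENOSPC ("No space left on device"). If the assumption holds, the BUG was hit because a call inside btrfs_unlink_subvol failed for lack of space rather than because of an obviously corrupted structure, but that reading is only a guess from the register values.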

After that the system is no longer stable: sync never finishes, and the only thing I know that helps is reboot -fn.

I don't know whether this report is enough to go on, but the problem can be reproduced. It started with 3.2.9, and upgrading to 3.3.2 didn't help.

sys-fs/btrfs-progs-0.19-r3 is installed, but the bug is in the kernel, so I don't think btrfs-progs is doing anything wrong.

Reproducible: Always

Steps to Reproduce:
1. Use the fs with a lot of data and many snapshots, about 40M files on it (all together).
2. Run btrfs subvolume delete on a specific snapshot; this kills the system.

Actual Results:  
Segfault and oops.

Expected Results:  
btrfs subvolume delete removes the snapshot without crashing.

I'm not aware of any hw errors, bad blocks, or memory corruption. The system is stable if I don't use this (damaged??) btrfs.

I tried btrfsck, but it didn't finish in a reasonable time and didn't print anything. It is doing something, as I can see it in 'top'.
Comment 1 László Szalma 2012-04-20 07:26:25 UTC
Well, btrfsck finished:

# btrfsck /dev/devoyo_bak/bak
fs tree 257 refs 17
        unresolved ref root 257 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 284 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 313 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 359 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 407 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 438 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 444 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 450 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 456 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 462 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 468 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 474 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 480 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 487 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 493 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 499 dir 256 index 7 namelen 5 name zengo error 600
found 1892417769473 bytes used err is 1
total csum bytes: 187916812
total tree bytes: 19129901056
total fs tree bytes: 17648214016
btree space waste bytes: 5324780685
file data blocks allocated: 20060907683840
 referenced 2187211923456
Btrfs Btrfs v0.19
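
The 16 "unresolved ref" lines above all point at the same directory entry (dir 256, index 7, name zengo) and differ only in the root number. A short, hedged Python sketch for grouping such lines by name makes that easier to see on longer outputs. The line format is taken from the output above; the helper name unresolved_refs is hypothetical, and the sample is truncated to three lines:

```python
import re
from collections import defaultdict

# Matches one "unresolved ref" line as printed by btrfsck v0.19 above.
LINE = re.compile(
    r"unresolved ref root (\d+) dir (\d+) index (\d+) "
    r"namelen (\d+) name (\S+) error (\d+)"
)

def unresolved_refs(output):
    """Group the roots reported as unresolved by entry name."""
    by_name = defaultdict(list)
    for m in LINE.finditer(output):
        root, _dir, _index, _namelen, name, _error = m.groups()
        by_name[name].append(int(root))
    return dict(by_name)

sample = """\
fs tree 257 refs 17
        unresolved ref root 257 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 284 dir 256 index 7 namelen 5 name zengo error 600
        unresolved ref root 313 dir 256 index 7 namelen 5 name zengo error 600
"""

for name, roots in unresolved_refs(sample).items():
    print(name, len(roots), roots)
```

Run against the full output, this groups every reported root under the single name zengo, which suggests the complaint is about one entry seen from each snapshot rather than many independent problems.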


I don't know why this happened, but I think the kernel shouldn't crash on a filesystem error.
Comment 2 László Szalma 2012-04-21 18:06:42 UTC
Well, I investigated it further.

The oops is still there.

I created a network block device, and mounted it on another system, ran btrfs subvolume delete ..., and it works...

Conclusion: the problem is not in the fs. It's in the kernel somehow. I recompiled some packages that I think matter...

This subvolume delete used to work a few days ago... The only changes were in the filesystem; the machine wasn't even rebooted before the problem first happened...

The "bad" machine has an AMD Opteron 280 CPU in a dual-CPU mainboard.

I attached the emerge --info
Comment 3 László Szalma 2012-04-21 18:07:07 UTC
Created attachment 309733 [details]
emerge --info
Comment 4 László Szalma 2012-04-21 18:26:11 UTC
More information:

It fails on another machine too. The oops is almost exactly the same.

[89471.907795] ------------[ cut here ]------------
[89471.907800] kernel BUG at fs/btrfs/inode.c:2951!
[89471.907802] invalid opcode: 0000 [#1] PREEMPT SMP 
[89471.907806] CPU 0 
[89471.907807] Modules linked in: nbd vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia(PO)
[89471.907814] 
[89471.907819] Pid: 26228, comm: btrfs Tainted: P           O 3.3.0-gentoo #2 Gigabyte Technology Co., Ltd. X58A-UD3R/X58A-UD3R
[89471.907821] RIP: 0010:[<ffffffff811fc31e>]  [<ffffffff811fc31e>] btrfs_unlink_subvol+0x1b6/0x1d8
[89471.907826] RSP: 0018:ffff8801ed15bd28  EFLAGS: 00010286
[89471.907828] RAX: 00000000ffffffe4 RBX: ffff88031ed28940 RCX: ffff8803210e7738
[89471.907829] RDX: ffff8803210e7738 RSI: 0000000000000000 RDI: ffff88021ec66228
[89471.907830] RBP: ffff880121ee1090 R08: ffff880227547100 R09: ffffffff8122dccd
[89471.907832] R10: 0000000000000000 R11: 000001d79c000000 R12: ffff88022f00ac00
[89471.907833] R13: 000000000000002e R14: ffff880172132000 R15: ffff8801eae56c38
[89471.907835] FS:  00007fc02f22f740(0000) GS:ffff88032fc00000(0000) knlGS:0000000000000000
[89471.907836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[89471.907837] CR2: 00007fc02eb05312 CR3: 0000000235b36000 CR4: 00000000000006f0
[89471.907839] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[89471.907840] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[89471.907842] Process btrfs (pid: 26228, threadinfo ffff8801ed15a000, task ffff8802b100dbe0)
[89471.907843] Stack:
[89471.907844]  ffff8801eae56c38 ffffffff00000017 0000000000000000 0000000000000100
[89471.907846]  0000000000000afc 00000000000001c3 00000000000001c3 ffffffffffffff84
[89471.907848]  00000000000000ff 00000000000000d0 ffff8801eae56c00 0000000000000000
[89471.907850] Call Trace:
[89471.907854]  [<ffffffff81218b9c>] ? btrfs_ioctl_snap_destroy+0x2f2/0x401
[89471.907856]  [<ffffffff8121a850>] ? btrfs_ioctl+0x512/0xfc6
[89471.907859]  [<ffffffff8104b617>] ? __srcu_read_unlock+0x3f/0x54
[89471.907862]  [<ffffffff810f9aa5>] ? fsnotify+0x235/0x25b
[89471.907864]  [<ffffffff8104d667>] ? __wake_up+0x35/0x46
[89471.907867]  [<ffffffff810dd0df>] ? do_vfs_ioctl+0x407/0x448
[89471.907870]  [<ffffffff810cf99d>] ? vfs_write+0xcb/0xf9
[89471.907872]  [<ffffffff810dd15c>] ? sys_ioctl+0x3c/0x60
[89471.907875]  [<ffffffff8155f5a2>] ? system_call_fastpath+0x16/0x1b
[89471.907877] Code: 48 89 43 c8 e8 01 34 e6 ff 4c 89 e6 48 89 53 70 48 89 53 60 48 89 43 68 48 89 43 58 48 89 da 4c 89 f7 e8 6c d8 ff ff 85 c0 74 02 <0f> 0b 48 89 ef e8 c7 e9 fd ff 31 c0 eb 05 b8 f4 ff ff ff 48 83 
[89471.907897] RIP  [<ffffffff811fc31e>] btrfs_unlink_subvol+0x1b6/0x1d8
[89471.907900]  RSP <ffff8801ed15bd28>
[89471.907901] ---[ end trace 11188ae4fa50e83b ]---
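
To check "almost exactly the same" without eyeballing two long traces, the few fields that matter can be pulled out and compared. This is a minimal Python sketch, not a full oops parser: oops_summary is a hypothetical helper, the regular expressions only cover the line formats seen in the two traces in this report, and the excerpts below are shortened copies of those lines.

```python
import re

def oops_summary(text):
    """Extract BUG location, faulting symbol, RAX and the trapping bytes."""
    patterns = {
        "bug_at": r"kernel BUG at (\S+)!",
        "symbol": r"RIP: \S+\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x",
        "rax": r"RAX: ([0-9a-f]{16})",
        # The byte in <..> is where the CPU trapped; keep it and the next one.
        "trap": r"Code: .*<([0-9a-f]{2})> ([0-9a-f]{2})",
    }
    summary = {}
    for key, pattern in patterns.items():
        m = re.search(pattern, text)
        summary[key] = " ".join(m.groups()) if m else None
    return summary

first = """\
kernel BUG at fs/btrfs/inode.c:2951!
RIP: 0010:[<ffffffff8122bc51>]  [<ffffffff8122bc51>] btrfs_unlink_subvol+0x1ae/0x1d0
RAX: 00000000ffffffe4 RBX: ffff8802fa308590 RCX: ffff8802fa4b6a38
Code: ff ff 85 c0 74 02 <0f> 0b 4c 89 e7
"""

second = """\
kernel BUG at fs/btrfs/inode.c:2951!
RIP: 0010:[<ffffffff811fc31e>]  [<ffffffff811fc31e>] btrfs_unlink_subvol+0x1b6/0x1d8
RAX: 00000000ffffffe4 RBX: ffff88031ed28940 RCX: ffff8803210e7738
Code: ff ff 85 c0 74 02 <0f> 0b 48 89 ef
"""

print(oops_summary(first))
print(oops_summary(first) == oops_summary(second))
```

Both traces reduce to the same BUG location, function, RAX value and trapping instruction (0f 0b, which is ud2 on x86), even though the addresses and offsets differ between the 3.3.2 and 3.3.0 builds.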


I attached the emerge --info of this machine too. This one has an Intel 930 CPU.

And the interesting part:

It works when mounted in SystemRescueCd 2.5.1 in a virtual machine! (I tested it on an Intel and an AMD computer.)



Maybe this has something to do with gcc or something...
I'll stop blogging for now. If anyone has any questions, I can reproduce this anytime.
Comment 5 László Szalma 2012-04-21 18:27:03 UTC
Created attachment 309735 [details]
emerge --info 2 (intel930)
Comment 6 László Szalma 2012-05-21 11:06:04 UTC
I deleted the test backup of the "bad" partition, and the 3.4 kernel contains several fixes, so I think this bug is obsolete now.