I have a filesystem used for incremental backups with snapshots. It worked for a few weeks, but now running btrfs subvolume delete /mnt/bak/xyz dies with a segmentation fault. dmesg prints:

[ 820.902800] ------------[ cut here ]------------
[ 820.902830] kernel BUG at fs/btrfs/inode.c:2951!
[ 820.902850] invalid opcode: 0000 [#1] SMP
[ 820.902877] CPU 1
[ 820.902884] Modules linked in: tun ext2
[ 820.902915]
[ 820.902922] Pid: 4109, comm: btrfs Not tainted 3.3.2-gentoo #1 MSI 8480000/MS-9161
[ 820.902965] RIP: 0010:[<ffffffff8122bc51>] [<ffffffff8122bc51>] btrfs_unlink_subvol+0x1ae/0x1d0
[ 820.903009] RSP: 0018:ffff8802f1a23d08 EFLAGS: 00010286
[ 820.903033] RAX: 00000000ffffffe4 RBX: ffff8802fa308590 RCX: ffff8802fa4b6a38
[ 820.903064] RDX: ffff8802fa4b6a38 RSI: 0000000000000000 RDI: ffff8801ef526108
[ 820.903094] RBP: ffff8802f1a23d88 R08: 0000000000000050 R09: ffffffff8125e5ce
[ 820.903126] R10: 0000000000015136 R11: 00000000000150e8 R12: ffff8801f763e090
[ 820.903157] R13: ffff8802f9751000 R14: 000000000000002c R15: ffff8801f7625000
[ 820.903188] FS: 00007f25bd22b740(0000) GS:ffff8801f7d00000(0000) knlGS:0000000000000000
[ 820.903223] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 820.903248] CR2: 00007ff3d0dfcff2 CR3: 00000001efcb4000 CR4: 00000000000006e0
[ 820.903279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 820.903548] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 820.903816] Process btrfs (pid: 4109, threadinfo ffff8802f1a22000, task ffff8802f1996ea0)
[ 820.904320] Stack:
[ 820.904564] ffff8801f77af878 ffffffff00000016 00000000000001ba ffff8801f77af878
[ 820.905073] 0000000000000908 0000000000000100 00000000000001ba ffffffffffffff84
[ 820.905590] 00000000000001ff 00000000000000c7 ffff8801f77af840 0000000000000000
[ 820.906100] Call Trace:
[ 820.906356] [<ffffffff81248e99>] btrfs_ioctl_snap_destroy+0x2fd/0x405
[ 820.906543] [<ffffffff8124abb4>] btrfs_ioctl+0x521/0xfd9
[ 820.906859] [<ffffffff8127a81f>] ? inode_has_perm.clone.22+0x2e/0x30
[ 820.906859] [<ffffffff8127a8b5>] ? file_has_perm+0x94/0xa2
[ 820.906859] [<ffffffff810f6b23>] do_vfs_ioctl+0x40e/0x44f
[ 820.906859] [<ffffffff810f6bb5>] sys_ioctl+0x51/0x74
[ 820.906859] [<ffffffff8161f5e2>] system_call_fastpath+0x16/0x1b
[ 820.906859] Code: 48 89 43 c8 e8 5c 13 e3 ff 4c 89 ee 48 89 53 78 48 89 53 68 48 89 43 70 48 89 43 60 48 89 da 4c 89 ff e8 95 d8 ff ff 85 c0 74 02 <0f> 0b 4c 89 e7 e8 af e4 fd ff 31 c0 eb 05 b8 f4 ff ff ff 48 83
[ 820.906859] RIP [<ffffffff8122bc51>] btrfs_unlink_subvol+0x1ae/0x1d0
[ 820.906859] RSP <ffff8802f1a23d08>
[ 820.910236] ---[ end trace 885c6c07449d20cd ]---

The system is not stable afterwards: sync doesn't finish, and reboot -fn is the only thing I know of that helps. I don't know if this report is enough to go on, but the problem can be reproduced. It started with 3.2.9, and the upgrade to 3.3.2 didn't help. sys-fs/btrfs-progs-0.19-r3 is installed, but the bug is in the kernel, so I don't think btrfs-progs is doing anything wrong.

Reproducible: Always

Steps to Reproduce:
1. Use the fs with a lot of data and many snapshots, about 40M files on it (all together).
2. Running btrfs subvolume delete on a specific snapshot kills the system.

Actual Results:
Segfault and oops.

Expected Results:
btrfs subvolume delete deletes the snapshot.

I'm not aware of any hardware errors, bad blocks, or memory corruption. The system is stable if I don't use this (damaged??) btrfs. I tried btrfsck, but it didn't finish in a reasonable time and didn't print anything. It is doing something, as I can see it in 'top'.
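One detail in the register dump may be worth a look. The Code: bytes end in 85 c0 74 02 <0f> 0b, which on x86-64 decodes to test %eax,%eax; je +2; ud2, the pattern a BUG_ON(ret)-style check typically compiles to, so RAX plausibly still holds the failing return value. A minimal Python sketch under that assumption reads the low 32 bits of RAX as a signed int and looks up the errno name. The helper names decode_eax and errno_name are made up for illustration.

```python
import errno


def decode_eax(rax: int) -> int:
    """Interpret the low 32 bits of RAX as a signed 32-bit integer."""
    eax = rax & 0xFFFFFFFF
    return eax - (1 << 32) if eax & 0x80000000 else eax


def errno_name(ret: int) -> str:
    """Map a negative kernel-style return value to an errno name, if known."""
    return errno.errorcode.get(-ret, "unknown")


# RAX value taken from the register dump in this report.
ret = decode_eax(0x00000000FFFFFFE4)
print(ret, errno_name(ret))
```

On Linux this value decodes to -28, which is ENOSPC. That is only a reading of the register dump, assuming RAX was not reused before the trap; it is not a confirmed diagnosis.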
Well, btrfsck finished:

# btrfsck /dev/devoyo_bak/bak
fs tree 257 refs 17
unresolved ref root 257 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 284 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 313 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 359 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 407 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 438 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 444 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 450 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 456 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 462 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 468 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 474 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 480 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 487 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 493 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 499 dir 256 index 7 namelen 5 name zengo error 600
found 1892417769473 bytes used err is 1
total csum bytes: 187916812
total tree bytes: 19129901056
total fs tree bytes: 17648214016
btree space waste bytes: 5324780685
file data blocks allocated: 20060907683840
 referenced 2187211923456
Btrfs Btrfs v0.19

I don't know why this happened, but I think the kernel shouldn't fail on an fs error.
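For anyone who wants to summarize output like this instead of reading it by eye, here is a rough Python sketch that pulls out the "unresolved ref" entries. The regular expression is written against the line format shown in this report only; the function name summarize and the two-line sample are illustrative, and other btrfsck versions may print something different.

```python
import re

# Matches one "unresolved ref" entry as printed in this report.
UNRESOLVED = re.compile(
    r"unresolved ref root (\d+) dir (\d+) index (\d+) "
    r"namelen (\d+) name (\S+) error (\d+)"
)


def summarize(output: str) -> dict:
    """Count unresolved refs and collect the affected roots and names."""
    refs = UNRESOLVED.findall(output)
    return {
        "count": len(refs),
        "roots": [int(r[0]) for r in refs],
        "names": sorted({r[4] for r in refs}),
    }


# Shortened sample; paste the full btrfsck output here instead.
sample = """fs tree 257 refs 17
unresolved ref root 257 dir 256 index 7 namelen 5 name zengo error 600
unresolved ref root 284 dir 256 index 7 namelen 5 name zengo error 600
"""
print(summarize(sample))
```

Run against the full output, this should report 16 entries, all for the same directory entry name (zengo) at dir 256 index 7, which suggests a single dangling reference repeated across the snapshots.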
Well, I investigated it further. The oops is still there. I exported the filesystem as a network block device, mounted it on another system, ran btrfs subvolume delete ..., and it works... Conclusion: the problem is not in the fs. It's in the kernel somehow. I recompiled some packages that I think matter... This subvolume delete used to work a few days ago... Since then only the filesystem contents have changed; the machine wasn't even rebooted before the problem first happened... The "bad" machine has an AMD Opteron 280 CPU in a dual-CPU mainboard. I attached the emerge --info.
Created attachment 309733
emerge --info
One more piece of information: it fails on another machine too. The oops is almost exactly the same.

[89471.907795] ------------[ cut here ]------------
[89471.907800] kernel BUG at fs/btrfs/inode.c:2951!
[89471.907802] invalid opcode: 0000 [#1] PREEMPT SMP
[89471.907806] CPU 0
[89471.907807] Modules linked in: nbd vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia(PO)
[89471.907814]
[89471.907819] Pid: 26228, comm: btrfs Tainted: P O 3.3.0-gentoo #2 Gigabyte Technology Co., Ltd. X58A-UD3R/X58A-UD3R
[89471.907821] RIP: 0010:[<ffffffff811fc31e>] [<ffffffff811fc31e>] btrfs_unlink_subvol+0x1b6/0x1d8
[89471.907826] RSP: 0018:ffff8801ed15bd28 EFLAGS: 00010286
[89471.907828] RAX: 00000000ffffffe4 RBX: ffff88031ed28940 RCX: ffff8803210e7738
[89471.907829] RDX: ffff8803210e7738 RSI: 0000000000000000 RDI: ffff88021ec66228
[89471.907830] RBP: ffff880121ee1090 R08: ffff880227547100 R09: ffffffff8122dccd
[89471.907832] R10: 0000000000000000 R11: 000001d79c000000 R12: ffff88022f00ac00
[89471.907833] R13: 000000000000002e R14: ffff880172132000 R15: ffff8801eae56c38
[89471.907835] FS: 00007fc02f22f740(0000) GS:ffff88032fc00000(0000) knlGS:0000000000000000
[89471.907836] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[89471.907837] CR2: 00007fc02eb05312 CR3: 0000000235b36000 CR4: 00000000000006f0
[89471.907839] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[89471.907840] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[89471.907842] Process btrfs (pid: 26228, threadinfo ffff8801ed15a000, task ffff8802b100dbe0)
[89471.907843] Stack:
[89471.907844] ffff8801eae56c38 ffffffff00000017 0000000000000000 0000000000000100
[89471.907846] 0000000000000afc 00000000000001c3 00000000000001c3 ffffffffffffff84
[89471.907848] 00000000000000ff 00000000000000d0 ffff8801eae56c00 0000000000000000
[89471.907850] Call Trace:
[89471.907854] [<ffffffff81218b9c>] ? btrfs_ioctl_snap_destroy+0x2f2/0x401
[89471.907856] [<ffffffff8121a850>] ? btrfs_ioctl+0x512/0xfc6
[89471.907859] [<ffffffff8104b617>] ? __srcu_read_unlock+0x3f/0x54
[89471.907862] [<ffffffff810f9aa5>] ? fsnotify+0x235/0x25b
[89471.907864] [<ffffffff8104d667>] ? __wake_up+0x35/0x46
[89471.907867] [<ffffffff810dd0df>] ? do_vfs_ioctl+0x407/0x448
[89471.907870] [<ffffffff810cf99d>] ? vfs_write+0xcb/0xf9
[89471.907872] [<ffffffff810dd15c>] ? sys_ioctl+0x3c/0x60
[89471.907875] [<ffffffff8155f5a2>] ? system_call_fastpath+0x16/0x1b
[89471.907877] Code: 48 89 43 c8 e8 01 34 e6 ff 4c 89 e6 48 89 53 70 48 89 53 60 48 89 43 68 48 89 43 58 48 89 da 4c 89 f7 e8 6c d8 ff ff 85 c0 74 02 <0f> 0b 48 89 ef e8 c7 e9 fd ff 31 c0 eb 05 b8 f4 ff ff ff 48 83
[89471.907897] RIP [<ffffffff811fc31e>] btrfs_unlink_subvol+0x1b6/0x1d8
[89471.907900] RSP <ffff8801ed15bd28>
[89471.907901] ---[ end trace 11188ae4fa50e83b ]---

I attached the emerge --info of this machine too. This one has an Intel 930 CPU.

And the interesting part: it works when mounted in SystemRescueCd 2.5.1 in a virtual machine! (I tested it on an Intel and an AMD computer.) Maybe this has something to do with gcc or something... I'll stop blogging for now. If anyone has any questions, I can reproduce this anytime.
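To back up "almost exactly the same" with something checkable, a small Python sketch can reduce each oops to a signature (BUG file, BUG line, faulting function) and compare the two. The function name oops_signature and the shortened input strings are illustrative; the regular expressions assume the oops layout seen in this report.

```python
import re

BUG_AT = re.compile(r"kernel BUG at (\S+):(\d+)!")
# Matches the trailing "RIP [<addr>] symbol+0xoff/0xlen" line, not "RIP: 0010:...".
RIP_SYM = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def oops_signature(text: str):
    """Return (source file, line number, faulting function) for one oops."""
    bug = BUG_AT.search(text)
    rip = RIP_SYM.search(text)
    return (
        bug.group(1) if bug else None,
        int(bug.group(2)) if bug else None,
        rip.group(1) if rip else None,
    )


# Shortened excerpts from the two oopses in this report.
amd = ("kernel BUG at fs/btrfs/inode.c:2951! ... "
       "RIP [<ffffffff8122bc51>] btrfs_unlink_subvol+0x1ae/0x1d0")
intel = ("kernel BUG at fs/btrfs/inode.c:2951! ... "
         "RIP [<ffffffff811fc31e>] btrfs_unlink_subvol+0x1b6/0x1d8")

print(oops_signature(amd))
print(oops_signature(amd) == oops_signature(intel))
```

Both reduce to the same file, line, and function, even though the offsets inside btrfs_unlink_subvol differ between the two builds. The kernels are 3.3.2 and 3.3.0, so a matching line number is a strong hint rather than proof that the same statement is hit.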
Created attachment 309735
emerge --info 2 (intel930)
I deleted the test backup of the "bad" partition, and the 3.4 kernel contains several btrfs fixes, so I think this bug is obsolete now.