The system takes periodic snapshots at 5-minute, hourly, daily, weekly and monthly intervals. A cron job creates new snaps and cleans up expired ones. There is no apparent loss of functionality, system instability or filesystem corruption.

Reproducible: Sometimes

Steps to Reproduce:
1. Linux on btrfs root filesystem, separate subvolume
2. Other factors correlate weakly, but user activity seems to be the trigger (zero instances overnight)

Actual Results:
Frequent kernel oops, indicating:
WARNING: CPU: 3 PID: 22121 at fs/btrfs/inode.c:2206 record_one_backref+0x3c9/0x440()

Expected Results:
Kernel should continue to service the filesystem without oops

The system runs a variety of workloads (tomcat, mysql). Errors do not coincide with any particular task, but they appear to be absent overnight when no clients are active; this also means snaps contain far fewer changes at that time. The issue seems to have occurred on the previous 3.10.10 kernel as well. Kernel error messages do not coincide directly with the cron snap tasks, but this may be relevant.

sys-fs/btrfs-progs-0.20_rc1_p358
sys-kernel/gentoo-sources-3.11.0
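
For anyone trying to reproduce the setup, the expiry half of such a cron job might look roughly like the sketch below. This is not the script from the affected system: the snapshot naming scheme (`<tier>-YYYYMMDD-HHMM`), the `RETENTION` windows and the function names are all assumptions made for illustration, since the report does not state them.

```python
from datetime import datetime, timedelta

# Hypothetical retention windows per snapshot tier; the report only names
# the tiers, not how long each one is kept.
RETENTION = {
    "5min": timedelta(hours=1),
    "hourly": timedelta(days=1),
    "daily": timedelta(days=7),
    "weekly": timedelta(weeks=4),
    "monthly": timedelta(days=365),
}


def parse_snapshot(name):
    """Split an assumed name like 'hourly-20130915-1300' into (tier, timestamp)."""
    tier, stamp = name.split("-", 1)
    return tier, datetime.strptime(stamp, "%Y%m%d-%H%M")


def expired(names, now):
    """Return the snapshot names older than their tier's retention window."""
    result = []
    for name in names:
        tier, taken = parse_snapshot(name)
        if now - taken > RETENTION[tier]:
            result.append(name)
    return result


if __name__ == "__main__":
    snaps = [
        "5min-20130915-1355",
        "5min-20130915-1200",
        "hourly-20130914-1300",
        "daily-20130901-0000",
    ]
    for name in expired(snaps, datetime(2013, 9, 15, 14, 0)):
        print("would delete", name)
```

The sketch only selects candidates; a real job would then remove each one, for example with `btrfs subvolume delete`.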
Created attachment 359346 [details] Kernel Config 3.11
Created attachment 359348 [details] Obligatory emerge --info
Created attachment 359352 [details] /proc/self/mounts Output of /proc/self/mounts
Created attachment 359354 [details] dmesg output
Created attachment 359424 [details, diff] Upstream patch
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=50f1319cb5f7690e4d9de18d1a75ea89296d0e53
Linking the upstream commit, as my patch apparently did not include the whole email with the commit name.
Thanks Emil, I've built a new kernel with this patch incorporated (wow, small) and I'll let it run for a few days, hopefully with zero indications.
After a day and a half, the referenced errors are absent. However, the server oopsed on the same day the patched kernel was booted. Nothing was captured in /var/log/messages because the /var partition went read-only, so there are no further details. Will continue to monitor for the remainder of the week.
Created attachment 360296 [details] BTRFS oops
Second instance of a kernel oops since applying the recommended patch. Fairly certain both oopses are the same, but I did not note the details of the first instance. Apologies for the photo, but login fails so I cannot run dmesg, and this error doesn't make it through syslog-ng to disk (which happens to be on the volume being snapped).
Can you please try to reproduce with 3.12?
After two days on 3.12.0-gentoo, the previously described log errors have not appeared. The occasional (previously highly frequent) lines of:
kernel: [782682.259647] BTRFS debug (device dm-1): unlinked 2 orphans
also seem to have disappeared entirely. Will continue to monitor for another week before considering this resolved.
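
For the monitoring period, a small script along these lines could tally both kinds of message from saved dmesg or syslog output. It is only a sketch: the regular expressions are modelled on the two sample lines quoted in this report and may not match other kernel versions, and the function name `summarize` is made up for illustration.

```python
import re

# Patterns modelled on the two log lines quoted in this report; treat them
# as assumptions about the format rather than a stable kernel interface.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>\w+)\+"
)
ORPHAN_RE = re.compile(
    r"BTRFS debug \(device (?P<dev>[^)]+)\): unlinked (?P<count>\d+) orphans"
)


def summarize(lines):
    """Return (warnings, orphan_lines) found in an iterable of log lines.

    warnings is a list of (source file, line number, function) tuples;
    orphan_lines is a list of (device, orphan count) tuples.
    """
    warnings = []
    orphan_lines = []
    for text in lines:
        match = WARNING_RE.search(text)
        if match:
            warnings.append((match["file"], int(match["line"]), match["func"]))
            continue
        match = ORPHAN_RE.search(text)
        if match:
            orphan_lines.append((match["dev"], int(match["count"])))
    return warnings, orphan_lines


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python this_script.py
    found_warnings, found_orphans = summarize(sys.stdin)
    print(len(found_warnings), "warnings,", len(found_orphans), "orphan lines")
```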
It's been two months. If the problem still occurs, please comment here and I will reopen.