When I register "cache device" for bcache, load is increased by 1. When bcache is used, deadlock appears often, its not possible to write on that cached disk and output of dmesg says that: -------------------------------------------------------------- [ 2138.360002] INFO: rcu_sched self-detected stall on CPU { 0} (t=21000 jiffies g=23827 c=23826 q=2794) [ 2138.360002] Task dump for CPU 0: [ 2138.360002] bcache_gc R running task 0 3010 2 0x00080008 [ 2138.360002] 0000000000000001 ffffffff81877100 ffffffff81113abf 0000000000005d13 [ 2138.360002] ffff880272411d80 ffffffff81877100 ffffffff81877100 ffffffff818bbb08 [ 2138.360002] ffffffff81116c68 001dcd6500000000 ffff880272411140 00000000000cachetest ~ # emerge --info Portage 2.2.14 (python 3.4.1-final-0, default/linux/amd64/13.0, gcc-4.8.3, glibc-2.19-r1, 3.19.0-gentoo x86_64) ================================================================= System uname: Linux-3.19.0-gentoo-x86_64-Intel-R-_Core-TM-_i5-2400_CPU_@_3.10GHz-with-gentoo-2.2 KiB Mem: 8969520 total, 8830580 free KiB Swap: 0 total, 0 free Timestamp of tree: Sat, 14 Feb 2015 20:30:01 +0000 ld GNU ld (Gentoo 2.24 p1.4) 2.24 app-shells/bash: 4.2_p53 dev-lang/perl: 5.20.1-r4 dev-lang/python: 2.7.9-r1, 3.4.1 dev-util/pkgconfig: 0.28-r1 sys-apps/baselayout: 2.2 sys-apps/openrc: 0.13.9 sys-apps/sandbox: 2.6-r1 sys-devel/autoconf: 2.69 sys-devel/automake: 1.13.4 sys-devel/binutils: 2.24-r3 sys-devel/gcc: 4.8.3 sys-devel/gcc-config: 1.7.3 sys-devel/libtool: 2.4.4 sys-devel/make: 4.0-r1 sys-kernel/linux-headers: 3.16 (virtual/os-headers) sys-libs/glibc: 2.19-r1 Repositories: gentoo x-portage ACCEPT_KEYWORDS="amd64" ACCEPT_LICENSE="* -@EULA" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-march=nocona -O2 -pipe" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo" CXXFLAGS="-march=nocona -O2 -pipe" DISTDIR="/mnt/data/portage/distfiles" FCFLAGS="-O2 -pipe" FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr" FFLAGS="-O2 -pipe" GENTOO_MIRRORS="http://distfiles.gentoo.org" LC_ALL="cs_CZ.UTF8" LDFLAGS="-Wl,-O1 -Wl,--as-needed" MAKEOPTS="-j5 -l5" PKGDIR="/usr/portage/packages" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages" PORTAGE_TMPDIR="/mnt/data/portage/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="/usr/local/portage" USE="acl acpi amd64 bash-completion berkdb bzip2 caps cracklib crypt curl cxx dane fam fuse iconv icu inotify ipv6 libnotify lzma lzo mmap mmx mmxext ncurses network nls nptl openssl pam pcre pie rar readline slang sse sse2 sse3 ssl ssse3 syslog tcpd threads udev unicode xattr zlib" ABI_X86="64" CURL_SSL="openssl" ELIBC="glibc" KERNEL="linux" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_4" QEMU_SOFTMMU_TARGETS="x86_64 arm armeb" QEMU_USER_TARGETS="x86_64 arm armeb" USERLAND="GNU" Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, SYNC, USE_PYTHON 00096 [ 2138.360002] Call Trace: [ 2138.360002] <IRQ> [<ffffffff81113abf>] ? rcu_dump_cpu_stacks+0x7f/0xc0 [ 2138.360002] [<ffffffff81116c68>] ? rcu_check_callbacks+0x368/0x5a0 [ 2138.360002] [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30 [ 2138.360002] [<ffffffff811190b1>] ? update_process_times+0x31/0x60 [ 2138.360002] [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30 [ 2138.360002] [<ffffffff81127083>] ? tick_sched_timer+0x33/0x60 [ 2138.360002] [<ffffffff81119695>] ? __run_hrtimer.isra.34+0x45/0xc0 [ 2138.360002] [<ffffffff81119c67>] ? hrtimer_interrupt+0xd7/0x230 [ 2138.360002] [<ffffffff8102e6f5>] ? smp_apic_timer_interrupt+0x35/0x50 [ 2138.360002] [<ffffffff816622bd>] ? apic_timer_interrupt+0x6d/0x80 [ 2138.360002] <EOI> [<ffffffff81523ea5>] ? bch_extent_bad+0xf5/0x1b0 [ 2138.360002] [<ffffffff81523ddc>] ? bch_extent_bad+0x2c/0x1b0 [ 2138.360002] [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10 [ 2138.360002] [<ffffffff8151c0d1>] ? bch_btree_iter_next_filter+0x21/0x50 [ 2138.360002] [<ffffffff8151ca10>] ? btree_gc_count_keys+0x40/0x50 [ 2138.360002] [<ffffffff815212cd>] ? btree_gc_recurse+0x18d/0x2f0 [ 2138.360002] [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50 [ 2138.360002] [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220 [ 2138.360002] [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220 [ 2138.360002] [<ffffffff815219aa>] ? bch_btree_gc+0x3ea/0x510 [ 2138.360002] [<ffffffff81105360>] ? wait_woken+0x80/0x80 [ 2138.360002] [<ffffffff81521b00>] ? bch_gc_thread+0x30/0x110 [ 2138.360002] [<ffffffff81521ad0>] ? bch_btree_gc+0x510/0x510 [ 2138.360002] [<ffffffff810ecedc>] ? kthread+0xbc/0xe0 [ 2138.360002] [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170 [ 2138.360002] [<ffffffff8166137c>] ? ret_from_fork+0x7c/0xb0 [ 2138.360002] [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170 [ 2201.363002] INFO: rcu_sched self-detected stall on CPU { 0} (t=84003 jiffies g=23827 c=23826 q=10658) [ 2201.363002] Task dump for CPU 0: [ 2201.363002] bcache_gc R running task 0 3010 2 0x00080008 [ 2201.363002] 0000000000000001 ffffffff81877100 ffffffff81113abf 0000000000005d13 [ 2201.363002] ffff880272411d80 ffffffff81877100 ffffffff81877100 ffffffff818bbb08 [ 2201.363002] ffffffff81116c68 001dcd6500000000 ffff880272411140 0000000000000096 [ 2201.363002] Call Trace: [ 2201.363002] <IRQ> [<ffffffff81113abf>] ? rcu_dump_cpu_stacks+0x7f/0xc0 [ 2201.363002] [<ffffffff81116c68>] ? rcu_check_callbacks+0x368/0x5a0 [ 2201.363002] [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30 [ 2201.363002] [<ffffffff811190b1>] ? update_process_times+0x31/0x60 [ 2201.363002] [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30 [ 2201.363002] [<ffffffff81127083>] ? tick_sched_timer+0x33/0x60 [ 2201.363002] [<ffffffff81119695>] ? __run_hrtimer.isra.34+0x45/0xc0 [ 2201.363002] [<ffffffff81119c67>] ? hrtimer_interrupt+0xd7/0x230 [ 2201.363002] [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10 [ 2201.363002] [<ffffffff8102e6f5>] ? smp_apic_timer_interrupt+0x35/0x50 [ 2201.363002] [<ffffffff816622bd>] ? apic_timer_interrupt+0x6d/0x80 [ 2201.363002] <EOI> [<ffffffff8151b3ae>] ? bch_btree_iter_next+0x9e/0x320 [ 2201.363002] [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10 [ 2201.363002] [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50 [ 2201.363002] [<ffffffff8151ca10>] ? btree_gc_count_keys+0x40/0x50 [ 2201.363002] [<ffffffff815212cd>] ? btree_gc_recurse+0x18d/0x2f0 [ 2201.363002] [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50 [ 2201.363002] [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220 [ 2201.363002] [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220 [ 2201.363002] [<ffffffff815219aa>] ? bch_btree_gc+0x3ea/0x510 [ 2201.363002] [<ffffffff81105360>] ? wait_woken+0x80/0x80 [ 2201.363002] [<ffffffff81521b00>] ? bch_gc_thread+0x30/0x110 [ 2201.363002] [<ffffffff81521ad0>] ? bch_btree_gc+0x510/0x510 [ 2201.363002] [<ffffffff810ecedc>] ? kthread+0xbc/0xe0 [ 2201.363002] [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170 [ 2201.363002] [<ffffffff8166137c>] ? ret_from_fork+0x7c/0xb0 [ 2201.363002] [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170 -------------------------------------------------------------- It happens on kernels higher than 3.12. Tested on gentoo-sources 3.12.35 (works without problems), 3.14.33, 3.17.8-r1 a 3.19.0. Reproducible: Always Steps to Reproduce: pstree && echo \ && lsblk && echo \ && w && echo \ && make-bcache -C /dev/vg/cache1 \ && echo /dev/vg/cache1 > /sys/fs/bcache/register \ && echo && sleep 300 && w \ && echo && sleep 300 && w && echo\ && make-bcache -C /dev/vg/cache2 \ && echo /dev/vg/cache2 > /sys/fs/bcache/register \ && echo && sleep 300 && w \ && echo && sleep 300 && w Actual Results: init─┬─6*[agetty] └─sshd───sshd───bash───pstree NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT fd0 2:0 1 4K 0 disk sda 8:0 0 15G 0 disk / sdb 8:16 0 30G 0 disk ├─vg-cache1 251:0 0 5G 0 lvm └─vg-cache2 251:1 0 5G 0 lvm loop0 7:0 0 101,6M 1 loop /usr/portage 15:39:38 up 2 min, 1 user, load average: 0,00, 0,00, 0,00 USER TTY LOGIN@ IDLE JCPU PCPU WHAT root pts/0 15:38 2.00s 0.00s 0.00s w UUID: 7648c99a-8762-4a44-a021-4e60d94698f6 Set UUID: 75c20ebf-b8a8-4c6a-a35a-0a00c25709b3 version: 0 nbuckets: 10240 block_size: 1 bucket_size: 1024 nr_in_set: 1 nr_this_dev: 0 first_bucket: 1 15:44:38 up 7 min, 1 user, load average: 0,99, 0,63, 0,28 USER TTY LOGIN@ IDLE JCPU PCPU WHAT root pts/0 15:38 5:02 0.01s 0.00s w 15:49:38 up 12 min, 1 user, load average: 1,00, 0,87, 0,48 USER TTY LOGIN@ IDLE JCPU PCPU WHAT root pts/0 15:38 10:02 0.01s 0.00s w UUID: 16c1a207-54de-4c8e-a723-0e13087123c7 Set UUID: 045e53f3-f592-41ec-b404-a845e99b9e87 version: 0 nbuckets: 10240 block_size: 1 bucket_size: 1024 nr_in_set: 1 nr_this_dev: 0 first_bucket: 1 15:54:39 up 17 min, 1 user, load average: 1,99, 1,58, 0,90 USER TTY LOGIN@ IDLE JCPU PCPU WHAT root pts/0 15:38 15:03 0.01s 0.00s w 15:59:39 up 22 min, 1 user, load average: 2,00, 1,85, 1,20 USER TTY LOGIN@ IDLE JCPU PCPU WHAT root pts/0 15:38 20:03 0.01s 0.00s w Expected Results: ... ... ... 15:59:39 up 22 min, 1 user, load average: 0,00, 0,00, 0,00 USER TTY LOGIN@ IDLE JCPU PCPU WHAT root pts/0 15:38 20:03 0.01s 0.00s w cachetest ~ # emerge --info Portage 2.2.14 (python 3.4.1-final-0, default/linux/amd64/13.0, gcc-4.8.3, glibc-2.19-r1, 3.19.0-gentoo x86_64) ================================================================= System uname: Linux-3.19.0-gentoo-x86_64-Intel-R-_Core-TM-_i5-2400_CPU_@_3.10GHz-with-gentoo-2.2 KiB Mem: 8969520 total, 8830580 free KiB Swap: 0 total, 0 free Timestamp of tree: Sat, 14 Feb 2015 20:30:01 +0000 ld GNU ld (Gentoo 2.24 p1.4) 2.24 app-shells/bash: 4.2_p53 dev-lang/perl: 5.20.1-r4 dev-lang/python: 2.7.9-r1, 3.4.1 dev-util/pkgconfig: 0.28-r1 sys-apps/baselayout: 2.2 sys-apps/openrc: 0.13.9 sys-apps/sandbox: 2.6-r1 sys-devel/autoconf: 2.69 sys-devel/automake: 1.13.4 sys-devel/binutils: 2.24-r3 sys-devel/gcc: 4.8.3 sys-devel/gcc-config: 1.7.3 sys-devel/libtool: 2.4.4 sys-devel/make: 4.0-r1 sys-kernel/linux-headers: 3.16 (virtual/os-headers) sys-libs/glibc: 2.19-r1 Repositories: gentoo x-portage ACCEPT_KEYWORDS="amd64" ACCEPT_LICENSE="* -@EULA" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-march=nocona -O2 -pipe" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo" CXXFLAGS="-march=nocona -O2 -pipe" DISTDIR="/mnt/data/portage/distfiles" FCFLAGS="-O2 -pipe" FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr" FFLAGS="-O2 -pipe" GENTOO_MIRRORS="http://distfiles.gentoo.org" LC_ALL="cs_CZ.UTF8" LDFLAGS="-Wl,-O1 -Wl,--as-needed" MAKEOPTS="-j5 -l5" PKGDIR="/usr/portage/packages" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages" PORTAGE_TMPDIR="/mnt/data/portage/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="/usr/local/portage" USE="acl acpi amd64 bash-completion berkdb bzip2 caps cracklib crypt curl cxx dane fam fuse iconv icu inotify ipv6 libnotify lzma lzo mmap mmx mmxext ncurses network nls nptl openssl pam pcre pie rar readline slang sse sse2 sse3 ssl ssse3 syslog tcpd threads udev unicode xattr zlib" ABI_X86="64" CURL_SSL="openssl" ELIBC="glibc" KERNEL="linux" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_4" QEMU_SOFTMMU_TARGETS="x86_64 arm armeb" QEMU_USER_TARGETS="x86_64 arm armeb" USERLAND="GNU" Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, SYNC, USE_PYTHON
I'm sorry, the correct output of dmesg is here: -------------------------------------------------------------- [ 2138.360002] INFO: rcu_sched self-detected stall on CPU { 0} (t=21000 jiffies g=23827 c=23826 q=2794) [ 2138.360002] Task dump for CPU 0: [ 2138.360002] bcache_gc R running task 0 3010 2 0x00080008 [ 2138.360002] 0000000000000001 ffffffff81877100 ffffffff81113abf 0000000000005d13 [ 2138.360002] ffff880272411d80 ffffffff81877100 ffffffff81877100 ffffffff818bbb08 [ 2138.360002] ffffffff81116c68 001dcd6500000000 ffff880272411140 0000000000000096 [ 2138.360002] Call Trace: [ 2138.360002] <IRQ> [<ffffffff81113abf>] ? rcu_dump_cpu_stacks+0x7f/0xc0 [ 2138.360002] [<ffffffff81116c68>] ? rcu_check_callbacks+0x368/0x5a0 [ 2138.360002] [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30 [ 2138.360002] [<ffffffff811190b1>] ? update_process_times+0x31/0x60 [ 2138.360002] [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30 [ 2138.360002] [<ffffffff81127083>] ? tick_sched_timer+0x33/0x60 [ 2138.360002] [<ffffffff81119695>] ? __run_hrtimer.isra.34+0x45/0xc0 [ 2138.360002] [<ffffffff81119c67>] ? hrtimer_interrupt+0xd7/0x230 [ 2138.360002] [<ffffffff8102e6f5>] ? smp_apic_timer_interrupt+0x35/0x50 [ 2138.360002] [<ffffffff816622bd>] ? apic_timer_interrupt+0x6d/0x80 [ 2138.360002] <EOI> [<ffffffff81523ea5>] ? bch_extent_bad+0xf5/0x1b0 [ 2138.360002] [<ffffffff81523ddc>] ? bch_extent_bad+0x2c/0x1b0 [ 2138.360002] [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10 [ 2138.360002] [<ffffffff8151c0d1>] ? bch_btree_iter_next_filter+0x21/0x50 [ 2138.360002] [<ffffffff8151ca10>] ? btree_gc_count_keys+0x40/0x50 [ 2138.360002] [<ffffffff815212cd>] ? btree_gc_recurse+0x18d/0x2f0 [ 2138.360002] [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50 [ 2138.360002] [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220 [ 2138.360002] [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220 [ 2138.360002] [<ffffffff815219aa>] ? bch_btree_gc+0x3ea/0x510 [ 2138.360002] [<ffffffff81105360>] ? wait_woken+0x80/0x80 [ 2138.360002] [<ffffffff81521b00>] ? bch_gc_thread+0x30/0x110 [ 2138.360002] [<ffffffff81521ad0>] ? bch_btree_gc+0x510/0x510 [ 2138.360002] [<ffffffff810ecedc>] ? kthread+0xbc/0xe0 [ 2138.360002] [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170 [ 2138.360002] [<ffffffff8166137c>] ? ret_from_fork+0x7c/0xb0 [ 2138.360002] [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170 [ 2201.363002] INFO: rcu_sched self-detected stall on CPU { 0} (t=84003 jiffies g=23827 c=23826 q=10658) [ 2201.363002] Task dump for CPU 0: [ 2201.363002] bcache_gc R running task 0 3010 2 0x00080008 [ 2201.363002] 0000000000000001 ffffffff81877100 ffffffff81113abf 0000000000005d13 [ 2201.363002] ffff880272411d80 ffffffff81877100 ffffffff81877100 ffffffff818bbb08 [ 2201.363002] ffffffff81116c68 001dcd6500000000 ffff880272411140 0000000000000096 [ 2201.363002] Call Trace: [ 2201.363002] <IRQ> [<ffffffff81113abf>] ? rcu_dump_cpu_stacks+0x7f/0xc0 [ 2201.363002] [<ffffffff81116c68>] ? rcu_check_callbacks+0x368/0x5a0 [ 2201.363002] [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30 [ 2201.363002] [<ffffffff811190b1>] ? update_process_times+0x31/0x60 [ 2201.363002] [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30 [ 2201.363002] [<ffffffff81127083>] ? tick_sched_timer+0x33/0x60 [ 2201.363002] [<ffffffff81119695>] ? __run_hrtimer.isra.34+0x45/0xc0 [ 2201.363002] [<ffffffff81119c67>] ? hrtimer_interrupt+0xd7/0x230 [ 2201.363002] [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10 [ 2201.363002] [<ffffffff8102e6f5>] ? smp_apic_timer_interrupt+0x35/0x50 [ 2201.363002] [<ffffffff816622bd>] ? apic_timer_interrupt+0x6d/0x80 [ 2201.363002] <EOI> [<ffffffff8151b3ae>] ? bch_btree_iter_next+0x9e/0x320 [ 2201.363002] [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10 [ 2201.363002] [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50 [ 2201.363002] [<ffffffff8151ca10>] ? btree_gc_count_keys+0x40/0x50 [ 2201.363002] [<ffffffff815212cd>] ? btree_gc_recurse+0x18d/0x2f0 [ 2201.363002] [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50 [ 2201.363002] [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220 [ 2201.363002] [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220 [ 2201.363002] [<ffffffff815219aa>] ? bch_btree_gc+0x3ea/0x510 [ 2201.363002] [<ffffffff81105360>] ? wait_woken+0x80/0x80 [ 2201.363002] [<ffffffff81521b00>] ? bch_gc_thread+0x30/0x110 [ 2201.363002] [<ffffffff81521ad0>] ? bch_btree_gc+0x510/0x510 [ 2201.363002] [<ffffffff810ecedc>] ? kthread+0xbc/0xe0 [ 2201.363002] [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170 [ 2201.363002] [<ffffffff8166137c>] ? ret_from_fork+0x7c/0xb0 [ 2201.363002] [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170 --------------------------------------------------------------
1) How is this a Gentoo specific problem? 2) Are you using it correctly?
1) it looks not like gentoo specific problem try this patch in bcache mailing list: http://www.spinics.net/lists/linux-bcache/msg02669.html --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -1741,6 +1741,7 @@ static void bch_btree_gc(struct cache_set *c) do { ret = btree_root(gc_root, c, &op, &writes, &stats); closure_sync(&writes); + cond_resched(); if (ret && ret != -EAGAIN) pr_warn("gc failed!"); -- for me it solves similar problem with bcache unsuitability. bcache maintainer looks busy. maybe it should be included in genpatches.
(In reply to Jeroen Roovers from comment #2) > 1) How is this a Gentoo specific problem? > 2) Are you using it correctly? 1) i do not know, I did not try it elsewhere. High load is also reflected with vanilla-sources-3.19.0 (and during the test is started only kernel, init ssh and agetty), so it probably will be general. 2) see https://www.kernel.org/doc/Documentation/bcache.txt 'make-bcache -C /dev/device && echo /dev/device > /sys/fs/bcache/register' is a standard procedure. PS: Excuse my poor English :-)
(In reply to Jan Hudoba from comment #3) > 1) it looks not like gentoo specific problem > > try this patch in bcache mailing list: > http://www.spinics.net/lists/linux-bcache/msg02669.html On gentoo-sources-3.19.0 with this patch the problem persists ...:-(
> 1) i do not know, I did not try it elsewhere. High load is also reflected > with vanilla-sources-3.19.0 (and during the test is started only kernel, > init ssh and agetty), so it probably will be general. vanilla-sources-3.19.0 also ends with deadlock: ----------------------------------------------------------- [ 7179.873002] INFO: rcu_sched self-detected stall on CPU { 1} (t=273016 jiffies g=51336 c=51335 q=35664) [ 7179.873002] Task dump for CPU 1: [ 7179.873002] bcache_gc R running task 0 3007 2 0x00080008 [ 7179.873002] 0000000000000002 ffffffff81877100 ffffffff81113a7f 000000000000c888 [ 7179.873002] ffff880272511d80 ffffffff81877100 ffffffff81877100 ffffffff818bbb08 [ 7179.873002] ffffffff81116c28 0000000000000001 ffff880272511140 ffff880272503e80 [ 7179.873002] Call Trace: [ 7179.873002] <IRQ> [<ffffffff81113a7f>] ? rcu_dump_cpu_stacks+0x7f/0xc0 [ 7179.873002] [<ffffffff81116c28>] ? rcu_check_callbacks+0x368/0x5a0 [ 7179.873002] [<ffffffff81127010>] ? tick_sched_handle.isra.18+0x30/0x30 [ 7179.873002] [<ffffffff81119071>] ? update_process_times+0x31/0x60 [ 7179.873002] [<ffffffff81127043>] ? tick_sched_timer+0x33/0x60 [ 7179.873002] [<ffffffff81119655>] ? __run_hrtimer.isra.34+0x45/0xc0 [ 7179.873002] [<ffffffff81119c27>] ? hrtimer_interrupt+0xd7/0x230 [ 7179.873002] [<ffffffff8151c1b0>] ? bch_ptr_invalid+0x10/0x10 [ 7179.873002] [<ffffffff8102e6f5>] ? smp_apic_timer_interrupt+0x35/0x50 [ 7179.873002] [<ffffffff816621bd>] ? apic_timer_interrupt+0x6d/0x80 [ 7179.873002] <EOI> [<ffffffff8151b255>] ? bch_btree_iter_next+0x45/0x320 [ 7179.873002] [<ffffffff8151c1b0>] ? bch_ptr_invalid+0x10/0x10 [ 7179.873002] [<ffffffff8151bfdd>] ? bch_btree_iter_next_filter+0x2d/0x50 [ 7179.873002] [<ffffffff8151c910>] ? btree_gc_count_keys+0x40/0x50 [ 7179.873002] [<ffffffff815211cd>] ? btree_gc_recurse+0x18d/0x2f0 [ 7179.873002] [<ffffffff8151bfdd>] ? bch_btree_iter_next_filter+0x2d/0x50 [ 7179.873002] [<ffffffff8151d3b4>] ? btree_gc_mark_node+0x54/0x220 [ 7179.873002] [<ffffffff8151d3b4>] ? btree_gc_mark_node+0x54/0x220 [ 7179.873002] [<ffffffff8152264e>] ? closure_sync+0x7e/0x90 [ 7179.873002] [<ffffffff815218aa>] ? bch_btree_gc+0x3ea/0x510 [ 7179.873002] [<ffffffff81105320>] ? wait_woken+0x80/0x80 [ 7179.873002] [<ffffffff81521a00>] ? bch_gc_thread+0x30/0x110 [ 7179.873002] [<ffffffff815219d0>] ? bch_btree_gc+0x510/0x510 [ 7179.873002] [<ffffffff810ece9c>] ? kthread+0xbc/0xe0 [ 7179.873002] [<ffffffff810ecde0>] ? kthread_create_on_node+0x170/0x170 [ 7179.873002] [<ffffffff8166127c>] ? ret_from_fork+0x7c/0xb0 [ 7179.873002] [<ffffffff810ecde0>] ? kthread_create_on_node+0x170/0x170 ----------------------------------------------------------- My test environment with RAMdisk (problem appeared even when using SSD): ----------------------------------------------------------- mount -t tmpfs -o nodev,nosuid,size=6G none /mnt/tmp/ dd if=/dev/zero of=/mnt/tmp/loop.bin bs=1M count=6100 losetup -f /mnt/tmp/loop.bin losetup -a make-bcache -C /dev/loop1 make-bcache -B /dev/vg/data echo /dev/loop1 > /sys/fs/bcache/register echo /dev/vg/data > /sys/fs/bcache/register echo c8a360ca-bc1e-4b77-b50b-47b00c5b3166 > /sys/block/bcache0/bcache/attach echo writeback > /sys/block/bcache0/bcache/cache_mode mkfs -t ext4 /dev/bcache0 mount -o relatime /dev/bcache0 /mnt/gentoo/ cd /mnt/gentoo echo "[global] randrepeat=1 ioengine=libaio bs=512 ba=512 size=512M #size=128M direct=1 gtod_reduce=1 norandommap iodepth=64 [mix] rw=randrw #rw=randwrite stonewall timeout=30" > ../fio.config while true; do fio ../fio.config && sync && rm mix.1.0 && sleep 10; done ----------------------------------------------------------- ... and leave a few hours to run.
(In reply to Jan Hudoba from comment #3) > try this patch in bcache mailing list: > http://www.spinics.net/lists/linux-bcache/msg02669.html Patched vanilla-sources-3.19.0 survived afternoon tests. Load is still high, but it's probably not fall. Thanks at least for that.
Is this still an issue with later kernels?
(In reply to Mike Pagano from comment #8) > Is this still an issue with later kernels? No change. On unpatched vanilla-sources-4.0.5: ----------------------------------------------------------- [ 1292.954002] INFO: rcu_sched self-detected stall on CPU { 0} (t=21000 jiffies g=24909 c=24908 q=5066) [ 1292.954002] Task dump for CPU 0: [ 1292.954002] bcache_gc R running task 0 5403 2 0x00080008 [ 1292.954002] 0000000000000001 ffffffff817d5080 ffffffff811102ff 000000000000614d [ 1292.954002] ffff88003fc13080 ffffffff817d5080 ffffffff817d5080 ffffffff81814fc8 [ 1292.954002] ffffffff8111379a ffffffff81037dee 00000000000013ca 0000012d0a10621a [ 1292.954002] Call Trace: [ 1292.954002] <IRQ> [<ffffffff811102ff>] ? rcu_dump_cpu_stacks+0x7f/0xc0 [ 1292.954002] [<ffffffff8111379a>] ? rcu_check_callbacks+0x3ca/0x6e0 [ 1292.954002] [<ffffffff81037dee>] ? kvm_clock_read+0x1e/0x30 [ 1292.954002] [<ffffffff810f57b4>] ? account_process_tick+0x154/0x170 [ 1292.954002] [<ffffffff81123bb0>] ? tick_sched_handle.isra.18+0x30/0x30 [ 1292.954002] [<ffffffff81115c11>] ? update_process_times+0x31/0x60 [ 1292.954002] [<ffffffff81123bb0>] ? tick_sched_handle.isra.18+0x30/0x30 [ 1292.954002] [<ffffffff81123be3>] ? tick_sched_timer+0x33/0x60 [ 1292.954002] [<ffffffff811161b5>] ? __run_hrtimer.isra.34+0x45/0xc0 [ 1292.954002] [<ffffffff811167e5>] ? hrtimer_interrupt+0xd5/0x210 [ 1292.954002] [<ffffffff8102efe5>] ? smp_apic_timer_interrupt+0x35/0x50 [ 1292.954002] [<ffffffff815dcced>] ? apic_timer_interrupt+0x6d/0x80 [ 1292.954002] <EOI> [<ffffffffa013cf65>] ? bch_extent_bad+0xf5/0x1b0 [bcache] [ 1292.954002] [<ffffffffa013ce9c>] ? bch_extent_bad+0x2c/0x1b0 [bcache] [ 1292.954002] [<ffffffffa0135370>] ? bch_ptr_invalid+0x10/0x10 [bcache] [ 1292.954002] [<ffffffffa0135191>] ? bch_btree_iter_next_filter+0x21/0x50 [bcache] [ 1292.954002] [<ffffffffa0135ad0>] ? btree_gc_count_keys+0x40/0x50 [bcache] [ 1292.954002] [<ffffffffa013a38d>] ? btree_gc_recurse+0x18d/0x2f0 [bcache] [ 1292.954002] [<ffffffffa013519d>] ? bch_btree_iter_next_filter+0x2d/0x50 [bcache] [ 1292.954002] [<ffffffffa0136574>] ? btree_gc_mark_node+0x54/0x220 [bcache] [ 1292.954002] [<ffffffffa0136574>] ? btree_gc_mark_node+0x54/0x220 [bcache] [ 1292.954002] [<ffffffffa013aa6a>] ? bch_btree_gc+0x3ea/0x510 [bcache] [ 1292.954002] [<ffffffff81101e40>] ? wait_woken+0x80/0x80 [ 1292.954002] [<ffffffffa013abc0>] ? bch_gc_thread+0x30/0x110 [bcache] [ 1292.954002] [<ffffffffa013ab90>] ? bch_btree_gc+0x510/0x510 [bcache] [ 1292.954002] [<ffffffffa013ab90>] ? bch_btree_gc+0x510/0x510 [bcache] [ 1292.954002] [<ffffffff810e961c>] ? kthread+0xbc/0xe0 [ 1292.954002] [<ffffffff810e9560>] ? kthread_create_on_node+0x170/0x170 [ 1292.954002] [<ffffffff815dbdd8>] ? ret_from_fork+0x58/0x90 [ 1292.954002] [<ffffffff810e9560>] ? kthread_create_on_node+0x170/0x170 -----------------------------------------------------------
Created attachment 405210 [details, diff] Patch to fix a bug that stops all concurrency writes unconditionally Can you apply this patch against gentoo-sources-4.0.5 and let me know if it helps?
4.1.12 gentoo-sources still affected. I use the patch collection from https://lkml.org/lkml/2015/12/5/38 which also fixes a few other things with bcache. The above message indicates that these patches might finally be merged. BillK
vanilla-sources-4.0.0: [ 1329.423081] INFO: rcu_sched self-detected stall on CPU [ 1329.423089] 2-...: (147005 ticks this GP) idle=ab7/140000000000001/0 softirq=54105/54105 fqs=48983 [ 1329.423091] (t=147006 jiffies g=32655 c=32654 q=26897) [ 1329.423095] Task dump for CPU 2: [ 1329.423096] bcache_gc R running task 0 4160 2 0x00000008 [ 1329.423101] ffffffff81a37700 ffffffff8111b82b ffff88007fb14bc0 ffffffff81a37700 [ 1329.423105] 0000000000000000 ffffffff81a7e0e0 ffffffff8111ecb8 0000000000000082 [ 1329.423108] ffff88007c9beac0 0000000000000000 0000000000000082 ffffffff81126e78 [ 1329.423112] Call Trace: [ 1329.423114] <IRQ> [<ffffffff8111b82b>] ? rcu_dump_cpu_stacks+0x7b/0xb0 [ 1329.423125] [<ffffffff8111ecb8>] ? rcu_check_callbacks+0x3c8/0x690 [ 1329.423128] [<ffffffff81126e78>] ? update_wall_time+0x408/0x690 [ 1329.423131] [<ffffffff8112104d>] ? update_process_times+0x2d/0x50 [ 1329.423134] [<ffffffff8112dfc3>] ? tick_sched_timer+0x33/0x60 [ 1329.423137] [<ffffffff81121604>] ? __hrtimer_run_queues+0xa4/0x120 [ 1329.423139] [<ffffffff81121b74>] ? hrtimer_interrupt+0x94/0x190 [ 1329.423147] [<ffffffffa026e130>] ? bch_ptr_invalid+0x10/0x10 [bcache] [ 1329.423151] [<ffffffff81033c74>] ? smp_apic_timer_interrupt+0x34/0x50 [ 1329.423155] [<ffffffff8164f622>] ? apic_timer_interrupt+0x82/0x90 [ 1329.423156] <EOI> [<ffffffffa026e130>] ? bch_ptr_invalid+0x10/0x10 [bcache] [ 1329.423165] [<ffffffffa026df92>] ? bch_btree_iter_next_filter+0x32/0x40 [bcache] [ 1329.423169] [<ffffffffa026e83e>] ? btree_gc_count_keys+0x3e/0x50 [bcache] [ 1329.423174] [<ffffffffa0272e3c>] ? btree_gc_recurse+0xcc/0x300 [bcache] [ 1329.423178] [<ffffffffa026ef30>] ? btree_gc_mark_node+0x50/0x1c0 [bcache] [ 1329.423181] [<ffffffff8110e681>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [ 1329.423186] [<ffffffffa0273604>] ? bch_btree_gc+0x3f4/0x500 [bcache] [ 1329.423189] [<ffffffff81109d60>] ? wait_woken+0x80/0x80 [ 1329.423194] [<ffffffffa027373d>] ? bch_gc_thread+0x2d/0x110 [bcache] [ 1329.423198] [<ffffffffa0273710>] ? bch_btree_gc+0x500/0x500 [bcache] [ 1329.423202] [<ffffffffa0273710>] ? bch_btree_gc+0x500/0x500 [bcache] [ 1329.423205] [<ffffffff810efaa8>] ? kthread+0xb8/0xd0 [ 1329.423208] [<ffffffff810ef9f0>] ? kthread_create_on_node+0x170/0x170 [ 1329.423210] [<ffffffff8164ec4f>] ? ret_from_fork+0x3f/0x70 [ 1329.423213] [<ffffffff810ef9f0>] ? kthread_create_on_node+0x170/0x170