Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 540178 - kernel >3.12 bcache shows high load and deadlocks
Summary: kernel >3.12 bcache shows high load and deadlocks
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system (show other bugs)
Hardware: AMD64 Linux
: Normal normal (vote)
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-15 19:04 UTC by Jan Jaros
Modified: 2017-10-11 11:59 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Patch to fix a bug that stops all concurrency writes unconditionally (bcache-rcu-sched-bugfix.patch,2.66 KB, patch)
2015-06-15 23:12 UTC, Mike Pagano
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Jaros 2015-02-15 19:04:01 UTC
When I register "cache device" for bcache, load is increased by 1.
When bcache is used, deadlock appears often, its not possible to write on
that cached disk and output of dmesg says that:

--------------------------------------------------------------
[ 2138.360002] INFO: rcu_sched self-detected stall on CPU { 0}  (t=21000 jiffies g=23827 c=23826 q=2794)
[ 2138.360002] Task dump for CPU 0:
[ 2138.360002] bcache_gc       R  running task        0  3010      2 0x00080008
[ 2138.360002]  0000000000000001 ffffffff81877100 ffffffff81113abf 0000000000005d13
[ 2138.360002]  ffff880272411d80 ffffffff81877100 ffffffff81877100 ffffffff818bbb08
[ 2138.360002]  ffffffff81116c68 001dcd6500000000 ffff880272411140 00000000000cachetest ~ # emerge --info
Portage 2.2.14 (python 3.4.1-final-0, default/linux/amd64/13.0, gcc-4.8.3, glibc-2.19-r1, 3.19.0-gentoo x86_64)
=================================================================
System uname: Linux-3.19.0-gentoo-x86_64-Intel-R-_Core-TM-_i5-2400_CPU_@_3.10GHz-with-gentoo-2.2
KiB Mem:     8969520 total,   8830580 free
KiB Swap:          0 total,         0 free
Timestamp of tree: Sat, 14 Feb 2015 20:30:01 +0000
ld GNU ld (Gentoo 2.24 p1.4) 2.24
app-shells/bash:          4.2_p53
dev-lang/perl:            5.20.1-r4
dev-lang/python:          2.7.9-r1, 3.4.1
dev-util/pkgconfig:       0.28-r1
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.13.9
sys-apps/sandbox:         2.6-r1
sys-devel/autoconf:       2.69
sys-devel/automake:       1.13.4
sys-devel/binutils:       2.24-r3
sys-devel/gcc:            4.8.3
sys-devel/gcc-config:     1.7.3
sys-devel/libtool:        2.4.4
sys-devel/make:           4.0-r1
sys-kernel/linux-headers: 3.16 (virtual/os-headers)
sys-libs/glibc:           2.19-r1
Repositories: gentoo x-portage
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=nocona -O2 -pipe"
DISTDIR="/mnt/data/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LC_ALL="cs_CZ.UTF8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j5 -l5"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/mnt/data/portage/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
USE="acl acpi amd64 bash-completion berkdb bzip2 caps cracklib crypt curl cxx dane fam fuse iconv icu inotify ipv6 libnotify lzma lzo mmap mmx mmxext ncurses network nls nptl openssl pam pcre pie rar readline slang sse sse2 sse3 ssl ssse3 syslog tcpd threads udev unicode xattr zlib" ABI_X86="64" CURL_SSL="openssl" ELIBC="glibc" KERNEL="linux" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_4" QEMU_SOFTMMU_TARGETS="x86_64 arm armeb" QEMU_USER_TARGETS="x86_64 arm armeb" USERLAND="GNU"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, SYNC, USE_PYTHON
00096
[ 2138.360002] Call Trace:
[ 2138.360002]  <IRQ>  [<ffffffff81113abf>] ? rcu_dump_cpu_stacks+0x7f/0xc0
[ 2138.360002]  [<ffffffff81116c68>] ? rcu_check_callbacks+0x368/0x5a0
[ 2138.360002]  [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30
[ 2138.360002]  [<ffffffff811190b1>] ? update_process_times+0x31/0x60
[ 2138.360002]  [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30
[ 2138.360002]  [<ffffffff81127083>] ? tick_sched_timer+0x33/0x60
[ 2138.360002]  [<ffffffff81119695>] ? __run_hrtimer.isra.34+0x45/0xc0
[ 2138.360002]  [<ffffffff81119c67>] ? hrtimer_interrupt+0xd7/0x230
[ 2138.360002]  [<ffffffff8102e6f5>] ? smp_apic_timer_interrupt+0x35/0x50
[ 2138.360002]  [<ffffffff816622bd>] ? apic_timer_interrupt+0x6d/0x80
[ 2138.360002]  <EOI>  [<ffffffff81523ea5>] ? bch_extent_bad+0xf5/0x1b0
[ 2138.360002]  [<ffffffff81523ddc>] ? bch_extent_bad+0x2c/0x1b0
[ 2138.360002]  [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10
[ 2138.360002]  [<ffffffff8151c0d1>] ? bch_btree_iter_next_filter+0x21/0x50
[ 2138.360002]  [<ffffffff8151ca10>] ? btree_gc_count_keys+0x40/0x50
[ 2138.360002]  [<ffffffff815212cd>] ? btree_gc_recurse+0x18d/0x2f0
[ 2138.360002]  [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50
[ 2138.360002]  [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220
[ 2138.360002]  [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220
[ 2138.360002]  [<ffffffff815219aa>] ? bch_btree_gc+0x3ea/0x510
[ 2138.360002]  [<ffffffff81105360>] ? wait_woken+0x80/0x80
[ 2138.360002]  [<ffffffff81521b00>] ? bch_gc_thread+0x30/0x110
[ 2138.360002]  [<ffffffff81521ad0>] ? bch_btree_gc+0x510/0x510
[ 2138.360002]  [<ffffffff810ecedc>] ? kthread+0xbc/0xe0
[ 2138.360002]  [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170
[ 2138.360002]  [<ffffffff8166137c>] ? ret_from_fork+0x7c/0xb0
[ 2138.360002]  [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170
[ 2201.363002] INFO: rcu_sched self-detected stall on CPU { 0}  (t=84003 jiffies g=23827 c=23826 q=10658)
[ 2201.363002] Task dump for CPU 0:
[ 2201.363002] bcache_gc       R  running task        0  3010      2 0x00080008
[ 2201.363002]  0000000000000001 ffffffff81877100 ffffffff81113abf 0000000000005d13
[ 2201.363002]  ffff880272411d80 ffffffff81877100 ffffffff81877100 ffffffff818bbb08
[ 2201.363002]  ffffffff81116c68 001dcd6500000000 ffff880272411140 0000000000000096
[ 2201.363002] Call Trace:
[ 2201.363002]  <IRQ>  [<ffffffff81113abf>] ? rcu_dump_cpu_stacks+0x7f/0xc0
[ 2201.363002]  [<ffffffff81116c68>] ? rcu_check_callbacks+0x368/0x5a0
[ 2201.363002]  [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30
[ 2201.363002]  [<ffffffff811190b1>] ? update_process_times+0x31/0x60
[ 2201.363002]  [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30
[ 2201.363002]  [<ffffffff81127083>] ? tick_sched_timer+0x33/0x60
[ 2201.363002]  [<ffffffff81119695>] ? __run_hrtimer.isra.34+0x45/0xc0
[ 2201.363002]  [<ffffffff81119c67>] ? hrtimer_interrupt+0xd7/0x230
[ 2201.363002]  [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10
[ 2201.363002]  [<ffffffff8102e6f5>] ? smp_apic_timer_interrupt+0x35/0x50
[ 2201.363002]  [<ffffffff816622bd>] ? apic_timer_interrupt+0x6d/0x80
[ 2201.363002]  <EOI>  [<ffffffff8151b3ae>] ? bch_btree_iter_next+0x9e/0x320
[ 2201.363002]  [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10
[ 2201.363002]  [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50
[ 2201.363002]  [<ffffffff8151ca10>] ? btree_gc_count_keys+0x40/0x50
[ 2201.363002]  [<ffffffff815212cd>] ? btree_gc_recurse+0x18d/0x2f0
[ 2201.363002]  [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50
[ 2201.363002]  [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220
[ 2201.363002]  [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220
[ 2201.363002]  [<ffffffff815219aa>] ? bch_btree_gc+0x3ea/0x510
[ 2201.363002]  [<ffffffff81105360>] ? wait_woken+0x80/0x80
[ 2201.363002]  [<ffffffff81521b00>] ? bch_gc_thread+0x30/0x110
[ 2201.363002]  [<ffffffff81521ad0>] ? bch_btree_gc+0x510/0x510
[ 2201.363002]  [<ffffffff810ecedc>] ? kthread+0xbc/0xe0
[ 2201.363002]  [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170
[ 2201.363002]  [<ffffffff8166137c>] ? ret_from_fork+0x7c/0xb0
[ 2201.363002]  [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170
--------------------------------------------------------------

It happens on kernels higher than 3.12. Tested on gentoo-sources 3.12.35
(works without problems), 3.14.33, 3.17.8-r1 a 3.19.0.

Reproducible: Always

Steps to Reproduce:
pstree && echo \
&& lsblk && echo \
&& w && echo \
&& make-bcache -C /dev/vg/cache1 \
&& echo /dev/vg/cache1 > /sys/fs/bcache/register \
&& echo && sleep 300 && w \
&& echo && sleep 300 && w && echo\
&& make-bcache -C /dev/vg/cache2 \
&& echo /dev/vg/cache2 > /sys/fs/bcache/register \
&& echo && sleep 300 && w \
&& echo && sleep 300 && w
Actual Results:  
init─┬─6*[agetty]
     └─sshd───sshd───bash───pstree

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
fd0           2:0    1     4K  0 disk 
sda           8:0    0    15G  0 disk /
sdb           8:16   0    30G  0 disk 
├─vg-cache1 251:0    0     5G  0 lvm  
└─vg-cache2 251:1    0     5G  0 lvm  
loop0         7:0    0 101,6M  1 loop /usr/portage

 15:39:38 up 2 min,  1 user,  load average: 0,00, 0,00, 0,00
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0     15:38    2.00s  0.00s  0.00s w

UUID:                   7648c99a-8762-4a44-a021-4e60d94698f6
Set UUID:               75c20ebf-b8a8-4c6a-a35a-0a00c25709b3
version:                0
nbuckets:               10240
block_size:             1
bucket_size:            1024
nr_in_set:              1
nr_this_dev:            0
first_bucket:           1

 15:44:38 up 7 min,  1 user,  load average: 0,99, 0,63, 0,28
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0     15:38    5:02   0.01s  0.00s w

 15:49:38 up 12 min,  1 user,  load average: 1,00, 0,87, 0,48
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0     15:38   10:02   0.01s  0.00s w

UUID:                   16c1a207-54de-4c8e-a723-0e13087123c7
Set UUID:               045e53f3-f592-41ec-b404-a845e99b9e87
version:                0
nbuckets:               10240
block_size:             1
bucket_size:            1024
nr_in_set:              1
nr_this_dev:            0
first_bucket:           1

 15:54:39 up 17 min,  1 user,  load average: 1,99, 1,58, 0,90
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0     15:38   15:03   0.01s  0.00s w

 15:59:39 up 22 min,  1 user,  load average: 2,00, 1,85, 1,20
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0     15:38   20:03   0.01s  0.00s w


Expected Results:  
...
...
...
 15:59:39 up 22 min,  1 user,  load average: 0,00, 0,00, 0,00
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0     15:38   20:03   0.01s  0.00s w


cachetest ~ # emerge --info
Portage 2.2.14 (python 3.4.1-final-0, default/linux/amd64/13.0, gcc-4.8.3, glibc-2.19-r1, 3.19.0-gentoo x86_64)
=================================================================
System uname: Linux-3.19.0-gentoo-x86_64-Intel-R-_Core-TM-_i5-2400_CPU_@_3.10GHz-with-gentoo-2.2
KiB Mem:     8969520 total,   8830580 free
KiB Swap:          0 total,         0 free
Timestamp of tree: Sat, 14 Feb 2015 20:30:01 +0000
ld GNU ld (Gentoo 2.24 p1.4) 2.24
app-shells/bash:          4.2_p53
dev-lang/perl:            5.20.1-r4
dev-lang/python:          2.7.9-r1, 3.4.1
dev-util/pkgconfig:       0.28-r1
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.13.9
sys-apps/sandbox:         2.6-r1
sys-devel/autoconf:       2.69
sys-devel/automake:       1.13.4
sys-devel/binutils:       2.24-r3
sys-devel/gcc:            4.8.3
sys-devel/gcc-config:     1.7.3
sys-devel/libtool:        2.4.4
sys-devel/make:           4.0-r1
sys-kernel/linux-headers: 3.16 (virtual/os-headers)
sys-libs/glibc:           2.19-r1
Repositories: gentoo x-portage
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=nocona -O2 -pipe"
DISTDIR="/mnt/data/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LC_ALL="cs_CZ.UTF8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j5 -l5"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/mnt/data/portage/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
USE="acl acpi amd64 bash-completion berkdb bzip2 caps cracklib crypt curl cxx dane fam fuse iconv icu inotify ipv6 libnotify lzma lzo mmap mmx mmxext ncurses network nls nptl openssl pam pcre pie rar readline slang sse sse2 sse3 ssl ssse3 syslog tcpd threads udev unicode xattr zlib" ABI_X86="64" CURL_SSL="openssl" ELIBC="glibc" KERNEL="linux" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_4" QEMU_SOFTMMU_TARGETS="x86_64 arm armeb" QEMU_USER_TARGETS="x86_64 arm armeb" USERLAND="GNU"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, SYNC, USE_PYTHON
Comment 1 Jan Jaros 2015-02-15 19:07:57 UTC
I'm sorry, the correct output of dmesg is here:
--------------------------------------------------------------
[ 2138.360002] INFO: rcu_sched self-detected stall on CPU { 0}  (t=21000 jiffies g=23827 c=23826 q=2794)
[ 2138.360002] Task dump for CPU 0:
[ 2138.360002] bcache_gc       R  running task        0  3010      2 0x00080008
[ 2138.360002]  0000000000000001 ffffffff81877100 ffffffff81113abf 0000000000005d13
[ 2138.360002]  ffff880272411d80 ffffffff81877100 ffffffff81877100 ffffffff818bbb08
[ 2138.360002]  ffffffff81116c68 001dcd6500000000 ffff880272411140 0000000000000096
[ 2138.360002] Call Trace:
[ 2138.360002]  <IRQ>  [<ffffffff81113abf>] ? rcu_dump_cpu_stacks+0x7f/0xc0
[ 2138.360002]  [<ffffffff81116c68>] ? rcu_check_callbacks+0x368/0x5a0
[ 2138.360002]  [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30
[ 2138.360002]  [<ffffffff811190b1>] ? update_process_times+0x31/0x60
[ 2138.360002]  [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30
[ 2138.360002]  [<ffffffff81127083>] ? tick_sched_timer+0x33/0x60
[ 2138.360002]  [<ffffffff81119695>] ? __run_hrtimer.isra.34+0x45/0xc0
[ 2138.360002]  [<ffffffff81119c67>] ? hrtimer_interrupt+0xd7/0x230
[ 2138.360002]  [<ffffffff8102e6f5>] ? smp_apic_timer_interrupt+0x35/0x50
[ 2138.360002]  [<ffffffff816622bd>] ? apic_timer_interrupt+0x6d/0x80
[ 2138.360002]  <EOI>  [<ffffffff81523ea5>] ? bch_extent_bad+0xf5/0x1b0
[ 2138.360002]  [<ffffffff81523ddc>] ? bch_extent_bad+0x2c/0x1b0
[ 2138.360002]  [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10
[ 2138.360002]  [<ffffffff8151c0d1>] ? bch_btree_iter_next_filter+0x21/0x50
[ 2138.360002]  [<ffffffff8151ca10>] ? btree_gc_count_keys+0x40/0x50
[ 2138.360002]  [<ffffffff815212cd>] ? btree_gc_recurse+0x18d/0x2f0
[ 2138.360002]  [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50
[ 2138.360002]  [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220
[ 2138.360002]  [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220
[ 2138.360002]  [<ffffffff815219aa>] ? bch_btree_gc+0x3ea/0x510
[ 2138.360002]  [<ffffffff81105360>] ? wait_woken+0x80/0x80
[ 2138.360002]  [<ffffffff81521b00>] ? bch_gc_thread+0x30/0x110
[ 2138.360002]  [<ffffffff81521ad0>] ? bch_btree_gc+0x510/0x510
[ 2138.360002]  [<ffffffff810ecedc>] ? kthread+0xbc/0xe0
[ 2138.360002]  [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170
[ 2138.360002]  [<ffffffff8166137c>] ? ret_from_fork+0x7c/0xb0
[ 2138.360002]  [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170
[ 2201.363002] INFO: rcu_sched self-detected stall on CPU { 0}  (t=84003 jiffies g=23827 c=23826 q=10658)
[ 2201.363002] Task dump for CPU 0:
[ 2201.363002] bcache_gc       R  running task        0  3010      2 0x00080008
[ 2201.363002]  0000000000000001 ffffffff81877100 ffffffff81113abf 0000000000005d13
[ 2201.363002]  ffff880272411d80 ffffffff81877100 ffffffff81877100 ffffffff818bbb08
[ 2201.363002]  ffffffff81116c68 001dcd6500000000 ffff880272411140 0000000000000096
[ 2201.363002] Call Trace:
[ 2201.363002]  <IRQ>  [<ffffffff81113abf>] ? rcu_dump_cpu_stacks+0x7f/0xc0
[ 2201.363002]  [<ffffffff81116c68>] ? rcu_check_callbacks+0x368/0x5a0
[ 2201.363002]  [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30
[ 2201.363002]  [<ffffffff811190b1>] ? update_process_times+0x31/0x60
[ 2201.363002]  [<ffffffff81127050>] ? tick_sched_handle.isra.18+0x30/0x30
[ 2201.363002]  [<ffffffff81127083>] ? tick_sched_timer+0x33/0x60
[ 2201.363002]  [<ffffffff81119695>] ? __run_hrtimer.isra.34+0x45/0xc0
[ 2201.363002]  [<ffffffff81119c67>] ? hrtimer_interrupt+0xd7/0x230
[ 2201.363002]  [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10
[ 2201.363002]  [<ffffffff8102e6f5>] ? smp_apic_timer_interrupt+0x35/0x50
[ 2201.363002]  [<ffffffff816622bd>] ? apic_timer_interrupt+0x6d/0x80
[ 2201.363002]  <EOI>  [<ffffffff8151b3ae>] ? bch_btree_iter_next+0x9e/0x320
[ 2201.363002]  [<ffffffff8151c2b0>] ? bch_ptr_invalid+0x10/0x10
[ 2201.363002]  [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50
[ 2201.363002]  [<ffffffff8151ca10>] ? btree_gc_count_keys+0x40/0x50
[ 2201.363002]  [<ffffffff815212cd>] ? btree_gc_recurse+0x18d/0x2f0
[ 2201.363002]  [<ffffffff8151c0dd>] ? bch_btree_iter_next_filter+0x2d/0x50
[ 2201.363002]  [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220
[ 2201.363002]  [<ffffffff8151d4b4>] ? btree_gc_mark_node+0x54/0x220
[ 2201.363002]  [<ffffffff815219aa>] ? bch_btree_gc+0x3ea/0x510
[ 2201.363002]  [<ffffffff81105360>] ? wait_woken+0x80/0x80
[ 2201.363002]  [<ffffffff81521b00>] ? bch_gc_thread+0x30/0x110
[ 2201.363002]  [<ffffffff81521ad0>] ? bch_btree_gc+0x510/0x510
[ 2201.363002]  [<ffffffff810ecedc>] ? kthread+0xbc/0xe0
[ 2201.363002]  [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170
[ 2201.363002]  [<ffffffff8166137c>] ? ret_from_fork+0x7c/0xb0
[ 2201.363002]  [<ffffffff810ece20>] ? kthread_create_on_node+0x170/0x170
--------------------------------------------------------------
Comment 2 Jeroen Roovers (RETIRED) gentoo-dev 2015-02-16 09:05:15 UTC
1) How is this a Gentoo specific problem?
2) Are you using it correctly?
Comment 3 Jan Hudoba 2015-02-16 11:06:08 UTC
1) it looks not like gentoo specific problem

try this patch in bcache mailing list:
http://www.spinics.net/lists/linux-bcache/msg02669.html

--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1741,6 +1741,7 @@ static void bch_btree_gc(struct cache_set *c)
        do {
                ret = btree_root(gc_root, c, &op, &writes, &stats);
                closure_sync(&writes);
+               cond_resched();

                if (ret && ret != -EAGAIN)
                        pr_warn("gc failed!");
--


for me it solves similar problem with bcache unsuitability.
bcache maintainer looks busy. maybe it should be included in genpatches.
Comment 4 Jan Jaros 2015-02-16 11:20:38 UTC
(In reply to Jeroen Roovers from comment #2)
> 1) How is this a Gentoo specific problem?
> 2) Are you using it correctly?

1) i do not know, I did not try it elsewhere. High load is also reflected with vanilla-sources-3.19.0 (and during the test is started only kernel, init ssh and agetty), so it probably will be general.
2) see https://www.kernel.org/doc/Documentation/bcache.txt
'make-bcache -C /dev/device && echo /dev/device > /sys/fs/bcache/register' is a standard procedure. 

PS: Excuse my poor English :-)
Comment 5 Jan Jaros 2015-02-16 11:21:12 UTC
(In reply to Jan Hudoba from comment #3)
> 1) it looks not like gentoo specific problem
> 
> try this patch in bcache mailing list:
> http://www.spinics.net/lists/linux-bcache/msg02669.html

On gentoo-sources-3.19.0 with this patch the problem persists ...:-(
Comment 6 Jan Jaros 2015-02-16 12:10:33 UTC
> 1) i do not know, I did not try it elsewhere. High load is also reflected
> with vanilla-sources-3.19.0 (and during the test is started only kernel,
> init ssh and agetty), so it probably will be general.

vanilla-sources-3.19.0 also ends with deadlock:
-----------------------------------------------------------
[ 7179.873002] INFO: rcu_sched self-detected stall on CPU { 1}  (t=273016 jiffies g=51336 c=51335 q=35664)
[ 7179.873002] Task dump for CPU 1:
[ 7179.873002] bcache_gc       R  running task        0  3007      2 0x00080008
[ 7179.873002]  0000000000000002 ffffffff81877100 ffffffff81113a7f 000000000000c888
[ 7179.873002]  ffff880272511d80 ffffffff81877100 ffffffff81877100 ffffffff818bbb08
[ 7179.873002]  ffffffff81116c28 0000000000000001 ffff880272511140 ffff880272503e80
[ 7179.873002] Call Trace:
[ 7179.873002]  <IRQ>  [<ffffffff81113a7f>] ? rcu_dump_cpu_stacks+0x7f/0xc0
[ 7179.873002]  [<ffffffff81116c28>] ? rcu_check_callbacks+0x368/0x5a0
[ 7179.873002]  [<ffffffff81127010>] ? tick_sched_handle.isra.18+0x30/0x30
[ 7179.873002]  [<ffffffff81119071>] ? update_process_times+0x31/0x60
[ 7179.873002]  [<ffffffff81127043>] ? tick_sched_timer+0x33/0x60
[ 7179.873002]  [<ffffffff81119655>] ? __run_hrtimer.isra.34+0x45/0xc0
[ 7179.873002]  [<ffffffff81119c27>] ? hrtimer_interrupt+0xd7/0x230
[ 7179.873002]  [<ffffffff8151c1b0>] ? bch_ptr_invalid+0x10/0x10
[ 7179.873002]  [<ffffffff8102e6f5>] ? smp_apic_timer_interrupt+0x35/0x50
[ 7179.873002]  [<ffffffff816621bd>] ? apic_timer_interrupt+0x6d/0x80
[ 7179.873002]  <EOI>  [<ffffffff8151b255>] ? bch_btree_iter_next+0x45/0x320
[ 7179.873002]  [<ffffffff8151c1b0>] ? bch_ptr_invalid+0x10/0x10
[ 7179.873002]  [<ffffffff8151bfdd>] ? bch_btree_iter_next_filter+0x2d/0x50
[ 7179.873002]  [<ffffffff8151c910>] ? btree_gc_count_keys+0x40/0x50
[ 7179.873002]  [<ffffffff815211cd>] ? btree_gc_recurse+0x18d/0x2f0
[ 7179.873002]  [<ffffffff8151bfdd>] ? bch_btree_iter_next_filter+0x2d/0x50
[ 7179.873002]  [<ffffffff8151d3b4>] ? btree_gc_mark_node+0x54/0x220
[ 7179.873002]  [<ffffffff8151d3b4>] ? btree_gc_mark_node+0x54/0x220
[ 7179.873002]  [<ffffffff8152264e>] ? closure_sync+0x7e/0x90
[ 7179.873002]  [<ffffffff815218aa>] ? bch_btree_gc+0x3ea/0x510
[ 7179.873002]  [<ffffffff81105320>] ? wait_woken+0x80/0x80
[ 7179.873002]  [<ffffffff81521a00>] ? bch_gc_thread+0x30/0x110
[ 7179.873002]  [<ffffffff815219d0>] ? bch_btree_gc+0x510/0x510
[ 7179.873002]  [<ffffffff810ece9c>] ? kthread+0xbc/0xe0
[ 7179.873002]  [<ffffffff810ecde0>] ? kthread_create_on_node+0x170/0x170
[ 7179.873002]  [<ffffffff8166127c>] ? ret_from_fork+0x7c/0xb0
[ 7179.873002]  [<ffffffff810ecde0>] ? kthread_create_on_node+0x170/0x170
-----------------------------------------------------------

My test environment with RAMdisk (problem appeared even when using SSD):
-----------------------------------------------------------
mount -t tmpfs -o nodev,nosuid,size=6G none /mnt/tmp/
dd if=/dev/zero of=/mnt/tmp/loop.bin bs=1M count=6100
losetup -f /mnt/tmp/loop.bin
losetup -a

make-bcache -C /dev/loop1
make-bcache -B /dev/vg/data
echo /dev/loop1 > /sys/fs/bcache/register
echo /dev/vg/data > /sys/fs/bcache/register
echo c8a360ca-bc1e-4b77-b50b-47b00c5b3166 > /sys/block/bcache0/bcache/attach
echo writeback > /sys/block/bcache0/bcache/cache_mode

mkfs -t ext4 /dev/bcache0
mount -o relatime /dev/bcache0 /mnt/gentoo/
cd /mnt/gentoo

echo "[global]
randrepeat=1
ioengine=libaio
bs=512
ba=512
size=512M
#size=128M
direct=1
gtod_reduce=1
norandommap
iodepth=64

[mix]
rw=randrw
#rw=randwrite
stonewall
timeout=30" > ../fio.config

while true; do fio ../fio.config && sync && rm mix.1.0 && sleep 10; done
-----------------------------------------------------------

... and leave a few hours to run.
Comment 7 Jan Jaros 2015-02-16 18:29:15 UTC
(In reply to Jan Hudoba from comment #3)
> try this patch in bcache mailing list:
> http://www.spinics.net/lists/linux-bcache/msg02669.html

Patched vanilla-sources-3.19.0 survived afternoon tests. Load is still high, but it's probably not fall. Thanks at least for that.
Comment 8 Mike Pagano gentoo-dev 2015-06-13 00:04:28 UTC
Is this still an issue with later kernels?
Comment 9 Jan Jaros 2015-06-13 20:09:42 UTC
(In reply to Mike Pagano from comment #8)
> Is this still an issue with later kernels?

No change. On unpatched vanilla-sources-4.0.5:
-----------------------------------------------------------
[ 1292.954002] INFO: rcu_sched self-detected stall on CPU { 0}  (t=21000 jiffies g=24909 c=24908 q=5066)
[ 1292.954002] Task dump for CPU 0:
[ 1292.954002] bcache_gc       R  running task        0  5403      2 0x00080008
[ 1292.954002]  0000000000000001 ffffffff817d5080 ffffffff811102ff 000000000000614d
[ 1292.954002]  ffff88003fc13080 ffffffff817d5080 ffffffff817d5080 ffffffff81814fc8
[ 1292.954002]  ffffffff8111379a ffffffff81037dee 00000000000013ca 0000012d0a10621a
[ 1292.954002] Call Trace:
[ 1292.954002]  <IRQ>  [<ffffffff811102ff>] ? rcu_dump_cpu_stacks+0x7f/0xc0
[ 1292.954002]  [<ffffffff8111379a>] ? rcu_check_callbacks+0x3ca/0x6e0
[ 1292.954002]  [<ffffffff81037dee>] ? kvm_clock_read+0x1e/0x30
[ 1292.954002]  [<ffffffff810f57b4>] ? account_process_tick+0x154/0x170
[ 1292.954002]  [<ffffffff81123bb0>] ? tick_sched_handle.isra.18+0x30/0x30
[ 1292.954002]  [<ffffffff81115c11>] ? update_process_times+0x31/0x60
[ 1292.954002]  [<ffffffff81123bb0>] ? tick_sched_handle.isra.18+0x30/0x30
[ 1292.954002]  [<ffffffff81123be3>] ? tick_sched_timer+0x33/0x60
[ 1292.954002]  [<ffffffff811161b5>] ? __run_hrtimer.isra.34+0x45/0xc0
[ 1292.954002]  [<ffffffff811167e5>] ? hrtimer_interrupt+0xd5/0x210
[ 1292.954002]  [<ffffffff8102efe5>] ? smp_apic_timer_interrupt+0x35/0x50
[ 1292.954002]  [<ffffffff815dcced>] ? apic_timer_interrupt+0x6d/0x80
[ 1292.954002]  <EOI>  [<ffffffffa013cf65>] ? bch_extent_bad+0xf5/0x1b0 [bcache]
[ 1292.954002]  [<ffffffffa013ce9c>] ? bch_extent_bad+0x2c/0x1b0 [bcache]
[ 1292.954002]  [<ffffffffa0135370>] ? bch_ptr_invalid+0x10/0x10 [bcache]
[ 1292.954002]  [<ffffffffa0135191>] ? bch_btree_iter_next_filter+0x21/0x50 [bcache]
[ 1292.954002]  [<ffffffffa0135ad0>] ? btree_gc_count_keys+0x40/0x50 [bcache]
[ 1292.954002]  [<ffffffffa013a38d>] ? btree_gc_recurse+0x18d/0x2f0 [bcache]
[ 1292.954002]  [<ffffffffa013519d>] ? bch_btree_iter_next_filter+0x2d/0x50 [bcache]
[ 1292.954002]  [<ffffffffa0136574>] ? btree_gc_mark_node+0x54/0x220 [bcache]
[ 1292.954002]  [<ffffffffa0136574>] ? btree_gc_mark_node+0x54/0x220 [bcache]
[ 1292.954002]  [<ffffffffa013aa6a>] ? bch_btree_gc+0x3ea/0x510 [bcache]
[ 1292.954002]  [<ffffffff81101e40>] ? wait_woken+0x80/0x80
[ 1292.954002]  [<ffffffffa013abc0>] ? bch_gc_thread+0x30/0x110 [bcache]
[ 1292.954002]  [<ffffffffa013ab90>] ? bch_btree_gc+0x510/0x510 [bcache]
[ 1292.954002]  [<ffffffffa013ab90>] ? bch_btree_gc+0x510/0x510 [bcache]
[ 1292.954002]  [<ffffffff810e961c>] ? kthread+0xbc/0xe0
[ 1292.954002]  [<ffffffff810e9560>] ? kthread_create_on_node+0x170/0x170
[ 1292.954002]  [<ffffffff815dbdd8>] ? ret_from_fork+0x58/0x90
[ 1292.954002]  [<ffffffff810e9560>] ? kthread_create_on_node+0x170/0x170
-----------------------------------------------------------
Comment 10 Mike Pagano gentoo-dev 2015-06-15 23:12:24 UTC
Created attachment 405210 [details, diff]
Patch to fix a bug that stops all concurrency writes unconditionally

Can you apply this patch against gentoo-sources-4.0.5 and let me know if it helps?
Comment 11 Bill Kenworthy 2015-12-17 22:13:47 UTC
4.1.12 gentoo-sources still affected. I use the patch collection from https://lkml.org/lkml/2015/12/5/38 which also fixes a few other things with bcache.

The above message indicates that these patches might finally be merged.

BillK
Comment 12 Jan Jaros 2016-01-11 22:39:30 UTC
vanilla-sources-4.0.0:

[ 1329.423081] INFO: rcu_sched self-detected stall on CPU
[ 1329.423089]  2-...: (147005 ticks this GP) idle=ab7/140000000000001/0 softirq=54105/54105 fqs=48983 
[ 1329.423091]   (t=147006 jiffies g=32655 c=32654 q=26897)
[ 1329.423095] Task dump for CPU 2:
[ 1329.423096] bcache_gc       R  running task        0  4160      2 0x00000008
[ 1329.423101]  ffffffff81a37700 ffffffff8111b82b ffff88007fb14bc0 ffffffff81a37700
[ 1329.423105]  0000000000000000 ffffffff81a7e0e0 ffffffff8111ecb8 0000000000000082
[ 1329.423108]  ffff88007c9beac0 0000000000000000 0000000000000082 ffffffff81126e78
[ 1329.423112] Call Trace:
[ 1329.423114]  <IRQ>  [<ffffffff8111b82b>] ? rcu_dump_cpu_stacks+0x7b/0xb0
[ 1329.423125]  [<ffffffff8111ecb8>] ? rcu_check_callbacks+0x3c8/0x690
[ 1329.423128]  [<ffffffff81126e78>] ? update_wall_time+0x408/0x690
[ 1329.423131]  [<ffffffff8112104d>] ? update_process_times+0x2d/0x50
[ 1329.423134]  [<ffffffff8112dfc3>] ? tick_sched_timer+0x33/0x60
[ 1329.423137]  [<ffffffff81121604>] ? __hrtimer_run_queues+0xa4/0x120
[ 1329.423139]  [<ffffffff81121b74>] ? hrtimer_interrupt+0x94/0x190
[ 1329.423147]  [<ffffffffa026e130>] ? bch_ptr_invalid+0x10/0x10 [bcache]
[ 1329.423151]  [<ffffffff81033c74>] ? smp_apic_timer_interrupt+0x34/0x50
[ 1329.423155]  [<ffffffff8164f622>] ? apic_timer_interrupt+0x82/0x90
[ 1329.423156]  <EOI>  [<ffffffffa026e130>] ? bch_ptr_invalid+0x10/0x10 [bcache]
[ 1329.423165]  [<ffffffffa026df92>] ? bch_btree_iter_next_filter+0x32/0x40 [bcache]
[ 1329.423169]  [<ffffffffa026e83e>] ? btree_gc_count_keys+0x3e/0x50 [bcache]
[ 1329.423174]  [<ffffffffa0272e3c>] ? btree_gc_recurse+0xcc/0x300 [bcache]
[ 1329.423178]  [<ffffffffa026ef30>] ? btree_gc_mark_node+0x50/0x1c0 [bcache]
[ 1329.423181]  [<ffffffff8110e681>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[ 1329.423186]  [<ffffffffa0273604>] ? bch_btree_gc+0x3f4/0x500 [bcache]
[ 1329.423189]  [<ffffffff81109d60>] ? wait_woken+0x80/0x80
[ 1329.423194]  [<ffffffffa027373d>] ? bch_gc_thread+0x2d/0x110 [bcache]
[ 1329.423198]  [<ffffffffa0273710>] ? bch_btree_gc+0x500/0x500 [bcache]
[ 1329.423202]  [<ffffffffa0273710>] ? bch_btree_gc+0x500/0x500 [bcache]
[ 1329.423205]  [<ffffffff810efaa8>] ? kthread+0xb8/0xd0
[ 1329.423208]  [<ffffffff810ef9f0>] ? kthread_create_on_node+0x170/0x170
[ 1329.423210]  [<ffffffff8164ec4f>] ? ret_from_fork+0x3f/0x70
[ 1329.423213]  [<ffffffff810ef9f0>] ? kthread_create_on_node+0x170/0x170
Comment 13 Mike Pagano gentoo-dev 2016-12-30 22:44:32 UTC
Is this still an issue with later kernels?