Bug 685262

Summary: sys-kernel/gentoo-sources-4.14.114-r1: bcache.cached_dev_detach_finish : invalid opcode: 0000 [#1] SMP PTI, 'BUG_ON(atomic_read(&dc->count));'
Product: Gentoo Linux Reporter: grzybowskik
Component: Current packages Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status: RESOLVED NEEDINFO    
Severity: major    
Priority: Normal    
Version: unspecified   
Hardware: AMD64   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: detailed_configuration_and_traceback
bcache patch bundle

Description grzybowskik 2019-05-07 10:36:57 UTC
Created attachment 575462 [details]
detailed_configuration_and_traceback

This kernel module BUG is triggered when a cached device is detached manually.

Tested on kernels 4.14.101 through 4.14.114.

To reproduce, create a bcache backing device and a caching device, attach them, and then attempt to detach:
make-bcache -B /dev/sda
make-bcache -C /dev/sdb
echo cdev.uuid > /sys/block/bcache0/bcache/attach
echo cdev.uuid > /sys/block/bcache0/bcache/detach
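
The UUID placeholder in the commands above can be looked up rather than typed by hand. A minimal sketch, assuming the placeholder stands for the cache set UUID that `bcache-super-show` (from bcache-tools) reports on its `cset.uuid` line; the helper name `cset_uuid` is made up for illustration and is not part of any tool:

```shell
#!/bin/sh
# cset_uuid: read `bcache-super-show` output on stdin and print the value
# of its cset.uuid line (hypothetical helper, assumes "key<whitespace>value" lines).
cset_uuid() {
    awk '$1 == "cset.uuid" { print $2 }'
}

# Example use (needs root; /dev/sdb is the caching device from the report):
#   uuid=$(bcache-super-show /dev/sdb | cset_uuid)
#   echo "$uuid" > /sys/block/bcache0/bcache/attach
```

The sysfs writes are left commented out because they change device state.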

To improve the logs, I enabled dynamic debug by writing:
echo  'options bcache dyndbg=+pt' > /etc/modprobe.d/bcache.conf

The following BUG then appears in dmesg:

kernel: [10279] bcache: __write_super() ver 1, flags 2305843009213693952, seq 0
kernel: ------------[ cut here ]------------
kernel: Kernel BUG at ffffffffa0784f25 [verbose debug info unavailable]
kernel: invalid opcode: 0000 [#1] SMP PTI
kernel: Modules linked in: drbd lru_cache iptable_filter ip_tables nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc mlx4_ib ib_core mlx4_en intel_rapl x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass intel_cstate iTCO_wdt iTCO_vendor_support mlx4_core igb input_leds bcache devlink intel_rapl_perf pcspkr led_class i2c_i801 ptp lpc_ich joydev ioatdma pps_core ipmi_si dca ipmi_devintf rtc_cmos ipmi_msghandler pcc_cpufreq acpi_pad sch_fq_codel btrfs xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq dm_crypt usbhid mxm_wmi crc32c_intel arcmsr ahci ast xhci_pci libahci ehci_pci ttm ehci_hcd xhci_hcd libata wmi button dm_mirror dm_region_hash dm_log dm_mod
kernel: CPU: 27 PID: 295 Comm: kworker/27:1 Not tainted 4.14.114-gentoo-r1-201905 #1
kernel: Hardware name: Supermicro X10DRH LN4/X10DRH-ILN4, BIOS 2.0 01/30/2016
kernel: Workqueue: events cached_dev_detach_finish [bcache]
kernel: task: ffff88885cf84440 task.stack: ffffc90008a48000
kernel: RIP: 0010:cached_dev_detach_finish+0x55/0x1b0 [bcache]
kernel: RSP: 0018:ffffc90008a4bdf8 EFLAGS: 00010286
kernel: RAX: 00000000ffffffff RBX: ffff88905c200af0 RCX: 0000000000000000
kernel: RDX: 0000000000000001 RSI: ffff88905c200af8 RDI: ffffc90008a4be48
kernel: RBP: ffff88905c200000 R08: 0000000000000001 R09: 0000000000000001
kernel: R10: ffffc90008a4be30 R11: 00273bdd233be0d4 R12: ffffc90008a4bdf8
kernel: R13: 0000000000000000 R14: ffff88885fc60880 R15: 0000000000000000
kernel: FS:  0000000000000000(0000) GS:ffff88885fc40000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000055f9fae819f8 CR3: 000000000220a004 CR4: 00000000003606e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  process_one_work+0x1cf/0x370
kernel:  worker_thread+0x208/0x380
kernel:  ? process_one_work+0x370/0x370
kernel:  kthread+0x11d/0x130
kernel:  ? kthread_destroy_worker+0x40/0x40
kernel:  ret_from_fork+0x35/0x40
kernel: Code: 89 44 24 70 31 c0 49 89 e4 4c 89 e7 f3 48 ab c7 44 24 28 01 00 00 a0 48 8b 83 d0 f5 ff ff a8 02 75 02 0f 0b 8b 43 f8 85 c0 74 02 <0f> 0b 48 c7 c7 00 1d 7a a0 e8 7d 37 0f e1 48 8d 7b 68 e8 54 91 
kernel: RIP: cached_dev_detach_finish+0x55/0x1b0 [bcache] RSP: ffffc90008a4bdf8
kernel: ---[ end trace e213d6c6632e8c72 ]---
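
The "invalid opcode" in the trace can be tied back to the BUG_ON in the summary by looking at the Code: line. On x86, BUG() and BUG_ON() compile to the `ud2` instruction (bytes `0f 0b`), and the oops marks the faulting byte with `<..>`. The bytes just before it, `8b 43 f8 85 c0 74 02`, decode to a 32-bit load into EAX, a test for zero, and a jump over the trap, so RAX in the register dump should hold the counter value that was checked. A rough sketch of that reading; the function names are invented for illustration:

```shell
#!/bin/sh
# faulting_bytes: print the byte marked <..> in an oops "Code:" line
# together with the byte that follows it.
faulting_bytes() {
    printf '%s\n' "$1" |
        sed -n 's/.*<\([0-9a-f][0-9a-f]\)> \([0-9a-f][0-9a-f]\).*/\1 \2/p'
}

# as_s32: interpret a hex register value as a signed 32-bit integer.
as_s32() {
    v=$(( 0x$1 & 0xffffffff ))
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

# Tail of the Code: line from the trace above (shortened to the relevant bytes).
code='Code: 8b 43 f8 85 c0 74 02 <0f> 0b 48 c7 c7 00 1d 7a a0'

if [ "$(faulting_bytes "$code")" = "0f 0b" ]; then
    echo "trap instruction is ud2 (BUG/BUG_ON)"
fi

# RAX from the register dump is 00000000ffffffff.
echo "checked value as signed 32-bit: $(as_s32 ffffffff)"
```

Read this way, the counter was not zero but had gone negative when the detach work ran, which matches the `BUG_ON(atomic_read(&dc->count))` condition. This is an interpretation of the dump, not something confirmed in the report.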

The attachment includes the kernel config, the traceback, a GDB disassembly, and the emerge --info output.
Comment 1 grzybowskik 2019-05-15 11:02:45 UTC
Created attachment 576770 [details]
bcache patch bundle

Attached is a bundle of 15 patches that fix bcache in kernels based on versions 4.14.114 through 4.14.119.

├── 1001.bcache_avoid_nested_58f913dce281.patch
├── 1002.bcache_fix_comment_type_b1e8139e48b5.patch
├── 1003.bcache_rewrite_multipart_1dbe32ad0a82.patch
├── 1007.bcache_dont_writeback_failed_io_5fa89fb9a86b.patch
├── 1015.bcache_convert_cacheed_dev_atomi_to_refcount_3b304d24a718.patch
├── 1016.bcache_update_bucket_real_time_d44c2f9e7cc0.patch
├── 1019.bcache_4.15_backport_writeback.patch
├── 1020.bcache_convert_timers_8376d3c1f989.patch
├── 1021.bcache_journal_comment_bb22cafd7568.patch
├── 1025.bcache_use_PTR_ERR_OR_ZERO_9d13411784e2.patch
├── 1028.bcache_fix_wrong_return_bch_debug_init_539d39eb2708.patch
├── 1031.bcache_improve_efficiency_closure_e4bf791937d8.patch
├── 1033.bcache_reduce_cache_set_device_iteration_2831231d4c3f.patch
└── 1036.bcache_closure_move_control_bits_3609c471a1b8.patch

The patches were created by backporting from the Linux kernel git tree.

Current state of bcache in kernels 4.14.114 through 4.14.119:

Creating the backing and cache devices and attaching them works, but detaching does not work at all.
I/O errors are not detected, which might cause unexpected problems.

Detach functionality can be fixed by applying patch 1019.bcache_4.15_backport_writeback.patch

After that we can detach, but we first need to write none into cache_mode:
echo none > /sys/block/bcacheX/bcache/cache_mode
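
The two-step order described here (cache mode first, then detach) could be wrapped as follows. This is only a sketch: the function name is hypothetical, and writing `1` to `detach` is an assumption based on the kernel's bcache documentation describing it as a write-to-trigger file, whereas the original report writes the UUID.

```shell
#!/bin/sh
# detach_cached_dev DIR: DIR is the bcache sysfs directory of the device,
# for example /sys/block/bcache0/bcache. Sets cache_mode to none, then
# requests a detach. Fails early if either attribute is missing.
detach_cached_dev() {
    dir=$1
    for f in cache_mode detach; do
        [ -e "$dir/$f" ] || { echo "missing $dir/$f" >&2; return 1; }
    done
    echo none > "$dir/cache_mode" || return 1
    echo 1 > "$dir/detach"
}

# Example use (needs root and the patched kernel):
#   detach_cached_dev /sys/block/bcache0/bcache
```

Taking the directory as an argument keeps the sketch usable for any bcacheX device.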

Next, I/O error detection needs to be fixed; otherwise, attempting to detach while there are active I/O operations on the caching device leaves a broken symlink in /sys/block/bcacheX/bcache

So, to fix bcache sensibly in kernel 4.14.x, all of the above patches need to be applied.
The above patches were tested on kernel 4.14.115.
With the fixed implementation we can create backing and cache devices, attach, detach, disable, and partition them. I/O error detection works, and the cache device is detached with the following messages in dmesg:
"""
bcache: bch_count_io_errors() sdc: IO error on writing data to cache, recovering
bcache: error on ed80d9c1-5fc2-4390-bd9e-f633dd6a3a4f:
journal io error
, disabling caching
bcache: cached_dev_detach_finish() Caching disabled for dm-1
bcache: bch_count_io_errors() sdc: IO error on writing btree, recovering
bcache: cache_set_free() Cache set ed80d9c1-5fc2-4390-bd9e-f633dd6a3a4f unregistered
"""


Tested performance with a file copy and benchmarked with the FIO tool.
Comment 2 Mike Pagano gentoo-dev 2019-08-01 12:40:52 UTC
All of these patches except for the 4.15 backport are upstream and in current supported kernels.

Do you still experience this issue?

1001.bcache_avoid_nested_58f913dce281.patch
Commit: 58f913dce2814a9ea7260e93ed3a949e0d5565e3
Date: 2017-10-16 09:07:26 -0600

1002.bcache_fix_comment_type_b1e8139e48b5.patch
Commit: b1e8139e48b58e3bc1234e619c750ffd1394be2f
Date: 2017-10-16 09:07:26 -0600

1003.bcache_rewrite_multipart_1dbe32ad0a82.patch
Commit: 1dbe32ad0a82f39c6dfb7667c5da5c23b9333664
Date: 2017-10-16 09:07:26 -0600

1007.bcache_dont_writeback_failed_io_5fa89fb9a86b.patch
Commit: 5fa89fb9a86bcc0f0b3f21ab6087a8a4170dcd2c
Date: 2017-10-16 09:07:26 -0600

1015.bcache_convert_cacheed_dev_atomi_to_refcount_3b304d24a718.patch
Commit: 3b304d24a718ae779ee9c7f2014dd3b2d0893b70
Date: 2017-10-30 15:57:54 -0600

1016.bcache_update_bucket_real_time_d44c2f9e7cc0.patch
Commit: d44c2f9e7cc0041f0cd88df1fe7a1fceb713ab14
Date: 2017-10-30 15:57:54 -0600

1019.bcache_4.15_backport_writeback.patch

1020.bcache_convert_timers_8376d3c1f989.patch
Commit: 8376d3c1f98988ae7f9e9bc2d1eeeb7d61fd206c
Date: 2017-11-14 20:11:57 -0700

1021.bcache_journal_comment_bb22cafd7568.patch
Commit: bb22cafd75686d799dabfe422571fac4b5c2ed94
Date: 2017-11-24 16:22:55 -0700

1025.bcache_use_PTR_ERR_OR_ZERO_9d13411784e2.patch
Commit: 9d13411784e27227162857df25ab6817a1db2a73
Date: 2018-01-08 13:29:00 -0700

1028.bcache_fix_wrong_return_bch_debug_init_539d39eb2708.patch
Commit: 539d39eb27083405b82b9e604e88af01a9a46c63
Date: 2018-01-08 13:29:00 -0700

1031.bcache_improve_efficiency_closure_e4bf791937d8.patch
Commit: e4bf791937d82afca79e1df4063f72dbc6960ac7
Date: 2018-01-08 13:29:00 -0700

1033.bcache_reduce_cache_set_device_iteration_2831231d4c3f.patch
Commit: 2831231d4c3f999d2d062b23dfbc8b0faa4bc6e0
Date: 2018-01-08 13:29:00 -0700

1036.bcache_closure_move_control_bits_3609c471a1b8.patch
Commit: 3609c471a1b86bffc812d8a2f0299892aa11a5e6
Date: 2018-01-09 12:18:51 -0700
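
Whether a given kernel tree already contains one of these commits can be checked locally. A minimal sketch using `git merge-base --is-ancestor`; the function name and the repository path in the example are hypothetical. It only finds a commit under its mainline hash, so a backport that was applied to a stable branch under a different hash will not be detected this way.

```shell
#!/bin/sh
# commit_in_ref REPO COMMIT REF: exit 0 if COMMIT is an ancestor of REF
# in the git repository at REPO, non-zero otherwise.
commit_in_ref() {
    git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# Example use (the path is a placeholder for a local kernel clone):
#   if commit_in_ref /path/to/linux 3b304d24a718ae779ee9c7f2014dd3b2d0893b70 HEAD; then
#       echo "commit is already in this tree"
#   fi
```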