I am hitting this bug https://patchwork.kernel.org/patch/10204381/ on several machines running gentoo-sources-4.16.6 with blk-mq enabled. It seems that the bug is not yet fixed upstream. The latest patch is https://patchwork.kernel.org/patch/10333887/.

[ 5.611081] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 5.611093] IP: scsi_times_out+0x17/0x150
[ 5.611095] PGD 0 P4D 0
[ 5.611100] Oops: 0000 [#1] SMP PTI
[ 5.611102] Modules linked in: bbswitch(O) iwlmvm x86_pkg_temp_thermal iwlwifi nouveau ttm
[ 5.611111] CPU: 3 PID: 26 Comm: kworker/3:0H Tainted: G O 4.16.6-gentoo #1
[ 5.611113] Hardware name: LENOVO 20ANCTO1WW/20ANCTO1WW, BIOS GLET78WW (2.32 ) 03/03/2015
[ 5.611119] Workqueue: kblockd blk_mq_timeout_work
[ 5.611122] RIP: 0010:scsi_times_out+0x17/0x150
[ 5.611124] RSP: 0018:ffffc900019e3d78 EFLAGS: 00010246
[ 5.611127] RAX: 0000000000000000 RBX: ffff88042a7cd100 RCX: 0000000000000000
[ 5.611129] RDX: ffffffff820db520 RSI: 0000000000000000 RDI: ffff88042a7cd100
[ 5.611131] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
[ 5.611132] R10: 0000000000000000 R11: 0000000000000326 R12: ffff88042a7cd250
[ 5.611134] R13: ffff88042a4b3d80 R14: 0000000000000003 R15: 0000000000000004
[ 5.611137] FS: 0000000000000000(0000) GS:ffff88043e2c0000(0000) knlGS:0000000000000000
[ 5.611139] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.611141] CR2: 0000000000000000 CR3: 000000000360a005 CR4: 00000000001606e0
[ 5.611143] Call Trace:
[ 5.611148] blk_mq_terminate_expired+0x45/0x90
[ 5.611152] bt_iter+0x43/0x50
[ 5.611156] blk_mq_queue_tag_busy_iter+0x177/0x280
[ 5.611159] ? blk_mq_add_to_requeue_list+0xc0/0xc0
[ 5.611162] ? blk_mq_add_to_requeue_list+0xc0/0xc0
[ 5.611165] blk_mq_timeout_work+0xcf/0x1b0
[ 5.611170] process_one_work+0x1b4/0x3e0
[ 5.611173] worker_thread+0x26/0x3c0
[ 5.611176] ? trace_event_raw_event_workqueue_execute_start+0xa0/0xa0
[ 5.611181] kthread+0x10e/0x130
[ 5.611185] ? kthread_create_worker_on_cpu+0x40/0x40
[ 5.611189] ret_from_fork+0x35/0x40
[ 5.611192] Code: ff 0f 0b e9 25 ff ff ff 66 90 66 2e 0f 1f 84 00 00 00 00 00 41 55 41 54 4c 8d a7 50 01 00 00 55 53 48 8b 87 88 01 00 00 48 89 fb <48> 8b 28 8b 05 e8 8b e4 00 85 c0 0f 8f df 00 00 00 83 bd 2c 01
[ 5.611229] RIP: scsi_times_out+0x17/0x150 RSP: ffffc900019e3d78
[ 5.611230] CR2: 0000000000000000
Hi,

Did that patch work for you? I see some other SCSI patches coming out in stable in the next few days. You can either test the one you referenced or wait a bit.

Mike
I have not tested the patch. Furthermore, I have been unable to reproduce this bug recently: I have not seen it on any machine for the last two weeks or so. It was quite frequent a month ago (every second boot or so). I just picked one machine here and examined the logs: the bug did not occur for 7 boots in a row on 4.16.6. After that, I upgraded to gentoo-sources 4.16.11 and have not seen the error since (about 10 boots). Well, lovely race conditions… I am unsure whether this is fixed in 4.16.11.
Ok, can you reopen this if you see it again, please?
(In reply to Mike Pagano from comment #3)
> Ok, can you reopen this if you see it again, please?

Yes.