Finally, all those options under 'kernel hacking' had to produce something useful, besides inflating my kernel from 3 to 10 MB! :-) (...but why does it all happen to me? now you know why I chose this nickname! :-))

I am struggling to attach some external disks to a JMicron USB hub identified by:

kernel: usb 4-1.4: New USB device found, idVendor=152d, idProduct=3569
kernel: usb 4-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
kernel: usb 4-1.4: Product: USB to ATA/ATAPI Bridge
kernel: usb 4-1.4: Manufacturer: JMicron

I already found out (but have not tested yet!) that this hub obviously needs an entry

UNUSUAL_DEV(0x152d, 0x3569, 0x0000, 0x9999,
		"JMicron",
		"JMS566",
		USB_SC_DEVICE, USB_PR_DEVICE, NULL,
		US_FL_BROKEN_FUA | US_FL_NO_REPORT_OPCODES),

in both drivers/usb/storage/unusual_devs.h and drivers/usb/storage/unusual_uas.h to remedy 'invalid field in cdb' errors like the following:

kernel: sd 9:0:0:0: [sdj] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 9:0:0:0: [sdj] tag#0 Sense Key : Illegal Request [current]
kernel: sd 9:0:0:0: [sdj] tag#0 Add. Sense: Invalid field in cdb
kernel: sd 9:0:0:0: [sdj] tag#0 CDB: Write(10) 2a 08 00 00 00 3f 00 00 08 00
kernel: blk_update_request: critical target error, dev sdj, sector 63
kernel: Buffer I/O error on dev sdj1, logical block 0, lost sync page write
kernel: EXT4-fs (sdj1): mounted filesystem with ordered data mode. Opts: (null)
kernel: sd 9:0:0:0: [sdj] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 9:0:0:0: [sdj] tag#0 Sense Key : Illegal Request [current]
kernel: sd 9:0:0:0: [sdj] tag#0 Add. Sense: Invalid field in cdb
kernel: sd 9:0:0:0: [sdj] tag#0 CDB: Write(10) 2a 08 74 6c 04 67 00 00 08 00
kernel: blk_update_request: critical target error, dev sdj, sector 1953236071
kernel: blk_update_request: critical target error, dev sdj, sector 1953236071

...but this is *not* what I want to report here.
What I want to report is a consequence of the above: I had to disconnect the external disks from the hub, disconnect the hub from the laptop, then insert the hub again, insert the first disk (went O.K.), insert the second... this did not show any reaction. What it *did* show, however, was this in the kernel log:

kernel: ================================================================================
kernel: UBSAN: Undefined behaviour in drivers/scsi/scsicam.c:173:29
kernel: signed integer overflow:
kernel: 62015235 * 63 cannot be represented in type 'int'
kernel: CPU: 0 PID: 14131 Comm: fdisk Tainted: P O 4.9.25-gentoo #4
...
kernel: d6629cec d1f444f2 00000007 d6629d1c 0000003f d6629cfc d1fc8ffe d6629cfc
kernel: d3037320 d6629d80 d1fc934b d28b15c0 d6629d20 0000002a d6629d48 d3037320
kernel: 0000002a 00003202 31303236 35333235 ecd1f900 ecd1f9a8 d6629d5c d189d121
kernel: Call Trace:
kernel: [<d1f444f2>] dump_stack+0x59/0x87
kernel: [<d1fc8ffe>] ubsan_epilogue+0xe/0x40
kernel: [<d1fc934b>] handle_overflow+0xbb/0xf0
kernel: [<d189d121>] ? do_read_cache_page+0x71/0x570
kernel: [<d19fd000>] ? blkdev_readpages+0x20/0x20
kernel: [<d189d646>] ? read_cache_page+0x26/0x50
kernel: [<d1fc93d2>] __ubsan_handle_mul_overflow+0x12/0x20
kernel: [<d224bbf7>] scsi_partsize+0x217/0x2e0
kernel: [<d224bd06>] scsicam_bios_param+0x46/0x380
kernel: [<d2299604>] sd_getgeo+0x174/0x2d0
kernel: [<d1f02c91>] blkdev_ioctl+0x251/0x12c0
kernel: [<d19fd31c>] block_ioctl+0x4c/0xb0
kernel: [<d19ab140>] do_vfs_ioctl+0xc0/0xdf0
kernel: [<d19c7e13>] ? mntput+0x23/0x60
kernel: [<d1987c99>] ? __fput+0x1e9/0x4e0
kernel: [<d1987fd8>] ? ____fput+0x8/0x10
kernel: [<d16d9520>] ? task_work_run+0x60/0xd0
kernel: [<d19abe9e>] SyS_ioctl+0x2e/0x60
kernel: [<d1602c0d>] do_fast_syscall_32+0x11d/0x550
kernel: [<d19abe70>] ? do_vfs_ioctl+0xdf0/0xdf0
kernel: [<d265940a>] sysenter_past_esp+0x47/0x75
kernel: ================================================================================

WOW! An integer overflow in kernel!
Nice gem... IMO this should be reported ASAP upstream. It might even have security implications. Please do it for me, as I am not on the kernel mailing list, or similar forum. Feel free to point here. Thank you.
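For readers who want to check the arithmetic in the UBSAN message: 62015235 * 63 is 3906959805, which exceeds the largest 32-bit signed value (2147483647). The following is only a user-space sketch of that calculation, assuming a 32-bit `int`; it is not the kernel code and not the upstream fix. The helper names `mul_overflows_int` and `mul_wide` are made up for illustration, and `__builtin_mul_overflow` is a GCC/Clang extension rather than ISO C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multiply two ints and report whether the product fits in int.
 * Returns true on overflow. Relies on the GCC/Clang builtin
 * __builtin_mul_overflow (a compiler extension, not ISO C). */
bool mul_overflows_int(int a, int b, int *out)
{
    return __builtin_mul_overflow(a, b, out);
}

/* The same product computed in a 64-bit type, where it cannot
 * overflow for any pair of 32-bit operands. */
int64_t mul_wide(int a, int b)
{
    return (int64_t)a * (int64_t)b;
}
```

Calling `mul_overflows_int(62015235, 63, &r)` reports an overflow, while `mul_wide(62015235, 63)` gives the full product. Widening the operands before multiplying is one common way to avoid this class of problem; whether that is what the kernel ended up doing is not something this sketch claims.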
...and yes, the above UBSAN message came as a result of me typing 'fdisk' in an attempt to see if the second disk on the hub was 'seen' by the system at all.
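Judging by the call trace (SyS_ioctl -> blkdev_ioctl -> sd_getgeo -> scsicam_bios_param -> scsi_partsize), the code path is the block-device geometry query. A minimal user-space sketch of such a query, assuming it is the HDIO_GETGEO ioctl that fdisk issued here, could look like this. The function name `read_geometry` is hypothetical; `struct hd_geometry` and `HDIO_GETGEO` come from `<linux/hdreg.h>`.

```c
#include <fcntl.h>
#include <linux/hdreg.h>   /* struct hd_geometry, HDIO_GETGEO */
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the kernel for the legacy CHS geometry of a block device.
 * Returns 0 on success and -1 if the device cannot be opened or
 * the ioctl fails. */
int read_geometry(const char *path, struct hd_geometry *geo)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    int rc = ioctl(fd, HDIO_GETGEO, geo);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

On success, `geo->heads`, `geo->sectors` and `geo->cylinders` hold the values the kernel computed. Running something like `read_geometry("/dev/sdj", &geo)` as root against the affected disk would presumably exercise the same kernel path as the fdisk call, but that has not been tested here.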
The last update to drivers/scsi/scsicam.c looks like 2014-11-12, "scsi: PC partition tables are little endian", so I'm wondering whether this is also reproducible on 4.9.27 or on other kernels?
I don't know about (and don't plan to test) other kernels. Maybe someone else would like to step in.

This error doesn't seem to appear under normal circumstances, when USB operations are stable. A possible scenario seems to be the following:

1. Something strange happens, often as a consequence of rsync'ing huge directories (say, of the order of 1 TB) from USB to USB, with millions of files. At some point, this will throw the above-mentioned

kernel: sd 9:0:0:0: [sdj] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 9:0:0:0: [sdj] tag#0 Sense Key : Illegal Request [current]
kernel: sd 9:0:0:0: [sdj] tag#0 Add. Sense: Invalid field in cdb

error. Your only option is to disconnect your USB disks, since they become read-only.

2. Trying to attach the hub and disks next time, you may still get errors saying that the filesystem journal on one disk or another cannot be replayed.

3. Disconnecting and reconnecting the USB devices (hub and disks) at this point has a good chance of reproducing the "undefined behaviour" of the bug, especially when the system does not seem to react to USB events like the insertion of the hub's USB cable into the host.

Note that (as far as I understand) for 1) to happen you have to have a hub with a chip that belongs to the 'unusual devices' in the sense of

UNUSUAL_DEV(0x152d, 0x3569, 0x0000, 0x9999,
		"JMicron",
		"JMS566",
		USB_SC_DEVICE, USB_PR_DEVICE, NULL,
		US_FL_BROKEN_FUA | US_FL_NO_REPORT_OPCODES),

(see my first post) but *without* this entry in drivers/usb/storage/unusual_devs.h and drivers/usb/storage/unusual_uas.h, so that 1) has a chance of being reproduced.

Maybe something has an uninitialized, unusually high value due to 1) and 2), and causes the bug. I hope this helps a bit to hunt it down.
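As a side note on the 'Invalid field in cdb' errors from the first post: the two failing Write(10) CDBs can be decoded by hand. The sketch below does that in C; the struct and function names (`write10`, `decode_write10`) are made up for illustration. The field layout (opcode 0x2a, FUA in bit 3 of byte 1, big-endian LBA in bytes 2 to 5, big-endian transfer length in bytes 7 to 8) is the standard WRITE(10) layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected fields of a SCSI WRITE(10) command descriptor block. */
struct write10 {
    bool     fua;     /* Force Unit Access: byte 1, bit 3 */
    uint32_t lba;     /* logical block address: bytes 2..5, big-endian */
    uint16_t blocks;  /* transfer length: bytes 7..8, big-endian */
};

/* Decode a 10-byte CDB. Returns false if the opcode is not WRITE(10). */
bool decode_write10(const uint8_t cdb[10], struct write10 *out)
{
    if (cdb[0] != 0x2a)
        return false;

    out->fua = (cdb[1] & 0x08) != 0;
    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    out->blocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
    return true;
}
```

Fed with `2a 08 00 00 00 3f 00 00 08 00` and `2a 08 74 6c 04 67 00 00 08 00`, this gives LBAs 63 and 1953236071 (the sectors named in the blk_update_request lines), a length of 8 blocks each, and the FUA bit set in both. That would be consistent with the bridge rejecting FUA writes, which is what US_FL_BROKEN_FUA is meant to work around, though this decoding alone does not prove it.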
reported upstream
Fix is upstream and available in kernels >=4.13 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/memcontrol.c?id=6a1a8b80728c3ae327a82a6cd772e0d554eebf2e