Stable systems, x86 and x86-64, smartd won't start under gentoo-sources-6.1.81 during boot, or manually. Rebooting into gentoo-sources-6.1.74 solves problem.

Discussed in forum thread: https://forums.gentoo.org/viewtopic-t-1167833.html

End of dmesg shows trace:

[Tue Mar 12 23:54:41 2024] ------------[ cut here ]------------
[Tue Mar 12 23:54:41 2024] WARNING: CPU: 7 PID: 4020 at drivers/scsi/scsi_lib.c:214 scsi_execute_cmd+0x3a/0x240
[Tue Mar 12 23:54:41 2024] Modules linked in: uas x86_pkg_temp_thermal mei_hdcp kvm_intel rt2800pci eeprom_93cx6 rt2x00pci rt2800mmio rt2x00mmio rt2800lib kvm crc_ccitt rt2x00lib at24 mac80211 regmap_i2c libarc4 cfg80211 irqbypass mei_me e1000e firewire_ohci firewire_core mei f71882fg coretemp
[Tue Mar 12 23:54:41 2024] CPU: 7 PID: 4020 Comm: smartd Not tainted 6.1.81-gentoo #1
[Tue Mar 12 23:54:41 2024] Hardware name: Hewlett-Packard h8-1260t/2AB5, BIOS 7.12 10/12/2011
[Tue Mar 12 23:54:41 2024] RIP: 0010:scsi_execute_cmd+0x3a/0x240
[Tue Mar 12 23:54:41 2024] Code: f4 89 d6 55 44 89 c5 53 48 83 ec 10 48 8b 5c 24 50 48 89 0c 24 48 85 db 0f 84 9b 01 00 00 48 83 3b 00 74 24 83 7b 08 60 74 1e <0f> 0b 41 bd ea ff ff ff 48 83 c4 10 44 89 e8 5b 5d 41 5c 41 5d 41
[Tue Mar 12 23:54:41 2024] RSP: 0018:ffffaa6f0173fcc0 EFLAGS: 00010287
[Tue Mar 12 23:54:41 2024] RAX: ffffaa6f0173fd20 RBX: ffffaa6f0173fd20 RCX: ffff99b3939c9400
[Tue Mar 12 23:54:41 2024] RDX: 0000000000000022 RSI: 0000000000000022 RDI: ffff99b380d45000
[Tue Mar 12 23:54:41 2024] RBP: 0000000000000200 R08: 0000000000000200 R09: 0000000000002710
[Tue Mar 12 23:54:41 2024] R10: ffffaa6f0173fee8 R11: 0000000000000000 R12: ffffaa6f0173fd50
[Tue Mar 12 23:54:41 2024] R13: ffff99b380d45000 R14: 0000000000002710 R15: 0000000000000000
[Tue Mar 12 23:54:41 2024] FS: 00007f1023662480(0000) GS:ffff99b6af5c0000(0000) knlGS:0000000000000000
[Tue Mar 12 23:54:41 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Mar 12 23:54:41 2024] CR2: 00007ffce0074ff8 CR3: 0000000118fd4005 CR4: 00000000000606e0
[Tue Mar 12 23:54:41 2024] Call Trace:
[Tue Mar 12 23:54:41 2024] <TASK>
[Tue Mar 12 23:54:41 2024] ? scsi_execute_cmd+0x3a/0x240
[Tue Mar 12 23:54:41 2024] ? __warn+0x74/0xc0
[Tue Mar 12 23:54:41 2024] ? scsi_execute_cmd+0x3a/0x240
[Tue Mar 12 23:54:41 2024] ? report_bug+0xe2/0x150
[Tue Mar 12 23:54:41 2024] ? handle_bug+0x3a/0x70
[Tue Mar 12 23:54:41 2024] ? exc_invalid_op+0x13/0x60
[Tue Mar 12 23:54:41 2024] ? asm_exc_invalid_op+0x16/0x20
[Tue Mar 12 23:54:41 2024] ? scsi_execute_cmd+0x3a/0x240
[Tue Mar 12 23:54:41 2024] ? ata_cmd_ioctl+0x1dd/0x2f0
[Tue Mar 12 23:54:41 2024] ata_cmd_ioctl+0x13f/0x2f0
[Tue Mar 12 23:54:41 2024] scsi_ioctl+0x32b/0x900
[Tue Mar 12 23:54:41 2024] ? ioctl_has_perm.constprop.0.isra.0+0xd8/0x140
[Tue Mar 12 23:54:41 2024] ? scsi_block_when_processing_errors+0x1d/0xf0
[Tue Mar 12 23:54:41 2024] blkdev_ioctl+0x100/0x280
[Tue Mar 12 23:54:41 2024] __x64_sys_ioctl+0x88/0xc0
[Tue Mar 12 23:54:41 2024] do_syscall_64+0x38/0x90
[Tue Mar 12 23:54:41 2024] entry_SYSCALL_64_after_hwframe+0x64/0xce
[Tue Mar 12 23:54:41 2024] RIP: 0033:0x7f102332c98b
[Tue Mar 12 23:54:41 2024] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[Tue Mar 12 23:54:41 2024] RSP: 002b:00007ffce0074990 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Tue Mar 12 23:54:41 2024] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f102332c98b
[Tue Mar 12 23:54:41 2024] RDX: 00007ffce0074bf0 RSI: 000000000000031f RDI: 0000000000000003
[Tue Mar 12 23:54:41 2024] RBP: 000055b9c3a9c760 R08: 0000000000000000 R09: 0000000000000000
[Tue Mar 12 23:54:41 2024] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[Tue Mar 12 23:54:41 2024] R13: 00007ffce0075620 R14: 00007ffce0074bf0 R15: 00007ffce0075620
[Tue Mar 12 23:54:41 2024] </TASK>
[Tue Mar 12 23:54:41 2024] ---[ end trace 0000000000000000 ]---

Forum moderator Hu suggests the following:

"To your specific problem: Normally, the next step would be to try to find the specific commit (not just kernel release) that broke this. I see 1512 commits present in v6.1.81 and not present in v6.1.74. If we assume this was broken upstream, rather than by a Gentoo-specific patch, then we need to find which of those 1512 commits is at fault. A straight git bisect will need log2(1512) =~ 11 steps to find this.

However, after the first draft of this post, I did some basic analysis of the faulting function, and I see the WARN introduced in a commit that is present in v6.1.81 and absent in v6.1.74. Therefore, without any attempt to understand what the code is meant to do, I posit that this check is just wrong as committed. The WARN was introduced in scsi: core: Add struct for args to execution functions, which makes it v6.1.80-4-gcf33e6ca12d8. That is, it is the fourth commit added after 6.1.80, so I expect you would find 6.1.80 to be good and 6.1.81 to be bad.

It may (or may not) be the case that the corresponding upstream commit was correct as written, due to a change present in the later kernel that is absent in 6.1. That would require either testing a newer kernel series, or getting input from someone who understands this code and can explain what the check was meant to prevent."

The discussion is well over my head.
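For anyone following the arithmetic in Hu's quote, here is a minimal Python sketch of the two calculations it relies on: the rough number of bisect steps for 1512 commits, and how a `git describe` style name such as v6.1.80-4-gcf33e6ca12d8 splits into tag, commit count, and abbreviated hash. The helper names `bisect_steps` and `parse_describe` are made up for illustration, and the step count is only the usual log2 estimate, not a guarantee of what git bisect will do on a non-linear history.

```python
import math
import re


def bisect_steps(n_commits: int) -> int:
    """Rough upper estimate of bisect rounds: each round halves the candidates."""
    return math.ceil(math.log2(n_commits))


def parse_describe(name: str) -> tuple[str, int, str]:
    """Split '<tag>-<count>-g<hash>' into (tag, commits after tag, abbreviated hash)."""
    m = re.fullmatch(r"(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)", name)
    if m is None:
        raise ValueError(f"not a git-describe style name: {name!r}")
    return m["tag"], int(m["count"]), m["sha"]


if __name__ == "__main__":
    # 1512 commits between v6.1.74 and v6.1.81, per the forum post.
    print(bisect_steps(1512))
    # The commit Hu identified, described relative to the v6.1.80 tag.
    print(parse_describe("v6.1.80-4-gcf33e6ca12d8"))
```

The second result reads as "four commits after v6.1.80", which is why 6.1.80 is expected to be good and 6.1.81 bad.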
No problems here with gentoo-sources-6.1.81 and smartd.

Attach your full dmesg and your smartmontools version.
Created attachment 887703
/var/log/dmesg as requested
sys-apps/smartmontools
Installed versions: 7.4(11:11:30 PM 03/13/2024)(daemon update-drivedb -caps -selinux -static -systemd)

$ sudo /etc/init.d/smartd start
Password:
 * Starting smartd ...
 * start-stop-daemon: failed to start `/usr/sbin/smartd'
 * Failed to start smartd [ !! ]
 * ERROR: smartd failed to start

and from boot log:

Fri Mar 15 13:24:19 2024: * Starting saned ...
Fri Mar 15 13:24:19 2024: [ ok ]
Fri Mar 15 13:24:19 2024: * Starting smartd ...
Fri Mar 15 13:24:19 2024: * start-stop-daemon: failed to start `/usr/sbin/smartd'
Fri Mar 15 13:24:19 2024: * Failed to start smartd
Fri Mar 15 13:24:19 2024: [ !! ]
Fri Mar 15 13:24:19 2024: * ERROR: smartd failed to start
Fri Mar 15 13:24:20 2024: * Starting sshd ...
Fri Mar 15 13:24:21 2024: [ ok ]
Is there anything in dmesg after you start smartd? Please attach the full dmesg taken after you start smartd.
After trying to start smartd there is nothing relevant in dmesg, just two elogind lines.

[Fri Mar 15 13:24:19 2024] </TASK>
[Fri Mar 15 13:24:19 2024] ---[ end trace 0000000000000000 ]---
[Fri Mar 15 13:24:27 2024] elogind-daemon[2216]: Removed session 1.
[Fri Mar 15 13:24:27 2024] elogind-daemon[2216]: New session 2 of user figueroa.
Created attachment 887704
Output of dmesg from a bash shell (dmesg > dmesg.txt), in attachment.
Is there any syslog output from smartd?
Actually, yes. I'm running metalog, and /var/log/everything shows the following for the current boot instance:

# grep smart current
Mar 15 13:24:19 [smartd] smartd 7.4 2023-08-01 r5530 [x86_64-linux-6.1.81-gentoo] (local build)
Mar 15 13:24:19 [smartd] Opened configuration file /etc/smartd.conf
Mar 15 13:24:19 [smartd] Configuration file /etc/smartd.conf parsed.
Mar 15 13:24:19 [smartd] Device: /dev/sda, opened
Mar 15 13:24:19 [smartd] Device: /dev/sda, not ATA, no IDENTIFY DEVICE Structure
Mar 15 13:24:19 [smartd] Unable to register ATA device /dev/sda at line 24 of file /etc/smartd.conf
Mar 15 13:24:19 [smartd] Unable to register device /dev/sda (no Directive -d removable). Exiting.
Mar 15 13:24:19 [smartd] smartd is exiting (exit status 16)
Mar 15 13:24:19 [kernel] [ 38.831326] CPU: 7 PID: 4067 Comm: smartd Not tainted 6.1.81-gentoo #1
Mar 15 13:24:19 [/etc/init.d/smartd] start-stop-daemon: failed to start `/usr/sbin/smartd'
Mar 15 13:24:19 [/etc/init.d/smartd] ERROR: smartd failed to start
Mar 15 13:26:27 [sudo] figueroa : TTY=pts/1 ; PWD=/var/log ; USER=root ; COMMAND=/etc/init.d/smartd start
Mar 15 13:26:27 [smartd] smartd 7.4 2023-08-01 r5530 [x86_64-linux-6.1.81-gentoo] (local build)
Mar 15 13:26:27 [smartd] Opened configuration file /etc/smartd.conf
Mar 15 13:26:27 [smartd] Configuration file /etc/smartd.conf parsed.
Mar 15 13:26:27 [smartd] Device: /dev/sda, opened
Mar 15 13:26:27 [smartd] Device: /dev/sda, not ATA, no IDENTIFY DEVICE Structure
Mar 15 13:26:27 [smartd] Unable to register ATA device /dev/sda at line 24 of file /etc/smartd.conf
Mar 15 13:26:27 [smartd] Unable to register device /dev/sda (no Directive -d removable). Exiting.
Mar 15 13:26:27 [smartd] smartd is exiting (exit status 16)
Mar 15 13:26:27 [/etc/init.d/smartd] start-stop-daemon: failed to start `/usr/sbin/smartd'
Mar 15 13:26:27 [/etc/init.d/smartd] ERROR: smartd failed to start
Mar 15 13:30:23 [sudo] figueroa : TTY=pts/1 ; PWD=/var/log ; USER=root ; COMMAND=/etc/init.d/smartd start
Mar 15 13:30:24 [smartd] smartd 7.4 2023-08-01 r5530 [x86_64-linux-6.1.81-gentoo] (local build)
Mar 15 13:30:24 [smartd] Opened configuration file /etc/smartd.conf
Mar 15 13:30:24 [smartd] Configuration file /etc/smartd.conf parsed.
Mar 15 13:30:24 [smartd] Device: /dev/sda, opened
Mar 15 13:30:24 [smartd] Device: /dev/sda, not ATA, no IDENTIFY DEVICE Structure
Mar 15 13:30:24 [smartd] Unable to register ATA device /dev/sda at line 24 of file /etc/smartd.conf
Mar 15 13:30:24 [smartd] Unable to register device /dev/sda (no Directive -d removable). Exiting.
Mar 15 13:30:24 [smartd] smartd is exiting (exit status 16)
Mar 15 13:30:24 [/etc/init.d/smartd] start-stop-daemon: failed to start `/usr/sbin/smartd'
Mar 15 13:30:24 [/etc/init.d/smartd] ERROR: smartd failed to start

The log entries from 13:24 are from boot, 13:26 is an attempt to start manually (/etc/init.d/smartd start), and 13:30 is another manual start attempt.
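If it helps anyone comparing logs across kernels, a small Python sketch along these lines could pull the smartd exit lines out of a metalog file so the start attempts can be seen at a glance. It assumes the line layout shown in this comment ("Mon DD HH:MM:SS [smartd] smartd is exiting (exit status N)"); the function name `smartd_exits` and the regular expression are illustrative, not part of smartmontools or metalog.

```python
import re

# Assumed metalog layout, taken from the excerpt in this comment.
EXIT_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \[smartd\] "
    r"smartd is exiting \(exit status (?P<status>\d+)\)"
)


def smartd_exits(lines):
    """Return (timestamp, exit status) for every smartd exit line found."""
    found = []
    for line in lines:
        m = EXIT_RE.match(line.strip())
        if m:
            found.append((m["stamp"], int(m["status"])))
    return found


SAMPLE = [
    "Mar 15 13:24:19 [smartd] Device: /dev/sda, opened",
    "Mar 15 13:24:19 [smartd] smartd is exiting (exit status 16)",
    "Mar 15 13:26:27 [smartd] smartd is exiting (exit status 16)",
    "Mar 15 13:30:24 [smartd] smartd is exiting (exit status 16)",
]

if __name__ == "__main__":
    for stamp, status in smartd_exits(SAMPLE):
        print(stamp, status)
```

To run it against a real log, the `SAMPLE` list could be replaced by the lines of the file (for example `open("current").readlines()`, using the file name from the grep above).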
Please test with 6.8.1 to see if a fix has been implemented already.
Yes, booting with gentoo-sources-6.8.1 does the trick. smartd runs on startup without error and no other apparent errors. Thank you. Feels strange being out here on the bleeding edge. It would be good to know what changed. Was it fixed in a gentoo kernel patch?
(In reply to Andy Figueroa from comment #10)
> Yes, booting with gentoo-sources-6.8.1 does the trick. smartd runs on
> startup without error and no other apparent errors. Thank you. Feels strange
> being out here on the bleeding edge.
>
> It would be good to know what changed. Was it fixed in a gentoo kernel patch?

Nothing from us, glad to see it's working.
Sounds like a regression in the 6.1 stable series though. Probably best to work with upstream to figure that out.