Bug 927079 - sys-kernel/gentoo-sources-6.1.81 prevents sys-apps/smartmontools (smartd) from starting
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-03-15 16:37 UTC by Andy Figueroa
Modified: 2024-03-19 20:08 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
/var/log/dmesg as requested (dmesg,57.63 KB, text/plain)
2024-03-15 17:28 UTC, Andy Figueroa
This is output of dmesg from bash shell. (dmesg.txt,73.23 KB, text/plain)
2024-03-15 17:50 UTC, Andy Figueroa

Description Andy Figueroa 2024-03-15 16:37:39 UTC
On stable x86 and x86-64 systems, smartd won't start under gentoo-sources-6.1.81, either during boot or manually. Rebooting into gentoo-sources-6.1.74 solves the problem.

Discussed in forum thread: https://forums.gentoo.org/viewtopic-t-1167833.html

The end of dmesg shows this trace:

[Tue Mar 12 23:54:41 2024] ------------[ cut here ]------------
[Tue Mar 12 23:54:41 2024] WARNING: CPU: 7 PID: 4020 at drivers/scsi/scsi_lib.c:214 scsi_execute_cmd+0x3a/0x240
[Tue Mar 12 23:54:41 2024] Modules linked in: uas x86_pkg_temp_thermal mei_hdcp kvm_intel rt2800pci eeprom_93cx6 rt2x00pci rt2800mmio rt2x00mmio rt2800lib kvm crc_ccitt rt2x00lib at24 mac80211 regmap_i2c libarc4 cfg80211 irqbypass mei_me e1000e firewire_ohci firewire_core mei f71882fg coretemp
[Tue Mar 12 23:54:41 2024] CPU: 7 PID: 4020 Comm: smartd Not tainted 6.1.81-gentoo #1
[Tue Mar 12 23:54:41 2024] Hardware name: Hewlett-Packard h8-1260t/2AB5, BIOS 7.12 10/12/2011
[Tue Mar 12 23:54:41 2024] RIP: 0010:scsi_execute_cmd+0x3a/0x240
[Tue Mar 12 23:54:41 2024] Code: f4 89 d6 55 44 89 c5 53 48 83 ec 10 48 8b 5c 24 50 48 89 0c 24 48 85 db 0f 84 9b 01 00 00 48 83 3b 00 74 24 83 7b 08 60 74 1e <0f> 0b 41 bd ea ff ff ff 48 83 c4 10 44 89 e8 5b 5d 41 5c 41 5d 41
[Tue Mar 12 23:54:41 2024] RSP: 0018:ffffaa6f0173fcc0 EFLAGS: 00010287
[Tue Mar 12 23:54:41 2024] RAX: ffffaa6f0173fd20 RBX: ffffaa6f0173fd20 RCX: ffff99b3939c9400
[Tue Mar 12 23:54:41 2024] RDX: 0000000000000022 RSI: 0000000000000022 RDI: ffff99b380d45000
[Tue Mar 12 23:54:41 2024] RBP: 0000000000000200 R08: 0000000000000200 R09: 0000000000002710
[Tue Mar 12 23:54:41 2024] R10: ffffaa6f0173fee8 R11: 0000000000000000 R12: ffffaa6f0173fd50
[Tue Mar 12 23:54:41 2024] R13: ffff99b380d45000 R14: 0000000000002710 R15: 0000000000000000
[Tue Mar 12 23:54:41 2024] FS:  00007f1023662480(0000) GS:ffff99b6af5c0000(0000) knlGS:0000000000000000
[Tue Mar 12 23:54:41 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Mar 12 23:54:41 2024] CR2: 00007ffce0074ff8 CR3: 0000000118fd4005 CR4: 00000000000606e0
[Tue Mar 12 23:54:41 2024] Call Trace:
[Tue Mar 12 23:54:41 2024]  <TASK>
[Tue Mar 12 23:54:41 2024]  ? scsi_execute_cmd+0x3a/0x240
[Tue Mar 12 23:54:41 2024]  ? __warn+0x74/0xc0
[Tue Mar 12 23:54:41 2024]  ? scsi_execute_cmd+0x3a/0x240
[Tue Mar 12 23:54:41 2024]  ? report_bug+0xe2/0x150
[Tue Mar 12 23:54:41 2024]  ? handle_bug+0x3a/0x70
[Tue Mar 12 23:54:41 2024]  ? exc_invalid_op+0x13/0x60
[Tue Mar 12 23:54:41 2024]  ? asm_exc_invalid_op+0x16/0x20
[Tue Mar 12 23:54:41 2024]  ? scsi_execute_cmd+0x3a/0x240
[Tue Mar 12 23:54:41 2024]  ? ata_cmd_ioctl+0x1dd/0x2f0
[Tue Mar 12 23:54:41 2024]  ata_cmd_ioctl+0x13f/0x2f0
[Tue Mar 12 23:54:41 2024]  scsi_ioctl+0x32b/0x900
[Tue Mar 12 23:54:41 2024]  ? ioctl_has_perm.constprop.0.isra.0+0xd8/0x140
[Tue Mar 12 23:54:41 2024]  ? scsi_block_when_processing_errors+0x1d/0xf0
[Tue Mar 12 23:54:41 2024]  blkdev_ioctl+0x100/0x280
[Tue Mar 12 23:54:41 2024]  __x64_sys_ioctl+0x88/0xc0
[Tue Mar 12 23:54:41 2024]  do_syscall_64+0x38/0x90
[Tue Mar 12 23:54:41 2024]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[Tue Mar 12 23:54:41 2024] RIP: 0033:0x7f102332c98b
[Tue Mar 12 23:54:41 2024] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[Tue Mar 12 23:54:41 2024] RSP: 002b:00007ffce0074990 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Tue Mar 12 23:54:41 2024] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f102332c98b
[Tue Mar 12 23:54:41 2024] RDX: 00007ffce0074bf0 RSI: 000000000000031f RDI: 0000000000000003
[Tue Mar 12 23:54:41 2024] RBP: 000055b9c3a9c760 R08: 0000000000000000 R09: 0000000000000000
[Tue Mar 12 23:54:41 2024] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[Tue Mar 12 23:54:41 2024] R13: 00007ffce0075620 R14: 00007ffce0074bf0 R15: 00007ffce0075620
[Tue Mar 12 23:54:41 2024]  </TASK>
[Tue Mar 12 23:54:41 2024] ---[ end trace 0000000000000000 ]---
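The WARNING line points at drivers/scsi/scsi_lib.c:214, and the Code: bytes around the faulting <0f> 0b (ud2) decode to two tests on the structure at RBX: a pointer at [rbx] compared against zero, then a 32-bit field at [rbx+0x8] compared against 0x60 (96), with 0xffffffea (-22, i.e. -EINVAL) loaded into r13 on the failing path. (RSI: 000000000000031f in the user-space registers appears to be the HDIO_DRIVE_CMD ioctl that ata_cmd_ioctl() services.) Assuming this matches the sense-buffer sanity check that mainline's scsi_execute_cmd() applies to its struct scsi_exec_args argument, the user-space C model below sketches the condition that trips the WARN. The names exec_args_model and check_exec_args are hypothetical; this is not the kernel source, and it does not identify which caller passed the bad length.

```c
#include <assert.h> /* used by the usage checks */
#include <errno.h>
#include <stddef.h>

/* 96 == 0x60, the immediate compared against in the Code: bytes. */
#define SCSI_SENSE_BUFFERSIZE 96

/* Hypothetical stand-in for the first two fields of the kernel's
 * struct scsi_exec_args: on x86-64, sense sits at offset 0 and
 * sense_len at offset 8, matching the [rbx] and [rbx+0x8] operands. */
struct exec_args_model {
	unsigned char *sense;
	unsigned int sense_len;
};

/* Models the guard at the top of scsi_execute_cmd():
 * - NULL args means "use the defaults" and passes;
 * - a caller that supplies a sense buffer must declare it to be exactly
 *   SCSI_SENSE_BUFFERSIZE bytes long.
 * In the kernel the failing branch is a WARN_ON_ONCE() followed by
 * returning -EINVAL, so the command is never issued.
 * Returns 0 when the check passes, -EINVAL otherwise. */
int check_exec_args(const struct exec_args_model *args)
{
	if (args == NULL)
		return 0;
	if (args->sense != NULL && args->sense_len != SCSI_SENSE_BUFFERSIZE)
		return -EINVAL;
	return 0;
}
```

With a non-NULL sense pointer, any declared length other than 96 makes the model return -EINVAL, mirroring the WARN path; a NULL sense pointer or a NULL args pointer always passes.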

Forum moderator Hu suggests the following:

"To your specific problem: Normally, the next step would be to try to find the specific commit (not just kernel release) that broke this. I see 1512 commits present in v6.1.81 and not present in v6.1.74. If we assume this was broken upstream, rather than by a Gentoo-specific patch, then we need to find which of those 1512 commits is at fault. A straight git bisect will need log2(1512) ≈ 11 steps to find this.

"However, after the first draft of this post, I did some basic analysis of the faulting function, and I see the WARN introduced in a commit that is present in v6.1.81 and absent in v6.1.74. Therefore, without any attempt to understand what the code is meant to do, I posit that this check is just wrong as committed. The WARN was introduced in 'scsi: core: Add struct for args to execution functions', which makes it v6.1.80-4-gcf33e6ca12d8. That is, it is the fourth commit added after 6.1.80, so I expect you would find 6.1.80 to be good and 6.1.81 to be bad.

"It may (or may not) be the case that the corresponding upstream commit was correct as written, due to a change present in the later kernel that is absent in 6.1. That would require either testing a newer kernel series, or getting input from someone who understands this code and can explain what the check was meant to prevent."
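Hu's step estimate follows from bisection halving the set of candidate commits on each round, so the worst case is the smallest k with 2^k at least 1512. This is a rough bound; git bisect's actual count can differ slightly because of merge commits in the history and any revisions that have to be skipped.

$$
k = \lceil \log_2 1512 \rceil = 11, \qquad \text{since } 2^{10} = 1024 < 1512 \le 2048 = 2^{11}
$$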

The discussion is well over my head.
Comment 1 Mike Pagano gentoo-dev 2024-03-15 16:57:30 UTC
No problems here with gentoo-sources-6.1.81 and smartd

Attach your full dmesg and your smartmontools version.
Comment 2 Andy Figueroa 2024-03-15 17:28:15 UTC
Created attachment 887703
/var/log/dmesg as requested
Comment 3 Andy Figueroa 2024-03-15 17:31:14 UTC
sys-apps/smartmontools
Installed versions:  7.4(11:11:30 PM 03/13/2024)(daemon update-drivedb -caps -selinux -static -systemd)

$ sudo /etc/init.d/smartd start
Password: 
 * Starting smartd ...
 * start-stop-daemon: failed to start `/usr/sbin/smartd'
 * Failed to start smartd                                                 [ !! ]
 * ERROR: smartd failed to start

and from boot log:

Fri Mar 15 13:24:19 2024:  * Starting saned ...
Fri Mar 15 13:24:19 2024:  [ ok ]
Fri Mar 15 13:24:19 2024:  * Starting smartd ...
Fri Mar 15 13:24:19 2024:  * start-stop-daemon: failed to start `/usr/sbin/smartd'
Fri Mar 15 13:24:19 2024:  * Failed to start smartd
Fri Mar 15 13:24:19 2024:  [ !! ]
Fri Mar 15 13:24:19 2024:  * ERROR: smartd failed to start
Fri Mar 15 13:24:20 2024:  * Starting sshd ...
Fri Mar 15 13:24:21 2024:  [ ok ]
Comment 4 Mike Pagano gentoo-dev 2024-03-15 17:35:13 UTC
Is there anything in dmesg after you start smartd?

Please attach the full dmesg from after you start smartd.
Comment 5 Andy Figueroa 2024-03-15 17:46:19 UTC
After trying to start smartd, there is nothing relevant in dmesg, just two elogind lines.

[Fri Mar 15 13:24:19 2024]  </TASK>
[Fri Mar 15 13:24:19 2024] ---[ end trace 0000000000000000 ]---
[Fri Mar 15 13:24:27 2024] elogind-daemon[2216]: Removed session 1.
[Fri Mar 15 13:24:27 2024] elogind-daemon[2216]: New session 2 of user figueroa.
Comment 6 Andy Figueroa 2024-03-15 17:50:19 UTC
Created attachment 887704
This is output of dmesg from bash shell.

The attachment is the output of dmesg from a bash shell (dmesg > dmesg.txt).
Comment 7 Mike Gilbert gentoo-dev 2024-03-15 19:01:54 UTC
Is there any syslog output from smartd?
Comment 8 Andy Figueroa 2024-03-15 19:26:32 UTC
Actually, yes. I'm running metalog, and /var/log/everything shows the following for the current boot instance:
# grep smart current
Mar 15 13:24:19 [smartd] smartd 7.4 2023-08-01 r5530 [x86_64-linux-6.1.81-gentoo] (local build)
Mar 15 13:24:19 [smartd] Opened configuration file /etc/smartd.conf
Mar 15 13:24:19 [smartd] Configuration file /etc/smartd.conf parsed.
Mar 15 13:24:19 [smartd] Device: /dev/sda, opened
Mar 15 13:24:19 [smartd] Device: /dev/sda, not ATA, no IDENTIFY DEVICE Structure
Mar 15 13:24:19 [smartd] Unable to register ATA device /dev/sda at line 24 of file /etc/smartd.conf
Mar 15 13:24:19 [smartd] Unable to register device /dev/sda (no Directive -d removable). Exiting.
Mar 15 13:24:19 [smartd] smartd is exiting (exit status 16)
Mar 15 13:24:19 [kernel] [   38.831326] CPU: 7 PID: 4067 Comm: smartd Not tainted 6.1.81-gentoo #1
Mar 15 13:24:19 [/etc/init.d/smartd] start-stop-daemon: failed to start `/usr/sbin/smartd'
Mar 15 13:24:19 [/etc/init.d/smartd] ERROR: smartd failed to start
Mar 15 13:26:27 [sudo] figueroa : TTY=pts/1 ; PWD=/var/log ; USER=root ; COMMAND=/etc/init.d/smartd start
Mar 15 13:26:27 [smartd] smartd 7.4 2023-08-01 r5530 [x86_64-linux-6.1.81-gentoo] (local build)
Mar 15 13:26:27 [smartd] Opened configuration file /etc/smartd.conf
Mar 15 13:26:27 [smartd] Configuration file /etc/smartd.conf parsed.
Mar 15 13:26:27 [smartd] Device: /dev/sda, opened
Mar 15 13:26:27 [smartd] Device: /dev/sda, not ATA, no IDENTIFY DEVICE Structure
Mar 15 13:26:27 [smartd] Unable to register ATA device /dev/sda at line 24 of file /etc/smartd.conf
Mar 15 13:26:27 [smartd] Unable to register device /dev/sda (no Directive -d removable). Exiting.
Mar 15 13:26:27 [smartd] smartd is exiting (exit status 16)
Mar 15 13:26:27 [/etc/init.d/smartd] start-stop-daemon: failed to start `/usr/sbin/smartd'
Mar 15 13:26:27 [/etc/init.d/smartd] ERROR: smartd failed to start
Mar 15 13:30:23 [sudo] figueroa : TTY=pts/1 ; PWD=/var/log ; USER=root ; COMMAND=/etc/init.d/smartd start
Mar 15 13:30:24 [smartd] smartd 7.4 2023-08-01 r5530 [x86_64-linux-6.1.81-gentoo] (local build)
Mar 15 13:30:24 [smartd] Opened configuration file /etc/smartd.conf
Mar 15 13:30:24 [smartd] Configuration file /etc/smartd.conf parsed.
Mar 15 13:30:24 [smartd] Device: /dev/sda, opened
Mar 15 13:30:24 [smartd] Device: /dev/sda, not ATA, no IDENTIFY DEVICE Structure
Mar 15 13:30:24 [smartd] Unable to register ATA device /dev/sda at line 24 of file /etc/smartd.conf
Mar 15 13:30:24 [smartd] Unable to register device /dev/sda (no Directive -d removable). Exiting.
Mar 15 13:30:24 [smartd] smartd is exiting (exit status 16)
Mar 15 13:30:24 [/etc/init.d/smartd] start-stop-daemon: failed to start `/usr/sbin/smartd'
Mar 15 13:30:24 [/etc/init.d/smartd] ERROR: smartd failed to start

The log entries at 13:24 are from boot, the 13:26 entries are from a manual start attempt (/etc/init.d/smartd start), and the 13:30 entries are from another start attempt.
Comment 9 Mike Pagano gentoo-dev 2024-03-17 17:05:59 UTC
Please test with 6.8.1 to see whether a fix has already been implemented.
Comment 10 Andy Figueroa 2024-03-18 05:32:46 UTC
Yes, booting with gentoo-sources-6.8.1 does the trick. smartd runs on startup without error and no other apparent errors. Thank you. Feels strange being out here on the bleeding edge.

It would be good to know what changed. Was it fixed in a gentoo kernel patch?
Comment 11 Mike Pagano gentoo-dev 2024-03-19 18:35:59 UTC
(In reply to Andy Figueroa from comment #10)
> Yes, booting with gentoo-sources-6.8.1 does the trick. smartd runs on
> startup without error and no other apparent errors. Thank you. Feels strange
> being out here on the bleeding edge.
> 
> It would be good to know what changed. Was it fixed in a gentoo kernel patch?

Nothing from us, glad to see it's working.
Comment 12 Mike Gilbert gentoo-dev 2024-03-19 20:08:11 UTC
Sounds like a regression in the 6.1 stable series though. Probably best to work with upstream to figure that out.