Bug 911394 - Gentoo Kernel 6.1.31 System and Installation media suddenly can't find root block device: fail to boot, random hardware failures across multiple machines at the same time.
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Linux bug wranglers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-28 18:12 UTC by armouredheart
Modified: 2023-07-30 02:37 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Description armouredheart 2023-07-28 18:12:34 UTC
Several of my computers running Gentoo kernel 6.1.31 have all suddenly crashed and now refuse to boot correctly. Even the new boot media (livecd, liveusb) fails to boot correctly. No updates were installed during the last week. I am writing this from my cellular phone because my computers are borked until I can restore them.
The hardware varies from a T500 ThinkPad to a brand new custom-built desktop with an AMD 5700X processor. The ThinkPad uses LVM2 and the new desktop does not, yet the same error appears on both, except that "/dev/dm-0" is replaced with "/dev/sda3" or "/dev/nvme0n1p3" depending on the machine.

To be clear, the same problem hit all my Gentoo machines on the 6.1.31 kernel within a few hours of each other, whether or not they had been updated. It even happened on machines without Gentoo installed when I tested them with the latest boot media. There is some kind of connection. If it were user error, it would have happened to only one machine, not all of them, and it would not affect the boot media.

The error thrown after selecting Gentoo from GRUB is as follows:

mount: mounting /dev/dm-0 on /newroot failed: Input/output error
!! Could not mount specified ROOT!
!! Block device /dev/dm-0 is not a valid root device...
!! Could not find the block device in /dev/dm-0.
!! Please specify another value or:
!! - Press Enter for the same
!! - type "shell" for a shell
!! - type "q" to skip ...
root block device(/dev/dm-0) ::
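
A minimal sketch, not part of the original report, for checking from a working live environment whether a candidate root device exists, is a block device, and can be read at all. An "Input/output error" at mount time would typically also show up as a failed read here. The function name `probe_device` and the 4096-byte read size are illustrative assumptions; the device paths are the ones named in this report. Reading a block device usually requires root.

```python
import os
import stat
import sys


def probe_device(path, nbytes=4096):
    """Return a short status string for a candidate root device."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except OSError as exc:
        return f"stat failed: {exc.strerror}"
    if not stat.S_ISBLK(mode):
        return "not a block device"
    try:
        # Unbuffered read of the first bytes; a failing disk tends to
        # raise OSError (EIO) at this point.
        with open(path, "rb", buffering=0) as dev:
            data = dev.read(nbytes)
    except PermissionError:
        return "permission denied (run as root)"
    except OSError as exc:
        return f"read failed: {exc.strerror}"
    return f"readable ({len(data)} bytes read)"


if __name__ == "__main__":
    # Device paths taken from the report; pass others as arguments.
    candidates = sys.argv[1:] or ["/dev/dm-0", "/dev/sda3", "/dev/nvme0n1p3"]
    for path in candidates:
        print(f"{path}: {probe_device(path)}")
```

A "missing" result points at device naming or an initramfs that lacks the driver, while "read failed" points at the disk or controller itself.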


I was talking on the Gentoo IRC when it happened to my ThinkPad. First, the disk read/write went crazy. Second, Firefox crashed and refused to start. Then my wallpaper went blank. After a few minutes the MATE desktop vanished and left me with only a mouse cursor. I performed a hard reset and restart, and got the error written above. I took a picture of the whole thing, but the image file is too large for Bugzilla.
My ThinkPad now hangs for several minutes upon startup with the HDD light going crazy, and GRUB now hangs with a blinking cursor and then reports "hard drive initialization failure" or alternatively "fan error".
I should also note that the exact same error appears on machines that don't have Gentoo installed if I boot them with, say, the latest liveusb boot media.

Workaround:

The older livecd from kernel 5.**** something still seems to work, and in fact the ThinkPad boots normally from its hard drive as long as the old minimal Gentoo CD is in the disc drive. I am utterly confused as to why this works, and have no idea how to make the machine function without the old CD.
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-07-28 18:16:37 UTC
My guess is it's not the kernel version but the genkernel version, perhaps. That output you've shown is from genkernel, I think?

>My thinkpad now hangs for several minutes upon startup with the hdd light going crazy, and grub is now hanging with a blinking cursor and then reporting "hard drive initialization failure" or alternatively "fan error".

This part is where it starts to make way less sense because even if the initramfs is broken, you shouldn't really get "fan error".

>The older livecd from kernel 5.**** something still seems to work, and in fact the thinkpad is able to boot normally from hard drive as long as the old minimal gentoo cd is in the disk drive. I am utterly confused as to why this works, and have no idea how to make the machine function without the old cd.

This again sounds pretty weird. Anyway, I suggest using the live media, downgrading genkernel to rule it out, building an old kernel, possibly reinstalling grub, and going from there.

But this is going to be better on IRC or the forums, as it's a support issue.
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-07-28 18:18:45 UTC
> I was talking on the gentoo irc when it happened to my thinkpad. First, the
> disk read/write went crazy. Second, firefox crashed and refused to start.
> Then my wallpaper went blank

Also, this part doesn't make sense. How could a kernel, which just got installed but isn't booted into, affect the rest of your system?

This sounds like terribly bad luck or perhaps an electrical surge.

>I am writing this from my cellular phone because my computers are borked until I can restore them.

I thought you could boot the machine with a workaround w/ the old disk?
Comment 3 armouredheart 2023-07-28 18:24:52 UTC
(In reply to Sam James from comment #2)

Ah, sorry, I forgot to delete that bit because I had the idea to test the CD midway through writing the bug report. (I'm still writing from a cell phone, though.)

The reason I went with a bug report is that the problem also happens with the new liveusb and livecd images, in addition to what I have installed on hard drives.

If it were an electrical surge, then the problem shouldn't happen to computers that were unplugged and turned off, yet it does (only with the new boot media, though; the older kernel seems to work fine).
Comment 4 armouredheart 2023-07-28 18:27:05 UTC
(In reply to armouredheart from comment #3)

Oh, and the SystemRescueCd throws the same error, but the GParted CD and the Linux Mint LiveUSB work fine.
Comment 5 armouredheart 2023-07-29 16:00:35 UTC
Reverting to kernel 5.15.122 seems to have solved the problem.
Comment 6 armouredheart 2023-07-29 22:39:10 UTC
The Gentoo 6.1.38 liveusb works perfectly as long as a livecd of Gentoo 5.1.74 is in the CD drive. The disc spins up when the USB is accessed. Yes, the liveusb is what is loaded, because "gentoo 6.1.38" is the active kernel.

For some reason, the USB needs an older CD to boot correctly.

No, it doesn't make sense. But it works.
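
Since the observation above depends on which kernel is actually running, here is a small sketch for confirming that from the booted system. `platform.release()` returns the same string as `uname -r`; the helper `parse_release` is a hypothetical name introduced for illustration, and nothing here is output from the reporter's machines.

```python
import platform
import re


def parse_release(release):
    """Extract (major, minor, patch) from a string such as '6.1.38-gentoo'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level is treated as 0.
    return int(major), int(minor), int(patch or 0)


if __name__ == "__main__":
    running = platform.release()
    print(f"running kernel: {running} -> {parse_release(running)}")
```

Because the result is a tuple, versions compare naturally, so a 6.1.x kernel sorts above a 5.15.x one.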
Comment 7 Mike Gilbert gentoo-dev 2023-07-30 02:37:36 UTC
Please seek help in Gentoo support channels, and reopen if you manage to isolate a bug.