Bug 894666 - sys-kernel/gentoo-sources-6.1.12 x11-drivers/nvidia-drivers-525.85.05 video output freezes during OpenRC init
Summary: sys-kernel/gentoo-sources-6.1.12 x11-drivers/nvidia-drivers-525.85.05 video o...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Linux bug wranglers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-16 01:53 UTC by William Rabbermann
Modified: 2023-02-17 01:33 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
gentoo-sources-6.1.12 .config (.config, 169.40 KB, text/plain)
2023-02-16 01:53 UTC, William Rabbermann
lspci -k (lspci.info, 5.54 KB, text/plain)
2023-02-16 01:55 UTC, William Rabbermann
syslog snippet (syslog, 19.63 KB, text/plain)
2023-02-16 02:01 UTC, William Rabbermann

Description William Rabbermann 2023-02-16 01:53:48 UTC
Created attachment 851494 [details]
gentoo-sources-6.1.12 .config

Hello,

I recently upgraded my kernel to the new release, sys-kernel/gentoo-sources-6.1.12. After installing x11-drivers/nvidia-drivers-525.85.05 into /lib/modules/6.1.12-gentoo/ I restarted my machine. 

My video output freezes at "Populating /dev with existing devices through uevents...". I can still press CTRL+ALT+DEL to reboot or shut down, so the system itself remains responsive. My last working kernel was 5.15.88-gentoo.

https://i.imgur.com/yX56oCX.jpeg

I pass through one of my graphics cards using vfio-pci. My other graphics card on slot 2 is my primary video out. Nouveau is blacklisted in my configuration.



/etc/modprobe.d/vfio.conf

softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia* pre: vfio-pci
alias char-major-195 nvidia
alias /dev/nvidiactl char-major-195
options vfio-pci ids=10de:2504,10de:228e



/etc/modprobe.d/nvidia.conf

# NVIDIA drivers options
# See /usr/share/doc/nvidia-drivers-*/README.txt* for more information.

# nvidia-drivers and nouveau cannot be used at same time.
# Comment out the following line if you wish to allow nouveau.
blacklist nouveau

# Kernel Mode Setting (notably needed for EGLStream/Wayland)
# Enabling may possibly cause issues with SLI and Reverse PRIME.
options nvidia-drm modeset=1

# Suspend options. Allocations=0 recommended over =1 unless enable nvidia's
# systemd sleep services (nvidia-hibernate, nvidia-resume, nvidia-suspend),
# but even then may lead to issues on some setups (keep 0 if in doubt).
options nvidia \
	NVreg_PreserveVideoMemoryAllocations=0 \
	NVreg_TemporaryFilePath=/var/tmp

# !!! Security Warning !!!
# Do not change the DeviceFile options unless you know what you are doing.
# Only add trusted users to the 'video' group, these users may be able to
# crash, compromise, or irreparably damage the machine.
options nvidia \
	NVreg_DeviceFileGID=27 \
	NVreg_DeviceFileMode=432 \
	NVreg_DeviceFileUID=0 \
	NVreg_ModifyDeviceFiles=1

# Should be no need to touch anything below.
alias char-major-195 nvidia
alias /dev/nvidiactl char-major-195
remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia
Comment 1 William Rabbermann 2023-02-16 01:55:58 UTC
Created attachment 851496 [details]
lspci -k

The 1660 Ti is the video output.
The 3060 is the vfio passthrough card.
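
The ids= values in the vfio.conf from the description are vendor:device pairs. As a minimal sketch, they could be read back from sysfs for a given PCI address like this. The function name pci_id and its optional sysfs-root argument (only there so the function can be tried against a fake tree) are illustrative assumptions, not part of any tool mentioned in this report.

```shell
#!/bin/sh
# pci_id ADDR [SYSFS_ROOT]   (hypothetical helper, for illustration only)
# Print the vendor:device pair for a PCI address by reading the "vendor"
# and "device" attribute files that sysfs exposes for each PCI device.
pci_id() {
	addr=$1
	root=${2:-/sys}
	dir="$root/bus/pci/devices/$addr"

	# Fail if the device does not exist or the attributes are unreadable.
	[ -r "$dir/vendor" ] && [ -r "$dir/device" ] || return 1

	# The files hold values such as 0x10de; strip the 0x prefix.
	read -r vendor < "$dir/vendor"
	read -r device < "$dir/device"
	printf '%s:%s\n' "${vendor#0x}" "${device#0x}"
}
```

Calling pci_id 0000:01:00.0 and pci_id 0000:01:00.1 should then print the pairs to compare against the ids= line for the passthrough card.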
Comment 2 William Rabbermann 2023-02-16 02:01:57 UTC
Created attachment 851498 [details]
syslog snippet
Comment 3 Ionen Wolkens gentoo-dev 2023-02-16 05:12:11 UTC
Does it work without your vfio setup? May want to check if it works normally at least. Perhaps changes in the kernel w/ vfio is making nvidia think the card is in use (rather than blacklisted nouveau), not that it's something I've kept up with.

vfio passthrough works fine for me still, but I only have one nvidia card, can't test two nor am I familiar with using two at once (so can't say what module options or load order may help).
Comment 4 William Rabbermann 2023-02-16 15:07:06 UTC
(In reply to Ionen Wolkens from comment #3)
> Does it work without your vfio setup? May want to check if it works normally
> at least. Perhaps changes in the kernel w/ vfio is making nvidia think the
> card is in use (rather than blacklisted nouveau), not that it's something
> I've kept up with.
> 
> vfio passthrough works fine for me still, but I only have one nvidia card,
> can't test two nor am I familiar with using two at once (so can't say what
> module options or load order may help).

I disabled the vfio.conf. Now it does not freeze when I start up the new kernel.

How did you manage to pass in only one card and still use your Linux host as normal? AFAIK if you pass in one card, you have to switch over to the virtual machine and then back to the host once the virtual machine is down.
In the end I like the two-GPU method because it means I can use my Linux host and the VM at the same time via the shared video memory buffer of looking-glass-client.

But back to why my vfio.conf doesn't work for my configuration in this new kernel: I need to see what the kernel has changed and how it has affected the process.
Comment 5 William Rabbermann 2023-02-16 23:34:44 UTC
Tried adding:

options nvidia ids=10de:2182,10de:1aeb,10de:1aec

to the vfio.conf; it still won't work. My video just ends up freezing.
I'm really not sure now...
Comment 6 William Rabbermann 2023-02-17 01:04:32 UTC
OK, I got it to work by adding a custom dracut module in /usr/lib/dracut/modules.d/20vfio-override/:

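module-setup.sh

#!/usr/bin/bash
# Always include this module in the initramfs.
check() {
	return 0
}

# No dependencies on other dracut modules.
depends() {
	return 0
}

# Run the override script in the initramfs before udev starts.
install() {
	inst_hook pre-udev 00 "$moddir/vfio-pci-override.sh"
}

vfio-pci-override.sh

#!/bin/sh
# PCI addresses to hand to vfio-pci: 3060 (VGA), then 3060 (audio).
# A plain space-separated list is used because /bin/sh has no arrays.
DEVICES="0000:01:00.0 0000:01:00.1"

for dev in $DEVICES; do
	echo "vfio-pci" > "/sys/bus/pci/devices/$dev/driver_override"
done

modprobe -i vfio-pci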


I also added drivers+=" vfio vfio-pci vfio_iommu_type1 " to a file in /etc/dracut.conf.d/, and added rd.driver.pre=vfio-pci to GRUB_CMDLINE_LINUX_DEFAULT.

Note that lspci -k did not show vfio-pci as the driver in use until I started using it, and this is fine. I guess they changed how the vfio-pci module is loaded.

01:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
	Subsystem: eVga.com. Corp. GA106 [GeForce RTX 3060 Lite Hash Rate]
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau, nvidia_drm, nvidia
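
The "Kernel driver in use" line from lspci -k can also be checked straight from sysfs, where a bound device has a "driver" symlink pointing at its driver. The following is a minimal sketch under that assumption; the function name pci_driver_in_use and the optional sysfs-root argument are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# pci_driver_in_use ADDR [SYSFS_ROOT]   (hypothetical helper)
# Print the name of the driver currently bound to a PCI device,
# or "none" if no driver is bound.
pci_driver_in_use() {
	addr=$1
	root=${2:-/sys}
	link="$root/bus/pci/devices/$addr/driver"

	if [ -L "$link" ]; then
		# The symlink target ends in the driver name,
		# for example .../bus/pci/drivers/vfio-pci
		basename "$(readlink "$link")"
	else
		echo none
	fi
}
```

For the passthrough card this would be called as pci_driver_in_use 0000:01:00.0.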
Comment 7 William Rabbermann 2023-02-17 01:09:09 UTC
Oh yeah, and I went ahead and removed the vfio.conf file in /etc/modprobe.d/ from earlier, since I am now using a dracut module to assign vfio-pci to the specified devices.
Comment 8 Ionen Wolkens gentoo-dev 2023-02-17 01:09:34 UTC
I was more specifically referring to two nvidia cards, there can be other non-nvidia cards in that statement. Single gpu passthrough is possible, but that does need workarounds to control the host.

But anyhow, glad you got it to work :)
Comment 9 William Rabbermann 2023-02-17 01:24:16 UTC
drivers+=" vfio-pci "

is all I really need loaded in my initrd. What I meant is that lspci will look like:

01:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
	Subsystem: eVga.com. Corp. GA106 [GeForce RTX 3060 Lite Hash Rate]
	Kernel modules: nouveau, nvidia_drm, nvidia

with no driver in use, which is good. Once you start a VM, vfio-pci will show as being in use.
Maybe I could get vfio on a single GPU, but then I would not have two GPUs in my PC. lel
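
To see the state described above (an override recorded, but no driver bound yet), the driver_override attribute can be read alongside the driver symlink. This is only a sketch: the function name pci_override_status, its output format, and the optional sysfs-root argument are assumptions made for illustration. Whatever the kernel reports for an unset override is printed unchanged.

```shell
#!/bin/sh
# pci_override_status ADDR [SYSFS_ROOT]   (hypothetical helper)
# Print one line showing the driver_override value and the bound driver.
pci_override_status() {
	addr=$1
	root=${2:-/sys}
	dir="$root/bus/pci/devices/$addr"

	# Read the override as reported by sysfs; empty or missing becomes "unset".
	override=
	if [ -r "$dir/driver_override" ]; then
		read -r override < "$dir/driver_override" || true
	fi

	# A bound device has a "driver" symlink ending in the driver name.
	driver=none
	if [ -L "$dir/driver" ]; then
		driver=$(basename "$(readlink "$dir/driver")")
	fi

	printf '%s override=%s driver=%s\n' "$addr" "${override:-unset}" "$driver"
}
```

Running pci_override_status 0000:01:00.0 before and after starting the VM would show whether only the driver field changes.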
Comment 10 William Rabbermann 2023-02-17 01:33:38 UTC
Also, use #!/bin/sh, not #!/usr/bin/bash.
I was following some other dude's guide, but for real, if you switch to dash you are gonna want to rewrite all your scripts anyway. Just saying.