Bug 919285 - media-libs/mesa-23.3.0 causing issues with vmware vgpu (svga3d 3d acceleration)
Summary: media-libs/mesa-23.3.0 causing issues with vmware vgpu (svga3d 3d acceleration)
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo X packagers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-05 15:15 UTC by Kyle Rabago
Modified: 2024-08-12 20:58 UTC

See Also:
Package list:
Runtime testing required: ---


Description Kyle Rabago 2023-12-05 15:15:06 UTC
As stated in the summary, something about the way mesa is built on Gentoo causes the 3D-accelerated svga3d device to raise an error, after which VMware performs a sort of GPU reset: it removes the device and tries to reconnect to it, as shown in both the vmware and mksandbox logs. As far as I know mesa is the most likely culprit, but it could also be xorg-server, wayland, or other libraries that use mesa's libs or the vmwgfx lib. The problem goes away when 3D acceleration is disabled in the VM, which points to an svga3d problem specific to Gentoo: all other distributions work fine, including bleeding-edge Arch, which also rules out VMware as the cause. So the way we build mesa, or other parts of the graphics stack (i.e. xorg, wayland, libs, libdrm), is causing an error in the VM-to-host communication over VMware.

Reproducible: Always
Comment 1 Kyle Rabago 2023-12-05 15:20:27 UTC
(In reply to Kyle Rabago from comment #0)

To clarify, "the device" in this context is the graphics card as seen by both the VM and the host; its removal shows up as a D3D12 device removal error in the VMware logs.
Comment 2 Kyle Rabago 2023-12-05 15:21:51 UTC
Another thing: GNOME struggles with this the most, especially GDM, which can't even launch. It simply crashes to a black screen and freezes VMware completely.
Comment 3 Kyle Rabago 2023-12-05 15:26:04 UTC
Also, this seems to affect only NVIDIA: the iGPU on my Intel CPU works fine with no issues, and I don't have any AMD hardware to test with. My drivers are fully updated, so that rules out outdated drivers. The host is Windows 10/11, in case that affects anything.
Comment 4 Kyle Rabago 2023-12-07 00:07:08 UTC
Also, GNOME launches without issue under something like SDDM or LightDM, but after a random period of time or a random interaction it crashes the desktop entirely, leaving a black screen and a cursor disconnected from the VMware VM.
Comment 5 Kyle Rabago 2023-12-07 00:16:54 UTC
Also of note: GNOME is much less stable than KDE with this issue. KDE seems to work fine, or at least much more consistently, as GNOME's backend seems to differ from KDE's and is much less stable with the vGPU acceleration.
Comment 6 Kyle Rabago 2023-12-10 21:59:17 UTC
After more testing on the host side, including reseating the RAM and trying a new NVIDIA 1030 to rule out a GPU fault, I'm still running into these issues. It does seem to be a purely 3D-related error, though: xorg/wayland works without 3D acceleration, and other display managers work fine without any issues. In the Windows event log I also found an nvlddmkm error, event ID 14, which is most likely the GPU crashing when GDM/GNOME runs with 3D acceleration in VMware, causing VMware to freeze. For now, all I can think of is figuring out what makes Gentoo's GDM crash when Arch's or Ubuntu's GDM does not, and whether the cause is xorg, mesa, wayland, or some other part of the graphical server setup that is specific to the way Gentoo builds them compared to other distros, beyond theirs being binary packages built to work on all hardware. Let me know if anyone has other ideas or can try to replicate this setup and see whether they get the same issues (Windows 10-11, VMware Workstation 17.5, NVIDIA GPU ONLY with no iGPU, 3D acceleration enabled, then try to run GDM/GNOME).
Comment 7 Kyle Rabago 2023-12-19 17:14:52 UTC
This is still an issue, and choosing a different display server does not fix it. I suspect the way VMware and Gentoo handle the pass-through is the problem, but for now all xorg and wayland sessions are prone to a complete GPU crash.
Comment 8 Kalin KOZHUHAROV 2024-01-15 07:03:08 UTC
Recent mesa versions are built against different LLVM versions, so check which one yours uses.
Did you try other mesa versions (e.g. older ones)?

# grep ^LLVM_MAX_SLOT /var/db/repos/gentoo/media-libs/mesa/mesa-*
/var/db/repos/gentoo/media-libs/mesa/mesa-23.1.8.ebuild:LLVM_MAX_SLOT="16"
/var/db/repos/gentoo/media-libs/mesa/mesa-23.1.9.ebuild:LLVM_MAX_SLOT="16"
/var/db/repos/gentoo/media-libs/mesa/mesa-23.2.1.ebuild:LLVM_MAX_SLOT="16"
/var/db/repos/gentoo/media-libs/mesa/mesa-23.3.0.ebuild:LLVM_MAX_SLOT="17"
/var/db/repos/gentoo/media-libs/mesa/mesa-23.3.0_rc5-r1.ebuild:LLVM_MAX_SLOT="17"
/var/db/repos/gentoo/media-libs/mesa/mesa-23.3.1.ebuild:LLVM_MAX_SLOT="17"
/var/db/repos/gentoo/media-libs/mesa/mesa-23.3.2.ebuild:LLVM_MAX_SLOT="17"
/var/db/repos/gentoo/media-libs/mesa/mesa-23.3.3.ebuild:LLVM_MAX_SLOT="17"
/var/db/repos/gentoo/media-libs/mesa/mesa-9999.ebuild:LLVM_MAX_SLOT="17"

Is your clang toolkit up to date, did you rebuild it?
Which version was used to compile it, e.g. look in `grep -P "libLLVM-\d\d\.so" /var/db/pkg/media-libs/mesa-*/REQUIRES`
Try with a mesa version that is built with a different clang/llvm.

While you have a long text description, please add some specific log lines that indicate the problem (or screenshots, if you really cannot get text). Also add `emerge --info` output.
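
The REQUIRES check suggested above can be wrapped in a small helper. This is only a minimal sketch, not an official Portage tool: the function name `mesa_llvm_sonames` is made up for illustration, and it assumes that the REQUIRES files live under /var/db/pkg/media-libs/mesa-*/ (as in the grep command above) and that the soname looks like libLLVM-NN.so.

```shell
# Print the libLLVM sonames that the installed mesa package(s) link against.
# The package database root is a parameter so the function can be pointed at
# a test directory; /var/db/pkg is assumed as the default location.
mesa_llvm_sonames() {
    pkgdb="${1:-/var/db/pkg}"
    for req in "$pkgdb"/media-libs/mesa-*/REQUIRES; do
        # Skip the unexpanded glob when no mesa package is installed.
        [ -f "$req" ] || continue
        # -o prints only the matching part, e.g. "libLLVM-17.so".
        grep -oE 'libLLVM-[0-9]+\.so' "$req" || true
    done | sort -u
}
```

Called without an argument, `mesa_llvm_sonames` reads the live package database; the result can then be compared with the LLVM_MAX_SLOT of the ebuild in question.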
Comment 9 mbe 2024-01-27 19:59:25 UTC
I have the same issue:
Windows 10, VMware Workstation 17.5, NVIDIA GPU ONLY, >=mesa-23.3: crashes with GDM and VMware freezes; I need to kill VMware with Task Manager in Windows.
Comment 10 Matt Turner gentoo-dev 2024-04-17 19:45:18 UTC
Could you try with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28616 which is included in mesa-24.0.5?
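
To tell whether an installed mesa is new enough to include that merge request, a version comparison along these lines may help. It is only a sketch: `version_at_least` is a hypothetical helper, it depends on the GNU `sort -V` extension, and it does not follow Portage's own version rules (suffixes such as _rc5 or -r1 may sort differently).

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Assumes GNU coreutils, since `sort -V` (version sort) is an extension.
version_at_least() {
    # The smaller of the two versions sorts first; if that is $2,
    # then $1 is at least $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example call with a hard-coded version; substitute the one you have installed:
#   version_at_least 23.3.3 24.0.5 || echo "older than 24.0.5"
```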
Comment 11 mbe 2024-04-22 13:01:25 UTC
The same issue with mesa-24.0.5
Comment 12 Matt Turner gentoo-dev 2024-04-22 15:36:31 UTC
Dang.

I think someone that is able to reproduce this issue should file an upstream issue on https://gitlab.freedesktop.org/mesa/mesa/-/issues