Hello,

On a fresh Gentoo install on an x86_64 system with an amdgpu-managed GPU (RX 5700 XT), Xorg cannot be started. The relevant errors from ~/.local/share/xorg/Xorg.0.log report undefined symbols:

```
(II) LoadModule: "amdgpu"
[ 5745.051] (II) Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
[ 5745.051] (EE) Failed to load /usr/lib64/xorg/modules/drivers/amdgpu_drv.so: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so: undefined symbol: fbImageGlyphBlt
[ 5745.051] (EE) Failed to load module "amdgpu" (loader failed, 0)
[...]
[ 5745.051] (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so
[ 5745.051] (EE) Failed to load /usr/lib64/xorg/modules/drivers/modesetting_drv.so: /usr/lib64/xorg/modules/drivers/modesetting_drv.so: undefined symbol: shadowRemove
```

I had an extensive look at the documentation and I believe I have everything set up accordingly:
- Desktop profile `default/linux/amd64/17.1/desktop/gnome/systemd`
- USE flag "X" in /etc/portage/make.conf
- VIDEO_CARDS="amdgpu radeonsi" in /etc/portage/make.conf

--------------

I could work around this by using modesetting_drv.so from Arch Linux. Additionally, using modesetting_drv.so fixes the second failure but isn't mandatory to be able to start the X session.

This seems to be the same bug as 661502 from three years ago.

--------------

x11-base/xorg-server USE="ipv6 systemd udev wayland xorg xvfb -debug -dmx -doc (-elogind) -kdrive (-libressl) -minimal (-selinux) -suid -test -unwind -xcsecurity -xephyr -xnest"
compile flags: -O2 -march=native -pipe
CPU: Ryzen 3700X
GPU: RX 5700 XT
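For anyone hitting similar loader failures, the following is a minimal sketch of how the module/symbol pairs could be pulled out of an Xorg log. The function name `list_undefined_symbols` is hypothetical, and the pattern assumes the log lines look like the ones quoted above.

```shell
#!/bin/sh
# Print "module path: symbol" for every loader failure caused by an
# undefined symbol. Reads the log file given as $1, or stdin if none is given.
# Assumes the "(EE) Failed to load <path>: <path>: undefined symbol: <name>"
# line format shown in this report.
list_undefined_symbols() {
    sed -n 's|.*(EE) Failed to load \([^:]*\): .*undefined symbol: \([A-Za-z_][A-Za-z0-9_]*\).*|\1: \2|p' "$@"
}
```

It could be run as `list_undefined_symbols ~/.local/share/xorg/Xorg.0.log`; lines such as `Failed to load module "amdgpu" (loader failed, 0)` are skipped because they name no symbol.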
Looking at both that bug and my own system, I can confirm that:

Section "Module"
    Load "fb"
    Load "shadow"
    Load "glamoregl"
EndSection

is still needed for both the amdgpu and modesetting DDX to load. Sadly, I'm not familiar with the way the modern modular xorg.conf.d infrastructure operates or whether this could be automated that way. Considering that the Arch DDX works, it may also be that we could just link those modules in/up to avoid config-driven dlopen, but I'm not familiar with the Xorg build system either.

At least a warning with instructions should be printed by x11-drivers/xf86-video-amdgpu, and by x11-base/xorg-server for the modesetting driver. This might also affect other related drivers such as nouveau.
After replacing amdgpu_drv.so and modesetting_drv.so by the ones provided by archlinux, the module loading sequence (that works) on my machine is the following: glx, amdgpu,
GLX should get auto-loaded, so there might be some additional issue at play here.

Gentoo issues should not be solved by replacing system binaries with ones from other distributions. Could you please restore the Gentoo-built modules and verify whether the issue is still present when using this code in your xorg.conf:

Section "Module"
    Load "fb"
    Load "shadow"
    Load "glamoregl"
EndSection
Created attachment 695052: Xorg log after 20-modules.conf file creation
Hello,

Please ignore my previous message; it got posted while I changed the bug title.

I have been reading and trying to figure out what's happening. I tried the config you suggested in /usr/share/X11/xorg.conf.d/20-modules.conf. I also re-emerged xorg-server, xf86-video-fbdev and xf86-video-amdgpu to make sure I am running Gentoo's generated drivers: it indeed enabled my X session to start.

One issue remains: the fbdev module still seems to lack a symbol, fbdevHWSave. Please have a look at the Xorg log I attached.
I got the idea to use the same fix you suggested to me to fix the other missing symbol. The issue is entirely fixed with the following /usr/share/X11/xorg.conf.d/20-modules.conf file:

Section "Module"
    Load "fb"
    Load "fbdevhw"
    Load "shadow"
    Load "glamoregl"
EndSection

Thank you for your help.

> it may also be that we could just link those modules in/up to avoid config driven dlopen but I'm not familiar with the Xorg build system either.

I am not sure I understand, but couldn't it be a valid solution to make the x11-drivers/xf86-video-amdgpu ebuild install such a file?
I'm not an expert on xorg.conf.d but I do not believe that it would be inherently safe for a driver to add such files that load it by default. This is because the user can in principle have most if not all of them installed and they should only get loaded when supported hardware is found. The fbdev DDX, provided by x11-drivers/xf86-video-fbdev, is an entirely different one from amdgpu DDX. It should only be used on systems where /dev/fb0 is present but no more advanced driver such as modesetting is viable. Thank you for reporting about it as well but it's not really relevant to your issue. To recap, x11-drivers/xf86-video-amdgpu for you needed only fb, shadow and glamoregl loaded, correct? If yes, I can make a GitHub pull request to at least inform users about this.
> The fbdev DDX, provided by x11-drivers/xf86-video-fbdev, is an entirely different one from amdgpu DDX. It should only be used on systems where /dev/fb0 is present but no more advanced driver such as modesetting is viable.

Noted, thank you for the information; I couldn't find the relevant information quickly enough. I will therefore uninstall x11-drivers/xf86-video-fbdev and remove the extra `Load "fbdevhw"` line from 20-modules.conf.

> To recap, x11-drivers/xf86-video-amdgpu for you needed only fb, shadow and glamoregl loaded, correct? If yes, I can make a GitHub pull request to at least inform users about this.

Yes indeed! Thank you for your help!
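As a reference for the workaround confirmed above, here is a small sketch that writes the three-module file. The function name `write_modules_conf` is hypothetical; the target directory is passed as an argument so it can be tried on a scratch directory first. The thread used /usr/share/X11/xorg.conf.d, while /etc/X11/xorg.conf.d is the location usually meant for local configuration.

```shell
#!/bin/sh
# Write the module-preload workaround into the given xorg.conf.d directory.
# The module list (fb, shadow, glamoregl) and the file name 20-modules.conf
# are taken from this thread; the function itself is only an illustration.
write_modules_conf() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/20-modules.conf" <<'EOF'
Section "Module"
    Load "fb"
    Load "shadow"
    Load "glamoregl"
EndSection
EOF
}
```

A possible invocation, as root, would be `write_modules_conf /etc/X11/xorg.conf.d`.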
Please attach your emerge --info and a build log of xf86-video-amdgpu.
Created attachment 695145: result of emerge --info
Created attachment 695148: xf86-video-amdgpu build log
Please find attached the requested files.
Thanks. Does it work if you don't use -fno-plt in your CFLAGS?
Indeed, I think -fno-plt is the problem. The GCC docs (https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#index-fplt) say

> -fno-plt
> Do not use the PLT for external function calls in position-independent code. Instead, load the callee address at call sites from the GOT and branch to it. This leads to more efficient code by eliminating PLT stubs and exposing GOT loads to optimizations. On architectures such as 32-bit x86 where PLT stubs expect the GOT pointer in a specific register, this gives more register allocation freedom to the compiler. Lazy binding requires use of the PLT; with -fno-plt all external symbols are resolved at load time.

Specifically the last sentence.

xorg-3.eclass does this:

    [[ ${PN} == xorg-server || ${PN} == xf86-video-* || ${PN} == xf86-input-* ]] \
        && append-ldflags -Wl,-z,lazy

So I think you cannot use -fno-plt with x11-drivers/* or x11-base/xorg-server.

I would accept a patch to the eclass that warns or strips -fno-plt from *FLAGS, but I'm not going to do it myself.
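To illustrate what "strips -fno-plt from *FLAGS" amounts to, here is a standalone sketch in plain shell. It is not the eclass code: inside an ebuild or eclass the flag-o-matic helper `filter-flags` would normally be used instead, and the function name `strip_no_plt` is hypothetical.

```shell
#!/bin/sh
# Print the given flag string with every exact "-fno-plt" word removed.
# Runs in a subshell so that disabling globbing (set -f) and the helper
# variables do not leak into the caller.
strip_no_plt() (
    set -f
    out=
    for flag in $1; do
        if [ "$flag" = "-fno-plt" ]; then
            continue
        fi
        out="${out:+$out }$flag"
    done
    printf '%s\n' "$out"
)
```

For example, `CFLAGS=$(strip_no_plt "$CFLAGS")` would leave all other flags in their original order.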
Alternatively, we may want to see if we can remove the -Wl,-z,lazy flag. It was added in 2006, two eclasses ago: https://gitweb.gentoo.org/repo/gentoo/historical.git/commit/eclass/x-modular.eclass?id=300268a26a90fcc1820e07570af58ad66050407b
I confirm that removing -fno-plt from the compiler flags and then re-emerging x11-drivers/xf86-video-amdgpu and x11-base/xorg-server resolves the issue: the Xorg session starts without the need to add a custom /usr/share/X11/xorg.conf.d/20-modules.conf file.

I apologize for the inconvenience, my knowledge was limited and I was unaware of a conflicting linker flag -Wl,-z,lazy.

> Alternatively, we may want to see if we can remove the -Wl,-z,lazy flag. It was added in 2006

Having a look at GNU ld's documentation [1], lazy binding seems to be on by default. The -fno-plt documentation from GCC [2] doesn't warn about this kind of conflict. I suppose that explicitly setting that linker flag overrides some internal mechanisms.

I will read the documentation to try to locally emerge x11-drivers/xf86-video-amdgpu and x11-base/xorg-server after removing the -Wl,-z,lazy flag in the x-modular.eclass file and report back.

[1] https://www.linux.org/docs/man1/ld.html
[2] https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html
I have developed a fix for xorg-3.eclass in the associated pull request on GitHub. But, since it's an eclass fix, it's not getting automatic assignment. Please consider re-opening this bug and seeing to the PR. Thank you. For convenience, the link to the PR is: https://github.com/gentoo/gentoo/pull/20166
Niklāvs, Matt,

I noticed a thread on gentoo-dev with the proposed eclass patch. I am familiar with the topic (I contributed the initial implementation of -fno-plt to GCC and have -fno-plt in CFLAGS on my Gentoo installs).

-fno-plt indeed makes it necessary to manually arrange for module loading in the correct order, as shown in comment #1. I do the same on my amdgpu laptop.

Alpine Linux was in a similar position until musl libc started supporting deferred symbol binding. They used to ship a config file that would load several modules in the right order:

https://git.alpinelinux.org/aports/commit/main/xorg-server?id=e33bb5ae1e54f086fade8a476f744ac745c433e2

Arch Linux was, to my knowledge, the first and so far the only binary distribution to enable -fno-plt for all packages. After hitting this issue with Xorg module loading, they arranged to remove -fno-plt from affected Xorg packages; to show one example, amdgpu:

https://github.com/archlinux/svntogit-packages/commit/4a24b97eff5295648659866ad1724dedcbec06e6
This is just for your information, to provide background how other distributions dealt with this. Thank you for your work. Forgive me if this is a naive question, but have you considered another solution where instead of stripping -fno-plt, eclass checks if -fno-plt is present, and then informs the user that they will likely have to configure Xorg modules manually?
Since I expect Xorg to get replaced by Wayland within the next year or two, is the imminent hardening improvement from -fno-plt for Xorg sufficient to justify going that far? Furthermore, -fno-plt on my system also broke LTO for x11-base/xorg-server, arguably a minor thing but still a regression. I am not that familiar with the Hardened Gentoo project, but my impression was that the old hardened profile flags would eventually get added to the main Gentoo profiles, meaning that at some unknown point in the future, many users may find themselves in this situation with DDX not auto-loading anymore.
Even today people can run most of their desktop on Wayland, and a few applications (such as games) under Xwayland. In this (hopefully increasingly common) scenario, they might prefer to have -fno-plt for xorg-server. My understanding is that this is easily achieved now, but not with the proposed eclass change.
I expect it's only a matter of time before Gentoo follows other major distributions in shipping Xwayland as its own package, which will not get affected by this change (the stripping is conditional on package name).
That does not convince me, and I still think that unconditionally stripping -fno-plt is a poor trade-off here. It seems at this point it's better for me to step aside and stop arguing, but I'll be happy to answer any questions.
A middle ground would be to honor a "custom-cflags" USE flag and strip -fno-plt only when that flag is not set. It may be too much work, though?
As stated before, I'd like to know what benefit does -fno-plt bring to X11, which is known for roughly no effective security between clients connected to the same server at all (or even a nested one if certain extensions are involved). I know it's bad form to devolve security discussions to metaphors but to me the issue at hand feels like trying to secure windows on a house that has the front door barely hanging on one hinge. I can't speak for other people but I'm reluctant to spend extra effort on something that, as far as I can tell, is on the verge of being effectively obsolete for people using updated open source software stacks. If xorg-server were to have custom-cflags IUSE, then I'm sure the eclass will be updated to not enforce lazy binding and by extension allow -fno-plt but adding that is unlikely to be easy. And it's not something I'm interested in pursuing for the above stated reasons.
Hello! I, personally, do not use -fno-plt for hardening but for performance reasons: given the documentation for -fno-plt, the compiler seems to be able to generate more efficient code at the expense of longer load time. For gaming, unfortunately, as things are now, GNOME afaik is the only mainstream DE that has decent Wayland support, but its input lag is horrible (for X and Wayland, but Wayland is worse) and it has frame drops compared to compositor-less mainstream DEs on Xorg like LXDE and LXQt. Although I tried, I am not knowledgeable enough in tiling DEs like Sway to see if Xwayland is good (in terms of frame time consistency and input lag) for high refresh rate gaming when compared to LXDE under Xorg. So I am naively using -fno-plt for Xorg, hoping it makes it better for my gaming sessions.
Of course, it's not a biggie if I can't easily use -fno-plt with Xorg, since it's not even proven that it makes it more efficient and there are other things to look at to try to squeeze more FPS outta things. About Xorg being as secure as an ancient Egyptian lock, I agree.
This is really going beyond this bug report but for 3D workloads I would expect the X11 code path to be little used compared to CPU time spent on OpenGL or Vulkan - both of which for AMDGPU are provided by either the proprietary driver or Mesa's open source radeonsi/radv drivers. Using Link Time Optimization (LTO) would likely provide for even more optimization. Compositing can definitely add latency but kwin_wayland (part of kde-plasma/kwin) starting with the version 5.21 has an experimental low latency mode. In the future, I believe, the developer(s) plan to add a way for full-screen applications to be displayed directly, if user is not interacting with the compositor. If this goes well, other Wayland compositors are likely to follow suit. I don't have a precise measurement tool for this but I estimate that the processing latency of kwin_wayland, after application has received the input, is at least 30 ms or about 2 frames at 60 fps. With any luck, the overhead will stay around 2 frames at higher screen refresh rates. Of course my measurements could be off, since I'm merely increasing audio latency until it seems to roughly line up with display also reacting to my actions.
(In reply to Alexander Monakov from comment #18)
> Niklāvs, Matt,
>
> I noticed a thread on gentoo-dev with the proposed eclass patch. I am
> familiar with the topic (I contributed initial implementation of -fno-plt to
> GCC and have -fno-plt in CFLAGS on my Gentoo installs).
>
> It indeed makes necessary to manually arrange for module loading in the
> correct order as shown in comment #1. I do the same on my amdgpu laptop.
>
> Alpine Linux was in a similar position until musl libc started supported
> deferred symbol binding. They used to ship a config file that would load
> several modules in the right order:
>
> https://git.alpinelinux.org/aports/commit/main/xorg-server?id=e33bb5ae1e54f086fade8a476f744ac745c433e2
>
> Arch Linux was, to my knowledge, the first and so far the only binary
> distribution to enable -fno-plt for all packages. After hitting this issue
> with Xorg module loading, they arranged to remove -fno-plt from affected
> Xorg packages; to show one example, amdgpu:
>
> https://github.com/archlinux/svntogit-packages/commit/4a24b97eff5295648659866ad1724dedcbec06e6

Hi Alexander,

Interesting! I didn't understand that -fno-plt was a meaningful optimization. Can you point me to some information about it, some performance data or something?

Thanks!
(In reply to Adel KARA SLIMANE from comment #16)
> I apologize for the inconvenience, my knowledge was limited and I was
> unaware of a conflicting linker flag -Wl,-z,lazy.

Oh don't worry about it at all. I had no idea either, and now we've both learned something :)
(In reply to Matt Turner from comment #29)
> Interesting! I didn't understand that -fno-plt was a meaningful
> optimization. Can you point me to some information about it, some
> performance data or something?

Please see my email to gcc-patches, in which I presented performance figures for LLVM compiled with/without -fno-plt:

https://gcc.gnu.org/legacy-ml/gcc-patches/2015-05/msg00225.html
I don't want to be a drag talking above their level no less, but to recap:

1) Reading the test description, I'm getting confused. At first it seems that what's being tested is the impact of -fno-plt flag using the same patched compiler but that then the results would completely omit data on actual runtime performance of the resulting binaries beyond time ldd spent on linking them, which is negligible in general.

2) What was the deviation between runs of the same test? As a hobbyist developer I have observed much greater variance in both load and compile times just from the lightest background or foreground process or whether the involved files were already cached by the kernel.

3) Considering that Xorg and its modules are currently set up to always use lazy binding, this would mean not just keeping -fno-plt in flags but also undoing `append-ldflags -Wl,-z,lazy` to use Gentoo's default immediate binding - correct?

All in all, I think actual runtime performance data is required, and, while the -fno-plt flag is definitely interesting and perhaps worth considering for inclusion into Gentoo's default CFLAGS just for the alleged compile time reduction alone, the DDX auto-loading bug is not the right place for that discussion. Perhaps this could be continued in a new bug report, if someone is interested in pushing that for Gentoo?
(In reply to Niklāvs Koļesņikovs from comment #32)
> I don't want to be a drag talking above their level no less, but to recap:

If the discussion helps to understand -fno-plt better, I think there's no problem, and I'll be happy to provide more explanation as needed.

> 1) Reading the test description, I'm getting confused. At first it seems
> that what's being tested is the impact of -fno-plt flag using the same
> patched compiler but that then the results would completely omit data on
> actual runtime performance of the resulting binaries beyond time ldd spent
> on linking them, which is negligible in general.

Please keep in mind that this email is to gcc-patches, implementing -fno-plt for GCC and using LLVM as the evaluation vehicle. GCC is patched to support -fno-plt, LLVM is compiled with the patched GCC, and the performance of the resulting LLVM binaries is evaluated. LLVM itself did not support -fno-plt at the time.

Note that the last test ('-O2 -g tramp3d compilation') is compiling one C++ source file that is quite large. It is one invocation of LLVM, not many.

The part in your sentence starting with 'the results would completely omit...' is unclear to me, sorry.

> 2) What was the deviation between runs of the same test? As a hobbyist
> developer I have observed much greater variance in both load and compile
> times just from the lightest background or foreground process or whether the
> involved files were already cached by the kernel.

I ran those tests when the involved files were indeed cached and the machine was otherwise idle, and I remember the obtained figures to be repeatable. I did not keep exact numbers, but in the last workload (-O2 -g tramp3d compile) I remember the deviation being more than an order of magnitude smaller than the PLT/no-PLT difference.

> 3) Considering that Xorg and its modules are currently set up to always use
> lazy binding, this would mean not just keeping -fno-plt in flags but also
> undoing `append-ldflags -Wl,-z,lazy` to use Gentoo's default immediate
> binding - correct?

With -fno-plt lazy binding is impossible, so it does not matter if the ELF file bears the BIND_NOW flag (i.e. it does not matter if -z lazy or -z now is in effect): symbol resolution will be non-lazy on Glibc.

musl libc, which never implemented lazy binding, has "deferred binding" to support this Xorg use case (abusing lazy binding for loading modules in arbitrary order). It requires modules to be compiled with -Wl,-z,lazy (for the BIND_NOW flag to be absent).

As a result, append-ldflags -Wl,-z,lazy should stay, as otherwise musl libc users (regardless of whether they use -fno-plt) will be forced to manually specify module load order again.

> All in all, I think actual runtime performance data is required, and, while
> the -fno-plt flag is definitely interesting and perhaps worth considering
> for inclusion into Gentoo's default CFLAGS just for the alleged compile time
> reduction alone, the DDX auto-loading bug is not the right place for that
> discussion. Perhaps this could be continued in a new bug report, if someone
> is interested in pushing that for Gentoo?

Here's my take on this:

Whether -fno-plt buys performance or security should not matter for how you deal with this bug. One of the important aspects of Gentoo is that it allows people to test such compiler flags on a wide scale of packages with awesome automation. It would be really sad if every time a package breaks with some obscure compiler flag, the "resolution" to that is stripping that flag in that ebuild. I acknowledge that sometimes stripping flags is the best compromise, but IMO this is not such a case.
The workaround is not difficult (specifying module load order in a config file), and it may be reasonable to say that the user passing -fno-plt should be responsible for setting up Xorg appropriately. The only thing missing is documentation and a reminder in a post-install message for novice users to do that step. Of course I am saying that with zero experience as Gentoo maintainer, and Matt's perspective may be different.
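The "inform instead of strip" alternative could look roughly like the sketch below. It is plain shell with hypothetical function names (`has_no_plt`, `warn_if_no_plt`) and hypothetical message wording; a real ebuild would more likely use Portage's `ewarn` in a post-install phase. It assumes flags are separated by single spaces.

```shell
#!/bin/sh
# Succeed when the flag string contains the exact word "-fno-plt".
# Assumes space-separated flags, as is typical for CFLAGS in make.conf.
has_no_plt() {
    case " $1 " in
        *" -fno-plt "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Leave the flags untouched and only print a reminder to stderr.
warn_if_no_plt() {
    if has_no_plt "$1"; then
        echo "warning: -fno-plt disables lazy binding; Xorg drivers may" >&2
        echo "warning: need a Module section loading fb, shadow and glamoregl" >&2
    fi
}
```

Called as `warn_if_no_plt "$CFLAGS"`, it prints nothing when the flag is absent, so the build would proceed unchanged in the common case.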
Niklāvs, what is the story on your side here — in comment #1 you seem to imply you had those lines in your Xorg config. Did you need that because you also compiled with -fno-plt, or for some other reason?
Kinda irrelevant so I did not bother specifying it before but I'm now using Wayland, so I merely looked at my old xorg.conf and confirmed the presence of the workaround there. As for why it's there, it's probably related to bug 661502, though I manually found the missing symbols and added their SOs to the list of modules to load, thinking that was just normal Xorg behavior.
To be clear, -fno-plt lets you claw back some of the load time you lose because of BIND_NOW and non-lazy binding. We force non-lazy binding on hardened profiles. We might one day do it on all profiles, I'm undecided. It's a safe flag to use *IF* you're using RTLD_NOW. It's unsafe otherwise. The patch in the PR forces the two to match.
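To see which binding mode a built module requests, its dynamic section can be inspected. The sketch below filters `readelf -d` output for the immediate-binding markers; the function name `has_bind_now` is hypothetical, and the patterns assume the usual GNU readelf output format. As noted earlier in the thread, the absence of the flag does not by itself show that lazy binding works, since a module built with -fno-plt is resolved eagerly on glibc either way.

```shell
#!/bin/sh
# Read `readelf -d` output on stdin and succeed if the object requests
# immediate binding: a DT_BIND_NOW entry, BIND_NOW in FLAGS, or NOW in FLAGS_1.
has_bind_now() {
    grep -Eq '\(BIND_NOW\)|\(FLAGS\).*BIND_NOW|\(FLAGS_1\).* NOW( |$)'
}
```

A possible use is `readelf -d /usr/lib64/xorg/modules/drivers/amdgpu_drv.so | has_bind_now && echo "immediate binding requested"`.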
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=3cc0092c3e6ce8988912fce260932639e5554d17

commit 3cc0092c3e6ce8988912fce260932639e5554d17
Author:     Niklāvs Koļesņikovs <89q1r14hd@relay.firefox.com>
AuthorDate: 2021-03-28 12:14:45 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-06-18 02:45:14 +0000

    xorg-3.eclass: strip -fno-plt from *FLAGS

    As discussed in #778494, the GCC flag -fno-plt will break lazy binding,
    which appears to still be necessary for Xorg. Stripping the offending
    flag out is the next best solution for reliable user experience on Gentoo.

    Closes: https://github.com/gentoo/gentoo/pull/20166
    Closes: https://bugs.gentoo.org/778494
    Signed-off-by: Niklāvs Koļesņikovs <89q1r14hd@relay.firefox.com>
    Signed-off-by: Sam James <sam@gentoo.org>

 eclass/xorg-3.eclass | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
(In reply to Sam James from comment #36)
> To be clear, -fno-plt lets you claw back some of the load time you lose
> because of BIND_NOW and non-lazy binding. We force non-lazy binding on
> hardened profiles. We might one day do it on all profiles, I'm undecided.
>
> It's a safe flag to use *IF* you're using RTLD_NOW. It's unsafe otherwise.
> The patch in the PR forces the two to match.

... and as far as I can tell, this is really just about making things consistent with the forced lazy binding anyway.

If someone wants to work on improved docs or some way to fix the module ordering (a PR upstream or something), go wild. I don't see possible future improvements as a reason to leave it broken now.