Summary: | llvm-r1.eclass; llvm-r1_pkg_setup doesn't fix lld version | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Matt Jolly <kangie> |
Component: | Eclasses | Assignee: | Michał Górny <mgorny> |
Status: | RESOLVED INVALID | ||
Severity: | normal | CC: | kangie, llvm |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | foo-1.2.3.ebuild |
Description
Matt Jolly
This is the whole purpose of this function. If your system compiler is clang-18, what the eclass does is that it ensures that it *stays that way* while the ebuild in question uses LLVM 17, so that the whole system is compiled consistently. The eclass was never meant to *override* the compiler.

I'm not sure I follow. The eclass selects the llvm 17 slot when 18 is also installed. I get this, it's in the eclass docs and stable vs testing keywords.

When pkg_setup is using llvm-r1_pkg_setup, `llvm_fix_clang_version` is called and it _amends CC and friends_ if they're set, but using whatever is first in PATH. _Then_ we prefix the eclass selected impl higher in path than the other llvm impls, which means that if CC was set you get "clang-18", but end up linking using ld.lld from slot 17 and _builds break_.

I don't think I'm being a pedant here but I can't see why this situation is desirable at all. I can't see why if the eclass is managing the llvm/clang that we would want other impls available. This is a recipe for hard to debug issues; ask me how I know.

Just as an example here's one of the issues I encountered when chromium was accidentally building with mixed impl bits and pieces, and what prompted me to switch to llvm-r1 in the first place:

```
ld.lld: error: obj/third_party/protobuf-javascript/protoc-gen-js/js_generator.o: Unknown attribute kind (91) (Producer: 'LLVM18.1.8' Reader: 'LLVM 17.0.6')
```

Well, then obviously the bug is that we're not fixing the LLD version to match.

Look at it the other way around. You're dealing with a bad package that supports only LLVM 15. All your system is built using clang-18. Should that one package suddenly be built with 15? That's just a recipe for failure.

Actually, I can't reproduce. FWICS clang picks ld.lld from its own directory, irrespectively of PATH.
    $ export PATH=/usr/lib/llvm/18/bin:$PATH
    $ ld.lld --version
    LLD 18.1.8 (compatible with GNU linkers)
    $ clang -x c - -Wl,--version </dev/null
    LLD 18.1.8 (compatible with GNU linkers)
    $ clang-19 -x c - -Wl,--version </dev/null
    LLD 19.0.0, compatible with GNU linkers

Chromium, being a Google product, has an NIH way of doing this; GN invokes a wrapper script that invokes the linker. The example I provided may have come from when we were still using llvm.eclass as well (it's been a frantic few days of ebuild tinkering), but it is an example of what can happen when things get out of sync.

I wouldn't stress too much about it; prefixing the PATH will catch this case and I have a workaround in place for chromium where we call llvm-r1_pkg_setup twice.

I still feel like changing the order of operations in `llvm-r1_pkg_setup` achieves the eclass goal: If a user is providing 'CC="clang"' it's probably safe to assume that they don't particularly care about the version and that whichever impl the eclass has selected should be used to make that more specific. This also applies if we need to override a GCC for some reason. If they're providing 'clang-19' we shouldn't touch it, but I can't see that prefixing the path for a selected impl of 17 would make 'clang-19' evaluate differently. I'd expect that if a user set 'clang-19' and the USE_EXPAND was something else that we'd be in an odd situation anyway. I'm not sure what to do about this.

As I see it, the point of the USE_EXPAND is that users can easily manage the specific implementation selection system-wide or per-package, or that they can just ignore it and let profiles and ebuilds select appropriate impls to build their software.

> Look at it the other way around. You're dealing with a bad package that
> supports only LLVM 15. All your system is built using clang-18.
> Should that one package suddenly be built with 15?

Why wouldn't that package continue being built against its supported LLVM implementation as long as it's in-tree?
We provide users with slotted LLVM after all. As a user I would expect the maintainer of a package to set appropriate LLVM_COMPAT for what the package actually supports. If the package is unable to be updated to support an in-tree LLVM (and it can't build with GCC for some reason) we'd have to remove it from the tree, like we would with any stale package.

> That's just a recipe for failure.

Why? I'm sure there's some disconnect here. If a user has specified a non-selected impl globally and we remove it from PATH that would cause issues, but I feel like that already conflicts with the concept of LLVM_COMPAT and eclasses making sure we present a sane and supported environment. Maybe we can do better?

Well, if Google is doing stupid stuff, then Google needs to fix their stupid stuff. Clang behaves correctly here, so it's up to the build system and/or the ebuild to select the right linker.

The eclass was never meant to select *the compiler*, nor were ebuilds ever really meant to do that. Just like there's no magical eclass to force a different version of GCC, you are just supposed to respect CC/CXX and not try to override them because Google thinks they know better.

While I'm not aware of any risk right now (and I think people are paying more attention to this these days), the reason is simple: different versions may produce ABI-incompatible code (think of the days back when GCC produced different ABI based on -std=). Forcing a different compiler version would produce code that's incompatible with the linked libraries. Or, more realistically these days, static libraries with LLVM bytecode: if you force an older compiler than that used to produce the bytecode, things will break.
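The order-of-operations concern raised in this thread can be reproduced in miniature without touching a real toolchain. The following is only a sketch: the stub `clang` scripts, the temporary directory layout, and the `fix_version` helper are all hypothetical stand-ins, and `fix_version` is not the eclass's `llvm_fix_clang_version` implementation. It only shows that resolving an unversioned `clang` against PATH gives a different answer depending on whether the selected slot has been prefixed yet.

```shell
#!/bin/sh
# Simulation only: stub scripts stand in for slotted clang installs.
set -eu

root=$(mktemp -d)
for slot in 17 18; do
    mkdir -p "$root/llvm/$slot/bin"
    # Each stub "clang" simply prints the slot it lives in.
    printf '#!/bin/sh\necho %s\n' "$slot" > "$root/llvm/$slot/bin/clang"
    chmod +x "$root/llvm/$slot/bin/clang"
done

# Hypothetical helper: pin an unversioned "clang" to whatever version is
# first in PATH right now; leave anything already versioned untouched.
fix_version() {
    if [ "$1" = "clang" ]; then
        echo "clang-$(clang)"
    else
        echo "$1"
    fi
}

orig_path=$PATH
selected_slot=17   # assumed slot picked for the package

# Order A: fix the version first, then prefix the selected slot.
PATH="$root/llvm/18/bin:$orig_path"
cc_fix_then_prefix=$(fix_version clang)
PATH="$root/llvm/$selected_slot/bin:$PATH"

# Order B: prefix the selected slot first, then fix the version.
PATH="$root/llvm/18/bin:$orig_path"
PATH="$root/llvm/$selected_slot/bin:$PATH"
cc_prefix_then_fix=$(fix_version clang)

PATH=$orig_path
rm -rf "$root"

echo "fix then prefix: $cc_fix_then_prefix"
echo "prefix then fix: $cc_prefix_then_fix"
echo "explicit version: $(fix_version clang-19)"
```

In this simulation, order A pins `clang-18` while the slot 17 tools end up first in PATH, which is the mixed state described above; order B pins `clang-17`; an explicit `clang-19` is left alone either way.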
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4fa8d71be382cc4342280c654958441a1ef1e9fb

commit 4fa8d71be382cc4342280c654958441a1ef1e9fb
Author:     Matt Jolly <kangie@gentoo.org>
AuthorDate: 2024-11-30 03:35:01 +0000
Commit:     Matt Jolly <kangie@gentoo.org>
CommitDate: 2024-11-30 04:08:42 +0000

    www-client/chromium: do a better job of forcing Clang

    When enabling the Rust eclass, we started directly using
    `llvm-r1_pkg_setup`, assuming that this combination would be
    sufficient, however due to forcing `CC` (etc) to variations on
    `${CHOST}-clang` _before_ calling `llvm-r1_pkg_setup`, these would
    always be forced to the newest version in `PATH` instead of the one
    matching `LLVM_SLOT` due to the eclass fixing the version before
    doing any `PATH` manipulation.

    To ensure a consistent build environment, we will:

    1. Explicitly include `-${LLVM_SLOT}` in `CC`, `CPP`, `CXX`
    2. Set these variables (and `AR` and `NM`) after `llvm-r1_pkg_setup`
       has done its PATH manipulation.

    Bug: https://bugs.gentoo.org/935689
    Signed-off-by: Matt Jolly <kangie@gentoo.org>

 .../chromium/chromium-130.0.6723.116-r1.ebuild    | 20 ++++++++------------
 www-client/chromium/chromium-131.0.6778.85.ebuild | 16 ++++++++--------
 www-client/chromium/chromium-132.0.6834.15.ebuild | 16 ++++++++--------
 www-client/chromium/chromium-133.0.6847.2.ebuild  | 16 ++++++++--------
 4 files changed, 32 insertions(+), 36 deletions(-)
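The two steps listed in the commit message could look roughly like the sketch below. It is not the contents of the chromium ebuilds: `fake_llvm_pkg_setup` is a hypothetical stub that models only the PATH prefixing attributed to `llvm-r1_pkg_setup` in this bug, the `CHOST` and `LLVM_SLOT` values are placeholders so the snippet runs outside Portage, and the `CPP`, `AR` and `NM` values are assumptions.

```shell
#!/bin/sh
# Stand-ins so the sketch runs outside Portage; in an ebuild these would
# come from the environment and from llvm-r1.eclass.
CHOST=x86_64-pc-linux-gnu
LLVM_SLOT=17

# Hypothetical stub for llvm-r1_pkg_setup: only the PATH prefixing is modelled.
fake_llvm_pkg_setup() {
    PATH="/usr/lib/llvm/${LLVM_SLOT}/bin:${PATH}"
}

pkg_setup() {
    # Step 2 of the commit message: let the PATH manipulation happen first...
    fake_llvm_pkg_setup

    # ...then set the tools. Step 1: name the slot explicitly so nothing
    # has to guess a version from PATH.
    CC="${CHOST}-clang-${LLVM_SLOT}"
    CXX="${CHOST}-clang++-${LLVM_SLOT}"
    CPP="${CXX} -E"   # assumed value
    AR=llvm-ar        # assumed value
    NM=llvm-nm        # assumed value
    export CC CXX CPP AR NM
}

pkg_setup
echo "CC=${CC} CXX=${CXX} CPP=${CPP} AR=${AR} NM=${NM}"
```

Because the compiler variables already carry the slot suffix, a later pass that only rewrites unversioned names has nothing to change, and the unversioned `llvm-ar` and `llvm-nm` resolve through the already-prefixed PATH.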