I migrated my prefix according to https://www.gentoo.org/support/news-items/2023-01-28-rap-prefix-sysroot.html

Actually, I just started out by updating the system using `emerge --ask --verbose --update --newuse --backtrack=1000 -b --oneshot --keep-going --oneshot -j 8 @world`, and things broke with the glibc upgrade. So I checked the news item, used its last suggestion to manually prefixify the GNU ld script, and then proceeded with the migration without any issue.

However, after the migration succeeded, some compilations using the bfd linker in user space went wrong, and I no longer understand the linker's behavior. Simply compiling a program that uses cblas functions:

```c
#include <stdio.h>
#include <cblas.h>

int main(void) {
    const double x[3] = {1, 2, 3};
    const double y[3] = {1, -2, 3};
    /* dot product of x and y */
    double result = cblas_ddot(3, x, 1, y, 1);
    printf("%lf\n", result);
    return 0;
}
```

Running `gcc -fuse-ld=bfd foo.c -lcblas`, I got

```
/opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld.bfd: warning: libgfortran.so.5, needed by /opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/libcblas.so, not found (try using -rpath or -rpath-link)
/opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld.bfd: warning: libquadmath.so.0, needed by /opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/libcblas.so, not found (try using -rpath or -rpath-link)
/opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld.bfd: /opt/gentoo/usr/lib64/libblas.so.3: undefined reference to `_gfortran_stop_string@GFORTRAN_8'
/opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld.bfd: /opt/gentoo/usr/lib64/libblas.so.3: undefined reference to `_gfortran_st_write@GFORTRAN_8'
/opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld.bfd: /opt/gentoo/usr/lib64/libblas.so.3: undefined reference to `_gfortran_string_len_trim@GFORTRAN_8'
/opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld.bfd: /opt/gentoo/usr/lib64/libblas.so.3: undefined reference to `_gfortran_transfer_character_write@GFORTRAN_8'
/opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld.bfd: /opt/gentoo/usr/lib64/libblas.so.3: undefined reference to `_gfortran_transfer_integer_write@GFORTRAN_8'
/opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld.bfd: /opt/gentoo/usr/lib64/libblas.so.3: undefined reference to `_gfortran_st_write_done@GFORTRAN_8'
collect2: error: ld returned 1 exit status
```

Using mold, lld, or gold there's no issue. If I perform `ln -sn "${EPREFIX}" "${EPREFIX}${EPREFIX}"`, then bfd also links successfully.

Reproducible: Always

Steps to Reproduce:
1. Migrate a RAP
2. emerge virtual/blas
3. compile a simple program with `-lcblas`
Created attachment 849437: emerge --info output
I don't know what the cause is yet, but I have reproduced it. I think it affects the libraries installed with gcc, but seemingly not libstdc++.
Aha!

% strace -f -e trace=file gcc -fuse-ld=bfd foo.c -lcblas 2>&1 | fgrep ld.so.conf
[pid 1436] openat(AT_FDCWD, "/mnt/prefix/mnt/prefix/usr/mnt/prefix/etc/ld.so.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 1436] openat(AT_FDCWD, "/mnt/prefix/mnt/prefix/etc/ld.so.conf", O_RDONLY) = -1 ENOENT (No such file or directory)

This is similar to a cross problem I saw recently. I'll keep looking.
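The doubled path in the trace is what one would expect if a tool prepends its sysroot to a path that already carries the prefix. A minimal sketch of that arithmetic, assuming a hypothetical helper `apply_sysroot` (an illustration only, not binutils code):

```python
def apply_sysroot(sysroot: str, path: str) -> str:
    """Prepend a sysroot to an absolute path (hypothetical helper)."""
    return sysroot.rstrip("/") + "/" + path.lstrip("/")


prefix = "/mnt/prefix"

# An unprefixed path gains the prefix exactly once.
print(apply_sysroot(prefix, "/etc/ld.so.conf"))

# A path that was already prefixed elsewhere ends up double-prefixed.
print(apply_sysroot(prefix, prefix + "/etc/ld.so.conf"))
```

The second call yields the same `/mnt/prefix/mnt/prefix/etc/ld.so.conf` shape seen in the strace output, which is why a `${EPREFIX}${EPREFIX}` symlink happens to hide the problem.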
I think the problem is in binutils. I'll look some more tomorrow.
I'm seeing a similar issue when bootstrapping a new prefix.
Created attachment 849582: tgbugs binutils build.log
Created attachment 849584: tgbugs gentoo/tmp/bin/emerge --infos
I've made some progress, but it's not straightforward.

The first problem is this hack in the prefix profile. It should be removed.

https://gitweb.gentoo.org/repo/gentoo.git/tree/profiles/features/prefix/standalone/profile.bashrc?id=edf7231cee8509dcc346c3c21891ccb6fbd69602#n18

The second problem is that the paths in files like /mnt/prefix/etc/ld.so.conf.d/05gcc-x86_64-pc-linux-gnu.conf now need to be unprefixed. This fixes things when building, but then ldconfig picks up the wrong libraries when generating the cache.

Before:

> % ldconfig; ldconfig -p | fgrep fortran
> libgfortran.so.5 (libc6,x86-64) => /mnt/prefix/usr/lib/gcc/x86_64-pc-linux-gnu/12/libgfortran.so.5
> libgfortran.so (libc6,x86-64) => /mnt/prefix/usr/lib/gcc/x86_64-pc-linux-gnu/12/libgfortran.so

After:

> % ldconfig; ldconfig -p | fgrep fortran
> libgfortran.so.5 (libc6,x86-64) => /usr/lib/gcc/x86_64-pc-linux-gnu/12/libgfortran.so.5
> libgfortran.so (libc6,x86-64) => /usr/lib/gcc/x86_64-pc-linux-gnu/12/libgfortran.so
I've been thinking about whether the paths in these ld.so.conf files should be prefixed or not, and hence, which direction we should go in fixing this. I believe they should be unprefixed, otherwise the files will never make sense when cross-compiling. That presumably means a change to glibc, but glibc doesn't have any concept of a sysroot.
(In reply to James Le Cuirot from comment #9)
> I've been thinking about whether the paths in these ld.so.conf files should
> be prefixed or not, and hence, which direction should go in fixing this. I
> believe they should be unprefixed, otherwise the files will never make sense
> when cross-compiling. That presumably means a change to glibc, but that
> doesn't have any concept of a sysroot.

The fundamental inconsistency comes from the runtime loading (glibc) and build-time linking (binutils), which coincidentally refer to the same ld.so.conf files. They have to interpret the config file in the same way in order for a system to function.
(In reply to Benda Xu from comment #10)
> (In reply to James Le Cuirot from comment #9)
> > I've been thinking about whether the paths in these ld.so.conf files should
> > be prefixed or not, and hence, which direction should go in fixing this. I
> > believe they should be unprefixed, otherwise the files will never make sense
> > when cross-compiling. That presumably means a change to glibc, but that
> > doesn't have any concept of a sysroot.
>
> The fundamental inconsistency comes from the runtime loading (glibc) and
> build-time linking (binutils), which coincidentally refer to the same
> ld.so.conf files. They have to interpret the config file in the same why in
> order for a system to function.

Note that the above is only true for Prefix, not cross-compiling. In the latter, runtime loading is not prefixed but build-time linking is prefixed.
(In reply to Benda Xu from comment #11)
> (In reply to Benda Xu from comment #10)
> > (In reply to James Le Cuirot from comment #9)
> > > I've been thinking about whether the paths in these ld.so.conf files should
> > > be prefixed or not, and hence, which direction should go in fixing this. I
> > > believe they should be unprefixed, otherwise the files will never make sense
> > > when cross-compiling. That presumably means a change to glibc, but that
> > > doesn't have any concept of a sysroot.
> >
> > The fundamental inconsistency comes from the runtime loading (glibc) and
> > build-time linking (binutils), which coincidentally refer to the same
> > ld.so.conf files. They have to interpret the config file in the same why in
> > order for a system to function.
>
> Note that the above is only true for Prefix, not cross-compiling. In the
> latter, runtime loading is not prefixed and but build-time linking is
> prefixed.

If we want both cross-compiling and Prefix to work, as this round of patches is meant to achieve, there are 3 levels:

1. Cross-compile a vanilla Gentoo from a Prefix (BROOT)
   glibc: vanilla
   binutils: search for libraries from ESYSROOT=BROOT/usr/CHOST
2. Cross-compile a Prefix (EPREFIX) from a vanilla Gentoo
   glibc: prefixed with EPREFIX
   binutils: search for libraries from ESYSROOT=/usr/CHOST/EPREFIX, because BROOT=/
3. Cross-compile a Prefix_1 (EPREFIX) from a Prefix_0 (BROOT)
   glibc: prefixed with EPREFIX
   binutils: link from ESYSROOT=BROOT/usr/CHOST/EPREFIX

If ld.so.conf is not prefixed, and glibc is hacked to automatically inject EPREFIX during runtime loading, the solution would be clean. But we lose the ability to load 3rd party libraries outside the EPREFIX. One way to fix this is to introduce a special grammar (like binutils) of "=/usr/" in ld.so.conf to mean "Prefix me!".

Another way is to decouple the file ld.so.conf into runtime and build-time (automatically generated from the former) versions for glibc and binutils separately.
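To make the "=/usr/" proposal concrete, here is a rough sketch of how a consumer of ld.so.conf might expand such entries. This is purely illustrative: `resolve_entry` is a hypothetical name, `/opt/vendor/lib` is a made-up third-party path, and nothing in the thread suggests glibc supports this syntax today.

```python
def resolve_entry(entry: str, eprefix: str) -> str:
    """Expand a leading '=' to the prefix; leave other entries untouched."""
    if entry.startswith("="):
        return eprefix.rstrip("/") + "/" + entry[1:].lstrip("/")
    return entry


eprefix = "/opt/gentoo"
for line in ["=/usr/lib64", "/opt/vendor/lib"]:
    print(line, "->", resolve_entry(line, eprefix))
```

Under this scheme, entries marked with "=" follow the prefix, while plain absolute paths keep pointing outside it, which is the ability the unmarked approach would lose.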
Continuing down the path of decoupling ld.so.conf: it needs the least upstream change. In Gentoo, ld.so.conf is completely controlled by env-update and a few eselect modules, so the development can happen entirely within Gentoo.
Hi, I would like to see if there is any workaround for this issue to finish a bootstrap of Prefix on linux while it is open? Thanks!
(In reply to sinxccc from comment #14)
> Hi, I would like to see if there is any workaround for this issue to finish
> a bootstrap of Prefix on linux while it is open? Thanks!

Try

```bash
mkdir -p "${EPREFIX}${EPREFIX%/*}"
ln -sn "${EPREFIX}" "${EPREFIX}${EPREFIX}"
```
(In reply to Benda Xu from comment #12)
>
> If ld.so.conf is not prefixed, and glibc is hacked to automatically inject
> EPREFIX during runtime loading, the solution would be clean. But we lose
> the ability to load 3rd party libraries outside the EPREFIX. One way to fix
> this is to introduction a special grammar (like binutils) of "=/usr/" in
> ld.so.conf to mean "Prefix me!".

I had this idea too. I know it's a bit more work, but it's my preferred solution. I'll see if I can knock something up over the weekend. I believe such changes would be upstreamable. It's not like using ld.so.conf for both runtime and build-time is a Gentoo-specific thing.

> Another way is to decouple the file ld.so.conf into runtime and build-time
> (automatically generated from the former) versions for glibc and binutils
> separately.

I think that would still need some patching. How else would we tell ld to look in a different place?
(In reply to James Le Cuirot from comment #16)
> (In reply to Benda Xu from comment #12)
> >
> > If ld.so.conf is not prefixed, and glibc is hacked to automatically inject
> > EPREFIX during runtime loading, the solution would be clean. But we lose
> > the ability to load 3rd party libraries outside the EPREFIX. One way to fix
> > this is to introduction a special grammar (like binutils) of "=/usr/" in
> > ld.so.conf to mean "Prefix me!".
>
> I had this idea too. I know it's a bit more work, but it's my preferred
> solution. I'll see if I can knock something up over the weekend. I believe
> such changes would be upstreamable. It's not like using ld.so.conf for both
> runtime and build-time is a Gentoo-specific thing.

I can imagine that as an "allow glibc to run on CBUILD just like CHOST" feature. That would be far more far-reaching, as it amounts to allowing glibc to be relocatable.

> > Another way is to decouple the file ld.so.conf into runtime and build-time
> > (automatically generated from the former) versions for glibc and binutils
> > separately.
>
> I think that would still need some patching. How else would we tell ld to
> look in a different place?

It has (surprisingly) been a feature of ld since 2006 [1]. Now it is handled by ld/ldelf.c [2].

Universally, ld first looks for /usr/etc/ld.so.conf and falls back to /etc/ld.so.conf. So writing a separate file under /usr/etc/ (which could be symlinked from /etc/ld/) will just work.

1. https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=dfcffada0bf3f6dfd1ba336fb1647694c55d4f22
2. https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/ldelf.c;h=2e27cf48a816dc78bd76d2f0185a601d2edfb392;hb=ef8f08ca13f6c111cc549a3e13be5c5e2d95ca82#l910
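The lookup order described above (try /usr/etc/ld.so.conf, then fall back to /etc/ld.so.conf) can be modelled roughly as follows. This is a sketch of the described behaviour only; `find_ld_so_conf` is a hypothetical name and the code is not taken from ld/ldelf.c.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_ld_so_conf(root: Path) -> Optional[Path]:
    """Return the first existing candidate, preferring usr/etc over etc."""
    for relative in ("usr/etc/ld.so.conf", "etc/ld.so.conf"):
        candidate = root / relative
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "etc").mkdir()
    (root / "etc/ld.so.conf").write_text("/usr/lib64\n")
    # Only etc/ld.so.conf exists, so it is used.
    print(find_ld_so_conf(root).relative_to(root))

    (root / "usr/etc").mkdir(parents=True)
    (root / "usr/etc/ld.so.conf").write_text("/usr/lib64\n")
    # Once usr/etc/ld.so.conf exists, it takes precedence.
    print(find_ld_so_conf(root).relative_to(root))
```

Because the first hit wins, a build-time-only file can shadow the runtime one without touching what ldconfig and the runtime loader read.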
(In reply to Benda Xu from comment #17)
>
> I can imagine that is an "allow glibc to run on CBUILD just like CHOST"
> feature. That will be far more overreaching to allow glibc to be
> relocatable.

Yes, I think I see your point. Although you can use ld.so.conf for both simultaneously, that's probably not upstream's intention.

> It (surprisingly) has long been a feature of ld since 2006 [1]. Now it is
> handled by ld/ldelf.c[2].
>
> Universally, ld first looks for /usr/etc/ld.so.conf and falls back to
> /etc/ld.so.conf. So writing a seperate /usr/etc/ (could be symlinked from
> /etc/ld/) will just work.

I had noticed that, but /usr/etc seems so weird, I thought it was a bug more than a feature. XD I guess we could go this way if you're happy with it. Maybe it would only need to be applied to prefix.
(In reply to Benda Xu from comment #13)
> Continuing down the path of decoupling ld.so.conf, it needs the least
> upstream change: in Gentoo ld.so.conf is completely controlled by env-update
> and (few) eselect. The development can happen entirely in Gentoo ourselves.

For the record, /etc/ld.so.conf is written by env-update, but /etc/ld.so.conf.d/05gcc-${CHOST}.conf is written by gcc-config. The other files in there typically come from eselect. Unfortunately, I think we'll have to fix this on a case by case basis, although such files are rare. Anyone who manually adds custom entries there is probably playing with fire anyway.
Progress report. Before rushing in and fixing this, I wanted to check how it would (or wouldn't) affect musl and other linkers. musl doesn't use ld.so.conf at all, but it would still be used by bfd when building. Other linkers, such as lld and mold, do not use ld.so.conf either. I haven't checked gold yet, but I suspect it shares code with bfd. It's interesting that other linkers do not use ld.so.conf. This means that eselect-blas does not change which blas library these linkers use at build time, only which is used at runtime. The linkers would always use sci-libs/lapack.
(In reply to James Le Cuirot from comment #20)
> Progress report.
>
> Before rushing in and fixing this, I wanted to check how it would (or
> wouldn't) affect musl and other linkers. musl doesn't use ld.so.conf at all,
> but it would still be used by bfd when building. Other linkers, such as lld
> and mold, do not use ld.so.conf either. I haven't checked gold yet, but I
> suspect it shares code with bfd.

(In reply to Yiyang Wu from comment #0)
> Using mold, lld, gold there's no issue.

In my test gold works normally.

>
> It's interesting that other linkers do not use ld.so.conf. This means that
> eselect-blas does not change which blas library these linkers use at build
> time, only which is used at runtime. The linkers would always use
> sci-libs/lapack.

In my case `gcc -fuse-ld={mold,gold,lld} -lcblas` links to the correct cblas implementation.
(In reply to Yiyang Wu from comment #21)
>
> In my test gold works normally.

You're right, I can confirm that gold works. Strange that it behaves differently, despite being part of binutils.

> > It's interesting that other linkers do not use ld.so.conf. This means that
> > eselect-blas does not change which blas library these linkers use at build
> > time, only which is used at runtime. The linkers would always use
> > sci-libs/lapack.
>
> In my case `gcc -fuse-ld={mold,gold,ldd} -lcblas` links to the correct cblas
> implementation.

Are you sure about that? ldd will show the right one, as chosen at runtime, but my own testing shows that it always uses /usr/lib/libcblas.so at build time.

$ strace -f -e trace=file gcc -fuse-ld=bfd foo.c -lcblas -o blas 2>&1 | egrep "libc?blas"
[pid 21099] openat(AT_FDCWD, "/usr/lib/gcc/x86_64-pc-linux-gnu/12/libcblas.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 21099] openat(AT_FDCWD, "/usr/lib/gcc/x86_64-pc-linux-gnu/12/libcblas.a", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 21099] openat(AT_FDCWD, "/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/libcblas.so", O_RDONLY) = 8
[pid 21099] openat(AT_FDCWD, "/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/libcblas.so", O_RDONLY) = 9
[pid 21099] openat(AT_FDCWD, "/usr/lib/gcc/x86_64-pc-linux-gnu/12/libblas.so.3", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 21099] openat(AT_FDCWD, "/usr/lib/gcc/x86_64-pc-linux-gnu/12/32/libblas.so.3", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 21099] openat(AT_FDCWD, "/usr/lib64/blas/blis/libblas.so.3", O_RDONLY) = 20

$ ldd blas | fgrep blas
libcblas.so.3 => /usr/lib64/blas/blis/libcblas.so.3 (0x00007ff3523df000)

$ strace -f -e trace=file gcc -fuse-ld=lld foo.c -lcblas -o blas 2>&1 | egrep "libc?blas"
[pid 21114] access("/usr/lib/gcc/x86_64-pc-linux-gnu/12/libcblas.so", F_OK) = -1 ENOENT (No such file or directory)
[pid 21114] access("/usr/lib/gcc/x86_64-pc-linux-gnu/12/libcblas.a", F_OK) = -1 ENOENT (No such file or directory)
[pid 21114] access("/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/libcblas.so", F_OK) = 0
[pid 21114] openat(AT_FDCWD, "/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/libcblas.so", O_RDONLY|O_CLOEXEC) = 3

$ ldd blas | fgrep blas
libcblas.so.3 => /usr/lib64/blas/blis/libcblas.so.3 (0x00007fdb702f5000)
Just to add to that point, if I uninstall lapack, but keep blis, this happens:

$ gcc -I/usr/include/blis -fuse-ld=lld foo.c -lcblas -o blas
ld.lld: error: unable to find library -lcblas
collect2: error: ld returned 1 exit status

I now realise that eselect-blas is only supposed to affect runtime rather than build time though. That's why sci-libs/lapack appears twice in the dependencies of virtual/blas, the first being unconditional.
(In reply to James Le Cuirot from comment #22)
> (In reply to Yiyang Wu from comment #21)
> > In my case `gcc -fuse-ld={mold,gold,ldd} -lcblas` links to the correct cblas
> > implementation.
>
> Are you sure about that? ldd will show the right one, as chosen at runtime,
> but my own testing shows that it always uses /usr/lib/libcblas.so at build
> time.

You're right, ldd shows the right one, but at build time the other linkers do not look at the desired libcblas.
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=80d8703c52f23ca672a0e690f9daa4aff6520ee1 commit 80d8703c52f23ca672a0e690f9daa4aff6520ee1 Author: James Le Cuirot <chewi@gentoo.org> AuthorDate: 2023-02-09 23:04:15 +0000 Commit: James Le Cuirot <chewi@gentoo.org> CommitDate: 2023-02-11 22:27:24 +0000 profiles: Don't prefixify /etc/ld.so.conf path in binutils Now that the compiler's sysroot is being respected, prefixifying the path to /etc/ld.so.conf results in a double prefix. Bug: https://bugs.gentoo.org/892549 Signed-off-by: James Le Cuirot <chewi@gentoo.org> profiles/features/prefix/standalone/profile.bashrc | 10 ---------- 1 file changed, 10 deletions(-)
Note that the above does not fix the issue. It only deals with part of it, although we may disable the bfd ld.so.conf feature altogether in the end. See https://github.com/gentoo/binutils-gdb/pull/4 for the real fix, which we can hopefully land soon.
(In reply to James Le Cuirot from comment #23)
> Just to add to that point, if I uninstall lapack, but keep blis, this
> happens:
>
> $ gcc -I/usr/include/blis -fuse-ld=lld foo.c -lcblas -o blas
> ld.lld: error: unable to find library -lcblas
> collect2: error: ld returned 1 exit status
>
> I now realise that eselect-blas is only supposed to affect runtime rather
> than build time though. That's why sci-libs/lapack appears twice in the
> dependencies of virtual/blas, the first being unconditional.

Exactly. The ABIs are interchangeable.
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=07d598347c2a311c91eacd4303e0517cf0a127c3 commit 07d598347c2a311c91eacd4303e0517cf0a127c3 Author: Sam James <sam@gentoo.org> AuthorDate: 2023-02-22 22:22:46 +0000 Commit: Sam James <sam@gentoo.org> CommitDate: 2023-02-22 22:26:22 +0000 sys-devel/binutils: apply linker search path fixes from Chewi for prefix Quoting Chewi on the PR for posterity: """ The first of these changes fixes two related issues with prefixed and crossdev environments. The prefix issue is detailed in Gentoo bug #892549. The crossdev issue can be reproduced by trying something like: USE="-python icu" aarch64-unknown-linux-gnu-emerge libxml2 The second of these changes is not essential, but it does make bfd's behaviour in this area more consistent with the other linkers, which have not experienced these issues at all. I'm not sure what upstream will make of these changes, particularly the second one, but it is interesting that even gold does not behave the same way as bfd here. Perhaps we can give them some exposure in Gentoo for a while before seeing what they think. The second change would not be submitted upstream as-is because fully removing the ld.so.conf feature is a much bigger diff. """ This patch is, for now, only applied for prefix. It should be safe on other systems but the issue is more pressing on prefix given a recent migration. Bug: https://bugs.gentoo.org/892549 Thanks-to: James Le Cuirot <chewi@gentoo.org> Signed-off-by: Sam James <sam@gentoo.org> sys-devel/binutils/binutils-2.40-r2.ebuild | 503 +++++++++++++++++++++ sys-devel/binutils/binutils-9999.ebuild | 12 +- .../files/binutils-2.40-linker-search-path.patch | 74 +++ 3 files changed, 584 insertions(+), 5 deletions(-)
If you're hitting this issue, please try emerge --sync in a few hours, then emerge -v1 sys-devel/binutils.
Thanks, Sam, this had been weighing on my mind. If we get good feedback here, I'll update the news item.
(In reply to Sam James from comment #29)
> If you're hitting this issue, please try emerge --sync in a few hours, then
> emerge -v1 sys-devel/binutils.

And don't forget to run `eselect binutils set x86_64-pc-linux-gnu-2.40` if your prefix still defaults to an older version (as one of mine did). I was stuck for quite a while :( before realizing I was still using binutils 2.39.
Now with 07d598347c2a311c91eacd4303e0517cf0a127c3, ld.bfd is able to link my example without the double-prefix symlink hack.

But I think this issue is also affecting gdb, which is part of the binutils-gdb project. Using sys-devel/gdb-13.1::gentoo, without the double-prefix symlink hack, I ran

```
gcc -O0 -ggdb foo.c -o foo -lcblas
gdb foo
GNU gdb (Gentoo 13.1 vanilla) 13.1
.......
(gdb) run
Starting program: /tmp/foo
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
warning: Could not load shared library symbols for 9 libraries, e.g. /opt/gentoo/usr/lib64/blas/openblas/libcblas.so.3.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
```

I need to use either the double-prefix symlink hack or `set sysroot /` in gdb to mitigate this issue.
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/proj/prefix.git/commit/?id=f4bae8f7128a0a7977d4cf765f21301a2275f32e commit f4bae8f7128a0a7977d4cf765f21301a2275f32e Author: Sam James <sam@gentoo.org> AuthorDate: 2023-03-01 00:51:07 +0000 Commit: Sam James <sam@gentoo.org> CommitDate: 2023-03-01 00:51:07 +0000 sys-devel/binutils: add 2.40(-r2) Bug: https://bugs.gentoo.org/895240 Bug: https://bugs.gentoo.org/892549 Signed-off-by: Sam James <sam@gentoo.org> sys-devel/binutils/Manifest | 2 + sys-devel/binutils/binutils-2.40-r2.ebuild | 509 +++++++++++++++++++++ .../files/binutils-2.40-linker-search-path.patch | 74 +++ 3 files changed, 585 insertions(+)
Let's deal with the gdb issue in bug 896008.
This change has brought another issue. I attached a tarball containing files for reproducing it. Running make with the default binutils (using the bfd linker) results in:

warning: libfoo.so, needed by libbar/libbar.so, not found (try using -rpath or -rpath-link)
libbar/libbar.so: undefined reference to `funca(int)'

Specifying CXXFLAGS=-fuse-ld=gold does not have the problem.

So, when exe uses a symbol from libbar.so but does not depend on libfoo.so, and libbar.so depends on libfoo.so, ld.bfd tries to find libfoo.so when linking exe. If libfoo.so is not in a standard path and -L is not explicitly specified, it cannot be found, and ld.bfd throws `libbar.so: undefined reference to <symbol in libfoo used by libbar>` when linking exe.

This issue was very rare in the past: because the path of libfoo can be found in ld.so.conf, LD_LIBRARY_PATH, or even the rpath of libbar, ld.bfd would ultimately find the location of libfoo. But now we have patched out ldelf_check_ld_so_conf, which is one step of ldelf_handle_dt_needed in ld/ldelf.c, so ld.bfd loses the ability to locate libraries listed in ld.so.conf.

I guess the fundamental cause is that, in ldelf_handle_dt_needed, ld.bfd tracks down the DT_NEEDED entries of all explicitly linked shared objects, like the runtime linker does [1]. This was not an issue before we changed anything, as long as libfoo.so was in the ld.so.conf search path (because it is needed at runtime); but after the binutils patch [2], problems occur when linking to libraries that are not in standard locations but are listed in ld.so.conf. Other linkers do not have the problem, since they do not track down the DT_NEEDED entries of libbar.so at all.

So we have this remaining issue, and it blocks some packages, for example some ROCm packages that link to libamd_comgr. libamd_comgr links to libLLVM, which is not in a standard path but in /usr/lib/llvm/SLOT/lib64, so when linking to libamd_comgr, ld cannot find libLLVM and emits undefined references for the symbols in libLLVM used by libamd_comgr.
[1] http://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/ldelf.c;h=f9a6819366f1ac634103bedd32844ed1868591be;hb=HEAD#l1026
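The difference in behaviour described in this comment can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `missing_libraries` and its arguments are hypothetical, libraries are looked up by directory name alone, and no real linker logic is reproduced.

```python
def missing_libraries(direct, needed, location, search_dirs, follow_needed):
    """Return libraries a toy linker searches for but cannot locate."""
    pending = list(direct)
    seen = set()
    missing = []
    while pending:
        lib = pending.pop()
        if lib in seen:
            continue
        seen.add(lib)
        if location[lib] not in search_dirs:
            missing.append(lib)
            continue
        if follow_needed:
            # bfd-like: also look up the dependencies of each shared object.
            pending.extend(needed.get(lib, []))
    return sorted(missing)


needed = {"libbar.so": ["libfoo.so"]}   # libbar depends on libfoo
location = {"libbar.so": "libbar", "libfoo.so": "libfoo"}
direct = ["libbar.so"]                  # exe links only against libbar

# bfd-like traversal, libfoo's directory unknown to the linker
print(missing_libraries(direct, needed, location, {"libbar"}, True))
# bfd-like traversal, libfoo's directory known (e.g. via ld.so.conf)
print(missing_libraries(direct, needed, location, {"libbar", "libfoo"}, True))
# no traversal of indirect dependencies, as reported for the other linkers
print(missing_libraries(direct, needed, location, {"libbar"}, False))
```

Only the first case reports libfoo.so as missing, which matches the report: the failure appears when indirect dependencies are followed and the directory that ld.so.conf used to supply is no longer consulted.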
Let's reopen this bug until the said regression is understood.
(In reply to Yiyang Wu from comment #35)
> but after the binutils patch [2],

[2] https://gitweb.gentoo.org/repo/proj/prefix.git/commit/?id=f4bae8f7128a0a7977d4cf765f21301a2275f32e
After some studying, my opinion is:

The other linkers do not have to behave exactly the same as the bfd linker [1]. Currently, Gentoo uses the bfd linker to build packages by default, so we need to make sure it is functional.

Previously we decided to patch ld.bfd so it behaves more like the others. It turns out that if we want to go this way, we have to do more: ld.bfd relies on ld.so.conf to find libraries needed at runtime. If this part is patched out, maybe we also need to patch out the entire logic that checks DT_NEEDED entries in ld.bfd. Otherwise, https://bugs.gentoo.org/892549#c35 occurs.

As I understand it, the RAP migration turns RAP into a cross-compile-for-itself setup, and then ESYSROOT=SYSROOT/EPREFIX becomes double prefixed. Consider the general case of cross building: libraries (in DEPEND) should be found in ESYSROOT, and then the output under ESYSROOT should be copied to EPREFIX on the target system. So if standalone RAP is really cross-compiling itself, then things should be compiled and installed to /prefix/prefix, and then *copied to /prefix on the target system* (which is actually the same machine). From this point of view, the existence of the double prefix symlink becomes a bit more reasonable: the symlink serves as the "copy" process.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=10238
I haven't had a chance to really look at this yet, but I think the case you describe is just the kind we were trying to fix under prefix. In that case though, it was failing to find libstdc++. In this case, I think it's some other library. This begs the question of how do the other linkers find this library if it's not via /etc/ld.so.conf. I'll have a closer look later.
(In reply to James Le Cuirot from comment #39)
> This begs the question of how do the other linkers find this
> library if it's not via /etc/ld.so.conf. I'll have a closer look later.

The other linkers simply don't need to find this library, because the executable does not use any symbol from it (libfoo). It is ld.bfd that does extra work when linking exe against libbar: it searches for the link dependencies not of exe, but of libbar.
Note that the ld.so.conf reading code is correct and hence should never have been removed. By now I've swapped out the context on what the proper fix was for the problem that led to its removal, so I apologize for the lack of rationale. I seem to remember a --with-sysroot!=/ for something whose sysroot was very much /, though.
(In reply to Yiyang Wu from comment #40)
> Other linker simply don't need to find this library, because the executable
> does not use any symbol from this library (libfoo). It is ld.bfd doing extra
> work when linking exe against libbar: searching for link dependency not of
> exe, but of libbar.

You're right, I had observed but forgotten that the other linkers do not do this extra work. A symlink would serve as a workaround, but it seemed like a hack at the time. I'll consider it if we can't find a better way forwards. We could address this as necessary in any packages, but I get the sense you're wanting this to work in general, not just for packages. I'll see how feasible it would be to stub out the rest of the behaviour. Having bfd behave even more like the other linkers does not seem like a bad goal.
Progress so far. It was fairly easy to stub out the part where it traverses through sub-dependent libraries, but then it complains of unresolved symbols later. Still investigating.
Hi James,

This bug is breaking all the default installations of Prefix. I expect that we can fix it in a timely manner.

Yours,
Benda
Sorry, it's been hard to put the time in, although I didn't think it was happening frequently. Which package are you seeing it with? I'll give it another look today.
(In reply to James Le Cuirot from comment #45)
> Sorry, it's been hard to put the time in, although I didn't think it was
> happening frequently. Which package are you seeing it with? I'll give it
> another look today.

My observation of the issue is at https://bugs.gentoo.org/892549#c35:

> I attached a tarball containing files for reproducing.

Sorry, I forgot the attachment. I will attach it right away.

As far as I know, in ::gentoo, sci-libs/caffe2[cuda] suffers. Any package that links to libamd_comgr, provided by dev-libs/rocm-comgr::gentoo, suffers as well. I also suppose that any second-order reverse dependency of libs in non-standard locations like <prefix>/usr/lib64, including llvm, breaks.

> It (surprisingly) has long been a feature of ld since 2006. Now it is handled by ld/ldelf.c.

Also, since https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=07d598347c2a311c91eacd4303e0517cf0a127c3 removes the feature, as I understand it, some odd software that depends on this feature breaks. This is not a big issue in ::gentoo, I guess, but overlays have such situations, like https://bugs.gentoo.org/553382, or sci-libs/openfoam[1], which I personally maintained. I would say those packages should not rely on the feature, but this is the current status.
Created attachment 871235: bug_MWE
The tarball really helps, thank you. I don't think there are many real instances of this. There aren't that many libraries that get installed to subdirectories, and those that are are almost always direct dependencies. The only ones I could find on my own system involve LLVM. I did look at this a little today. Unfortunately, it seems I didn't make any notes about this earlier, or if I did, I lost them. I'll continue investigating this avenue some more, but I think I may end up taking a different route like generating a second ld.so.conf that's only used for linking. I need to go back over my IRC logs to remind myself if there were any downsides to that approach.
I'm now leaning towards the /usr/etc/ld.so.conf solution. If it were only the GCC libraries we needed to worry about then we could keep the first of my two binutils patches and just make this file blank. For LLVM to work too, we need to copy the entries from /etc/ld.so.conf sans prefix. That file is written by env-update (part of Portage) so the fix would be implemented there. Before I wrote my binutils patches, I prepared a similar fix for gcc-config, which writes /etc/ld.so.conf.d/05gcc-*.conf, but that probably isn't needed now.
I've submitted a change to Portage to make it write ${EPREFIX}/usr/etc/ld.so.conf as part of env-update. This seems to fix the issue, but you need to remove the second of the two binutils fixes first. I'd appreciate it if you could try this out. The file is only written when it thinks it's needed, so you may also need to add a dummy line to ${EPREFIX}/etc/ld.so.conf.
Come to think of it, it would have been really easy to get bfd to just not add the sysroot to these paths. That almost seems like a better idea, but not changing the behaviour wins out in my mind.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=6010348df47c9b5bb8e2f3305b35f82f789aca36

commit 6010348df47c9b5bb8e2f3305b35f82f789aca36
Author:     James Le Cuirot <chewi@gentoo.org>
AuthorDate: 2023-10-01 14:07:46 +0000
Commit:     James Le Cuirot <chewi@gentoo.org>
CommitDate: 2023-10-01 14:08:05 +0000

    sys-devel/binutils: Drop ld.so.conf prefix patch and enable -L patch for cross

    The ld.so.conf prefix patch didn't work in all the cases we needed it to. We'll fix the issue with /usr/etc/ld.so.conf via env-update instead.

    The -L patch was previously only applied to prefixed systems, but it's needed to fix crossdev environments too. We should probably just take it into the general patchset.

    Bug: https://bugs.gentoo.org/892549
    Signed-off-by: James Le Cuirot <chewi@gentoo.org>

 ...tils-2.40-r8.ebuild => binutils-2.40-r9.ebuild} | 4 ++-
 ...tils-2.41-r1.ebuild => binutils-2.41-r2.ebuild} | 4 ++-
 sys-devel/binutils/binutils-9999.ebuild | 4 ++-
 .../files/binutils-2.40-linker-search-path.patch | 36 ----------------------
 4 files changed, 9 insertions(+), 39 deletions(-)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=8008e209d900dc988217ce3721292ba895cd0494

commit 8008e209d900dc988217ce3721292ba895cd0494
Author:     James Le Cuirot <chewi@gentoo.org>
AuthorDate: 2023-10-01 09:32:33 +0000
Commit:     James Le Cuirot <chewi@gentoo.org>
CommitDate: 2023-10-02 21:38:18 +0000

    env-update: Write /usr/etc/ld.so.conf to fix bfd in some obscure cases

    This is only needed on prefixed systems. bfd currently reads ${EPREFIX}/etc/ld.so.conf and adds the prefix to these paths, but these paths are already prefixed. We need them to stay prefixed for ldconfig and the runtime linker. bfd will use ${EPREFIX}/usr/etc/ld.so.conf instead if that is present, so we can write the unprefixed paths there.

    Other linkers do not use these files at all. We tried to patch bfd to not use them either, as it shouldn't really be necessary, but that broke some cases, so we are trying this safer approach instead.

    env-update does not write the files under /etc/ld.so.conf.d, but we shouldn't need to handle these in any case, as all known instances are not affected by this issue.

    Bug: https://bugs.gentoo.org/892549
    Closes: https://github.com/gentoo/portage/pull/1105
    Signed-off-by: James Le Cuirot <chewi@gentoo.org>

 NEWS | 3 +++
 lib/portage/util/env_update.py | 19 +++++++++++++++++++
 2 files changed, 22 insertions(+)
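As a rough illustration of what the commit message above describes, here is a minimal sketch of turning already-prefixed library paths into the unprefixed form that bfd expects, given that bfd prepends its sysroot itself. This is not the code from lib/portage/util/env_update.py: the helper name `unprefix_ldpaths` and the sample paths are hypothetical, chosen only to match the `/opt/gentoo` prefix seen in the original report.

```python
import os


def unprefix_ldpaths(paths, eprefix):
    """Return the paths with a leading eprefix removed (hypothetical helper)."""
    eprefix = eprefix.rstrip("/")
    result = []
    for path in paths:
        norm = os.path.normpath(path)
        # Only strip when eprefix is a whole leading path component,
        # so that /opt/gentoo2/... is left untouched.
        if eprefix and (norm == eprefix or norm.startswith(eprefix + "/")):
            norm = norm[len(eprefix):] or "/"
        result.append(norm)
    return result


if __name__ == "__main__":
    # Hypothetical LDPATH entries as they would appear in ${EPREFIX}/etc/ld.so.conf.
    prefixed = ["/opt/gentoo/usr/lib64", "/opt/gentoo/usr/lib/llvm/16/lib64"]
    for line in unprefix_ldpaths(prefixed, "/opt/gentoo"):
        print(line)
```

The prefixed list would still go to ${EPREFIX}/etc/ld.so.conf for ldconfig and the runtime linker, and only the stripped list would go to ${EPREFIX}/usr/etc/ld.so.conf.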
The Portage change has already gone into 3.0.52, so I think we can consider this fixed now.
Thank you very much, James :)
One question though: you don't receive that fix unless LDPATH gets changed. The block that generates /usr/etc/ld.so.conf

```
+ if eprefix:
+     # ldconfig needs ld.so.conf paths to be prefixed, but the bfd linker
+     # needs them unprefixed, so write an alternative ld.so.conf file for
+     # the latter. Other linkers do not use these files. See ldelf.c in
+     # binutils for precise bfd behavior, as well as bug #892549.
+     ldsoconf_path = os.path.join(eroot, "usr", "etc", "ld.so.conf")
```

is under this if clause:

```
if oldld != newld:
```
(In reply to Yiyang Wu from comment #56)
> One question though:
>
> You don't receive that fix unless LDPATH gets changed:
>
> is under this if clause:
>
> ```
> if oldld != newld:
> ```

Maybe we should update the NEWS to tell users that they have to manually modify LDPATH and trigger an env-update?
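To make the concern concrete, here is a toy model of the gate discussed above. It only assumes that the old list is parsed from the existing ld.so.conf and compared with the current LDPATH entries. The function names and sample paths are hypothetical, and the real logic lives in lib/portage/util/env_update.py.

```python
def read_ldpaths(conf_text):
    """Parse ld.so.conf-style text into a list of paths, ignoring comments."""
    paths = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            paths.append(line)
    return paths


def needs_rewrite(old_conf_text, new_ldpaths):
    """Toy stand-in for the `oldld != newld` comparison."""
    return read_ldpaths(old_conf_text) != list(new_ldpaths)


ldpath = ["/opt/gentoo/usr/lib64"]  # hypothetical LDPATH collected from env.d

# The file already matches LDPATH: the gate stays closed and nothing is rewritten.
print(needs_rewrite("/opt/gentoo/usr/lib64\n", ldpath))

# A dummy line makes the old contents differ, so the gate opens.
print(needs_rewrite("/opt/gentoo/usr/lib64\n/dummy\n", ldpath))
```

This prints `False` and then `True`. That is why adding a dummy line to ${EPREFIX}/etc/ld.so.conf before running env-update is enough to trigger the write.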
Could have sworn I'd replied to this. I don't recall what I thought at the time, but my thinking now is that the latest issue and fix is too obscure to warrant a news item. You just seem very good at hitting these issues.
(In reply to James Le Cuirot from comment #58)
> Could have sworn I'd replied to this. I don't recall what I thought at the
> time, but my thinking now is that the latest issue and fix is too obscure to
> warrant a news item.

Sure. I have to thank you a lot for fixing all these issues! Hopefully people encountering the same issue can find and read through this bug ticket and get the solution.

> You just seem very good at hitting these issues.

No, it's just that I'm "lucky enough" to encounter these corner cases.
We could add a minor note in the portage NEWS file (in portage.git).
Sorry, I have to report yet another issue, which was discovered earlier at https://bugs.gentoo.org/892549#c3 but may have been forgotten: the current binutils (sys-devel/binutils-2.41-r2) is actually reading "${EPREFIX}/${EPREFIX}/usr/etc/ld.so.conf". I did a GDB trace, and it seems to be caused by ld/ldelf.c[1], already mentioned in https://bugs.gentoo.org/892549#c17. In that line of code, ld_sysroot is "${EPREFIX}" and prefix is "${EPREFIX}/usr", thus giving "${EPREFIX}/${EPREFIX}/usr/etc/ld.so.conf".

Sorry for rediscovering this so late. I forgot to remove the double-prefix link hack, so this bug did not reveal itself to me until now.

1. https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/ldelf.c;h=2e27cf48a816dc78bd76d2f0185a601d2edfb392;hb=ef8f08ca13f6c111cc549a3e13be5c5e2d95ca82#l910
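The double prefix described above can be reproduced with plain string concatenation. The sketch below is only a model of the path construction, assuming the sysroot, the build-time prefix and "/etc/ld.so.conf" are simply joined end to end. `conf_path` is a hypothetical name, and `/opt/gentoo` stands in for ${EPREFIX}.

```python
def conf_path(ld_sysroot, prefix):
    """Model of joining sysroot, prefix and '/etc/ld.so.conf' end to end."""
    return ld_sysroot + prefix + "/etc/ld.so.conf"


eprefix = "/opt/gentoo"  # stand-in for ${EPREFIX}

# ld_sysroot is ${EPREFIX} and prefix is ${EPREFIX}/usr, as seen in the GDB trace.
print(conf_path(eprefix, eprefix + "/usr"))
# prints /opt/gentoo/opt/gentoo/usr/etc/ld.so.conf
```

This also shows why the `ln -sn "${EPREFIX}" "${EPREFIX}${EPREFIX}"` hack from the original report hides the problem: the doubled path then resolves to the real file.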
(In reply to Yiyang Wu from comment #61)
> Currently (sys-devel/binutils-2.41-r2) are reading
> "${EPREFIX}/${EPREFIX}/usr/etc/ld.so.conf", actually.

I hate to admit it, but I think you're right. I don't know how I missed this before, maybe I only tested cross-compiling. What a pain. I'll have a look.
I've had a think. Finding $prefix/etc/ld.so.conf is literally the only thing this is used for. Although "prefix" is a variable here, it is effectively hardcoded at build time via configure, genscripts.sh, and elf.em. For a Gentoo build, it will only ever be ${EPREFIX}/usr. Including ${EPREFIX} is unhelpful, not just because of the double prefix, but also because it means you cannot use this linker against some other prefix. Better to rely on the sysroot, which is dynamic, as we have been trying to do. I think it would therefore make sense to hardcode $prefix to /usr in elf.em. I'll give this a try.
I must remember to consider prefix-guest here though. That uses the host's libc and therefore does not set a sysroot. We don't want to use the host's other libraries, so we don't want to load the host's /usr/etc/ld.so.conf, and we should still include ${EPREFIX} in this case.
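The reasoning in the last two comments can be summarised as a small decision function. This is a sketch of the idea as described here, not the actual binutils patch: `ld_so_conf_candidate` is a hypothetical name, and treating an empty sysroot as the prefix-guest case is an assumption made for illustration.

```python
def ld_so_conf_candidate(ld_sysroot, eprefix):
    """Pick the alternative ld.so.conf path under the proposed rule."""
    if ld_sysroot:
        # RAP or cross: the sysroot already points into the prefix,
        # so the hardcoded part is just /usr.
        return ld_sysroot + "/usr/etc/ld.so.conf"
    # prefix-guest: no sysroot is set, so keep ${EPREFIX} in the path
    # to avoid picking up the host's /usr/etc/ld.so.conf.
    return eprefix + "/usr/etc/ld.so.conf"


# RAP prefix: sysroot is the prefix, and the path is no longer doubled.
print(ld_so_conf_candidate("/opt/gentoo", "/opt/gentoo"))

# prefix-guest: no sysroot, the prefix is still included.
print(ld_so_conf_candidate("", "/opt/gentoo"))
```

Both calls print `/opt/gentoo/usr/etc/ld.so.conf`, which is the file env-update writes. With a different sysroot the first branch follows it, which is the "use this linker against some other prefix" case mentioned above.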
Please try out https://github.com/gentoo/gentoo/pull/34446.
(In reply to James Le Cuirot from comment #65)
> Please try out https://github.com/gentoo/gentoo/pull/34446.

Confirmed this fixes the double-prefix issue for finding <prefix>/usr/etc/ld.so.conf. Thank you very much for the careful analysis and quick response!
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=5d9341ed5b240e838abea81a582717aa92381dc6

commit 5d9341ed5b240e838abea81a582717aa92381dc6
Author:     James Le Cuirot <chewi@gentoo.org>
AuthorDate: 2023-12-23 14:50:47 +0000
Commit:     James Le Cuirot <chewi@gentoo.org>
CommitDate: 2024-01-06 11:47:39 +0000

    sys-devel/binutils: Add conditional patch to fix ld.bfd prefix handling

    As before, this may make it into our patchset once it's been proven to work. Our track record here hasn't been great so far!

    Closes: https://bugs.gentoo.org/892549
    Closes: https://github.com/gentoo/gentoo/pull/34446
    Bug: https://github.com/gentoo/binutils-gdb/pull/5
    Signed-off-by: James Le Cuirot <chewi@gentoo.org>

 sys-devel/binutils/binutils-2.41-r4.ebuild | 534 +++++++++++++++++++++
 sys-devel/binutils/binutils-9999.ebuild | 9 +-
 .../files/binutils-2.41-linker-prefix.patch | 56 +++
 3 files changed, 597 insertions(+), 2 deletions(-)