
Bug 811726

Summary: linux-info.eclass: packages fail with "Could not detect kernel version" with out of tree kernel
Product: Gentoo Linux
Reporter: David Flogeras <dflogeras2>
Component: Eclasses
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status: RESOLVED FIXED
Severity: normal
CC: dharding, floppym, sam, tom_gentoo
Priority: Normal
Keywords: PATCH
Version: unspecified
Hardware: All
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---
Attachments:
  Fix linux-info.eclass get_version() function with separate kernel output directory
  Pass need-config= to make in getfilevar
  Rework get_running_version

Description David Flogeras 2021-09-05 10:54:58 UTC
First of all, I am assuming this happens because I build my kernel out of tree, which I have been doing for years now. Some packages used to warn that they couldn't check whether certain features were enabled because they didn't know where the kernel config was, but they continued to compile, saying it was up to me to make sure.

Recently, those packages have started failing with "Could not detect kernel version" instead. I have bisected the change in behaviour to commit d1ea596f034285cd044006b107a60902ea059793.

I can work around it with KBUILD_OUTPUT=/path/to/my/build/tree emerge blah, but I'm wondering whether this was intended.

A good run, prior to the change, looked like:
-------
 * Determining the location of the kernel source code
 * Found kernel source directory:
 *     /usr/src/linux
 * Found sources for kernel version:
 *     5.10.61-gentoo
 * Unable to check for the following kernel config options due
 * to absence of any configured kernel sources or compiled
 * config:
------

Now it becomes:
------
 * Determining the location of the kernel source code
 * Found kernel source directory:
 *     /usr/src/linux
 * Could not detect kernel version.
 * Please ensure that /usr/src/linux points to a complete set of Linux sources.
 * Unable to calculate Linux Kernel version for build, attempting to use running version
 * ERROR: www-client/chromium-93.0.4577.63::gentoo failed (setup phase):
 *   Unable to determine any Linux Kernel version, please report a bug


This is a bit interesting, because it seems to try harder to find the kernel by using the running kernel, and indeed /lib/modules/$(uname -r)/ contains valid symlinks to both the source and build trees (so it should even be able to check my .config).

Reproducible: Always
Comment 1 Mike Gilbert gentoo-dev 2021-09-05 14:05:58 UTC
With a couple of tweaks to the eclass, we can see this error from the make command in getfilevar().

> ++ echo -e 'e:\n\t@echo $(VERSION)\ninclude Makefile'
> +++ nonfatal emake -C /home/floppym/kernel/source M=/x/portage/app-misc/foo-0/temp -s -f -
> +++ ___eapi_has_nonfatal
> +++ [[ ! 7 =~ ^(0|1|2|3)$ ]]
> +++ [[ 7 -lt 1 ]]
> +++ PORTAGE_NONFATAL=1
> +++ emake -C /home/floppym/kernel/source M=/x/portage/app-misc/foo-0/temp -s -f -
> make -j6 -C /home/floppym/kernel/source M=/x/portage/app-misc/foo-0/temp -s -f - 
> 
>   ERROR: Kernel configuration is invalid.
>          include/generated/autoconf.h or include/config/auto.conf are missing.
>          Run 'make oldconfig && make prepare' on kernel src to fix it.
> 
> make: *** [Makefile:718: include/config/auto.conf] Error 1
Comment 2 David Flogeras 2021-09-05 15:56:21 UTC
Hmm, I just ran make oldconfig and make prepare, then repeated my emerge, with the same failure.
Comment 3 Mike Gilbert gentoo-dev 2021-09-05 16:40:24 UTC
I wasn't saying that there is anything wrong with your config.

The problem is that the getfilevar() eclass function does not work properly when it cannot find the kernel build directory.
Comment 4 Mike Gilbert gentoo-dev 2021-09-05 16:54:00 UTC
Passing "need-config=" to the make command in getfilevar() seems to resolve this. However, I'm not sure if that will break other uses of this function.
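The mechanism can be sketched with a simplified stand-in for the kernel Makefile. This is an illustration, not the real kernel Makefile or the eclass code: the stand-in only mimics the relevant behaviour (abort when need-config is set and no configuration is present), and read_kernel_var is a hypothetical helper modelled on the "echo -e 'e:\n\t@echo $(VERSION)\ninclude Makefile' | make -s -f -" trick visible in the trace from comment 1. Because a variable assigned on the make command line overrides an ordinary assignment inside the Makefile, need-config= leaves the variable empty and the ifdef guard is skipped.

```shell
#!/bin/sh
# Simplified stand-in for a >= 5.4 kernel Makefile (NOT the real file):
# it defines VERSION and aborts when need-config is set but no config exists.
kdir=$(mktemp -d)
cat > "$kdir/Makefile" <<'EOF'
VERSION = 5
PATCHLEVEL = 10
need-config := 1
ifdef need-config
ifeq ($(wildcard include/config/auto.conf),)
$(error Kernel configuration is invalid)
endif
endif
EOF

# Hypothetical getfilevar-style helper: feed make a throwaway makefile on
# stdin whose first target echoes the variable, then include the kernel
# Makefile. Extra arguments (e.g. need-config=) are passed through to make.
read_kernel_var() {
	var=$1 dir=$2
	shift 2
	printf 'e:\n\t@echo $(%s)\ninclude Makefile\n' "$var" |
		make -C "$dir" -s -f - "$@"
}

# Without the override, the stand-in's config check aborts make.
read_kernel_var VERSION "$kdir" || echo "lookup failed"
# With need-config= on the command line, the check is bypassed.
read_kernel_var VERSION "$kdir" need-config=
```

Note that -s also stops make from printing "Entering directory" lines when -C is used, so the only output on success is the variable's value.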
Comment 5 Enne Eziarc 2021-09-05 20:47:26 UTC
Setting the KBUILD_SOURCE envvar instead of KBUILD_OUTPUT might help. The eclass doesn't do anything with that var and so the kernel Makefile becomes the single source of truth, which might (or not) be more reliable. I've been using it successfully for a long time in any case.

(FWIW I also see these errors randomly, but mine are due to bug 679158 which is different)
Comment 6 Mike Pagano gentoo-dev 2021-09-05 22:48:19 UTC
(In reply to Enne Eziarc from comment #5)
> Setting the KBUILD_SOURCE envvar instead of KBUILD_OUTPUT might help. The
> eclass doesn't do anything with that var and so the kernel Makefile becomes
> the single source of truth, which might (or not) be more reliable. I've been
> using it successfully for a long time in any case.
> 
> (FWIW I also see these errors randomly, but mine are due to bug 679158 which
> is different)

I reverted that commit and I still see the error when I use an out-of-tree kernel.
Comment 7 Mike Pagano gentoo-dev 2021-09-05 23:28:53 UTC
Ok, I'll post something to try here tomorrow.  Thanks for everyone's reports.
Comment 8 Daniel Harding 2021-09-06 13:06:10 UTC
Created attachment 737926
Fix linux-info.eclass get_version() function with separate kernel output directory

I was hitting this issue myself and found that KBUILD_OUTPUT needs to be present in the environment when running make from getfilevar(). Otherwise, the Makefile gives the error "Kernel configuration is invalid." The attached patch fixes the problem for me, but I'm not sure whether it is the cleanest solution.
Comment 9 Alice Ferrazzi Gentoo Infrastructure gentoo-dev 2021-09-06 17:09:42 UTC
Thanks, Daniel, for the patch.
Comment 10 Mike Gilbert gentoo-dev 2021-09-06 17:29:32 UTC
Created attachment 737938
Pass need-config= to make in getfilevar

Here's a patch that works without a build directory or .config file present at all. I think this is a better solution than Daniel's patch.
Comment 11 Mike Pagano gentoo-dev 2021-09-06 17:54:13 UTC
(In reply to Mike Gilbert from comment #10)
> Created attachment 737938
> Pass need-config= to make in getfilevar
> 
> Here's a patch that works without a build directory or .config file present
> at all. I think this is a better solution than Daniel's patch.

Thank you very much, Mike. This passes all the testing I ran.

You OK to commit this?
Comment 12 David Flogeras 2021-09-06 18:04:27 UTC
I also can confirm it makes things work as they did before for me. Thanks!

Does anyone know why the eclass is smart enough to find the sources via the running kernel (/lib/modules/xyz/source symlink) but it seems to require KBUILD_OUTPUT to find the .config even though there is also a /lib/modules/xyz/build symlink?
Comment 13 Larry the Git Cow gentoo-dev 2021-09-06 18:07:49 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=ab005b2b0219f266a90891fa3cce8253014927db

commit ab005b2b0219f266a90891fa3cce8253014927db
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2021-09-06 17:18:05 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2021-09-06 18:07:03 +0000

    linux-info.eclass: getfilevar: pass need-config= to make
    
    This bypasses the config check in the kernel Makefile.
    
    Closes: https://bugs.gentoo.org/811726
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 eclass/linux-info.eclass | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 14 Mike Gilbert gentoo-dev 2021-09-06 18:24:13 UTC
(In reply to David Flogeras from comment #12)
> Does anyone know why the eclass is smart enough to find the sources via the
> running kernel (/lib/modules/xyz/source symlink) but it seems to require
> KBUILD_OUTPUT to find the .config even though there is also a
> /lib/modules/xyz/build symlink?

The eclass was coded to only use /lib/modules/.../{build,source} as a fallback if KERNEL_DIR is invalid. If you look at the linux-info_get_any_version function, it first calls get_version, and then calls get_running_version if that fails.
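The lookup order described above can be sketched as follows. The function names mirror the ones mentioned, but the bodies are simplified stubs, not the eclass code: get_version here only checks for a Makefile under KERNEL_DIR, and RUNNING_KV is a hypothetical override for uname -r so the fallback can be exercised.

```shell
#!/bin/sh
# Stub: the real get_version parses the Makefile under KERNEL_DIR; here we
# only check that the Makefile exists.
get_version() {
	[ -f "${KERNEL_DIR}/Makefile" ]
}

# Stub: the real get_running_version inspects /lib/modules/$(uname -r)/...;
# RUNNING_KV is a hypothetical override used for testing.
get_running_version() {
	KV_FULL=${RUNNING_KV:-$(uname -r)}
	[ -n "$KV_FULL" ]
}

# KERNEL_DIR always wins; the running kernel is only a fallback.
get_any_version() {
	if get_version; then
		KV_SOURCE=KERNEL_DIR
	elif get_running_version; then
		KV_SOURCE=running
	else
		echo "Unable to determine any Linux Kernel version" >&2
		return 1
	fi
}

KERNEL_DIR=/nonexistent/linux RUNNING_KV=4.19.205-gentoo
get_any_version && echo "version from: $KV_SOURCE ($KV_FULL)"
```

With KERNEL_DIR pointing at a directory that has no Makefile, this prints "version from: running (4.19.205-gentoo)".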

KERNEL_DIR might point at a completely different kernel version from the running kernel.

We can't really use the version parsed from the kernel Makefile to locate the correct /lib/modules/.../build symlink because the user may have CONFIG_LOCALVERSION set in their config file, which would affect the "..." part of the path.

Also, they may have built several kernels using the same kernel sources, and we would have no way of knowing which one they want to use in ebuild kernel checks.

So, we really must rely on the user setting KBUILD_OUTPUT in make.conf if they are doing out-of-source kernel builds.
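For example, a make.conf entry along these lines makes the build directory visible to every emerge instead of only to a single command line. The path is a placeholder borrowed from the workaround in the original report; it should be the same directory that is passed to the kernel build as O= (equivalently, KBUILD_OUTPUT).

```shell
# /etc/portage/make.conf (fragment)
# Tell linux-info.eclass where the out-of-tree kernel build lives.
# "/path/to/my/build/tree" is a placeholder; substitute your own directory.
KBUILD_OUTPUT="/path/to/my/build/tree"
```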
Comment 15 David Flogeras 2021-09-06 18:56:01 UTC
Makes sense, thanks for the very clear explanation.
Comment 16 jeremy mills 2021-09-07 01:04:08 UTC
This also solved https://bugs.gentoo.org/490328, at least in my case.
Comment 17 David Flogeras 2021-09-13 13:43:44 UTC
I think I'm possibly still seeing a ripple effect of this.

I have a 32-bit machine whose binary packages I build in a chroot on a more powerful machine. This morning I tried to do a bunch of updates and again got a failure (on the target 32-bit machine):

>>> Running pre-merge checks for sys-libs/libomp-12.0.1
 * libomp-12.0.1.tbz2 MD5 SHA1 size ;-) ...                              [ ok ]
 * Determining the location of the kernel source code
 * Found kernel source directory:
 *     /usr/src/linux
 * Could not detect kernel version.
 * Please ensure that /usr/src/linux points to a complete set of Linux sources.
 * Unable to calculate Linux Kernel version for build, attempting to use running version
 * ERROR: sys-libs/libomp-12.0.1::gentoo failed (pretend phase):
 *   Unable to determine any Linux Kernel version, please report a bug
 * 
 * Call stack:


However, in this case, adding KBUILD_OUTPUT=/x/y/z did not work around the issue. Both the build host and the target machine were synced today, so they should have the latest linux-info.eclass changes. I'm not sure how to diagnose this further.
Comment 18 Mike Gilbert gentoo-dev 2021-09-13 14:05:14 UTC
(In reply to David Flogeras from comment #17)

Ok, so I think this might occur when get_running_version is able to find a Makefile in the kernel build directory, but not in the source directory.

Do either (or both) of these Makefiles exist?

/lib/modules/$(uname -r)/source/Makefile
/lib/modules/$(uname -r)/build/Makefile
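One way to answer that question is a small shell check such as the following. check_kernel_makefiles and the MODULES_ROOT override are hypothetical names introduced here; by default the function looks under /lib/modules for the running kernel, and [ -f ] follows the source and build symlinks.

```shell
#!/bin/sh
# Report whether the source/ and build/ Makefiles exist for a kernel release.
# $1: kernel release (defaults to the running kernel).
# MODULES_ROOT: hypothetical override of /lib/modules, useful for testing.
check_kernel_makefiles() {
	release=${1:-$(uname -r)}
	root=${MODULES_ROOT:-/lib/modules}
	for kind in source build; do
		mf="$root/$release/$kind/Makefile"
		# -f dereferences symlinks, so a dangling source/build link counts as missing.
		if [ -f "$mf" ]; then
			echo "$kind: found $mf"
		else
			echo "$kind: missing $mf"
		fi
	done
}

check_kernel_makefiles
```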
Comment 19 David Flogeras 2021-09-13 14:53:10 UTC
Yep, both exist.

Something I did not mention: the target 32-bit machine is running a 4.19.x kernel (hardware reasons prevent me from upgrading), while the build machine is running a 5.10.x kernel.

As an experiment, I ran "eselect kernel set" to select a configured 5.10.x tree on the target machine, and the offending package no longer fails to emerge. Switching back to 4.19.x brings the problem back. Is it possible that something is hardcoded in the .tbz2 package? I used qxpak to extract the metadata and grepped through it without seeing anything obvious, but I also don't have a clue what I'm doing :)
Comment 20 Mike Gilbert gentoo-dev 2021-09-13 14:57:41 UTC
Created attachment 739195
Rework get_running_version

Could you give this patch a try?
Comment 21 Mike Gilbert gentoo-dev 2021-09-13 15:02:47 UTC
(In reply to David Flogeras from comment #19)
> Yep, both exist.

Regardless of the patch I just posted, I would like to debug this a bit more.

If you run the following, what happens?

> cd "/lib/modules/$(uname -r)/build"
> make kernelversion
Comment 22 David Flogeras 2021-09-13 15:19:59 UTC
(In reply to Mike Gilbert from comment #20)
> Created attachment 739195
> Rework get_running_version
> 
> Could you give this patch a try?

Didn't seem to fix it here unfortunately.
Comment 23 David Flogeras 2021-09-13 15:20:09 UTC
(In reply to Mike Gilbert from comment #21)
> (In reply to David Flogeras from comment #19)
> > Yep, both exist.
> 
> Regardless of the patch I just posted, I would like to debug this a bit more.
> 
> If you run the following, what happens?
> 
> > cd "/lib/modules/$(uname -r)/build"
> > make kernelversion

4.19.205-gentoo
Comment 24 Mike Gilbert gentoo-dev 2021-09-13 15:26:02 UTC
(In reply to David Flogeras from comment #22)
> Didn't seem to fix it here unfortunately.

Are you sure you applied it correctly? I can't see how the revised function would fail.
Comment 25 Mike Gilbert gentoo-dev 2021-09-13 15:29:31 UTC
If you are using a binpkg, you will need to rebuild the binpkg with the patch applied.
Comment 26 David Flogeras 2021-09-13 15:37:29 UTC
Right, I didn't realize that. After rebuilding the package (with the patched eclass in my Portage tree) and then emerging it on the target machine (whose Portage tree I patched as well, even though it sounds like that isn't needed), it gets past the check with:

 * libomp-12.0.1.tbz2 MD5 SHA1 size ;-) ...                              [ ok ]
 * Determining the location of the kernel source code
 * Found kernel source directory:
 *     /usr/src/linux
 * Could not detect kernel version.
 * Please ensure that /usr/src/linux points to a complete set of Linux sources.
 * Unable to calculate Linux Kernel version for build, attempting to use running version
Comment 27 Mike Gilbert gentoo-dev 2021-09-13 16:20:48 UTC
I reproduced the issue with linux-4.19.88.

The issue seems to be that the "need-config" Makefile variable used to be called "dot-config". It was renamed in 2042b5486bd311db67b85915ee6291905b72e270.

If we add "dot-config=0" to the make command in getfilevar, that should fix the version check logic for older kernels.
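A sketch of why passing both variables is safe, again using simplified stand-ins rather than real kernel Makefiles: the older stand-in guards its config check with ifeq ($(dot-config),1), the newer one with ifdef need-config. A command-line variable that a Makefile never reads has no effect, so a single invocation with need-config= dot-config=0 can serve both. read_kernel_var is a hypothetical helper, not the eclass's getfilevar.

```shell
#!/bin/sh
base=$(mktemp -d)
mkdir -p "$base/old" "$base/new"

# Stand-in for a pre-5.4 Makefile: config check keyed on dot-config.
cat > "$base/old/Makefile" <<'EOF'
VERSION = 4
dot-config := 1
ifeq ($(dot-config),1)
ifeq ($(wildcard include/config/auto.conf),)
$(error Kernel configuration is invalid)
endif
endif
EOF

# Stand-in for a 5.4+ Makefile: config check keyed on need-config.
cat > "$base/new/Makefile" <<'EOF'
VERSION = 5
need-config := 1
ifdef need-config
ifeq ($(wildcard include/config/auto.conf),)
$(error Kernel configuration is invalid)
endif
endif
EOF

# Hypothetical getfilevar-style helper that disables both spellings of the
# config check; command-line variables override the Makefile assignments.
read_kernel_var() {
	printf 'e:\n\t@echo $(%s)\ninclude Makefile\n' "$1" |
		make -C "$2" -s -f - need-config= dot-config=0
}

read_kernel_var VERSION "$base/old"   # prints 4
read_kernel_var VERSION "$base/new"   # prints 5
```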
Comment 28 David Flogeras 2021-09-13 18:25:22 UTC
Aha, that explains why it "worked" when I switched to 5.10 sources. Let me know if you need me to test anything else.
Comment 29 Larry the Git Cow gentoo-dev 2021-09-14 23:48:41 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=316e457b8f1f02304d28c75e6498c6664e000f68

commit 316e457b8f1f02304d28c75e6498c6664e000f68
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2021-09-13 16:24:31 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2021-09-14 23:48:05 +0000

    linux-info.eclass: getfilevar: pass dot-config=0 to make
    
    This disables the kernel config check for versions prior to 5.4.
    
    Closes: https://bugs.gentoo.org/811726
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 eclass/linux-info.eclass | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Additionally, it has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4459479bb524f980f77f82e3108e924077048713

commit 4459479bb524f980f77f82e3108e924077048713
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2021-09-13 14:46:57 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2021-09-14 23:47:53 +0000

    linux-info.eclass: rework get_running_version
    
    This function may fail if the version cannot be parsed from a Makefile
    found by following the /lib/modules/${KV_FULL}/{source,build} symlinks.
    Instead of failing, we should just split KV_FULL as a fallback.
    
    Also, simplify the existence checks for the kernel Makefile; if we can't
    find the kernel source directory, there is really no point in checking
    for the kernel build directory. The latter will probably contain a
    Makefile with no version information.
    
    Bug: https://bugs.gentoo.org/811726
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 eclass/linux-info.eclass | 47 ++++++++++++++++++++---------------------------
 1 file changed, 20 insertions(+), 27 deletions(-)
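The fallback described in the last commit, splitting KV_FULL when no Makefile yields a version, might look roughly like the following. This is a hypothetical POSIX-shell sketch, not the code from the commit: split_kv_full is an invented name, the KV_MAJOR/KV_MINOR/KV_PATCH/KV_EXTRA names are assumed to follow the eclass's conventions, and defaulting KV_PATCH to 0 when there is no patch level is a choice made only for this illustration.

```shell
#!/bin/sh
# Hypothetical fallback: derive version components from a release string
# such as the output of `uname -r` (e.g. 4.19.205-gentoo).
split_kv_full() {
	kv=$1
	KV_MAJOR=${kv%%.*}
	rest=${kv#*.}
	# Minor version: leading digits after the first dot.
	KV_MINOR=${rest%%[!0-9]*}
	rest=${rest#"$KV_MINOR"}
	case $rest in
	.*)
		# Patch level present: take the digits after the second dot.
		rest=${rest#.}
		KV_PATCH=${rest%%[!0-9]*}
		rest=${rest#"$KV_PATCH"}
		;;
	*)
		# e.g. 5.14-rc1 has no patch level (assumed default: 0).
		KV_PATCH=0
		;;
	esac
	# Whatever is left (local version, -rcN, ...) is the extra part.
	KV_EXTRA=$rest
}

split_kv_full 4.19.205-gentoo
echo "$KV_MAJOR $KV_MINOR $KV_PATCH $KV_EXTRA"   # 4 19 205 -gentoo
```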