Bug 613578 - x11-drivers/nvidia-drivers should allow parallel make for linux-mod_src_compile()
Summary: x11-drivers/nvidia-drivers should allow parallel make for linux-mod_src_compi...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Low normal
Assignee: Jeroen Roovers (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-22 22:34 UTC by Alex Efros
Modified: 2019-03-29 13:19 UTC
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Attachments
x11-drivers:nvidia-drivers-375.82:20181213-025131.log (x11-drivers:nvidia-drivers-375.82:20181213-025131.log,385.68 KB, text/plain)
2018-12-13 09:53 UTC, Jeroen Roovers (RETIRED)
build.log.xz (badbuild1.log.xz,35.75 KB, application/x-xz)
2018-12-17 14:39 UTC, Alexander Bezrukov

Description Alex Efros 2017-03-22 22:34:25 UTC
It looks like MAKEOPTS=-j1 can be safely removed from the ebuild. I've just built x11-drivers/nvidia-drivers-375.26 multiple times with different USE combinations on an 8-core CPU and all builds succeeded.

Current build times are very annoying, especially when you play with the kernel config and have to `emerge @module-rebuild` after each kernel change. The last version of nvidia-drivers that built fast (50 sec) was 358.16-r1, about a year ago. The next version, 361.28, doubled the build time (100 sec), and then half a year ago 367.57 doubled it once again (210 sec). With -j1 removed, the current 375.26 builds in 90-110 sec (depending on USE flags) on my system (Core i7-2600K overclocked to 4.5GHz).
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2017-03-29 02:24:12 UTC
Your anecdotal evidence does not suggest Nvidia fixed their build system and neither does this:

# qlop -tH nvidia-drivers
nvidia-drivers: 2 minutes, 56 seconds for 323 merges

Do you really care whether it takes 4 minutes or 1.5 minutes to compile? What if it fails to build with parallel make and you didn't properly check and you reboot and you find you have to do it again (and can't use your desktop environment in the meantime)?
Comment 2 Alex Efros 2017-03-29 03:25:26 UTC
(In reply to Jeroen Roovers from comment #1)
> Your anecdotal evidence does not suggest Nvidia fixed their build system and

Sure. When I opened the ebuild I expected to find a comment with a related bug number near the -j1 line, so I could check the original issue and try to find out whether it was really fixed. But I didn't find one, so I opened this report in the hope that someone who knows this issue better than I do will check whether it is really fixed now or whether I was just lucky to build it in parallel without any issues.

> neither does this:
> 
> # qlop -tH nvidia-drivers
> nvidia-drivers: 2 minutes, 56 seconds for 323 merges

That says nothing. Here is mine:

# qlop -tH nvidia-drivers
nvidia-drivers: 55 seconds for 256 merges

but real pain is here:

# qlop -g nvidia-drivers | tail
nvidia-drivers: Thu Mar 23 03:28:44 2017: 88 seconds
nvidia-drivers: Thu Mar 23 03:42:38 2017: 91 seconds
nvidia-drivers: Mon Mar 27 17:06:09 2017: 278 seconds
nvidia-drivers: Mon Mar 27 18:20:30 2017: 270 seconds
nvidia-drivers: Mon Mar 27 19:16:20 2017: 270 seconds
nvidia-drivers: Mon Mar 27 19:46:11 2017: 330 seconds
nvidia-drivers: Mon Mar 27 20:12:53 2017: 275 seconds
nvidia-drivers: Mon Mar 27 20:40:30 2017: 274 seconds
nvidia-drivers: Mon Mar 27 20:59:22 2017: 222 seconds
nvidia-drivers: 256 times
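
For comparison, the mean of just the builds listed above can be worked out from the same output. The following is a minimal sketch, not part of portage-utils or the ebuild: average_seconds is a hypothetical helper name, and it assumes every per-merge line ends in "<N> seconds", as in the listing.

```sh
#!/bin/sh
# Average the per-merge durations from `qlop -g`-style lines read on stdin.
# Only lines ending in "<N> seconds" are counted, so the trailing
# "nvidia-drivers: 256 times" summary line is ignored.
average_seconds() {
        awk '$NF == "seconds" { total += $(NF - 1); count++ }
             END { if (count) printf "%d builds, mean %.0f seconds\n", count, total / count }'
}

# Sample input copied from the listing above.
average_seconds <<'EOF'
nvidia-drivers: Thu Mar 23 03:28:44 2017: 88 seconds
nvidia-drivers: Thu Mar 23 03:42:38 2017: 91 seconds
nvidia-drivers: Mon Mar 27 17:06:09 2017: 278 seconds
nvidia-drivers: Mon Mar 27 18:20:30 2017: 270 seconds
nvidia-drivers: Mon Mar 27 19:16:20 2017: 270 seconds
nvidia-drivers: Mon Mar 27 19:46:11 2017: 330 seconds
nvidia-drivers: Mon Mar 27 20:12:53 2017: 275 seconds
nvidia-drivers: Mon Mar 27 20:40:30 2017: 274 seconds
nvidia-drivers: Mon Mar 27 20:59:22 2017: 222 seconds
nvidia-drivers: 256 times
EOF
```

For the nine builds listed this prints "9 builds, mean 233 seconds", against the 55-second lifetime average reported by qlop -tH.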

> Do you really care whether it takes 4 minutes or 1.5 minutes to compile? What

Yes, I do. After all, it's Gentoo, which is expected to be optimized and fast. And even repoman complains about -j1 in ebuilds, so I suppose it's not a normal thing and shouldn't be used "just in case", without known issues. Maybe nvidia-drivers is an exception, but, again, that could be noted in a comment in the ebuild.

> if it fails to build with parallel make and you didn't properly check and
> you reboot and you find you have to do it again (and can't use your desktop
> environment in the meantime)?

Actually I expect a failed build, not a successful build with a broken nvidia driver installed. I've seen several packages that required -j1, and in all cases omitting -j1 resulted in a (sometimes) broken build. And if the build fails, that's okay: then we will know for sure it's still broken, add -j1 back and create an issue with evidence for future reference.

That's all I want to see: either no -j1 in the ebuild, or an open issue describing what's broken and why we need -j1 until it's fixed. If there is such an issue and I just failed to find it, please give me a bug number and close this one.
Comment 3 Matthias Maier gentoo-dev 2017-03-29 06:02:22 UTC
> Yes, I do. After all, it's Gentoo, which is expected to be optimized and
> fast.

Compiling the nvidia glue code as fast as possible is not exactly a goal of Gentoo Linux. Guaranteeing a *reliable* result (i.e. a minimum of compilation errors) is.

Be warned that using "optimized and fast" as an argument might not necessarily be perceived as a particularly strong point in a discussion [1,2].

[1] Have a look at https://fun.irq.dk/funroll-loops.org/ for some historical amusement.

[2] Independent of the question whether the "-j1" workaround used in nvidia-drivers should be re-evaluated or not.
Comment 4 Larry the Git Cow gentoo-dev 2018-12-13 01:24:18 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7848d6701921b7959502d0b9cdd84344a1f5bf0d

commit 7848d6701921b7959502d0b9cdd84344a1f5bf0d
Author:     Jeroen Roovers <jer@gentoo.org>
AuthorDate: 2018-12-13 01:23:45 +0000
Commit:     Jeroen Roovers <jer@gentoo.org>
CommitDate: 2018-12-13 01:24:14 +0000

    x11-drivers/nvidia-drivers: Fix parallel make
    
    Fixes: https://bugs.gentoo.org/613578
    Package-Manager: Portage-2.3.52, Repoman-2.3.12
    Signed-off-by: Jeroen Roovers <jer@gentoo.org>

 x11-drivers/nvidia-drivers/nvidia-drivers-415.22.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 5 Jeroen Roovers (RETIRED) gentoo-dev 2018-12-13 09:46:06 UTC
(In reply to Larry the Git Cow from comment #4)
> The bug has been closed via the following commit(s):
> 
> https://gitweb.gentoo.org/repo/gentoo.git/commit/
> ?id=7848d6701921b7959502d0b9cdd84344a1f5bf0d
> 
> commit 7848d6701921b7959502d0b9cdd84344a1f5bf0d
> Author:     Jeroen Roovers <jer@gentoo.org>
> AuthorDate: 2018-12-13 01:23:45 +0000
> Commit:     Jeroen Roovers <jer@gentoo.org>
> CommitDate: 2018-12-13 01:24:14 +0000
> 
>     x11-drivers/nvidia-drivers: Fix parallel make
>     
>     Fixes: https://bugs.gentoo.org/613578
>     Package-Manager: Portage-2.3.52, Repoman-2.3.12
>     Signed-off-by: Jeroen Roovers <jer@gentoo.org>
> 
>  x11-drivers/nvidia-drivers/nvidia-drivers-415.22.ebuild | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I did a test run on the same change in all current ebuilds and it went wrong once in x11-drivers/nvidia-drivers-375.82.

/bin/sh: line 1: /home/jer/portage/x11-drivers/nvidia-drivers-375.82/work/kernel/.tmp_versions/nvidia.mod: No such file or directory
make[2]: *** [scripts/Makefile.build:541: /home/jer/portage/x11-drivers/nvidia-drivers-375.82/work/kernel/nvidia.o] Error 1
make[2]: *** Waiting for unfinished jobs....

But maybe that's something Nvidia already fixed. I do remember reading something to that effect a while ago. Reopening.
Comment 6 Jeroen Roovers (RETIRED) gentoo-dev 2018-12-13 09:53:42 UTC
Created attachment 557708 [details]
x11-drivers:nvidia-drivers-375.82:20181213-025131.log

Test setup:

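#!/bin/sh
# Build every nvidia-drivers ebuild in the current directory 20 times with
# -j20 and stop at the first failure.

for j in nvidia-drivers-*.ebuild; do
        for i in $(seq 1 20); do
                USE="-static-libs -tools" \
                KERNEL_DIR=/usr/src/linux-4.10.13-gentoo \
                MAKEOPTS="-j20" \
                FEATURES=noauto \
                ebuild "$j" clean setup unpack prepare configure compile install &&
                echo "===== $j $i =====" ||
                break 2
        done
done
# Print the ebuild and iteration that were running when the loops ended.
echo "$j" "$i"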


It failed for 375.82 at the twelfth try.
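
The loop above stops at the first failure, which shows that a race exists but not how often it triggers. A rough failure rate could be gathered with a helper along these lines. This is only a sketch: count_failures is a hypothetical name, and it treats any non-zero exit status as a failed build.

```sh
#!/bin/sh
# Run a command a fixed number of times and report how many runs failed.
# Usage: count_failures RUNS COMMAND [ARGS...]
count_failures() {
        runs=$1
        shift
        failures=0
        i=1
        while [ "$i" -le "$runs" ]; do
                # Any non-zero exit status counts as one failure.
                "$@" >/dev/null 2>&1 || failures=$((failures + 1))
                i=$((i + 1))
        done
        echo "$failures/$runs runs failed"
}

# Trivial self-check with a command that always succeeds.
count_failures 5 true

# Example (not run here), mirroring the test setup above:
#   count_failures 20 env USE="-static-libs -tools" MAKEOPTS=-j20 FEATURES=noauto \
#       ebuild nvidia-drivers-375.82.ebuild clean setup unpack prepare configure compile install
```

Passing the variables through env keeps them scoped to each ebuild invocation instead of leaking into the calling shell.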
Comment 7 Jeroen Roovers (RETIRED) gentoo-dev 2018-12-13 09:59:40 UTC
Release notes for 355.06:
https://www.nvidia.com/Download/driverResults.aspx/88694/en-us

"Replaced the build system for the NVIDIA kernel modules and updated the installer package and nvidia-installer to use the new build system and kernel module source code layout. For more information about the new build system and layout, see the README document at:
ftp://download.nvidia.com/XFree86/packaging/linux/new-kbuild-for-355/"

Release notes for 384.59:
https://www.nvidia.com/Download/driverResults.aspx/120916/en-us

"Restored several sanity checks that were inadvertently removed from the kernel module build process in the 355.06 driver."


Perhaps this is what I am seeing in 375, but if true that should also affect 378 and 381.
Comment 8 Wojciech Arabczyk 2018-12-13 11:52:57 UTC
I've tested the 415.22 branch with a slightly modified version:

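#!/bin/sh
# Build 415.22 thirty-two times, raising the job count from -j1 to -j32,
# and stop at the first failure.

for j in /usr/portage/x11-drivers/nvidia-drivers/nvidia-drivers-415.22.ebuild; do
        for i in $(seq 1 32); do
                USE="-static-libs -tools" \
                KERNEL_DIR=/usr/src/linux \
                MAKEOPTS="-j$i" \
                FEATURES=noauto \
                ebuild "$j" fetch clean setup unpack prepare configure compile install &&
                echo "===== $j $i =====" ||
                break 2
        done
done
# Print the ebuild and iteration that were running when the loops ended.
echo "$j" "$i"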

It worked for every single case, finishing up with status:

===== /usr/portage/x11-drivers/nvidia-drivers/nvidia-drivers-415.22.ebuild 32 =====
/usr/portage/x11-drivers/nvidia-drivers/nvidia-drivers-415.22.ebuild 32

So at least for the recent driver version this seems to be fixed.

Kernel version: 4.18.19-gentoo
Comment 9 Larry the Git Cow gentoo-dev 2018-12-13 14:20:59 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=bad03706202759766c455e463956e28ef02ef204

commit bad03706202759766c455e463956e28ef02ef204
Author:     Jeroen Roovers <jer@gentoo.org>
AuthorDate: 2018-12-13 10:03:47 +0000
Commit:     Jeroen Roovers <jer@gentoo.org>
CommitDate: 2018-12-13 14:20:50 +0000

    x11-drivers/nvidia-drivers: Fix more parallel make
    
    Package-Manager: Portage-2.3.52, Repoman-2.3.12
    Bug: https://bugs.gentoo.org/613578
    Signed-off-by: Jeroen Roovers <jer@gentoo.org>

 x11-drivers/nvidia-drivers/nvidia-drivers-304.137.ebuild | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-384.130.ebuild | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-387.34.ebuild  | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-390.87.ebuild  | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-396.54.ebuild  | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-410.78.ebuild  | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-415.18.ebuild  | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)
Comment 10 Alexander Bezrukov 2018-12-17 14:39:15 UTC
Created attachment 557980 [details]
build.log.xz

(In reply to Larry the Git Cow from comment #9)

> commit bad03706202759766c455e463956e28ef02ef204
> Author:     Jeroen Roovers <jer@gentoo.org>
> AuthorDate: 2018-12-13 10:03:47 +0000
> Commit:     Jeroen Roovers <jer@gentoo.org>
> CommitDate: 2018-12-13 14:20:50 +0000
> 
>     x11-drivers/nvidia-drivers: Fix more parallel make

When the load on the CPU is high, nvidia-drivers-390.87 sometimes fails to build after this commit against (at least) the 4.9 series of vanilla-sources. Reverting this commit (or simply setting MAKEOPTS="-j1") restores successful builds. The build process breaks at seemingly random places, sometimes at the compile stage, sometimes at install. An example build log of a failed build is attached.
Comment 11 Norman Back 2018-12-25 07:22:56 UTC
I've just had x11-drivers/nvidia-drivers-415.25 fail on install when compiling in parallel with two other module packages (app-emulation/virtualbox-modules-6.0.0 and app-emulation/vmware-modules-329.1.2).

I prioritize a reliable emerge over a fast one, so I have now added MAKEOPTS=-j1 for x11-drivers/nvidia-drivers in /etc/portage/package.env.

This occurs intermittently on both 4- and 8-core AMD processors.
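
A per-package override like the one described here is usually set up with an env file plus a package.env entry. The sketch below assumes package.env is a plain file rather than a directory. The file name serial-make.conf and the CONFIG_ROOT variable are made up for this illustration, and by default everything is written under a scratch directory instead of the real /etc/portage.

```sh
#!/bin/sh
# CONFIG_ROOT is a hypothetical knob for this sketch: leave it unset to write
# into a scratch directory, or set it to "/" to touch the real /etc/portage.
config_root=${CONFIG_ROOT:-$(mktemp -d)}
portage_dir=$config_root/etc/portage

# Env file holding the override (the name serial-make.conf is arbitrary).
mkdir -p "$portage_dir/env"
printf 'MAKEOPTS="-j1"\n' > "$portage_dir/env/serial-make.conf"

# Map the package to that env file, without adding the line twice.
entry='x11-drivers/nvidia-drivers serial-make.conf'
grep -qxF "$entry" "$portage_dir/package.env" 2>/dev/null ||
        printf '%s\n' "$entry" >> "$portage_dir/package.env"

echo "wrote override under $portage_dir"
```

Other packages keep the global MAKEOPTS from make.conf; only the package named in package.env is built serially.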
Comment 12 Jeroen Roovers (RETIRED) gentoo-dev 2018-12-25 12:09:23 UTC
Comment on attachment 557980 [details]
build.log.xz

linux-mod.eclass calls make with the targets "clean module" according to the default value for the eclass's BUILD_TARGETS variable. In this build log, it looks like the targets are "built" in the reverse order:

  x86_64-pc-linux-gnu-ld -r -m elf_x86_64 -T ./scripts/module-common.lds --build-id  -o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-drm.ko /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-drm.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-drm.mod.o ;  true
  x86_64-pc-linux-gnu-ld -r -m elf_x86_64 -T ./scripts/module-common.lds --build-id  -o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-modeset.ko /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-modeset.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-modeset.mod.o ;  true
  x86_64-pc-linux-gnu-ld -r -m elf_x86_64 -T ./scripts/module-common.lds --build-id  -o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-uvm.ko /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-uvm.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-uvm.mod.o ;  true
  x86_64-pc-linux-gnu-ld -r -m elf_x86_64 -T ./scripts/module-common.lds --build-id  -o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia.ko /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia.mod.o ;  true
make[1]: Leaving directory '/usr/src/linux-4.9.146'

 === End of kernel modules build ===
 === Start of kernel modules clean ===

make[1]: Entering directory '/usr/src/linux-4.9.146'
make -f ./scripts/Makefile.clean obj=/tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel
  rm -f /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia/nv-interface.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-modeset/nv-modeset-interface.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia/nv-kernel.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-modeset/nv-modeset-kernel.o
  rm -rf /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/.tmp_versions
  rm -f /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/Module.symvers
make[1]: Leaving directory '/usr/src/linux-4.9.146'

 === End of kernel modules clean ===
 === Start of nvidia-settings build ===

make -j5 -C /tmp/portage/x11-drivers/nvidia-drivers-390.87/work//nvidia-settings-390.87/src AR=x86_64-pc-linux-gnu-ar CC=x86_64-pc-linux-gnu-gcc DO_STRIP= LD=x86_64-pc-linux-gnu-gcc LIBDIR=lib64 NVLD=x86_64-pc-linux-gnu-ld NV_VERBOSE=1 RANLIB=x86_64-pc-linux-gnu-ranlib build-xnvctrl

Perhaps we should set BUILD_TARGETS=modules in the ebuilds?
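
The reordering suspected here is at least consistent with how GNU make treats several goals on one command line under -j: it may work on them at the same time. The toy Makefile below is a sketch of that effect only. It is not the nvidia or kernel build, the target bodies and the file name built.ko are invented, and the sleep merely makes the overlap easy to observe.

```sh
#!/bin/sh
# Toy demonstration only; requires GNU make.
command -v make >/dev/null 2>&1 || { echo "make not found, skipping" >&2; exit 0; }

demo_dir=$(mktemp -d)
cat > "$demo_dir/Makefile" <<'EOF'
.PHONY: clean module
# "clean" is made slow on purpose so that a concurrent "module" finishes first.
clean: ; @sleep 1; rm -f built.ko
module: ; @touch built.ko
EOF

# Report whether the fake module file survived the make run.
check_output() {
        if [ -e "$demo_dir/built.ko" ]; then echo present; else echo missing; fi
}

# Serial: clean finishes before module starts, so the output survives.
make -s -C "$demo_dir" -j1 clean module
serial_result=$(check_output)

# Parallel: both goals start together and the slow clean deletes the output.
make -s -C "$demo_dir" -j2 clean module
parallel_result=$(check_output)

echo "-j1: built.ko $serial_result"
echo "-j2: built.ko $parallel_result"
rm -rf "$demo_dir"
```

With GNU make this is expected to report built.ko as present for -j1 and missing for -j2. Running the goals as separate invocations, or dropping the clean goal as proposed above, avoids the overlap.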
Comment 13 Larry the Git Cow gentoo-dev 2018-12-25 14:03:36 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=1a410b8ebbd51b3224de532922316e8a832b07f8

commit 1a410b8ebbd51b3224de532922316e8a832b07f8
Author:     Jeroen Roovers <jer@gentoo.org>
AuthorDate: 2018-12-25 14:00:36 +0000
Commit:     Jeroen Roovers <jer@gentoo.org>
CommitDate: 2018-12-25 14:03:31 +0000

    x11-drivers/nvidia-drivers: Work around make bug
    
    When calling `make -j(2+) clean module', sometimes the `module' target is
    built before the `clean' target is built. Work around this by setting
    BUILT_TARGET=module so that the `clean' target is never built.
    
    Bug: https://bugs.gentoo.org/613578
    Package-Manager: Portage-2.3.52, Repoman-2.3.12
    Signed-off-by: Jeroen Roovers <jer@gentoo.org>

 x11-drivers/nvidia-drivers/nvidia-drivers-340.107.ebuild | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-390.87.ebuild  | 4 +++-
 x11-drivers/nvidia-drivers/nvidia-drivers-410.78.ebuild  | 4 +++-
 x11-drivers/nvidia-drivers/nvidia-drivers-415.18.ebuild  | 4 +++-
 x11-drivers/nvidia-drivers/nvidia-drivers-415.23.ebuild  | 4 +++-
 x11-drivers/nvidia-drivers/nvidia-drivers-415.25.ebuild  | 4 +++-
 6 files changed, 16 insertions(+), 6 deletions(-)
Comment 14 Jeroen Roovers (RETIRED) gentoo-dev 2018-12-25 14:11:12 UTC
(In reply to Larry the Git Cow from comment #13)
> The bug has been referenced in the following commit(s):
> 
> https://gitweb.gentoo.org/repo/gentoo.git/commit/
> ?id=1a410b8ebbd51b3224de532922316e8a832b07f8
> 
> commit 1a410b8ebbd51b3224de532922316e8a832b07f8
> Author:     Jeroen Roovers <jer@gentoo.org>
> AuthorDate: 2018-12-25 14:00:36 +0000
> Commit:     Jeroen Roovers <jer@gentoo.org>
> CommitDate: 2018-12-25 14:03:31 +0000
> 
>     x11-drivers/nvidia-drivers: Work around make bug
>     
>     When calling `make -j(2+) clean module', sometimes the `module' target is
>     built before the `clean' target is built. Work around this by setting
>     BUILT_TARGET=module so that the `clean' target is never built.
>     
>     Bug: https://bugs.gentoo.org/613578
>     Package-Manager: Portage-2.3.52, Repoman-2.3.12
>     Signed-off-by: Jeroen Roovers <jer@gentoo.org>
> 
>  x11-drivers/nvidia-drivers/nvidia-drivers-340.107.ebuild | 2 +-
>  x11-drivers/nvidia-drivers/nvidia-drivers-390.87.ebuild  | 4 +++-

I just did some 50 builds of 390.87 with MAKEOPTS=-j, at times seeing over 80 concurrent jobs. At no time did I see `clean' being built after `module'.