It looks like MAKEOPTS=-j1 can be safely removed from the ebuild. I've just built x11-drivers/nvidia-drivers-375.26 multiple times with different USE combinations on an 8-core CPU, and all builds succeeded. The current build times are very annoying, especially when you play with the kernel config and have to run `emerge @module-rebuild` after each kernel change. The last version of nvidia-drivers that built fast (50 sec) was 358.16-r1, about a year ago. The next version, 361.28, doubled the build time (100 sec), and then, half a year ago, 367.57 doubled it once again (210 sec). With -j1 removed, the current 375.26 builds in 90-110 sec (depending on USE flags) on my system (Core i7-2600K overclocked to 4.5 GHz).
Your anecdotal evidence does not suggest Nvidia fixed their build system, and neither does this:

# qlop -tH nvidia-drivers
nvidia-drivers: 2 minutes, 56 seconds for 323 merges

Do you really care whether it takes 4 minutes or 1.5 minutes to compile? What if it fails to build with parallel make and you didn't properly check, and you reboot and find you have to do it again (and can't use your desktop environment in the meantime)?
(In reply to Jeroen Roovers from comment #1)
> Your anecdotal evidence does not suggest Nvidia fixed their build system and

Sure. When I opened the ebuild, I expected to find a comment with the related bug number near the -j1 line, so I could check the original issue and try to find out whether it was really fixed. I didn't find one, so I opened this report in the hope that someone who knows this issue better than me will check whether it is really fixed now, or whether I was just lucky to build it in parallel without any issues.

> neither does this:
>
> # qlop -tH nvidia-drivers
> nvidia-drivers: 2 minutes, 56 seconds for 323 merges

That says nothing. Here is mine:

# qlop -tH nvidia-drivers
nvidia-drivers: 55 seconds for 256 merges

but the real pain is here:

# qlop -g nvidia-drivers | tail
nvidia-drivers: Thu Mar 23 03:28:44 2017: 88 seconds
nvidia-drivers: Thu Mar 23 03:42:38 2017: 91 seconds
nvidia-drivers: Mon Mar 27 17:06:09 2017: 278 seconds
nvidia-drivers: Mon Mar 27 18:20:30 2017: 270 seconds
nvidia-drivers: Mon Mar 27 19:16:20 2017: 270 seconds
nvidia-drivers: Mon Mar 27 19:46:11 2017: 330 seconds
nvidia-drivers: Mon Mar 27 20:12:53 2017: 275 seconds
nvidia-drivers: Mon Mar 27 20:40:30 2017: 274 seconds
nvidia-drivers: Mon Mar 27 20:59:22 2017: 222 seconds
nvidia-drivers: 256 times

> Do you really care whether it takes 4 minutes or 1.5 minute to compile? What

Yes, I do. After all, it's Gentoo, which is expected to be optimized and fast. Even repoman complains about -j1 in ebuilds, so I suppose it's not a normal thing and shouldn't be used "just in case" without a known issue. Maybe nvidia-drivers is an exception, but, again, that could be noted in a comment in the ebuild.

> if it fails to build with parallel make and you didn't properly check and
> you reboot and you find you have to do it again (and can't use your desktop
> environment in the mean time)?

Actually, I'd expect a failed build, not a successful build that installs a broken nvidia driver.
I've seen several packages that required -j1, and in all cases omitting -j1 resulted in (intermittently) broken builds. And if the build fails, that's okay: then we will know for sure it's still broken, can add -j1 back, and can create an issue with evidence for future reference. That's all I want to see: either no -j1 in the ebuild, or an open issue describing what's broken and why we need -j1 until it's fixed. If there is such an issue and I just failed to find it, please give me the bug number and close this one.
> Yes, I do. After all, it's a Gentoo, which is expected to be optimized and
> fast.

Compiling the nvidia glue code as fast as possible is not exactly a goal of Gentoo Linux. Guaranteeing a *reliable* result (i.e. a minimum of compilation errors) is. Be warned that using "optimized and fast" as an argument might not necessarily be perceived as a particularly strong point in a discussion [1,2].

[1] Have a look at https://fun.irq.dk/funroll-loops.org/ for some historical amusement.
[2] Independent of the question whether the "-j1" workaround used in nvidia-drivers should be re-evaluated or not.
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7848d6701921b7959502d0b9cdd84344a1f5bf0d

commit 7848d6701921b7959502d0b9cdd84344a1f5bf0d
Author: Jeroen Roovers <jer@gentoo.org>
AuthorDate: 2018-12-13 01:23:45 +0000
Commit: Jeroen Roovers <jer@gentoo.org>
CommitDate: 2018-12-13 01:24:14 +0000

x11-drivers/nvidia-drivers: Fix parallel make

Fixes: https://bugs.gentoo.org/613578
Package-Manager: Portage-2.3.52, Repoman-2.3.12
Signed-off-by: Jeroen Roovers <jer@gentoo.org>

 x11-drivers/nvidia-drivers/nvidia-drivers-415.22.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(In reply to Larry the Git Cow from comment #4)
> The bug has been closed via the following commit(s):
>
> https://gitweb.gentoo.org/repo/gentoo.git/commit/
> ?id=7848d6701921b7959502d0b9cdd84344a1f5bf0d
>
> commit 7848d6701921b7959502d0b9cdd84344a1f5bf0d
> Author: Jeroen Roovers <jer@gentoo.org>
> AuthorDate: 2018-12-13 01:23:45 +0000
> Commit: Jeroen Roovers <jer@gentoo.org>
> CommitDate: 2018-12-13 01:24:14 +0000
>
> x11-drivers/nvidia-drivers: Fix parallel make
>
> Fixes: https://bugs.gentoo.org/613578
> Package-Manager: Portage-2.3.52, Repoman-2.3.12
> Signed-off-by: Jeroen Roovers <jer@gentoo.org>
>
> x11-drivers/nvidia-drivers/nvidia-drivers-415.22.ebuild | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

I test-ran the same change on all current ebuilds, and it failed once, in x11-drivers/nvidia-drivers-375.82:

/bin/sh: line 1: /home/jer/portage/x11-drivers/nvidia-drivers-375.82/work/kernel/.tmp_versions/nvidia.mod: No such file or directory
make[2]: *** [scripts/Makefile.build:541: /home/jer/portage/x11-drivers/nvidia-drivers-375.82/work/kernel/nvidia.o] Error 1
make[2]: *** Waiting for unfinished jobs....

But maybe that's something Nvidia has already fixed; I do remember reading something to that effect a while ago. Reopening.
Created attachment 557708
x11-drivers:nvidia-drivers-375.82:20181213-025131.log

Test setup:

#!/bin/sh
for j in nvidia-drivers-*.ebuild; do
    for i in $(seq 1 20); do
        USE="-static-libs -tools" \
        KERNEL_DIR=/usr/src/linux-4.10.13-gentoo \
        MAKEOPTS="-j20" \
        FEATURES=noauto \
        ebuild $j clean setup unpack prepare configure compile install && echo "===== $j $i =====" || break 2
    done
done
echo $j $i

It failed for 375.82 on the twelfth try.
Release notes for 355.06: https://www.nvidia.com/Download/driverResults.aspx/88694/en-us

"Replaced the build system for the NVIDIA kernel modules and updated the installer package and nvidia-installer to use the new build system and kernel module source code layout. For more information about the new build system and layout, see the README document at: ftp://download.nvidia.com/XFree86/packaging/linux/new-kbuild-for-355/"

Release notes for 384.59: https://www.nvidia.com/Download/driverResults.aspx/120916/en-us

"Restored several sanity checks that were inadvertently removed from the kernel module build process in the 355.06 driver."

Perhaps this is what I am seeing in 375, but if true that should also affect 378 and 381.
I've tested the 415.22 branch with a slightly modified version:

#!/bin/sh
for j in /usr/portage/x11-drivers/nvidia-drivers/nvidia-drivers-415.22.ebuild; do
    for i in $(seq 1 32); do
        USE="-static-libs -tools" \
        KERNEL_DIR=/usr/src/linux \
        MAKEOPTS="-j$i" \
        FEATURES=noauto \
        ebuild $j fetch clean setup unpack prepare configure compile install && echo "===== $j $i =====" || break 2
    done
done
echo $j $i

It worked in every single case (MAKEOPTS from -j1 to -j32), finishing with:

===== /usr/portage/x11-drivers/nvidia-drivers/nvidia-drivers-415.22.ebuild 32 =====

So at least for the recent driver version, this seems to be fixed.

Kernel version: 4.18.19-gentoo
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=bad03706202759766c455e463956e28ef02ef204

commit bad03706202759766c455e463956e28ef02ef204
Author: Jeroen Roovers <jer@gentoo.org>
AuthorDate: 2018-12-13 10:03:47 +0000
Commit: Jeroen Roovers <jer@gentoo.org>
CommitDate: 2018-12-13 14:20:50 +0000

x11-drivers/nvidia-drivers: Fix more parallel make

Package-Manager: Portage-2.3.52, Repoman-2.3.12
Bug: https://bugs.gentoo.org/613578
Signed-off-by: Jeroen Roovers <jer@gentoo.org>

 x11-drivers/nvidia-drivers/nvidia-drivers-304.137.ebuild | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-384.130.ebuild | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-387.34.ebuild | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-390.87.ebuild | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-396.54.ebuild | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-410.78.ebuild | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-415.18.ebuild | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)
Created attachment 557980
build.log.xz

(In reply to Larry the Git Cow from comment #9)
> commit bad03706202759766c455e463956e28ef02ef204
> Author: Jeroen Roovers <jer@gentoo.org>
> AuthorDate: 2018-12-13 10:03:47 +0000
> Commit: Jeroen Roovers <jer@gentoo.org>
> CommitDate: 2018-12-13 14:20:50 +0000
>
> x11-drivers/nvidia-drivers: Fix more parallel make

When CPU load is high, nvidia-drivers-390.87 sometimes failed to build after this commit against (at least) the 4.9 series of vanilla-sources. Reverting this commit (or simply setting MAKEOPTS="-j1") restored successful builds. The build breaks at seemingly random places, sometimes in the compile phase, sometimes in install. A build log of a failed build is attached.
I've just had x11-drivers/nvidia-drivers-415.25 fail on install while compiling in parallel with two other modules (app-emulation/virtualbox-modules-6.0.0 and app-emulation/vmware-modules-329.1.2). I prioritize reliable over fast emerges, so I have now added MAKEOPTS=-j1 for x11-drivers/nvidia-drivers in /etc/portage/package.env. This occurs intermittently on both 4- and 8-core AMD processors.
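For reference, a per-package MAKEOPTS override like the one described above can be set up with Portage's package.env mechanism. This is a minimal sketch assuming the default /etc/portage layout; the file name makeopts-j1.conf is an arbitrary, hypothetical choice:

```
# /etc/portage/env/makeopts-j1.conf
MAKEOPTS="-j1"

# /etc/portage/package.env
x11-drivers/nvidia-drivers makeopts-j1.conf
```

With this, only nvidia-drivers is built serially. Every other package keeps the global MAKEOPTS from make.conf.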
Comment on attachment 557980
build.log.xz

linux-mod.eclass calls make with the targets "clean module", according to the default value of the eclass's BUILD_TARGETS variable. In this build log, it looks like the targets are "built" in the reverse order:

x86_64-pc-linux-gnu-ld -r -m elf_x86_64 -T ./scripts/module-common.lds --build-id -o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-drm.ko /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-drm.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-drm.mod.o ; true
x86_64-pc-linux-gnu-ld -r -m elf_x86_64 -T ./scripts/module-common.lds --build-id -o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-modeset.ko /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-modeset.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-modeset.mod.o ; true
x86_64-pc-linux-gnu-ld -r -m elf_x86_64 -T ./scripts/module-common.lds --build-id -o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-uvm.ko /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-uvm.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-uvm.mod.o ; true
x86_64-pc-linux-gnu-ld -r -m elf_x86_64 -T ./scripts/module-common.lds --build-id -o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia.ko /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia.mod.o ; true
make[1]: Leaving directory '/usr/src/linux-4.9.146'
=== End of kernel modules build ===
=== Start of kernel modules clean ===
make[1]: Entering directory '/usr/src/linux-4.9.146'
make -f ./scripts/Makefile.clean obj=/tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel
rm -f /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia/nv-interface.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-modeset/nv-modeset-interface.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia/nv-kernel.o /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/nvidia-modeset/nv-modeset-kernel.o
rm -rf /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/.tmp_versions
rm -f /tmp/portage/x11-drivers/nvidia-drivers-390.87/work/kernel/Module.symvers
make[1]: Leaving directory '/usr/src/linux-4.9.146'
=== End of kernel modules clean ===
=== Start of nvidia-settings build ===
make -j5 -C /tmp/portage/x11-drivers/nvidia-drivers-390.87/work//nvidia-settings-390.87/src AR=x86_64-pc-linux-gnu-ar CC=x86_64-pc-linux-gnu-gcc DO_STRIP= LD=x86_64-pc-linux-gnu-gcc LIBDIR=lib64 NVLD=x86_64-pc-linux-gnu-ld NV_VERBOSE=1 RANLIB=x86_64-pc-linux-gnu-ranlib build-xnvctrl

Perhaps we should set BUILD_TARGETS=modules in the ebuilds?
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=1a410b8ebbd51b3224de532922316e8a832b07f8

commit 1a410b8ebbd51b3224de532922316e8a832b07f8
Author: Jeroen Roovers <jer@gentoo.org>
AuthorDate: 2018-12-25 14:00:36 +0000
Commit: Jeroen Roovers <jer@gentoo.org>
CommitDate: 2018-12-25 14:03:31 +0000

x11-drivers/nvidia-drivers: Work around make bug

When calling `make -j(2+) clean module', sometimes the `module' target is
built before the `clean' target is built. Work around this by setting
BUILT_TARGET=module so that the `clean' target is never built.

Bug: https://bugs.gentoo.org/613578
Package-Manager: Portage-2.3.52, Repoman-2.3.12
Signed-off-by: Jeroen Roovers <jer@gentoo.org>

 x11-drivers/nvidia-drivers/nvidia-drivers-340.107.ebuild | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-390.87.ebuild | 4 +++-
 x11-drivers/nvidia-drivers/nvidia-drivers-410.78.ebuild | 4 +++-
 x11-drivers/nvidia-drivers/nvidia-drivers-415.18.ebuild | 4 +++-
 x11-drivers/nvidia-drivers/nvidia-drivers-415.23.ebuild | 4 +++-
 x11-drivers/nvidia-drivers/nvidia-drivers-415.25.ebuild | 4 +++-
 6 files changed, 16 insertions(+), 6 deletions(-)
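The kind of race described in this commit can be reproduced outside Portage. The following is a minimal sketch that assumes GNU make is installed. The Makefile it writes is a hypothetical stand-in with `clean' and `module' targets, not NVIDIA's kernel/Makefile or linux-mod.eclass. When both goals are passed to a single `make -j2' invocation, nothing in the Makefile orders them, so make may run them at the same time. Invoking make once per goal always runs them in order, which is an alternative to dropping `clean' from BUILD_TARGETS.

```sh
#!/bin/sh
# Sketch: why "make -jN clean module" can misbehave, and one way to serialize it.
# The Makefile below is a hypothetical stand-in, not NVIDIA's kernel/Makefile.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Recipe lines must start with a tab, so write them with printf's \t.
printf '%s\n' '.PHONY: clean module' > "$dir/Makefile"
printf 'clean:\n\tsleep 1; rm -f out.ko; echo clean >> order.log\n' >> "$dir/Makefile"
printf 'module:\n\techo built > out.ko; echo module >> order.log\n' >> "$dir/Makefile"

# One invocation, two goals: with -j2, make is free to run both at once.
# If they overlap, the sleep makes "clean" finish last and delete the
# out.ko that "module" just produced.
(cd "$dir" && make -s -j2 clean module)
racy_order=$(tr '\n' ' ' < "$dir/order.log")
if [ -f "$dir/out.ko" ]; then racy_result=present; else racy_result=missing; fi
echo "one invocation: order=[$racy_order] out.ko $racy_result"

# One invocation per goal: "clean" always completes before "module"
# starts, whatever -j is set to.
rm -f "$dir/order.log"
for goal in clean module; do
    (cd "$dir" && make -s -j2 "$goal")
done
serial_order=$(tr '\n' ' ' < "$dir/order.log")
if [ -f "$dir/out.ko" ]; then serial_result=present; else serial_result=missing; fi
echo "one per goal:   order=[$serial_order] out.ko $serial_result"
```

The first line of output depends on scheduling, so it is printed rather than relied on. The second line should always show clean before module with out.ko present.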
(In reply to Larry the Git Cow from comment #13)
> The bug has been referenced in the following commit(s):
>
> https://gitweb.gentoo.org/repo/gentoo.git/commit/
> ?id=1a410b8ebbd51b3224de532922316e8a832b07f8
>
> commit 1a410b8ebbd51b3224de532922316e8a832b07f8
> Author: Jeroen Roovers <jer@gentoo.org>
> AuthorDate: 2018-12-25 14:00:36 +0000
> Commit: Jeroen Roovers <jer@gentoo.org>
> CommitDate: 2018-12-25 14:03:31 +0000
>
> x11-drivers/nvidia-drivers: Work around make bug
>
> When calling `make -j(2+) clean module', sometimes the `module' target is
> built before the `clean' target is built. Work around this by setting
> BUILT_TARGET=module so that the `clean' target is never built.
>
> Bug: https://bugs.gentoo.org/613578
> Package-Manager: Portage-2.3.52, Repoman-2.3.12
> Signed-off-by: Jeroen Roovers <jer@gentoo.org>
>
> x11-drivers/nvidia-drivers/nvidia-drivers-340.107.ebuild | 2 +-
> x11-drivers/nvidia-drivers/nvidia-drivers-390.87.ebuild | 4 +++-

I just did about 50 builds of 390.87 with MAKEOPTS=-j, at times seeing over 80 concurrent jobs. At no point did I see `clean' being built after `module'.