| Summary: | sci-libs/vtk-9.2.5: computes absurdly large build-time RAM requirement | | |
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Will Simoneau <bugzilla> |
| Component: | Current packages | Assignee: | Gentoo Science Related Packages <sci> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | negril.nx+gentoo, proxy-maint, waebbl-gentoo |
| Priority: | Normal | Keywords: | PullRequest |
| Version: | unspecified | | |
| Hardware: | All | | |
| OS: | Linux | | |
| See Also: | https://github.com/gentoo/gentoo/pull/31487 | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |
Description (Will Simoneau, 2023-03-14 19:51:18 UTC):
> Multiplying the memory requirement by the number of build jobs is complete nonsense. The minimum amount of memory required for the build to succeed is in general *NOT* a linear function of the number of parallel build jobs.
It's a fair rule of thumb to say each job may take up to ~2GB. It's conservative, and yes, it doesn't work so well with large numbers of cores or large amounts of RAM.

It could indeed be tweaked, but it's not a totally unreasonable starting point.
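To make the scale of the problem concrete, here is a minimal sketch (not the actual check-reqs or ebuild code) of the linear jobs-times-2G estimate described above; the -j52 job count is the one reported later in this thread:

```bash
#!/bin/bash
# Illustrative sketch only, not the real check-reqs/ebuild logic:
# a straight jobs-times-2G estimate blows up on machines with many cores.
jobs=52        # e.g. MAKEOPTS="-j52", as used later in this thread
per_job_gb=2   # the ~2G-per-job rule of thumb
echo "estimated build RAM: $(( jobs * per_job_gb ))G"   # prints 104G
```

With that many jobs the estimate lands far above what the build actually needs, which is what triggers the check-reqs failure described in this bug.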
(In reply to Will Simoneau from comment #0)
> IMO it might be reasonable to just change the minimum RAM check to trigger a
> warning instead of an error.

The eclass always triggers an error if the amount is exceeded. It does, however, have a user-settable flag to issue only a warning instead of failing. You might give this a try. I'd be happy if you could report any findings so we can improve the way we calculate the requirements.

Like Sam said, it's a conservative approach to estimating the needed amount of RAM. Individual nvcc tasks require up to 7G of RAM. Some files require less, but most of the files I watched were close to this amount.

I have a machine with only one CPU (8 cores, 16 threads) and 32G RAM (+32G swap) and ran into an OOM-like scenario. It wasn't a kernel OOM, but the machine became so slow that there was virtually no responsiveness left. After some time in this state I hard-reset the machine to get it back to working. The cuda USE flag had been masked for a considerable time because of major issues in the last few versions, and was only re-enabled with v9.2, IIRC.

(In reply to Bernd from comment #2)
> (In reply to Will Simoneau from comment #0)
> > IMO it might be reasonable to just change the minimum RAM check to trigger a
> > warning instead of an error.
>
> The eclass always triggers an error if the amount is exceeded. It does,
> however, have a user-settable flag to issue only a warning instead of
> failing. You might give this a try.

Thanks for the tip - I wasn't aware that I could just set CHECKREQS_DONOTHING=1 to get the behavior I wanted.

> I'd be happy if you could report any findings so we can improve the way we
> calculate the requirements.

FWIW, peak memory usage during a build of sci-libs/vtk-9.2.5[cuda] with:

dev-util/nvidia-cuda-toolkit-11.8.0-r3
sys-devel/gcc-11.3.1_p20230120-r1
sys-devel/binutils-2.39-r4
MAKEOPTS="-j52 -l128"
VTK_CUDA_ARCH=pascal

... seems to have been only ~9.1G. I did see one nvcc process peak at ~7GB RSS and a few others at ~4.5GB RSS, but I didn't see multiple large-RSS jobs running simultaneously. (Which of course might come down to pure luck / the arbitrary execution order of the individual compile jobs.)

> I have a machine with only one CPU (8 cores, 16 threads) and 32G RAM (+32G
> swap) and ran into an OOM-like scenario. It wasn't a kernel OOM, but the
> machine became so slow that there was virtually no responsiveness left.
> After some time in this state I hard-reset the machine to get it back to
> working.

Yeah, IME Linux does a poor job of handling parallel build jobs pushing the system deep into swap. In theory that can be dealt with by appropriate use of cgroups, though I personally consider that approach far too complicated to bother with. Instead I just configure machines that have >=32GB RAM or so without swap. Can't softlock due to a swap-storm if there isn't any swap to begin with ;-) I find it much nicer overall to just let the OOM-killer step in when necessary.

I just tested without swap with -j8 and it finished without hitting OOM, but at some points there was only ~1-2G of free RAM left. My approach now would be to require 4*7G if more than 4 jobs are set in MAKEOPTS/NINJAOPTS, and jobs*7G otherwise, so at most 28G of RAM would be required.
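A minimal sketch of that proposed cap, assuming the job count comes from MAKEOPTS; this is not the exact logic that later landed in the ebuild, and the values are hard-coded here for illustration:

```bash
#!/bin/bash
# Sketch of the proposed cap: at most 4 parallel nvcc calls at ~7G each,
# so the computed requirement never exceeds 28G.
jobs=8                                   # e.g. from MAKEOPTS="-j8"
per_nvcc_gb=7                            # observed peak RSS of one nvcc call
capped_jobs=$(( jobs > 4 ? 4 : jobs ))   # cap at 4 parallel nvcc calls
echo "required RAM: $(( capped_jobs * per_nvcc_gb ))G"   # prints 28G for -j8
```

If such a check still trips on a particular machine, the thread above notes that setting CHECKREQS_DONOTHING=1 makes the check-reqs failure non-fatal (warning only).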
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=1590d50aab5daa504272ee6104c5b06e3d5d037b

commit 1590d50aab5daa504272ee6104c5b06e3d5d037b
Author:     Paul Zander <negril.nx+gentoo@gmail.com>
AuthorDate: 2023-06-16 16:32:01 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-06-28 21:09:20 +0000

    sci-libs/vtk: reduce required memory for cuda compilation

    Prior logic assumes infinite parallel nvcc calls, while real-life testing
    shows a max of 4. This adds crude logic to require no more memory than
    needed for 4 parallel calls.

    Bug: https://bugs.gentoo.org/901241
    Signed-off-by: Paul Zander <negril.nx+gentoo@gmail.com>
    Signed-off-by: Sam James <sam@gentoo.org>

 sci-libs/vtk/vtk-9.2.5.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Thanks a lot for the patch and the update to 9.2.6. I sadly didn't have any time during the last few weeks to look after my Gentoo ebuilds at all and am thankful you took care of it.

(In reply to Bernd from comment #6)
> Thanks a lot for the patch and the update to 9.2.6. I sadly didn't have any
> time during the last few weeks to look after my Gentoo ebuilds at all and am
> thankful you took care of it.

No worries Bernd, just glad you're OK. I was wondering about sending an email. Hope to speak soon.