744517 – media-libs/opensubdiv-3.4.3 fails with CUDA11: "Value 'compute_30' is not defined for option 'gpu-architecture'"

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	AMD64 Linux

Importance:	Normal normal
Assignee:	Adrian

URL:
Whiteboard:
Keywords:	PATCH, PullRequest

Duplicates (2):	799134 811054
Depends on:
Blocks:

Reported:	2020-09-24 15:38 UTC by Aiwendil
Modified:	2021-11-22 14:11 UTC
CC List:	10 users

See Also:	https://github.com/gentoo/gentoo/pull/18516 https://github.com/gentoo/gentoo/pull/18663 https://github.com/gentoo/gentoo/pull/22852
Package list:
Runtime testing required:	---

Attachments
Failed build log (media-libs:opensubdiv-3.4.3:20200924-123831.log,173.85 KB, text/x-log) 2020-09-24 15:40 UTC, Aiwendil
opensubdiv-3.4.3-add-CUDA11-compatibility.patch (opensubdiv-3.4.3-add-CUDA11-compatibility.patch,713 bytes, patch) 2020-09-24 15:44 UTC, Aiwendil
ebuild patch (opensubdiv_ebuild_cuda11_patch.patch,544 bytes, patch) 2020-09-24 15:48 UTC, Aiwendil
New CUDA-11 patch (CUDA-11-support.patch,653 bytes, patch) 2020-12-18 05:39 UTC, Neil
Attempted patch for 3.4.4 (opensubdiv-3.4.4-add-CUDA11-compatibility.patch,709 bytes, patch) 2021-11-06 12:23 UTC, Satori80a

Description Aiwendil 2020-09-24 15:38:14 UTC

Pretty much a continuation of bug #641242 

Seems compute_30 was dropped in CUDA11 making the build of opensubdiv fail.

Reproducible: Always

Steps to Reproduce:
1. emerge dev-util/nvidia-cuda-toolkit-11.0.3
2. emerge media-libs/opensubdiv-3.4.3
Actual Results:  
opensubdiv-3.4.3 fails to build

Expected Results:  
opensubdiv-3.4.3 builds

Comment 1 Aiwendil 2020-09-24 15:40:15 UTC

Created attachment 662305 [details]
Failed build log

Comment 2 Aiwendil 2020-09-24 15:44:24 UTC

Created attachment 662308 [details, diff]
opensubdiv-3.4.3-add-CUDA11-compatibility.patch

Modified patch from bug #641242

I added "compute_52" as gpu architecture for CUDA11 because it was the lowest version nvcc --help offered me that was not going to be deprecated...not because I really have a clue what version I should use.

Comment 3 Aiwendil 2020-09-24 15:48:22 UTC

Created attachment 662311 [details, diff]
ebuild patch

Patch for the opensubdiv ebuild to use the CUDA11 patch above and not the CUDA9 one from bug #641242

Comment 4 Adrian 2020-09-25 10:42:57 UTC

The same error affects me as well.

A workaround might be to choose between 35, 50, 52 and 53, which are the values common to the currently supported nvidia-cuda-toolkit 8 and 11. I think setting 52 might be a good compromise, as it is the base model for the Maxwell generation, but I am not sure whether it would prevent users of pre-Maxwell cards (pre-2014) from using opensubdiv, and I think it also prevents CUDA from using the later features in Pascal, Volta and Turing cards.

The best solution would be to create a USE_EXPAND variable for the GPU architecture, allowing everyone to select the architecture best suited to their system. One list is found at https://en.wikipedia.org/wiki/CUDA#GPUs_supported.

Others have mentioned this at https://github.com/archenroot/gentoo-overlay/issues/24.

It would be most useful where performance is important, such as rendering for blender, or taking advantage of additional instructions in graphics card assisted computations for machine learning, such as opencv.

Comment 5 Aiwendil 2020-09-26 12:46:54 UTC

I tested it with a 780Ti (max supported model compute_35). It builds fine with the patch, also blender builds fine...but then lacks the ability to select any hardware for CUDA. That looks very hard to debug in case someone is not aware of this issue as the reason is not directly obvious in blender (it works if I change to compute_35). So yeah, setting compute_52 isn't really the best way to go about it.

A solution with a make.conf variable to set the compute_model for CUDA is probably the cleanest way.

Comment 6 Adrian 2020-09-30 23:44:28 UTC

Thank you, Aiwendil, for your testing; I don't have a card old enough any more. I start holidays in a week and will examine the list of packages which might benefit from setting the GPU architecture and write up a submission for a USE_EXPAND variable.

Otherwise it appears that setting the lowest version of compute supported by both the oldest and latest versions of nvidia-cuda-toolkit would be the safest policy, so compute_35 presently.
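The "lowest version supported by both toolkits" rule can be sketched as a small shell function. This is only an illustration: the name lowest_common_arch is made up here, and the two lists in the usage line are placeholder inputs, not the authoritative architecture lists of any nvidia-cuda-toolkit release.

```shell
# Hypothetical helper: print the lowest compute capability that appears in
# both space-separated lists, formatted as an nvcc architecture name.
lowest_common_arch() {
    local old_list=$1 new_list=$2 arch best=""
    for arch in $old_list; do
        case " $new_list " in
            *" $arch "*)
                # Present in both lists; keep it if it is the lowest so far.
                if [ -z "$best" ] || [ "$arch" -lt "$best" ]; then
                    best=$arch
                fi
                ;;
        esac
    done
    # Return a non-zero status when the lists share no entry.
    [ -n "$best" ] && printf 'compute_%s\n' "$best"
}

# Placeholder lists, for illustration only.
lowest_common_arch "30 35 50 52 53" "35 50 52 53 80"   # prints compute_35
```

The result would still need re-checking whenever a toolkit version is added or dropped, which is the fragility a user-selectable variable avoids.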

I suggest anyone affected by this bug use your patch in the interim.

Comment 7 Kai Blaschke 2020-11-13 22:04:20 UTC

I concur with changing it to the lowest available version supported by each cuda version, as using a higher compute shader version potentially excludes older hardware from being usable. A USE_EXPAND variable would be the best option, providing a safe default for the currently installed cuda version, as using newer shader language versions potentially improves performance on recent hardware.

If any hardware tests are needed, I've got a good assortment of nVidia boards as old as a 7800 GT (from around 2005/2006) lying around which I can slam into my test box at any time. Just drop me a message and I'll see what I can do.

Comment 8 Neil 2020-12-18 05:39:30 UTC

Created attachment 678658 [details, diff]
New CUDA-11 patch

The old patch does not work anymore, as CMakeLists.txt has been updated to support CUDA 9.

Comment 9 Adrian 2020-12-18 06:29:49 UTC

Thank you for the patch, Neil.

I am working on a different method to set the value in GitHub PR 18663, where you can set OSD_CUDA_COMPUTE_CAPABILITIES="compute_35" in make.conf and opensubdiv will be compiled to work with your card.

If you don't mind please try it out and let me know how it works for you as I think this will be better than hard coding a value that is not optimal for most people.

Comment 10 Neil 2020-12-18 07:08:08 UTC

I would be happy to try those changes out. I haven't set up my Gentoo repo to clone from git, so it'll be a little while. I'm following the docs here: https://wiki.gentoo.org/wiki/GitHub_Pull_Requests

Specifically, I'm setting up the sync type on /etc/portage/repos.conf/ to be git.

Comment 11 Adrian 2020-12-18 09:01:47 UTC

I use that guide for all my PRs, although I use variant a, so I can't comment on how variant b works.

If you have trouble you can use the opensubdiv branch of my repository by setting /etc/portage/repos.conf/ebuild-overlay.conf to
[ebuild-overlay]
location = /usr/local/portage/ebuild-overlay
sync-type = git
sync-uri = https://github.com/redchillipadi/ebuild-overlay.git
priority = 50
auto-sync = yes

and then update and switch to the opensubdiv branch with
emerge --sync    # or: emaint sync --repo ebuild-overlay
cd /usr/local/portage/ebuild-overlay
git checkout opensubdiv

You might need to do
echo "ebuild-overlay" > /usr/local/portage/ebuild-overlay/profiles/repo_name
echo "masters = gentoo" > /usr/local/portage/ebuild-overlay/metadata/layout.conf
echo "thin-manifests = true" >> /usr/local/portage/ebuild-overlay/metadata/layout.conf
echo "media-libs/opensubdiv ~amd64" >> /etc/portage/package.accept_keywords/opensubdiv

Then you can
emerge opensubdiv

Comment 12 Neil 2020-12-18 17:40:02 UTC

I added your github exactly as you suggested, and I was able to emerge with your patches. It built, and reinstalled the package just fine. However, nvcc spit out a warning:



-- Generating dependency file: /var/tmp/portage/media-libs/opensubdiv-3.4.3/work/opensubdiv-3.4.3_build/opensubdiv/CMakeFiles/osd_dynamic_gpu.dir/osd/osd_dynamic_gpu_generated_cudaKernel.cu.o.NVCC-depend

/opt/cuda/bin/nvcc -M -D__CUDACC__ /var/tmp/portage/media-libs/opensubdiv-3.4.3/work/OpenSubdiv-3_4_3/opensubdiv/osd/cudaKernel.cu -o /var/tmp/portage/media-libs/opensubdiv-3.4.3/work/opensubdiv-3.4.3_build/opensubdiv/CMakeFiles/osd_dynamic_gpu.dir/osd/osd_dynamic_gpu_generated_cudaKernel.cu.o.NVCC-depend -ccbin /usr/x86_64-pc-linux-gnu/gcc-bin/8.4.0/gcc -m64 -Dosd_dynamic_gpu_EXPORTS -DOPENSUBDIV_VERSION_STRING=\"3.4.3\" -DOPENSUBDIV_HAS_OPENGL -DOSD_USES_INTERNAL_GLAPILOADER -DOPENSUBDIV_HAS_OPENMP -DGLFW_VERSION_3 -DOPENSUBDIV_HAS_GLSL_TRANSFORM_FEEDBACK -DOPENSUBDIV_HAS_GLSL_COMPUTE -DOPENSUBDIV_HAS_OPENCL -DOPENSUBDIV_HAS_CUDA -DCUDA_ENABLE_DEPRECATED=0 -Xcompiler ,\"-O2\",\"-pipe\",\"-fPIC\" -Xcompiler -fPIC --gpu-architecture compute_35 -DNVCC -I/opt/cuda/include -I/var/tmp/portage/media-libs/opensubdiv-3.4.3/work/OpenSubdiv-3_4_3/opensubdiv -I/usr/include -I/var/tmp/portage/media-libs/opensubdiv-3.4.3/work/OpenSubdiv-3_4_3/glLoader

nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).



The second line is quite long, but it has a --gpu-architecture compute_35. Again, everything built fine, but it will probably break again with a new version of cuda. Maybe you could make a fallback setup, where it chooses compute_52 for any version greater than cuda 11. This way, when a new version of cuda comes out, your package won't break, and you can make a proper patch with the correct minimum version.
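The fallback idea above could look roughly like the following shell sketch. It is not what the ebuild or the PR does: the function name pick_gpu_arch and the optional override argument are hypothetical, and the branch for toolkits older than 11 simply assumes the compute_30 value named in the CUDA 11 error message.

```shell
# Hypothetical helper: choose a value for nvcc's --gpu-architecture option.
#   $1 = CUDA toolkit version, e.g. 11.0.3
#   $2 = optional user override, e.g. compute_61
pick_gpu_arch() {
    local cuda_version=$1 override=${2:-} major
    if [ -n "$override" ]; then
        # An explicit user choice always wins.
        printf '%s\n' "$override"
        return 0
    fi
    major=${cuda_version%%.*}
    case $major in
        ''|*[!0-9]*)
            echo "cannot parse CUDA version '$cuda_version'" >&2
            return 1
            ;;
    esac
    if [ "$major" -lt 11 ]; then
        printf 'compute_30\n'    # assumption: value from the CUDA 11 error message
    elif [ "$major" -eq 11 ]; then
        printf 'compute_35\n'    # builds with CUDA 11, with a deprecation warning
    else
        printf 'compute_52\n'    # fallback suggested above for later releases
    fi
}

# Usage:
#   pick_gpu_arch 11.0.3              -> compute_35
#   pick_gpu_arch 11.0.3 compute_61   -> compute_61
```

Letting the override take precedence means a user can repair a future breakage without waiting for a new patch.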

Or even better, you might be able to specify a USE flag for whichever architecture the user needs, so that they can just fix this problem in the future with a flag override.

I'm decently new to gentoo, and your instructions for cloning were super helpful. Thank you!

Comment 13 Adrian 2020-12-18 20:47:04 UTC

I am glad my instructions worked, thanks for testing the ebuild.

As for setting a default option, it would need to be the lowest possible family of cards supported by each nvidia-cuda-toolkit version, otherwise it would prevent people with low spec cards from compiling. Keeping it up to date is fragile as it would need checking each time a new version of nvidia-cuda-toolkit is released.

So I am working towards your latter option where I have a flag in the ebuild for each family of cards and allow the user to select the appropriate option. To get approval for a new USE_EXPAND variable to do this I need to be able to show that at least five packages would benefit. My changes here in opensubdiv are the first step in this direction.

Currently you can get the same functionality by setting OSD_CUDA_COMPUTE_CAPABILITIES="compute_XX", where XX is the compute capability of your card without the dot, so for compute capability 5.2 just put compute_52.
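As a rough illustration of that naming rule, the shell function below turns a dotted compute capability into the compute_XX form. The function name capability_to_arch is hypothetical and not part of the ebuild.

```shell
# Hypothetical helper: convert a dotted compute capability (e.g. 5.2) into
# the compute_XX spelling expected by OSD_CUDA_COMPUTE_CAPABILITIES.
capability_to_arch() {
    local capability=$1
    case $capability in
        [0-9].[0-9]|[0-9][0-9].[0-9])
            # Join the major and minor digits without the dot.
            printf 'compute_%s%s\n' "${capability%.*}" "${capability#*.}"
            ;;
        *)
            echo "expected a capability such as 5.2, got '$capability'" >&2
            return 1
            ;;
    esac
}

# Usage: capability_to_arch 5.2   -> compute_52
```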

You can look up your card at https://developer.nvidia.com/cuda-gpus
or run /opt/cuda/extras/demo_suite/deviceQuery | grep 'CUDA Capability' (but I think deviceQuery is not present in nvidia-cuda-toolkit-11, so hopefully you already know the model of your card)
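If deviceQuery is available, the grep above can be narrowed to just the capability number. The sketch below assumes the relevant output line has the form "CUDA Capability Major/Minor version number:    3.5"; the function name capability_from_devicequery is hypothetical.

```shell
# Hypothetical helper: read deviceQuery output on stdin and print the
# compute capability (e.g. 3.5) of the first device listed.
capability_from_devicequery() {
    sed -n 's/^.*CUDA Capability[^:]*:[[:space:]]*\([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Usage, assuming deviceQuery is installed at the path mentioned above:
#   /opt/cuda/extras/demo_suite/deviceQuery | capability_from_devicequery
```

Dropping the dot from the printed number gives the XX part of compute_XX.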

Comment 14 Ionen Wolkens gentoo-dev 2021-06-29 12:19:27 UTC

*** Bug 799134 has been marked as a duplicate of this bug. ***

Comment 15 Sam James archtester 2021-08-30 05:11:26 UTC

*** Bug 811054 has been marked as a duplicate of this bug. ***

Comment 16 Satori80a 2021-11-06 12:22:08 UTC

I've been trying to make this work for my RTX 3080 Ti (Ampere) with opensubdiv-3.4.4-r1 via local overlay, but I obviously don't know what I'm doing.

I keep getting:

* ERROR: media-libs/opensubdiv-3.4.4-r1::localrepo failed (prepare phase):
 *   patch -p1  failed with /var/tmp/portage/media-libs/opensubdiv-3.4.4-r1/files/opensubdiv-3.4.4-add-CUDA11-compatibility.patch

Any help, please?

TIA,

Scott

Comment 17 Satori80a 2021-11-06 12:23:53 UTC

Created attachment 749055 [details, diff]
Attempted patch for 3.4.4

Comment 18 Fat-Zer 2021-11-06 20:23:01 UTC

(In reply to Satori80a from comment #16)
> I've been trying to make this work for my RTX 3080 Ti (Ampere) with
> opensubdiv-3.4.4-r1 via local overlay, but I obviously don't know what I'm
> doing.
> 
> I keep getting:
> 
> * ERROR: media-libs/opensubdiv-3.4.4-r1::localrepo failed (prepare phase):
>  *   patch -p1  failed with /var/tmp/portage/media-libs/opensubdiv-3.4.4-r1/files/opensubdiv-3.4.4-add-CUDA11-compatibility.patch
> 
> Any help, please?
> 
> TIA,
> 
> Scott

Portage (since EAPI=5) requires that patches are provided in the format accepted by the `patch -p1` command. The arg `-p1` is used to strip the first component of the path (see `man patch` for details), so essentially in order to be applied by portage your patch should start with lines like:

    --- a/CMakeLists.txt	2021-02-05 18:24:39.000000000 -0700
    +++ b/CMakeLists.txt	2021-11-06 04:36:45.780016396 -0700

Note the a/ and b/ components in front of the filenames. The directory names can be arbitrary, but they must be present; such patches are commonly produced by git. In order to produce compliant patches you may:
 
 * manually edit your existing patch
 * use `git diff` or `git format-patch` (or counterparts for whatever vcs you are using)
 * `cd ..` from root dir of a project and create a patch with something like `diff -u OpenSubdiv-3_4_4/CMakeLists.txt{.old,}`
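A quick way to check an existing patch before handing it to portage is to look at its first header pair. The following shell sketch is only a heuristic (it ignores file names containing spaces), and the function name patch_is_p1_ready is hypothetical.

```shell
# Hypothetical helper: succeed if the first "---" and "+++" header lines of a
# unified diff name paths with a leading directory that `patch -p1` can strip.
patch_is_p1_ready() {
    local patch_file=$1 marker header path
    for marker in '--- ' '+++ '; do
        # First line starting with this marker; fail if there is none.
        header=$(grep -m 1 -e "^$marker" "$patch_file") || return 1
        path=${header#"$marker"}      # drop the marker itself
        path=${path%%[[:space:]]*}    # drop the timestamp after the file name
        case $path in
            /dev/null|?*/?*) ;;       # added/removed file, or dir/file form
            *) return 1 ;;            # bare file name: -p1 would not apply
        esac
    done
}

# Usage:
#   patch_is_p1_ready files/my-fix.patch && echo "looks fine for -p1"
```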

----------------

PS: It would be more productive to ask such questions on the IRC channel #gentoo-dev-help or on the Gentoo forum.

Comment 19 Larry the Git Cow gentoo-dev 2021-11-22 14:11:10 UTC

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=fc0a2d9cd04c458e48543abea41bba7882913e93

commit fc0a2d9cd04c458e48543abea41bba7882913e93
Author:     Alexander Golubev <fatzer2@gmail.com>
AuthorDate: 2021-11-06 23:14:33 +0000
Commit:     Joonas Niilola <juippis@gentoo.org>
CommitDate: 2021-11-22 14:10:19 +0000

    media-libs/opensubdiv: use cuda eclass
    
    * Utilize cuda eclass and let it handle gcc selection instead of forcing
      an outdated version.
    * Add a fix to provide sane defaults when compiling against a recent
      enough CUDA versions.
    * Add an option to pass user-specified NVCCFLAGS and prevent cmake from
      overriding them.
    
    Closes: https://bugs.gentoo.org/744517
    Closes: https://bugs.gentoo.org/751382
    Signed-off-by: Alexander Golubev <fatzer2@gmail.com>
    Closes: https://github.com/gentoo/gentoo/pull/22852
    Signed-off-by: Joonas Niilola <juippis@gentoo.org>

 ...opensubdiv-3.4.4-add-CUDA11-compatibility.patch | 19 +++++
 media-libs/opensubdiv/opensubdiv-3.4.4-r2.ebuild   | 93 ++++++++++++++++++++++
 2 files changed, 112 insertions(+)