Bug 266568 - Request: hardmask dev-util/nvidia-cuda-{toolkit, sdk}-2.1
Summary: Request: hardmask dev-util/nvidia-cuda-{toolkit, sdk}-2.1
Status: RESOLVED WONTFIX
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: All Linux
Importance: High normal
Assignee: Michal Januszewski (RETIRED)
Reported: 2009-04-17 19:14 UTC by Gottfried Munda
Modified: 2009-09-25 18:09 UTC
Description Gottfried Munda 2009-04-17 19:14:16 UTC
As apparently CUDA 2.1 is broken in several places it should be hardmasked. nvcc from CUDA 2.1 refuses to compile the CUDA Templates [1], whereas it builds fine with CUDA 2.0 (and 2.2).
I can also report random crashes (segfaults) in CUDA applications that work fine with 2.0.

Proposed Solution: Wait for CUDA 2.2 or stick with 2.0

[1] http://cudatemplates.sourceforge.net/

Reproducible: Always

Steps to Reproduce:
1. Check out cudatemplates: svn co https://cudatemplates.svn.sourceforge.net/svnroot/cudatemplates/trunk
2. Run cmake . in cudatemplates/trunk
3. Run make

Actual Results:  
compile error

Expected Results:  
No error
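
The steps above can be wrapped in a small script that also keeps the complete build output, which makes it easier to attach exact error messages. This is only a sketch: the function names, the default checkout directory and the default log file name are made up for illustration, and only the repository URL comes from step 1.

```shell
#!/bin/sh
# Sketch only: run_logged, reproduce and the default names below are hypothetical.
# REPO_URL is the URL given in step 1 of the report.
REPO_URL="https://cudatemplates.svn.sourceforge.net/svnroot/cudatemplates/trunk"

# Record the command line in the log, run the command with stdout and stderr
# appended to the log, and return the command's exit status.
run_logged() {
    logfile=$1
    shift
    printf '$ %s\n' "$*" >>"$logfile"
    "$@" >>"$logfile" 2>&1
}

# Check out, configure and build, stopping at the first failing step.
# Pass an absolute path as the log file, because the build runs inside the checkout.
reproduce() {
    workdir=${1:-cudatemplates-trunk}
    logfile=${2:-$PWD/cudatemplates-build.log}
    run_logged "$logfile" svn co "$REPO_URL" "$workdir" &&
    (cd "$workdir" &&
        run_logged "$logfile" cmake . &&
        run_logged "$logfile" make)
}
```

Calling reproduce with no arguments would check out into ./cudatemplates-trunk and write the log to ./cudatemplates-build.log; a non-zero return value means one of the three steps failed.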
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2009-04-18 15:24:02 UTC
Why do you think it's impossible to fix the problems? Masking is quite a radical solution and usually solves nothing.
Comment 2 Michal Januszewski (RETIRED) gentoo-dev 2009-04-18 16:11:55 UTC
(In reply to comment #0)
> As apparently CUDA 2.1 is broken in several places it should be hardmasked.
> nvcc from CUDA 2.1 refuses to compile the CUDA Templates [1], whereas it builds
> fine with CUDA 2.0 (and 2.2).

Could you please provide the exact error messages?

Are you sure this is a problem with the CUDA compiler and not the templates themselves?  Is this a known problem and has it been reported upstream (e.g. at the CUDA forums)?

> I can also report random crashes (segfaults) in CUDA applications that work
> fine with 2.0

Do they work with 2.2?  Again, are you sure this is a problem with CUDA and not the application?  I have been using the 2.1 toolkit for quite some time and it seems to be working just fine for me.
Comment 3 Gottfried Munda 2009-04-18 17:33:00 UTC
Using the latest cudatemplates version from svn I get:
CUDA/cudatemplates $ make
[...]
Linking CXX executable border
[  5%] Built target border
[  6%] Building (Device) NVCC Dependency File: /x/CUDA/cudatemplates/src/cuda/buffer_object_init.cu_buffer_object_generated.cc.NVCC-depend
[  8%] Converting NVCC dependency to CMake (/x/CUDA/cudatemplates/src/cuda/buffer_object_init.cu_buffer_object_generated.cc.depend)
[  9%] Building (Device) NVCC -cubin File: /x/CUDA/cudatemplates/src/cuda/buffer_object_init.cu_buffer_object_generated.cc.NVCC-cubin.txt
nvcc error   : 'cudafe' died due to signal 11 (Invalid memory reference)
make[2]: *** [src/cuda/buffer_object_init.cu_buffer_object_generated.cc.NVCC-cubin.txt] Error 255
make[1]: *** [testing/CMakeFiles/buffer_object.dir/all] Error 2
make: *** [all] Error 2

A colleague tried to compile a framework which made use of the cudatemplates (not the latest version but one from 2-3 weeks or so ago) and got:
[ 50%] Building (Device) NVCC Dependency File: /x/cuda/performance.cu_performance_generated.cc.NVCC-depend
[ 60%] Building (Device) NVCC /x/performance/performance.cu: /x/cuda/performance.cu_performance_generated.cc
/x/performance/performance.cu(88): warning: variable "shared_mem_entries" was declared but never referenced

Segmentation fault
make[2]: *** [src/cuda/performance.cu_performance_generated.cc] Error 255
make[1]: *** [performance/CMakeFiles/performance.dir/all] Error 2
make: *** [all] Error 2 

In both cases switching to 2.0 solved the problem (note: it also works using the 2.2 beta).
In our department it is "common knowledge" that 2.1 is a bad release, although I must admit I don't know any details beyond those I provided above. I can confirm though that our applications work with 2.0 and 2.2.

I found a somewhat similar problem [1] which hints that there's a known incompatibility between CUDA 2.1 and templates. It also states that this is already fixed in 2.2 and won't get fixed in 2.1.

Hence my request to hardmask 2.1: I think users should get 2.0 by default (or 2.2 when released), but not a version that has known problems which won't get fixed. To my knowledge 2.1 offers no essential advantages over 2.0, so this shouldn't be much of a problem.

[1] http://groups.google.co.uk/group/cudpp/browse_thread/thread/8a9ce203e8dbbef4
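
To check a saved build log for the two failure signatures quoted above, something like the following could be used. It is a rough sketch: the function name is made up, and the patterns are simply the two messages seen in these logs, so other nvcc failure modes would not be detected.

```shell
# Sketch only: has_nvcc_crash is a hypothetical helper.
# Reads a build log on stdin and returns 0 if it contains either crash
# message quoted in this comment, non-zero otherwise.
has_nvcc_crash() {
    grep -E -q "died due to signal 11|^Segmentation fault"
}
```

For example, has_nvcc_crash < build.log && echo "nvcc crashed" would print the message only for a log (here a hypothetical build.log) that contains one of the two signatures.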
Comment 4 Gottfried Munda 2009-05-14 19:08:33 UTC
Any news on this? I still think it would be nice if someone who did an ACCEPT_KEYWORDS="~x86" emerge nvidia-cuda-toolkit nvidia-cuda-sdk actually got a version that doesn't have those issues.
Comment 5 Michal Januszewski (RETIRED) gentoo-dev 2009-05-15 11:28:22 UTC
Sorry about the delay, I got sidetracked by other things.

Since in the meantime CUDA 2.2 has been released, I think it would be good to make a move in the opposite direction, i.e. get CUDA 2.2 into the tree.  I will start working on the updated ebuilds today.
Comment 6 Gottfried Munda 2009-05-25 18:05:20 UTC
Great, thanks a lot! Keep up the good work! :)

Maybe 2.0 can go stable soon, so there's less chance someone gets 2.1...
Comment 7 Michal Januszewski (RETIRED) gentoo-dev 2009-09-25 18:09:45 UTC
Closing as WONTFIX, since there don't seem to be any reproducible crashes to warrant a hard mask.

To be on the safe side, 2.1 will not be a stable candidate though -- the next version of CUDA to be stabilized in Gentoo will be 2.2.  The main issue of the bug should now also be resolved: as the latest stable version is currently 2.0 and the latest unmasked unstable version is 2.2, no one should actually hit 2.1 unless they explicitly request that release.
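
Anyone who still wants to rule out 2.1 on their own machine can mask it locally instead of waiting for a tree-wide mask. The snippet below is a sketch that prints suitable entries for Portage's package.mask. The function name is made up, and the atoms assume the affected ebuilds are the dev-util/nvidia-cuda-toolkit and dev-util/nvidia-cuda-sdk packages from the bug title with versions starting with 2.1.

```shell
# Sketch only: cuda21_mask_entries is a hypothetical helper.
# Prints one package.mask atom per CUDA 2.1 package named in the bug title.
# The "=...-2.1*" form is meant to cover 2.1 and any revisions of it.
cuda21_mask_entries() {
    for pkg in toolkit sdk; do
        printf '=dev-util/nvidia-cuda-%s-2.1*\n' "$pkg"
    done
}
```

The output could be appended as root to /etc/portage/package.mask, or to a file inside it on systems where package.mask is a directory.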