266568 – Request: hardmask dev-util/nvidia-cuda-{toolkit, sdk}-2.1

Status:	RESOLVED WONTFIX

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	New packages
Hardware:	All Linux

Importance:	High normal
Assignee:	Michal Januszewski (RETIRED)

Reported:	2009-04-17 19:14 UTC by Gottfried Munda
Modified:	2009-09-25 18:09 UTC
CC List:	0 users

Description Gottfried Munda 2009-04-17 19:14:16 UTC

As apparently CUDA 2.1 is broken in several places it should be hardmasked. nvcc from CUDA 2.1 refuses to compile the CUDA Templates [1], whereas it builds fine with CUDA 2.0 (and 2.2).
I can also report random crashes (segfaults) in CUDA applications that work fine with 2.0.

Proposed Solution: Wait for CUDA 2.2 or stick with 2.0

[1] http://cudatemplates.sourceforge.net/

Reproducible: Always

Steps to Reproduce:
1. Checkout Cudatemplates: svn co https://cudatemplates.svn.sourceforge.net/svnroot/cudatemplates/trunk
2. cmake . in cudatemplates/trunk
3. make

Actual Results:  
compile error

Expected Results:  
No error

Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2009-04-18 15:24:02 UTC

Why do you think it's impossible to fix the problems? Masking is quite a radical solution and usually solves nothing.

Comment 2 Michal Januszewski (RETIRED) gentoo-dev 2009-04-18 16:11:55 UTC

(In reply to comment #0)
> As apparently CUDA 2.1 is broken in several places it should be hardmasked.
> nvcc from CUDA 2.1 refuses to compile the CUDA Templates [1], whereas it builds
> fine with CUDA 2.0 (and 2.2).

Could you please provide the exact error messages?

Are you sure this is a problem with the CUDA compiler and not the templates themselves?  Is this a known problem and has it been reported upstream (e.g. at the CUDA forums)?

> I can also report random crashes (segfaults) in CUDA applications that work
> fine with 2.0

Do they work with 2.2?  Again, are you sure this is a problem with CUDA and not the application?  I have been using the 2.1 toolkit for quite some time and it seems to be working just fine for me.

Comment 3 Gottfried Munda 2009-04-18 17:33:00 UTC

Using the latest cudatemplates version from svn I get:
CUDA/cudatemplates $ make
[...]
Linking CXX executable border
[  5%] Built target border
[  6%] Building (Device) NVCC Dependency File: /x/CUDA/cudatemplates/src/cuda/buffer_object_init.cu_buffer_object_generated.cc.NVCC-depend
[  8%] Converting NVCC dependency to CMake (/x/CUDA/cudatemplates/src/cuda/buffer_object_init.cu_buffer_object_generated.cc.depend)
[  9%] Building (Device) NVCC -cubin File: /x/CUDA/cudatemplates/src/cuda/buffer_object_init.cu_buffer_object_generated.cc.NVCC-cubin.txt
nvcc error   : 'cudafe' died due to signal 11 (Invalid memory reference)
make[2]: *** [src/cuda/buffer_object_init.cu_buffer_object_generated.cc.NVCC-cubin.txt] Error 255
make[1]: *** [testing/CMakeFiles/buffer_object.dir/all] Error 2
make: *** [all] Error 2

A colleague tried to compile a framework that uses the cudatemplates (not the latest version, but one from roughly 2-3 weeks ago) and got:
[ 50%] Building (Device) NVCC Dependency File: /x/cuda/performance.cu_performance_generated.cc.NVCC-depend
[ 60%] Building (Device) NVCC /x/performance/performance.cu: /x/cuda/performance.cu_performance_generated.cc
/x/performance/performance.cu(88): warning: variable "shared_mem_entries" was declared but never referenced

Segmentation fault
make[2]: *** [src/cuda/performance.cu_performance_generated.cc] Error 255
make[1]: *** [performance/CMakeFiles/performance.dir/all] Error 2
make: *** [all] Error 2 

In both cases switching to 2.0 solved the problem (note: it also works using the 2.2 beta).
In our department it is "common knowledge" that 2.1 is a bad release, although I must admit I don't know any details beyond those I provided above. I can confirm, though, that our applications work with 2.0 and 2.2.

I found a somewhat similar problem [1], which hints that there is a known incompatibility between CUDA 2.1 and templates. It also states that this is already fixed in 2.2 and won't be fixed in 2.1.

Hence my request to hardmask 2.1. I think users should get 2.0 by default (or 2.2 when released), but not a version that has known problems which won't be fixed. To my knowledge 2.1 offers no essential advantages over 2.0, so this shouldn't be much of a problem.

[1] http://groups.google.co.uk/group/cudpp/browse_thread/thread/8a9ce203e8dbbef4

Comment 4 Gottfried Munda 2009-05-14 19:08:33 UTC

Any news on this? I still think it would be nice if someone who ran ACCEPT_KEYWORDS="~x86" emerge nvidia-cuda-toolkit nvidia-cuda-sdk actually got a version that doesn't have those issues.

Comment 5 Michal Januszewski (RETIRED) gentoo-dev 2009-05-15 11:28:22 UTC

Sorry about the delay, I got sidetracked by other things.

Since in the meantime CUDA 2.2 has been released, I think it would be good to make a move in the opposite direction, i.e. get CUDA 2.2 into the tree.  I will start working on the updated ebuilds today.

Comment 6 Gottfried Munda 2009-05-25 18:05:20 UTC

Great, thanks a lot! Keep up the good work! :)

Maybe 2.0 can go stable soon, so there's less chance someone gets 2.1...

Comment 7 Michal Januszewski (RETIRED) gentoo-dev 2009-09-25 18:09:45 UTC

Closing as WONTFIX, since there don't seem to be any reproducible crashes to warrant a hard mask.

To be on the safe side, 2.1 will not be a stable candidate though -- the next version of CUDA to be stabilized in Gentoo will be 2.2.  The main issue of the bug should now also be resolved: as the latest stable version is currently 2.0 and the latest unmasked unstable version is 2.2, no one should actually hit 2.1 unless they explicitly request that release.
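
Anyone who still wants to rule out 2.1 on their own system can mask it locally instead of relying on a tree-wide hard mask. A minimal sketch, assuming the two package names from this bug's title and that the affected ebuilds carry the version 2.1; the helper name cuda21_mask_atoms is made up for illustration. The "~" operator matches that version in any revision.

```shell
#!/bin/sh
# Hypothetical helper: print one package.mask atom per CUDA 2.1 package.
# Package names come from this bug's title; "~" matches 2.1 and any -rN revision.
cuda21_mask_atoms() {
    for pkg in nvidia-cuda-toolkit nvidia-cuda-sdk; do
        printf '~dev-util/%s-2.1\n' "$pkg"
    done
}

# Print the atoms for review; nothing on the system is modified here.
cuda21_mask_atoms
```

To apply the mask, the output could be appended as root to /etc/portage/package.mask (assuming it is a plain file rather than a directory on the system in question).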