Somewhat related to Bug #101402:
http://bugs.gentoo.org/show_bug.cgi?id=101402

From -dev debugging, it seems this only happens on Opterons. AMD64 machines do not seem affected. There isn't a clear set of CPU differences that seems to matter. The AMD64 boxes and the Opterons only seemed to differ in the 'pni' flag (which my Opterons have) and the 'lahf_lm' flag, which one of the AMD64 boxes had. All of my Opterons have the same cpuinfo as below, varying only in speed.

You can click the URL link above the summary or go to these for lots of debug output that I scrounged up.

Compressed and uncompressed versions are here. The links that follow this one go directly to the uncompressed, plain text versions.
http://www.twobit.net/~carpaski/vg3/

Simple demonstration where 'fortune' cannot be run through valgrind. It contains the plain execution and then an strace of the output.
http://www.twobit.net/~carpaski/vg3/vg3-illegal-strace.log

A gdb run of continuous step operations, breaking in main and single stepping until Valgrind exits. Might want a quick sed on that.
sed -i '/^(gdb) $/d' vg3-illegal.log
http://www.twobit.net/~carpaski/vg3/vg3-illegal.log

A gdb run with 'where full' interlaced after every step. This is a lot of output. It's the best info I can provide at the moment.
http://www.twobit.net/~carpaski/vg3/vg3-illegal-full.log

Portage 2.0.51.22-r2 (default-linux/amd64/2005.0, gcc-3.4.3, glibc-2.3.5-r1, 2.6.11-gentoo-r11 x86_64)
gcc (GCC) 3.4.3 20041125 (Gentoo 3.4.3-r1, ssp-3.4.3-0, pie-8.7.7)

processor        : 0
vendor_id        : AuthenticAMD
cpu family       : 15
model            : 5
model name       : AMD Opteron(tm) Processor 246
stepping         : 10
cpu MHz          : 2004.595
cache size       : 1024 KB
physical id      : 255
siblings         : 1
fpu              : yes
fpu_exception    : yes
cpuid level      : 1
wp               : yes
flags            : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm 3dnowext 3dnow
bogomips         : 3940.35
TLB size         : 1024 4K pages
clflush size     : 64
cache_alignment  : 64
address sizes    : 40 bits physical, 48 bits virtual
power management : ts fid vid ttp

processor        : 1
vendor_id        : AuthenticAMD
cpu family       : 15
model            : 5
model name       : AMD Opteron(tm) Processor 246
stepping         : 10
cpu MHz          : 2004.595
cache size       : 1024 KB
physical id      : 255
siblings         : 1
fpu              : yes
fpu_exception    : yes
cpuid level      : 1
wp               : yes
flags            : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm 3dnowext 3dnow
bogomips         : 4005.88
TLB size         : 1024 4K pages
clflush size     : 64
cache_alignment  : 64
address sizes    : 40 bits physical, 48 bits virtual
power management : ts fid vid ttp
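
For anyone who wants to repeat the flag comparison between an affected and an unaffected box, here is a minimal sketch of how it could be done. The function names are made up for illustration, and the ATHLON64 line is a hypothetical stand-in built from the description above (no 'pni', has 'lahf_lm'), not a real dump from any of the machines involved. In practice you would feed it the contents of /proc/cpuinfo from each machine.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of a /proc/cpuinfo dump."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def diff_flags(text_a, text_b):
    """Return (flags only in a, flags only in b), each as a sorted list."""
    a, b = cpu_flags(text_a), cpu_flags(text_b)
    return sorted(a - b), sorted(b - a)


# Flags line of the Opteron 246, as posted in this report.
OPTERON = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush mmx fxsr sse sse2 pni syscall nx mmxext lm 3dnowext 3dnow"
)

# HYPOTHETICAL Athlon64 flags line, for illustration only (assumption:
# identical to the Opteron except no 'pni' and an extra 'lahf_lm').
ATHLON64 = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow "
    "lahf_lm"
)

if __name__ == "__main__":
    only_opteron, only_athlon = diff_flags(OPTERON, ATHLON64)
    print("only on Opteron: ", " ".join(only_opteron))
    print("only on Athlon64:", " ".join(only_athlon))
```

Only the first 'flags' line is read, on the assumption that all processors in one box report the same flags, as they do in the cpuinfo above.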
Still broken as of SVN checkout on Aug 9 @ 16:30 UTC.
Quick note, to clarify: Athlon64 (939) seems to work fine, but Opterons (940) do not. Additional bits of insight: on SMP Opterons, I see the unhandled instructions. On single-processor, non-SMP kernels, at least two of us see infinite looping that is very resistant to kill -9.
Bug filed against Valgrind tracker: http://bugs.kde.org/show_bug.cgi?id=110478
Alright, let's wait for them to fix it.
According to upstream, the problem has been fixed in SVN. Maybe a fix could be extracted to apply to 3.0.0? Or will we just wait for 3.0.1, which is planned to follow ASAP?
Since it's not just this issue, but also the SSE thing, I'll wait until 3.0.1 comes out. A little fixing here and there is ok, but I don't want to turn 3.0.0 into an SVN ebuild. Especially not with another release coming in about a month.
Nicholas, 3.0.1 is in the tree. Has this issue been fixed?
Confirmed to work. Reopening to change resolution.
Fixed in 3.0.1