glibc-2.7-r1 compiles fine on alpha, but the test suite completely hangs our alpha dev machine while running the test 'nptl/tst-robust1'. The last compilation lines before the hang were:
http://eric.schwarzvogel.de/~klausman/glibc-2.7-r1_crash.txt
http://eric.schwarzvogel.de/~klausman/glibc-2.7-r1_crash2.txt

I searched for the bug and found this thread:
http://lkml.org/lkml/2007/8/1/474

The problem seems to affect arches that use the generic futex implementation (alpha, arm, sparc32, ...), which can make the kernel loop. From there:

-- 8< --------------------
The problem is that kernel/futex.c expects futex_atomic_cmpxchg_inatomic() to return -EFAULT or the new value. It doesn't expect -ENOSYS at all, and generally -ENOSYS causes the futex code to loop, hanging the kernel.
--------------------------

There is a patch in the thread, but it seems to work only on non-SMP configurations.

toolchain: any idea how to deal with this?
it isn't a toolchain issue ... you'll need to actually fix the code in the kernel. the functions in question can be executed directly without glibc and cause the same crash/hang
Some more info about the problem:
- glibc-2.6 seems to be affected as well. The glibc-2.6.1 tests hang the box too, this time in tst-robust3 rather than tst-robust1; tst-robust1 ends with "Timed out: killed the child process". This reinforces what spanky said: it should be something in the kernel.
- Debian does not seem to be affected: the buildd log shows the tst-robust tests completing successfully.
  http://buildd.debian.org/fetch.cgi?pkg=glibc;ver=2.7-6;arch=alpha;stamp=1200169499
  I looked at the Debian diff, but it's not very friendly. There are patches applied to the futex code, but I couldn't find anything alpha/futex related.
I tested glibc-2.7-r1 on my XP1000 today with two different kernels, 2.6.22.1 and 2.6.23.12. Both of them crashed on the test in the subject.
IMHO if current stable is also affected, it shouldn't be a showstopper for keywording/stabilizing. IIRC glibc tests fail anyway...
I applied Ivan Kokshaysky's futex patches, located here:
http://lkml.org/lkml/2009/4/15/307
http://lkml.org/lkml/2009/4/16/371
I then reran the tst-robust tests on my UP1500 (single CPU), and they all pass with return code 0. I'll run them on an unpatched kernel as well to verify that it is actually the patches making the difference. I think we need to test these on an SMP system as well. Tobias, any chance you can do it on Monolith?
I've tried this with 2.6.27.7+patches and glibc-2.7 (r2). The test does not hang, but something entirely different fails the test (some symbol exporting stuff):
/var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/elf/check-localplt /var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/libc.so /var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/math/libm.so /var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/nptl/libpthread.so /var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/rt/librt.so /var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/dlfcn/libdl.so | LC_ALL=C sort|diff -u ../scripts/data/localplt-generic.data -
This command results in a non-empty diff, which the test suite deems a failure. I'll give the whole thing a shot with a newer glibc, too.
Since I stabilized 2.9 today, I'm inclined to close this as WONTFIX if 2.9 prevails for, say, 30 days. Any objections?
(In reply to comment #7)
> Since I stabilized 2.9 today, I'm inclined to close this as WONTFIX if 2.9
> prevails for, say, 30 days. Any objections?
>
Fine by me :)
No. I want to do more testing, but I'm in another hemisphere now. I think we may have the fix in the form of Ivan Kokshaysky's futex implementation, but more testing is needed. It doesn't make sense to close as wontfix when the situation is closer to wonttest.
Notes:
Single processor system is UP1500
Dual processor system is DS20L
futex patches were merged for 2.6.30.

UP1500 with 2.6.31-r5
- all tst-robust{1..9} tests are successful
- all tst-robustpi{1..9} tests are successful

UP1500 with 2.6.29-gentoo-r5
- tst-robust8 prints "cannot support pshared robust mutexes"
- tst-robustpi{1..9} print "PI robust mutexes not supported"

DS20L with 2.6.30-gentoo-r4
- all tst-robust{1..9} tests are successful
- all tst-robustpi{1..9} tests are successful

DS20L with 2.6.25-gentoo-r6
- tst-robust8 prints "cannot support pshared robust mutexes"
- tst-robustpi{1..9} print "PI robust mutexes not supported"

I was never able to reproduce the hard lock experienced by yoswink, but the results are pretty clear. I guess the only case that I have not covered is SMP with more than two CPUs, but I think it's unlikely that the results are any different. If anyone with a quad-alpha system would like to test, be my guest. I'll even send you the binaries to avoid having to build glibc and the test suite.

Please mark fixed.