Bug 205099 - [glibc/tests] tst-robust1 test hangs alpha
Summary: [glibc/tests] tst-robust1 test hangs alpha
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: Alpha Linux
Importance: High normal
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-01-09 17:47 UTC by Jose Luis Rivero (yoswink) (RETIRED)
Modified: 2009-08-10 21:47 UTC

See Also:
Package list:
Runtime testing required: ---


Description Jose Luis Rivero (yoswink) (RETIRED) gentoo-dev 2008-01-09 17:47:18 UTC
glibc-2.7-r1 compiles fine on alpha, but the test suite completely hangs our alpha dev machine while running the test 'nptl/tst-robust1'.

The last lines of build output before the hang were:
http://eric.schwarzvogel.de/~klausman/glibc-2.7-r1_crash.txt 
http://eric.schwarzvogel.de/~klausman/glibc-2.7-r1_crash2.txt 

I searched for the bug and found this thread:
http://lkml.org/lkml/2007/8/1/474

The problem seems to affect arches that use the generic futex implementation (alpha, arm, sparc32, ...), where it can make the kernel loop:

From there:
-- 8< --------------------
The problem is that kernel/futex.c expects futex_atomic_cmpxchg_inatomic()
to return -EFAULT or the new value. It doesn't expect -ENOSYS at all, and
generally -ENOSYS causes the futex code to loop, hanging the kernel. 
--------------------------  
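A user-space sketch of the failure mode quoted above, not kernel code. The names stub_cmpxchg, naive_retry and checked_retry are hypothetical, and max_tries exists only so the demo terminates; the point is that a caller which knows only about -EFAULT mistakes -ENOSYS for "the word changed, retry":

```c
#include <errno.h>

/* Hypothetical stand-in for futex_atomic_cmpxchg_inatomic() on an arch
 * without a real implementation: it never touches *uaddr and always
 * returns -ENOSYS. */
int stub_cmpxchg(int *uaddr, int oldval, int newval)
{
    (void)uaddr; (void)oldval; (void)newval;
    return -ENOSYS;
}

/* Caller that only knows about -EFAULT. Any other result is taken to be
 * the value found in *uaddr, so -ENOSYS looks like "the word changed,
 * try again". Without the max_tries cap this loop would never exit. */
int naive_retry(int *uaddr, int max_tries)
{
    for (int tries = 0; tries < max_tries; tries++) {
        int cur = *uaddr;
        int ret = stub_cmpxchg(uaddr, cur, cur | 1);
        if (ret == -EFAULT)
            return -EFAULT;
        if (ret == cur)
            return 0;           /* exchange succeeded */
        /* otherwise: assume a concurrent update and retry */
    }
    return -EAGAIN;             /* cap reached; demo only */
}

/* Same loop, but -ENOSYS is treated as an error and propagated. */
int checked_retry(int *uaddr, int max_tries)
{
    for (int tries = 0; tries < max_tries; tries++) {
        int cur = *uaddr;
        int ret = stub_cmpxchg(uaddr, cur, cur | 1);
        if (ret == -EFAULT || ret == -ENOSYS)
            return ret;
        if (ret == cur)
            return 0;
    }
    return -EAGAIN;
}
```

With a futex word of 0, naive_retry burns through every attempt, while checked_retry gives up on the first call with -ENOSYS.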

There is a patch in the thread, but it seems to work only with non-SMP configurations.

toolchain, any idea how to deal with this?
Comment 1 SpanKY gentoo-dev 2008-01-09 18:12:31 UTC
it isn't a toolchain issue ... you'll need to actually fix the code in the kernel

the functions in question can be executed directly without glibc and cause the same crash/hang
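For illustration, a minimal sketch of reaching the futex code without any glibc pthread wrapper, assuming "the functions in question" are the futex system calls. raw_futex_wake is a hypothetical helper; it issues a harmless FUTEX_WAKE and is not a reproducer for the hang:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Wake at most one waiter on `word` through the raw futex syscall.
 * Returns the number of waiters woken, or -errno on failure. */
long raw_futex_wake(uint32_t *word)
{
    long ret = syscall(SYS_futex, word, FUTEX_WAKE, 1, NULL, NULL, 0);
    return ret < 0 ? -errno : ret;
}
```

On a word nobody is waiting on, the call should simply report zero woken waiters; a kernel built without futex support would return -ENOSYS instead.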
Comment 2 Jose Luis Rivero (yoswink) (RETIRED) gentoo-dev 2008-01-13 23:37:48 UTC
Some more info about the problem:

- glibc-2.6 seems to be also affected.

The glibc-2.6.1 tests make the box hang too. This time it is not tst-robust1 but tst-robust3; tst-robust1 ends with a "Timed out: killed the child process". This reinforces what SpanKY commented: it should be something in the kernel.


- Debian does not seem to be affected by the problem:

The buildd log shows the tst-robust tests completing successfully.
http://buildd.debian.org/fetch.cgi?pkg=glibc;ver=2.7-6;arch=alpha;stamp=1200169499
I looked at the Debian diff, but it's not very friendly. There are patches applied to the futex code, but I couldn't find anything alpha/futex related.
Comment 3 Tobias Klausmann (RETIRED) gentoo-dev 2008-01-21 19:03:43 UTC
I tested on my XP1000 today (glibc-2.7-r1) with two different kernels. Both of them crashed on the test in the subject:

2.6.22.1
2.6.23.12
Comment 4 Raúl Porcel (RETIRED) gentoo-dev 2008-04-11 14:16:21 UTC
IMHO if current stable is also affected, it shouldn't be a showstopper for keywording/stabilizing. IIRC glibc tests fail anyway...
Comment 5 Matt Turner gentoo-dev 2009-04-19 19:16:39 UTC
I applied Ivan Kokshaysky's futex patches, located here:

http://lkml.org/lkml/2009/4/15/307
http://lkml.org/lkml/2009/4/16/371

I then reran the tst-robust tests on my UP1500 (single CPU), and they pass with return code 0.

I'll run them on an unpatched kernel as well to verify it is actually the patches making the difference.

I think we need to test these on an SMP system as well. Tobias, any chance you can do it on Monolith?
Comment 6 Tobias Klausmann (RETIRED) gentoo-dev 2009-04-20 12:45:46 UTC
I've tried this with 2.6.27.7+patches and glibc-2.7 (r2). The test does not hang, but something entirely different fails the test (some symbol exporting stuff):

/var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/elf/check-localplt /var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/libc.so /var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/math/libm.so /var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/nptl/libpthread.so /var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/rt/librt.so /var/tmp/portage/sys-libs/glibc-2.7-r2/work/build-default-alpha-unknown-linux-gnu-nptl/dlfcn/libdl.so | LC_ALL=C sort|diff -u ../scripts/data/localplt-generic.data -

This command results in a nonempty diff which the test suite deems a failure.
I'll give the whole thing a shot with a newer glibc, too.
Comment 7 Tobias Klausmann (RETIRED) gentoo-dev 2009-06-28 17:14:53 UTC
Since I stabilized 2.9 today, I'm inclined to close this as WONTFIX if 2.9 prevails for, say, 30 days. Any objections?
Comment 8 Mark Loeser (RETIRED) gentoo-dev 2009-06-28 19:24:40 UTC
(In reply to comment #7)
> Since I stabilized 2.9 today, I'm inclined to close this as WONTFIX if 2.9
> prevails for, say, 30 days. Any objections?
> 

Fine by me :)
Comment 9 Matt Turner gentoo-dev 2009-06-28 19:29:42 UTC
No.
I want to do more testing, but I'm in another hemisphere now.

I think we may have the fix in the form of Ivan Kokshaysky's Futex implementation, but more testing is needed.

It doesn't make sense to close as wontfix when the situation is closer to wonttest.
Comment 10 Matt Turner gentoo-dev 2009-08-09 21:02:16 UTC
Notes:
Single processor system is UP1500
Dual processor system is DS20L
futex patches were merged for 2.6.30.

UP1500 with 2.6.31-r5
- all tst-robust{1..9} tests are successful
- all tst-robustpi{1..9} tests are successful

UP1500 with 2.6.29-gentoo-r5
- tst-robust8 prints "cannot support pshared robust mutexes"
- tst-robustpi{1..9} print "PI robust mutexes not supported"

DS20L with 2.6.30-gentoo-r4
- all tst-robust{1..9} tests are successful
- all tst-robustpi{1..9} tests are successful

DS20L with 2.6.25-gentoo-r6
- tst-robust8 prints "cannot support pshared robust mutexes"
- tst-robustpi{1..9} print "PI robust mutexes not supported"
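The pattern in these results follows the note that the futex patches were merged for 2.6.30. A small sketch that encodes only that statement; has_alpha_futex_patches is a hypothetical helper, and it assumes the kernel release string starts with a dotted version and ignores any distro backports:

```c
#include <stdbool.h>
#include <stdio.h>

/* Returns true if `release` (e.g. "2.6.30-gentoo-r4") names a kernel
 * version of at least 2.6.30. Suffixes after the third number are ignored. */
bool has_alpha_futex_patches(const char *release)
{
    int major = 0, minor = 0, patch = 0;

    /* Accept "X.Y" as well as "X.Y.Z"; a missing patch level stays 0. */
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2)
        return false;
    if (major != 2)
        return major > 2;
    if (minor != 6)
        return minor > 6;
    return patch >= 30;
}
```

Applied to the four kernels listed above, it returns true for 2.6.31-r5 and 2.6.30-gentoo-r4 (all tests pass) and false for 2.6.29-gentoo-r5 and 2.6.25-gentoo-r6 (the "not supported" messages).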

I was never able to reproduce the hard lock experienced by yoswink, but the results are pretty clear. I guess the only case that I have not covered is SMP with more than two CPUs, but I think it's unlikely that the results are any different.

If anyone with a quad-alpha system would like to test, be my guest. I'll even send you the binaries to avoid having to build glibc and the test suite.

Please mark fixed.