Summary: | sys-libs/glibc-2.30-r8: arm64 test failures (nptl/{tst-rwlock18,tst-rwlock9,tst-stack4}) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Sam James <sam> |
Component: | Current packages | Assignee: | Gentoo Toolchain Maintainers <toolchain> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | arm64 |
Priority: | Normal | Keywords: | TESTFAILURE |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Sam James
Build log is 60MB. Will upload somewhere shortly...

----

XPASS: elf/tst-latepthread
UNSUPPORTED: elf/tst-ldconfig-bad-aux-cache
UNSUPPORTED: elf/tst-pldd
XPASS: elf/tst-protected1a
XPASS: elf/tst-protected1b
UNSUPPORTED: iconv/tst-gconv-init-failure
UNSUPPORTED: io/tst-getcwd-abspath
FAIL: io/tst-open-tmpfile
XPASS: locale/tst-locale-locpath
UNSUPPORTED: math/test-fesetexcept-traps
UNSUPPORTED: math/test-fexcept-traps
UNSUPPORTED: math/test-nearbyint-except-2
FAIL: misc/tst-pkey
UNSUPPORTED: nptl/test-cond-printers
UNSUPPORTED: nptl/test-condattr-printers
UNSUPPORTED: nptl/test-mutex-printers
UNSUPPORTED: nptl/test-mutexattr-printers
UNSUPPORTED: nptl/test-rwlock-printers
UNSUPPORTED: nptl/test-rwlockattr-printers
FAIL: nptl/tst-rwlock18
FAIL: nptl/tst-rwlock9
UNSUPPORTED: nptl/tst-sem16
UNSUPPORTED: nss/tst-nss-db-endgrent
UNSUPPORTED: nss/tst-nss-db-endpwent
UNSUPPORTED: nss/tst-nss-files-alias-leak
UNSUPPORTED: nss/tst-nss-files-alias-truncated
UNSUPPORTED: nss/tst-nss-files-hosts-erange
UNSUPPORTED: nss/tst-nss-files-hosts-getent
UNSUPPORTED: nss/tst-nss-files-hosts-long
UNSUPPORTED: nss/tst-nss-test3
UNSUPPORTED: posix/tst-spawn4-compat
UNSUPPORTED: posix/tst-sysconf-empty-chroot
UNSUPPORTED: resolv/tst-resolv-threads
FAIL: rt/tst-shm
UNSUPPORTED: rt/tst-shm-cancel
UNSUPPORTED: sunrpc/tst-svc_register
Summary of test results:
      5 FAIL
   5945 PASS
     27 UNSUPPORTED
     18 XFAIL
      4 XPASS
make[1]: *** [Makefile:413: tests] Error 1
make[1]: Leaving directory '/var/tmp/portage/sys-libs/glibc-2.30-r8/work/glibc-2.30'
make: *** [Makefile:9: check] Error 2
 * ERROR: sys-libs/glibc-2.30-r8::gentoo failed (test phase):
 *   emake failed

Failures from 2.29-r8:

UNSUPPORTED: elf/tst-pldd
XPASS: elf/tst-protected1a
XPASS: elf/tst-protected1b
UNSUPPORTED: iconv/tst-gconv-init-failure
UNSUPPORTED: io/tst-getcwd-abspath
FAIL: io/tst-open-tmpfile
UNSUPPORTED: math/test-fesetexcept-traps
UNSUPPORTED: math/test-fexcept-traps
UNSUPPORTED: math/test-nearbyint-except-2
FAIL: misc/tst-pkey
UNSUPPORTED: nptl/test-cond-printers
UNSUPPORTED: nptl/test-condattr-printers
UNSUPPORTED: nptl/test-mutex-printers
UNSUPPORTED: nptl/test-mutexattr-printers
UNSUPPORTED: nptl/test-rwlock-printers
UNSUPPORTED: nptl/test-rwlockattr-printers
UNSUPPORTED: nptl/tst-sem16
FAIL: nptl/tst-stack4
UNSUPPORTED: nss/tst-nss-db-endgrent
UNSUPPORTED: nss/tst-nss-db-endpwent
UNSUPPORTED: nss/tst-nss-files-alias-leak
UNSUPPORTED: nss/tst-nss-files-hosts-erange
UNSUPPORTED: nss/tst-nss-files-hosts-getent
UNSUPPORTED: nss/tst-nss-test3
UNSUPPORTED: posix/tst-spawn4-compat
UNSUPPORTED: posix/tst-sysconf-empty-chroot
UNSUPPORTED: resolv/tst-resolv-threads
FAIL: rt/tst-shm
UNSUPPORTED: rt/tst-shm-cancel
UNSUPPORTED: sunrpc/tst-svc_register
Summary of test results:
      4 FAIL
   5809 PASS
     24 UNSUPPORTED
     17 XFAIL
      2 XPASS

----

So, new failures?:
FAIL: nptl/tst-rwlock18
FAIL: nptl/tst-rwlock9

Can you attach *.out files? They usually contain the actual tests' stderr, which explains the failure.

Okay, here's an updated one:

FAIL: misc/tst-pkey
UNSUPPORTED: nptl/test-cond-printers
UNSUPPORTED: nptl/test-condattr-printers
UNSUPPORTED: nptl/test-mutex-printers
UNSUPPORTED: nptl/test-mutexattr-printers
UNSUPPORTED: nptl/test-rwlock-printers
UNSUPPORTED: nptl/test-rwlockattr-printers
FAIL: nptl/tst-rwlock18
FAIL: nptl/tst-rwlock9
FAIL: nptl/tst-stack4
UNSUPPORTED: nss/tst-nss-db-endgrent
UNSUPPORTED: nss/tst-nss-db-endpwent
UNSUPPORTED: nss/tst-nss-files-alias-leak
UNSUPPORTED: nss/tst-nss-files-alias-truncated
UNSUPPORTED: nss/tst-nss-files-hosts-erange
UNSUPPORTED: nss/tst-nss-files-hosts-getent
UNSUPPORTED: nss/tst-nss-files-hosts-long
UNSUPPORTED: nss/tst-nss-test3
UNSUPPORTED: posix/tst-spawn4-compat
UNSUPPORTED: posix/tst-sysconf-empty-chroot
UNSUPPORTED: resolv/tst-resolv-threads
UNSUPPORTED: sunrpc/tst-svc_register
Summary of test results:
      4 FAIL
   5948 PASS
     25 UNSUPPORTED
     18 XFAIL
      4 XPASS

----

Note that only the nptl ones matter here. The rest (inc.
tst-pkey) are known upstream to fail.

Attaching each of the 3 logs now.

(In reply to Sam James (sec padawan) from comment #5)
> Attaching each of the 3 logs now.

https://cmpct.info/~sam/gentoo/tst-rwlock9.out
https://cmpct.info/~sam/gentoo/tst-rwlock18.out
https://cmpct.info/~sam/gentoo/tst-stack4.out

(In reply to Sam James (sec padawan) from comment #6)
> (In reply to Sam James (sec padawan) from comment #5)
> > Attaching each of the 3 logs now.
>
> https://cmpct.info/~sam/gentoo/tst-rwlock9.out
> https://cmpct.info/~sam/gentoo/tst-rwlock18.out
> https://cmpct.info/~sam/gentoo/tst-stack4.out

These don't seem to provide much more detail. You can pick the command that ran the test from build.log and explore why each test fails.

Looking at the tst-rwlock9 test now: https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/tst-rwlock9.c;h=408bbcdd5d47f93337c87cdb0be6e3cb20115efb;hb=HEAD

It's not a new test. tst-rwlock9 is a simple timed rwlock test:
- it spawns 15 writer threads and 15 reader threads
- against a single rwlock with priority for writes
- each write thread acquires the lock, waits for 1 ms and releases it; this repeats 10 times
- each read thread acquires the lock, waits for 1 ms and releases it; this repeats 15 times
- the whole test is run for 3 clock types

My understanding is that:
- best case is when all writers sequentially acquire the lock (15 * 10 * 1 ms = 150ms), then all readers in parallel acquire the lock (15 * 10 * 1 ms = 150ms). All in all 300 ms. 3 x 300ms = 900ms.
- worst case: readers and writers are interspersed and effectively serialize one another. Still not too bad: 15 * 10 * 1ms writers + 15 * 15 * 1 ms = 375 ms. 3 x 375ms = 1.125s.

This does not account for the actual locking overhead, which might be very significant.

My math is off somewhere, as on amd64 the tests finish in 537ms for me:

$ time { build-x86-x86_64-pc-linux-gnu-nptl/nptl/tst-rwlock9 > /dev/null; }
real 0m0,537s
user 0m0,049s
sys 0m0,099s

But that at least matches the order of magnitude of the best case.
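The back-of-envelope estimate above can be redone as a small script. This is only a sketch: the thread counts, iteration counts and 1 ms hold time are taken from the description in this comment, not re-checked against the test source, and the variable and function names are made up for illustration. It also makes one assumption that differs from the figures above: if the 15 readers really hold the lock at the same time, the reader phase costs about 15 x 1 ms per clock rather than 150 ms.

```shell
#!/bin/sh
# Parameters as described in the comment (assumed, not verified against tst-rwlock9.c).
WRITERS=15
WRITE_TRIES=10
READERS=15
READ_TRIES=15
HOLD_MS=1      # time each thread keeps the lock
CLOCKS=3       # the whole test is repeated once per clock type

# Optimistic case: writers serialize, readers overlap completely,
# so the reader phase costs only READ_TRIES holds (sketch assumption).
best_case_ms() {
    echo $(( CLOCKS * (WRITERS * WRITE_TRIES * HOLD_MS + READ_TRIES * HOLD_MS) ))
}

# Pessimistic case: every single hold, read or write, is serialized.
# Locking overhead is ignored in both cases.
worst_case_ms() {
    echo $(( CLOCKS * (WRITERS * WRITE_TRIES * HOLD_MS + READERS * READ_TRIES * HOLD_MS) ))
}

echo "best:  $(best_case_ms) ms"
echo "worst: $(worst_case_ms) ms"
```

Under these assumptions the script prints 495 ms and 1125 ms. The optimistic figure is in the same range as the 537 ms measured on amd64, which may or may not be a coincidence; either way, a 30 second run is far outside both bounds.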
On 96-core arm64 the tests time out in 30 seconds for me. That looks wrong. rwlock should be somewhat fair for mostly sleeping threads. We have 25 contending cores though, and they all have the same timedlock timeout.

It's also probably not an effect of contention across cores:

$ time { nptl/tst-rwlock9 >o; }
real 0m30.121s
user 6m53.276s
sys 0m5.408s

$ time { taskset -c 0 nptl/tst-rwlock9 >o; }
real 0m30.108s
user 0m20.936s
sys 0m8.932s

That's quite a lot of system time. As if nanosleep was implemented as a busy non-preemptible loop.

I also created an up-to-date chroot with gcc-9.3.0 and now my tests pass in 0.6 seconds as expected:

"""
# time { env GCONV_PATH=/var/tmp/portage/sy...
real 0m0.633s
user 0m0.100s
sys 0m0.404s
"""

Can you retest against latest stable gcc?

(In reply to Sergei Trofimovich from comment #10)
> Can you retest against latest stable gcc?

Specifically, since gcc-9.2.0 we had a few potential fixes: https://gcc.gnu.org/PR92692 / https://sourceware.org/PR24924 which mention the same tests.
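To put the two arm64 timings above into perspective, the ratio of CPU time (user + sys) to wall-clock time gives the average number of busy cores. The following is a rough sketch; `to_seconds` and `busy_cores` are helper names invented here, and the inputs are the `time` values quoted above.

```shell
#!/bin/sh
# Convert a shell `time` value such as 6m53.276s (or 0m0,537s) into seconds.
to_seconds() {
    printf '%s\n' "$1" | tr ',' '.' |
        LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Average number of busy cores: (user + sys) / real.
# Arguments: real user sys, in the format printed by `time`.
busy_cores() (
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    sys=$(to_seconds "$3")
    LC_ALL=C awk -v r="$real" -v u="$user" -v s="$sys" \
        'BEGIN { printf "%.1f\n", (u + s) / r }'
)

busy_cores 0m30.121s 6m53.276s 0m5.408s   # unpinned run
busy_cores 0m30.108s 0m20.936s 0m8.932s   # run pinned with taskset -c 0
```

For the unpinned run this comes out at roughly 14 cores busy for the whole 30 seconds, and for the pinned run the single CPU is busy essentially all the time. A test whose threads are supposed to mostly sleep should sit far below that, which fits the impression of a busy loop.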
With gcc-9.3.0 I'm down to 1 test failure:

"""
FAIL: nptl/tst-stack4
Summary of test results:
      1 FAIL
   5949 PASS
     25 UNSUPPORTED
     21 XFAIL
      3 XPASS
"""

Looking at it now:

$ cat nptl/tst-stack4.out
Didn't expect signal from child: got `Aborted'

The test on its own takes 14 seconds:

real 0m14.315s
user 0m7.176s
sys 0m9.136s

Yet another threaded test: https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/tst-stack4.c;h=3030f75027f837c92f4c833a146d4148119a898f;hb=HEAD
- it has 100s timeout (ok, 10s is a lot)
- the test is flaky: if it is run in a loop it passes most of the time, but occasionally fails with various crashes that look like memory corruption

# while :; do date; env GCONV_PATH=/var/tmp/portage/sys...; done
Sun 03 May 2020 10:42:08 AM UTC
Sun 03 May 2020 10:42:21 AM UTC
Sun 03 May 2020 10:42:34 AM UTC
Didn't expect signal from child: got `Segmentation fault'
Sun 03 May 2020 10:42:43 AM UTC
Sun 03 May 2020 10:42:56 AM UTC
malloc(): invalid size (unsorted)
Didn't expect signal from child: got `Aborted'
Sun 03 May 2020 10:42:57 AM UTC
Sun 03 May 2020 10:43:12 AM UTC
Sun 03 May 2020 10:43:28 AM UTC
Sun 03 May 2020 10:43:46 AM UTC
Sun 03 May 2020 10:44:01 AM UTC
malloc(): invalid size (unsorted)
Didn't expect signal from child: got `Aborted'
Sun 03 May 2020 10:44:03 AM UTC
Sun 03 May 2020 10:44:18 AM UTC

(In reply to Sergei Trofimovich from comment #12)
> FAIL: nptl/tst-stack4

https://sourceware.org/PR19329 looks relevant.

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=f55f2be1bd6791465453c7b0dfc60bf4c3c75613

commit f55f2be1bd6791465453c7b0dfc60bf4c3c75613
Author:     Sergei Trofimovich <slyfox@gentoo.org>
AuthorDate: 2020-05-03 11:01:18 +0000
Commit:     Sergei Trofimovich <slyfox@gentoo.org>
CommitDate: 2020-05-03 11:01:35 +0000

    sys-libs/glibc: disable flaky tst-stack4, bug #719674

    tst-stack4 exposes known race condition in glibc
    (https://sourceware.org/PR19329). Let's disable this
    test until it's fixed upstream.

    Reported-by: Sam James
    Bug: https://bugs.gentoo.org/719674
    Bug: https://sourceware.org/PR19329
    Package-Manager: Portage-2.3.99, Repoman-2.3.22
    Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>

 sys-libs/glibc/glibc-2.30-r8.ebuild | 5 +++++
 sys-libs/glibc/glibc-2.31-r2.ebuild | 5 +++++
 sys-libs/glibc/glibc-9999.ebuild    | 5 +++++
 3 files changed, 15 insertions(+)

I assume tests don't fail anymore.
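For anyone re-checking a flaky test such as tst-stack4 later, the endless `while :; do ...; done` loop used earlier in this report can be turned into a bounded one that counts failures. This is a minimal sketch; `count_failures` is a name made up here, and the real test invocation (the `env GCONV_PATH=...` command line) is elided in this report, so it is left as a placeholder.

```shell
#!/bin/sh
# Run a command a fixed number of times and report how many runs failed.
# Usage: count_failures RUNS COMMAND [ARGS...]
count_failures() (
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # A run counts as failed on any non-zero exit status.
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed/$runs runs failed"
)

# Placeholder invocation; substitute the test command taken from build.log:
#   count_failures 20 <test command from build.log>
count_failures 5 true
```

A non-zero failure count after a few dozen runs is enough to call the test flaky; zero failures is weaker evidence, since the crashes above showed up only every few iterations.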