https://blogs.gentoo.org/ago/2020/07/04/gentoo-tinderbox/

Issue: app-admin/clustershell-1.9.2 fails tests.
Discovered on: x86 (internal ref: tinderbox_x86)
System: GCC-14-SYSTEM (https://wiki.gentoo.org/wiki/Project:Tinderbox/Common_Issues_Helper#GCC-14)

Info about the issue: https://wiki.gentoo.org/wiki/Project:Tinderbox/Common_Issues_Helper#CF0015
Created attachment 884992: build.log (build log and emerge --info)
Error(s) that match a known pattern:

ERROR:ClusterShell.Gateway:MessageProcessingError: Invalid "message" attributes: missing key "gateway"
test groups when not allowed to read some YAML config file ...
DEBUG:ClusterShell.NodeUtils:[Errno 13] Permission denied: '/var/tmp/portage/app-admin/clustershell-1.9.2/temp/cs-test-vyo42aiu/cs-test-24lq2nzx.yaml'
This bug happens on amd64 as well. I was able to bisect it to commit 9429ff64f132 ("dev-lang/python: Bump to 3.10.13_p3"), which allowed dev-libs/expat-2.6.0 to be pulled in. So the issue happens only with dev-libs/expat-2.6.0; the tests pass if I downgrade to expat-2.5.0.

The issue is a deadlock in test_basic_noop from https://github.com/cea-hpc/clustershell/blob/v1.9.2/tests/TreeGatewayTest.py. There are two threads sending data through pipes, and the main process is stuck waiting for more data, most probably in this function: https://github.com/cea-hpc/clustershell/blob/v1.9.2/tests/TreeGatewayTest.py#L131-L136

In case of the issue, strace looks like this:

    write(7, "<channel version=\"1.9.2\">\n", 26) = 26
    read(8, "<?xml version=\"1.0\" encoding=\"ut"..., 4096) = 39
    read(8, "<channel", 4096) = 8
    read(8, " version=\"1.9.2\"", 4096) = 16
    read(8, ">", 4096) = 1
    read(8,

While normally it looks like this:

    write(7, "<channel version=\"1.9.2\">\n", 26) = 26
    read(8, "<?xml version=\"1.0\" encoding=\"ut"..., 4096) = 39
    read(8, "<channel", 4096) = 8
    read(8, " version=\"1.9.2\"", 4096) = 16
    read(8, ">", 4096) = 1
    write(7, "</channel>\n", 11) = 11
    read(8, "</channel>", 4096) = 10
    close(8) = 0
    close(7) = 0
    write(2, "ok\n", 3) = 3

I bisected libexpat, and the issue is triggered by this commit: https://github.com/libexpat/libexpat/commit/9cdf9b8d77d5c2c2a27d15fb68dd3f83cafb45a1 ("Skip parsing after repeated partials on the same token").

Any ideas?
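To poke at the suspected mechanism outside ClusterShell, here is a minimal sketch using Python's pyexpat directly. It feeds an Expat parser the same chunks that fd 8 receives in the strace above and records, per chunk, whether the start-element callback fired. Assumptions: the helper name `start_seen_per_chunk` is hypothetical, the exact bytes of the XML declaration are guessed (the strace truncates them), and it is assumed, not verified here, that the ClusterShell gateway feeds Expat incrementally in the same way. `xmlparser.SetReparseDeferralEnabled()` only exists on newer CPython releases, hence the `hasattr` probe.

```python
import xml.parsers.expat

# Chunks as they arrive on fd 8 in the strace above. The XML declaration is
# truncated there ("ut"...), so its exact bytes here are an assumption.
CHUNKS = [
    b'<?xml version="1.0" encoding="utf-8"?>\n',
    b"<channel",
    b' version="1.9.2"',
    b">",
]


def start_seen_per_chunk(chunks, disable_deferral=True):
    """Feed chunks to a fresh Expat parser one by one (hypothetical helper).

    Returns (per_chunk, deferral_api): per_chunk[i] is True if a start-element
    callback fired while parsing chunks[i]; deferral_api tells whether this
    Python exposes SetReparseDeferralEnabled().
    """
    parser = xml.parsers.expat.ParserCreate()
    fired = []
    parser.StartElementHandler = lambda name, attrs: fired.append(name)

    # Only newer CPython releases expose the switch for Expat >= 2.6.0's
    # reparse deferral, so probe for it instead of assuming it exists.
    deferral_api = hasattr(parser, "SetReparseDeferralEnabled")
    if disable_deferral and deferral_api:
        parser.SetReparseDeferralEnabled(False)

    per_chunk = []
    for chunk in chunks:
        before = len(fired)
        parser.Parse(chunk, False)  # isfinal=False: more data may follow
        per_chunk.append(len(fired) > before)
    return per_chunk, deferral_api


if __name__ == "__main__":
    print("Expat:", xml.parsers.expat.EXPAT_VERSION)
    for disable in (False, True):
        result, api = start_seen_per_chunk(CHUNKS, disable_deferral=disable)
        print(f"disable_deferral={disable} api={api} -> {result}")
```

With Expat 2.5.0, or with deferral switched off, the start tag should be reported as soon as the lone ">" chunk arrives. With Expat 2.6.0 and deferral left on, the last entry may come back False, because a one-byte chunk may not grow the buffer enough for Expat to retry the partial token. That would fit the failing strace, where the gateway never gets to write "</channel>".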
Thanks for the detailed report! I will try to find time for a closer look on the weekend.
I've had a chance to investigate this further by now. Options for next steps depend on how CPython upstream feels about my new and related pull request https://github.com/python/cpython/pull/115623 . Let's see about that first.