Summary: | app-admin/clustershell[test] blocks upgrading >=dev-libs/expat-2.6.0 due to not being compatible (was: -1.9.2 fails test (hang): test deadlocks with >=dev-libs/expat-2.6.0) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Agostino Sarubbo <ago> |
Component: | Current packages | Assignee: | Petr Vaněk <arkamar> |
Status: | IN_PROGRESS --- | ||
Severity: | normal | CC: | monsieurp, sam, sping |
Priority: | Normal | Keywords: | SECURITY, TESTFAILURE |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://github.com/cea-hpc/clustershell/pull/556 | ||
See Also: | https://github.com/python/cpython/pull/115623 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 923951, 926786, 938894 | ||
Attachments: | build.log |
Description
Agostino Sarubbo
2024-02-14 18:33:58 UTC
Created attachment 884992
build.log
build log and emerge --info
Error(s) that match a known pattern:

    ERROR:ClusterShell.Gateway:MessageProcessingError: Invalid "message" attributes: missing key "gateway"
    test groups when not allowed to read some YAML config file ... DEBUG:ClusterShell.NodeUtils:[Errno 13] Permission denied: '/var/tmp/portage/app-admin/clustershell-1.9.2/temp/cs-test-vyo42aiu/cs-test-24lq2nzx.yaml'

This bug happens on amd64 as well. I was able to bisect it to commit 9429ff64f132 ("dev-lang/python: Bump to 3.10.13_p3"), which allows dev-libs/expat-2.6.0 to be pulled in. So this issue happens only with dev-libs/expat-2.6.0; tests pass if I downgrade to expat-2.5.0.

The issue is a deadlock in test_basic_noop from https://github.com/cea-hpc/clustershell/blob/v1.9.2/tests/TreeGatewayTest.py

There are two threads sending data through pipes, and the main process is stuck waiting for more data, most probably in this function: https://github.com/cea-hpc/clustershell/blob/v1.9.2/tests/TreeGatewayTest.py#L131-L136

When the issue occurs, strace looks like this:

    write(7, "<channel version=\"1.9.2\">\n", 26) = 26
    read(8, "<?xml version=\"1.0\" encoding=\"ut"..., 4096) = 39
    read(8, "<channel", 4096) = 8
    read(8, " version=\"1.9.2\"", 4096) = 16
    read(8, ">", 4096) = 1
    read(8,

Normally it looks like this:

    write(7, "<channel version=\"1.9.2\">\n", 26) = 26
    read(8, "<?xml version=\"1.0\" encoding=\"ut"..., 4096) = 39
    read(8, "<channel", 4096) = 8
    read(8, " version=\"1.9.2\"", 4096) = 16
    read(8, ">", 4096) = 1
    write(7, "</channel>\n", 11) = 11
    read(8, "</channel>", 4096) = 10
    close(8) = 0
    close(7) = 0
    write(2, "ok\n", 3) = 3

I bisected libexpat, and the issue is triggered by this commit: https://github.com/libexpat/libexpat/commit/9cdf9b8d77d5c2c2a27d15fb68dd3f83cafb45a1 ("Skip parsing after repeated partials on the same token")

Any idea?

Thanks for the detailed report! I will try to find time for a closer look on the weekend.

I've had a chance to investigate this more by now.
Options for next steps depend on how CPython upstream feels about my new and related pull request https://github.com/python/cpython/pull/115623. Let's see about that first.

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=ba7cde1220c07ad93609373420797064a58176e1

commit ba7cde1220c07ad93609373420797064a58176e1
Author: Sebastian Pipping <sping@gentoo.org>
AuthorDate: 2024-09-04 14:59:11 +0000
Commit: Sebastian Pipping <sping@gentoo.org>
CommitDate: 2024-09-04 15:01:20 +0000

    app-admin/clustershell: Protect against testing with >=dev-libs/expat-2.6.0

    Bug: https://bugs.gentoo.org/924601
    Signed-off-by: Sebastian Pipping <sping@gentoo.org>

 app-admin/clustershell/clustershell-1.9.2.ebuild | 1 +
 1 file changed, 1 insertion(+)
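For anyone who wants to observe the parser behaviour behind the hang outside of the test suite, here is a minimal sketch, not the actual clustershell or CPython change. It feeds the same chunks seen in the strace output to pyexpat and records how many element events have been delivered after each chunk. The names `CHUNKS` and `feed_in_chunks` are made up for this illustration. `SetReparseDeferralEnabled` is only present in some CPython releases, so the code checks for it with `hasattr` and otherwise leaves the parser at its default. With reparse deferral active (expat 2.6.0 or later), the start-element event may arrive later than the chunk containing `>`, which is consistent with the reader sitting in `read(8, ...)` while its peer waits for a reply.

```python
import xml.parsers.expat as expat

# Chunks modelled on the read() calls in the strace output from this report.
CHUNKS = [
    b'<?xml version="1.0" encoding="utf-8"?>\n',
    b"<channel",
    b' version="1.9.2"',
    b">",
    b"</channel>",
]


def feed_in_chunks(chunks, disable_deferral=False):
    """Feed chunks to Expat; return (events, number of events seen after each chunk).

    Hypothetical helper for illustration only.
    """
    events = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name, dict(attrs)))
    )
    parser.EndElementHandler = lambda name: events.append(("end", name))

    # Assumption: the method exists only on some CPython versions, so guard it.
    if disable_deferral and hasattr(parser, "SetReparseDeferralEnabled"):
        parser.SetReparseDeferralEnabled(False)

    counts = []
    for chunk in chunks:
        parser.Parse(chunk, False)  # not final: more data may follow
        counts.append(len(events))
    parser.Parse(b"", True)  # final call: flushes anything still pending
    return events, counts


if __name__ == "__main__":
    print("expat:", expat.EXPAT_VERSION)
    for disable in (False, True):
        events, counts = feed_in_chunks(CHUNKS, disable_deferral=disable)
        print("deferral disabled:", disable, "| counts per chunk:", counts)
        print("  final events:", events)
```

Comparing the per-chunk counts of the two runs on an affected system shows whether callbacks are being held back; the final list of events is the same either way, because the last `Parse` call flushes pending input.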