Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 691330 - sys-apps/sandbox: fails open_static/1 test
Summary: sys-apps/sandbox: fails open_static/1 test
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Sandbox Maintainers
URL:
Whiteboard:
Keywords: TESTFAILURE
Depends on:
Blocks:
 
Reported: 2019-08-03 08:23 UTC by Toralf Förster
Modified: 2021-10-21 22:51 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge-info.txt (emerge-info.txt,15.56 KB, text/plain)
2019-08-03 08:23 UTC, Toralf Förster
Details
emerge-history.txt (emerge-history.txt,6.54 KB, text/plain)
2019-08-03 08:23 UTC, Toralf Förster
Details
environment (environment,85.16 KB, text/plain)
2019-08-03 08:23 UTC, Toralf Förster
Details
etc.portage.tbz2 (etc.portage.tbz2,11.39 KB, application/x-bzip)
2019-08-03 08:23 UTC, Toralf Förster
Details
logs.tbz2 (logs.tbz2,36.91 KB, application/x-bzip)
2019-08-03 08:23 UTC, Toralf Förster
Details
sys-apps:sandbox-2.18:20190803-042059.log (sys-apps:sandbox-2.18:20190803-042059.log,356.65 KB, text/plain)
2019-08-03 08:23 UTC, Toralf Förster
Details
temp.tbz2 (temp.tbz2,33.51 KB, application/x-bzip)
2019-08-03 08:23 UTC, Toralf Förster
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Toralf Förster gentoo-dev 2019-08-03 08:23:00 UTC
ERROR: All 87 tests were run,
1 failed unexpectedly.
## -------------------------- ##
## testsuite.log was created. ##
## -------------------------- ##

  -------------------------------------------------------------------

  This is an unstable amd64 chroot image at a tinderbox (==build bot)
  name: 17.1_desktop_plasma-test-20190802-201521

  -------------------------------------------------------------------

gcc-config -l:
 [1] x86_64-pc-linux-gnu-9.1.0 *

Available Python interpreters, in order of preference:
  [1]   python3.6
  [2]   python2.7 (fallback)





emerge -qpvO sys-apps/sandbox
[ebuild     U ] sys-apps/sandbox-2.18 [2.13] ABI_X86="(32) (64) (-x32)"
Comment 1 Toralf Förster gentoo-dev 2019-08-03 08:23:03 UTC
Created attachment 585582 [details]
emerge-info.txt
Comment 2 Toralf Förster gentoo-dev 2019-08-03 08:23:05 UTC
Created attachment 585584 [details]
emerge-history.txt
Comment 3 Toralf Förster gentoo-dev 2019-08-03 08:23:08 UTC
Created attachment 585586 [details]
environment
Comment 4 Toralf Förster gentoo-dev 2019-08-03 08:23:11 UTC
Created attachment 585588 [details]
etc.portage.tbz2
Comment 5 Toralf Förster gentoo-dev 2019-08-03 08:23:14 UTC
Created attachment 585590 [details]
logs.tbz2
Comment 6 Toralf Förster gentoo-dev 2019-08-03 08:23:16 UTC
Created attachment 585592 [details]
sys-apps:sandbox-2.18:20190803-042059.log
Comment 7 Toralf Förster gentoo-dev 2019-08-03 08:23:19 UTC
Created attachment 585594 [details]
temp.tbz2
Comment 8 Sergei Trofimovich (RETIRED) gentoo-dev 2019-08-03 21:48:50 UTC
> 41: open_static/1                                   FAILED (open_static.at:1)
Probably a variant of bug #689376
Comment 9 Sergei Trofimovich (RETIRED) gentoo-dev 2019-08-03 22:04:39 UTC
(In reply to Sergei Trofimovich from comment #8)
> > 41: open_static/1                                   FAILED (open_static.at:1)
> Probably a variant of bug #689376

Woohoo \o/ It's a flaky test (source of flakiness unknown yet).
Running the following in sandbox eventually triggers the same test failure:

    $ while make check; do echo again; done
    <a few successful runs>
    41: open_static/1 FAILED (open_static.at:1)
Comment 10 Sergei Trofimovich (RETIRED) gentoo-dev 2019-08-06 07:36:43 UTC
Status update on how to reproduce the failure faster:

1. Make open_static helper loop until it fails:

 --- a/tests/open_static-1.sh
 +++ b/tests/open_static-1.sh
 @@ -8,5 +8,7 @@ open_static-0 3 ok O_RDONLY 0666 || exit 1
 
  mkdir deny || exit 1
  adddeny $PWD/deny
 -open_static-0 -1,EPERM deny/not-ok "O_WRONLY|O_CREAT" 0666 || exit 1
 +while :; do
 +    open_static-0 -1,EPERM deny/not-ok "O_WRONLY|O_CREAT" 0666 || exit 1
 +done
  test -e sandbox.log

2. Run test-suite without parallelism:
 sandbox/tests $ './testsuite' AUTOTEST_PATH='src:tests' --jobs=1 --verbose  2>&1 | nl
 ...
  6067	PASS: open("deny/not-ok", 0x41, 666) = -1 (wanted -1); errno = EPERM:1 [Operation not permitted] (wanted EPERM:1 [Operation not permitted])
  6068	PASS: open("deny/not-ok", 0x41, 666) = -1 (wanted -1); errno = EPERM:1 [Operation not permitted] (wanted EPERM:1 [Operation not permitted])
  6069	PASS: open("deny/not-ok", 0x41, 666) = -1 (wanted -1); errno = EPERM:1 [Operation not permitted] (wanted EPERM:1 [Operation not permitted])
  6070	FAIL: open("deny/not-ok", 0x41, 666) = 3 (wanted -1); errno = Success:0 [Success] (wanted EPERM:1 [Operation not permitted])
  6071	./open_static.at:1: exit code was 1, expected 0
  6072	1. open_static.at:1:  FAILED (open_static.at:1)

It usually takes a few seconds (a few thousands iterations) to fail.
Comment 11 Sergei Trofimovich (RETIRED) gentoo-dev 2019-08-06 22:25:19 UTC
Enabling debug tracing shows it's a bug in ptrace()/child-exec() handoff. Sometimes exec() in the child happens before parent starts tracing.

Debug patch:

--- a/libsandbox/trace.c
+++ b/libsandbox/trace.c
@@ -26,7 +26,7 @@ static long _do_ptrace(sb_ptrace_req_t request, const char *srequest, void *addr
 #ifdef DEBUG
 # define SBDEBUG 1
 #else
-# define SBDEBUG 0
+# define SBDEBUG 1
 #endif


Good:

++ open_static-0 -1,EPERM deny/not-ok 'O_WRONLY|O_CREAT' 0666
 * TRACE (pid=5617 ):trace_main*: parent waiting for child (pid=5 6TRACE (pid=1586) to signal
18):trace_main: child setting up ...
 * TRACE (pid=5617):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIGSTOP(19)
 * TRACE (pid=5617):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIGCONT(18)
 * TRACE (pid=5617):trace_loop: waiting for exec; status: 0x127f * TRACE (pid=5617):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIGTRAP(5)
 * TRACE (pid=5617):trace_loop: waiting for exec; status: 0x857f * TRACE (pid=5617):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIGTRAP(5)
 * TRACE (pid=5617):trace_loop: hit exec! * TRACE (pid=5617):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIGTRAP(5)
 = 0
 * TRACE (pid=5617):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIGTRAP(5)
...
 * TRACE (pid=5619):trace_loop: IDK:231(...) * TRACE (pid=5619):trace_child_signal: got sig SIGCHLD(17): code:CLD_EXITED(1) status:SIG???(0)
 = 0
++ :

Bad:

++ open_static-0 -1,EPERM deny/not-ok 'O_WRONLY|O_CREAT' 0666
 *  TRACE (pid=*56 2TRACE (pid=35):6trace_main2: parent waiting for child (pid=45):6trace_main2: child setting up ...
4) to signal
 * TRACE (pid=5623):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIG???(0)
 * TRACE (pid=5623):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIGTRAP(5)
 * TRACE (pid=5623):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIGTRAP(5)
 * TRACE (pid=5623):trace_loop: waiting for exec; status: 0x857f * TRACE (pid=5623):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIGTRAP(5)
 * TRACE (pid=5623):trace_loop: waiting for exec; status: 0x857f * TRACE (pid=5623):trace_child_signal: got sig SIGCHLD(17): code:CLD_TRAPPED(4) status:SIGTRAP(5)
...
 * TRACE (pid=5623):trace_loop: waiting for exec; status: 0x857f * TRACE (pid=5623):trace_child_signal: got sig SIGCHLD(17): code:CLD_EXITED(1) status:SIGHUP(1)
 = 1
++ exit 1.


Note: 'hit exec!' was never observed by parent. I guess it already happened before parent processed first event.
Comment 12 Sergei Trofimovich (RETIRED) gentoo-dev 2019-08-09 07:47:40 UTC
At least one problem in sandbox's ptrace state machine is in handling SIGCHLD/CLD_TRAPPED: normally tracer gets this signal when child is stopped by ptrace notification event (by signal). It that case info->si_status contains real signal that stopped tracee.

But in our case tracee gets info->si_status = 0:
 * TRACE (pid=5619):trace_loop: IDK:231(...) * TRACE (pid=5619):trace_child_signal: got sig SIGCHLD(17): code:CLD_EXITED(1) status:SIG???(0)

We should not forward back sig=0 to tracee as it will just resume tracee without signal delivery. That causes ptrace state machine to get out of sync across kernel and tracer.

The following workaround seems to make things strictly better:

--- a/libsandbox/trace.c
+++ b/libsandbox/trace.c
@@ -244,7 +244,9 @@ static void trace_child_signal(int signo, siginfo_t *info, void *context)
 			 * its problem, not ours, so don't whine about it.  We just
 			 * have to be sure to bubble it back up.  #265072
 			 */
-			do_ptrace(PTRACE_CONT, NULL, (void *)(long)info->si_status);
+			if (info->si_status != 0) {
+				do_ptrace(PTRACE_CONT, NULL, (void *)(long)info->si_status);
+			}
 			return;
 	}

I'll try to find meaning of SIGCHLD/CLD_TRAPPED/info->si_status=0 first.
Comment 13 SpanKY gentoo-dev 2021-10-21 22:51:30 UTC
i rewrote the state machine for 2.26, and leverage PTRACE_EVENT_EXEC more strongly now.  hopefully it's better now, but who knows.
https://gitweb.gentoo.org/proj/sandbox.git/commit/?id=e9946633344c34b295485130f8f0fceb10fedc96