Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 910273 - sys-apps/sandbox: Large regression in webkit-gtk compile time between 2.29 and 2.30
Summary: sys-apps/sandbox: Large regression in webkit-gtk compile time between 2.29 an...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
: Normal critical (vote)
Assignee: Sandbox Maintainers
URL:
Whiteboard: Workaround in 2.37
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-12 20:49 UTC by Matt Turner
Modified: 2024-02-20 02:17 UTC (History)
11 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
patch to disable faccessat usage (from floppym) (stdin.diff,581 bytes, patch)
2023-07-12 21:45 UTC, Matt Turner
Details | Diff
patch to add AT_EACCESS to flags to faccessat (stdin.diff,445 bytes, patch)
2023-07-13 18:08 UTC, Matt Turner
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Matt Turner gentoo-dev 2023-07-12 20:49:34 UTC
tl;dr: The compile-time of webkit-gtk increases from around 9 minutes with sandbox-2.29 to  around 1 hour with sandbox-2.30.

This manifests in some cc1plus processes taking 12-14 minutes to complete.

It looks to me like the culprit is commit f63fea69097e37424c700f820d25591747a15752 ("libsandbox/libsbutil: use faccessat for file-existence tests").

Testing is in an nspawn container on a 64-core/128-thread machine.

sandbox-2.29
> webkitgtk-2.41.6_build # time sandbox ninja
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  24334.73s user 3808.59s system 5373% cpu 8:43.76 total

> webkitgtk-2.41.6_build # time sandbox ninja
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  24584.34s user 3882.00s system 5307% cpu 8:56.31 total

sandbox-2.29 +
f63fea69097e37424c700f820d25591747a15752 ("libsandbox/libsbutil: use faccessat for file-existence tests")
> webkitgtk-2.41.6_build # time sandbox ninja
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  20313.64s user 518388.95s system 11466% cpu 1:18:18.10 tota

sandbox-2.29 +
f63fea69097e37424c700f820d25591747a15752 ("libsandbox/libsbutil: use faccessat for file-existence tests")
944bc03489ef99f4b3c6c62fa12e9782139aa4c7 ("change FS calls to use 64-bit interfaces explicitly")
> webkitgtk-2.41.6_build # time sandbox ninja
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  21102.82s user 406423.01s system 11326% cpu 1:02:54.56 total

sandbox-2.29 +
f63fea69097e37424c700f820d25591747a15752 ("libsandbox/libsbutil: use faccessat for file-existence tests")
784477e6c1e09328b363fc5fea4dd304fe053c04 ("libsandbox: reduce & inline the __64_{pre,post}.h headers")
> webkitgtk-2.41.6_build # time sandbox ninja
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  22384.74s user 332820.88s system 11128% cpu 53:11.96 total

sandbox-2.30-r1
> webkitgtk-2.41.6_build # time sandbox ninja
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  21919.23s user 277294.56s system 10961% cpu 45:29.59 total

sandbox-2.36
> webkitgtk-2.41.6_build # time sandbox ninja
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  21546.48s user 357539.98s system 11041% cpu 57:13.25 total

sandbox-2.36 +
revert 944bc03489ef99f4b3c6c62fa12e9782139aa4c7 ("change FS calls to use 64-bit interfaces explicitly")
> webkitgtk-2.41.6_build # time sandbox ninja
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  23306.28s user 254991.92s system 10768% cpu 43:04.47 total

sandbox-2.36 +
revert 609dd64e6e88b8abbbd424c24e5e40abe95cdb1c ("libsbutil: add sb_exists function") (for conflict resolution purposes)
revert 784477e6c1e09328b363fc5fea4dd304fe053c04 ("libsandbox: reduce & inline the __64_{pre,post}.h headers")
revert 944bc03489ef99f4b3c6c62fa12e9782139aa4c7 ("change FS calls to use 64-bit interfaces explicitly")
revert f63fea69097e37424c700f820d25591747a15752 ("libsandbox/libsbutil: use faccessat for file-existence tests")
> webkitgtk-2.41.6_build # time sandbox ninja
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  26094.12s user 3915.83s system 5105% cpu 9:47.77 total
Comment 1 Joakim Tjernlund 2023-07-12 21:05:00 UTC
Only webkit-gtk affected ?
Comment 2 Matt Turner gentoo-dev 2023-07-12 21:31:00 UTC
Whereas 128 jobs with this configuration gave a time of 

> sandbox ninja  21102.82s user 406423.01s system 11326% cpu 1:02:54.56 total

32 jobs is significantly faster:

sandbox-2.29 +
f63fea69097e37424c700f820d25591747a15752 ("libsandbox/libsbutil: use faccessat for file-existence tests")
> webkitgtk-2.41.6_build # time sandbox ninja -j32
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja -j32  17892.33s user 52597.15s system 2960% cpu 39:41.14 total

So... some serialization in faccessat causing basically a denial of service?
Comment 3 Matt Turner gentoo-dev 2023-07-12 21:45:49 UTC
Created attachment 865440 [details, diff]
patch to disable faccessat usage (from floppym)

sandbox-2.36 + attached patch to disable faccessat usage
> webkitgtk-2.41.6_build # time sandbox ninja           
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  24374.68s user 3827.28s system 5216% cpu 9:00.67 total

So it's definitely faccessat causing some sort of problem.
Comment 4 Matt Turner gentoo-dev 2023-07-12 21:47:41 UTC
(In reply to Joakim Tjernlund from comment #1)
> Only webkit-gtk affected ?

I do not know. This is just the package that I noticed taking much longer than normal to build.
Comment 5 cJ 2023-07-13 01:25:28 UTC
It's not just webkit-gtk; as I was updating my laptop (which has *a lot* of packages) I noticed that pretty much everything was slowed down, with most CPU time being system time. But webkit-gtk was easy to notice.
Comment 6 Joakim Tjernlund 2023-07-13 09:05:29 UTC
This bit was removed when adding faccessat:
-	/* Check incoming args against common *at issues */
-	char dirfd_path[SB_PATH_MAX];
-	if (!sb_common_at_pre_check(func, &pathname, dirfd, dirfd_path, sizeof(dirfd_path)))
-		return false;

Perhaps that has something to do with the lost performance?
Comment 7 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-07-13 09:08:45 UTC
Alexander Monakov gave some great analysis at https://social.treehouse.systems/@amonakov@mastodon.gamedev.place/110705631844458392 which almost certainly nails it.
Comment 8 Matt Turner gentoo-dev 2023-07-13 18:08:14 UTC
Created attachment 865461 [details, diff]
patch to add AT_EACCESS to flags to faccessat

Alexander Monakov suggested adding AT_EACCESS to the flags of faccessat. See attached patch.

sandbox-2.36 + patch
> webkitgtk-2.41.6_build # time sandbox ninja
> [0/2] Re-checking globbed directories...
> [6791/6791] Generating WebKit-6.0.typelib
> sandbox ninja  19962.77s user 571988.19s system 11574% cpu 1:25:14.45 total

Surprisingly that doesn't resolve the issue.
Comment 9 Joakim Tjernlund 2023-07-13 18:38:12 UTC
(In reply to Matt Turner from comment #8)
> Created attachment 865461 [details, diff] [details, diff]
> patch to add AT_EACCESS to flags to faccessat
> 
> Alexander Monakov suggested adding AT_EACCESS to the flags of faccessat. See
> attached patch.
> 
> sandbox-2.36 + patch
> > webkitgtk-2.41.6_build # time sandbox ninja
> > [0/2] Re-checking globbed directories...
> > [6791/6791] Generating WebKit-6.0.typelib
> > sandbox ninja  19962.77s user 571988.19s system 11574% cpu 1:25:14.45 total
> 
> Surprisingly that doesn't resolve the issue.

Maybe try adding back this removed chunk?
-	/* Check incoming args against common *at issues */
-	char dirfd_path[SB_PATH_MAX];
-	if (!sb_common_at_pre_check(func, &pathname, dirfd, dirfd_path, sizeof(dirfd_path)))
-		return false;
Comment 10 Mike Gilbert gentoo-dev 2023-07-13 18:59:55 UTC
(In reply to Joakim Tjernlund from comment #9)

I'm pretty sure that code has nothing to do with the performance regression.
Comment 11 Matt Turner gentoo-dev 2023-07-13 19:15:44 UTC
(In reply to Joakim Tjernlund from comment #9)
> Maybe try adding back this removed chunk?
> -	/* Check incoming args against common *at issues */
> -	char dirfd_path[SB_PATH_MAX];
> -	if (!sb_common_at_pre_check(func, &pathname, dirfd, dirfd_path,
> sizeof(dirfd_path)))
> -		return false;

That doesn't help.
Comment 12 Alexander Monakov 2023-07-14 09:01:33 UTC
I can't seem to reproduce the problem (but my CPU has fewer cores). Note that non-emulated faccessat2 is available only since linux-5.8 and glibc-2.33 — I assume yours are newer than that? I'm testing on linux-6.2.

Can you inspect output of 'qlop' to find more "lightweight" examples where build time has noticeably increased? Would be nice to have an testcase that builds under a minute and has few dependencies, unlike webkit-gtk.

Can you produce a perf profile while the slow rebuild of webkit-gtk is running? If you don't want to run perf as root, you'll need

sysctl kernel.kptr_restrict=0
sysctl kernel.perf_event_paranoid=0

Then while rebuild is running

perf record -a -e cycles/period=300000/k -g sleep 10

perf report --stdio --header > perf-report.txt
Comment 13 Matt Turner gentoo-dev 2023-07-17 02:02:16 UTC
media-libs/libva-intel-media-driver also seems to suffer from this -- and it builds ~1500 objects rather than the ~7000 of webkit-gtk.

I haven't had a chance to try to run perf yet, sorry.
Comment 14 Alexander Monakov 2023-07-17 09:51:20 UTC
Yeah, ok, I could repro with libva-intel-media-driver and it is really... sigh.

libsandbox does not intercept any *stat* functions so the fstatat64 call goes straight to libc.

libsandbox does, on the other hand, intercept faccessat, so the call to faccessat in sb_exists goes **to the sandbox's wrapper**. Comparing just the top 10 syscalls, with faccessat:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.12   84.859715    14143285         6         1 wait4
  5.34    5.203671           2   2294799   2294760 readlink
  3.19    3.109788           2   1297487    138209 newfstatat
  2.46    2.395624           5    417262           munmap
  1.35    1.315728           3    417765           mmap
  0.51    0.500804           3    138884    138194 faccessat2
  0.00    0.004341           4       945           read
  0.00    0.002871           3       928       466 stat
  0.00    0.002421           3       681        12 openat
  0.00    0.001253           1       675           close

Without faccessat:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.20    3.572955      595492         6         1 wait4
  8.64    0.342247           2    139101    138263 newfstatat
  0.54    0.021544           1     13028     12989 readlink
  0.17    0.006695           4      1493           munmap
  0.11    0.004496           2      1996           mmap
  0.09    0.003591           3       945           read
  0.09    0.003369           3       928       466 stat
  0.04    0.001683           2       751        82 openat
  0.02    0.000893           1       675           close
  0.02    0.000737           0      1113           write


As you can see there's about three mmap/munmap per each faccessat, plus more than 10 readlink syscalls (plus half as many newfstatat invocations).

FWIW I don't quite see the logic in what sandbox is intercepting. Certainly intercepting 'access' and 'faccessat' without also readlink/fstat/lstat/stat makes no sense to me.
Comment 15 Pablo Yanez Trujillo 2023-07-17 12:56:49 UTC
I can confirm that sandbox-2.32 leads to double and triple compile times, specially of C++ packages.

Usually my system needs 36 minutes to compile webkit-gtk, but yesterday I stopped compiling after 2 hours. After two hours it was still compiling file 3678 of 6771 files and it was compiling at a rate of 10-15 seconds pro .cpp file. Needless to say, this would have taken ages to finish.

I created this thread and https://forums.gentoo.org/viewtopic-p-8795730.html Sam pointed me to this bug.

I managed to find a mirror (https://ftp.uni-hannover.de/gentoo/distfiles/6a/) of the university of Hannover that hosted sandbox-2.29.tar.xz and from https://anongit.gentoo.org/git/repo/gentoo.git I managed to get the sandbox-2.29.ebuild. I added it to my local overlay and added 
>sys-apps/sandbox-2.29 to /etc/portage/package.mask, then I downgraded to sandbox-2.29.

Afterwards I restarted compilation of webkit-gtk and it finished in 36 minutes, as expected.

I also tried something else where yesterday also took longer than expected (dev-qt/designer)

$ genlop -t dev-qt/designer
  ...

     Sun Jan 22 20:13:21 2023 >>> dev-qt/designer-5.15.8
       merge time: 1 minute and 42 seconds.

     Sun Apr 30 20:18:28 2023 >>> dev-qt/designer-5.15.9
       merge time: 2 minutes and 26 seconds.

     Mon Jul 17 00:00:17 2023 >>> dev-qt/designer-5.15.10
       merge time: 6 minutes and 8 seconds.

     Mon Jul 17 14:54:31 2023 >>> dev-qt/designer-5.15.10
       merge time: 2 minutes and 26 seconds.


as you can see, on Jul 17 00:00:17  (with sandbox-2.32) it took 3 times longer than usual. On Jul 17 14:54:31 on sandbox-2.29 it took once again 2 minutes.
Comment 16 Larry the Git Cow gentoo-dev 2023-07-17 13:54:53 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/sandbox.git/commit/?id=6a6a6a6c9680e5868544887a7ab4d141833abfb6

commit 6a6a6a6c9680e5868544887a7ab4d141833abfb6
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2023-07-17 13:43:51 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2023-07-17 13:43:51 +0000

    sb_exists: drop use of faccessat
    
    faccessat appears to perform quite poorly under certain conditions.
    Go back to using fstatat until this can be debugged.
    
    Bug: https://bugs.gentoo.org/910273
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 libsbutil/sb_exists.c | 10 ----------
 1 file changed, 10 deletions(-)
Comment 17 Larry the Git Cow gentoo-dev 2023-07-17 14:03:47 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=71475b194ac37e0541265906f63d95ab597480d9

commit 71475b194ac37e0541265906f63d95ab597480d9
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2023-07-17 14:02:44 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2023-07-17 14:03:26 +0000

    sys-apps/sandbox: add 2.37
    
    Bug: https://bugs.gentoo.org/910273
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 sys-apps/sandbox/Manifest            |  1 +
 sys-apps/sandbox/sandbox-2.37.ebuild | 64 ++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)
Comment 18 Mike Gilbert gentoo-dev 2023-07-17 14:19:02 UTC
(In reply to Alexander Monakov from comment #14)

Thanks for the analysis.

I think we want to intercept access and faccessat so that we can return a fake response to programs that pass W_OK in the mode parameter.

I'm actually surprised that the process running ptrace is also being sandboxed. That seems like a bug in itself.
Comment 19 tt_1 2023-07-17 15:33:07 UTC
sys-libs/glibc has also almost doubled in compile time, from 4 minutes 38 seconds with sandbox 2.29 to 8 minutes and 9 seconds with sandbox-2.32
Comment 20 Alexander Monakov 2023-07-17 19:44:44 UTC
(In reply to Mike Gilbert from comment #18)
> I think we want to intercept access and faccessat so that we can return a
> fake response to programs that pass W_OK in the mode parameter.

But you need to reject the attempt to open anyway. What happens if you drop these wrappers?

> I'm actually surprised that the process running ptrace is also being
> sandboxed. That seems like a bug in itself.

What are you talking about here? What is "the process running ptrace" and what makes you think it's sandboxed?
Comment 21 Mike Gilbert gentoo-dev 2023-07-17 20:11:24 UTC
(In reply to Alexander Monakov from comment #20)
> (In reply to Mike Gilbert from comment #18)
> > I think we want to intercept access and faccessat so that we can return a
> > fake response to programs that pass W_OK in the mode parameter.
> 
> But you need to reject the attempt to open anyway. What happens if you drop
> these wrappers?

If the intent is to prevent programs from writing outside the sandbox, then yes, we don't need to wrap access().

However, there are likely programs that will change their behavior based what access() says. If access("/bad/path", W_OK) and open("/bad/path", O_WRONLY) give different results, poorly written programs may behave strangely.

In any case, the pull request in See Also should resolve the issue for faccessat(..., F_OK, ...) by returning early from the wrapper before attempting to resolve any paths.

> > I'm actually surprised that the process running ptrace is also being
> > sandboxed. That seems like a bug in itself.
> 
> What are you talking about here? What is "the process running ptrace" and
> what makes you think it's sandboxed?

Sorry, I got my wires crossed. I spent most of last week debugging an issue in the ptrace-based sandbox code in libsandbox/trace.c. That has nothing to do with this bug.
Comment 22 Alexander Monakov 2023-07-17 20:37:37 UTC
If catering to "poorly written programs" is giving rise to these kinds of issues, maybe it's time to reconsider ;)

How did we arrive to this situation, anyway? Did sandbox maintainers forget that unlike xstat, faccessat is wrapped, and nobody thought to check?

And why is sandbox performing existence check immediately before opening the file? What value does the attempt provide over just opening the file?
Comment 23 Mike Gilbert gentoo-dev 2023-07-17 20:50:57 UTC
(In reply to Alexander Monakov from comment #22)

The code was written by a developer who is now inactive. You are welcome to dig through the commit log for clues.
Comment 24 Mike Gilbert gentoo-dev 2023-07-17 20:57:01 UTC
We should probably create an unwrapped version of faccessat, similar to sb_unwrapped_open.
Comment 25 Alexander Monakov 2023-07-17 21:27:10 UTC
Both of the questionable existence checks in sb_{openat,fopen}_pre_check appear as parts of big-ish commits, without explanation and direct relation to the purpose of the commit.
Comment 26 Mike Gilbert gentoo-dev 2023-07-18 15:23:54 UTC
(In reply to Alexander Monakov from comment #22)
> How did we arrive to this situation, anyway? Did sandbox maintainers forget
> that unlike xstat, faccessat is wrapped, and nobody thought to check?

Yeah, my guess is that it was overlooked by mistake.

> And why is sandbox performing existence check immediately before opening the
> file? What value does the attempt provide over just opening the file?

These comments in libsandbox/pre_check_openat.c give a clue:

/* If we're not trying to create, fail normally if file does not stat */
/* Doesn't exist -> skip permission checks */

My translation:

If the file does not exist, we want the sandboxed program to get errno = ENOENT when it tries to open a non-existant file. This mimics the behavior the program would see if it were not sandboxed.

Without the existence check, libsandbox would check the path against the allowlist (SANDBOX_READ/SANDBOX_WRITE) and would set errno = EACCES if the path isn't allowed.

Basically, we are trying to interfere with the sandboxed program as little as possible to avoid unintended breaks in behavior.
Comment 27 Larry the Git Cow gentoo-dev 2023-08-04 00:26:01 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/sandbox.git/commit/?id=0317bbe09fe23e4bd972ee254f14817def701731

commit 0317bbe09fe23e4bd972ee254f14817def701731
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2023-07-17 15:03:13 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2023-08-04 00:20:37 +0000

    libsbutil: add sbio_faccessat and use it in sb_exists
    
    sbio_faccessat allows libsbutil to access the unwrapped version of
    faccessat when called from libsandbox.
    
    Using faccessat in place of fstatat seems to give a small boost in
    performance.
    
    Pass AT_EACCESS faccessat to enable a faster path if uid != euid.
    
    Bug: https://bugs.gentoo.org/910273
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 libsandbox/libsandbox.c |  1 +
 libsandbox/wrappers.h   |  2 ++
 libsbutil/sb_exists.c   | 10 ++++++++++
 libsbutil/sbutil.h      |  1 +
 src/sandbox.c           |  1 +
 5 files changed, 15 insertions(+)

https://gitweb.gentoo.org/proj/sandbox.git/commit/?id=8d6a4839ebd909903691e4a71d6a94b3809adc82

commit 8d6a4839ebd909903691e4a71d6a94b3809adc82
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2023-07-17 14:55:27 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2023-08-03 19:12:42 +0000

    libsandbox: skip checking access() without W_OK or R_OK mode
    
    If access/faccessat is called with F_OK or X_OK in the mode argument,
    there is no need to check the path.
    
    Bug: https://bugs.gentoo.org/910273
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 libsandbox/libsandbox.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
Comment 28 Larry the Git Cow gentoo-dev 2023-08-06 00:51:59 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=761c8d19f1549cd137d2b0a0cbcb12f569dfba5e

commit 761c8d19f1549cd137d2b0a0cbcb12f569dfba5e
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2023-08-06 00:50:57 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2023-08-06 00:51:49 +0000

    sys-apps/sandbox: add 2.38
    
    Closes: https://bugs.gentoo.org/906234
    Closes: https://bugs.gentoo.org/910273
    Closes: https://bugs.gentoo.org/910561
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 sys-apps/sandbox/Manifest            |  1 +
 sys-apps/sandbox/sandbox-2.38.ebuild | 64 ++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)