As an example I will use app-arch/cpio-2.11-r1. With sandbox enabled (Portage's default), the configure script hangs at 'checking whether getcwd handles long file names properly.' for a long time; in my case it takes more than 2 hours, and the underlying conftest process takes 100% CPU time the whole time. With FEATURES="-sandbox" it finishes the test in seconds.

I can reproduce it on only one device, a thin client (HP t5730) with a LUKS-encrypted, LVM-based rootfs on a USB 2.0 disk, a 1 GHz 64-bit mobile Sempron, and 1 GB of DDR2 memory. The issue is 100% reproducible and happens on cpio every time, but not on other packages (I don't have a list of them, but it used to happen on some others too). It was also reported back in 2005 on the Gentoo forums: https://forums.gentoo.org/viewtopic-p-2454506.html

Reproducible: Always

### emerge info:

Portage 2.2.0_alpha149 (default/linux/amd64/10.0, gcc-4.5.4, glibc-2.15-r3, 3.6.3-frontier x86_64)
=================================================================
System uname: Linux-3.6.3-frontier-x86_64-Mobile_AMD_Sempron-tm-_Processor_2100+-with-gentoo-2.2
Timestamp of tree: Thu, 20 Dec 2012 16:00:01 +0000
ld GNU ld (GNU Binutils) 2.22
ccache version 3.1.8 [enabled]
app-shells/bash:          4.2_p39
dev-lang/python:          2.7.3-r2, 3.2.3-r1
dev-util/ccache:          3.1.8
dev-util/cmake:           2.8.10.2
dev-util/pkgconfig:       0.27.1
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.11.8
sys-apps/sandbox:         2.6
sys-devel/autoconf:       2.13, 2.69
sys-devel/automake:       1.9.6-r3, 1.12.5
sys-devel/binutils:       2.22-r1
sys-devel/gcc:            4.5.4
sys-devel/gcc-config:     1.8
sys-devel/libtool:        2.4.2
sys-devel/make:           3.82-r4
sys-kernel/linux-headers: 3.6 (virtual/os-headers)
sys-libs/glibc:           2.15-r3
Repositories: gentoo foo-overlay
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -fno-var-tracking -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=256 -mtune=k8 -mno-3dnow"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.4/ext-active/ /etc/php/cgi-php5.4/ext-active/ /etc/php/cli-php5.4/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -fno-var-tracking -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=256 -mtune=k8 -mno-3dnow"
DISTDIR="/var/portage/distfiles"
EMERGE_DEFAULT_OPTS="--with-bdeps y --binpkg-respect-use y"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs ccache distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LC_ALL="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="--quiet -j2"
PKGDIR="/var/portage/packages"
PORTAGE_COMPRESS="bzip2"
PORTAGE_COMPRESS_FLAGS="-9"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/var/portage/tree"
PORTDIR_OVERLAY="/var/lib/layman/foo-overlay"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acl amd64 berkdb bzip2 cli cracklib crypt cups cxx dri fontconfig fortran gdbm gpm iconv ipv6 jpeg lcdfilter logrotate mmx modules mudflap multilib ncurses nls nptl opengl openmp pam pcre png pppd readline session sse sse2 ssl tcpd threads unicode vim-syntax xft zlib zsh-completion"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump"
CAMERAS="ptp2"
COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog"
ELIBC="glibc"
GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx"
INPUT_DEVICES="keyboard mouse evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer"
LINGUAS="en pl"
PHP_TARGETS="php5-3"
PYTHON_SINGLE_TARGET="python2_7"
PYTHON_TARGETS="python2_7 python3_2"
RUBY_TARGETS="ruby18 ruby19"
USERLAND="GNU"
VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l"
XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CPPFLAGS, CTARGET, INSTALL_MASK, LANG, PORTAGE_BUNZIP2_COMMAND, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
Don't compile on the thin client (that isn't what it's for, and USB access to an encrypted filesystem surely isn't going to help) or set FEATURES=-sandbox.
Alternatively, have /var/tmp and /tmp on a separate filesystem with no encryption.
You are fracking impossible. You mark this as invalid because a thin client is a thin client? It has a 1 GHz 64-bit Sempron with SSE3 support and 1 GB of DDR2 memory, and with /var/tmp/portage on tmpfs the issue is still there. I just can't believe what you did here. You didn't even TRY to understand the issue, just assumed that my hardware is slow so you aren't going to look into it.
Added QA because something went terribly wrong here. Ba dum tsss~
Also noticeable on OpenVZ guests, where the getcwd test takes about a minute, while without sandbox it's below 3s.
*friendly bump*
Just hit this trying to update app-arch/tar-1.26 (now with xattrs) on a hardened system. I also have a LUKS-encrypted LVM rootfs. This also happens with coreutils-8.20.
New record, 2h 26m for 'checking whether getcwd handles long file names properly...' on findutils-4.5.11. *full of sorrow bump*
Bump. Now it took about 4 minutes to pass conftest on a quad-core Ivy Bridge-based Xeon with 16 GB of RAM.
*** Bug 502048 has been marked as a duplicate of this bug. ***
@Vapier, is there any work going on regarding this issue? Or perhaps you at least have information about what the root cause may be and could share it, so maybe some random folks can fix it? I have no idea where to start checking libsandbox.
(In reply to Piotr Karbowski from comment #11) few people hit it (just three so far?), and i certainly can't reproduce it, so i haven't looked further
The underlying problem appears to be that libsandbox's mkdir() scales very poorly.

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/stat.h>

int main() {
    for (int depth = 0; depth < 1000; ++depth) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        if (mkdir("aaaaaaaa", 0700) < 0 || chdir("aaaaaaaa") < 0) {
            break;
        }
        clock_gettime(CLOCK_MONOTONIC, &end);
        fprintf(stderr, "%d: %lld\n", depth,
            (end.tv_sec - start.tv_sec) * 1000000000LL + end.tv_nsec - start.tv_nsec);
    }
    return 0;
}

When executing the above program without sandbox, each loop iteration runs in approximately constant time. However, when run under sandbox, the loop iterations get slower and slower, and beyond a certain threshold they become *extremely* slow and continue getting slower.
(In reply to Matt Whitlock from comment #14)
> The underlying problem appears to be that libsandbox's mkdir() scales very
> poorly.
>
> #include <stdio.h>
> #include <time.h>
> #include <unistd.h>
> #include <sys/stat.h>
>
> int main() {
>     for (int depth = 0; depth < 1000; ++depth) {
>         struct timespec start, end;
>         clock_gettime(CLOCK_MONOTONIC, &start);
>         if (mkdir("aaaaaaaa", 0700) < 0 || chdir("aaaaaaaa") < 0) {
>             break;
>         }
>         clock_gettime(CLOCK_MONOTONIC, &end);
>         fprintf(stderr, "%d: %lld\n", depth, (end.tv_sec - start.tv_sec) *
>             1000000000LL + end.tv_nsec - start.tv_nsec);
>     }
>     return 0;
> }
>
> When executing the above program without sandbox, each loop iteration runs
> in approximately constant time. However, when run under sandbox, the loop
> iterations get slower and slower, and beyond a certain threshold they become
> *extremely* slow and continue getting slower.

A few questions for you and others experiencing the slowdown:

1. What absolute times are you observing when running this reproducer with and without sandbox?
2. What filesystem are you running it on? On tmpfs things are blazingly fast for me.

I see a very slow creep in application runtime on each iteration, which is expected for an ever-deepening cwd: sandbox likely does an access() check on every intermediate component for directory operations. That alone would turn O(1) into O(n) for this test.
here are some tests from my work machine: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz

tmpfs, average desktop system:
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)

rm -rf aaaaaaaa
time ./cwd

real 0m0.004s
user 0m0.002s
sys 0m0.002s

rm -r aaaaaaaa
sandbox time ./cwd

real 0m6.282s
user 0m2.360s
sys 0m3.902s

same machine, zfs on nvme drive, dir is /var/tmp (no special performance settings on zfs side):
02:00.0 Non-Volatile memory controller: Sandisk Corp WD Black NVMe SSD
zroot/ROOT/default on / type zfs (rw,relatime,xattr,posixacl)

rm -rf aaaaaaaa
time ./cwd

real 0m0.023s
user 0m0.001s
sys 0m0.022s

rm -rf aaaaaaaa
sandbox time ./cwd

real 0m9.452s
user 0m2.255s
sys 0m7.085s

ext4, default options, on nvme drive:
/dev/nvme0n1p4 on /mnt type ext4 (rw,relatime)

rm -rf aaaaaaaa
time ./cwd

real 0m0.013s
user 0m0.002s
sys 0m0.011s

rm -rf aaaaaaaa
SANDBOX_WRITE=. sandbox time ./cwd

real 0m6.708s
user 0m2.191s
sys 0m4.402s

SANDBOX_WRITE was needed because it was a non-tmp location at /mnt.
I think I see something remotely reproducible with the =sys-apps/man-db-2.7.6.1-r2 package on stable default/linux/amd64/17.0. It hangs for about a minute on my machine (Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz). With FEATURES='-sandbox -usersandbox' the run takes about a second. I use tmpfs for builds, but I'm not sure that's what the ./configure test hits. Will explore more to check whether the reproducer is similar to comment #14.
FWIW, my approach has been to add /etc/portage/package.env with these contents:

sys-apps/coreutils getcwd_path_max
app-arch/tar getcwd_path_max

and then /etc/portage/env/getcwd_path_max with this:

export gl_cv_func_getcwd_path_max=yes

That way configure doesn't run the check but takes the result as given. It's an ugly hack, but it's better than waiting hours for a configure check that will have the same outcome every time.
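Sketched as shell, the two files above can be staged like this; setup_getcwd_workaround and the scratch-directory staging are illustrative conveniences of mine, not Portage tooling (on a real system the files go straight under /etc/portage):

```shell
# Hypothetical helper: write the two files described above under a
# configurable prefix, so the change can be inspected before applying.
setup_getcwd_workaround() {
    root="$1"
    mkdir -p "$root/etc/portage/env"
    # Pre-seed the gnulib configure cache variable so the check is skipped.
    printf 'export gl_cv_func_getcwd_path_max=yes\n' \
        > "$root/etc/portage/env/getcwd_path_max"
    # Attach that environment to the affected packages via package.env.
    {
        printf 'sys-apps/coreutils getcwd_path_max\n'
        printf 'app-arch/tar getcwd_path_max\n'
    } >> "$root/etc/portage/package.env"
}

# Stage into a scratch directory for review; use root="" for the live /etc.
scratch=$(mktemp -d)
setup_getcwd_workaround "$scratch"
cat "$scratch/etc/portage/package.env"
```

Note that spoofing the cache variable means the check never actually runs, so a real getcwd regression would go unnoticed for these packages.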
(In reply to Matt Whitlock from comment #14)
> The underlying problem appears to be that libsandbox's mkdir() scales very
> poorly.
>
> #include <stdio.h>
> #include <time.h>
> #include <unistd.h>
> #include <sys/stat.h>
>
> int main() {
>     for (int depth = 0; depth < 1000; ++depth) {
>         struct timespec start, end;
>         clock_gettime(CLOCK_MONOTONIC, &start);
>         if (mkdir("aaaaaaaa", 0700) < 0 || chdir("aaaaaaaa") < 0) {
>             break;
>         }
>         clock_gettime(CLOCK_MONOTONIC, &end);
>         fprintf(stderr, "%d: %lld\n", depth, (end.tv_sec - start.tv_sec) *
>             1000000000LL + end.tv_nsec - start.tv_nsec);
>     }
>     return 0;
> }
>
> When executing the above program without sandbox, each loop iteration runs
> in approximately constant time. However, when run under sandbox, the loop
> iterations get slower and slower, and beyond a certain threshold they become
> *extremely* slow and continue getting slower.

I think this is due to realpath: the longer the path, the more resolving it needs to do. Will check later to see if there is a way to optimise for this.
The slowness comes from glibc's own getcwd calls. But this isn't a problem when running outside sandbox because, well, the test program doesn't make any getcwd calls itself. If you add a getcwd call like this:

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/stat.h>
#include <stdlib.h>

#define SB_PATH_MAX (4096*2)

int main() {
    char *tmpbuf, *buf;
    buf = malloc(SB_PATH_MAX);
    for (int depth = 0; depth < 1000; ++depth) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        tmpbuf = getcwd(buf, SB_PATH_MAX);
        clock_gettime(CLOCK_MONOTONIC, &end);
        fprintf(stderr, "%d getcwd: %lld\n", depth,
            (end.tv_sec - start.tv_sec) * 1000000000LL + end.tv_nsec - start.tv_nsec);
        clock_gettime(CLOCK_MONOTONIC, &start);
        if (mkdir("aaaaaaaa", 0700) < 0) {
            break;
        }
        clock_gettime(CLOCK_MONOTONIC, &end);
        fprintf(stderr, "%d mkdir: %lld\n", depth,
            (end.tv_sec - start.tv_sec) * 1000000000LL + end.tv_nsec - start.tv_nsec);
        if (chdir("aaaaaaaa") < 0) {
            break;
        }
    }
    return 0;
}

then you will also see enormous slowdowns of these raw getcwd calls outside sandbox after some 450 levels of depth (once the returned cwd path is longer than ~4096 bytes):

451 getcwd: 8143
451 mkdir: 26785
452 getcwd: 8277
452 mkdir: 27468
453 getcwd: 26447617
453 mkdir: 142374
454 getcwd: 5278355
454 mkdir: 74516
455 getcwd: 3966265
455 mkdir: 62434

And all these slow getcwd calls succeed until about depth 908, at which point they start returning an ERANGE error (but still take this much longer to complete). Now, the reason sandbox hits this is that the sandbox code ends up calling egetcwd three times (via 3 calls to erealpath), each of which calls sb_unwrapped_getcwd (the actual glibc getcwd function), all of which take much longer once the configure/testcase cwd is longer than 4096 bytes. On my computer, on a ZFS disk, that's some 500-1000 times slower per getcwd call at that point OUTSIDE sandbox.
Inside sandbox the relative slowdown actually seems much smaller, due to other things taking time too. The first call is in libsandbox/wrapper-funcs/mkdirat_pre_check.c via canonicalize; I didn't chase down the other two, but they are probably other sandbox wrapping steps of mkdir or the involved syscalls. Outside sandbox these long-path getcwd calls succeed at deeper depths, but inside sandbox, interestingly, all of these slow calls end up with ENAMETOOLONG. Because of that I peeked into glibc too: its getcwd implementation seems to first do a getcwd syscall and, if that returns ENAMETOOLONG, falls back to a generic_getcwd implementation -- this is probably the reason getcwd starts taking a long time outside sandbox as well. That generic implementation (sysdeps/posix/getcwd.c __getcwd) probably makes some other libc or syscalls that the sandbox short-circuits to ENAMETOOLONG or something, but I'm not sure, as I would have expected to see an EARLY FAIL debug print, and there isn't any. I don't quite like that sandbox has different error conditions here, but I don't think that's unheard of for paths of length 4096..4096*2.
Not sure what we can do here at all. At best I can think of ways to call the slow getcwd only once instead of three times, leading to at most a 3x improvement. Given that the mkdir API takes only the subdir name to create under cwd, I don't see how we can do better, because we can't know that the cwd path is longer than 4096 bytes (and short-circuit to ENAMETOOLONG directly) before we actually make one getcwd call. Maybe some complicated cwd tracking, shrug...

That said, this all is why the test case in the comments here is slower than expected for me. It doesn't explain why on some systems people end up with hours of runtime. Maybe the adjusted test case that shows slow getcwd even without sandbox could confirm that it's just generic_getcwd being really that slow on their filesystem, or show that that isn't the case, in which case we could try to debug these extremely slow cases and hopefully find a different performance bottleneck -- one that could maybe be fixed without huge work, unlike the one I found.
some tests with the new reproducer

lscpu:
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 3
Socket(s): 2
Vendor ID: ARM
Model: 4
Model name: Cortex-A53
Stepping: r0p4
CPU max MHz: 1992.0000
CPU min MHz: 408.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32

filesystem is tmpfs

time ./cwd2

real 0m3.606s
user 0m0.532s
sys 0m3.052s

time sandbox ./cwd2

real 0m14.280s
user 0m2.212s
sys 0m12.036s

with sandbox it slows down after 455 items

zfs:
0 getcwd: 18375
0 mkdir: 787496
1 getcwd: 5833
1 mkdir: 594706
2 getcwd: 5542
2 mkdir: 608705
3 getcwd: 6125
3 mkdir: 611914
4 getcwd: 6125
4 mkdir: 715747
5 getcwd: 6125
5 mkdir: 625039
...
450 mkdir: 1328827
451 getcwd: 16916
451 mkdir: 1403202
452 getcwd: 16916
452 mkdir: 1396202
453 getcwd: 17208
453 mkdir: 818705
454 getcwd: 12718943
454 mkdir: 40632486
455 getcwd: 8090506
455 mkdir: 40427446
456 getcwd: 9446458
456 mkdir: 40719403
457 getcwd: 8744420
457 mkdir: 41254025
...
906 getcwd: 16298261
906 mkdir: 65635500
907 getcwd: 16089137
907 mkdir: 65741374
908 getcwd: 16654967
908 mkdir: 65281710
909 getcwd: 16148345

real 0m31.779s
user 0m2.360s
sys 0m29.048s

======================

another board:

Architecture: armv7l
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
Vendor ID: ARM
Model: 3
Model name: Cortex-A7
Stepping: r0p3
CPU max MHz: 2000.0000
CPU min MHz: 200.0000
BogoMIPS: 12.00
Flags: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae

filesystem is tmpfs

time ./cwd2
0 getcwd: 65249
0 mkdir: 26958
1 getcwd: 3333
1 mkdir: 15875
2 getcwd: 3541
2 mkdir: 14917
3 getcwd: 3334
3 mkdir: 14958
4 getcwd: 3250
4 mkdir: 14250
...
454 getcwd: 15042
454 mkdir: 12750
455 getcwd: 8422895
455 mkdir: 13374
456 getcwd: 8150480
456 mkdir: 12375
457 getcwd: 8606685
457 mkdir: 14375
...
997 getcwd: 17527243
997 mkdir: 14416
998 getcwd: 17504326
998 mkdir: 14167
999 getcwd: 18021489
999 mkdir: 14625

real 0m7.725s
user 0m0.816s
sys 0m6.446s

time sandbox ./cwd2
0 getcwd: 7417
0 mkdir: 221999
1 getcwd: 3333
1 mkdir: 143541
2 getcwd: 2708
2 mkdir: 181665
3 getcwd: 2792
3 mkdir: 147124
4 getcwd: 2667
4 mkdir: 146332
...
453 getcwd: 14875
453 mkdir: 2598897
454 getcwd: 15042
454 mkdir: 1528822
455 getcwd: 8650352
455 mkdir: 43861128
456 getcwd: 8933600
456 mkdir: 45635240
457 getcwd: 9079765
457 mkdir: 45433658
...
907 getcwd: 18199863
907 mkdir: 72035875
908 getcwd: 18475819
908 mkdir: 72654078
909 getcwd: 18007573
909 mkdir: 71803626
910 getcwd: 18235988

real 0m33.329s
user 0m3.937s
sys 0m27.426s

on btrfs everything looks sad:
0 getcwd: 8000
0 mkdir: 389789
1 getcwd: 4375
1 mkdir: 350456
2 getcwd: 3333
2 mkdir: 274998
3 getcwd: 3500
3 mkdir: 275165
4 getcwd: 3458
4 mkdir: 310873
...
450 mkdir: 1855695
451 getcwd: 13874
451 mkdir: 2154317
452 getcwd: 15375
452 mkdir: 2603647
453 getcwd: 15249
453 mkdir: 1619862
454 getcwd: 15856339
454 mkdir: 78100537
455 getcwd: 15748589
455 mkdir: 77571291
456 getcwd: 15466092
456 mkdir: 77818997
457 getcwd: 15636715
457 mkdir: 77616957
...
906 getcwd: 35287901
906 mkdir: 143502129
907 getcwd: 35585899
907 mkdir: 145845945
908 getcwd: 35813689
908 mkdir: 145026201
909 getcwd: 35690648

real 1m2.865s
user 0m4.443s
sys 0m57.154s
fun fact: sandbox's own 'make check' runs the getcwd test from gnulib.

~/dev/git/sandbox $ time tests/getcwd-gnulib_tst

real 0m0,138s
user 0m0,003s
sys 0m0,133s

[sf] ~/dev/git/sandbox $ time LD_PRELOAD=libsandbox.so tests/getcwd-gnulib_tst

real 0m4,714s
user 0m0,231s
sys 0m4,450s

`perf record` claims most of the time is spent in kernel audit code:

# perf report | cat
39.00% getcwd-gnulib_t [kernel.vmlinux] [k] __audit_inode
 2.77% getcwd-gnulib_t [kernel.vmlinux] [k] generic_bin_search.constprop.16
 2.37% getcwd-gnulib_t [kernel.vmlinux] [k] __radix_tree_lookup
 1.24% getcwd-gnulib_t [kernel.vmlinux] [k] _raw_spin_lock
 1.19% getcwd-gnulib_t [kernel.vmlinux] [k] __memcpy

__audit_inode() is a quite complicated function. I suspect sandbox changes the syscall pattern enough to break the audit cache. Could you check whether audit is enabled for you? And could you try a kernel configuration with CONFIG_AUDIT disabled and see if it helps?
(In reply to Sergei Trofimovich from comment #23)
> Can you could check if audit is enabled for you?
> And can you try kernel configuration with CONFIG_AUDIT disabled and see if
> it helps?

I've never enabled kernel auditing.

$ zgrep AUDIT /proc/config.gz
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDIT_ARCH=y
If I read the comments correctly, nobody experiences more than a minute of delay on the synthetic test. I would declare that good enough. The original post implies that the task is CPU-bound on luks/lvm and takes more than a minute. If someone still sees such bad behaviour, it would be nice to get a dev-util/perf trace of such a system.

In one terminal run a workload:

$ sudo emerge -v1 cpio

In another, run a sampler for 100 seconds (or however long it usually takes) when getcwd gets stuck, and grab the stats:

$ sudo perf record -a sleep 100
$ sudo perf report | head -n 20

The output usually looks like:

"""
# Overhead Command Shared Object Symbol
11.96% swapper [kernel.vmlinux] [k] intel_idle
 1.63% emerge libpython3.6m.so.1.0 [.] _PyEval_EvalFrameDefault
 1.01% emerge libpython3.6m.so.1.0 [.] sre_ucs1_match
 0.92% as ld-2.30.so [.] _dl_relocate_object
 0.89% cc1 cc1 [.] _cpp_lex_direct
 0.85% as [kernel.vmlinux] [k] copy_page
 0.71% emerge libpython3.6m.so.1.0 [.] collect
 0.48% x86_64-pc-linux [kernel.vmlinux] [k] do_syscall_64
 0.45% cc1 cc1 [.] rtx_cost
"""

Here my system was busy doing nothing most of the time ('intel_idle'). Yours should show where the real overhead comes from.
The main problem is on slower arches with e.g. SDIO 'disks'/storage, where you have none of the benefits of fast clocks or storage, e.g. ARM systems. This won't, of course, manifest itself on an ARM server box (e.g. Cavium Thunders), but will on any SBC in the land: R-Pi, Beaglebone, Cubie, etc. Of course, you can cross-compile to an extent, but I suspect qemu-arm will probably exhibit a similar issue with correct CPU emulation...
(In reply to Michael 'veremitz' Everitt from comment #26)
> The main problem is on slower arches with eg. SDIO 'disks'/storage where you
> have none of the optimisations of fast clocks or storage, eg. ARM systems.
>
> This won't, of course, manifest itself on an ARM server box, eg. Cavium
> Thunders, but will on any SBC in the land... R-pi, Beaglebone, Cubie, etc,
> etc.
>
> Of course, you can cross-compile to an extent, but I suspect qemu-arm will
> probably exhibit a similar issue with correct CPU emulations...

Slow storage does not manifest as 100% CPU usage. I'd prefer looking at specific numbers rather than speculation here.
(In reply to Sergei Trofimovich from comment #27) > > Slow storage does not manifest as 100% CPU usage. I'd prefer looking at > specific numbers than speculations here. Noted, as soon as I have a reproducer, I will run the tools as suggested.
I wonder if https://gitweb.gentoo.org/proj/sandbox.git/commit/?id=b393720bc6e87d16c5013839553c894f2b163961 fixed it by accident (given that it's not reproducible for everyone).
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=39e7dae5ae53f438bb94c464b05aba169938d9fb

commit 39e7dae5ae53f438bb94c464b05aba169938d9fb
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-01-22 16:11:52 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-01-23 15:13:47 +0000

    profiles: workaround sandbox bug with getcwd() configure test (gl_cv_func_getcwd_path_max)

    Workaround for sandbox bug which causes this gnulib configure test to take
    many real hours on slower machines, and certainly a huge amount of CPU hours
    on others.

    Spoof the same result as configure gets on a modern glibc & musl system for now.

    Bug: https://bugs.gentoo.org/447970
    Closes: https://bugs.gentoo.org/922652
    Signed-off-by: Sam James <sam@gentoo.org>

 profiles/default/linux/make.defaults | 9 ++++++++-
 profiles/features/musl/make.defaults | 7 +++++++
 2 files changed, 15 insertions(+), 1 deletion(-)
This came up recently on bug-gnulib at https://lists.gnu.org/archive/html/bug-gnulib/2024-03/msg00444.html which led to https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=d96e86fe0a2a6b84f94f94b81aa3edaefc5edc6e.
that seems not great.. given that sandbox isn't representative of the system