As an example I will use app-arch/cpio-2.11-r1. With sandbox enabled (Portage's default), the configure script hangs at 'checking whether getcwd handles long file names properly.' for a long time; in my case it takes more than 2 hours, and the underlying conftest process takes 100% CPU time the whole time. With FEATURES="-sandbox" it finishes the test in seconds.

I can reproduce it on only one device, a thin client (HP t5730) with a LUKS-encrypted, LVM-based rootfs on a USB 2.0 disk, a 1 GHz 64-bit mobile Sempron, and 1 GB of DDR2 memory. The issue is 100% reproducible and happens on cpio every time, but not on other packages (I don't have a list of them, but it used to happen on some others too). It was also reported back in 2005 on the Gentoo forums: https://forums.gentoo.org/viewtopic-p-2454506.html

Reproducible: Always

### emerge info:

Portage 2.2.0_alpha149 (default/linux/amd64/10.0, gcc-4.5.4, glibc-2.15-r3, 3.6.3-frontier x86_64)
=================================================================
System uname: Linux-3.6.3-frontier-x86_64-Mobile_AMD_Sempron-tm-_Processor_2100+-with-gentoo-2.2
Timestamp of tree: Thu, 20 Dec 2012 16:00:01 +0000
ld GNU ld (GNU Binutils) 2.22
ccache version 3.1.8 [enabled]
app-shells/bash:          4.2_p39
dev-lang/python:          2.7.3-r2, 3.2.3-r1
dev-util/ccache:          3.1.8
dev-util/cmake:           2.8.10.2
dev-util/pkgconfig:       0.27.1
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.11.8
sys-apps/sandbox:         2.6
sys-devel/autoconf:       2.13, 2.69
sys-devel/automake:       1.9.6-r3, 1.12.5
sys-devel/binutils:       2.22-r1
sys-devel/gcc:            4.5.4
sys-devel/gcc-config:     1.8
sys-devel/libtool:        2.4.2
sys-devel/make:           3.82-r4
sys-kernel/linux-headers: 3.6 (virtual/os-headers)
sys-libs/glibc:           2.15-r3
Repositories: gentoo foo-overlay
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -fno-var-tracking -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=256 -mtune=k8 -mno-3dnow"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.4/ext-active/ /etc/php/cgi-php5.4/ext-active/ /etc/php/cli-php5.4/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -fno-var-tracking -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=256 -mtune=k8 -mno-3dnow"
DISTDIR="/var/portage/distfiles"
EMERGE_DEFAULT_OPTS="--with-bdeps y --binpkg-respect-use y"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs ccache distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LC_ALL="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="--quiet -j2"
PKGDIR="/var/portage/packages"
PORTAGE_COMPRESS="bzip2"
PORTAGE_COMPRESS_FLAGS="-9"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/var/portage/tree"
PORTDIR_OVERLAY="/var/lib/layman/foo-overlay"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acl amd64 berkdb bzip2 cli cracklib crypt cups cxx dri fontconfig fortran gdbm gpm iconv ipv6 jpeg lcdfilter logrotate mmx modules mudflap multilib ncurses nls nptl opengl openmp pam pcre png pppd readline session sse sse2 ssl tcpd threads unicode vim-syntax xft zlib zsh-completion"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump"
CAMERAS="ptp2"
COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog"
ELIBC="glibc"
GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx"
INPUT_DEVICES="keyboard mouse evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer"
LINGUAS="en pl"
PHP_TARGETS="php5-3"
PYTHON_SINGLE_TARGET="python2_7"
PYTHON_TARGETS="python2_7 python3_2"
RUBY_TARGETS="ruby18 ruby19"
USERLAND="GNU"
VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l"
XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CPPFLAGS, CTARGET, INSTALL_MASK, LANG, PORTAGE_BUNZIP2_COMMAND, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
Don't compile on the thin client (that isn't what it's for, and USB access to an encrypted filesystem surely isn't going to help) or set FEATURES=-sandbox.
Alternatively, have /var/tmp and /tmp on a separate filesystem with no encryption.
You are fracking impossible. You mark this as invalid because a thin client is a thin client? It has a 1 GHz 64-bit Sempron with SSE3 support and 1 GB of DDR2 memory, and with /var/tmp/portage on tmpfs the issue is still there. I just can't believe what you did here. You didn't even TRY to understand the issue, just assumed that my hardware is slow so you aren't going to look into it.
Added QA because something went terribly wrong here. Ba dum tsss~
Also noticeable on OpenVZ guests, where the getcwd test takes about a minute, while without sandbox it's below 3s.
*friendly bump*
Just hit this trying to update app-arch/tar-1.26 (now with xattrs) on a hardened system. I also have a LUKS-encrypted LVM rootfs. This also happens with coreutils-8.20.
New record, 2h 26m for 'checking whether getcwd handles long file names properly...' on findutils-4.5.11. *full of sorrow bump*
Bump. Now it took about 4 minutes to pass conftest on a quad-core Ivy Bridge-based Xeon with 16 GB of RAM.
*** Bug 502048 has been marked as a duplicate of this bug. ***
@Vapier, is there any work going on regarding this issue? Or perhaps you at least have information about what the root cause may be and could share it, so maybe some random folks can fix it? I have no idea where to start checking libsandbox.
(In reply to Piotr Karbowski from comment #11) few people hit it (just three so far?), and i certainly can't reproduce it, so i haven't looked further
The underlying problem appears to be that libsandbox's mkdir() scales very poorly.

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/stat.h>

int main() {
    for (int depth = 0; depth < 1000; ++depth) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        if (mkdir("aaaaaaaa", 0700) < 0 || chdir("aaaaaaaa") < 0) {
            break;
        }
        clock_gettime(CLOCK_MONOTONIC, &end);
        fprintf(stderr, "%d: %lld\n", depth,
            (end.tv_sec - start.tv_sec) * 1000000000LL + end.tv_nsec - start.tv_nsec);
    }
    return 0;
}

When executing the above program without sandbox, each loop iteration runs in approximately constant time. However, when run under sandbox, the loop iterations get slower and slower, and beyond a certain threshold they become *extremely* slow and continue getting slower.
(In reply to Matt Whitlock from comment #14)
> The underlying problem appears to be that libsandbox's mkdir() scales very
> poorly.
>
> #include <stdio.h>
> #include <time.h>
> #include <unistd.h>
> #include <sys/stat.h>
>
> int main() {
>     for (int depth = 0; depth < 1000; ++depth) {
>         struct timespec start, end;
>         clock_gettime(CLOCK_MONOTONIC, &start);
>         if (mkdir("aaaaaaaa", 0700) < 0 || chdir("aaaaaaaa") < 0) {
>             break;
>         }
>         clock_gettime(CLOCK_MONOTONIC, &end);
>         fprintf(stderr, "%d: %lld\n", depth, (end.tv_sec - start.tv_sec) *
>             1000000000LL + end.tv_nsec - start.tv_nsec);
>     }
>     return 0;
> }
>
> When executing the above program without sandbox, each loop iteration runs
> in approximately constant time. However, when run under sandbox, the loop
> iterations get slower and slower, and beyond a certain threshold they become
> *extremely* slow and continue getting slower.

A few questions for you and others experiencing the slowdown:

1. What absolute times are you observing when running this reproducer with and without sandbox?
2. What filesystem are you running it on? On tmpfs things are blazingly fast for me.

I see a very slow creep in application runtime on each iteration, which is expected for an ever-deepening cwd: sandbox likely does an access() check on every intermediate component for directory operations. That alone would turn O(1) into O(n) for this test.
here are some tests from my work machine: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz

tmpfs, average desktop system:
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)

rm -rf aaaaaaaa
time ./cwd

real 0m0.004s
user 0m0.002s
sys 0m0.002s

rm -r aaaaaaaa
sandbox time ./cwd

real 0m6.282s
user 0m2.360s
sys 0m3.902s

same machine, zfs on nvme drive, dir is /var/tmp (no special performance settings on zfs side):
02:00.0 Non-Volatile memory controller: Sandisk Corp WD Black NVMe SSD
zroot/ROOT/default on / type zfs (rw,relatime,xattr,posixacl)

rm -rf aaaaaaaa
time ./cwd

real 0m0.023s
user 0m0.001s
sys 0m0.022s

rm -rf aaaaaaaa
sandbox time ./cwd

real 0m9.452s
user 0m2.255s
sys 0m7.085s

ext4, default options, on nvme drive:
/dev/nvme0n1p4 on /mnt type ext4 (rw,relatime)

rm -rf aaaaaaaa
time ./cwd

real 0m0.013s
user 0m0.002s
sys 0m0.011s

rm -rf aaaaaaaa
SANDBOX_WRITE=. sandbox time ./cwd

real 0m6.708s
user 0m2.191s
sys 0m4.402s

SANDBOX_WRITE was needed because it was a non-tmp location at /mnt.
I think I see something remotely reproducible with the =sys-apps/man-db-2.7.6.1-r2 package on stable default/linux/amd64/17.0. It hangs for about a minute on my machine (Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz). With FEATURES='-sandbox -usersandbox' the run takes about a second. I use tmpfs for builds, but I'm not sure that's what the ./configure test hits. Will explore more to check whether the reproducer is similar to comment #14.
FWIW, my approach has been to add /etc/portage/package.env with these contents:

sys-apps/coreutils getcwd_path_max
app-arch/tar getcwd_path_max

and then /etc/portage/env/getcwd_path_max with this:

export gl_cv_func_getcwd_path_max=yes

That way configure doesn't run the check but takes the result as given. It's an ugly hack, but it's better than waiting hours for a configure check that will have the same outcome every time.
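Sketched as shell, the two files above can be staged like this; setup_getcwd_workaround and the scratch-directory staging are illustrative conveniences of mine, not Portage tooling (on a real system the files go straight under /etc/portage):

```shell
# Hypothetical helper: write the two files described above under a
# configurable prefix, so the change can be inspected before applying.
setup_getcwd_workaround() {
    root="$1"
    mkdir -p "$root/etc/portage/env"
    # Pre-seed the gnulib configure cache variable so the check is skipped.
    printf 'export gl_cv_func_getcwd_path_max=yes\n' \
        > "$root/etc/portage/env/getcwd_path_max"
    # Attach that environment to the affected packages via package.env.
    {
        printf 'sys-apps/coreutils getcwd_path_max\n'
        printf 'app-arch/tar getcwd_path_max\n'
    } >> "$root/etc/portage/package.env"
}

# Stage into a scratch directory for review; use root="" for the live /etc.
scratch=$(mktemp -d)
setup_getcwd_workaround "$scratch"
cat "$scratch/etc/portage/package.env"
```

Note that spoofing the cache variable means the check never actually runs, so a real getcwd regression would go unnoticed for these packages.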
(In reply to Matt Whitlock from comment #14)
> The underlying problem appears to be that libsandbox's mkdir() scales very
> poorly.
>
> #include <stdio.h>
> #include <time.h>
> #include <unistd.h>
> #include <sys/stat.h>
>
> int main() {
>     for (int depth = 0; depth < 1000; ++depth) {
>         struct timespec start, end;
>         clock_gettime(CLOCK_MONOTONIC, &start);
>         if (mkdir("aaaaaaaa", 0700) < 0 || chdir("aaaaaaaa") < 0) {
>             break;
>         }
>         clock_gettime(CLOCK_MONOTONIC, &end);
>         fprintf(stderr, "%d: %lld\n", depth, (end.tv_sec - start.tv_sec) *
>             1000000000LL + end.tv_nsec - start.tv_nsec);
>     }
>     return 0;
> }
>
> When executing the above program without sandbox, each loop iteration runs
> in approximately constant time. However, when run under sandbox, the loop
> iterations get slower and slower, and beyond a certain threshold they become
> *extremely* slow and continue getting slower.

I think this is due to realpath: the longer the path, the more resolving it needs to do. Will check later to see if there is a way to optimise for this.
The slowness comes from glibc's own getcwd calls. But this isn't a problem when running outside sandbox because, well, the test program doesn't make any getcwd calls itself. If you add a getcwd call like this:

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/stat.h>
#include <stdlib.h>

#define SB_PATH_MAX (4096*2)

int main() {
    char *tmpbuf, *buf;
    buf = malloc(SB_PATH_MAX);
    for (int depth = 0; depth < 1000; ++depth) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        tmpbuf = getcwd(buf, SB_PATH_MAX);
        clock_gettime(CLOCK_MONOTONIC, &end);
        fprintf(stderr, "%d getcwd: %lld\n", depth,
            (end.tv_sec - start.tv_sec) * 1000000000LL + end.tv_nsec - start.tv_nsec);
        clock_gettime(CLOCK_MONOTONIC, &start);
        if (mkdir("aaaaaaaa", 0700) < 0) {
            break;
        }
        clock_gettime(CLOCK_MONOTONIC, &end);
        fprintf(stderr, "%d mkdir: %lld\n", depth,
            (end.tv_sec - start.tv_sec) * 1000000000LL + end.tv_nsec - start.tv_nsec);
        if (chdir("aaaaaaaa") < 0) {
            break;
        }
    }
    return 0;
}

then you will also see enormous slowdowns of these raw getcwd calls outside sandbox after some 450 levels of depth (once the returned cwd path is longer than ~4096 bytes):

451 getcwd: 8143
451 mkdir: 26785
452 getcwd: 8277
452 mkdir: 27468
453 getcwd: 26447617
453 mkdir: 142374
454 getcwd: 5278355
454 mkdir: 74516
455 getcwd: 3966265
455 mkdir: 62434

And all these slow getcwd calls succeed until about depth 908, at which point they start returning an ERANGE error (but still take this much longer to complete). Now, the reason sandbox hits this is that the sandbox code ends up calling egetcwd three times (via 3 calls to erealpath), each of which calls sb_unwrapped_getcwd (the actual glibc getcwd function), all of which take much longer once the configure/testcase cwd is longer than 4096 bytes. On my computer, on a ZFS disk, that's some 500-1000 times slower per getcwd call at that point OUTSIDE sandbox.
Inside sandbox the relative slowdown actually seems much smaller, due to other things taking time too. The first call is in libsandbox/wrapper-funcs/mkdirat_pre_check.c via canonicalize; I didn't chase down the other two, but they are probably other sandbox wrapping steps of mkdir or the involved syscalls. Outside sandbox these long-path getcwd calls succeed at deeper depths, but inside sandbox, interestingly, all of these slow calls end up with ENAMETOOLONG. Because of that I peeked into glibc too: its getcwd implementation seems to first do a getcwd syscall and, if that returns ENAMETOOLONG, falls back to a generic_getcwd implementation -- this is probably the reason getcwd starts taking a long time outside sandbox as well. That generic implementation (sysdeps/posix/getcwd.c __getcwd) probably makes some other libc or syscalls that the sandbox short-circuits to ENAMETOOLONG or something, but I'm not sure, as I would have expected to see an EARLY FAIL debug print, and there isn't any. I don't quite like that sandbox has different error conditions here, but I don't think that's unheard of for paths of length 4096..4096*2.
Not sure what we can do here at all. At best I can think of ways to call the slow getcwd only once instead of three times, leading to at most a 3x improvement. Given that the mkdir API takes only the subdir name to create under cwd, I don't see how we can do better, because we can't know that the cwd path is longer than 4096 bytes (and short-circuit to ENAMETOOLONG directly) before we actually make one getcwd call. Maybe some complicated cwd tracking, shrug...

That said, this all is why the test case in the comments here is slower than expected for me. It doesn't explain why on some systems people end up with hours of runtime. Maybe the adjusted test case that shows slow getcwd even without sandbox could confirm that it's just generic_getcwd being really that slow on their filesystem, or show that that isn't the case, in which case we could try to debug these extremely slow cases and hopefully find a different performance bottleneck -- one that could maybe be fixed without huge work, unlike the one I found.
some tests with the new reproducer

lscpu:
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 3
Socket(s): 2
Vendor ID: ARM
Model: 4
Model name: Cortex-A53
Stepping: r0p4
CPU max MHz: 1992.0000
CPU min MHz: 408.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32

filesystem is tmpfs

time ./cwd2

real 0m3.606s
user 0m0.532s
sys 0m3.052s

time sandbox ./cwd2

real 0m14.280s
user 0m2.212s
sys 0m12.036s

with sandbox it slows down after 455 items

zfs:
0 getcwd: 18375
0 mkdir: 787496
1 getcwd: 5833
1 mkdir: 594706
2 getcwd: 5542
2 mkdir: 608705
3 getcwd: 6125
3 mkdir: 611914
4 getcwd: 6125
4 mkdir: 715747
5 getcwd: 6125
5 mkdir: 625039
...
450 mkdir: 1328827
451 getcwd: 16916
451 mkdir: 1403202
452 getcwd: 16916
452 mkdir: 1396202
453 getcwd: 17208
453 mkdir: 818705
454 getcwd: 12718943
454 mkdir: 40632486
455 getcwd: 8090506
455 mkdir: 40427446
456 getcwd: 9446458
456 mkdir: 40719403
457 getcwd: 8744420
457 mkdir: 41254025
...
906 getcwd: 16298261
906 mkdir: 65635500
907 getcwd: 16089137
907 mkdir: 65741374
908 getcwd: 16654967
908 mkdir: 65281710
909 getcwd: 16148345

real 0m31.779s
user 0m2.360s
sys 0m29.048s

======================

another board:

Architecture: armv7l
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
Vendor ID: ARM
Model: 3
Model name: Cortex-A7
Stepping: r0p3
CPU max MHz: 2000.0000
CPU min MHz: 200.0000
BogoMIPS: 12.00
Flags: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae

filesystem is tmpfs

time ./cwd2
0 getcwd: 65249
0 mkdir: 26958
1 getcwd: 3333
1 mkdir: 15875
2 getcwd: 3541
2 mkdir: 14917
3 getcwd: 3334
3 mkdir: 14958
4 getcwd: 3250
4 mkdir: 14250
...
454 getcwd: 15042
454 mkdir: 12750
455 getcwd: 8422895
455 mkdir: 13374
456 getcwd: 8150480
456 mkdir: 12375
457 getcwd: 8606685
457 mkdir: 14375
...
997 getcwd: 17527243
997 mkdir: 14416
998 getcwd: 17504326
998 mkdir: 14167
999 getcwd: 18021489
999 mkdir: 14625

real 0m7.725s
user 0m0.816s
sys 0m6.446s

time sandbox ./cwd2
0 getcwd: 7417
0 mkdir: 221999
1 getcwd: 3333
1 mkdir: 143541
2 getcwd: 2708
2 mkdir: 181665
3 getcwd: 2792
3 mkdir: 147124
4 getcwd: 2667
4 mkdir: 146332
...
453 getcwd: 14875
453 mkdir: 2598897
454 getcwd: 15042
454 mkdir: 1528822
455 getcwd: 8650352
455 mkdir: 43861128
456 getcwd: 8933600
456 mkdir: 45635240
457 getcwd: 9079765
457 mkdir: 45433658
...
907 getcwd: 18199863
907 mkdir: 72035875
908 getcwd: 18475819
908 mkdir: 72654078
909 getcwd: 18007573
909 mkdir: 71803626
910 getcwd: 18235988

real 0m33.329s
user 0m3.937s
sys 0m27.426s

on btrfs everything looks sad:
0 getcwd: 8000
0 mkdir: 389789
1 getcwd: 4375
1 mkdir: 350456
2 getcwd: 3333
2 mkdir: 274998
3 getcwd: 3500
3 mkdir: 275165
4 getcwd: 3458
4 mkdir: 310873
...
450 mkdir: 1855695
451 getcwd: 13874
451 mkdir: 2154317
452 getcwd: 15375
452 mkdir: 2603647
453 getcwd: 15249
453 mkdir: 1619862
454 getcwd: 15856339
454 mkdir: 78100537
455 getcwd: 15748589
455 mkdir: 77571291
456 getcwd: 15466092
456 mkdir: 77818997
457 getcwd: 15636715
457 mkdir: 77616957
...
906 getcwd: 35287901
906 mkdir: 143502129
907 getcwd: 35585899
907 mkdir: 145845945
908 getcwd: 35813689
908 mkdir: 145026201
909 getcwd: 35690648

real 1m2.865s
user 0m4.443s
sys 0m57.154s
fun fact: sandbox's own 'make check' runs the getcwd test from gnulib.

~/dev/git/sandbox $ time tests/getcwd-gnulib_tst

real 0m0,138s
user 0m0,003s
sys 0m0,133s

[sf] ~/dev/git/sandbox $ time LD_PRELOAD=libsandbox.so tests/getcwd-gnulib_tst

real 0m4,714s
user 0m0,231s
sys 0m4,450s

`perf record` claims most of the time is spent in kernel audit code:

# perf report | cat
39.00% getcwd-gnulib_t [kernel.vmlinux] [k] __audit_inode
 2.77% getcwd-gnulib_t [kernel.vmlinux] [k] generic_bin_search.constprop.16
 2.37% getcwd-gnulib_t [kernel.vmlinux] [k] __radix_tree_lookup
 1.24% getcwd-gnulib_t [kernel.vmlinux] [k] _raw_spin_lock
 1.19% getcwd-gnulib_t [kernel.vmlinux] [k] __memcpy

__audit_inode() is a quite complicated function. I suspect sandbox changes the syscall pattern enough to break the audit cache. Could you check whether audit is enabled for you? And could you try a kernel configuration with CONFIG_AUDIT disabled and see if it helps?
(In reply to Sergei Trofimovich from comment #23)
> Can you could check if audit is enabled for you?
> And can you try kernel configuration with CONFIG_AUDIT disabled and see if
> it helps?

I've never enabled kernel auditing.

$ zgrep AUDIT /proc/config.gz
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDIT_ARCH=y
If I read the comments correctly, nobody experiences more than a minute of delay on the synthetic test. I would declare that good enough. The original post implies that the task is CPU-bound on luks/lvm and takes more than a minute. If someone still sees such bad behaviour, it would be nice to get a dev-util/perf trace of such a system.

In one terminal run a workload:

$ sudo emerge -v1 cpio

In another, run a sampler for 100 seconds (or however long it usually takes) when getcwd gets stuck, and grab the stats:

$ sudo perf record -a sleep 100
$ sudo perf report | head -n 20

The output usually looks like:

"""
# Overhead Command Shared Object Symbol
11.96% swapper [kernel.vmlinux] [k] intel_idle
 1.63% emerge libpython3.6m.so.1.0 [.] _PyEval_EvalFrameDefault
 1.01% emerge libpython3.6m.so.1.0 [.] sre_ucs1_match
 0.92% as ld-2.30.so [.] _dl_relocate_object
 0.89% cc1 cc1 [.] _cpp_lex_direct
 0.85% as [kernel.vmlinux] [k] copy_page
 0.71% emerge libpython3.6m.so.1.0 [.] collect
 0.48% x86_64-pc-linux [kernel.vmlinux] [k] do_syscall_64
 0.45% cc1 cc1 [.] rtx_cost
"""

Here my system was busy doing nothing most of the time ('intel_idle'). Yours should show where the real overhead comes from.
The main problem is on slower arches with e.g. SDIO 'disks'/storage, where you have none of the benefits of fast clocks or storage, e.g. ARM systems. This won't, of course, manifest itself on an ARM server box (e.g. Cavium Thunders), but will on any SBC in the land: R-Pi, Beaglebone, Cubie, etc. Of course, you can cross-compile to an extent, but I suspect qemu-arm will probably exhibit a similar issue with correct CPU emulation...
(In reply to Michael 'veremitz' Everitt from comment #26)
> The main problem is on slower arches with eg. SDIO 'disks'/storage where you
> have none of the optimisations of fast clocks or storage, eg. ARM systems.
>
> This won't, of course, manifest itself on an ARM server box, eg. Cavium
> Thunders, but will on any SBC in the land... R-pi, Beaglebone, Cubie, etc,
> etc.
>
> Of course, you can cross-compile to an extent, but I suspect qemu-arm will
> probably exhibit a similar issue with correct CPU emulations...

Slow storage does not manifest as 100% CPU usage. I'd prefer looking at specific numbers rather than speculation here.
(In reply to Sergei Trofimovich from comment #27) > > Slow storage does not manifest as 100% CPU usage. I'd prefer looking at > specific numbers than speculations here. Noted, as soon as I have a reproducer, I will run the tools as suggested.
I wonder if https://gitweb.gentoo.org/proj/sandbox.git/commit/?id=b393720bc6e87d16c5013839553c894f2b163961 fixed it by accident (given that it's not reproducible for everyone).
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=39e7dae5ae53f438bb94c464b05aba169938d9fb

commit 39e7dae5ae53f438bb94c464b05aba169938d9fb
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-01-22 16:11:52 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-01-23 15:13:47 +0000

    profiles: workaround sandbox bug with getcwd() configure test (gl_cv_func_getcwd_path_max)

    Workaround for sandbox bug which causes this gnulib configure test to take
    many real hours on slower machines, and certainly a huge amount of CPU hours
    on others.

    Spoof the same result as configure gets on a modern glibc & musl system for now.

    Bug: https://bugs.gentoo.org/447970
    Closes: https://bugs.gentoo.org/922652
    Signed-off-by: Sam James <sam@gentoo.org>

 profiles/default/linux/make.defaults | 9 ++++++++-
 profiles/features/musl/make.defaults | 7 +++++++
 2 files changed, 15 insertions(+), 1 deletion(-)
This came up recently on bug-gnulib at https://lists.gnu.org/archive/html/bug-gnulib/2024-03/msg00444.html which led to https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=d96e86fe0a2a6b84f94f94b81aa3edaefc5edc6e.
that seems not great.. given that sandbox isn't representative of the system