Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 171809 - Calculating dependencies is very slow - useless file accesses ?
Summary: Calculating dependencies is very slow - useless file accesses ?
Status: RESOLVED NEEDINFO
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core - Interface (emerge) (show other bugs)
Hardware: x86 Linux
: High normal with 1 vote (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords: PATCH
Depends on:
Blocks: 835380
  Show dependency tree
 
Reported: 2007-03-22 15:37 UTC by Michael Scondo
Modified: 2023-05-26 13:58 UTC (History)
5 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
catch the 'OSError: [Errno 2] No such file or directory' exception (enoent.patch,576 bytes, patch)
2007-03-23 00:23 UTC, Zac Medico
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Scondo 2007-03-22 15:37:50 UTC
Emerging anything is very slow on my system.

e.g. :
time emerge -vp joe

These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild  N    ] app-editors/joe-3.5  USE="-xterm" 600 kB
Total: 1 package (1 new), Size of downloads: 600 kB
real    1m2.731s
user    0m5.160s
sys     0m0.944s

I tried to figure out why and used a separate 500MB partittion for /usr/portage, 
however it didn't speed the process up.

Here is the further examination I did:

strace -e type=file emerge -vp kmtrace 2> log
log ha a size of 2.2Mb and 21753 lines
meaning that there are nearly 21753 file operations ( stat,access, open )

grep "No such file or directory" log | wc -l
8586
meaning that portage tries 8586 times to access files which don't exist !

grep "^stat" log | wc -l
8648

grep "^access" log | wc -l
8770

grep "^open" log | wc -l
4226


I'd really like to say this is a bug.
Please, could someone perhaps explain why there are so many file accesses ?

Michael

Reproducible: Always
Comment 1 Michael Scondo 2007-03-22 15:43:28 UTC
# emerge --info
Portage 2.1.2-r9 (default-linux/x86/2006.1, gcc-4.1.1, glibc-2.4-r4, 2.6.18-32-splash i686)
=================================================================
System uname: 2.6.18-32-splash i686 AMD Turion(tm) 64 X2 Mobile Technology TL-60
Gentoo Base System version 1.12.6
Timestamp of tree: Tue, 20 Mar 2007 15:50:01 +0000
dev-java/java-config: 1.3.7, 2.0.30
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.60
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -pipe -msse -msse2 -msse3 -march=athlon64"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/splash /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-O2 -pipe -msse -msse2 -msse3 -march=athlon64"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="http://gentoo.intergenia.de http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
LDFLAGS="-Wl,-O1"
LINGUAS="de"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage /usr/portage/local/layman/gentoo-de /usr/portage/local/layman/sunrise"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="3dnowext X aac acpi akode alsa amr apache2 arts asf audiofile bash-completion berkdb bitmap-fonts bluetooth bzip2 calendar cdb cdparanoia cdr chroot cli cpudetection cracklib crypt cscope cups dbus de dga dio divx doc dri dv dvb dvd dvdr dvdread encode examples expat extraicons fam fastcgi fbcon fdftk ffmpeg fftw filter firefox flac foomaticdb fortran ftp gcj gd gdbm geoip german gif gimp ginac glut gmedia gnome-print gnuplot gphoto2 gpm graphicsmagick graphiz gstreamer gtk gtk2 gtkhtml gzip hal hddtemp iconv icq idn imagemagick imap imlib innodb iodbc ipv6 isdnlog ithreads java javascript jbig jikes jpeg jpeg2k kde kdecards kdeenablefinal kdepim kdgraphics kdm kexi ladspa lash latex lcms ldap libg++ libgda libwww lirc live lm_sensors lua mad maildir mbox mcal md5sum memlimit mhash midi mikmod milter mmap mmx mng mozcalendar mozdom mozilla mozsha1 mozsvg mp3 mpeg mplayer mpm-worker musicbrainz mysql mysqli ncurses nls nptl nptlonly nsplugin objc odbc offensive ofx ogg openal opengl osc oscar pam pcmcia pcre pda pdf perl pg-hier pg-intdatetime pg-vacuumdelay php plotutils png portaudio posix postfix postgres ppds pppd python qt3 quicktime readline real realmedia reflection reiserfs samba sasl scanner sdl sensord session sharedext shorten smartcard smtp snmp soap sockets speex spell spl sse sse2 ssl startup-notification svg svga szip tcl tcltk tcpd theora threads tidy tiff tk tokenizer truetype truetype-fonts type1 type1-fonts udev unicode usb vcd vhosts videos vim vim-pager vim-syntax vim-with-x visualization vorbis wddx wifi win32codecs wmf wmp wxwindows x86 xanim xine xinerama xml xmlrpc xorg xosd xpm xscreensaver xv xvid zeroconf zip zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de" USERLAND="GNU" VIDEO_CARDS="radeon"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 2 Alec Warner (RETIRED) archtester gentoo-dev Security 2007-03-22 16:16:26 UTC
Assuming you use the rsync'd tree and you don't have say...an anoncvs checkout overlayed over your main tree you can try;

'emerge --metadata' and then try your emerge operations again.

Portage uses (by default) a flat file cache to speed up metadata queries.  If the cache is stale (or non-existent) by default portage will spawn a bash process and source every ebuild in the deptree (and every eclass that ebuild inherits, and so forth).  This is very slow (as you noticed).

Please re-open the bug if emerge --metadata does not improve the speed of portage.

Thanks.
Comment 3 Zac Medico gentoo-dev 2007-03-22 16:38:48 UTC
If emerge --metadata doesn't help then you are probably experiencing some variation of bug 124041.
Comment 4 Michael Scondo 2007-03-22 22:21:18 UTC
Thanks for the explanation !

Before I started fiddling around, I have already tried emerge --metadata with no success.
I also tried emerge --regen.

Somewhere I read that adding 
portdbapi.auxdbmodule = cache.metadata_overlay.database

in /etc/portage/modules could help, I followed the instructions and deleted /var/cache/edb/dep,
after this I did emerge --sync
emerge --metadata.

I tried again emerge --regen and got this error:

# emerge --regen
Regenerating cache entries...
Traceback (most recent call last):
  File "/usr/bin/emerge", line 5481, in ?
    retval = emerge_main()
  File "/usr/bin/emerge", line 5443, in emerge_main
    action_regen(settings, portdb)
  File "/usr/bin/emerge", line 4283, in action_regen
    dead_nodes[mytree] = set(portdb.auxdb[mytree].iterkeys())
  File "/usr/lib/portage/pym/cache/metadata_overlay.py", line 85, in iterkeys
    for cpv in self.db_ro.iterkeys():
  File "/usr/lib/portage/pym/cache/flat_hash.py", line 110, in iterkeys
    for l in os.listdir(dirs[0]):
OSError: [Errno 2] No such file or directory: '/usr/local/portage/metadata/cache'

I removed /etc/portage/modules again and did a sync, however the above error shows still up.

I also noticed that the directory /var/cache/edb/dep/usr/portage has only one folder named local now, (I backed up the old dep dir, I can see there were some more files in it)

Could it possibly be related to this ?
e.g.
~/tmp $ strace -e trace=file cp d e
execve("/usr/bin/cp", ["cp", "d", "e"], [/* 83 vars */]) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/usr/lib/xorg/tls/i686/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/xorg/tls/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/xorg/tls/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/xorg/tls/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/xorg/i686/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/xorg/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/xorg/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/xorg/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/fglrx/tls/i686/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/fglrx/tls/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/fglrx/tls/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/fglrx/tls/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/fglrx/i686/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/fglrx/i686/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/fglrx/sse2/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/fglrx/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/lib/libc.so.6", O_RDONLY)        = 3
stat64("e", 0xbfbb49e4)                 = -1 ENOENT (No such file or directory)
stat64("d", {st_mode=S_IFREG|0644, st_size=26596, ...}) = 0
stat64("e", 0xbfbb4834)                 = -1 ENOENT (No such file or directory)
open("d", O_RDONLY|O_LARGEFILE)         = 3
open("e", O_WRONLY|O_CREAT|O_LARGEFILE, 0644) = 4
Process 19752 detached
Comment 5 Zac Medico gentoo-dev 2007-03-23 00:23:36 UTC
Created attachment 114092 [details, diff]
catch the 'OSError: [Errno 2] No such file or directory' exception

(In reply to comment #4)
>   File "/usr/lib/portage/pym/cache/metadata_overlay.py", line 85, in iterkeys
>     for cpv in self.db_ro.iterkeys():
>   File "/usr/lib/portage/pym/cache/flat_hash.py", line 110, in iterkeys
>     for l in os.listdir(dirs[0]):
> OSError: [Errno 2] No such file or directory:
> '/usr/local/portage/metadata/cache'

If this patch is saved as /tmp/enoent.patch then it can be applied as follows:

patch /usr/lib/portage/pym/cache/flat_hash.py < /tmp/enoent.patch
Comment 6 Michael Scondo 2007-03-23 02:05:09 UTC
(In reply to comment #5)
> Created an attachment (id=114092) [edit]
> catch the 'OSError: [Errno 2] No such file or directory' exception
> 
Works, thanks a lot !
But doesn't change the speed of emerge. 

I found out that the output of (strace -e trace=file cp d e) is normal. 
Makes me wondering, however, these file accesses are also the minor part.
~ # grep "\.so" log | wc -l
1275
accesses out of  8586 where no file has been found.

The log file I create (strace -e file emerge -vp kmtrace 2> log) gets also bigger.
 2.2M and 21753 lines yesterday, 2.4M and 23525 lines today.

Comment 7 Zac Medico gentoo-dev 2007-03-23 02:38:18 UTC
(In reply to comment #6)
> But doesn't change the speed of emerge. 

Is it still slow after you run `emerge --regen`?  Also, make sure that /var/cache/edb/vdb_metadata.pickle exists.  That caches the files in /var/db/pkg since they need to be loaded for dependency calculations.  That cache is only created when a dependency calculation (like `emerge -vp kmtrace`) is run as a the root user.
Comment 8 Michael Scondo 2007-03-23 03:08:52 UTC
> Is it still slow after you run `emerge --regen`?  Also, make sure that
> /var/cache/edb/vdb_metadata.pickle exists.  That caches the files in
> /var/db/pkg since they need to be loaded for dependency calculations.  That
> cache is only created when a dependency calculation (like `emerge -vp kmtrace`)
> is run as a the root user.
> 
Yes, it's still slow, and I did all tests as root.

The file exists.
~ # ls -lh /var/cache/edb/vdb_metadata.pickle
-rw-r--r-- 1 root portage 4.0M Mar 22 22:44 /var/cache/edb/vdb_metadata.pickle


Comment 9 Zac Medico gentoo-dev 2007-03-23 17:48:05 UTC
When you run `emerge --regen` now does in complete quickly?  It reuses valid cache, so it should be quite fast if your cache is valid.
Comment 10 Michael Scondo 2007-03-24 00:56:12 UTC
(In reply to comment #9)
> When you run `emerge --regen` now does in complete quickly?  It reuses valid
> cache, so it should be quite fast if your cache is valid.
> 
Yes, emerge --regen completes without error.

However, here is the output of time emerge -vp kmtrace:
real    0m58.851s
user    0m6.172s
sys     0m1.032s


strace -e file emerge -vp kmtrace 2> log
still gives a log file with a size of 2.4M and 23513 lines,
containing mostly entries about file not found.
e.g. :
stat64("/var/db/pkg/dev-php5", 0xbfbabc08) = -1 ENOENT (No such file or directory)

Maybe I should also tell that I´m running a 2Ghz Dual Turion with 1GB Ram.
Comment 11 Michael Scondo 2007-03-24 01:34:16 UTC
In case you are interested:
I would give you acces to my pc via ssh, I'm also reachable via icq (#372475338)
If it's a bug it's not really severe, but annoying.
And according to google I'm not the only one experiencing this, so it would be great if this one would be solved anyhow.
Comment 12 Zac Medico gentoo-dev 2007-03-24 02:31:48 UTC
(In reply to comment #0)
> strace -e file emerge -vp kmtrace 2> log
> log ha a size of 2.2Mb and 21753 lines
> meaning that there are nearly 21753 file operations ( stat,access, open )
> 
> grep "No such file or directory" log | wc -l
> 8586
> meaning that portage tries 8586 times to access files which don't exist !
> 
> grep "^stat" log | wc -l
> 8648
> 
> grep "^access" log | wc -l
> 8770
> 
> grep "^open" log | wc -l
> 4226

I get similar counts for system calls here.  The command completes in about 2 seconds on my system.  Mine is optimized by storing the entire portage tree and metadata cache in squashfs though (much more efficient for accessing all those files).

There are lots of optimizations that we can do to minimize the number of system calls.  For example, the current matching algorithm always accesses every version of a particular package in the tree.  We can optimize it so that guesses which versions it needs to access and doesn't access any more after it's found a match.
Comment 13 Michael Scondo 2007-03-24 04:01:11 UTC
> I get similar counts for system calls here.  The command completes in about 2
> seconds on my system.  Mine is optimized by storing the entire portage tree and
> metadata cache in squashfs though (much more efficient for accessing all those
> files).
Isn't squashfs readonly ?
2 seconds - I guess you have the filesystem already loaded into the ram.
I have around 3 seconds here, if the files are cached.
But still over 60 seconds after boot or unmounting /usr/portage and mounting again.
The log is also growing after each sync, so I'd think in 1 or 2 years it will take 60 minutes.
If there isn't something I could do..?

> 
> There are lots of optimizations that we can do to minimize the number of system
> calls.  For example, the current matching algorithm always accesses every
> version of a particular package in the tree.  We can optimize it so that
> guesses which versions it needs to access and doesn't access any more after
> it's found a match.
> 

I don't want to buffle anyone, portage is the best package system I know.
However, what has e.g. nano to do with gnome ?
# strace -e file emerge -vp nano 2> log
# grep gnome log | wc -l
395

If I had any clue about python I'd dive into it, but sadly I understand not so much if I look through the sources of emerge. (Besides "Have you mooed today"..lol)

Comment 14 Zac Medico gentoo-dev 2007-03-24 04:36:29 UTC
(In reply to comment #13)
> Isn't squashfs readonly ?

Unionfs can be used to make copy-on-write branch:
http://gentoo-wiki.com/HOWTO_VERY_small_Portage_Tree_with_SquashFS_and_UnionFS

> 2 seconds - I guess you have the filesystem already loaded into the ram.
> I have around 3 seconds here, if the files are cached.
> But still over 60 seconds after boot or unmounting /usr/portage and mounting
> again.

For testing purposes:
echo 3 > /proc/sys/vm/drop_caches

> The log is also growing after each sync, so I'd think in 1 or 2 years it will
> take 60 minutes.

In the future we may distribute the tree in a different format in order to improve efficiency.  More advanced filesystems are another way (squashfs is pretty optimal, but something that's writable would be nicer).

> I don't want to buffle anyone, portage is the best package system I know.
> However, what has e.g. nano to do with gnome ?

Technically, dependency calculations need to account for interactions between all installed packages (to avoid conflicts).
Comment 15 Michael Scondo 2007-03-25 01:43:01 UTC
> > I don't want to buffle anyone, portage is the best package system I know.
> > However, what has e.g. nano to do with gnome ?
> 
> Technically, dependency calculations need to account for interactions between
> all installed packages (to avoid conflicts).
> 

Thanks for your hints, I'll try squashfs.

But wouldn't it be possible to store all conflicts, dependencies, etc in one file or database ?
Dependencies are perhaps not needed to be stored separate, but conflicts could be looked up quite fast (if the possible conflicts would be stored in a senseful way).

As I said before, I'd love to do some work on this - If I'd knew python, on the other side, python doesn't seem to me being so hard to learn (I already know some languages). Could someone perhaps point me out where the checking is done, so I could take a look ?

I guess somewhere around line 4300 in portage.py, but the whole source of portage seems to me quite complex. A tip would be quite useful.

Comment 16 Zac Medico gentoo-dev 2007-03-25 02:47:18 UTC
(In reply to comment #15)
> But wouldn't it be possible to store all conflicts, dependencies, etc in one
> file or database ?

Yes, that's what the /var/cache/edb/vdb_{metadata,blockers}.pickle files do for installed packages.  For the portage tree, you can use the sqlite cache module:

http://gentoo-wiki.com/TIP_speed_up_portage_with_sqlite

If you use the metadate_overlay module together with a portage tree in squashfs, changing to the sqlite module probably won't provide any benefit since reading ${PORTDIR}/metadata/cache/ from squashfs is quite efficient.