Summary: | sys-apps/portage-2.1.11.10 is slow | ||
---|---|---|---|
Product: | Portage Development | Reporter: | wbrana |
Component: | Core | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | pageexec, tomwij |
Priority: | Normal | ||
Version: | 2.1 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
wbrana
2012-08-15 07:53:40 UTC
22 seconds with Python 3 and emerge -vpu1 --keep-going `qlist -IC`

51 seconds with the same command as in comment 1, but after "echo 3 > drop_caches"

Are you saying that it's slow in comparison to some other version of portage, or just slower than you would expect given the task at hand?

It's slow compared to Debian APT.

Some of the performance difference for Portage is related to the fact that it's possible to modify ebuilds at any time, and Portage is expected to detect these changes automatically and account for them. This makes access to the available package database somewhat less efficient than it would be for something like Debian APT.

You can try alternative package managers in order to compare their performance, such as sys-apps/pkgcore and sys-apps/paludis. I've heard that pkgcore is the fastest one.

emerge should get ebuild dependencies only from an sqlite database and shouldn't scan ebuilds. ebuild digest should update the database.

emerge --sync shouldn't download all ebuilds. It should only download:

option 1: an xz compressed sqlite database with dependencies among all ebuilds

option 2: xz compressed files with SQL commands which update the sqlite dependency database, one file per day

ebuilds should be downloaded on demand, e.g. emerge sqlite will download sqlite-1.2.3.ebuild

I think the existing repository layout and database/cache formats are pretty optimal for development-oriented scenarios. If we support both development-oriented and consumer-oriented scenarios simultaneously, then conflicting goals will lead to sacrifices that negatively impact both kinds of users. So, I think it would be optimal to introduce an entirely separate repository layout and database/cache format for the consumer-oriented scenario.
This has been discussed previously as a proposed GSOC project: http://wiki.gentoo.org/wiki/Google_Summer_of_Code/2012/Ideas#Repository_of_self-contained_ebuild_source_packages

emerge -vpu1 --keep-going `qlist -IC` takes 100 seconds after emerge-delta-webrsync

The time is more a result of algorithmic complexity [1] than of how the information is obtained. Note that there is already a cache present in /usr/portage/metadata/md5-cache/ which limits the amount of data that needs to be accessed, so there is barely any benefit in putting this in sqlite. If you want to try how it works with that information cached in SQLite, then go ahead [2]: the functionality is already there and hasn't been found to be a remarkable improvement.

So, back to the algorithmic complexity: there are a lot of visibility checks going on (highest version, USE flag conditional dependencies, masked USE flags, masked packages, keyworded packages, slots, subslots and so on...). Evaluating all these checks is where Portage spends most of its time [1]. So, if one would like to see a speedup, one would need to write caches for some or all of these things. That on its own is a task that requires some time to complete.

An alternative would be to come up with another algorithmic way of accomplishing this task, but is it really worth rewriting Portage for this? Some people said yes and wrote alternatives, but those all seem to lag behind Portage (not supporting all of its features, not supporting EAPI 5, ...).

So, work is to be done; now we only need to find people interested in doing it.

[1]: http://i.imgur.com/A93CdNR.png

[2]: http://webcache.googleusercontent.com/search?q=cache:Sv5MJhN0eD0J:en.gentoo-wiki.com/wiki/Portage_SQLite_Cache

Related to bug 468486

It isn't a big problem.
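
As a rough illustration of the SQLite experiment discussed in this bug, the sketch below loads md5-cache-style KEY=value metadata into an in-memory SQLite table and looks up one package's DEPEND. The sample entry, the table schema, and the helper names `parse_cache_entry`, `build_db` and `lookup` are assumptions made for illustration; this is not Portage's actual cache code.

```python
import sqlite3

# Hypothetical sample data: one md5-cache-style entry (KEY=value lines).
# The package version and dependency strings are invented for illustration.
SAMPLE_ENTRIES = {
    "dev-db/sqlite-1.2.3": (
        "DEPEND=>=sys-libs/readline-6 dev-lang/tcl\n"
        "EAPI=4\n"
        "KEYWORDS=amd64 x86\n"
        "RDEPEND=>=sys-libs/readline-6\n"
        "SLOT=3\n"
    ),
}


def parse_cache_entry(text):
    """Parse KEY=value lines into a dict, splitting on the first '=' only."""
    entry = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '=' at all
            entry[key] = value
    return entry


def build_db(entries):
    """Store every (package, key, value) triple in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata ("
        "cpv TEXT, key TEXT, value TEXT, PRIMARY KEY (cpv, key))"
    )
    for cpv, text in entries.items():
        rows = [(cpv, k, v) for k, v in parse_cache_entry(text).items()]
        conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, cpv, key):
    """Return one metadata value for a package, or None if it is absent."""
    row = conn.execute(
        "SELECT value FROM metadata WHERE cpv = ? AND key = ?", (cpv, key)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = build_db(SAMPLE_ENTRIES)
    print(lookup(db, "dev-db/sqlite-1.2.3", "DEPEND"))
```

A lookup like this only changes where the metadata is read from. The visibility checks (masks, keywords, USE conditionals, slots) would still have to run on top of it, which fits the observation above that SQLite alone brings little speedup.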