Summary: | sys-apps/portage: emerge search actions should use an index to improve performance (like esearch) | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Zac Medico <zmedico> |
Component: | Core - Interface (emerge) | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | esigra |
Priority: | Normal | Keywords: | InVCS |
Version: | 2.2 | ||
Hardware: | All | ||
OS: | All | ||
URL: | http://thread.gmane.org/gmane.linux.gentoo.portage.devel/4640 | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=412471 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 240187, 484436 | ||
Attachments: |
emerge --search: use description index (attachment 386860)
emerge --search: use description index (attachment 386866)
emerge --search: use description index (attachment 386988) |
Description
Zac Medico
2014-10-18 02:58:04 UTC
Created attachment 386860 [details, diff]
emerge --search: use description index
This adds an egencache --update-pkg-desc-index action which generates a plain-text index of package names, versions, and descriptions. The index can then be used to optimize emerge --search / --searchdesc actions. If the package description index is missing from a particular repository, then all metadata for that repository is obtained using the normal portdbapi.aux_get method.
Searching of installed packages is optimized to take advantage of vardbapi._aux_cache, which is backed by vdb_metadata.pickle. See the IndexedVardb docstring for some more details.
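The per-repository fallback described above could look roughly like the following. This is only a sketch, not the code from the patch: `load_descriptions`, the `INDEX_RELPATH` location, and the two reader callables are hypothetical names used for illustration.

```python
from pathlib import Path

# Assumed location of the index inside a repository (placeholder, not
# taken from the patch).
INDEX_RELPATH = "metadata/pkg_desc_index"


def load_descriptions(repo_path, read_index, read_all_metadata):
    """Return (source, data) for one repository.

    read_index(path) parses the plain-text index file; read_all_metadata(repo)
    stands in for the slower per-package metadata lookup. Both are supplied
    by the caller, so this function only decides which path to take.
    """
    index_file = Path(repo_path) / INDEX_RELPATH
    if index_file.is_file():
        # Fast path: the repository ships a description index.
        return "index", read_index(index_file)
    # Fallback: no index in this repository, so read all metadata instead.
    return "metadata", read_all_metadata(repo_path)
```

The decision is made per repository, so a repository with an index and one without can be searched in the same run.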
What happens if the index is outdated? If Portage doesn't fall back to regular search, then it's a major regression over the current code.

(In reply to Michał Górny from comment #2)
> What happens if the index is outdated?
It assumes that the list of packages in the index is correct, so it only searches those packages. If any of those ebuilds have been removed, it triggers the "emerge: search: aux_get() failed, skipping" message in search.py.
> If Portage doesn't fall back to regular search, then it's a major regression over
> the current code.
I suppose we could add a --search-index=<y|n> option. Would it be acceptable to you to have this option enabled by default? Then you could set EMERGE_DEFAULT_OPTS="--search-index=n" if you wanted to persistently disable it.

Created attachment 386866 [details, diff]
emerge --search: use description index
This updated patch adds --search-index < y | n >:
For users that would like to modify ebuilds in a repository without running egencache afterwards, the new emerge --search-index < y | n > option can be used to get non-indexed search. Alternatively, the user could simply remove the stale index file, in order to disable the search index for a particular repository.

I'll be maintaining this patch in the following branch:
https://github.com/zmedico/portage/tree/bug_525718

Additionally, a 1.5M extra update on each rsync run would be a noticeable extra load.

We can make a news item; if a user doesn't want it, they can add it to the rsync exclude list.

It's not that big a file!

Created attachment 386988 [details, diff]
emerge --search: use description index
This updated patch changes the index format to use spaces instead of commas, for readability. This example is given in man/portage.5:
sys-apps/sed 4.2 4.2.1 4.2.1-r1 4.2.2: Super-useful stream editor
sys-apps/usleep 0.1: A wrapper for usleep
Hopefully that's easier on the eyes (thanks to Michał Górny for the suggestion).
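For illustration, a line in the space-separated format shown above can be split with a few string operations. This is a minimal sketch rather than the parser from the patch: `parse_index_line` and `search_index` are hypothetical helpers, and it assumes that the first ": " on a line separates the name and versions from the description.

```python
import re


def parse_index_line(line):
    """Split 'cat/pkg v1 v2: description' into (name, versions, description)."""
    head, sep, desc = line.rstrip("\n").partition(": ")
    if not sep:
        raise ValueError("malformed index line: %r" % line)
    name, *versions = head.split(" ")
    return name, versions, desc


def search_index(lines, pattern, search_desc=False):
    """Yield parsed entries whose name (or, optionally, description) matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, versions, desc = parse_index_line(line)
        if regex.search(name) or (search_desc and regex.search(desc)):
            yield name, versions, desc
```

With the two example lines from man/portage.5, searching for "wrapper" matches nothing by name, and matches sys-apps/usleep once descriptions are included, which mirrors the difference between --search and --searchdesc.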
Also, Michał has brought it to my attention that git will send the whole file instead of the delta, unless an expensive `git repack` operation is performed. Maybe it's possible to repack the user.git each time the index is generated? Currently, the master rsync mirror runs egencache every 30 minutes. If user.git syncs at the same interval, it would need to be repacked at the same interval.
Anyway, it would be nice to merge this patch, even if we don't have the resources now to generate the index for gentoo on the server side. We could follow up this patch later with a post emerge --sync hook for client-side index generation.
In order to optimize IndexedVardb so that it doesn't have to do expensive listdir calls in /var/db/pkg/*, I plan to have vardbapi maintain a small log file that is updated with each merge and unmerge. The log will keep track of all merges/unmerges that have occurred since the most recent update of vdb_metadata.pickle (vdb_metadata.pickle is not updated for every single merge/unmerge, since that would lead to excessive re-writing of a large file). Then, IndexedVardb can use this log file together with vdb_metadata.pickle to get a complete view of /var/db/pkg, without the need to call listdir inside /var/db/pkg/*.

This is in the master branch now:
https://github.com/gentoo/portage/commit/d22ba91d4adc177551be8b4be95a6fc1f061fc2e
https://github.com/gentoo/portage/commit/7bc992a02c89c6c3f76b09bfc978c104fb1c2b9a
https://github.com/gentoo/portage/commit/96c2e57685659211c1e33281e08dfaff04d05b58
https://github.com/gentoo/portage/commit/5424b91133b3b155b0e6ddc08fb46ba301d971f8
https://github.com/gentoo/portage/commit/55c8c8bc7a781e3f71ce92922eea64ad4cafce3c
https://github.com/gentoo/portage/commit/d800d224ab38c0f524d3fe858ebe201cbfa903c1
https://github.com/gentoo/portage/commit/646b671d4afb92e0bb81664568544e01e8456dc2

Released in portage-2.2.16.
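The merge/unmerge log idea from the IndexedVardb plan above can be sketched as replaying a small journal on top of a cached snapshot. This is an assumption-laden illustration, not Portage's implementation: `current_packages`, the in-memory journal, and the "merge"/"unmerge" action strings are all hypothetical.

```python
def current_packages(snapshot, journal):
    """Combine a cached package set with a log of later changes.

    snapshot: package identifiers recorded at the last cache write.
    journal:  (action, package) pairs in the order they happened, where
              action is "merge" or "unmerge" (hypothetical encoding).
    """
    installed = set(snapshot)
    for action, pkg in journal:
        if action == "merge":
            installed.add(pkg)
        elif action == "unmerge":
            installed.discard(pkg)
        else:
            raise ValueError("unknown journal action: %r" % (action,))
    return installed
```

Because the journal only covers changes since the last cache write, it stays small, and the large cache file does not need to be rewritten on every merge or unmerge.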