Bug 83371

Summary: Portage module to use DJB's cdb (constant database) for performance reasons, in ebuild format
Product: Gentoo Linux
Component: New packages
Reporter: Matan Peled <chaosite>
Assignee: Portage team <dev-portage>
CC: abraham, hoffbrinkle, leho, mstearn, pacho, wschlich
Status: VERIFIED LATER
Severity: enhancement
Priority: High
Version: unspecified   
Hardware: All   
OS: Linux   
URL: http://forums.gentoo.org/viewtopic-t-261580-postdays-0-postorder-asc.html
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: app-portage/portage-cdb; distfile

Description Matan Peled 2005-02-26 06:16:18 UTC
Reading through the forum thread (see URL), I saw major improvements reported in the times for metadata generation, searching, etc.

In a chat with carpaski on #gentoo, he explained why the module could not be made the default (external runtime dependencies subject to a segfault), but said it might be included.

So, I made a (hackish, I'm new at this) ebuild out of it.

The dev-db/cdb ebuild is keyworded for x86, alpha, and ~amd64, but the python-cdb ebuild is only keyworded for x86 and ~amd64. So either the ebuild(s?) need to be fixed, or this only applies to x86/amd64.

Reproducible: Always
Comment 1 Matan Peled 2005-02-26 06:17:24 UTC
Created attachment 52185 [details]
app-portage/portage-cdb
Comment 2 Matan Peled 2005-02-26 06:17:54 UTC
Created attachment 52186 [details]
distfile
Comment 3 Brian Harring (RETIRED) gentoo-dev 2005-02-27 19:30:21 UTC
Make database.sync actually do something, rather than silently acting as if it did what was requested, please :)
Re: database instance caching, I don't like the approach offhand. Binding within the class rather than within the module namespace is preferable imo, but that's just my opinion.  Beyond that, why cache category db instances?  That's not the slowdown (exempting repoman craziness, the caching of new instances isn't required).
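The distinction between binding the cache within the class and within the module namespace can be sketched as follows. This is only an illustration: `Database`, `_instances`, and `for_category` are hypothetical names and are not taken from the attached module.

```python
class Database:
    """Hypothetical per-category database handle (illustration only)."""

    # The instance cache is bound to the class, not to a module-level global,
    # so it travels with the class and does not clutter the module namespace.
    _instances = {}

    def __init__(self, category):
        self.category = category

    @classmethod
    def for_category(cls, category):
        """Return the cached instance for a category, creating it on first use."""
        try:
            return cls._instances[category]
        except KeyError:
            instance = cls._instances[category] = cls(category)
            return instance
```

Callers would use `Database.for_category("sys-apps")` instead of reaching into a module-level dictionary. Whether the instances need caching at all is the separate question raised above.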

Aside from that, self.modified shouldn't be set to False until after the data has actually been sync'd to disk; if an exception bails out afterwards, the module is left in an invalid state.
iirc, cPickle.HIGHEST_PROTOCOL has an issue going from 2.2 -> 2.3: the highest protocol was upped for 2.3, leading to incompatible pickle'd data.  Something to note...
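A minimal sketch of the ordering being asked for, assuming a simple pickle-backed store: the `CacheDB` class and its attributes are hypothetical stand-ins, not the attached module's code. The dirty flag is cleared only once the write has completed, and the pickle protocol is pinned to an explicit number (2 here, an arbitrary choice for illustration) instead of relying on HIGHEST_PROTOCOL.

```python
import os
import pickle


class CacheDB:
    """Hypothetical pickle-backed cache (illustration only)."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self.modified = False

    def __setitem__(self, key, value):
        self.data[key] = value
        self.modified = True

    def sync(self):
        """Write the data to disk; clear the dirty flag only on success."""
        if not self.modified:
            return
        with open(self.path, "wb") as f:
            # Explicit protocol, so data stays readable across versions.
            pickle.dump(self.data, f, protocol=2)
            f.flush()
            os.fsync(f.fileno())
        # Reached only if every step above succeeded. If an exception was
        # raised, modified is still True and a later sync() will retry.
        self.modified = False
```

If the write fails, the object still reports itself as modified, so the in-memory state and the flag stay consistent.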

Meanwhile, marking it as LATER. Please also take a look at
http://dev.gentoo.org/~ferringb/cache/
I'm intending on moving the cache db classes from being category specific to being repository specific; template.py and fs_template.py should be usable for your CDB backend.

If the framework above isn't usable enough, or you have suggestions, please give a yell (preferably on the bug).
Comment 4 Brian Harring (RETIRED) gentoo-dev 2005-02-27 20:45:58 UTC
Addendum: I'm not much for setting /etc/modules by default...
We also lack any form of policy on that, so suggestions are welcome.
Comment 5 Brian Harring (RETIRED) gentoo-dev 2005-02-27 23:21:12 UTC
*** Bug 26447 has been marked as a duplicate of this bug. ***
Comment 6 Tobias Bell 2005-05-10 23:58:07 UTC
The problem with database.sync is that it's called after every key-value insert by emerge --sync or --metadata. That would make cdb unbelievably slow, because normally it's a constant database. And there is no problem with setting self.modified to True, because you can't corrupt the database: the realSync method creates a new database and renames it over the old one. This is a rather atomic operation; it either works or it doesn't, but there is no corruption. I think the whole portage db-caching needs a new design. My module is this hackish just to gain a bit of performance.
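The pattern described here (a cheap per-insert sync, with the real work done by building a new file and renaming it over the old one) can be sketched roughly as below. This is an assumption-laden illustration: `BatchedDB`, `real_sync`, and `write_atomically` are hypothetical names, a pickle file stands in for the cdb file, and `os.replace` is a modern Python call that was not available at the time of this bug.

```python
import os
import pickle
import tempfile


def write_atomically(path, mapping):
    """Write mapping to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(mapping, tmp, protocol=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file; the old database is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


class BatchedDB:
    """Hypothetical store that defers the expensive rebuild (illustration only)."""

    def __init__(self, path):
        self.path = path
        self.records = {}
        self.modified = False

    def __setitem__(self, key, value):
        self.records[key] = value
        self.modified = True

    def sync(self):
        """Called after every insert; deliberately does no disk I/O."""

    def real_sync(self):
        """Rebuild the whole file once, after all inserts are done."""
        if self.modified:
            write_atomically(self.path, self.records)
            self.modified = False
```

The temp file is created in the target directory because a rename is only atomic within one filesystem. Note that a no-op `sync` is exactly what Comment 3 objects to, so this sketch shows the trade-off and does not resolve it.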
Comment 7 Marius Mauch (RETIRED) gentoo-dev 2007-01-11 14:41:41 UTC
Closing due to old age (the module doesn't work with portage-2.1 anyway).