Bug 290428 - portage should update /var/db/pkg mtime for any modifications inside
Summary: portage should update /var/db/pkg mtime for any modifications inside
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: PMS/EAPI (show other bugs)
Hardware: All All
Importance: High normal
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks: 547622 288499
Reported: 2009-10-25 01:55 UTC by Zac Medico
Modified: 2015-04-25 00:46 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Description Zac Medico gentoo-dev 2009-10-25 01:55:11 UTC
In order to optimize /var/db/pkg cache validation, package managers should be required to update /var/db/pkg mtime before doing any modifications inside. If we put this in the spec now and add support to all package managers, we'll be able to rely on it in a couple of months.
Comment 1 Brian Harring gentoo-dev 2009-10-25 02:00:57 UTC
Before *and* after.

For anything that does mtime comparison of nodes w/in, the 'after' update is important (although not completely critical).
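
A minimal sketch of the "before *and* after" update, assuming a package manager wraps every vdb change in one helper. The name `vdb_modification` is hypothetical and is not taken from portage, pkgcore or paludis.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def vdb_modification(vdb_root):
    """Bump the mtime of vdb_root before and after a block of changes.

    Hypothetical helper: the name and shape are assumptions for illustration.
    """
    # 'before': signal to other readers that a change is starting.
    os.utime(vdb_root, None)
    try:
        yield
    finally:
        # 'after': make sure no node inside ends up newer than the root,
        # which matters for anything comparing mtimes of nodes within.
        os.utime(vdb_root, None)
```

A caller would write `with vdb_modification("/var/db/pkg"): ...` around the merge or unmerge. Changes made deeper than the first directory level do not touch the root's mtime on their own, which is why the explicit bump is needed.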
Comment 2 Petteri Räty (RETIRED) gentoo-dev 2009-10-25 10:55:39 UTC
(In reply to comment #0)
> In order to optimize /var/db/pkg cache validation, package managers should be
> required to update /var/db/pkg mtime before doing any modifications inside. If
> we put this in the spec now and add support to all package managers, we'll be
> able to rely on it in a couple of months.
> 

PMS: The VDB (/var/db/pkg). Ebuilds must not access this or rely upon it existing or being in any particular format.

Basically this is PM internal. Because the tree still uses vdb with things like built_with_use it has been implemented in all PMs. In the long term when these are nuked, PMs are free to implement vdb however they want. There's nothing for the spec here but of course coordination between PMs so that they use the current vdb similarly is good.
Comment 3 Ciaran McCreesh 2009-10-25 20:42:48 UTC
VDB's not covered by PMS at all.

I'm also not sure that messing around with mtimes on directories like this is the way to go. We might be better off with a less voodoo mechanism for letting package managers know whether they can carry on using caches. Something like a /var/cache/gentoo/ directory containing files like 'installable-changed', 'installed-changed' and probably others. Package managers would then echo their name into those files whenever they performed the operations in question.
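
The marker-file idea could be sketched roughly as follows. Only the directory and file names ('installable-changed', 'installed-changed') come from the comment above; the function names and the choice to overwrite rather than append are assumptions.

```python
import os
import tempfile


def record_change(cache_dir, marker, pm_name):
    """Write the package manager's name into a marker file.

    Mirrors `echo name > marker`: the file is overwritten each time.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, marker)
    with open(path, "w") as f:
        f.write(pm_name + "\n")
    return path


def last_writer(cache_dir, marker):
    """Return the name recorded in the marker file, or None if it is absent."""
    try:
        with open(os.path.join(cache_dir, marker)) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None
```

Under this reading, a package manager would call `record_change("/var/cache/gentoo", "installed-changed", "paludis")` after touching installed packages, and another manager would distrust its caches whenever `last_writer` returns a name other than its own.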
Comment 4 Brian Harring gentoo-dev 2009-10-25 22:27:14 UTC
(In reply to comment #3)
> I'm also not sure that messing around with mtimes on directories like this is
> the way to go. We might be better off with a less voodoo mechanism for letting
> package managers know whether they can carry on using caches. Something like a
> /var/cache/gentoo/ directory containing files like 'installable-changed',
> 'installed-changed' and probably others. Package managers would then echo their
> name into those files whenever they performed the operations in question.

It doesn't make much sense storing core repository data outside of the repository in this case- the main impetus behind having this timestamp supported is so that PMs can have *two* vdbs in use.  Essentially layering an optimized/cached vdb in front of the normal one, using the mtime from the underlying vdb to see if changes have occurred (requiring it to resync itself to the underlying vdb).  As is, at least for pkgcore we already do mtime detection to regenerate virtuals caches as needed (I suspect portage might also).  Having a proper timestamp to rely on simplifies that immensely, and is also the first step towards more fine-grained timestamps for invalidation.

For 'installed-changed', if it is to have the paludis meaning of that particular cache, that wouldn't be enough- that's just the collapsed form of a find basically, doesn't cover actual metadata.
Comment 5 Ciaran McCreesh 2009-10-25 22:29:37 UTC
You're talking about going to extreme lengths to avoid having to fix things properly. I'm really not convinced that this is a sensible solution.
Comment 6 Brian Harring gentoo-dev 2009-10-25 22:59:02 UTC
(In reply to comment #5)
> You're talking about going to extreme lengths to avoid having to fix things
> properly. I'm really not convinced that this is a sensible solution.

Adding a single timestamp into the vdb isn't exactly extreme.

As to building a cache in front of the vdb that pkgcore relies on for its ops, that's not extreme either (all managers have some form of a cache for the vdb, paludis included)- no different than pushing a cache in front of ebuild trees to avoid paying the cost of continual metadata regeneration.

You're also ignoring one key benefit of why I'm going to do this- it provides compatibility with the shitty tools out there that access the vdb, while improving vdb operations for managers that do caching (which again, all do).

Regardless of your opinion on my own efforts, the timestamp is a way to improve compatibility between the three PMs when it comes to their caches for the vdb.  Paludis could simply check the timestamp, if it's not what it expects, bitch at the user that they need to do a regen of installed-cache (or in a more friendly fashion, regenerate said cache).

There are pretty tangible benefits to this *now* for all managers (yours included).  So no, it's not extreme lengths- it's pragmatic design.
Comment 7 Ciaran McCreesh 2009-10-25 23:13:58 UTC
How does Paludis know which operations might invalidate some arbitrary cache that other package managers may or may not have? How do other package managers know which operations might invalidate some arbitrary cache that we invent for Paludis next week?

There's currently no compatibility whatsoever on the whole cache front. Making this one small change isn't going to guarantee that any assumptions anyone cares to make are valid; all it does is add in a false sense of security that gets in the way of doing all this properly. And, as can be seen from all the caching we're all doing, this is something that needs doing properly.
Comment 8 Brian Harring gentoo-dev 2009-10-25 23:48:01 UTC
(In reply to comment #7)
> How does Paludis know which operations might invalidate some arbitrary cache
> that other package managers may or may not have? How do other package managers
> know which operations might invalidate some arbitrary cache that we invent for
> Paludis next week?

Bad choice of a counter-argument; if we did the 'installed-cache' as you suggested, this would apply in full.  I'm not proposing an 'installed-cache' however.

Note however the push is for a simple timestamp updated when the vdb has been mutated/modified.  If changes have occurred outside that manager's purview, by definition it would have to scan the vdb to identify those changes to update its cache (same task pkgcore's and portage's caches do now).

> There's currently no compatibility whatsoever on the whole cache front. Making
> this one small change isn't going to guarantee that any assumptions anyone
> cares to make are valid; all it does is add in a false sense of security that
> gets in the way of doing all this properly.

Please look closer- you're starting to blend what you proposed (shared caches to some degree) with what I proposed- a simple synchronization timestamp.

What I'm proposing adds in a way for the manager to identify that there have been modifications done to the vdb that it doesn't know about.  Just that, nothing more (doing more gets into nastier problems that frankly aren't worth the effort).

That's not a false sense of security.  Modifiers of the vdb cooperate on updating that timestamp, it provides a synchronization point for caches to use to discern when they're potentially out of date.

> And, as can be seen from all the
> caching we're all doing, this is something that needs doing properly.

Note the 'pragmatic design' comment I made.  There has been discussion about redesigning the vdb at least since '04.  Note we still have the same POS implementation- the reason being that there has been no api provided from the managers to manipulate it (none really usable), as such tools have developed their own access methods.

Now we can either sit back and keep doing the academic approach of "it should be replaced in full!", or we can be realistic and get the gains now of maintaining compatibility with the old on disk vdb while building caches/new format in front of it, that is a duplicate in metadata content of the old.

Think through where the vdb is truly a cost- access, dependency calculation, etc.  The cost isn't in maintaining/updating it.  A synchronization timestamp in the old vdb format provides a way for managers to identify that their cached/new form of the vdb is no longer authoritative, and to resync the 'new' with the old (and maintain compatibility).

Faster, and compatible w/ one minor tweak.

Finally, please separate your notion of what this is to be from what is proposed- your arguments against it thus far seem to be based on doing more complex cache interop between the managers, while all that is proposed is providing effectively a counter the managers can check to see if someone else has done something to the vdb requiring them to resync.
Comment 9 Brian Harring gentoo-dev 2009-10-25 23:54:36 UTC
Note one question I didn't directly address, although it was answered already in comment #1:
> How does Paludis know which operations might invalidate some arbitrary cache
> that other package managers may or may not have?

It doesn't, and doesn't need to- all it needs to know is that if it modifies the vdb (update md5, apply pkg moves, prune out cache entries, whatever, pretty much all modification) it has to update the timestamp.  Period.
Comment 10 Ciaran McCreesh 2009-10-26 00:14:58 UTC
So you're asking for Paludis to modify mtime on VDB, when by doing that modification does not change anything, because even after that change you still can't guarantee the validity of any particular cache logic. In other words, it's a pointless false sense of security that just puts off fixing things properly yet again. You still have no guarantee that another package manager won't have made a change you haven't thought of that invalidates your cache logic.

Why don't we just fix it properly instead, then? Agree on a suitable directory layout for a sensible replacement for VDB that removes the need for caches. Keep the contents of the leaf directories the same to make things easy. Keep a backup /var/db/pkg structure for read-only legacy tools. Implement it. This *is* the pragmatic solution, since it doesn't involve everything breaking horribly when you suddenly start thinking that making this change would allow you to assume anything at all about any validity of any cache.
Comment 11 Brian Harring gentoo-dev 2009-10-26 00:26:40 UTC
(In reply to comment #10)
> So you're asking for Paludis to modify mtime on VDB, when by doing that
> modification does not change anything, because even after that change you still
> can't guarantee the validity of any particular cache logic.

Guessing you're just missing something obvious, but I'll walk through an example using paludis's installed names cache.

1) paludis generates said cache, binds into said cache the timestamp of the vdb at the time it ran.
2) emerge goes and replaces a pkg; updates the timestamp of the vdb.
3) paludis gets run again, notices that the timestamp it stored in its cache doesn't match the vdb timestamp- thus knowing its data is stale, it regenerates (or more likely tells the user to do it).

Pretty simple, frankly.
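
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not paludis's actual names cache: the JSON layout and the names `vdb_stamp`, `scan_names`, `generate_cache` and `load_names` are all hypothetical.

```python
import json
import os


def vdb_stamp(vdb_root):
    """Timestamp of the vdb root, in nanoseconds."""
    return os.stat(vdb_root).st_mtime_ns


def scan_names(vdb_root):
    """Collect 'category/package-version' names from two directory levels."""
    names = []
    for cat in sorted(os.listdir(vdb_root)):
        cat_path = os.path.join(vdb_root, cat)
        if os.path.isdir(cat_path):
            names.extend(f"{cat}/{pkg}" for pkg in sorted(os.listdir(cat_path)))
    return names


def generate_cache(vdb_root, cache_file):
    # Step 1: read the stamp first, then scan, and bind the stamp into the
    # cache. A change landing mid-scan then leaves the cache looking stale.
    data = {"stamp": vdb_stamp(vdb_root), "names": scan_names(vdb_root)}
    with open(cache_file, "w") as f:
        json.dump(data, f)
    return data


def load_names(vdb_root, cache_file):
    # Step 3: reuse the cache only while the stored stamp still matches.
    try:
        with open(cache_file) as f:
            data = json.load(f)
    except (FileNotFoundError, ValueError):
        data = None
    if data is None or data["stamp"] != vdb_stamp(vdb_root):
        data = generate_cache(vdb_root, cache_file)
    return data["names"]
```

Step 2 is whatever the other package manager does, followed by its update of the root timestamp. Without that update, a package added inside an existing category directory leaves the root's mtime unchanged and `load_names` keeps returning the old list.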

> In other words,
> it's a pointless false sense of security that just puts off fixing things
> properly yet again. You still have no guarantee that another package manager
> won't have made a change you haven't thought of that invalidates your cache
> logic.

Blah blah blah.  Look at the steps above and pick them apart.  Provide counter examples rather than making claims please.

> Why don't we just fix it properly instead, then? Agree on a suitable directory
> layout for a sensible replacement for VDB that removes the need for caches.
> Keep the contents of the leaf directories the same to make things easy. Keep a
> backup /var/db/pkg structure for read-only legacy tools. Implement it. This
> *is* the pragmatic solution, since it doesn't involve everything breaking
> horribly when you suddenly start thinking that making this change would allow
> you to assume anything at all about any validity of any cache.

Quick question.  Under your grand new scheme of having two vdbs, how are you planning on handling the following:

1) pkgcore supports the new vdb format, emerge does.  Paludis however doesn't.
2) pkgcore/emerge are invoked, new vdb generated, old vdb kept in sync.
3) paludis runs.  updates only the old vdb.
4) pkgcore/emerge are invoked.  Unless they have some way to detect that paludis didn't update the new format, they're boned.

Your scheme is fundamentally flawed.  Like it or not, the vdb is not readonly.  For your scheme to fly at the package manager level (let alone any consuming tools), it would require the user to upgrade their PMs (all of them) to versions supporting the new vdb format *and* old.

That isn't pragmatic.  That's pie in the sky lunacy.

The fix for your proposal is to add in and rely on some synchronization marker.  That synchronization marker is the timestamp you're arguing against.

Note I'm not against having a unified new vdb between the managers.  Thing is, I'd be pushing for the new vdb to have a similar timestamp on it for scenarios where the VDB is remotely stored, but a local cached version needs to be stored.  That little timestamp has a lot of potential uses, both for transitioning away from the old vdb and for optimizing its usage in the interim.
Comment 12 Ciaran McCreesh 2009-10-26 00:35:52 UTC
(In reply to comment #11)
> 3) paludis gets run again, notices that the timestamp it stored in its cache
> doesn't match the vdb timestamp- thus knowing its data is stale, it
> regenerates (or more likely tells the user to do it).

Except that there is absolutely no guarantee that the cache will be valid even if the mtimes do match, so we still have to tell the user to do cache regeneration by hand if they use another package manager. No change there, except that you're introducing a false assumption that this change will make it possible for a package manager to check cache validity.

Current situation: the package manager has no way of knowing whether any of its caches remain valid, so we tell the user what to do if they use another package manager.

New situation: the package manager has no way of knowing whether any of its caches remain valid, so we tell the user what to do if they use another package manager.

> Quick question.  Under your grand new scheme of having two vdbs, how are you
> planning on handling the following:
> 
> 1) pkgcore supports the new vdb format, emerge does.  Paludis however doesn't.
> 2) pkgcore/emerge are invoked, new vdb generated, old vdb kept in sync.
> 3) paludis runs.  updates only the old vdb.
> 4) pkgcore/emerge are invoked.  Unless they have some way to detect that
> paludis didn't update the new format, they're boned.

I'm planning on us all migrating, and until we do the migration, we continue to provide the user with instructions on what to do when they use multiple package managers. There is no change there.

> For your scheme to fly at the package manager level (let alone any consuming
> tools), it would require the user to upgrade their PMs (all of them) to
> versions supporting the new vdb format *and* old.

It would require the user to upgrade any tools that write to VDB, yes. Please explain to me how this is in any way different from the mtime proposal in this bug.

As for new and old, it would simply require maintaining a small additional directory structure that fakes the old VDB layout by symlinking /var/db/pkg/cat/pn-pv to, say, /var/db/installed-packages/data/whatever.
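
A rough sketch of the symlinked compatibility layout described above. The paths follow the comment; the function name `link_legacy_entry` and its parameters are hypothetical.

```python
import os


def link_legacy_entry(legacy_root, category, pf, data_dir):
    """Expose data_dir as legacy_root/category/pf through a symlink.

    legacy_root plays the role of /var/db/pkg and data_dir the role of an
    entry under /var/db/installed-packages/data/ (both are assumptions).
    """
    cat_dir = os.path.join(legacy_root, category)
    os.makedirs(cat_dir, exist_ok=True)
    link = os.path.join(cat_dir, pf)
    if os.path.islink(link):
        os.unlink(link)  # replace a link left over from an earlier version
    os.symlink(data_dir, link)
    return link
```

Read-only legacy tools would then keep walking `cat/pn-pv` as before and land in the new data directory.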

> The fix for your proposal is to add in and rely on some synchronization marker.
> That synchronization marker is the timestamp you're arguing against.

The marker is unreliable, and thus worse than useless.
Comment 13 Brian Harring gentoo-dev 2009-10-26 01:09:04 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > 3) paludis gets run again, notices that the timestamp it stored in its cache
> > doesn't match the vdb timestamp- thus knowing its data is stale, it
> > regenerates (or more likely tells the user to do it).
> 
> Except that there is absolutely no guarantee that the cache will be valid even
> if the mtimes do match, so we still have to tell the user to do cache
> regeneration by hand if they use another package manager. No change there,
> except that you're introducing a false assumption that this change will make it
> possible for a package manager to check cache validity.
> 
> Current situation: the package manager has no way of knowing whether any of its
> caches remain valid, so we tell the user what to do if they use another package
> manager.

Paludis chooses this path.  Portage and pkgcore however validate their caches.  So that's the current situation for *your* manager, and paludis could just as easily do the validation (I presume you skipped it for speed reasons and the desire to leave compatibility concerns on the user's head rather than in the manager).


> New situation: the package manager has no way of knowing whether any of its
> caches remain valid, so we tell the user what to do if they use another package
> manager.

You're missing the nature of changes like this.  The new situation is reliant on the implicit transition period that goes with any change of this sort.  During this, the managers are updated to do the updates.  They however can't yet rely on them- they use whatever cache validation mechanism they have already (or tell the user it's their problem, per the paludis way- either way, same old same old during it).

Once the appropriate time has passed, push out a news item if desired, and release versions that rely on the new timestamp.  Transition period over, timestamp is required.

> > Quick question.  Under your grand new scheme of having two vdbs, how are you
> > planning on handling the following:
> > 
> > 1) pkgcore supports the new vdb format, emerge does.  Paludis however doesn't.
> > 2) pkgcore/emerge are invoked, new vdb generated, old vdb kept in sync.
> > 3) paludis runs.  updates only the old vdb.
> > 4) pkgcore/emerge are invoked.  Unless they have some way to detect that
> > paludis didn't update the new format, they're boned.
> 
> I'm planning on us all migrating, and until we do the migration, we continue to
> provide the user with instructions on what to do when they use multiple package
> managers. There is no change there.

You're missing the point- either you're leaving out a transition period of updating but not using, or you're daftly assuming users can be told to upgrade pkgcore, paludis, and portage all to specific versions.  Same transition period with the timestamp proposal.

> As for new and old, it would simply require maintaining a small additional
> directory structure that fakes the old VDB layout by symlinking
> /var/db/pkg/cat/pn-pv to, say, /var/db/installed-packages/data/whatever.

Bad idea linking the new format to the old- I presume you mean to say "linking a compatibility version of the data from the new to the old".  Solar alone would be bitching about having separate files for each metadata key (as would I, it's inefficient both in usage and in FS consumption).


> > For your scheme to fly at the package manager level (let alone any consuming
> > tools), it would require the user to upgrade their PMs (all of them) to
> > versions supporting the new vdb format *and* old.
> 
> It would require the user to upgrade any tools that write to VDB, yes. Please
> explain to me how this is in any way different from the mtime proposal in this
> bug.
> > The fix for your proposal is to add in and rely on some synchronization marker.
> > That synchronization marker is the timestamp you're arguing against.
> 
> The marker is unreliable, and thus worse than useless.

Your proposal is that all consumers upgrade and support the new format.  Then a switch gets flipped at some point indicating that they can rely on the new format for read.

My proposal is that a timestamp gets shoved into the old vdb, we update it until such time a switch is flipped (PM releases realistically) indicating read is allowed for that timestamp (and can be trusted).

You want a new vdb, push for a new vdb outside of this modification to the old vdb.  I guarantee your new vdb is going to be reliant on this timestamp however for compatibility and transitioning (primarily since your notion of transition opens up real risks of the new/old being out of sync in an unresolvable method).

Either way, I suspect a good summation of your viewpoint is "do a new vdb instead".  Feel free to flesh out a proper transition period for your proposal that doesn't rely on some timestamp synchronization however- or stop arguing against this proposal (the irony is your new vdb you want is realistically dependent on this proposal).

Regardless, since I highly doubt you'll back down from your position and both pkgcore and portage can directly benefit from this without having to wait another 5 years for a new vdb to materialize, I intend to push this up to the council for decision.  I suggest you get your counter examples in order for the council meeting- I expect you'll want to shoot this down, and real examples will be needed.
Comment 14 Brian Harring gentoo-dev 2009-10-26 01:10:00 UTC
@solar; your custom manager also does vdb modifications.  Your commentary regarding adding this timestamp would be appreciated.
Comment 15 Ciaran McCreesh 2009-10-26 01:20:41 UTC
(In reply to comment #13)

> Paludis chooses this path.  Portage and pkgcore however validate their caches. 

But they can't, since there's no way for them to know whether or not the assumptions they make hold. No change on mtime doesn't mean the cache remains valid, since it might have been invalidated in other ways you haven't thought of.

That's the point here: introducing mtime guarantees on VDB doesn't allow you to validate your cache.

> Once the appropriate time has passed, push out a news item if desired, and
> release versions that rely on the new timestamp.  Transition period over,
> timestamp is required.

And even when that time has passed, there's still no guarantee that your cache is valid, since you can't assume that the only factor affecting the validity of your cache is the state of VDB.

> You're missing the point- either you're leaving out a transition period of
> updating but not using, or you're daftly assuming users can be told to upgrade
> pkgcore, paludis, and portage all to specific versions.  Same transition period
> with the timestamp proposal.

No, I'm assuming that users will continue to regenerate caches by hand when using multiple package managers until we tell them not to. It's what they have to do now anyway.

> Bad idea linking the new format to the old- I presume you mean to say "linking
> a compatibility version of the data from the new to the old".  Solar alone
> would be bitching about having separate files for each metadata key (as would
> I, it's inefficient both in usage and in FS consumption).

Changing the directory layout and changing the format of the metadata files within the directory are two different things. Both need doing at some point, but they don't have to be done together. Only the directory layout's relevant for things we're caching right now; switching to a single file for most metadata can and should be done independently.

> My proposal is that a timestamp gets shoved into the old vdb, we update it
> until such time a switch is flipped (PM releases realistically) indicating read
> is allowed for that timestamp (and can be trusted).

It can only be trusted if the cache is a function purely of the VDB contents. You've in no way guaranteed that that's true, and doing so would require extending the scope of the proposal.

> You want a new vdb, push for a new vdb outside of this modification to the old
> vdb.  I guarantee your new vdb is going to be reliant on this timestamp however
> for compatibility and transitioning (primarily since your notion of transition
> opens up real risks of the new/old being out of sync in an unresolvable
> method).

The timestamp you want doesn't provide any reliability unless you also introduce new rules about what caches can and cannot do.

> Either way, I suspect a good summation of your viewpoint is "do a new vdb
> instead".  Feel free to flesh out a proper transition period for your proposal
> that doesn't rely on some timestamp synchronization however- or stop arguing
> against this proposal (the irony is your new vdb you want is realistically
> dependent on this proposal).

No, my view is "don't introduce mtime changes for cache validation if you can't prove that mtime changes will guarantee that cache validation works correctly". Since we don't have that guarantee, introducing mtime changes is harmful.

Provide proof that all existing and future caches that would rely upon this validation mechanism are functions purely and exclusively dependent upon the VDB content, and I shall be happy to make the change.
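
One way to read this objection, sketched with a deliberately artificial cache. The second input (`extra_input`) and both function names are hypothetical; the point is only that a timestamp comparison says nothing about inputs outside the vdb.

```python
import os


def build_cache(vdb_root, extra_input):
    """Hypothetical cache whose contents depend on the vdb AND another file."""
    with open(extra_input) as f:
        setting = f.read().strip()
    return {
        "stamp": os.stat(vdb_root).st_mtime_ns,
        "value": f"{setting}:{len(os.listdir(vdb_root))}",
    }


def looks_valid(cache, vdb_root):
    """The check under discussion: compare only the vdb timestamp."""
    return cache["stamp"] == os.stat(vdb_root).st_mtime_ns
```

If `extra_input` changes while the vdb does not, `looks_valid` still returns `True` although a rebuild would produce a different value. A cache that is a function of the vdb contents alone does not have this problem.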
Comment 16 Zac Medico gentoo-dev 2009-10-31 04:43:41 UTC
I'll just treat this as a portage bug. Thanks anyway. :)
Comment 17 Zac Medico gentoo-dev 2009-10-31 04:45:54 UTC
There's support in svn r14735 for updating mtime on /var/db/pkg as well as category subdirectories. This is released in portage-2.1.7.2 and 2.2_rc47.
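
For illustration only, updating the mtime on both the vdb root and a category subdirectory might look like the sketch below. This is not the code from r14735; the function name `bump_vdb_mtimes` is an assumption.

```python
import os
import time


def bump_vdb_mtimes(vdb_root, category):
    """Set the category directory and the vdb root to the same current time."""
    now = time.time_ns()
    for path in (os.path.join(vdb_root, category), vdb_root):
        # Both atime and mtime are set; only mtime matters for validation.
        os.utime(path, ns=(now, now))
```

Stamping the category directory as well lets a cache narrow a rescan to the categories that changed instead of walking the whole tree.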