It would be really useful for app-portage/smart-live-rebuild (bug #310975) if cvs.eclass could export some kind of 'global revision' variable like svn, git and other VCSes do. I suggest naming the var ECVS_VERSION (similarly to other VCS eclasses). To keep it simple, the value could be some kind of 'last change date' (not sure if that would notice file removals though) or something similar.
the problem with cvs is that it has no repo-level information. individual files have dates/revs, but that is it. there would be no way to find this out without querying every file in the entire CVS tree and finding out the latest changed date across them all. that means you're basically left with the checkout date, and that's no different from using `date` or something similar. if you have any clever ideas, feel free to re-open.
Hm, I see it's harder than I thought. How about simply writing down mtime of the checkout dir there? I think that would be enough to detect whether the tree was changed since last merge. That approach seems to work for me, and it is quite simple to implement and lightweight too.
i think mtime of a dir only directly reflects the files in it. so you'd need again a recursive walk of the whole CVS tree. the CVS/Entries files should contain all the rev info for particular files. so it shouldnt be too bad unless you have a large cvs tree ? do we really have any ebuilds using cvs.eclass anymore ? trouble then becomes a sorting one ...

find -ipath '*/CVS/Entries' -exec sed -r 's|^/.*/.*/(.*)/.*/$|\1|' {} + | sort -n -k 5
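A rough sketch of one way around the sorting trouble: turn each ctime-style timestamp from CVS/Entries into a fixed-width key with awk and keep the newest one. The function name ecvs_latest_change is made up for illustration and is not part of cvs.eclass. The sketch assumes timestamps of the form "Sun May 30 23:24:12 2010" and skips entries whose timestamp field holds anything else.

```shell
#!/bin/sh
# Hypothetical helper (not part of cvs.eclass): print the newest timestamp
# recorded in any CVS/Entries file below the directory given as $1.
ecvs_latest_change() {
    find "$1" -type f -path '*/CVS/Entries' -exec cat {} + |
    awk -F/ '
        BEGIN {
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
            for (i = 1; i <= n; i++) month[m[i]] = i
        }
        # File entries look like: /name/rev/timestamp/options/tag
        # Directory entries start with "D" and are skipped.
        $1 == "" && NF >= 5 {
            # Expect a ctime-style stamp: weekday month day time year
            if (split($4, t, " ") != 5 || !(t[2] in month)) next
            # Fixed-width key, so plain string comparison orders by date
            key = sprintf("%04d%02d%02d%s", t[5], month[t[2]], t[3], t[4])
            if (key > best) { best = key; stamp = $4 }
        }
        END { if (stamp != "") print stamp }
    '
}
```

Calling ecvs_latest_change on a checkout directory prints the most recent timestamp as it is stored in Entries. A removed file would not be noticed, since its entry simply disappears.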
Another thing worth considering might be 'cvs history'; it seems to give quite nice results, but I find most of them really hard to understand.
i'm not sure we can rely on the history file. a lot of projects nuke it and/or symlink it to /dev/null due to its easily excessive expansion. ive seen that thing hit hundreds of megabytes before.
To sum up, we have three possibilities now:
1) a simple stat of the directory, as I suggested -- might not be accurate, but it is easy and seems sufficient for simple update checks,
2) CVS/Entries mangling -- would require implementing some date parsing/sorting,
3) 'cvs log' mangling -- awfully slow but sortable (8s for a simple project).

I think I would personally still go with 1). What I need exactly is to be able to guess (not necessarily correctly) whether anything has changed between the last time the package was merged (i.e. when the envvar was written to environment.bz2) and the current checkout state (i.e. whether any 'cvs up' call updated files). That seems achievable with my idea, without requiring much effort or wasting the user's time.
(In reply to comment #0)
> It would be really useful for app-portage/smart-live-rebuild (bug #310975) if
> cvs.eclass could export some kind of 'global revision' variable like svn, git
> and other VCSes do.

I assume you just want some value which changes when any file has been updated; it doesn't have to correspond to a revision number or timestamp or anything? In which case:

(In reply to comment #6)
> 2) CVS/Entries mangling -- would have to implement some date parsing/sorting,

wouldn't really need any date manipulation, it should be enough to take a hash of the relevant information (probably the path/filenames with the revision number of each file, sorted to make sure it's deterministic).
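A minimal sketch of that hashing idea, assuming sha1sum is available: collect "path/name revision" pairs from every CVS/Entries file, sort them, and hash the result. The function name ecvs_rev_hash is hypothetical, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of cvs.eclass): hash the sorted list of
# "path/name revision" pairs found in the checkout rooted at $1.
ecvs_rev_hash() {
    (
        cd "$1" || exit 1
        find . -type f -path '*/CVS/Entries' |
        while IFS= read -r entries; do
            # Directory the Entries file describes, e.g. "./sub"
            dir=${entries%/CVS/Entries}
            # File entries look like /name/rev/timestamp/options/tag;
            # directory entries start with "D" and are ignored.
            awk -F/ -v dir="$dir" \
                '$1 == "" && NF >= 3 { print dir "/" $2 " " $3 }' "$entries"
        done |
        # Sorting in the C locale keeps the order independent of both the
        # filesystem order and the user's locale.
        LC_ALL=C sort | sha1sum | cut -d' ' -f1
    )
}
```

The value changes when any file's revision changes, or when a file is added or removed, but not when the entries are merely reordered.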
(In reply to comment #7)
> (In reply to comment #6)
> > 2) CVS/Entries mangling -- would have to implement some date parsing/sorting,
>
> wouldn't really need any date manipulation, it should be enough to take a hash
> of the relevant information (probably the path/filenames with the revision
> number of each file, sorted to make sure it's deterministic).

We'd still need to iterate over every CVS/Entries file there, and rely on some external hashing tool. But it seems worth considering to me.
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > 2) CVS/Entries mangling -- would have to implement some date parsing/sorting,
> >
> > wouldn't really need any date manipulation, it should be enough to take a hash
> > of the relevant information (probably the path/filenames with the revision
> > number of each file, sorted to make sure it's deterministic).
>
> We'd still need to iterate over every CVS/Entries file there, and rely on some
> external hashing tool. But seems considerable to me.

Trying to implement that, I ran into a serious problem. The find tool iterates over files in a pretty random order (i.e. the filesystem order), and I think we can't guarantee that CVS entries are ordered in any way either (updated files are moved to the bottom). Unless we implement a lot of sorting everywhere, this makes the hashing idea only as reliable as my simple checkout directory timestamp check.

I'm reopening the bug to hopefully get some more discussion on the topic.
it isnt random at all. piping the output into sort is trivial.

if that is acceptable, then it's easy to add:
export ECVS_VERSION=`find -ipath '*/CVS/Entries' -exec cat {} + | sort | sha1sum`
(In reply to comment #10)
> it isnt random at all. piping the output into sort is trivial.
>
> if that is acceptable, then it's easy to add:
> export ECVS_VERSION=`find -ipath '*/CVS/Entries' -exec cat {} + | sort |
> sha1sum`

Not exactly. The Entries file doesn't contain the full paths to the files. But I think we could prepend each line of it with them and then sort. My suggestion would then be:

export ECVS_VERSION=$(find -type d -name CVS -prune -exec sed -n -e 's;^/;{}:;p' {}/Entries \; | sort | sha1sum | cut -d' ' -f1)

The list then consists only of the file entries, of the form:

./CVS:black_720x576.mpg/1.1/Sat Jun 3 09:44:41 2006//
./CVS:config.c/1.94/Sun May 30 23:24:12 2010//

The 'CVS' part is unnecessary/incorrect/whatever and the first slash is eaten, but I don't think it really matters. The formula is POSIX-compliant, except for the sha1sum call.
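To make the difference between the two formulas concrete, here is a sketch that wraps each in a function. The names hash_plain and hash_with_paths are hypothetical. The commands follow the ones posted, except that an explicit "." start path is given, -path replaces -ipath, and cut is applied to both. Note that the second one relies on find substituting {} inside a larger argument, which common find implementations do but which POSIX leaves implementation-defined.

```shell
#!/bin/sh
# Hypothetical wrappers around the two formulas from the comments above.

# Comment #10 style: hash the sorted Entries lines, without any paths.
hash_plain() {
    ( cd "$1" && find . -path '*/CVS/Entries' -exec cat {} + ) |
        sort | sha1sum | cut -d' ' -f1
}

# Comment #11 style: prefix each file entry with its CVS directory first.
# Assumes find replaces {} inside 's;^/;{}:;p' and in {}/Entries.
hash_with_paths() {
    ( cd "$1" && find . -type d -name CVS -prune \
          -exec sed -n -e 's;^/;{}:;p' {}/Entries \; ) |
        sort | sha1sum | cut -d' ' -f1
}
```

If two files with identical revision and timestamp swap directories, hash_plain gives the same value for both trees, while hash_with_paths gives different ones. That is the corner case the path prefix is meant to cover.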
i dont think the full path is necessary. the likelihood of getting a file in two different paths with the same name, rev, and timestamp (down to the second) is small enough to not worry about it.
(In reply to comment #12)
> i dont think the full path is necessary. the likelihood of getting a file in
> two different paths with the same name, rev, and timestamp (down to the second)
> is small enough to not worry about it.

I think that if we're doing something already, we should make it safe, especially since that doesn't cost much.
... yet you're ok with relying on a hash function to provide collision-free results. sorry, that argument isnt going to fly. ive committed this: http://sources.gentoo.org/eclass/cvs.eclass?r1=1.73&r2=1.74
(In reply to comment #14)
> ... yet you're ok with relying on a hash function to provide collision-free
> results. sorry, that argument isnt going to fly.
>
> ive committed this:
> http://sources.gentoo.org/eclass/cvs.eclass?r1=1.73&r2=1.74

One more request. Could you add LC_ALL=C to the sort call?
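The reason for the request, as a small sketch: sort orders lines according to the current locale's collation rules, so the same Entries data can be fed to the hash in a different order under a different locale. Forcing the C locale gives plain byte order everywhere. The helper name sorted_digest is hypothetical, and sha1sum is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: hash stdin after sorting it in byte order, so the
# result does not depend on the caller's LANG/LC_* settings.
sorted_digest() {
    LC_ALL=C sort | sha1sum | cut -d' ' -f1
}
```

For example, `printf '%s\n' b A a B | LC_ALL=C sort` prints A, B, a, b, whereas many UTF-8 locales interleave upper and lower case.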
indeed; i should have seen that too. thanks. http://sources.gentoo.org/eclass/cvs.eclass?r1=1.74&r2=1.75