Files that contain CVS keywords are having these keywords incorrectly expanded during conversion. These are keywords that are part of patch files and not part of the typical Gentoo headers.

Example:

The file in git:
$ git cat-file blob 3a4f1c569e5a3d43c0b1cf77605a8465aacb3cbf
# $Source: /usr/local/ssd/gentoo-x86/output/dev-libs/cvs-repo/gentoo-x86/dev-libs/libtomcrypt/files/Attic/libtomcrypt-1.06-makefile.diff,v $
-# $Revision: 1.1 $
-# $Date: 2009/12/07 11:27:03 $
+# $Revision: 1.1 $
+# $Date: 2009/12/07 11:27:03 $

The file in rcs:
$ cat gentoo-x86/dev-libs/libtomcrypt/files/Attic/libtomcrypt-1.06-makefile.diff,v
# $Source: /cvs/libtom/libtomcrypt/makefile,v $
-# $Revision: 1.86 $
-# $Date: 2005/07/30 04:54:20 $
+# $Revision: 1.89 $
+# $Date: 2005/08/07 23:41:08 $

The output of cvs without substitution:
$ cvs co -r 1.1 -ko -p gentoo-x86/dev-libs/libtomcrypt/files/libtomcrypt-1.06-makefile.diff
# $Source: /cvs/libtom/libtomcrypt/makefile,v $
-# $Revision: 1.86 $
-# $Date: 2005/07/30 04:54:20 $
+# $Revision: 1.89 $
+# $Date: 2005/08/07 23:41:08 $

The output of cvs with substitution:
$ cvs co -r 1.1 -p gentoo-x86/dev-libs/libtomcrypt/files/libtomcrypt-1.06-makefile.diff
# $Source: /var/cvsroot/gentoo-x86/dev-libs/libtomcrypt/files/Attic/libtomcrypt-1.06-makefile.diff,v $
-# $Revision: 1.1 $
-# $Date: 2009/12/07 11:27:03 $
+# $Revision: 1.1 $
+# $Date: 2009/12/07 11:27:03 $
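One way to check whether two copies of a file differ only in keyword expansion is to collapse every keyword to its bare form before comparing. The following is a minimal Python sketch of that idea, not part of the conversion tool. The function names and the keyword list (the common CVS/RCS set) are assumptions made for illustration.

```python
import re

# Assumed keyword set: the common CVS/RCS keywords.
KEYWORDS = ("Author", "Date", "Header", "Id", "Locker", "Log", "Name",
            "RCSfile", "Revision", "Source", "State")

# Matches "$Keyword$" or "$Keyword: value $"; the value may not contain
# "$" or a newline.
_KEYWORD_RE = re.compile(r"\$(" + "|".join(KEYWORDS) + r")(?::[^$\n]*)?\$")


def collapse_keywords(text: str) -> str:
    """Reduce every expanded keyword to its bare "$Keyword$" form."""
    return _KEYWORD_RE.sub(r"$\1$", text)


def differs_only_in_keywords(a: str, b: str) -> bool:
    """True if a and b differ, but are equal once keywords are collapsed."""
    return a != b and collapse_keywords(a) == collapse_keywords(b)
```

Applied to the example above, the git blob and the `cvs co -ko` output would compare as "differs only in keywords", which is the signature of this bug.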
This is a really crappy body.

Ok, it wasn't clear which copy was git vs your copy. I see that it's actually a problem in the git copy: the RCS path mangling needs to be applied to more files.

See commit 5862ab3350b849327dc86824019d18a9e8217503
http://git-exp.overlays.gentoo.org/gitweb/?p=exp/gentoo-x86.git;a=commit;h=5862ab3350b849327dc86824019d18a9e8217503
http://git-exp.overlays.gentoo.org/gitweb/?p=exp/gentoo-x86.git;a=commitdiff;h=5862ab3350b849327dc86824019d18a9e8217503;hp=0c9a11efe4239b13fb1ddebfade569d71d47aeaa

$Source: /usr/local/ssd/gentoo-x86/output/dev-libs/cvs-repo/gentoo-x86/dev-libs/libtomcrypt/files/Attic/libtomcrypt-1.06-makefile.diff,v $

I'm using git pickaxe to find all further locations of the problem string '/usr/local/ssd/gentoo-x86'.
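The path mangling mentioned here could look roughly like the sketch below: rewrite the conversion machine's local repository prefix inside `$Source$` and `$Header$` values back to the server path. This is a hedged illustration only. The two prefixes are taken from the example in this report, while the function name, the choice of keywords, and the assumption that a plain prefix swap is enough are all hypothetical and not the converter's actual code.

```python
import re

# Prefixes taken from the example in this report; treating the fix as a
# simple prefix swap is an assumption.
BAD_PREFIX = "/usr/local/ssd/gentoo-x86/output/dev-libs/cvs-repo/"
GOOD_PREFIX = "/var/cvsroot/"

# Only keywords whose value starts with a full repository path.
_PATH_KEYWORD_RE = re.compile(r"\$(Source|Header): ([^$\n]*)\$")


def fix_keyword_paths(text: str) -> str:
    """Rewrite the local conversion prefix inside Source/Header keywords."""
    def _fix(match):
        value = match.group(2)
        if value.startswith(BAD_PREFIX):
            value = GOOD_PREFIX + value[len(BAD_PREFIX):]
        return "$%s: %s$" % (match.group(1), value)

    return _PATH_KEYWORD_RE.sub(_fix, text)
```

Keyword values that do not start with the local prefix are left untouched, so running this over an already correct blob is a no-op.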
Here are all of the broken commits

# git log -S/usr/local/ssd/gentoo-x86/output --all
commit 1ee65de750e08c8a13dab5e296b1b555b999bfc8
Author: Thilo Bangert <bangert@gentoo.org>
Date:   Mon Aug 9 10:08:42 2010 +0000

    fix patches that failed to apply (#331427)

    Package-Manager: portage-2.2_rc67/cvs/Linux i686

commit 5862ab3350b849327dc86824019d18a9e8217503
Author: Thilo Bangert <bangert@gentoo.org>
Date:   Mon Dec 7 11:27:04 2009 +0000

    rework patches so they apply (bug #245998)
    - fix doc install location
    - disables tests

    Package-Manager: portage-2.2_rc56/cvs/Linux i686

commit 3b7c322ad45d31b8fbe048ad067d9736271a6cc3
Author: Alin N?stac <mrness@gentoo.org>
Date:   Sun Feb 27 08:25:34 2005 +0000

    remove broken patch

    Package-Manager: portage-2.0.51-r15

commit 7dfb8d3febde1b1f63c2119eaebb7ea2abeb181e
Author: Alin N?stac <mrness@gentoo.org>
Date:   Sat Feb 26 23:22:34 2005 +0000

    bump to source version 3.0.3-2 (#83205); add zlib support (#83278); remove slots

    Package-Manager: portage-2.0.51-r15

commit 77e3cd27a5098601cca4eaf803c310d7eb4c713e
Author: Martin Holzer <mholzer@gentoo.org>
Date:   Tue Apr 29 20:49:33 2003 +0000

    diff now done with sed. Version bumped. Closes #20011.

commit 35082a660b94d9369389e25c8d46cec293519c83
Author: Seemant Kulleen <seemant@gentoo.org>
Date:   Sun Dec 15 01:13:32 2002 +0000

    version bump

commit fbb2266e02d921378c17cbda800e711e8c96acf6
Author: Thilo Bangert <bangert@gentoo.org>
Date:   Fri Jun 28 11:57:48 2002 +0000

    removal of moved packages -- 12 not 13 btw.

commit af820390ec29723ead48d06429518f5e85485e70
Author: Achim Gottinger <achim@gentoo.org>
Date:   Sun Dec 17 17:48:05 2000 +0000

    *** empty log message ***
Created attachment 339300
List of files that contain this issue in the latest run.

Issue still exists in latest run, but it looks like it only impacts 7 files.

The hash field contains the hashes of the blobs - git show hash will display the file. The commit field is the hash of the commit. Key is meaningless.
(In reply to comment #3)
> Created attachment 339300
> List of files that contain this issue in the latest run.
>
> Issue still exists in latest run, but it looks like it only impacts 7 files.
>
> The hash field contains the hashes of the blobs - git show hash will display
> the file. The commit field is the hash of the commit. Key is meaningless.

Think you may want to check your script. The screwups w/ mrness's name aren't in git at all. For the other revs, the differences are literally just a newline removal. The newline removal itself just maps back to either rcs or cvs usage for the blob import code. Either way, uploading a cvs version shortly; look for 2015-02-15-cvs.

The issues w/ folks' names aren't showing in git at all- I'm pretty sure that's a false positive in the validation code (take a look at mrness's name in the output logs; that looks like a mangled latin1 -> utf8 conversion, but it doesn't show up in git).

Aside from that, for future lists of issues- details of what's borked for each would be useful. :)
(In reply to comment #4)
> Think you may want to check your script. The screwups w/ mrness's name
> aren't in git at all.

I never claimed there was a problem with mrness's name - I think you're confusing this with the other bug. This bug is about CVS keyword substitution only.

> Aside from that, for future lists of issues- details of what's borked for
> each would be useful. :)

I'll just pick one commit to comment on, as it is illustrative:

$ git show 99a85207e3686eb8b6e6cb698d0e2cfb2953f9d9
$ less gentoo-x86/app-mobilephone/bitpim/Attic/bitpim-0.8.13.ebuild,v

The original line checked into rcs is:
sed -i -e 's/^__FROZEN__="$Id:.*"/__FROZEN__="$Id: '${svnrev}' $"/' \

The line in git is:
sed -i -e 's/^__FROZEN__="$Id: bitpim-0.8.13.ebuild,v 1.1 2006/04/26 16:04:41 mrness Exp $Id: '${svnrev}' $"/' \

This sort of situation is a bit messy because in cases like this it isn't obvious when to substitute keywords and when not to. Certainly cvs doesn't handle this well either. If we want to just ignore these issues during conversion that is fine by me, but I just wanted to note that they are still happening.

If you check out revision 1.1 using cvs without -ko, you get a different keyword expansion that I'd likely still consider wrong, and if you check it out using -ko you get that line correctly, but the keywords in the ebuild header will be wrong. So the issue isn't entirely the fault of the conversion - it is impossible to get even cvs to really do the right thing in these cases.
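The mangled line can be reproduced with a simplified model of keyword matching. The sketch below assumes that an Id keyword is `$Id$` or `$Id: value $`, where the value contains no `$` and no newline. That is an approximation of RCS/CVS behaviour for illustration, not their actual code, and the function name is hypothetical.

```python
import re

# Simplified model (an assumption, not the real implementation):
# "$Id$" or "$Id: value $", where value has no "$" and no newline.
_ID_RE = re.compile(r"\$Id(?::[^$\n]*)?\$")


def expand_id(line: str, value: str) -> str:
    """Replace every Id keyword match in `line` with its expanded form."""
    return _ID_RE.sub(lambda m: "$Id: " + value + " $", line)


# The line as checked into rcs (trailing line-continuation backslash omitted).
ORIGINAL = r"""sed -i -e 's/^__FROZEN__="$Id:.*"/__FROZEN__="$Id: '${svnrev}' $"/'"""
ID_VALUE = "bitpim-0.8.13.ebuild,v 1.1 2006/04/26 16:04:41 mrness Exp"

print(expand_id(ORIGINAL, ID_VALUE))
```

Under this model the match runs from the first `$Id:` up to the `$` that opens the second `$Id:`, so part of the sed expression is swallowed as the "old value". The printed line is the same as the line in git quoted above.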
(In reply to comment #4)
> For the other revs, the differences are literally
> just a newline removal.

FYI - I actually did find some issues with newlines coming up as false positives, plus some other script issues, which I'm fixing (this also gets rid of the need to use hadoop, which should help us out come time to do this for real), but I don't think any of those made their way into bugs. I do check issues by manually comparing the output of cvs and git before filing bugs, and I evaluate them critically.

However, it is entirely possible that we're miscommunicating on what the actual bugs are. This bug is about CVS keyword expansion only, and examples should be evaluated in that context. The other bug is about unicode mangling, so look at that bug for examples of this.

I believe both issues are real, though the keyword issue might be unavoidable unless we just hard-code how to handle these individual commits, as the way keyword expansion works makes it hard to "do the right thing" automatically. I could easily see a debate over what the right thing even is in these cases - the output of cvs is arguably wrong.
Created attachment 340478
List of files that contain this issue in run2

Latest run is down to 5 affected files. Details attached.
ferringb: can you upload your latest code+configs for the runner to the working dir?
ferringb is retiring, not sure who will continue taking care of this problem :/
(In reply to Pacho Ramos from comment #9)
> ferringb is retiring, not sure who will continue taking care of this problem
> :/

Honestly, I think this particular problem is finished (enough) to be ignored come migration time. However, making sure we have knowledge transitioned on the tool is critical. Robin and I have been working fairly closely with him - perhaps we should both see if we can get the tool to run ourselves.
(In reply to Robin Johnson from comment #8)
> ferringb: can you upload your latest code+configs for the runner to the
> working dir?

What's up there is already the latest I've got. Due to the storage fire I no longer even have any fucking hardware to even work on this so there ain't much I can do there- previously I ran this on my goog hardware, but new job, no longer got that hardware either. ;)

I'm available via email (@gmail), and via im (still around, just not in channels), so ping w/ questions/requests/etc.

@Pacho: while retiring me, anything git related leave the @gentoo.org at and just migrate my gentoo bugzie account to my original gmail account please.
(In reply to Richard Freeman from comment #7)
> Latest run is down to 5 affected files. Details attached.

None of the affected files exist in portage anymore. Are there other files now?

If yes, maybe they can be modified so as to work around the problem? In the case of the comment #5 example the regex could e.g. be changed a bit.
(In reply to Peter Stuge from comment #12)
> (In reply to Richard Freeman from comment #7)
> > Latest run is down to 5 affected files. Details attached.
>
> None of the affected files exist in portage anymore. Are there other files
> now?
>
> If yes, maybe they can be modified so as to work around the problem? In the
> case of the comment #5 example the regex could e.g. be changed a bit.

For the most part these issues are caused by a failure to correctly commit files, but in defense of those doing the commits, cvs makes it really easy to mess up stuff like this. Generally they get corrected, but the mistakes are permanently present in the cvs history unless we go manipulating the rcs files on the server directly.

Robin - what are your feelings on this? My personal feeling is that little glitches like this in ancient cvs history are not worth trying to script around - I'd be concerned that changing the migration script would cause more problems than it solves at this point.

I think the script itself is basically ready for production use now, though pre-migration we should do a few rehearsals to make sure we have the hang of things. The infra back-end is the bigger issue at this point, and I do intend to work on that documentation (maybe I'll do it on the wiki to make collaboration easier - we can always port it over to the website if we want it there later).
We won't be modifying the RCS files, but the last run is good enough. I'm doing a large project at work presently (big upgrade to go live July 1st), and after that I'm putting time into this (also why I'm not running for Trustees again). The hooks & merge/sign policy need active discussion more than the infra backend.
(In reply to Robin Johnson from comment #14)
> The hooks & merge/sign policy need active discussion more than the infra
> backend.

Ok, I'll take your word on that. We can move comments on these to their respective bugs. I'm going to be brave and mark this one resolved - re-open if you disagree.