https://projects.gentoo.org/vcs-history/excluded_files.txt appears to list all files containing the word "birthday", which causes many false positives.

This includes files like:

gentoo/xml/htdocs/dtd/guide.dtd,v
gentoo/xml/htdocs/xsl/guide.xsl,v

which have a "birthday" attribute as part of the Guide XML DTD, some patches which seem unlikely to contain any personal information:

gentoo/src/patchsets/glibc/2.5/1505_hppa_cvs-head-20061203.patch,v
gentoo/src/patchsets/glibc/2.5.1/1505_hppa_cvs-head-20061203.patch,v

and some Council IRC logs which are publicly available elsewhere:

gentoo/xml/htdocs/proj/en/council/meeting-logs/Attic/20110201.txt,v
gentoo/xml/htdocs/proj/en/council/meeting-logs/Attic/20061109.txt,v
gentoo/xml/htdocs/proj/en/council/meeting-logs/Attic/20090817.txt,v
gentoo/xml/htdocs/proj/en/council/meeting-logs/Attic/20080410.txt,v

There are also files like:

gentoo/src/fortune-gentoo-dev/fortunes/00020-20030809_00020,v

which is publicly available as part of https://dev.gentoo.org/~robbat2/distfiles/fortune-gentoo-dev-20090306.tar.bz2 and is certainly a false positive :)

[discussing the pronounciation of gif]
<seemant> I say it with soft g
<seemant> coz like I envision this exchange:
<seemant> "hey, <so-and-so> I have a gif for you" <-- hard g
<seemant> where <so-and-so> will reply "ooh for me? you shouldn't have! it's not even my birthday"
<seemant> and then you're in an embarrassing situtation

In particular, guide.dtd and guide.xsl are essential if one wants to recreate any of the old documentation (and that's how I noticed the issue in the first place).

So please reconsider whether some of the files in the list could be distributed. Generally, I think that these archives should be as faithful to the CVS repositories as possible and should therefore exclude files only where it is absolutely necessary.
As I discussed on IRC, the current test for "whether it is absolutely necessary" is:

  for file in archive:
      if file.contains('birthday'):
          archive.remove(file)

I'm happy to use a more accurate method, but I felt it better to produce a redacted archive quickly than a perfect archive never. We can always add files back into the archives; it's hard to unpublish files.

-A
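For illustration only, here is a minimal Python sketch of how the keyword filter above could be combined with a manually reviewed list of false positives. It is not the script actually used to build the archives; the function name `find_excluded`, the `reviewed_ok` parameter, the in-memory mapping of paths to text, and all sample contents are assumptions made for this example.

```python
def find_excluded(contents, keyword="birthday", reviewed_ok=frozenset()):
    """Return the sorted paths that should stay out of the public archive.

    contents    -- mapping of file path to file text (hypothetical input shape)
    keyword     -- substring that flags a file for exclusion
    reviewed_ok -- paths a human has reviewed and cleared as false positives
    """
    # Step 1: the quick filter, flag every file that mentions the keyword.
    flagged = {path for path, text in contents.items() if keyword in text}
    # Step 2: drop the files that a reviewer has cleared.
    return sorted(flagged - set(reviewed_ok))


# Sample data: the first path comes from this bug, the other two paths and
# all file contents are invented for the example.
sample = {
    "gentoo/xml/htdocs/dtd/guide.dtd,v": "an attribute named birthday",
    "example/private-notes.txt,v": "a line mentioning a birthday",
    "example/unrelated.txt,v": "nothing of interest",
}

cleared = {"gentoo/xml/htdocs/dtd/guide.dtd,v"}
print(find_excluded(sample, reviewed_ok=cleared))
```

The point of the two-step shape is that the keyword match stays the conservative default, and a file is only published again once someone has explicitly cleared it.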
Created attachment 680227 [details] Review of files in gentoo-projects CVS repository
Created attachment 680230 [details] Review of files in gentoo CVS repository
Created attachment 680233 [details, diff] Patch for excluded_files.txt
I've just updated the repository according to the reviewed list. Thanks for taking the time to review the list more closely.
(In reply to Max Magorsch from comment #5)
> I've just updated the repository according to the reviewed list.

Looks good.
I've noticed that the CVSROOT repository is missing. Presumably it won't add much value, but should it be added for completeness?

It contains the file CVSROOT/history, which (IIUC) is a log of all reads and writes to any repo, so I believe it should be excluded. Apart from that file, I don't see any personal data.
Created attachment 761706 [details, diff]
CVSROOT.tar.gz: Add file

Attached patch adds CVSROOT.tar.gz, with CVSROOT/history excluded.
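As a rough sketch of how such a tarball could be produced with one file left out (the attached archive was not necessarily built this way), Python's `tarfile` module accepts a `filter` callback that drops a member when it returns `None`. The helper names `drop_excluded` and `build_archive` and the `EXCLUDED` set are assumptions for this example.

```python
import tarfile

# Archive member names to leave out; only CVSROOT/history is named in this bug.
EXCLUDED = {"CVSROOT/history"}


def drop_excluded(member):
    """tarfile filter: returning None omits the member from the archive."""
    return None if member.name in EXCLUDED else member


def build_archive(source_dir, archive_path, arcname="CVSROOT"):
    """Pack source_dir into a gzip-compressed tarball, skipping EXCLUDED."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname fixes the top-level name inside the archive so that
        # member names look like "CVSROOT/history" regardless of source_dir.
        tar.add(source_dir, arcname=arcname, filter=drop_excluded)
```

Calling `build_archive("/path/to/CVSROOT", "CVSROOT.tar.gz")` (the source path is a placeholder) would write the archive without the history file.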
antarus@marine-bay:~/gentoo/vcs-history$ patch --binary -p1 < patch
File CVSROOT.tar.gz: git binary diffs are not supported.

I think you need to remake the patch with git diff --binary (to generate the right diff format?)

-A
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/sites/projects/vcs-history.git/commit/?id=fe3a5ddd85eacb7795e340d24468b62f3dedb482

commit fe3a5ddd85eacb7795e340d24468b62f3dedb482
Author:     Ulrich Müller <ulm@gentoo.org>
AuthorDate: 2022-01-09 13:02:25 +0000
Commit:     Alec Warner <antarus@gentoo.org>
CommitDate: 2022-01-25 17:27:00 +0000

    CVSROOT.tar.gz: Add file

    The file CVSROOT/history is a log of all reads and writes to any repo
    and has been excluded from the archive.

    Bug: https://bugs.gentoo.org/762403#c7
    Signed-off-by: Ulrich Müller <ulm@gentoo.org>
    Signed-off-by: Alec Warner <antarus@gentoo.org>

 CVSROOT.tar.gz     | Bin 0 -> 165234 bytes
 checksums.b2       | 1 +
 excluded_files.txt | 9 ++++++++-
 3 files changed, 9 insertions(+), 1 deletion(-)
AFAICS all done. Closing.