Bug 333685 - dev-vcs/git: require jk/pack-bitmap functionality on servers for git migration
Summary: dev-vcs/git: require jk/pack-bitmap functionality on servers for git migration
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Git
Hardware: All Linux
Importance: Highest blocker with 1 vote
Assignee: Gentoo Infrastructure
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 333531
Reported: 2010-08-20 19:40 UTC by Thilo Bangert (RETIRED)
Modified: 2014-08-06 04:52 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Description Thilo Bangert (RETIRED) gentoo-dev 2010-08-20 19:40:59 UTC
for gentoo to use git for the main portage tree, git needs to support a pre-upload-hook.

latest status update:
http://archives.gentoo.org/gentoo-scm/msg_2ca6574cc4509ac831f50a33b737cdbf.xml
Comment 1 Johannes Huber (RETIRED) gentoo-dev 2011-12-12 10:59:56 UTC
What's the state of this task?
Comment 2 Arun Raghavan (RETIRED) gentoo-dev 2011-12-14 03:19:44 UTC
No progress since last status update. Feel free to take this up if you're interested. Feel free to ping me on IRC if you've got questions.
Comment 3 Richard Freeman gentoo-dev 2012-10-01 17:18:56 UTC
(In reply to comment #0)
> for gentoo to use git for the main portage tree, git needs to support a
> pre-upload-hook.

It might not hurt to at least put a brief sentence about why this is needed in this bug.
Comment 4 Brian Harring (RETIRED) gentoo-dev 2012-10-02 04:08:56 UTC
This script needs to do gpg validation of incoming commits, mapping each one back to the claimed dev's given keys; if the signature doesn't match, reject the commit, keeping it out of history.

I've got an email I'm writing with the details; feel free to copy/paste over here (preferably fixing my english).
Comment 5 Arun Raghavan (RETIRED) gentoo-dev 2012-10-02 04:31:28 UTC
(In reply to comment #4)
> this script needs to do gpg validation of incoming commits, mapping it back
> to the claimed dev's given keys- if the signature doesn't match, reject the
> commit keeping it out of history.

That's not what this bug was about. The objective of this hook was to prevent initial git clones (which would cause significant server load given the size of our history). The idea is that you pull a tarball snapshot of the repository over http, unpack, and use that.
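
A minimal sketch of the snapshot workflow described in comment 5, for illustration only. Both URLs are hypothetical placeholders (the bug names no real snapshot location), the function names are invented here, and the tarball is assumed to contain a single top-level directory holding a git repository.

```shell
#!/bin/sh
# Both URLs are hypothetical placeholders; the bug does not name real ones.
SNAPSHOT_URL="https://example.org/snapshots/gentoo-x86.tar"
REMOTE_URL="git://example.org/gentoo-x86.git"

# Unpack a snapshot tarball (assumed to hold one top-level directory
# containing a git repository) into the directory given as $2.
unpack_snapshot() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" &&
    tar -xf "$tarball" -C "$dest" --strip-components=1
}

# Download the snapshot over plain HTTP, unpack it, then let git fetch
# only the objects created since the snapshot was taken.
bootstrap_clone() {
    dest=$1
    tmp=$(mktemp) &&
    curl -fsSL -o "$tmp" "$SNAPSHOT_URL" &&
    unpack_snapshot "$tmp" "$dest" &&
    rm -f "$tmp" &&
    git -C "$dest" fetch "$REMOTE_URL" '+refs/heads/*:refs/remotes/origin/*'
}
```

The point of the design is that the bulk of the history travels as a static file, so the git server only has to compute the small pack covering commits newer than the snapshot.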
Comment 6 Alex Xu (Hello71) 2013-06-02 17:09:03 UTC
Has anyone actually tested git clone to see how much resource(s?) it uses on the server?

Seems to me like it just pulls all the packs and metadata, does some integrity checks (?) and that's it. I could be wrong though.

Anyways, it seems implausible that our Git log could be larger than the largest projects using Git (i.e. Linux) by so much that it is necessary to institute such an odd process for obtaining it. (referring to tarballs)

http://gitolite.com/other-stuff/server-sizing.html

    ^^Further details on this, courtesy charon on #git:

    Even when computing deltas to send, git reuses packs. This means that objects that are already packed are not delta compressed again -- send-pack just uses the existing delta-compressed form. Anyway it doesn't really send the entire pack, only the parts that are relevant, so this works fine.

    As a result, cpu usage on the server is now mostly in the "counting objects" phase (i.e., figuring out what to send), which is not much of a load.^^

Of course, I could very well be missing the benchmarks that someone has put somewhere in these long chains of mails.
Comment 7 Brian Harring (RETIRED) gentoo-dev 2013-06-04 04:44:33 UTC
(In reply to Arun Raghavan from comment #5)
> (In reply to comment #4)
> > this script needs to do gpg validation of incoming commits, mapping it back
> > to the claimed dev's given keys- if the signature doesn't match, reject the
> > commit keeping it out of history.
> 
> That's not what this bug was about. The objective of this hook was to
> prevent initial git clones (which would cause significant server load given
> the size of our history). The idea is that you pull a tarball snapshot of
> the repository over http, unpack, and use that.

Yes, this is what this bug is about.  Gpg validation of incoming commits is to verify that the ssh key + gpg validation map to the same dev, so that I can't push a signed commit from you (making it appear as if you broke the tree, rather than me).

The shit about load is ancillary to what this bug is actually about; the point of this ticket is exactly what I described there; talk to dolsen if in doubt, he was last working on these bits in conjunction to me tracking it.
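
A rough sketch of the check described in comment 7 (the pushing account and the commit's signing key must belong to the same dev), written as helpers for a server-side update hook. The map file and its "<user> <key-id>" line format are assumptions made up for this example, as are the function names. It only accepts commits whose signature status is "G", and it does not handle the all-zero old revision that git passes for newly created refs.

```shell
#!/bin/sh
# Hypothetical map file format: one "<user> <key-id>" pair per line,
# with the key id written the same way git prints it for %GK.
key_allowed_for_user() {
    user=$1
    key=$2
    map=$3
    grep -qxF "$user $key" "$map"
}

# Check every commit in old..new. Arguments: pushing user, old revision,
# new revision, path to the map file. Returns non-zero on the first
# commit that is unsigned, badly signed, or signed with a key that is
# not registered to the pushing user.
check_push() {
    user=$1
    old=$2
    new=$3
    map=$4
    # %G? is the signature status (G = good), %GK the signing key.
    git log --format='%H %G? %GK' "$old..$new" |
    while read -r sha status key; do
        if [ "$status" != G ] || ! key_allowed_for_user "$user" "$key" "$map"; then
            echo "rejected $sha: not signed by a key registered to $user" >&2
            exit 1
        fi
    done
}
```

An update hook would call check_push with the authenticated account name; under gitolite that name should be available in the GL_USER environment variable, though how infra would wire this up is not stated in the bug.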
Comment 8 Arun Raghavan (RETIRED) gentoo-dev 2013-06-04 04:53:40 UTC
(In reply to Brian Harring from comment #7)
[...]
> Yes, this is what this bug is about.  Gpg validation of incoming commits is
> to verify that the ssh key + gpg validation map to the same dev, so that I
> can't push a signed commit from you (making it appear as if you broke the
> tree, rather than me).

The bug title says "pre-upload-hook", which deals with when a client does a git fetch. What you're talking about is completely unrelated.

The link in the first comment is still the last I looked at it: http://archives.gentoo.org/gentoo-scm/msg_2ca6574cc4509ac831f50a33b737cdbf.xml

Considering I haven't looked at this task in a very long time, I'm dropping myself as assignee.
Comment 9 Arun Raghavan (RETIRED) gentoo-dev 2013-06-04 05:05:05 UTC
Undoing assignment to infra (just saw that none of the other git migration bugs are assigned that way).

Brian, if you're still unconvinced, here's some more context for this bug: http://lists-archives.com/git/709462-removal-of-post-upload-hook.html
Comment 10 Jeroen Roovers (RETIRED) gentoo-dev 2013-06-04 23:28:17 UTC
Well, bug-wranglers cannot fix this either.
Comment 11 Peter Stuge 2013-06-20 12:27:38 UTC
(In reply to Brian Harring from comment #7)
> Yes, this is what this bug is about.  Gpg validation of incoming commits is
> to verify that the ssh key + gpg validation map to the same dev, so that I
> can't push a signed commit from you (making it appear as if you broke the
> tree, rather than me).

I really like to have real users and use a POSIX ACL instead of gitolite. I think that would make such checks a lot easier too. But I guess infra already considered this option and rejected it for a reason I don't know. (Please tell me if you know.)

Thanks!
Comment 12 Alex Xu (Hello71) 2013-06-20 16:13:18 UTC
(In reply to Peter Stuge from comment #11)
> (In reply to Brian Harring from comment #7)
> > Yes, this is what this bug is about.  Gpg validation of incoming commits is
> > to verify that the ssh key + gpg validation map to the same dev, so that I
> > can't push a signed commit from you (making it appear as if you broke the
> > tree, rather than me).
> 
> I really like to have real users and use a POSIX ACL instead of gitolite. I
> think that would make such checks a lot easier too. But I guess infra
> already considered this option and rejected it for a reason I don't know.
> (Please tell me if you know.)
> 
> Thanks!

No, it wouldn't. To do that, you'd need to have a daemon (or something daemon-like) running on the server doing git-specific access checks, which sshd and/or PAM just don't do. Such software would be, for example, gitolite.
Comment 13 Matthew Thode ( prometheanfire ) archtester Gentoo Infrastructure gentoo-dev Security 2013-12-11 05:01:02 UTC
Are there specific requirements for this bug? If so, they should be stated in the bug and not in a mailing list. What hooks need to be written?
Comment 14 Matthew Thode ( prometheanfire ) archtester Gentoo Infrastructure gentoo-dev Security 2014-01-29 05:43:49 UTC
so, need a spec to do the work......
Comment 15 Matthew Thode ( prometheanfire ) archtester Gentoo Infrastructure gentoo-dev Security 2014-01-29 05:50:04 UTC
given that we do not currently do this (gpg validation), should we mark this as a blocker or a wish?
Comment 16 Alex Xu (Hello71) 2014-02-21 22:34:10 UTC
I believe that this feature (forcing first-clones from HTTP/rsync) is not necessary.

If someone can produce benchmarks showing otherwise (i.e. fetching a pack over git protocol or smart HTTP is less efficient than dumb HTTP or manual HTTP), feel free to reopen.
Comment 17 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2014-02-21 23:45:46 UTC
Alex:
I'll reconfirm, but last I checked in 2012, this WAS still needed. The problem was with large packs, the initial clone ate up a lot of memory on the server side. It wasn't repacking, but building the new pack in memory to send (using the existing deltas).

prometheanfire/ferringb:
I'm splitting out the gpg verification to another bug
Comment 18 Alex Xu (Hello71) 2014-02-22 00:22:01 UTC
(In reply to Robin Johnson from comment #17)
> Alex:
> I'll reconfirm, but last I checked in 2012, this WAS still needed. The
> problem was with large packs, the initial clone ate up a lot of memory on
> the server side. It wasn't repacking, but building the new pack in memory to
> send (using the existing deltas).
> 
> promethanfire/ferringb:
> i'm splitting out the gpg verification to another bug

git automatically builds and reuses packs if the repository is large enough:

$ git clone https://github.com/honestbleeps/Reddit-Enhancement-Suite.git
Cloning into 'Reddit-Enhancement-Suite'...
remote: Reusing existing pack: 8605, done.
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 8609 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (8609/8609), 7.15 MiB | 3.69 MiB/s, done.
Resolving deltas: 100% (5056/5056), done.
Checking connectivity... done.

Note that I said *benchmarks*. :)
Comment 19 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2014-02-22 00:31:53 UTC
(In reply to Alex Xu (Hello71) from comment #18)
> git automatically builds and reuses packs if the repository is large enough:
> 
> $ git clone https://github.com/honestbleeps/Reddit-Enhancement-Suite.git
> Cloning into 'Reddit-Enhancement-Suite'...
> remote: Reusing existing pack: 8605, done.
> remote: Counting objects: 4, done.
> remote: Compressing objects: 100% (4/4), done.
> remote: Total 8609 (delta 0), reused 0 (delta 0)
> Receiving objects: 100% (8609/8609), 7.15 MiB | 3.69 MiB/s, done.
> Resolving deltas: 100% (5056/5056), done.
> Checking connectivity... done.
> 
> Note that I said *benchmarks*. :)

I don't get that message of "Reusing existing pack:" at all.

Client side:
root@vm1:/tmp#  git  clone --bare git://git.overlays.gentoo.org:19418/exp/gentoo-x86.git /tmp/gentoo-x86
Cloning into bare repository '/tmp/gentoo-x86'...
remote: Counting objects: 6117919, done.
remote: Compressing objects: 100% (1716865/1716865), done.
Receiving objects:   0% (49032/6117919), 9.86 MiB | 85 KiB/s     

Server side, that single git process is presently using 3.4G VSZ, 2.74G RSZ, 1.2G SHR, and it took 2.5 minutes of cputime to prepare the pack.
Comment 20 Alex Xu (Hello71) 2014-02-22 01:19:03 UTC
I lied, apparently. Git*hub* reuses existing packs, presumably for performance reasons.

some lines omitted due to size

[19:37:51] <Hello71> 19:22:54 < Hello71> under what conditions will git be "reusing existing pack" on clone?
[19:37:56] <Hello71> it's relevant to https://bugs.gentoo.org/show_bug.cgi?id=333685
[19:38:10] <Hello71> i.e. git uses too much resources on initial clone
[19:39:19] <Hello71> that is on the remote end
[19:39:30] <jrnieder> Hello71: that's a patch github uses
[19:39:46] <Hello71> ahh.
[19:40:08] <Hello71> I found it odd as to why the string wasn't in my source.
[19:40:29] <jrnieder> Hello71: the technique came up when we were talking about bitmaps on-list
[19:40:43] <jrnieder> Hello71: jgit has support for bitmaps and the streaming-pack thing
[19:41:12] <jrnieder> Hello71: looks like they got the streaming-pack thing working on standard git (yay!)
[19:43:31] <Hello71> so how could I apply this to my (our) situation?
[19:43:43] <jrnieder> wait for the patches to be upstreamed?
[19:44:08] <Hello71> I thought that they were.
[19:44:17] <jrnieder> sorry, s/on/on top of/ above
[19:44:54] <Hello71> Not wishing to read all the git mail for the past five months or so...
[19:45:10] <Hello71> Would that be in 1.9.0_rc3 or -9999?
[19:45:29] <jrnieder> nope, at least I never saw patches for this on-list
[19:45:39] <jrnieder> I assume they're just testing it first
[19:46:07] <jrnieder> the jgit patch is https://git.eclipse.org/r/2388
[19:49:00] <jrnieder> Hello71: git://github.com/peff/git.git branch jk/bitmaps has that string
[19:51:03] <jrnieder> Hello71: ah, that version of the patch is in 'pu', too
[19:51:28] <jrnieder> Hello71: 'next', even
[20:01:58] <Hello71> jrnieder: I don't know any of those repos; are they likely to be merged upstream/have they been?
[20:03:16] <jrnieder> Hello71: yeah, I confused myself before
[20:03:34] <jrnieder> Hello71: the patch you're interested in is in Junio's "next" branch
[20:04:04] <jrnieder> Hello71: probably it will be part of the next feature release, meaning 1.10.0 or 2.0.0 or whatever it's called
[20:05:05] <jrnieder> Hello71: depending on your point of view, it already has been merged upstream.  That's what I confused myself about before (I remembered that the bitmap series had been applied but had forgotten that it included the cached pack trick.)
[20:06:21] <Hello71> so bitmap is in 1.9.0_rc3 or 9999 (i.e. master)?
[20:06:29] <Hello71> since it's definitely not in 1.8.5.4
[20:06:30] <jrnieder> there's no such thing as 9999
[20:06:34] <jrnieder> it's not in master
[20:06:46] <Hello71> I refer to upstream's master.
[20:06:54] <jrnieder> yes, it's not in upstream's master
[20:07:27] <jrnieder> do you have a link to the ebuild?
[20:10:25] <Hello71> http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/dev-vcs/git/git-9999.ebuild
[20:10:48] <Hello71> that's the -9999 of git-9999, so to speak
[20:11:03] <jrnieder> thanks. looking at git.eclass now
[20:11:19] <Hello71> EGIT_REPO_URI="git://git.kernel.org/pub/scm/git/git.git"
[20:11:41] <jrnieder> Hello71: probably adding EGIT_MASTER=next would get it working
[20:12:26] <Hello71> approximately how long until release (assuming regular release cycle)?
[20:12:34] <Hello71> like, weeks, months, years...
[20:12:36] <jrnieder> Hello71: there's a calendar --- let me see
[20:12:51] <jrnieder> http://tinyurl.com/gitcal
[20:13:30] <jrnieder> the calendar seems to only cover the past so far :)
[20:13:44] <jrnieder> looks like 1-2 months until the next release, if the past is any guide
[20:13:54] <Hello71> that seems suspiciously like a *past* calendar.
[20:14:02] <Hello71> OK, I just wanted a timeline.
[20:14:15] <Hello71> mind if I post this back-and-forth to b.g.o?
[20:14:27] <jrnieder> sure, go ahead
[20:15:10] <jrnieder> Debian experimental is tracking 'next' fwiw
[20:15:14] <jrnieder> so there's precedent there
Comment 21 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2014-02-22 01:51:49 UTC
(In reply to Alex Xu (Hello71) from comment #20)
> I lied, apparently. Git*hub* reuses existing packs, presumably for
> performance reasons.
> 
> some lines omitted due to size
http://thread.gmane.org/gmane.comp.version-control.git/242412

* jk/pack-bitmap (2014-02-12) 26 commits
Borrows the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
Will cook in 'next'.

I'll see about including an ebuild for -next, so we can test performance of the patch, and maybe this bug can become really unneeded then.
Comment 22 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2014-02-22 22:20:57 UTC
I tested the jk/pack-bitmap functionality now, and it will obsolete the need for the pre-upload hook entirely, when it's finally available in git.

The only slight downside is that the bitmap index adds another 200MB on top of the 1.3GB packsize, but I think that is acceptable given the incredible performance boost.

If anybody else wants to use it, run git-9999-r2 on your git server side (specifically you're after the feature in 'jk/pack-bitmap', part of the 'next' branch), and add a bitmap to your packs.
root@bohr-int:/var/git/gentoo-x86.git # time git repack -A --window 250  -b 
Counting objects: 6492587, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (1819935/1819935), done.
Writing objects: 100% (6492587/6492587), done.
Selecting bitmap commits: 708135, done.
Building bitmaps: 100% (446/446), done.
Total 6492587 (delta 4653421), reused 6492587 (delta 4653421)

real	2m49.142s
user	2m46.988s
sys	0m6.106s

My clone starts almost immediately, compared to before where we had 2.5 minutes while the server churned.

Server: almost no cputime usage on the server, memory usage 1.3G VSZ, 150MB RSS, 37MB SHR; wallclock time of ~50 seconds.

Client side:
$ /usr/bin/time git clone --bare git://172.16.9.8:9418/gentoo-x86.git /tmp/gentoo-x86
Cloning into bare repository '/tmp/gentoo-x86'...
remote: Reusing existing pack: 6492587, done.
remote: Total 6492587 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (6492587/6492587), 1.28 GiB | 31.10 MiB/s, done.
Resolving deltas: 100% (4653421/4653421), done.
Checking connectivity... done.
304.35user 89.33system 3:15.00elapsed 201%CPU (0avgtext+0avgdata 3790208maxresident)k
0inputs+3347920outputs (0major+601467minor)pagefaults 0swaps
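
For anyone repeating the test in comment 22, a small sketch of the server-side step: repack a bare repository into a single pack with a bitmap index, then confirm that a .bitmap file was actually written. The function names are invented for this example, and the -b flag needs a git build that includes the pack-bitmap feature ('next' at the time of this comment).

```shell
#!/bin/sh
# Succeeds if the bare repository at $1 has at least one pack bitmap index.
has_pack_bitmap() {
    for f in "$1"/objects/pack/*.bitmap; do
        # If nothing matches, the glob stays literal and -e is false.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Repack the bare repository at $1 into a single pack and write a bitmap
# index next to it, then verify the index exists.
write_pack_bitmap() {
    repo=$1
    git --git-dir="$repo" repack -a -d -b &&
    has_pack_bitmap "$repo"
}
```

Running write_pack_bitmap /var/git/gentoo-x86.git would be the equivalent of the manual repack above, minus the --window tuning; a clone served from that repository should then report "Reusing existing pack".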
Comment 23 Brian Harring (RETIRED) gentoo-dev 2014-02-23 07:43:46 UTC
(In reply to Robin Johnson from comment #22)
> I tested the jk/pack-bitmap functionality now, and it will obsolete the need
> for the pre-upload hook entirely, when it's finally available in git.

Just adding an ack here; requiring pack-bitmap makes perfect sense to me- this feature of jgit is one of the core reasons gerrit is a freaking beast in handling load.

One question here; you're stating "on servers"- not "on the git server".  Afaik, the plan was devs having git access, and a git tier for non-devs would be resolved as a later step.  Has that plan changed?  The plural 'servers' is what caught my eye here- I could just be misreading however.

Meanwhile, I'd suggest landing the bitmap bits on anongit ;)
Comment 24 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2014-02-23 08:01:33 UTC
(In reply to Brian Harring from comment #23)
> One question here; you're stating "on servers"- not "on the git server". 
> Afaik, the plan was dev's having a git access, and a git tier for non devs
> would be resolved as a later step.  Has that plan changed?  The plural
> 'servers' is what caught my eye here- I could just be misreading however.
anongit.g.o
git.overlays.g.o
git.g.o
Plus there was talk about both US&EU-local mirrors for dev pulls at one point.

> Meanwhile, I'd suggest landing the bitmap bits on anongit ;)
I was hoping to wait for it to land in a released version.
Comment 25 Alex Xu (Hello71) 2014-06-04 16:11:49 UTC
git 2.0 has pack-bitmap apparently :)
Comment 26 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2014-08-06 04:52:37 UTC
Released upstream now; to be deployed soon