Bug 503708 - git-r3: 'single branch' fetching from Google Code seems terribly inefficient
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Eclasses
Hardware: All Linux
Importance: Normal normal
Assignee: Michał Górny
 
Reported: 2014-03-07 10:39 UTC by Michał Górny
Modified: 2014-03-24 21:37 UTC

Description Michał Górny 2014-03-07 10:39:04 UTC
(CC-ing git maintainers in case you could help)

It seems that a 'git fetch' of a single branch from Google Code (dumb HTTP transport?) downloads the same objects twice: once for the branch, and then again with the tags.

Easy way to reproduce:

  cd $(mktemp -d)
  git init --bare
  git fetch https://code.google.com/p/pkgcore/ \
    refs/heads/master:refs/heads/master

This should fetch all the commits in the branch, and then the few extra objects for the tags that point into the branch. In fact, it seems to re-fetch all the commits along with the tags.

I'm hitting this with git-1.9.0. Running 'git gc --prune' afterwards halves the size of the 'objects' directory, so I guess the packs contain a lot of duplicated commits. Any suggestions here?
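
One way to put a number on the duplication is to compare the pack size that 'git count-objects -v' reports before and after the gc. The following is only a rough sketch: the helper names pack_size_kib and reclaimed_percent are made up for illustration, and it assumes the 'size-pack' field (in KiB) that git prints in verbose mode.

```shell
#!/bin/sh
# Hypothetical helper: read `git count-objects -v` output on stdin and
# print the value of its "size-pack" field (total pack size in KiB).
pack_size_kib() {
    awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Hypothetical helper: integer percentage of pack space reclaimed,
# given the pack size before and after repacking.
reclaimed_percent() {
    before=$1
    after=$2
    if [ "$before" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( (before - after) * 100 / before ))
}

# Usage, inside the bare repository from the steps above:
#   before=$(git count-objects -v | pack_size_kib)
#   git gc --prune=now
#   after=$(git count-objects -v | pack_size_kib)
#   echo "reclaimed: $(reclaimed_percent "$before" "$after")%"
```

A result close to 50% would match the halving described above, since gc repacks everything into a single pack and stores each object only once.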

--

remote: Counting objects: 34648, done.
Receiving objects: 100% (34648/34648), 6.01 MiB | 559.00 KiB/s, done.
Resolving deltas: 100% (27148/27148), done.
From https://code.google.com/p/pkgcore
 * [new branch]      master     -> master
remote: Counting objects: 33061, done.
Receiving objects: 100% (33061/33061), 5.51 MiB | 637.00 KiB/s, done.
Resolving deltas: 100% (25924/25924), done.
 * [new tag]         v0.3       -> v0.3
 * [new tag]         v0.3.1     -> v0.3.1
 * [new tag]         v0.3.2     -> v0.3.2
[...]
Comment 1 Michał Górny 2014-03-07 22:29:16 UTC
After some Wireshark fun, it seems that:

1) git requests the branch along with relevant tags (via option),

2) server returns the branch without tags,

3) git matches the commit tree against the remote tags and requests the appropriate tags (stating that it already has all the commits),

4) server returns the tags along with all commits.

So it pretty much looks like a really poor server implementation that values easily cacheable packs over sparing users from downloading the same thing twice.

Unless Google's going to fix their server, a possible workaround is to request all tags -- if we fetch all of them in a single request, we avoid the second request and redundant download.
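
At the plain git level, the workaround could look roughly like the sketch below: the branch refspec and a wildcard tag refspec go into one 'git fetch' call, so there should be nothing left for a follow-up tag request. The function name fetch_branch_with_tags is hypothetical, and this is not necessarily how the eclass would do it.

```shell
#!/bin/sh
# Hypothetical helper: fetch one branch plus *all* tags in a single
# 'git fetch' invocation, instead of letting git follow tags in a
# second request.
fetch_branch_with_tags() {
    url=$1
    branch=$2
    # The leading '+' allows tags that were moved upstream to be updated.
    git fetch "$url" \
        "refs/heads/${branch}:refs/heads/${branch}" \
        '+refs/tags/*:refs/tags/*'
}

# Usage (needs network access), mirroring the reproduction steps:
#   cd "$(mktemp -d)"
#   git init --bare
#   fetch_branch_with_tags https://code.google.com/p/pkgcore/ master
```

The trade-off is that tags which do not point into the requested branch get downloaded as well, together with whatever commits they reference.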

So, I see two possible solutions:

1) use EGIT_MIN_CLONE_TYPE=mirror -- which shouldn't really hurt the few affected projects,

2) introduce a new EGIT_CLONE_TYPE that would download the branch and *all* tags -- possibly a better/more general solution.

Any thoughts?
Comment 2 Michał Górny 2014-03-24 21:37:22 UTC
+  24 Mar 2014; Michał Górny <mgorny@gentoo.org> git-r3.eclass:
+  Add a single+tags mode to handle Google Code more efficiently, bug #503708.

+  24 Mar 2014; Michał Górny <mgorny@gentoo.org> snakeoil-9999.ebuild:
+  Switch to single+tags in order to handle clones more efficiently, bug #503708.

+  24 Mar 2014; Michał Górny <mgorny@gentoo.org> pkgcore-9999.ebuild:
+  Switch to single+tags in order to handle clones more efficiently, bug #503708.
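
For reference, the ebuild side of such a switch might look like the fragment below. This is an illustrative guess, not a copy of the committed ebuilds: it assumes the new mode is selected through EGIT_MIN_CLONE_TYPE (the variable mentioned in comment 1), and the repository URI is the one from the reproduction steps.

```shell
# Illustrative excerpt of a live ebuild using git-r3 (not the actual commit).
# Assumption: the mode is requested via EGIT_MIN_CLONE_TYPE, so that a
# user-configured lighter clone type is raised to at least single+tags.
EGIT_REPO_URI="https://code.google.com/p/pkgcore/"
EGIT_MIN_CLONE_TYPE="single+tags"
```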