(CC-ing git maintainers in case you could help)

It seems that 'git fetch' on a single branch from Google Code (dumb http
transport?) causes the same objects to be downloaded twice -- once for
the branch, and then again with tags.

Easy way to reproduce:

  cd $(mktemp -d)
  git init --bare
  git fetch https://code.google.com/p/pkgcore/ \
    refs/heads/master:refs/heads/master

Supposedly this would fetch all the commits in the branch, and then
the few extra objects corresponding to tags in the branch. But in fact,
it seems to re-fetch all the commits along with them...

I'm hitting this with git-1.9.0. 'git gc --prune' reduces 'objects' dir
size to half, so I guess the packs contain a lot of duplicated commits.

Any suggestions here?

--

remote: Counting objects: 34648, done.
Receiving objects: 100% (34648/34648), 6.01 MiB | 559.00 KiB/s, done.
Resolving deltas: 100% (27148/27148), done.
From https://code.google.com/p/pkgcore
 * [new branch]      master     -> master
remote: Counting objects: 33061, done.
Receiving objects: 100% (33061/33061), 5.51 MiB | 637.00 KiB/s, done.
Resolving deltas: 100% (25924/25924), done.
 * [new tag]         v0.3       -> v0.3
 * [new tag]         v0.3.1     -> v0.3.1
 * [new tag]         v0.3.2     -> v0.3.2
[...]
After some Wireshark fun, it seems that:

1) git requests the branch along with relevant tags (via option),

2) server returns the branch without tags,

3) git matches the commit tree against the remote tags and requests the
appropriate tags (stating that it has all the commits already),

4) server returns the tags along with all commits.

So it pretty much looks like a really poor server implementation, where
easily cacheable packs are more important than users downloading the
same thing twice.

Unless Google's going to fix their server, a possible workaround is to
request all tags -- if we fetch all of them in a single request, we
avoid the second request and the redundant download.

So, I see two possible solutions:

1) use EGIT_MIN_CLONE_TYPE=mirror -- which shouldn't really hurt the few
affected projects,

2) introduce a new EGIT_CLONE_TYPE that would download the branch and
*all* tags -- possibly a better/more general solution.

Any thoughts?
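For illustration, the "branch and all tags in a single request" workaround could look roughly like the sketch below. It only uses ordinary 'git fetch' refspecs; the helper name fetch_branch_with_tags and the GIT override are hypothetical and are not part of git or of git-r3.eclass. The assumption is that once every tag arrives in the first request, git has nothing left to ask for in a follow-up tag fetch.

```shell
#!/bin/sh
# Sketch only: fetch one branch plus *all* tags in a single 'git fetch'.
# fetch_branch_with_tags is a hypothetical helper, not a git command.
fetch_branch_with_tags() {
    repo_url=$1
    branch=$2
    # ${GIT:-git} allows a dry run, e.g. GIT=echo prints the arguments
    # instead of contacting the server.
    ${GIT:-git} fetch "$repo_url" \
        "refs/heads/$branch:refs/heads/$branch" \
        "refs/tags/*:refs/tags/*"
}
```

To try it against the repository from the original report, run 'git init --bare' in a scratch directory and then 'fetch_branch_with_tags https://code.google.com/p/pkgcore/ master'. Note that this also pulls tags that are not reachable from the branch, so it can download more than a well-behaved server would send for the branch alone.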
+  24 Mar 2014; Michał Górny <mgorny@gentoo.org> git-r3.eclass:
+  Add a single+tags mode to handle Google Code more efficiently, bug #503708.

+  24 Mar 2014; Michał Górny <mgorny@gentoo.org> snakeoil-9999.ebuild:
+  Switch to single+tags in order to handle clones more efficiently, bug #503708.

+  24 Mar 2014; Michał Górny <mgorny@gentoo.org> pkgcore-9999.ebuild:
+  Switch to single+tags in order to handle clones more efficiently, bug #503708.
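As a rough idea of what the ebuild side of that switch might look like: the fragment below is an assumption pieced together from the variable name mentioned earlier in the thread and the mode name in the ChangeLog entries, not a copy of the actual commit. A real live ebuild would also need its usual header and 'inherit git-r3'.

```shell
# Hypothetical excerpt from a live (-9999) ebuild using git-r3.eclass.
# Assumed, not taken from the actual commit for bug #503708.
EGIT_REPO_URI="https://code.google.com/p/pkgcore/"

# Ask for the branch and all tags in one fetch instead of the branch alone.
EGIT_MIN_CLONE_TYPE=single+tags
```

Setting a minimum clone type in the ebuild, rather than forcing the clone type globally, should leave other projects' clones unaffected.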