Similar to bug 500358 for layman. Initially, sync-type git creates a shallow clone with clone --depth=1, but the following syncs do not use --depth=1. Using --depth=1 for both would be appropriate for the tree. Reproducible: Always
This is fixed since 2.2.16: https://gitweb.gentoo.org/proj/portage.git/commit/?id=037527cdea703840f4108b7a909ba2775a19a756
I'm on sys-apps/portage-2.2.20. After the first clone I have 1 history item. Later, after emerge --sync, I have several other history items. After running git pull --depth=1 again in $PORTDIR, I again have a single history item. So I wonder whether emerge --sync uses --depth=1. The output says:
# emerge --sync
>>> Syncing repository 'gentoo' into '/usr/portage'...
/usr/bin/git pull
Already up-to-date.
It seems that it only passes the git --depth option for clone operations. It needs to be fixed to do it for pull operations too.
Please note the problems with "git pull --depth 1" mentioned at https://bugs.gentoo.org/show_bug.cgi?id=568890#c1 I am happy if we close either bug as a duplicate.
*** Bug 568890 has been marked as a duplicate of this bug. ***
As bug 568890 has been marked as a duplicate, moving discussion here... https://bugs.gentoo.org/show_bug.cgi?id=568890#c1 Zac, what you noticed on that comment, I almost every time encounter on layman: https://bugs.gentoo.org/show_bug.cgi?id=500358#c3 As of today both portage and layman are syncing by git, so what is the way to do the sync/pull, to get both finally fixed? Syncing portage I have not seen this issue yet.
*** This bug has been marked as a duplicate of bug 500358 ***
Reverting back on that, my apologies.
(In reply to Massimo Burcheri from comment #6) > As of today both portage and layman are syncing by git, so what is the way > to do the sync/pull, to get both finally fixed? I think it's reasonable to have them do `git fetch --depth=1` followed by `git reset --hard origin/master`, as discussed in bug 568890. > Syncing portage I have not seen this issue yet. That's because portage doesn't try to do a pull with --depth=1, because the default merge strategy will inevitably fail. We can make it do the `git fetch --depth=1 && git reset --hard origin/master` thing as soon as we merge the patch from bug 568934.
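The fetch-then-reset sequence proposed above can be sketched as follows. This is only an illustration in Python (the language Portage is written in), not the code from any patch in this bug. The helper name `shallow_sync_commands` and its parameters are hypothetical, and mapping a depth of 0 to a plain `git pull` is an assumption made for the sketch. The function only builds argument lists; it does not run git.

```python
def shallow_sync_commands(depth=1, remote="origin", branch="master"):
    """Return the git command lines for one sync, as argv lists.

    Hypothetical helper: depth 0 is assumed to mean "full history,
    plain pull"; any positive depth means "shallow fetch, then reset".
    """
    if depth < 0:
        raise ValueError("depth must be >= 0")
    if depth == 0:
        # Fetch and merge, keeping full history.
        return [["git", "pull"]]
    return [
        # Fetch only the newest `depth` commits from the remote.
        ["git", "fetch", "--depth=%d" % depth, remote],
        # Move the local branch and working tree to the fetched tip.
        # This discards local commits and uncommitted changes.
        ["git", "reset", "--hard", "%s/%s" % (remote, branch)],
    ]


if __name__ == "__main__":
    for argv in shallow_sync_commands(depth=1):
        print(" ".join(argv))
```

Returning argv lists rather than one shell string keeps each step separate, so a caller could stop after a failed fetch instead of resetting to a stale ref.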
The patch for bug 568934 is in the master branch now.
Patch posted for review: https://archives.gentoo.org/gentoo-portage-dev/message/a0ba81a01e82348c210784686bcca855 https://github.com/gentoo/portage/pull/34
Are you sure that this works as expected? Back when I fiddled with those things, 'git fetch --depth=1' meant *refetching all the objects* as if you were doing 'git clone --depth=1'. Including those objects you already have. In other words, much, much more than regular 'git fetch' that only fetched new objects.
Just now, I've tested fetch with and without --depth=1, on a repository with full history. I used a btrfs snapshot to ensure that the initial state was identical for both fetches:

Test #1:
$ git fetch origin --depth=1
remote: Counting objects: 492, done.
remote: Compressing objects: 100% (110/110), done.
remote: Total 492 (delta 436), reused 423 (delta 381)
Receiving objects: 100% (492/492), 158.36 KiB | 0 bytes/s, done.
Resolving deltas: 100% (436/436), completed with 426 local objects.
From git+ssh://git.gentoo.org/repo/gentoo
 + 415df80...0004f2c master -> origin/master (forced update)

Test #2 (initial state identical to test #1):
$ git fetch origin
remote: Counting objects: 595, done.
remote: Compressing objects: 100% (206/206), done.
remote: Total 595 (delta 505), reused 434 (delta 388)
Receiving objects: 100% (595/595), 193.06 KiB | 0 bytes/s, done.
Resolving deltas: 100% (505/505), completed with 427 local objects.
From git+ssh://git.gentoo.org/repo/gentoo
   415df80..0004f2c master -> origin/master

So, apparently --depth=1 reduced the total number of objects, but objects were still reused.

$ git --version
git version 2.7.4
Could you also test https:// and git:// while you're at it? It's quite possible they fixed it in a newer release. It's been a while since I've been working on git-2.
Same results with https:// and git:// protocols from anongit.gentoo.org:

$ git fetch origin --depth=1
remote: Counting objects: 497, done.
remote: Compressing objects: 100% (115/115), done.
remote: Total 497 (delta 438), reused 423 (delta 381)
Receiving objects: 100% (497/497), 162.98 KiB | 0 bytes/s, done.
Resolving deltas: 100% (438/438), completed with 427 local objects.
From https://anongit.gentoo.org/git/repo/gentoo
 + 415df80...40e4e30 master -> origin/master (forced update)

$ git fetch origin
remote: Counting objects: 611, done.
remote: Compressing objects: 100% (222/222), done.
remote: Total 611 (delta 515), reused 434 (delta 388)
Receiving objects: 100% (611/611), 200.12 KiB | 0 bytes/s, done.
Resolving deltas: 100% (515/515), completed with 428 local objects.
From https://anongit.gentoo.org/git/repo/gentoo
   415df80..40e4e30 master -> origin/master

$ git fetch origin --depth=1
remote: Counting objects: 497, done.
remote: Compressing objects: 100% (115/115), done.
remote: Total 497 (delta 438), reused 423 (delta 381)
Receiving objects: 100% (497/497), 162.98 KiB | 165.00 KiB/s, done.
Resolving deltas: 100% (438/438), completed with 427 local objects.
From git://anongit.gentoo.org/repo/gentoo
 + 415df80...40e4e30 master -> origin/master (forced update)

$ git fetch origin
remote: Counting objects: 611, done.
remote: Compressing objects: 100% (222/222), done.
remote: Total 611 (delta 515), reused 434 (delta 388)
Receiving objects: 100% (611/611), 200.12 KiB | 190.00 KiB/s, done.
Resolving deltas: 100% (515/515), completed with 428 local objects.
From git://anongit.gentoo.org/repo/gentoo
   415df80..40e4e30 master -> origin/master

$ git --version
git version 2.7.4
This is in the master branch: https://gitweb.gentoo.org/proj/portage.git/commit/?id=84413bb1dd9df322568ce25efc5b7854a43d03c7
Please revert this. You've caused a major performance regression and made a mess of the underlying file system. Normal 'git pull' used to check out only differing files. Now you're forcing re-checkout of every single file on every sync.
Not to mention it bumps mtimes of all files and causes all updates to be reapplied after every sync.
(In reply to Michał Górny from comment #17) > Please revert this. Maybe we should, but let's document the reasons well. > You've caused a major performance regression and making > a mess of the underlying file system. What version of git are you using? Not sure what you mean by "mess" here. What type of file system are you using? Is it re-writing all of the files? Do the inode numbers change in the process? How about timestamps? > Normal 'git pull' used to check out > only differening files. Now you're forcing re-checkout of every single file > on every sync. I have observed cases where `git reset --hard` leaves the existing inode in place if it has not changed. We should document what causes which version of git to re-checkout the file (and with what file system if that matters). (In reply to Michał Górny from comment #18) > Not to mention it bumps mtimes of all files and causes all updates to be > reapplied after every sync. Okay, I suppose we could store md5 instead of mtime in the /var/cache/edb/mtimedb.
(In reply to Zac Medico from comment #19) > (In reply to Michał Górny from comment #17) > > Please revert this. > > Maybe we should, but let's document the reasons well. > > > You've caused a major performance regression and making > > a mess of the underlying file system. > > What version of git are you using? Not sure what you mean by "mess" here. 2.9.1 > What type of file system are you using? Is it re-writing all of the files? > Do the inode numbers change in the process? How about timestamps? fuse.unionfs. Yes, it is re-writing all the files which is causing them to re-appear on the top filesystem. Which means it's terribly slow, and even device numbers change, and I have to repack them back into squashfs. if you really insist on doing something insane, use at least 'git checkout' and not go straight for 'git reset --hard'. Maybe that would work saner.
(In reply to Michał Górny from comment #20) > fuse.unionfs. Yes, it is re-writing all the files which is causing them to > re-appear on the top filesystem. Which means it's terribly slow, and even > device numbers change, and I have to repack them back into squashfs. Okay, I figured that you must be using something exotic like that. We can accommodate fuse.unionfs, but we need to recognize that fuse.unionfs users are probably a minority and that they may need to deviate from the default sync settings if there's difficulty getting it to behave well. > if you really insist on doing something insane, use at least 'git checkout' > and not go straight for 'git reset --hard'. Maybe that would work saner. That's worth a try. I'll look into creating a patch for that.
(In reply to Zac Medico from comment #21) > > if you really insist on doing something insane, use at least 'git checkout' > > and not go straight for 'git reset --hard'. Maybe that would work saner. > > That's worth a try. I'll look into creating a patch for that. I've tested using `git reset --soft origin/master && git checkout -f` with unionfs-fuse-1.0 and git-2.7.4, but all of the files are still being re-written in the write layer. We might want to try overlayfs to see if git behaves any better with it. Anyway, for now, I think we should advise unionfs-fuse users to set sync-depth = 0 in repos.conf.
(In reply to Zac Medico from comment #22) > (In reply to Zac Medico from comment #21) > > > if you really insist on doing something insane, use at least 'git checkout' > > > and not go straight for 'git reset --hard'. Maybe that would work saner. > > > > That's worth a try. I'll look into creating a patch for that. > > I've tested using `git reset --soft origin/master && git checkout -f` with > unionfs-fuse-1.0 and git-2.7.4, but all of the files are still being > re-written in the write layer. Actually, only the changed files are being re-written, but I get the same result with `git reset --hard origin/master`. So I'm getting good results either way. This is with btrfs for the RO layer and tmpfs for the RW layer. I wonder if your problem is a mismatch in timestamp resolution between your RO layer and RW layer, since squashfs only has 1s timestamp resolution.
For git reset --hard, I think that it only re-writes files when they differ from their entry in the index. There are some good explanations about how files are compared with the index here: https://github.com/git/git/blob/master/Documentation/technical/racy-git.txt Note that a file will be recognized as "different" if its inode number is different from that recorded in the index, and obviously something like unionfs-fuse can easily make your inode numbers differ from those in the index. It's possible that running `git update-index --refresh` prior to `git reset --hard` will mitigate the issues with unionfs-fuse. I'll create a patch for you to test.
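To illustrate why a changed inode number alone can make an untouched file look modified, here is a deliberately simplified model of a stat-based comparison. It is not git's actual implementation, and the real index records more fields than this. The helper names `stat_signature` and `looks_modified` are hypothetical.

```python
import os
import shutil
import tempfile


def stat_signature(path):
    """Collect a few stat fields of the kind a cached index entry records."""
    st = os.stat(path)
    return {"mtime_ns": st.st_mtime_ns, "size": st.st_size, "inode": st.st_ino}


def looks_modified(cached, path):
    """True if any recorded stat field differs; file content is never read."""
    return cached != stat_signature(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "skel.ebuild")
        with open(original, "w") as f:
            f.write("# example\n")
        cached = stat_signature(original)

        # A byte-identical copy with the same mtime is still a new file,
        # so it has a different inode than the original.
        copy = os.path.join(tmp, "copy.ebuild")
        shutil.copy2(original, copy)

        print("original looks modified:", looks_modified(cached, original))
        print("copy looks modified:", looks_modified(cached, copy))
```

In this model the copy is flagged purely because of its inode, which is the situation a union filesystem can create when it reports different inode numbers for the same file. Refreshing the cached stat data is what brings the two back in sync.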
Created attachment 440904 refresh index before git reset Hopefully this solves the unionfs-fuse issues.
(In reply to Zac Medico from comment #25) > Created attachment 440904 > refresh index before git reset > > Hopefully this solves the unionfs-fuse issues. Maybe we should change --refresh to --really-refresh.
Created attachment 440906 refresh index before git reset Updated to use the --really-refresh option.
Yes, add more hackery to the wrong concept to hopefully make it less wrong, and advise users not to use the defaults. That's the very Portage way. So what's the problem with just checking out, *without* any resetting? Since no resetting should be necessary if you don't touch the checkout.
(In reply to Michał Górny from comment #28) > Yes, add more hackery to the wrong concept to hopefully make it less wrong, > and advise users not to use the defaults. That's the very Portage way. > > So what's the problem with just checking out, *without* any resetting? Since > no resetting should be necessary if you don't touch the checkout. If you don't like my patch, please provide your own.
In speaking with Michał Górny on IRC, he made it clear that he would prefer that the sync-depth setting only affect the initial clone operation. If that's how he wants it, then I suggest that we rename it to clone-depth.
No, you didn't get my point at all. I don't mind --depth=1 as long as it is done properly, i.e. without resorting to ugly hacks that break more than fix.

Because this whole bug is in essence the typical result of Gentoo hackery. 'I wanted to do X but it doesn't seem to work -- so I'll try to run some random git commands until I get a result that seems to make X work, then I'll commit them without even caring to understand what they do a few hours later without even testing them more widely'.

What you caused is a major change in behavior that goes completely unexpected to our users:

1. All local commits are discarded without warning (previously the default was to attempt a merge),
2. All local non-committed changes are discarded without warning (previously git either did not care or complained that they collide with the update),
3. There is a major performance loss. Previously, git had to only care about changed files. Now it has to recheck all files in the repository, and -- as proven -- in some cases rewrites them all.
4. This is a minor problem but the repository is now full of dangling objects and the history is so broken you can't do pretty much anything with it without unshallowing the repository (i.e. fetching the few hundred megs of remaining data). Previously, you could cleanly work with the commits since first shallow clone.

And yes, I will try to find a way to get this to work. But it will take some time since I need to undo all the damage caused by last sync.
Performance comparison, for the two-week period from 2016-07-03 20:24:02 UTC to 2016-07-17 10:43:03 UTC:
- git pull (fast-forward): 1m6.876s; 6087 files created in rw snapshot (counting only visible files);
- git reset --hard origin/master: 2m8.060s; 140302 files created in rw snapshot (likewise);
- git checkout origin/master: 1m3.047s; 6087 files (i.e. the same as with 'git pull');
- git pull --depth=1 --rebase: causes unionfs-fuse to segv, but supposedly slower since it needs to construct and apply a patch.
So far I haven't found anything faster and more correct than 'git checkout origin/master; git branch -D master; git checkout -b master'.
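For a sense of scale, the figures above can be turned into ratios with a few lines of arithmetic. The numbers are copied from the comparison; the variable names are just for this sketch.

```python
def seconds(minutes, secs):
    """Convert a 'XmY.ZZZs' timing into seconds."""
    return minutes * 60 + secs


# Timings and file counts as reported in the comparison above.
pull = {"time": seconds(1, 6.876), "files": 6087}
reset_hard = {"time": seconds(2, 8.060), "files": 140302}
checkout = {"time": seconds(1, 3.047), "files": 6087}

# How much slower, and how many more files written, than plain 'git pull'.
time_ratio = reset_hard["time"] / pull["time"]
file_ratio = reset_hard["files"] / pull["files"]

print("reset --hard vs pull: %.2fx the time, %.1fx the files" % (time_ratio, file_ratio))
print("checkout vs pull: %.2fx the time" % (checkout["time"] / pull["time"]))
```

On these numbers, `git reset --hard` took a little under twice as long as `git pull` and wrote roughly 23 times as many files into the rw layer, while `git checkout origin/master` was marginally faster than the pull.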
(In reply to Michał Górny from comment #31) > Because this whole bug is in essence the typical result of Gentoo hackery. > 'I wanted to do X but it doesn't seem to work -- so I'll try to run some > random git commands until I get a result that seems to make X work, then > I'll commit them without even caring to understand what they do a few hours > later without even testing them more widely'. Sorry, I can't test every filesystem that exists in the world. It works fine for me on btrfs. How was I supposed to know that it wouldn't work well on every filesystem? > What you caused is a major change in behavior that goes completely > unexpected to our users: > > 1. All local commits are discarded without warning (previously the default > was to attempt a merge), > > 2. All local non-committed changes are discarded without warning (previously > git either did not care or complained that they collide with the update), Well, I think it's a pretty good assumption that sync-depth = 1 means that people expect essentially the same behavior as rsync, which means that local changes are simply discarded. (In reply to Michał Górny from comment #32) > So far I haven't found anything faster and more correct than 'git checkout > origin/master; git branch -D master; git checkout -b master'. I've tested that locally on btrfs, and it works for me. I tested by running `stat skel.ebuild` before and after, and it keeps the same inode number.
I think we should go for 'git reset --merge origin/master'. I no longer have the issues I originally had with it, so they were probably due to a dirty copy. It has the behavior closest to 'git pull', including appropriate tree status checks.
Comment on attachment 440906 refresh index before git reset (In reply to Michał Górny from comment #34) > I think we should go for 'git reset --merge origin/master'. I no longer have > the issues I originally had with it, so they were probably due to a dirty > copy. It has the behavior closest to 'git pull', including appropriate tree > status checks. Thanks for looking into this. I've gone ahead and pushed your suggested change: https://gitweb.gentoo.org/proj/portage.git/commit/?id=55aef9bf297ef8cbf29921acb454449d01313818
Looks like this landed in the latest portage release. In case anybody else is startled by it, to get the old behavior back, add "sync-depth = 0" to your repos.conf file.
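As an illustration of where that setting lives, the sketch below parses a repos.conf-style section and reads `sync-depth` from it. The section contents are an example only (the location and URI are taken from output quoted earlier in this bug), the helper name `sync_depth` is hypothetical, and the fallback of 1 when the key is absent is an assumption based on the shallow-by-default behaviour described here. This is not how Portage itself parses the file.

```python
import configparser

# Example repos.conf content; values are illustrative.
REPOS_CONF = """
[gentoo]
location = /usr/portage
sync-type = git
sync-uri = https://anongit.gentoo.org/git/repo/gentoo
sync-depth = 0
"""


def sync_depth(conf_text, repo, default=1):
    """Return the sync-depth for `repo`, or `default` if it is not set.

    The default of 1 is an assumption for this sketch.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getint(repo, "sync-depth", fallback=default)


if __name__ == "__main__":
    depth = sync_depth(REPOS_CONF, "gentoo")
    print("sync-depth =", depth, "(full history)" if depth == 0 else "(shallow)")
```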
I had tried the git reset --merge origin/master strategy in portage-postsyncd-mv, but today it errored out on the https://anongit.gentoo.org/git/data/glsa.git repository, although I certainly never touched any file in it locally: error: Entry 'glsa-201610-02.xml' not uptodate. Cannot merge. It seems that the only reliable way to honour upstream rebasing (or whatever change it was which caused the above issue) seems to be git reset --hard origin/master I think it is quite reasonable to distinguish by an option whether the user wants to discard local changes or whether he wants to keep local changes as far as possible: In the first case, the right way is probably git reset --hard, because in the other case, no matter which strategy is chosen, some problems might arise. Note that even strategies like -Xtheirs have their issues, because they do not remove deleted files reliably. Other strategies are discussed in http://stackoverflow.com/questions/4911794/git-command-for-making-one-branch-like-another/4912267
(In reply to Martin Väth from comment #37) > I had tried the > > git reset --merge origin/master > > strategy in portage-postsyncd-mv, but today it errored out on the > https://anongit.gentoo.org/git/data/glsa.git > repository, although I certainly never touched any file in it locally: > > error: Entry 'glsa-201610-02.xml' not uptodate. Cannot merge. > > It seems that the only reliable way to honour upstream rebasing (or whatever > change it was which caused the above issue) seems to be It's your filesystem being unhappy with git. If you run 'git status' first to have git re-evaluate all the local files, it should be happy to proceed. I have to do this all the time because of overlayfs.
(In reply to Michał Górny from comment #38) > > It's your filesystem being unhappy with git. If you run 'git status' first > to have git re-evaluate all the local files, it should be happy to proceed. > I have to do this all the time because of overlayfs. The original filesystem was indeed overlayfs (+squashfs/ext4). Therefore, I still had a copy of the original repository. I have now retried (by archiving this repository and unpacking) on various filesystems: ext4, tmpfs, and again on overlayfs (squashfs with writable part on ext4): In all 3 cases, there was the same error, but "git status" indeed solved the issue - thanks for the hint. Perhaps this should also be done by portage? At least portage-2.3.2 does not seem to contain a "git status" call in the git module. Although this call takes about 5 seconds for the gentoo repository and writes about 12 MB (measured with overlayfs), I think not erroring out is more important than a few seconds more for syncing and 12 MB...
(In reply to Martin Väth from comment #39) > Therefore, I still had a copy of the original repository. I have now retried > (by archiving this repository and unpacking) on various filesystems: > ext4, tmpfs, and again on overlayfs (squashfs with writable part on ext4): > > In all 3 cases, there was the same error, but "git status" indeed solved the > issue - thanks for the hint. When you copy the files to a new filesystem, you need to run "git status" at least once in order to update the index ("git update-index --refresh" might be an alternative). > Perhaps this should also be done by portage? Unless there's a bug in git, it should not be needed under normal operating conditions (with overlayfs being an exceptional case). We can add a repos.conf setting to force the index to be refreshed.
It would be nice to actually talk this over with git upstream. However, I never got around to trying to figure this out in more detail, and I don't have the time to work on that right now.
The issue is really mysterious: > ("git update-index --refresh" might be an alternative). I wanted to test whether this or --really-refresh (as you had mentioned earlier) would be necessary, and found that all of a sudden neither of them is needed: I can unpack the "broken" archive on every filesystem and I can just use fetch+reset --merge without error. It seems that the only commit in the remote repository was the addition of an unrelated file (glsa-201610-10.xml).
(In reply to Zac Medico from comment #40) > git update-index --refresh Just for the record: According to the git sources, "git status" is doing git update-index --refresh -q --unmerged Both of these options are perhaps necessary to avoid problems in certain cases.
It may have something to do with filesystem cache. I have that problem every reboot, so I'll check that for you when I get home.
Plain 'git update-index --refresh' works fine for me.
This change has completely broken Portage handling git repos for me. Even when I specify "sync-depth = 0" in repos.conf files, Portage *still* checks out shallow clones, even if I already had a full copy of the repo. Which versions of Portage do I need to mask in order to avoid the new behaviour?
(In reply to Justus Ranvier from comment #46) > This change has completely broken Portage handling git repos for me. > > Even when I specify "sync-depth = 0" in repos.conf files, Portage *still* > checks out shallow clones, even if I already had a full copy of the repo. Please file a new bug and attach your repos.conf file. > Which versions of Portage do I need to mask in order to avoid the new > behaviour? It's portage-2.3.1 and later.
commit f77fcd6b0b4ebb49ca62f5767cd5c931127c3dbb
Author: Michał Górny <mgorny@gentoo.org>
Date: Sun Oct 30 20:14:11 2016

[sync] Run `git update-index --refresh` when doing shallow pulls

Run `git update-index --refresh` to force proper index recheck before running `git reset --merge` on a shallow pull. This fixes syncing on some filesystem configurations including overlayfs on squashfs.

Reviewed-by: Zac Medico <zmedico@gentoo.org>
(In reply to Michał Górny from comment #48) > commit f77fcd6b0b4ebb49ca62f5767cd5c931127c3dbb I work with overlays (science, sage-on-gentoo) that don't regularly update their profiles/use.local.desc files, but still these files are present in repos. I regenerate the files manually via a simple egencache wrapper from repo.postsync.d. Everything worked smoothly, but now portage refuses to sync these overlays because it detects local changes due to regenerated profiles/use.local.desc files. Can smth be done about this case, please? I believe my wish to have an up-to-date profiles/use.local.desc is sane and simple.
(In reply to Coacher from comment #49) > Can smth be done about this case, please? > I believe my wish to have an up-to-date profiles/use.local.desc is sane and > simple. Maybe it helps if you ask the overlay add /profiles/use.local.desc to .gitignore.
Can we get all of these git experiments reverted back to portage-2.2.2x, because it worked nicely? Users do not really care whether it's 'git pull' or 'git fetch' as long as their trees are synced. It's not OK, distribution-wise, to make such breaking changes without understanding what users need from "sync". The proper way to do this is to create a separate branch like git-sync-revise, ask developers and interested users to test it to death, then merge to master. It is not tested enough to be in any of the portage-2.3.x releases.
(In reply to Coacher from comment #49)
> I regenerate the files manually via a simple egencache wrapper from
> repo.postsync.d. Everything worked smoothly, but now portage refuses to sync
> these overlays because it detects local changes due to regenerated
> profiles/use.local.desc files.

This issue is not related to this bug. Anyway, as Zac already suggested, the correct solution would be if upstream provides a matching .gitignore which contains profiles/use.local.desc and which should also contain metadata/md5-cache unless upstream provides an up-to-date cache, so that you can also update this cache with egencache (I recommend this if you use eix).

For overlays in which upstream fails to do this, you can do:

1. Provide a .gitignore containing these entries as well as .gitignore.
2. If upstream provides a .gitignore, but without these entries, you can edit your local gitconfig (either in the corresponding .git or perhaps even system wide if it matches to all your overlays) to contain
[core]
excludesfile = /some/path/to/gitignore
and create a corresponding gitignore file with the required entries.
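Option 2 above can be sketched as a small helper that writes the excludes file and returns the `git config` invocation that would register it. The helper name `write_excludes_file` and its parameters are hypothetical, and the path is whatever the caller chooses. The function does not run git itself.

```python
import os
import tempfile

# The two entries named in the comment above.
IGNORED = ["profiles/use.local.desc", "metadata/md5-cache"]


def write_excludes_file(path, entries=IGNORED, scope="--local"):
    """Write one ignore pattern per line and return the git config argv.

    `scope` is passed straight to `git config`: --local for the current
    repository, --global for the user, --system for the whole machine.
    """
    if scope not in ("--local", "--global", "--system"):
        raise ValueError("unknown scope: %r" % scope)
    with open(path, "w") as f:
        for entry in entries:
            f.write(entry + "\n")
    return ["git", "config", scope, "core.excludesFile", path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "gitignore")
        print(" ".join(write_excludes_file(target)))
        with open(target) as f:
            print(f.read(), end="")
```

One caveat worth keeping in mind: git's ignore rules only apply to untracked files, so this approach helps only where the files in question are not already tracked in the overlay's repository.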
(In reply to Oleg from comment #51) > Can we get all of this git experiments back to portage-2.2.2x because it > worked nicely. It didn't work, because users wanting a shallow clone would not get it. > Proper way to do this is to create separate branch like git-sync-revise. > Ask developers and interested users to test this to death. That's what the testing version of portage is for, isn't it? The "missing" git update-index was observed only recently, because apparently it only hits users with non-standard setups in certain exceptional situations, and thus its detection could easily be longer than any testing phase. But also this problem was fixed quickly soon after its detection. So IMHO, the upgrade policy was completely correct.
(In reply to Martin Väth from comment #52) > This issue is not related to this bug. It is. Before changes related to this bug were introduced, portage behaved nicely. > Anyway, as Zac already suggested, the correct solution would be if upstream > provides a matching .gitignore which contains profiles/use.local.desc ... Unless this solution becomes a policy and this policy is enforced, there will be overlays that serve an outdated use.local.desc file without a matching entry in .gitignore. > For overlays in which upstream fails to do this, you can do: > ... Thank you for the suggestion, but this is an ugly workaround. Of course I'll forget to update the system-wide git configuration when I add a new overlay. Also, why should portage require extra configuration of a third-party tool to function normally?
(In reply to Coacher from comment #54) > (In reply to Martin Väth from comment #52) > > This issue is not related to this bug. > It is. Before changes related to this bug were introduced, portage behaved > nicely. I had this problem much earlier. It is very likely just accidental that the change of portage came with an outdated use.local.desc in your corresponding overlay. > Unless this solution becomes a policy and this policy is enforced You cannot enforce any policy on overlays. > will be overlays that serve outdated use.local.desc file without a matching > entry in .gitignore. If they have an entry in .gitignore they actually should not serve any use.local.desc at all. > Thank you for the suggestion, but this is an ugly workaround. I know, but I think it is the only thing which can be done about this issue. > Of course I'll forget to update system-wide git configuration > when I add a new overlay. If it is system-wide (in /etc/gitconfig) you don't have to update it. But this means that you will have to use egencache even for overlays which would provide a working use.local.desc or even an md5-cache. > Also why portage should require extra configuration of a third-party tool to > function normally? The problem is broken upstream of your overlay: Shipping an outdated use.local.desc is simply broken. Things become even more involved: If upstream uses a different VCS than git, the .gitignore trick cannot be used. For example, I don't know similar workarounds for bzr or mercurial overlays: If upstream does not include correct use.local.desc and md5-cache, you are probably lost...
(In reply to Martin Väth from comment #55) > I had this problem much earlier. It is very likely just accidental that the > change of portage came with an outdated use.local.desc in your corresponding > overlay. No. Outdated files were there since portage got repos.conf support, i.e. for months. I use plain ext4 though. > You cannot enforce any policy on overlays. Exactly my point. > If it is system-wide (in /etc/gitconfig) you don't have to update it. Say I have just added overlay foo in repos.conf. How will my system-wide git config/gitignore file for overlays know about it automatically? > The problem is broken upstream of your overlay: Shipping an outdated > use.local.desc is simply broken. I know it's the root cause. Still, overlay maintainers prefer to serve an outdated use.local.desc instead of not serving it at all, and I cannot do anything about it. They even acknowledge it's outdated and manually update it from time to time.
(In reply to Coacher from comment #56) > No. Outdated files were there since portage got repos.conf support, i.e. for > months. "git pull" (which portage used before) will always complain about locally modified files. Something else must have changed if the file was really locally modified and git had not complained. Perhaps the distributed .gitignore had been changed... > I use plain ext4 though I am very sure that this is unrelated here. > Say I have just added overlay foo in repos.conf. > How my system-wide git config/gitignore file for overlays will know about it > automatically? It is system wide, i.e. it will apply to _every_ repository (of every user). The problem is more the opposite: It will then apply even to repositories which are not related to gentoo at all... > overlay maintainers prefer to serve an outdated use.local.desc [...] > and I cannot do anything about it. Broken upstream is not anything which can be solved technically. It is the same if any other upstream serves some broken git repository which needs some local patching: You have to apply the patches manually or write some tricky code to apply them automatically. But that tricky code in turn might always break...
Coacher's problem seems to stem from portage making the following call:

git update-index --refresh

Testing with a dummy repo:

% emaint sync -r dummy
>>> Syncing repository 'dummy' into '/home/floppym/repos/dummy'...
/usr/bin/git fetch origin --depth 1
remote: Total 0 (delta 0), reused 0 (delta 0)
profiles/use.local.desc: needs update
!!! git pull error in /home/floppym/repos/dummy

Action: sync for repo: dummy, returned code = 1

Reading the manpage for the git-update-index command:

-q
    Quiet. If --refresh finds that the index needs an update, the default behavior is to error out. This option makes git update-index continue anyway.

If I modify the git sync module to always pass -q to git-update-index, syncing a repository with local changes to use.local.desc succeeds.

% emaint sync -r dummy
>>> Syncing repository 'dummy' into '/home/floppym/repos/dummy'...
/usr/bin/git fetch origin --depth 1
remote: Total 0 (delta 0), reused 0 (delta 0)
=== Sync completed for dummy

Action: sync for repo: dummy, returned code = 0
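The "needs update" line in the failing log above is what identifies the locally modified file. As a small illustration, the hypothetical helper below extracts those paths from captured output. It assumes the `<path>: needs update` line format shown in the log and is not part of Portage.

```python
SUFFIX = ": needs update"


def paths_needing_update(output):
    """Return the paths reported as 'needs update' in the given text."""
    paths = []
    for line in output.splitlines():
        line = line.strip()
        if line.endswith(SUFFIX):
            # Drop the trailing marker, keep the path as printed.
            paths.append(line[: -len(SUFFIX)])
    return paths


if __name__ == "__main__":
    log = (
        "remote: Total 0 (delta 0), reused 0 (delta 0)\n"
        "profiles/use.local.desc: needs update\n"
    )
    for path in paths_needing_update(log):
        print("locally modified:", path)
```

A wrapper could use such a list to warn about the specific files rather than aborting the whole sync with a generic error.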
Created attachment 452192 always pass -q to git-update-index Patch for my previous comment.
Created attachment 452194 always pass -q to git-update-index Cleaner version of previous patch.
Created attachment 452196 always pass -q to git-update-index Better commit message.
(In reply to Mike Gilbert from comment #58) > Coacher's problem seems to stem from portage making the following call: > > git update-index --refresh That's interesting, I wasn't considering this in my reply to Coacher: It means that Coacher's remark that "it worked before" did not mean "before the patch from this bug" (as I falsely assumed for my reply) but actually: "_after_ the patch from comment #9 but before the very latest patch" Indeed, _as a side effect_ the patch from comment #9 solves Coacher's problem (at least with git reset --hard; I am not sure concerning git reset --merge: Maybe in that case, some history will successively grow with each sync?). This indicates that perhaps the new merge strategy should be always used, i.e. even if unlimited depth is desired. > always pass -q to git-update-index I would like to remind once more that besides -q also --unmerged should perhaps be passed.
There is no patch directly associated with comment 9. Typo?
I replied initially to a specific comment that referred to a specific commit. So 'after' and 'before' refers to that specific commit as well.
(In reply to Mike Gilbert from comment #63)
> There is no patch directly associated with comment 9. Typo?

Not a typo. I meant the change described there: "git pull" being replaced by "git fetch [...] && git reset --hard" which meanwhile is in portage in the form "git fetch [...] && git reset --merge". As mentioned, I do not understand the meaning of the latter well enough to judge whether it still "solves" Coacher's problem or whether it just "shifts" it by letting some git history grow with each syncing.
Fixed in the master branch to use git update-index -q --unmerged --refresh: https://gitweb.gentoo.org/proj/portage.git/commit/?id=d075422a8902617833ec945d94beb0bb334d44c9
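For readers following the thread, the command sequence discussed up to this point can be sketched roughly as follows. This is an illustration only, not portage's actual sync module: the function names, parameters, and the `origin`/`master` defaults are assumptions. Only the git invocations themselves (fetch with --depth, update-index -q --unmerged --refresh, reset --merge) come from the comments above, and the order follows the emaint output quoted earlier, where the fetch runs before the index refresh.

```python
import subprocess


def shallow_sync_commands(depth=1, remote="origin", branch="master", git="git"):
    """Return the git commands for a shallow sync, in the order they would run.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    return [
        # Fetch only the newest `depth` commits instead of running `git pull`.
        [git, "fetch", remote, "--depth", str(depth)],
        # Refresh the index. -q continues when a file "needs update",
        # --unmerged continues when unmerged entries are found.
        [git, "update-index", "-q", "--unmerged", "--refresh"],
        # Move to the fetched commit; files with unstaged local changes are kept.
        [git, "reset", "--merge", "refs/remotes/%s/%s" % (remote, branch)],
    ]


def run_shallow_sync(repo_dir, **kwargs):
    """Run each command inside repo_dir, stopping at the first failure."""
    for cmd in shallow_sync_commands(**kwargs):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Building the argument lists separately from running them makes it easy to inspect what would be executed before touching a real repository.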
(In reply to Zac Medico from comment #66)
> Fixed in the master branch to use git update-index -q --unmerged --refresh:
>
> https://gitweb.gentoo.org/proj/portage.git/commit/?id=d075422a8902617833ec945d94beb0bb334d44c9

Thank you. This commit resolves the problems I've described above.
Given the performance issues introduced by `git update-index` and `git prune` (see bug 599008), I think it's time to revert all changes related to this bug. We can consider adding optional support for shallow fetch, but shallow fetch doesn't seem to be a practical default at this time.
Here is my proposal to revert all changes related to this bug: https://archives.gentoo.org/gentoo-portage-dev/message/e0314d5c748ec4098605c20d9b42b2a9
(In reply to Zac Medico from comment #69)
> Here is my proposal to revert all changes related to this bug:
>
> https://archives.gentoo.org/gentoo-portage-dev/message/e0314d5c748ec4098605c20d9b42b2a9

Thanks, this is a sane approach. These changes bring no evident gain, only known (and perhaps unknown) regressions. It really makes sense to do the initial repo clone (called by emerge --sync) with --depth=1. Once that is done, emerge --sync should run `git pull` so that it fetches the diffs correctly. I'd say that after the revert, stabilizing version 2.3.3 (once tagged) makes sense.
Reverted for now: https://gitweb.gentoo.org/proj/portage.git/commit/?id=ab840ac982d3c8b676b89f6bedd14e85dd06870f In the next iteration, I think we're going to want separate sync-depth and clone-depth settings, or something like that.
(In reply to Oleg from comment #70)
> Thx, this is sane approach. There is no evident gain in this changes

The gain is very evident: a shallow git clone consumes less disk space. Whether that matters depends on the setup: on some systems the difference is significant, on others it can be ignored. But over time the number of commits will grow and a full clone will become more and more expensive, so in the long run this problem must be solved.
(In reply to Zac Medico from comment #71)
> In the next iteration, I think we're going to want separate sync-depth and
> clone-depth settings, or something like that.

This patch renames sync-depth to clone-depth, and shows a warning message if the deprecated sync-depth option is used:

https://archives.gentoo.org/gentoo-portage-dev/message/768eab99bffa6c2615a0dbf60109ee6c
https://github.com/gentoo/portage/pull/118
(In reply to Zac Medico from comment #73)
> This patch renames sync-depth to clone-depth, and shows a warning message if
> the deprecated sync-depth option is used:
>
> https://archives.gentoo.org/gentoo-portage-dev/message/768eab99bffa6c2615a0dbf60109ee6c
> https://github.com/gentoo/portage/pull/118

That's in the master branch now:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=b3f6297a791a748a4bca370283705576568a20c2
I have some ideas for the next iteration of shallow pull support:

* Add support for a "sync-git-update-index = yes" setting in repos.conf, which people can enable when necessary (like for overlayfs).

* Trigger periodic calls to git prune (for bug 599008), based on the timestamp of .git/FETCH_HEAD. We can add a repos.conf setting to control the time interval between prune calls, for example "sync-git-prune-interval-days = 7" would prune every 7 days, and "sync-git-prune-interval-days = 0.5" would prune every 12 hours. Maybe we should support syntax like "sync-git-prune-interval = 7d" or "sync-git-prune-interval = 12h". I really don't know how often we would have to prune if someone syncs at a high frequency like every 30 minutes.
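To make the two proposed syntaxes concrete, here is a small sketch of how such values might be parsed. Neither setting exists in portage; the names "sync-git-prune-interval" and "sync-git-prune-interval-days" are only the ones floated above, and the function names and the restriction to the "d" and "h" suffixes are assumptions made for this example.

```python
import re
from datetime import timedelta

# Suffixes used in the examples above; any others are rejected in this sketch.
_UNITS = {"d": "days", "h": "hours"}


def parse_prune_interval(value):
    """Parse a hypothetical "sync-git-prune-interval" value like "7d" or "12h"."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([dh])\s*", value)
    if match is None:
        raise ValueError("invalid interval: %r" % (value,))
    number, unit = match.groups()
    return timedelta(**{_UNITS[unit]: float(number)})


def parse_prune_interval_days(value):
    """Parse a hypothetical "sync-git-prune-interval-days" value like "0.5"."""
    return timedelta(days=float(value))
```

With either syntax the result is a `timedelta`, so "0.5" days and "12h" compare equal and the rest of the code would not need to care which form the user wrote.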
(In reply to Zac Medico from comment #75)
> * Trigger periodic calls to git prune (for bug 599008), based on the
> timestamp of .git/FETCH_HEAD.

On second thought, the timestamp of .git/FETCH_HEAD is not very useful since it changes with each fetch even when nothing has changed. So, I think we need to store a timestamp of the last prune somewhere, so that we know when to trigger the next prune. Along with the timestamp of the last prune, we could store the HEAD commit hash and avoid pruning again for the same commit hash.
If we have a state file, we can track the number of times that the HEAD has changed since the last prune. I'm thinking that we could have one json file per repository that's automatically loaded and passed into the sync module, then automatically saved afterwards if the sync module makes any changes.
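A per-repository JSON state file along these lines could look like the sketch below. Everything here is hypothetical: the key names `last_head` and `head_changes_since_prune`, the function names, and the file layout are assumptions for illustration, not an existing portage interface.

```python
import json
import os
import tempfile


def load_state(path):
    """Load per-repository state, or return defaults if no file exists yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"last_head": None, "head_changes_since_prune": 0}


def save_state(path, state):
    """Write the state via a temporary file so a crash cannot truncate it."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def record_sync(state, head):
    """Count a HEAD change; return True only if the state was modified."""
    if head == state["last_head"]:
        return False
    state["last_head"] = head
    state["head_changes_since_prune"] += 1
    return True


def record_prune(state):
    """Reset the counter after a prune has been performed."""
    state["head_changes_since_prune"] = 0
```

Because `record_sync` reports whether anything changed, the caller can skip saving the file when a sync found nothing new, which matches the "automatically saved afterwards if the sync module makes any changes" idea.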
Git can do more than just pruning, and I usually run this sequence on all git repositories somewhat regularly:

  git prune
  git repack -a -d
  git reflog expire --expire=now --all
  git gc --prune=all --aggressive
  git repack -a -d
  git prune

(yes, the repetition is intentional). However, for the git repository (even of full depth) some of this breaks the repository occasionally for reasons I do not remember. Maybe it was git prune.

Anyway, if you use a timer anyway, it might be appropriate to do the working parts of this sequence somewhat regularly.

Moreover, at the moment fetching from the source repository is not supported, and one has to use postsync.d hooks like here https://github.com/vaeth/portage-postsyncd-mv to fetch additional data like news, glsa, xml-schema, dtd, projects.xml. Perhaps also management of developer keys and verification of latest commits with these keys should be done here? Perhaps this functionality should eventually become part of portage itself?

Note that all of these tasks usually should be done much less frequently than syncing. The above-mentioned scripts use their own timestamp files, but an independent file containing all timestamps in an extensible format might perhaps be even better.
(In reply to Martin Väth from comment #78)
> Git can do more than just pruning, and usually I call this sequence on all
> git repositories somewhat regularly:
>
> git prune
> git repack -a -d
> git reflog expire --expire=now --all
> git gc --prune=all --aggressive
> git repack -a -d
> git prune
>
> (yes, the repetition is intentional).

Wow, thanks for that impressive list of commands. After reading the man pages for those commands, it seems that the simplest command for automated maintenance is this:

  git gc --auto

> However, for the git repository (even of full depth) some of this breaks the
> repository occasionally for reasons I do not remember. Maybe it was git
> prune.
>
> Anyway, if you use a timer anyway, it might be appropriate to do the working
> parts of this sequence somewhat regularly.

If possible I'd like to use `git gc --auto` by default, and maybe have an option for the user to specify a maintenance script in repos.conf. Maybe something like this for defaults:

  sync-git-gc-cmd = git gc --auto

> Moreover, in the moment fetching from the source repository is not
> supported, and one has to use postsync.d hooks like here
> https://github.com/vaeth/portage-postsyncd-mv to fetch additional data like
> news, glsa, xml-schema, dtd, projects.xml. Perhaps also management of
> developer keys and verification of latest commits with these keys should be
> done here?
> Perhaps this functionality should eventually become part of portage itself?

I'd prefer not to bundle something that complex with portage, so a custom sync module seems like a good way to go. For now, I'm mainly interested in targeting the gentoo git mirror as described here:

https://wiki.gentoo.org/wiki/Portage_Security#git-mirror_repo

> Note that all of these tasks usually should be done much less frequently
> than syncing. The above-mentioned scripts use their own timestamp-files, but
> an independent file containing all timestamps in an extendible format might
> perhaps be even better.
I wonder if `git gc --auto` is suitable to run for every sync, since it does nothing unless the gc.auto (default 6700) or gc.autoPackLimit (default 50) threshold is exceeded. That would eliminate the need for the sync module to store state of its own.
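Running the housekeeping command after every sync could be sketched as below. The "sync-git-gc-cmd" setting is only the name proposed in the previous comment, not an existing repos.conf option, and the function names are assumptions for this example.

```python
import shlex
import subprocess

# Default proposed in the discussion above.
DEFAULT_GC_CMD = "git gc --auto"


def gc_command(setting=None):
    """Split the hypothetical sync-git-gc-cmd value into an argument list."""
    return shlex.split(setting or DEFAULT_GC_CMD)


def run_gc(repo_dir, setting=None):
    """Run the housekeeping command in the foreground inside repo_dir.

    Returns the command's exit status so the caller can decide how to react.
    """
    return subprocess.call(gc_command(setting), cwd=repo_dir)
```

Since `git gc --auto` decides for itself whether any work is needed, the caller does not have to keep a timestamp or counter just to schedule it.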
> For now, I'm mainly interested in targeting the gentoo git mirror

I know; I was mainly posting this as a hint to keep a possible timestamp file open for future extensions.

The problem with the gentoo git mirror is that it has a quickly growing history for metadata/md5-cache changes (with every eclass change) which nobody needs on his hard disk.
Patch posted for review: https://archives.gentoo.org/gentoo-portage-dev/message/50c449bff26d0bda2e446f8bde74a30b https://github.com/gentoo/portage/pull/336
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=903c4b1a67689c4b8cc59113a56d58575cf7db8e

commit 903c4b1a67689c4b8cc59113a56d58575cf7db8e
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2018-07-10 07:03:35 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2018-07-11 06:10:41 +0000

    GitSync: support sync-depth (bug 552814)

    Support sync-depth for shallow sync, using git reset --merge just like in the earlier implementation that was reverted in commit ab840ac982d3c8b676b89f6bedd14e85dd06870f. Also, run git gc --auto in the foreground, in order to trigger periodic housekeeping and hopefully avoid errors from automatic git gc calls as reported in bug 599008.

    The default sync-depth is unlimited, which means that default behavior remains unchanged (unlike the previous implementation that was reverted).

    Bug: https://bugs.gentoo.org/552814
    Bug: https://bugs.gentoo.org/599008

 man/portage.5                       |  3 ++-
 pym/portage/repository/config.py    |  4 ----
 pym/portage/sync/modules/git/git.py | 26 +++++++++++++++++++++++++-
 3 files changed, 27 insertions(+), 6 deletions(-)
Do already existing fully cloned trees need to be re-pulled or portage will take care of it automatically during next sync?
(In reply to Coacher from comment #83)
> Do already existing fully cloned trees need to be re-pulled or portage will
> take care of it automatically during next sync?

It's automatic. If you have any local branches that you don't need then you should remove them so that the objects that they reference can be garbage collected.
Is there any comparison rsync VS git about difference in sync speed and used disk space for common use case (like ext4 without overlayfs/unionfs)?
(In reply to Coacher from comment #83)
> Do already existing fully cloned trees need to be re-pulled or portage will
> take care of it automatically during next sync?

Nothing changes by default. You need to use "sync-depth = 1" in your repos.conf. The size of the portage repo also didn't change for me. I had to delete it and then re-sync, which cut its size in half.
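As a rough illustration of the setting mentioned here, the snippet below embeds a minimal repos.conf section and reads sync-depth from it with Python's configparser. The section is an assumption pieced together from this thread (repository 'gentoo' in /usr/portage, sync-type git); the sync-uri line is left out on purpose, and `read_sync_depth` is a hypothetical helper, not how portage itself loads repos.conf.

```python
import configparser

# Assumed example section; a real repos.conf entry also needs a sync-uri.
REPOS_CONF = """
[gentoo]
location = /usr/portage
sync-type = git
sync-depth = 1
"""


def read_sync_depth(text, repo):
    """Return sync-depth for repo as an int, or None if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    value = parser[repo].get("sync-depth")
    return None if value is None else int(value)
```

A return value of `None` here stands for "not configured", which per the commit message above means unlimited depth and therefore unchanged default behavior.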
Pity that portage claims it needs to be clone-depth instead of sync-depth...

  /usr/lib64/python2.7/site-packages/portage/repository/config.py:182: UserWarning: repos.conf: sync-depth is deprecated, use clone-depth instead
    warnings.warn(_("repos.conf: sync-depth is deprecated,"

That kind of misses the issue in the news item....

  [3] https://bugs.gentoo.org/552814 sys-apps/portage: support shallow git pull by setting sync-depth = 1 in repos.conf
(In reply to Nico Baggus from comment #87)
> pity that portage claims it needs to be clone-depth in stead of sync-depth...
>
> /usr/lib64/python2.7/site-packages/portage/repository/config.py:182:
> UserWarning: repos.conf: sync-depth is deprecated, use clone-depth instead
> warnings.warn(_("repos.conf: sync-depth is deprecated,"
>
> kind of misses the issue in the news item....
>
> [3] https://bugs.gentoo.org/552814 sys-apps/portage: support shallow
> git pull by setting sync-depth = 1 in repos.conf

That UserWarning means you need to upgrade to sys-apps/portage-2.3.42 first.