Hi, I'm currently improving a custom ebuild for Matlab, and I'm hitting the following problem.
The package mostly consists of ~65614 small files to install. They are copied from the 3 installation CDs in about 10 minutes.
My problem: with the latest stable portage (2.0.54), the installation phase takes a very long time (more than 2 hours on a 2 GHz athlon64) even though there's no compilation. It happens in /usr/lib/portage/bin/ebuild.sh. After a QA Notice about executable stacks is displayed, the NEEDED information is saved (which takes a couple of seconds). Then two loops spawn (in turn) a chown and a chgrp process for each file, which takes an insane amount of time with ~65k files. The processes can be seen:
ps aux | grep chgrp
chgrp 0 /var/tmp/portage/matlab-7.0.1/image//opt/matlab/toolbox/curvefit/curvefit/private/definev.m
The same thing happens with the newest ~x86 portage, which I just tried. Apart from these two loops, the rest is fast. Thoughts?
I think spanky owns the code in question.
I highly doubt that. I'm pretty sure the code in question is right after scanelf:
find "${D}/" -user portage | while read file; do
    count=$(( $count + 1 ))
    if [ -L "${file}" ]; then
        lchown ${PORTAGE_INST_UID} "${file}"
...
find "${D}/" -group portage | while read file; do
    count=$(( $count + 1 ))
    if [ -L "${file}" ]; then
        lchgrp ${PORTAGE_INST_GID} "${file}"
...
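A side note on the loop shape quoted above, as a sketch rather than Portage's code: in bash every stage of a pipeline runs in a subshell, so a counter incremented inside `find ... | while read` is gone once the loop ends. If the count is needed after the loop (the excerpt elides that part, so this is an assumption), feeding the loop through process substitution keeps it in the current shell. Both function names are hypothetical; `find -user` accepts a user name or a numeric UID.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count entries under DIR owned by USER.
# Process substitution keeps the while loop in the current shell,
# so $count is still set when the loop finishes.
count_owned_by() {
    local dir=$1 user=$2 count=0 file
    while IFS= read -r file; do
        count=$(( count + 1 ))
    done < <(find "${dir}/" -user "${user}")
    echo "${count}"
}

# The pitfall, for comparison: the loop body runs in a pipeline subshell,
# so the increments are lost and this always prints 0.
count_owned_by_pipe() {
    local dir=$1 user=$2 count=0 file
    find "${dir}/" -user "${user}" | while IFS= read -r file; do
        count=$(( count + 1 ))
    done
    echo "${count}"
}
```

For a fresh temporary directory holding three files created by the current user, `count_owned_by "$d" "$(id -u)"` reports 4 (the directory itself matches too), while the pipe variant reports 0.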
Please post the environment file from the merge. I'm wondering what do_stat is set to...
SpanKY: Yes, this is the code in question. Thanks for helping, btw :-) OK, I've added "set >/tmp/env-snapshot" as the first command in src_install() and will post the result shortly.
Created attachment 78793: Environment data at entry in src_install(). I removed the function definitions.
OK, so I guess I was being blonde... I didn't see that do_stat was a function definition.
What would you like me to do?
1) add a statement in the ebuild to dump the do_stat definition
2) add a statement inside stat_perms(), but where?
Thanks
(In reply to comment #6)
> Ok, so I guess I was being blonde.. didn't see do_stat was a function
> definition.
>
> What would you want me to do ?
> 1) add a statement in the ebuild to dump the do_stat definition
> 2) add a statement inside stat_perms(), but where ?
Just modify the stat_perms() function in /usr/lib/portage/bin/ebuild.sh and dump do_stat if it's defined. That's the easiest route; just remember to remove it afterwards.
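A minimal sketch of that temporary debugging statement, assuming bash: `declare -F NAME` succeeds only if a function called NAME exists, and `declare -f NAME` prints its definition. The helper name `dump_func_if_defined` is hypothetical; the two `declare` calls could just as well be pasted inline into stat_perms().

```shell
#!/usr/bin/env bash

# Hypothetical debugging helper: print a function's definition,
# or report that no such function is defined.
dump_func_if_defined() {
    local name=$1
    if declare -F "${name}" >/dev/null; then
        declare -f "${name}"
    else
        echo "${name} not defined"
    fi
}
```

Placed at the point under investigation, something like `dump_func_if_defined do_stat >&2` shows whether do_stat exists there and what it does.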
Does that code really need to be written like that? It seems to me that the same thing could be accomplished with a carefully written `find | xargs chown`.
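A minimal sketch of the `find | xargs` idea, assuming GNU find and xargs: `-print0`/`-0` keep file names containing spaces or newlines intact, and `-r` (`--no-run-if-empty`) skips the run when nothing matches. `apply_to_owned` is a hypothetical name, not Portage's function. xargs packs as many paths per command line as fit, so tens of thousands of files cost a handful of processes instead of one each.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run CMD (plus its arguments) on every entry under
# DIR owned by USER, handing the paths to CMD in large batches.
apply_to_owned() {
    local dir=$1 user=$2
    shift 2
    find "${dir}/" -user "${user}" -print0 | xargs -0 -r "$@"
}
```

In the ebuild.sh context this might be called as `apply_to_owned "${D}" portage chown -h "${PORTAGE_INST_UID}"`, where `-h` makes chown act on symlinks themselves rather than on their targets.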
OK, here's the result:
- On stat_perms() entry (dumped after the line "local f"): do_stat is not defined.
- On stat_perms() exit (dumped before the line "f=$(do_stat "$@") || return"):
do_stat ()
{
    f=$($(type -p stat) -c '%f' "$1") || return $?;
    printf '%o' "0x$f"
}
SpanKY: yes, it would be good if fewer processes were spawned to process a given number of files...
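What the dumped do_stat computes, as a sketch assuming GNU stat: `%f` prints the raw `st_mode` in hexadecimal, and `printf '%o'` re-prints that number in octal, so the file-type bits appear in front of the permission bits. This reproduces the dumped logic; only the trailing newline in the printf format is an addition.

```shell
#!/usr/bin/env bash

# Same logic as the dumped do_stat: GNU stat's %f gives the raw mode in
# hex, and printf '%o' converts it to octal (file type + permissions).
do_stat() {
    local f
    f=$($(type -p stat) -c '%f' "$1") || return $?
    printf '%o\n' "0x${f}"
}
```

For example, a regular file with mode 0644 has raw mode `81a4`, printed as `100644`; a directory with mode 0755 (`41ed`) comes out as `40755`.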
Created attachment 78877: portage-simpler-user-group-reset.patch
Note: this is untested :P
This patch uses find/xargs to do the chown/chgrp on files in $D. It also removes the stat_perms stuff, as I don't think we should really worry about that: doing a chown/chgrp will only reset the setuid/setgid bits, and IMO, packages that need those bits restored should fix their ebuild instead.
Modifications applied! Now it's very fast, but I get the following message, twice:
-------------------------------------------------
 * 65613 files were installed with user portage!
xargs: invalid option -- a
Usage: xargs [-0prtx] [-e[eof-str]] [-i[replace-str]] [-l[max-lines]]
       [-n max-args] [-s max-chars] [-P max-procs] [--null] [--eof[=eof-str]]
       [--replace[=replace-str]] [--max-lines[=max-lines]] [--interactive]
       [--max-chars=max-chars] [--verbose] [--exit] [--max-procs=max-procs]
       [--max-args=max-args] [--no-run-if-empty] [--version] [--help]
       [command [initial-arguments]]
Report bugs to <bug-findutils@gnu.org>.
-------------------------------------------------
Ok, updated to findutils 4.3.0 and the new 'xargs' supports the '-a' argument. After emerging matlab, the files belong to root:root and the directories belong to portage:portage.
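For xargs builds without `-a`/`--arg-file`, the same effect comes from redirecting the list file onto xargs' standard input. A minimal sketch, assuming a NUL-separated list such as `find -print0` writes; `run_on_list` is a hypothetical name. A plain pipe from find avoids the temporary file altogether.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run CMD on every NUL-separated path listed in
# LISTFILE. `xargs ... < LISTFILE` does the job of `xargs -a LISTFILE ...`
# here, but also works with xargs versions that lack -a.
run_on_list() {
    local listfile=$1
    shift
    xargs -0 -r "$@" < "${listfile}"
}
```

With a list file containing `a`, `b` and `c`, `run_on_list "$list" echo` prints `a b c`; with an empty list file it prints nothing.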
> After emerging matlab, the files belong to root:root and the directories belong
> to portage:portage.
Odd... the dirs should have been set the same as the files...
Sorry for the delay. OK, it actually *does* work, *but* for the directory ownerships to be correct I had to "rm -R /opt/matlab" before emerging. Now the directories inside /opt/matlab belong to root:root as expected.
Created attachment 84338: updated patch against portage svn
(In reply to comment #14)
> Sorry for the delay..
Joël: could you give this new version a spin? This one should work fine with all versions of findutils.
> Ok, actually it *does* work, *but* for the directory ownerships to be correct I
> had to "rm -R /opt/matlab" before emerging. Now, the directories inside
> /opt/matlab belong to root:root as expected.
This is normal/expected behavior.
Is there a reason that the code uses a temporary file instead of just using a pipe? That would solve all problems with the "-a" option.
Hi SpanKY, it works great:
 * 65615 files were installed with user portage!
 * 65615 files were installed with group portage!
- The big delay that occurred before merging is now reduced by a factor of 50 or so.
- The better handling of fds (integrated after bug 128284) speeds up merging quite a lot.
Overall, the whole process has become more than 10 times faster! The total time on a 2 GHz AMD x86 laptop is ~26 minutes (about half of which is Matlab's own installation work), which is pretty decent for a package of this size, IMHO ;-)
I have just noticed one thing, and it's not specific to this patch. During the emerge, I get a few instances of this:
libsandbox: Can't resolve chmod: (null)
For the record, I have tested with:
- portage 2.1_pre7-r5, with the patch from comment #15
- findutils 4.3.0
- sandbox 1.2.12
> Is there a reason that the code uses a temporary file instead of just using a
> pipe? That would solve all problems with the "-a" option.
Read the new patch.
merged latest version into svn
*** Bug 129935 has been marked as a duplicate of this bug. ***
(Forwarding comment #0 on bug #129935)
> One of the patches added by portage 2.1_pre8 (1020_r3118_bug_121368) uses the
> -xtype parameter on find call. That parameter is GNU specific, so doesn't work
> on Alt ports.
>
> Also, it seems to me it's used wrongly. As it is, it will look for files owned
> by portage group that are symlinks to symlinks, while afaics, it should check
> for files that are links (as it's used to run chmod -h).
>
> Replacing -xtype l with -type l seems to work for me on cvs that triggers that
> part of the code.
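A small demonstration of the two tests, as a sketch assuming GNU find in its default (`-P`) mode: `-type l` matches every symlink, while `-xtype l` looks at what the link resolves to, so it matches only links that cannot be resolved (dangling links or loops), not a link to an existing file. That supports switching to `-type l` for the pass that changes link ownership. `list_links` is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print, sorted, the entries under DIR that match
# `find <TEST> l`, where TEST is either -type or -xtype.
list_links() {
    local test=$1 dir=$2
    (cd "${dir}" && find . "${test}" l | sort)
}
```

In a directory holding a regular file `target`, a link `good -> target` and a link `dangling -> missing`, `list_links -type "$d"` prints both `./dangling` and `./good`, while `list_links -xtype "$d"` prints only `./dangling`.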
Created attachment 84688: misc-functions-less-work.patch
When I rewrote it, I was going for a change of style/syntax rather than functionality. I'm pretty sure we can drop the first `find` altogether, as the default behavior of find is to not follow symlinks.
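To illustrate that default, as a sketch assuming GNU find: without `-L`, find reports a symlink as a link and does not descend through a symlinked directory, so each file is visited once; with `-L` the same file is also reached through the link. `count_named` is a hypothetical helper; any extra arguments are passed to find before the start directory.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count entries named NAME under DIR. Any further
# arguments are find options placed before DIR (for example -L).
count_named() {
    local dir=$1 name=$2
    shift 2
    find "$@" "${dir}" -name "${name}" | wc -l
}
```

With `real/x` and a symlink `link -> real`, `count_named "$d" x` gives 1, while `count_named "$d" x -L` gives 2 because `link/x` is visited as well.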
Created attachment 84691: no temp file
This is equivalent to misc-functions-less-work.patch, but it eliminates the temp file usage.
Comment on attachment 84691 (no temp file):
No, `find -exec` should never be used. That'd just put us right back where we started... portage takes way too long to merge a ton of files.
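Why per-file `-exec` hurts here, as a sketch: with the `{} \;` form, find starts the command once per file, while xargs hands many names to each invocation. The hypothetical helpers below count the processes each approach starts; with 100 short paths the first reports 100 and the second, under GNU xargs' default command-line size, 1.

```shell
#!/usr/bin/env bash

# Hypothetical helpers: count the processes spawned for the regular files
# under DIR. Each spawned `sh` prints one line, so the line count equals
# the number of invocations.
spawns_with_exec() {
    find "$1" -type f -exec sh -c 'echo spawned' _ {} \; | wc -l
}
spawns_with_xargs() {
    find "$1" -type f -print0 | xargs -0 -r sh -c 'echo spawned' _ | wc -l
}
```

Scaled up to ~65k files, the per-file form means ~65k process launches per pass, which is the cost reported at the top of this bug.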
Created attachment 84694: no temp file, with xargs instead of -exec
This is like the previous patch, except with xargs instead of -exec.
Comment on attachment 84694 (no temp file, with xargs instead of -exec):
Which is also wrong: if find gives 0 files, xargs executes chown/chgrp anyway and you see a pretty "missing operand" error.
Comment on attachment 84694 (no temp file, with xargs instead of -exec):
Except that we already make sure we don't execute with 0 files, as Zac points out, so the patch is OK.
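The zero-file case from this exchange, as a sketch with GNU xargs: without `-r` it still runs the command once on empty input (hence chown's "missing operand"), while `-r` (`--no-run-if-empty`, which also appears in the xargs usage message quoted earlier) suppresses that run. Whether the merged patch relies on `-r` or on its own check isn't shown here; the helper names are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helpers: feed xargs an empty list with and without -r.
without_r() { printf '' | xargs echo ran; }     # GNU xargs: prints "ran"
with_r()    { printf '' | xargs -r echo ran; }  # prints nothing
```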
Released in 2.1_pre9.