Bug 147516

Summary:	[PATCH] Parallel portage can reduce build times
Product:	Portage Development	Reporter:	devsk <funtoos>
Component:	Enhancement/Feature Requests	Assignee:	Portage team <dev-portage>
Status:	RESOLVED FIXED
Severity:	enhancement	CC:	again, arthur, bug.hunter, carpaski, caster, chrulri, ferringb, geevh, genkaku, gentoo, gent_bz, honk-online, hyperquantum, icb1, ingmar, jgeisler, kevinlyles, m.debruijne, maartenlameris, mchosch2006, mudrii, napopa_2000, niko.vuokko, pacho, philantrop, pointman, psychotical, rsa4046, saffi, semhirage, tcunha, tenebrarum, thomas.bettler, thothonegan
Priority:	High	Keywords:	InVCS
Version:	2.1
Hardware:	All
OS:	Linux
URL:	http://forums.gentoo.org/viewtopic-t-484842-postdays-0-postorder-asc-start-0.html
Whiteboard:
Package list:		Runtime testing required:	---
Bug Depends on:
Bug Blocks:	184128, 216231
Attachments:	Patch against 2.1.1
	Overlay tar for portage containing portage ebuild with the patch
	Fix broken "--tree" option.
	Latest patch - it is same as the one in the ebuild tar
	Patch against current stable portage (2.1.2-r9)
	The incremental patch to fix a copy-paste bug
	Patch for 2.1.2-r10 in the tree
	Fix a --fetchonly related bug
	Add more useful info in the status
	Fix minor color bug.
	Update patch to r11, pass nospinner to child
	Further cleanups, size reduction
	Patch against 2.1.2.2
	Some more misc fixes.
	Fix broken world update
	Fix for a small typo
	Fix a small bug reported in the forums
	Another painful "bring it fwd"
	Fix a bug triggered by circular deps and reported on forums
	Fix a resume bug with --nodeps
	Bring it fwd, fix exception on new install
	Bring it fwd to 2.1.2.12
	Bring it fwd to 2.1.4
	Fix resume and update to 2.1.4.1
	Patch for latest portage version updated to 2.1.4.4
	2.1.4.4 with python 2.5.2
	portage 2.2 with python 2.5.2
	2.1.4.4 with python 2.4.4

Description devsk 2006-09-13 18:48:15 UTC

This is not really a bug, but a request for comments and possible inclusion of the code into portage.

Motivation for doing parallel merges:

1. The configuration, link and merge phases of packages are essentially single-threaded, so on a two-way (dual-core or dual-CPU) system at most 50% of the CPU is in use during these phases, and less with more CPUs. These phases are also I/O-intensive, so CPU usage is even lower most of the time.

2. Many packages force MAKEOPTS=-j1, which keeps them from using all the available CPUs. So there are idle CPU cycles available during emerges.

3. Not all packages in the dependency list need to be merged sequentially.

Given these facts, consider this simple example:

I am trying to install gentoolkit. Some of the dependencies are:

gentoolkit->portage->pycrypto->gmp
gentoolkit->perl

perl and gmp are not related to each other and can be emerged at the same time. In general, the dep graph will contain nodes that are not related to each other, and this is almost always the case when you are updating world, where a variety of potentially unrelated packages are being emerged. If there are multiple leaf nodes and portage finds them, it can start a separate build for each leaf node, up to a maximum number at a time (e.g. 2 for a single CPU and 3 for a dual CPU). Once the leaves are done, it can proceed to the next level of unrelated packages.
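A minimal sketch of the leaf-node idea, using the gentoolkit example above. The dependency map is simplified for illustration, `leaf_nodes` and `job_limit` are hypothetical names (not code from the patch), and the "CPUs plus one" limit is an assumption extrapolated from the "2 for single cpu, 3 for dual cpu" figures.

```python
import os

# Simplified dependency map from the example: package -> packages it needs.
DEPS = {
    "gentoolkit": {"portage", "perl"},
    "portage": {"pycrypto"},
    "pycrypto": {"gmp"},
    "gmp": set(),
    "perl": set(),
}

def leaf_nodes(deps):
    """Packages none of whose dependencies are still waiting to be merged."""
    return sorted(pkg for pkg, needs in deps.items() if needs.isdisjoint(deps))

def job_limit(cpus=None):
    """Assumed rule: allow one more concurrent emerge than there are CPUs."""
    return (cpus or os.cpu_count() or 1) + 1

# First batch: the unrelated leaves, capped at the job limit.
first_batch = leaf_nodes(DEPS)[:job_limit()]
```

Once gmp and perl are merged and dropped from the map, pycrypto becomes the only leaf, which is the "next level" step described above.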

So, I went ahead and put together a patch for portage (against portage-2.1.1). The patch is attached both as a patch file and as an ebuild tarball for easy installation.

The changes that you will see, and hopefully like, are:

1. Parallel installation of certain packages, keeping your CPU pegged. Note that this is not always possible, e.g. when you want to emerge -O kdelibs kdebase...:)

2. The output is very concise if PORT_LOGDIR is set. Only important messages (like setup and post-install messages, which are currently lost in the garbage that flies by) are shown in the terminal; the rest goes to log files in /var/log/portage. I highly recommend setting it. If you don't, your configure/compile/install output will be thrown on the terminal as before. Less output on the terminal speeds up emerges a lot.

3. If a package fails, emerge doesn't stop immediately. It keeps emerging as long as there are packages in the current slot to merge, and stops after the current slot. A slot is like a level in the depgraph: all leaf nodes (packages which don't depend on anything) are at slot 0 (or 1), the next level of packages is at slot 1, and so on. Packages in the same slot are not related to each other and can be emerged at the same time. In a nutshell, portage stops only when it's absolutely impossible to proceed with any emerge.
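The slot idea can be sketched as follows, assuming a plain dict that maps each package to its dependencies (the patch itself works on portage's depgraph, which this does not model). `merge_slots` and `emerge_by_slot` are hypothetical names; here slots start at 0, and builds run one at a time to keep the sketch short.

```python
def merge_slots(deps):
    """Group packages into slots: slot 0 holds the leaves, and every other
    package sits one slot above its deepest dependency, so packages that
    share a slot never depend on each other."""
    slot = {}

    def slot_of(pkg, path=()):
        if pkg in slot:
            return slot[pkg]
        if pkg in path:
            raise ValueError("circular dependency involving " + pkg)
        needs = [d for d in deps[pkg] if d in deps]   # ignore already-installed deps
        slot[pkg] = 1 + max((slot_of(d, path + (pkg,)) for d in needs), default=-1)
        return slot[pkg]

    for pkg in deps:
        slot_of(pkg)
    levels = {}
    for pkg, n in slot.items():
        levels.setdefault(n, []).append(pkg)
    return [sorted(levels[n]) for n in sorted(levels)]


def emerge_by_slot(deps, build):
    """Merge slot by slot. A failure does not stop the rest of the current
    slot, but no later slot is started. Returns the failed packages."""
    failed = []
    for packages in merge_slots(deps):
        for pkg in packages:        # unrelated, so these could run in parallel
            if not build(pkg):
                failed.append(pkg)
        if failed:
            break
    return failed
```

For the gentoolkit example this gives the slots [gmp, perl], [pycrypto], [portage], [gentoolkit]; if gmp fails, perl is still built, and nothing after slot 0 is started.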

Also note that you will see some spurious-looking merge-slot messages before the emerge starts. Don't worry: they tell you which packages will be emerged in which slot. The emerging starts at slot 0 (if present, otherwise 1). Please include this output when you report a problem. These messages will be removed when the patch becomes stable.

For the code reviewers: Python is not my native language (I speak C), so please make as many suggestions to improve the code as you can. I put this together quickly and may have overlooked many things.

I have seen a more than 25% performance improvement on my dual-core AMD 3800.

Things I am still working on, where potential issues might be present:

1. vdb or related races because of parallel forks.
2. Potentially bad usage of Python...:)

Comment 1 devsk 2006-09-13 19:03:02 UTC

Created attachment 96922
Patch against 2.1.1

Comment 2 devsk 2006-09-13 19:03:56 UTC

Created attachment 96923
Overlay tar for portage containing portage ebuild with the patch

Comment 3 devsk 2006-09-13 22:51:16 UTC

Please visit http://forums.gentoo.org/viewtopic-t-484842-postdays-0-postorder-asc-start-0.html for discussions and latest patches.

Comment 4 Jason Stubbs (RETIRED) gentoo-dev 2006-09-14 08:36:36 UTC

I like the sound of this as I'm affected by the same idle CPU issue. However, that's a very big patch. I've been planning to put together a patch that will put full dependency information into the digraph. If I can get that together quickly, it should cut your patch down a fair bit, especially the slots stuff, which is kind of redundant. How's that sound?

Comment 5 Zac Medico gentoo-dev 2006-09-14 10:26:34 UTC

I like the idea too.  I think it would be a good idea to use threading instead of forks (same with parallel-fetch).  Prior to 2.1.1, it wasn't really feasible to use threading due to problems with global variables, but it should be feasible now.

Comment 6 devsk 2006-09-14 10:37:37 UTC

That sounds OK to me, as long as it's not too far in the future. Please understand that keeping the patch up to date is a pain.

IMO, it won't reduce the patch much, because the merge_slot-related code is not that big. It's the splitting of the merge() function, the new function that does the actual fork work, and the output-redirection changes that make the patch fat.

How about reviewing the other parts of the patch, like output redirection? That had to be addressed because stdout gets mangled in the parallel case. Moreover, there are proven speed gains (please refer to the forum post where I posted my analysis results) from not writing anything to stdout. If PORT_LOGDIR is set, this is what people should get: nothing but important messages on stdout and everything else in log files. "Important" is driven by the action table, where I have marked setup and postinst as important.

What I would like to see is someone installing it in a chroot (I am doing the same) and playing with it to find any gotchas, then looking at the flow and the various pieces to see if they fit well. I know the patch "looks" behemoth, but it actually isn't; that's due to the way diff works. For example, a few big chunks were simply moved around, which is very obvious when you look at them alongside the original code.

Comment 7 devsk 2006-09-14 10:38:33 UTC

Sorry, mid-air collision... I was replying to Jason...:)

Comment 8 devsk 2006-09-14 18:18:34 UTC

Created attachment 97020
Fix broken "--tree" option.

I will not be uploading the patch separately anymore. Instead, only the ebuild tarball will be updated; it contains the patch.

Comment 9 Jason Stubbs (RETIRED) gentoo-dev 2006-09-15 08:49:45 UTC

I'm planning to do the depgraph patch in the next 24 hours. With regard to the behemothicity (sp? ;), diff -b should fix that when reviewing, I guess. Don't know about the chroot bit though; I'm too used to running uncertain (but known) code.

With regard to updating your overlay only, please post updates to the patch. If there are so many updates that it is bothersome, perhaps it's not ready for merging...

Comment 10 Jason Stubbs (RETIRED) gentoo-dev 2006-09-15 08:52:04 UTC

By the way, only uploading the patch rather than a bzip'ed overlay would be much handier.

Comment 11 devsk 2006-09-15 09:22:23 UTC

I just wanted to upload one thing, and since the patch is included in the ebuild tarball, I thought the ebuild was the better candidate. But I agree with you. The ebuild hasn't changed (except for one epatch line), and the patch isn't changing much either.

Comment 12 devsk 2006-09-15 09:26:25 UTC

Created attachment 97056
Latest patch - it is same as the one in the ebuild tar

I use parallel on my main system and haven't noticed any breakage so far.

Comment 13 devsk 2006-09-15 09:53:04 UTC

>> I'm planning to do the depgraph patch in the next 24 hours..

Jason, are you planning to do it with my patch applied, or on plain 2.1.1, leaving the diff merge and the removal of the merge-slots code to me?

IMO, it would be better if you did it with my patch applied, because that way I would get a code review like nothing else...;-)

I always welcome criticism of my Python usage... Because I never learned the language formally, I am always looking to learn about potentially bad usage.

Comment 14 Zac Medico gentoo-dev 2006-09-15 21:35:43 UTC

(In reply to comment #13)
> IMO, it would be better if you did with my patch in because that way I will get
> a code review like nothing else...;-)

I'm strongly opposed to merging any new code that uses forks in an unsafe manner like parallel-fetch does.  The problem is that some resources (database connections that have writable file descriptors, for example) may not be safe to access from multiple processes concurrently.  For portage to use forks safely, the child process has to close all file descriptors except those explicitly shared by the parent.

Comment 15 devsk 2006-09-15 21:53:51 UTC

> I'm strongly opposed to merging any new code that uses forks in an unsafe
> manner like parallel-fetch does.  The problem is that some resources (database
> connections that have writable file descriptors, for example) may not be safe
> to access from multiple processes concurrently. 
I am sorry if it came across as a request to merge. I meant that he could make whatever depgraph changes he is planning on top of my changes, on his personal test machine. I don't expect anybody to merge my changes into portage without sufficient review and testing.

Please have a look at the code and see what potential issues might arise with it, e.g. with open FDs as you mention. I am doing the same.

Comment 16 Zac Medico gentoo-dev 2006-09-15 22:24:03 UTC

(In reply to comment #15)
> Please have a look at the code and see what potential issue might arise with it
> like e.g. with open FDs as you mention. I am doing the same.

Your forks are unsafe because the child processes can share any number of open file descriptors that they've inherited from the parent. A safe alternative would be to use portage_exec.spawn() to run multiple ebuild(1) commands simultaneously.

Comment 17 devsk 2006-09-15 22:41:11 UTC

>> Your forks are unsafe because the child processes can share any number of open
>> file descriptors that they've inherited from the parent.

Is sharing an FD in itself a problem, or is reading/writing from/to a shared FD the problem? What we need to check is whether the inherited FDs are used by the children. If they are not used, they should be closed and should not cause any problem. If they are needed in the child code path and can potentially conflict with other children or the parent, we will need to worry about this issue and look at specific examples.

Comment 18 Jason Stubbs (RETIRED) gentoo-dev 2006-09-15 22:51:04 UTC

(In reply to comment #17)
> What we need to see is if the inherited FDs are used by the children. If not
> used, they should be closed and should not cause any problem. If they are 
> needed in the child codepath and can potentially conflict with other children 
> or parent, we will need to worry about this issue. We will need to look at 
> specific examples of these.

There's no guarantee either way. The most common case would be the cache plugins: following the API, plugins are free to cache connections and whatnot.

There's also a bunch of locking issues spread throughout vdb updates. The easiest/safest fix would probably be to throw a global lock around just after pkg_test right through to the end of the merge process of a package.
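Jason's global-lock suggestion could look roughly like this: an exclusive `fcntl.flock()` on a shared lock file, held from just after pkg_test to the end of a package's merge. This is a sketch only; `merge_lock` and the lock-file location are hypothetical and are not portage's own locking API.

```python
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def merge_lock(path):
    """Hold an exclusive lock on `path` for the duration of the with-block.

    flock() locks belong to the open file description, so every process that
    opens the same path and enters this block waits its turn.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks while another holder exists
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Each parallel job would wrap its post-pkg_test steps in `with merge_lock(...)`, so builds still overlap while vdb updates happen one at a time.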

Comment 19 Zac Medico gentoo-dev 2006-09-15 22:53:51 UTC

(In reply to comment #17)
> is sharing an FD alone in itself a problem or is reading/writing from/to a
> shared FD a problem?

The sharing itself is the problem, because there is no way to predict the possible interactions that could occur. It's an inherent problem with forks. The only way to use forks with complete safety is to close *all* file descriptors that aren't explicitly shared. For this reason, practical use of a forked python process is quite limited. In general, it's only useful for things like portage_exec.spawn (which closes all file descriptors before calling exec).
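In Python's standard library, the "close everything except what is explicitly shared" rule corresponds to `subprocess.Popen` with `close_fds=True` and `pass_fds`. A minimal sketch, with `subprocess` standing in for portage_exec.spawn (whose internals are not shown in this thread); `spawn_isolated` is a hypothetical name. Note that since Python 3.4, new descriptors are also non-inheritable by default (PEP 446).

```python
import subprocess

def spawn_isolated(argv, keep_fds=(), **popen_kwargs):
    """Start argv in a new process that inherits only fds 0, 1, 2 and keep_fds.

    close_fds=True closes every other descriptor in the child before exec,
    so the new program cannot touch the parent's files or connections.
    """
    return subprocess.Popen(argv, close_fds=True, pass_fds=tuple(keep_fds),
                            **popen_kwargs)
```

Because the child execs a fresh program, anything it needs beyond the explicitly shared descriptors it has to open itself.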

Comment 20 devsk 2006-09-16 13:55:59 UTC

(In reply to comment #19)
> The sharing itself it the problem, because there is no way to predict the
> possible interactions that could occur.  It's an inherent problem with forks. 
> The only way to use forks with complete safety it to close *all* file
> descriptors that aren't explicitly shared.  For this reason, practical use of a
> forked python process is quite limited.  In general, it's only useful for
> things like portage_exec.spawn (which closes all file descriptors before
> calling exec).
I did a little experiment in which I fstat'ed all the FDs present in the child immediately after the fork() in the fork_one_emerge function. All FDs from 3 to 1024 were bad (not open) in all the emerge runs I tried. I think your concern with shared FDs after fork is an academic one.
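The fstat experiment can be reproduced with a small probe. It is a sketch: `open_fds` is a hypothetical name and the 3..1024 range is taken from the description above, not from the patch. It only reports what happens to be open at the moment it runs, which is the limitation raised in the next reply.

```python
import os

def open_fds(low=3, high=1024):
    """Return the descriptors in [low, high) that are currently open.

    os.fstat() raises OSError (EBADF) for a descriptor that is not open.
    """
    found = []
    for fd in range(low, high):
        try:
            os.fstat(fd)
        except OSError:
            continue
        found.append(fd)
    return found
```

Called in the child right after fork(), an empty result matches the "all FDs were bad" observation for that particular run.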

>> because there is no way to predict the possible interactions that could occur.

It is predictable. The application and its code are right in front of you.

I will concentrate on Jason's concern regarding vdb, which is a valid one and more easily fixable....;-)

Comment 21 Zac Medico gentoo-dev 2006-09-16 16:04:02 UTC

(In reply to comment #20)
> I did a little experiment where in I just fstat'ed all the FDs present in the
> child immd. after the fork() in fork_one_emerge function. All FDs from 3 to
> 1024 were BAD in all the emerge runs I tried.

Just because it's not a problem in your current environment doesn't mean that it will never be a problem.

> I think your concern with shared FDs after fork is an academic one.

Potentially unsafe forks are a bad practice, plain and simple. The parallel-fetch code may do it, but we're not going any further down that road.

Comment 22 devsk 2006-09-16 16:41:30 UTC

(In reply to comment #21)
> Just because it's not a problem in your current environment doesn't mean that
> it will never be a problem.

I don't see it as a property of my current environment. It's a property of this application, i.e. the portage code. The code path has no open FDs at the time of the fork in fork_one_emerge. If it does, please point it out in the code (it's there for all of us to see). We are not talking about a third-party closed-source application.

> Potentially unsafe forks are a bad practice, plain and simple.  The

Please prove it's unsafe. I am saying it's safe because I haven't seen otherwise, neither from my look at the code nor from the fstats I have done.

Comment 23 Alec Warner (RETIRED) archtester 2006-09-16 16:47:27 UTC

> > Potentially unsafe forks are a bad practice, plain and simple.  The
> 
> Please prove its unsafe. I am saying its safe because I haven't seen otherwise
> - neither from the look I had at the code nor from the fstats that I have done.

Or... prove it's safe?

See, this argument is silly ;)

Comment 24 Zac Medico gentoo-dev 2006-09-16 17:52:04 UTC

(In reply to comment #22)
> I don't see it as a property of my current environment. Its the property of
> this application i.e. the portage code. The code execution path is not having
> any open FDs at the time of fork in fork_one_emerge. If it has, please point it
> out in the code (its there for all of us to see). We are not talking about a
> third party closed-source application.

If we use unsafe forks, then it puts a limitation on both the portage code and any libraries that it may use.

Comment 25 devsk 2006-09-16 18:53:51 UTC

(In reply to comment #24)
> If we use unsafe forks, then it puts a limitation on both the portage code and
> any libraries that it may use.

But what are our options? portage_exec.spawn?

Comment 26 Zac Medico gentoo-dev 2006-09-16 19:09:59 UTC

(In reply to comment #25)
> But what are our options. portage_exec.spawn?

That's one option. As I said before, you could spawn the ebuild(1) command separately for each package. The spawn function has a returnpid option so that the parent process won't wait for the child to complete.
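The "don't wait for the child" pattern behind a returnpid-style option can be shown with the standard library alone. This sketch uses `os.posix_spawn` (Python 3.8+, POSIX only) rather than portage_exec.spawn, and `spawn_nowait`/`reap_any` are hypothetical helpers; in the scheme discussed here, argv would be an ebuild(1) command line.

```python
import os
import time

def spawn_nowait(argv):
    """Start argv (argv[0] must be a path) and return its pid immediately."""
    return os.posix_spawn(argv[0], argv, os.environ)

def reap_any(pids, interval=0.05):
    """Poll until one of `pids` exits; return (pid, exit_code)."""
    while True:
        for pid in pids:
            done, status = os.waitpid(pid, os.WNOHANG)
            if done:
                code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
                return pid, code
        time.sleep(interval)
```

The parent starts several children, keeps their pids, and reaps whichever finishes first, which is the point where the next package could be started.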

Another option would be to use threading, which will probably be more work to implement but is likely to lead to a better design.

Comment 27 devsk 2006-09-16 19:37:27 UTC

(In reply to comment #26)
> That's one option.  Like I said before, you could spawn the ebuild(1) command
> separately for each package separately.  The spawn function has a returnpid
> option  so that the parent process won't wait for the child to complete.

Doesn't that mean I would be doing everything doebuild currently does, but with the final spawn using returnpid=True? That's a lot of (mostly duplicated) code. Or do you mean I should modify spawnebuild() so that it calls portage_exec.spawn directly instead of portage.spawn (which spawns 'command' in bash or sandbox using portage_exec.spawn) when an additional switch passed to it is True (e.g. returnpid)?

I am wondering what we would gain with this. Ultimately, it will still do a fork() and close all descriptors but 0, 1 and 2. So, to preserve any other FDs, we would need to know about them before we call spawn() in spawnebuild(), which I don't see happening right now, i.e. no call to spawn uses fd_pipes, which defaults to None. And if we knew about these FDs (which was my intention when I asked for specific examples), we could preserve them in the current patch as well.

In a nutshell, I am not convinced that the other option is any better than what the patch does right now. I can close all descriptors other than 0, 1 and 2 in the child immediately after the fork() and still be as safe as the current code is.
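Closing everything but 0, 1 and 2 right after fork() looks roughly like this in plain Python. It is only an illustration (`fork_isolated` is a hypothetical name, and os.sysconf("SC_OPEN_MAX") is assumed as the upper bound); it does not address the objection in the next reply that code running in the child may still expect descriptors the parent opened.

```python
import os

def fork_isolated(work):
    """Fork; in the child, close every descriptor above 2, run work() and
    exit with its integer return value. The parent gets the child's pid."""
    pid = os.fork()
    if pid == 0:                                   # child
        status = 1                                 # used if work() raises
        try:
            os.closerange(3, os.sysconf("SC_OPEN_MAX"))
            status = int(work() or 0)
        finally:
            os._exit(status)                       # never return into the parent's code path
    return pid
```

Unlike the spawn approach, the child keeps running the parent's Python code, so any library state that refers to a now-closed descriptor is silently broken.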

> Another option would be to use threading, which will probably be more work to
> implement but is likely to lead to a better design.

Frankly, this would be beyond me, given the amount of Python I know...:)

Comment 28 Zac Medico gentoo-dev 2006-09-16 19:47:09 UTC

(In reply to comment #27)
> In nutshell, I am not convinced that the other option is any better than what
> the patch does right now. I can close all descriptors other than 0,1,2 in the
> child immd. following the fork() and still be as safe as the current code is.

After you've closed all the file descriptors, there's no assurance that the rest of the portage api or any of the libraries it uses will continue to work correctly.  If you spawn the ebuild(1) command, it will start a fresh python process that will create new file descriptors as necessary (file descriptors that aren't shared with the parent).

Comment 29 devsk 2006-09-17 02:32:16 UTC

> If you spawn the ebuild(1) command, it will start a fresh python
> process that will create new file descriptors as necessary (file descriptors
> that aren't shared with the parent).

Alright, I am going down this path. Does anyone here think spawning the ebuild program to implement parallel merging is a bad idea? Please raise your voice now or forever hold your peace...:)

Comment 30 Zac Medico gentoo-dev 2006-09-17 15:22:26 UTC

(In reply to comment #29)
> Alright, I am going this path. Is there anyone in here who thinks spawning
> ebuild program to implement parallel is a bad idea? Please raise your voice now
> or forever hold your peace....:)

I think it fits the current design of portage pretty well.  The way I imagine it is that the child process will run everything up to the install (or package) phase and then hand off to the parent for the merge phase.
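A rough sketch of that split, with threads standing in for the child processes: builds run concurrently, while every merge happens in the calling thread, one at a time, so vdb updates never overlap. `build_then_merge`, `build` and `merge` are hypothetical, and the packages passed in are assumed to be mutually independent (for example, one merge slot).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def build_then_merge(pkgs, build, merge, jobs=2):
    """Run build(pkg) concurrently, then merge(pkg) serially in this thread.

    build stands for everything up to the install (or package) phase;
    merge stands for the merge phase that touches the vdb.
    """
    merged = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(build, pkg): pkg for pkg in pkgs}
        for fut in as_completed(futures):
            fut.result()            # re-raise a failed build here
            pkg = futures[fut]
            merge(pkg)              # only the calling thread ever merges
            merged.append(pkg)
    return merged
```

Packages are merged in the order their builds finish, and because only one thread merges, no lock around the vdb is needed.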

Comment 31 devsk 2006-09-17 15:50:21 UTC

(In reply to comment #30)
> I think it fits the current design of portage pretty well.  The way I imagine
> it is that the child process will run everything up to the install (or package)
> phase and then hand off to the parent for the merge phase.

That's precisely what I was planning to do as well. A good side effect of this would be that I don't need to worry about vdb locking during the merge.

Does anybody have strong opinions about what I did with output redirection? I didn't get any feedback on that.

Comment 32 Zac Medico gentoo-dev 2006-09-17 16:46:20 UTC

(In reply to comment #31)
> Does anybody have strong opinions about what I did with output re-direction?

I think PORT_LOGDIR should be a requirement for parallel builds.  All PORT_LOGDIR support is currently handled within the doebuild function, but emerge will need some way to redirect all of the child process output (including output from the python side) into the log.

Comment 33 devsk 2006-09-17 19:25:11 UTC

(In reply to comment #32)
> I think PORT_LOGDIR should be a requirement for parallel builds.  All

In the sense that, if PORT_LOGDIR is not set, we 1) disable parallel, or 2) set PORT_LOGDIR to /var/log/portage when parallel is enabled?

But the requirement also raises a maintenance question about how these logs get cleaned up. Do we just leave it to the user to clean up weekly/monthly, or do we provide some sort of switch to enable a weekly/monthly cron job?

> PORT_LOGDIR support is currently handled within the doebuild function, but
> emerge will need some way to redirect all of the child process output
> (including output from the python side) into the log.

portage_exec.spawn() does support a logfile argument, and I added a quiet argument to it. When quiet is True, it disables 'tee' and redirects stdout/stderr to the logfile only.

Comment 34 Zac Medico gentoo-dev 2006-09-17 20:10:35 UTC

(In reply to comment #33)
> (In reply to comment #32)
> > I think PORT_LOGDIR should be a requirement for parallel builds.  All
> 
> in the sense that if PORT_LOGDIR is not set 1.) disable parallel or 2.) set
> PORT_LOGDIR to /var/log/portage if parallel is enabled?

I think we should force the user to set PORT_LOGDIR explicitly, since that's the way it's always been.  If it's unset, we could force emerge to exit immediately with an error message or just disable it.  The advantage of exiting immediately is that the user is sure to notice the problem and correct it.
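The two policies under discussion (refuse to run versus fall back to a default directory) fit in one small check. This is a hypothetical helper, not portage code: `settings` is assumed to be a dict-like mapping, the /var/log/portage fallback is the one suggested in comment 33, and the `strict` flag exists only to show both behaviours.

```python
import sys

DEFAULT_LOGDIR = "/var/log/portage"   # fallback suggested in the thread

def resolve_logdir(settings, parallel, strict=True):
    """Return the log directory for this emerge, or None if logging is off.

    strict=True: exit with an error when parallel is on and PORT_LOGDIR is
    unset, so the user notices. strict=False: fall back to DEFAULT_LOGDIR.
    """
    logdir = settings.get("PORT_LOGDIR")
    if logdir or not parallel:
        return logdir
    if strict:
        sys.exit("emerge: parallel merging requires PORT_LOGDIR to be set")
    return DEFAULT_LOGDIR
```

Exiting makes the misconfiguration visible on the first run; the fallback trades that visibility for zero setup.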

> But the requirement also raises one maintenance question about how these logs
> are cleaned up. Do we just leave it to user to clean up weekly/monthly? Or we
> provide some sort of switch to enable a weekly/monthly cron job?

I think we should just leave it to the user to set up a cron job, since that's the way it's always been.

> portage_exec.spawn() does support logfile arg and I added quiet argument to it.
> quiet when True disables 'tee' and redirects the output/error to logfile.

Instead of adding a quiet parameter, you can use the fd_pipes parameter. That's how I've reimplemented parallel-fetch here:

http://sources.gentoo.org/viewcvs.py/portage/main/trunk/bin/emerge?r1=4459&r2=4467&makepatch=1&diff_format=h

I should have called logfile.close() after the spawn call, because the parent process shouldn't keep that file descriptor open.
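The redirection described here (hand the child a log file as its stdout/stderr, then close the parent's copy right after spawning) looks like this with `subprocess`. It is a sketch rather than portage's own spawn code; `spawn_logged` is a hypothetical name.

```python
import subprocess

def spawn_logged(argv, log_path):
    """Start argv with stdout and stderr appended to log_path; don't wait.

    The parent's handle on the log is closed as soon as the child has been
    started, so only the child keeps the file open.
    """
    with open(log_path, "ab") as log:
        return subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT,
                                close_fds=True)
```

The `with` block plays the role of the logfile.close() call: the parent's descriptor is released as soon as Popen returns.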

Comment 35 devsk 2006-09-17 22:26:27 UTC

(In reply to comment #34)
> I think we should force the user to set PORT_LOGDIR explicitly, since that's
> the way it's always been.  If it's unset, we could force emerge to exit
> immediately with an error message or just disable it.  The advantage of exiting
> immediately is that the user is sure to notice the problem and correct it.

I think the best user experience is one where users don't have to do anything. I don't see any problem in using /var/log/portage/ by default if they enable parallel and don't set PORT_LOGDIR. Most products use /var/log/<product> without requiring users to set that dir.

> I think we should just leave it to the user to set up a cron job, since that's
> the way it's always been.

Again, it's the user experience that matters. What do we lose by providing a conf.d file for portage with two policy variables, one saying to tar up the logs at the end of each day/week/month and the other saying to delete logs every week/month, plus a script set up in cron? I think portage needs to manage its own garbage...:)

> Instead of adding a quiet parameter, you can use the fdpipes parameter instead.

Yes, I thought about that, but it would break the useful 'tee' invocation used currently. 'quiet' keeps the current behaviour but lets the output optionally go either to the logfile only, or to stdout as well as the logfile. Please refer to the changes in the actionmap in my patch and you will see what I mean.
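The 'quiet' behaviour described here amounts to: log-only for most phases, tee to the terminal and the log for the important ones (setup and postinst, per the action table mentioned earlier in the thread). A sketch under those assumptions; `run_phase` and `IMPORTANT_PHASES` are hypothetical names, and this is not the patch's actionmap code.

```python
import subprocess
import sys

IMPORTANT_PHASES = {"setup", "postinst"}   # phases marked important in the thread

def run_phase(phase, argv, log_path, out=None):
    """Run one phase and append its output to log_path; also echo it to `out`
    (default: sys.stdout) when the phase is important. Returns the exit code."""
    out = sys.stdout if out is None else out
    with open(log_path, "ab") as log:
        if phase not in IMPORTANT_PHASES:
            # quiet: the child writes straight into the log file
            return subprocess.call(argv, stdout=log, stderr=subprocess.STDOUT)
        # tee: read the child's output line by line and copy it to both places
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        with proc.stdout:
            for line in proc.stdout:
                log.write(line)
                out.write(line.decode(errors="replace"))
        return proc.wait()
```

With stdout redirected wholesale, as suggested in the next reply, only the quiet branch would remain, which is exactly the loss of on-screen setup/postinst messages being discussed.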

Comment 36 Zac Medico gentoo-dev 2006-09-17 23:25:10 UTC

(In reply to comment #35)
> yes, I thought about that. But it will break the useful 'tee' invocation used
> currently. So, 'quiet' keeps the current behaviour but makes it so that the
> output can be optionally sent to only logfile or stdout as well as to logfile.
> Please refer to the changes in the actionmap in my patch and you will
> understand what I am trying to say.

We don't really need to put quiet into the actionmap like that though.  If we redirect all of the ebuild command output to a log, the ebuild command doesn't even have to open the log itself.  It can just output to stdout and stderr as usual.

Comment 37 Jason Stubbs (RETIRED) gentoo-dev 2006-09-18 02:28:49 UTC

My work to build up a full dependency graph is now in subversion and attached to bug #147766. You should be able to use it in a similar fashion to the following pseudo code.

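active_pkgs = []
# Keep going while packages remain to merge or builds are still running.
while not mygraph.is_empty() or active_pkgs:
    if len(active_pkgs) == max_concurrency:
        # All job slots are busy: block until one package finishes.
        wait_for_pkg_to_finish()
    if pkg_has_finished():
        pkg = pkg_that_has_finished()
        active_pkgs.remove(pkg)
        mygraph.remove(pkg)
        continue
    # Leaf nodes have no unmerged deps left; skip ones already building.
    next_pkgs = [pkg for pkg in mygraph.leaf_nodes() if pkg not in active_pkgs]
    if not next_pkgs:
        if active_pkgs:
            # Nothing new can start until a running package finishes.
            wait_for_pkg_to_finish()
            continue
        # Only soft (circular) deps remain: break the cycle with one package.
        next_pkgs = mygraph.leaf_nodes(ignore_soft_deps=True)[:1]
    next_pkgs = next_pkgs[:max_concurrency - len(active_pkgs)]
    for pkg in next_pkgs:
        start_next_pkg(pkg)
        active_pkgs.append(pkg)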

This code assumes that there are no hard unresolveable circular dependencies, but that should be checked before heading into the merging stage anyway.
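For experimenting with the loop above outside portage, here is a self-contained version. It models the graph as a plain dict of package to dependencies, uses a thread pool in place of real emerge jobs, ignores soft dependencies, and raises an error on a cycle instead of breaking it; `schedule` and `build` are hypothetical names, not the digraph API from bug #147766.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def schedule(deps, build, max_concurrency=2):
    """Run build(pkg) for every package in deps, at most max_concurrency at a
    time, starting a package only once all its dependencies are merged.
    Returns the packages in the order they finished."""
    graph = {pkg: set(needs) for pkg, needs in deps.items()}  # private copy
    running = {}            # future -> package
    finished = []
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        while graph:
            busy = set(running.values())
            # Leaf nodes: nothing they need is still in the graph.
            ready = sorted(p for p, needs in graph.items()
                           if p not in busy and needs.isdisjoint(graph))
            for pkg in ready[:max_concurrency - len(running)]:
                running[pool.submit(build, pkg)] = pkg
            if not running:
                raise RuntimeError("unresolvable circular dependency")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                pkg = running.pop(fut)
                fut.result()            # propagate a failed build
                del graph[pkg]          # its dependents may now be leaves
                finished.append(pkg)
    return finished
```

With the gentoolkit example from the description, gmp and perl start together, and gentoolkit is always last.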

Comment 38 Zac Medico gentoo-dev 2006-09-18 03:29:06 UTC

Anonymous svn is supposed to be available soon. Until then, you may want to use the latest snapshot from here: http://dev.gentoo.org/~zmedico/portage/snapshots/

Comment 39 devsk 2006-09-18 07:22:55 UTC

(In reply to comment #36)
> We don't really need to put quiet into the actionmap like that though.  If we
> redirect all of the ebuild command output to a log, the ebuild command doesn't
> even have to open the log itself.  It can just output to stdout and stderr as
> usual.

I don't understand this part (maybe I am too slow today). The intention was to send certain phases (e.g. setup, postinst) of the ebuild merge process to stdout as well as to the logfile. If I did what you just said, I would miss these important messages on stdout, because without the 'tee' they go to only one place: either stdout (which ends up in the logfile with my ebuild invocation) or the logfile.

Comment 40 Alec Warner (RETIRED) archtester 2006-09-18 07:28:18 UTC

(In reply to comment #35)
> (In reply to comment #34)
> > I think we should force the user to set PORT_LOGDIR explicitly, since that's
> > the way it's always been.  If it's unset, we could force emerge to exit
> > immediately with an error message or just disable it.  The advantage of exiting
> > immediately is that the user is sure to notice the problem and correct it.
> 
> I think a good user experience is always when they don't have to do anything. I
> don't see any problem in using /var/log/portage/ by default if they enable
> parallel and don't set PORT_LOGDIR. Most products use /var/log/<product>
> without requiring users to set that dir.
> 

The most common case where PORT_LOGDIR isn't set is a first-time install; I'd rather have emerge die telling me I have a bad config than have it log to the wrong place for 2 months before I realize it.

If you consider PORT_LOGDIR as a "required" configuration directive then there is no problem dying on it.

> > I think we should just leave it to the user to set up a cron job, since that's
> > the way it's always been.
> 
> Again, its the user experience that matters. What do we lose by providing a
> conf.d file for portage and two policy variables, one of which just state that
> tar up the logs at end of the day/week/month and other one saying delete logs
> every week/month. A script is setup in the cron. I think portage needs to
> manage its garbage...:)
> 

The user experience for portage has been traditionally "do it yourself"; maybe I don't want to use cron; maybe I want to use logrotate; maybe I'm on OSX and I have this Launch thing; who knows.

> > Instead of adding a quiet parameter, you can use the fdpipes parameter instead.
> 
> yes, I thought about that. But it will break the useful 'tee' invocation used
> currently. So, 'quiet' keeps the current behaviour but makes it so that the
> output can be optionally sent to only logfile or stdout as well as to logfile.
> Please refer to the changes in the actionmap in my patch and you will
> understand what I am trying to say.
>

Comment 41 Jason Stubbs (RETIRED) gentoo-dev 2006-09-18 07:38:01 UTC

(In reply to comment #39)
> I don't understand this part (may be I am too slow today). The intention was to
> redirect certain phases(e.g. setup, postinst) of the ebuild merge process to
> stdout as well as to the logfile. If I did what you just said, I will miss
> these important messages on the stdout, because without the 'tee', they go only
> in one place, either stdout (which will end up in logfile with my ebuild
> invocation) or logfile.

The messages you're looking for already have special handling by way of PORTAGE_ELOG*. For parallel merges, that's probably the best way of handling them. Even if there's little output, having pkg_setup() messages go to the screen gives no indication of which package they're coming from. There are also important messages in src_compile() just as often as in pkg_setup().

Comment 42 devsk 2006-09-18 09:04:44 UTC

(In reply to comment #40)
> The user experience for portage has been traditionally "do it yourself"; maybe
> I don't want to use cron; maybe I want to use logrotate; maybe I'm on OSX and I
> have this Launch thing; who knows.

I never meant that we push something down somebody's throat. I was talking about a default but configurable setup. Traditions are meant to be broken...;-) Traditionally, portage has thrown all its garbage on the screen, which not only slows down the emerge (please don't say use screen, because I proved in my thread that a detached screen is still 2-3 times slower) but has also forced people to write their own log parsers and invent elog just to get to the important messages. The solution I have proposed is a much cleaner, in-portage solution. It's precisely what users should see: important messages on the screen and the rest in the logs. I can now scroll back in my terminal, see the whole 'emerge -e system' output and look at all the important messages it spewed.

Comment 43 devsk 2006-09-18 09:12:10 UTC

(In reply to comment #41)
> The messages your looking for already have special handling by way of
> PORTAGE_ELOG*.

ELOG is an afterthought. Parallel or no parallel, people should see only important messages on the screen.

> When talking parallel merges, that's probably the best way of
> handling it. Even if there's little output, having pkg_setup() messages go to
> the screen gives no indication of what package it's coming from.

Typically, they do (I haven't seen a single one so far that said something I couldn't relate to a package, and I have done emerge -e system and emerge -e world with my patch a fair number of times). If they don't, then ebuild writers should be asked to modify their ebuilds.

> There's also
> important messages in src_compile() just as often as pkg_setup() too.

nobody should be putting end user messages in src_compile(). That's bad ebuild manners....;-)

It seems to me that no one has even tried the patch/overlay to see how it feels. Just try an emerge -e system with the patch in, at least once. You may disagree with me on the technicalities, but you have to understand that end users are not developers.

Comment 44 Alec Warner (RETIRED) archtester

2006-09-18 10:31:21 UTC

> The solution that I have proposed is much cleaner and in-portage solution. Its
> precisely what users should see: important messags on the screen and rest in

The person deciding what the user should see is the user; not us.  As long as there is a way to define where the output goes (unlike some; I like watching stuff scroll by) then I don't really care what the default is; as I will just change it.

Comment 45 devsk 2006-09-18 10:53:48 UTC

(In reply to comment #44)
> The person deciding what the user should see is the user; not us.  As long as
> there is a way to define where the output goes (unlike some; I like watching
> stuff scroll by) then I don't really care what the default is; as I will just
> change it.

But there is no portage support for not seeing not-so-useful stuff on the screen. How does a user achieve that? And from what I can tell on forums, most users want that.

With the patch, there is support for you to see the stuff scroll by.

Comment 46 Donnie Berkholz (RETIRED) gentoo-dev

2006-09-18 11:20:10 UTC

(In reply to comment #45)
> But there is no portage support for not seeing not-so-useful stuff on the
> screen. How does a user achieve that? And from what I can tell on forums, most
> users want that.

MAKEOPTS="-s" is the easiest way..

Comment 47 devsk 2006-09-18 11:38:39 UTC

(In reply to comment #46)
> MAKEOPTS="-s" is the easiest way..

It kills the logging as well. Moreover, it still leaves too much on the screen, e.g. end users don't really care about "checking for size_t....". That output is useful only if an error occurs, e.g. when a configure check failure causes the package emerge to fail.

Comment 48 Mudrii 2006-09-18 17:36:23 UTC

I think it is a useful and flexible feature: parallel compilation without the screen running mad with stuff I do not need, and I can check the compiler logs later if a problem arises.
I tested the parallel patch and it works well as far as I can tell.

Comment 49 Marius Mauch (RETIRED) gentoo-dev

2006-09-19 07:19:56 UTC

Disclaimer: I haven't actually looked at the patch (don't have time atm).

(In reply to comment #35)
> I think a good user experience is always when they don't have to do anything.

By that definition, an install CD that unconditionally formats a user's hard drive and installs the OS contained on the CD on it would be a good user experience ...

> I don't see any problem in using /var/log/portage/ by default if they enable
> parallel and don't set PORT_LOGDIR. Most products use /var/log/<product>
> without requiring users to set that dir.

Agreed, except that for portage PORT_LOGDIR has two purposes: a) define the log location and b) en-/disable logging. This is probably a bug in itself and should be fixed at some point (as PORT_LOGDIR is now also used by other components to assemble paths).
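The default suggested in comment 35 and the dual role of PORT_LOGDIR can be sketched as one small Python function (portage itself is written in Python). This is a sketch under assumptions: `resolve_logdir`, its `parallel` flag and the plain-dict environment are hypothetical, not portage API; only PORT_LOGDIR and the /var/log/portage path come from this thread.

```python
from typing import Mapping, Optional

# Default location proposed in comment 35 (hypothetical constant name).
DEFAULT_LOGDIR = "/var/log/portage"


def resolve_logdir(env: Mapping[str, str], parallel: bool) -> Optional[str]:
    """Return the build-log directory, or None if logging is disabled.

    Hypothetical helper: as described above, a set PORT_LOGDIR both enables
    logging and gives its location. When parallel merging is on, an unset
    PORT_LOGDIR falls back to a default instead of disabling logging, because
    parallel output has to go somewhere other than the terminal.
    """
    logdir = env.get("PORT_LOGDIR", "").strip()
    if logdir:
        return logdir
    if parallel:
        return DEFAULT_LOGDIR
    return None
```

Separating "where to log" from "whether to log", as suggested, would mean replacing the `if logdir` test with an independent on/off setting.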

(In reply to comment #43)
> (In reply to comment #41)
> > The messages your looking for already have special handling by way of
> > PORTAGE_ELOG*.
> 
> ELOG is an after thought. Parallel or no parallel, people should see only
> important messages on the screen.

Your opinion. Doesn't mean it's the absolute truth.

> > When talking parallel merges, that's probably the best way of
> > handling it. Even if there's little output, having pkg_setup() messages go to
> > the screen gives no indication of what package it's coming from.
> 
> typically, they do (I haven't seen a single one so far which said something and
> I couldn't it relate to a package, and I have done emerge -e system and emerge
> -e world with my patch a fair number of times). If they don't, then ebuild
> writers should be asked to modify their ebuilds.

Just because you can doesn't mean everyone else can (also think about automated processes here). And why should ebuild writers mention the package name in a message if in all current use cases that info is available via other means? (IOW: redundant information)

> > There's also
> > important messages in src_compile() just as often as pkg_setup() too.
> 
> nobody should be putting end user messages in src_compile(). That's bad ebuild
> manners....;-)

Right ...

> It seems to me that no one has tried to even use the patch/overlay to test it
> out and see how it feels. Just try an emerge -e system with the patch in, at
> least once. You may disagree with me on the techinicalities, but you have to
> understand that end users are not developers.

It doesn't matter how it "feels". It really should limit deviation from the standard behavior to an absolute minimum (and changes to the default behavior should be done in separate bugs/patches).

(In reply to comment #47)
> (In reply to comment #46)
> > MAKEOPTS="-s" is the easiest way..
> 
> it kills the logging as well. Moreover, it still leaves too much on screen e.g.
> end users don't really care about what "checking for size_t...." is? It may be
> useful only if an error occurs e.g. a configure check failure leads to pkg
> emerge failure.

How does MAKEOPTS="-s" kill logging? All it does is suppress make's command echoing, which is only relevant in extreme cases (so it's only a partial solution for the problem at hand anyway).

Summary: If you want to get this in you better isolate the individual changes (so a "parallel" patch only adds that feature without changing the normal behavior more than necessary).

Btw, a utopian thought of mine (probably not feasible inside portage) would be to parallelize not at the ebuild level but at the phase level, so you can allocate resources better: if you have a lot of CPU but little IO, use multiple compile processes but few unpack/install processes (and vice versa).
Just something to think about.
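Phase-level resource allocation can be sketched with one counting semaphore per resource class, so CPU-bound and IO-bound phases draw from separate pools. The phase-to-class mapping, slot counts and the `PhaseScheduler` class are illustrative assumptions, not anything in portage.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical classification: compile is CPU-bound, unpack/install are IO-bound.
PHASE_CLASS = {"unpack": "io", "compile": "cpu", "install": "io"}


class PhaseScheduler:
    """Limit how many packages may be in each *kind* of phase at once."""

    def __init__(self, cpu_slots: int, io_slots: int) -> None:
        self._slots = {"cpu": threading.BoundedSemaphore(cpu_slots),
                       "io": threading.BoundedSemaphore(io_slots)}
        self._lock = threading.Lock()
        self._active = {"cpu": 0, "io": 0}
        self.peak = {"cpu": 0, "io": 0}  # highest concurrency seen per class

    def run_phase(self, pkg: str, phase: str, work) -> None:
        cls = PHASE_CLASS[phase]
        with self._slots[cls]:  # blocks until a slot of this class is free
            with self._lock:
                self._active[cls] += 1
                self.peak[cls] = max(self.peak[cls], self._active[cls])
            try:
                work(pkg, phase)
            finally:
                with self._lock:
                    self._active[cls] -= 1

    def build(self, pkg: str, work) -> None:
        # Phases of one package still run in order; only the slot pool differs.
        for phase in ("unpack", "compile", "install"):
            self.run_phase(pkg, phase, work)


def run_all(pkgs, work, cpu_slots=4, io_slots=1) -> PhaseScheduler:
    sched = PhaseScheduler(cpu_slots, io_slots)
    with ThreadPoolExecutor(max_workers=8) as pool:
        for pkg in pkgs:
            pool.submit(sched.build, pkg, work)
    return sched  # the with-block waited for every build to finish
```

With a machine that has plenty of CPU but a slow disk, `cpu_slots` would be raised and `io_slots` kept low; the opposite machine would swap them.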

Comment 50 devsk 2006-09-19 08:46:42 UTC

(In reply to comment #49)
> Agreed, except that for portage PORT_LOGDIR has two purposes: a) define the log
> location and b) en-/disable logging. This is probably a bug in itself and
> should be fixed at some point (as PORT_LOGDIR is now also used by other
> components to assemble paths).

I have no problems with having a 'log' feature but I thought current mechanism worked well and was sufficient as an enabler as well as a location.

> > ELOG is an after thought. Parallel or no parallel, people should see only
> > important messages on the screen.
> 
> Your opinion. Doesn't mean it's the absolute truth.

You mean the absolute truth is that a user, Joe, who just wants to enjoy Gentoo, wants to look at "checking for sigprocmask..." messages? Nope. Ask some real users (and not developers). He only wants to know if he "needs to run vmware-config.pl after installing vmware-server". And he wants to be able to cut&paste his errors onto the forums.

> Just because you can doesn't mean everyone else can (also think about automated
> processes here). And why should ebuild writers mention the package name in a
> message if in all current use cases that info is available via other means?
> (IOW: redundant information)

I don't see the package name as *required* in the message. The messages are alerts for users and should be self-descriptive (i.e. if the message doesn't make sense on its own without the package name, the ebuild writer should put the name in there; how non-intuitive is that?). If they are not self-descriptive, ebuild writers should be required to make them so.

> It doesn't matter how it "feels". It really should limit deviation from the
> standard behavior to an absolute minimum (and changes to the default behavior
> should be done in separate bugs/patches).

I understand that this change looks drastic, but it really is not. People fire off emerge in a terminal or 'screen' session and go back only to see whether it has completed. The behaviour will still be the same, only faster and more meaningful this time. It's a change that is required for parallel. As long as the changes are discussed and designed well, and correctness is maintained, I don't see any problems with drastic changes.

> How does MAKEOPTS="-s" kill logging? All it does is preventing makes command
> echoing which is only relevant in extreme cases (so it's only a partial
> solution for the problem at hand anyway).

Marius, I know what it does, I have used it myself. What I meant was that it is a big hammer and I don't see my gcc commands in my log file. It kills stdout and as a result, nothing goes to log file either. Moreover, it doesn't kill everything...;-)

> Summary: If you want to get this in you better isolate the individual changes
> (so a "parallel" patch only adds that feature without changing the normal
> behavior more than necessary).

This can definitely be done. But output handling is a requirement for parallel. So, we need to finalize and get the redirection part in before parallel if we want a two-step patch.

If people agree with my proposal, I can prepare a separate patch for the output redirection related stuff.

Comment 51 Donnie Berkholz (RETIRED) gentoo-dev

2006-09-19 08:54:08 UTC

(In reply to comment #50)
> You mean the absolute truth is that a user joe who just wants to enjoy gentoo,
> wants to look at "checking for  sigprocmask..." messages? Nope. Ask some real
> users (and not developers). He only wants to know if he "need to run
> vmware-config.pl after installing vmware-server". And he wants to be able to
> cut&paste his errors onto the forums.

No, but Joe does want to know that his compilation hasn't locked up and is proceeding rather than being stuck in the same command for hours. It would be useful if every package used a Kconfig-like build system (e.g. kernel, udev, samba), but we can't do anything about that.

Comment 52 devsk 2006-09-19 09:59:42 UTC

(In reply to comment #51)
> No, but Joe does want to know that his compilation hasn't locked up and is
> proceeding rather than being stuck in the same command for hours. It would be
> useful if every package used a Kconfig-like build system (e.g. kernel, udev,
> samba), but we can't do anything about that.

E.g. he will see a message saying "emerging zlib..." for the last hour. He has the option of looking at the log and deciding whether the package build has locked up or not. You are arguing for a case that happens 0.001% of the time (I have seen builds fail on me, but never freeze on me in my last 10 years of open source usage), and one that can already be handled optionally.

Comment 53 devsk 2006-09-19 13:48:40 UTC

So, have I convinced everybody about the output?...:-P

I have Jason's (depgraph and vdb related) and Zac's (fork related) comments in my todo list. Is there anything else?

please ack, so I can start to get going in this direction.

Comment 54 devsk 2006-09-20 07:25:50 UTC

Jason, --resume will be broken in parallel with the depgraph changes you mentioned. In my patch the merge slot was kept as in "['ebuild', '/', 'sys-apps/portage-2.1.1', 'merge', '0']", so even though I would get a linear list from mtimedb["resume"], I could easily convert it into my merge slot list. This would mean that I still need to keep merge slot in mymergelist.
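Recovering per-slot merge lists from a flat resume list can be sketched as a group-by on the last field. This assumes every entry has the five-field shape quoted above (`[type, root, cpv, action, slot]`, slot as a string) and that entries within a slot should keep their original order; `group_by_merge_slot` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List

Entry = List[str]  # e.g. ['ebuild', '/', 'sys-apps/portage-2.1.1', 'merge', '0']


def group_by_merge_slot(mergelist: List[Entry]) -> List[List[Entry]]:
    """Rebuild per-slot merge lists from a linear resume list.

    Assumes the fifth field is the merge slot as a string index. Slots come
    back in numeric order; entries keep their relative order inside a slot.
    """
    slots: Dict[int, List[Entry]] = defaultdict(list)
    for entry in mergelist:
        slots[int(entry[4])].append(entry)
    return [slots[i] for i in sorted(slots)]
```

This only works while the slot field survives in the saved list, which is the point made above about keeping it in mymergelist.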

Comment 55 Marius Mauch (RETIRED) gentoo-dev

2006-09-21 10:59:04 UTC

About how to handle output:
redirect all the usual output of the build process (everything that is generated by `emerge foo`) to the logfiles and only print status messages to stdout (basically the info that is logged in emerge.log).
No fancy filtering.

Comment 56 devsk 2006-09-21 11:59:19 UTC

> redirect all the usual output of the build process (everything that is
> generated by `emerge foo`) to the logfiles and only print status messages to
> stdout (basically the info that is logged in emerge.log).
> No fancy filtering.

Why would printing messages like "Run vmware-config.pl before running vmware" or "Aggressive flags in OO can potentially break it, don't file bugs on bugs.gentoo.org if you do this" on the stdout be bad? I consider it a requirement not something fancy.

Please help me understand your concern.

Comment 57 Zac Medico gentoo-dev

2006-09-21 12:25:53 UTC

How about if we direct all of the ebuild command output to the log file and display the tail of the log on failure, along with the path of the complete log file so that the user can find it easily?  If the parent process handles the merge phase, then we can have it show the preinst, merge, and postinst output on the terminal, and log the preinst/postinst to the log file as well.
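The "everything to the log, tail on failure" idea can be sketched with the standard library alone. `run_logged` and its `tail_lines` parameter are hypothetical; the real ebuild invocation in portage differs.

```python
import subprocess
import sys
from collections import deque


def run_logged(cmd, logpath, tail_lines=20):
    """Run cmd with stdout and stderr going only to logpath.

    On success nothing reaches the terminal. On failure, the last tail_lines
    lines of the log and the full log path are written to stderr.
    Returns the command's exit status.
    """
    with open(logpath, "ab") as log:
        status = subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
    if status != 0:
        with open(logpath, "r", errors="replace") as log:
            tail = deque(log, maxlen=tail_lines)  # keeps only the last N lines
        sys.stderr.write("".join(tail))
        sys.stderr.write(f"\n!!! Build failed; full log: {logpath}\n")
    return status
```

A `deque` with `maxlen` reads the log once and never holds more than `tail_lines` lines, which matters for multi-megabyte build logs.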

Comment 58 devsk 2006-09-21 13:21:55 UTC

(In reply to comment #57)
> How about if we directly all of the ebuild command output to the log file and
> display the tail of the log on failure, along with the path of the complete log
> file so that the user can find it easily?  If the parent process handles the
> merge phase, then we can have that show the preinst, merge, and postinst on the
> terminal, and log the preinst/postinst to the log file as well.

You described exactly what the patch does. The failure case is handled exactly like that (with the tail of the log and a pointer to the full log path) in the current patch (the one attached to the bug). The patch logs everything into the logfile but shows just setup and postinst on the terminal. I don't understand what the problem is.

Is your point that we want to see preinst, merge and postinst messages on the stdout, instead of just setup and postinst?

If the parent is doing the merge (which it is, per our previous discussions), and we propose that it also send its output to the terminal as well as log it, we still miss the setup messages and get the junk merge messages.

Can someone please explain what's so evil about what I have done with output redirection? We are trying to find so many other ways, but nobody tells what's really wrong with what I did. Apparently, no one's bothered to install it, and test it to see how it performs in different failure conditions.

Comment 59 Zac Medico gentoo-dev

2006-09-21 13:45:56 UTC

(In reply to comment #58)
> Can someone please explain what's so evil about what I have done with output
> redirection? We are trying to find so many other ways, but nobody tells what's
> really wrong with what I did.

You've put a lot of special case logic (quiet in the actionmap, for example) into the output handling.  Generally speaking, it's best to avoid special cases and treat things more generically whenever possible.

Comment 60 devsk 2006-09-21 14:04:41 UTC

(In reply to comment #59)
> You've put a lot of special case logic (quiet in the actionmap, for example)
> into the output handling.  Generally speaking, it's best to avoid special cases
>  and treat things in a more generically whenever possible.

The reason for putting quiet in action map is simple. If we decide that a new phase needs similar output handling in future, we can simply add it there and make its quiet False. I thought it was more generic and fine-grained solution for future than just having everything in the logfile-only or stdout-only for all phases.

It would be special-case handling if I hadn't put it in the actionmap and had instead scattered checks for different phases around the code. With the actionmap, it's nicely consolidated in one place. Whether running a phase will emit output is a property of that phase; where else would this property be consolidated but in the actionmap?
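The per-phase `quiet` flag described above can be sketched as a dict lookup that decides between "log only" and "tee to log and terminal". The `ACTIONMAP` below holds only that flag and is an assumption; the real actionmap entries carry other keys, and treating unknown phases as quiet is a choice made for this sketch.

```python
from typing import Dict, Iterable, TextIO

# Hypothetical actionmap subset with only the per-phase "quiet" flag.
ACTIONMAP: Dict[str, Dict[str, bool]] = {
    "setup":    {"quiet": False},  # setup checks: tee to terminal
    "unpack":   {"quiet": True},
    "compile":  {"quiet": True},
    "install":  {"quiet": True},
    "postinst": {"quiet": False},  # post-install instructions: tee to terminal
}


def route_phase_output(phase: str, lines: Iterable[str],
                       logfile: TextIO, terminal: TextIO) -> None:
    """Always write to the log; also echo to the terminal if the phase isn't quiet."""
    quiet = ACTIONMAP.get(phase, {"quiet": True})["quiet"]
    for line in lines:
        logfile.write(line)
        if not quiet:
            terminal.write(line)  # the 'tee' case
```

Making another phase print to the terminal is then the "one word change" mentioned in this thread: flip its `quiet` value.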

Comment 61 Zac Medico gentoo-dev

2006-09-21 15:57:23 UTC

(In reply to comment #60)
> It would be special case handling if I hadn't put it in action map and put
> checks for different phases scattered around code. With actionmap, its nicely
> consolidated in one place. Whether running a phase will emit output is a
> property of that phase, where else to consolidate this property but the
> actionmap?

You've hard coded special cases for output into the action map.  If the verbosity level is going to vary between phases like that, it should be user configurable.

Comment 62 devsk 2006-09-21 16:08:23 UTC

(In reply to comment #61)
> You've hard coded special cases for ouput into the action map.  If the
> verbosity level is going to vary between phases like that, it should be user
> configurable.

We developers wrote the phases. We will be writing the new ones if we do. The control belongs in our hands. Users only see a coherent picture that they are supposed to see: something wrong with their setup if any, their package is being installed, and these are the steps to follow once the pkg is installed if any. Where do users come in picture to decide what a phase should output? If some of us (devs) decide that another phase needs to throw something on the terminal, that's a one word change.

The action map would be considered hard coded if it was desired that this behaviour be controlled by the end-user.

Comment 63 Alec Warner (RETIRED) archtester

2006-09-21 20:30:15 UTC

(In reply to comment #62)
> (In reply to comment #61)
> > You've hard coded special cases for ouput into the action map.  If the
> > verbosity level is going to vary between phases like that, it should be user
> > configurable.
> 
> We developers wrote the phases. We will be writing the new ones if we do. The
> control belongs in our hands. Users only see a coherent picture that they are

"Put another way, the Gentoo philosophy is to create better tools. When a tool is doing its job perfectly, you might not even be very aware of its presence, because it does not interfere and make its presence known, nor does it force you to interact with it when you don't want it to. The tool serves the user rather than the user serving the tool."

> supposed to see: something wrong with their setup if any, their package is
> being installed, and these are the steps to follow once the pkg is installed if
> any. Where do users come in picture to decide what a phase should output? If
> some of us (devs) decide that another phase needs to throw something on the
> terminal, that's a one word change.

The primary user of portage is me; when I think of a feature; I'm the first person that I think of (because I'm the guy I'm writing for; users getting benefits is just a fortunate consequence of FOSS).  I *AM* a user; as much as you think I'm some uber dev who knows what is going on (which is utterly false).
So when you say "why should a user be able to control the output of phases" it's because I think it would be useful to do so.

Put another way; I guess we should remove the "ebuild" tool because "the developers" determined that the order of phases will be "x,y,z" and god forbid you only want to run one phase.

I realize the current codebase is not full of choices (and this seems a bit hypacritical(spelled-wrong); however I think this should be a goal of the portage team.

Comment 64 devsk 2006-09-21 20:48:55 UTC

(In reply to comment #63)
Alec, you are too abstract for me. But you are sort of proving my point. If you strongly feel that users need control over the output of the phases, then why aren't we trying to provide that, instead of using a big-hammer approach wherein everything goes to the logfile and/or stdout?

If a feature needs to be introduced wherein you expose this control to users, actionmap can be populated from that input very easily. This patch and the actionmap changes that I did, can still be the basis for that. Do you disagree with that?
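Populating the per-phase flags from user input can be sketched as a default set of "loud" phases that a user setting overrides. `PORTAGE_TERMINAL_PHASES` is a made-up variable name used only for this sketch, and the phase list is illustrative.

```python
from typing import Dict, Mapping

DEFAULT_LOUD_PHASES = ("setup", "postinst")  # defaults proposed in this patch
ALL_PHASES = ("setup", "unpack", "compile", "test", "install", "preinst", "postinst")


def build_quiet_flags(env: Mapping[str, str]) -> Dict[str, bool]:
    """Derive per-phase quiet flags, letting a user setting override the default.

    PORTAGE_TERMINAL_PHASES (hypothetical) lists, space-separated, the phases
    whose output should also reach the terminal. If it is unset, the default
    set is used; if it is set but empty, every phase is quiet.
    """
    raw = env.get("PORTAGE_TERMINAL_PHASES")
    loud = set(raw.split()) if raw is not None else set(DEFAULT_LOUD_PHASES)
    return {phase: phase not in loud for phase in ALL_PHASES}
```

Distinguishing "unset" from "set to empty" lets a user who wants a silent terminal say so explicitly without losing the default for everyone else.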

Comment 65 Marius Mauch (RETIRED) gentoo-dev

2006-09-23 08:54:10 UTC

Again, the main thing here is to isolate changesets. I haven't said that your solution is completely bad, but *selective* output redirection is IMHO a very different issue than parallel compilation, so these two things shouldn't be lumped together.

Comment 66 devsk 2006-09-23 09:26:22 UTC

(In reply to comment #65)
> Again, the main thing here is to isolate changesets. I haven't said that your
> solution is completely bad, but *selective* output redirection is IMHO a very
> different issue than parallel compilation, so these two things shouldn't be
> lumped together.
> 

I am isolating the changesets, but I need to get the redirection change in first because parallel depends on it (perhaps an even stronger requirement than in the non-parallel case). It would be a waste if I developed something the devs here don't agree with. I don't want to do that. So there is nothing wrong with checking whether I am on the right track.

Comment 67 Zac Medico gentoo-dev

2006-09-23 13:36:37 UTC

(In reply to comment #66)
> I am isolating the changesets but I need to have redirection change in before
> because parallel depends on it (perhaps more strong of a requirement than for
> the non-parallel case).

Sorry, but I don't think any of us portage devs are convinced that the output should be special cased per-phase like that.

> It will be a waste if I develop something which devs here don't agree with.

Whether you want to continue development of the parallel build feature or not, I'm quite certain that this feature will be implemented before the final release of portage-2.1.2 final.  You can keep hacking on it or just sit back and wait for it.  The choice is yours.

Comment 68 devsk 2006-09-23 14:44:35 UTC

(In reply to comment #67)
> Whether you want to continue development of the parallel build feature or not,
> I'm quite certain that this feature will be implemented before the final
> release of portage-2.1.2 final.  You can keep hacking on it or just sit back
> and wait for it.  The choice is yours.

There is no point in me hacking away at it if you have decided to work on this feature and release it in 2.1.2 (which I hope is not far because we are at 2.1.2_pre1 in the tree).

So, how do you plan to handle the output in the final release?

Comment 69 Zac Medico gentoo-dev

2006-09-23 16:22:02 UTC

(In reply to comment #68)
> There is no point in me hacking away at it if you have decided to work on this
> feature and release it in 2.1.2 (which I hope is not far because we are at
> 2.1.2_pre1 in the tree).

I aim for a 2-3 month release cycle.  2.1.1 was released about 2 weeks ago, so 2.1.2 may be final in approximately 6-10 weeks (more or less, depending on what comes up).  I think we may be able to get the parallel feature within a couple of weeks though.

> So, how do you plan to handle the output in the final release?

I imagine it will be something like I've described in comment #57.  I'll work on a patch and post here when it's ready for testing.

Comment 70 devsk 2006-09-23 16:44:45 UTC

(In reply to comment #69)
> I aim for a 2-3 month release cycle.  2.1.1 was released about 2 weeks ago, so
> 2.1.2 may be final in approximately 6-10 weeks (more or less, depending on what
> comes up).  I think we may be able to get the parallel feature within a couple
> of weeks though.

that's longer than I expected.

> I'll work on a patch and post here when it's ready for testing.

Hey Great!! I will definitely be looking forward to your patch.

Do you wanna discuss the output redirection offline on IRC/forums maybe? I am not convinced that all phases are born equal, some are more equal than others, as far as end users are concerned. I know I have repeated this a bit and *no one* here agrees with me but for end user only important messages matter and if he/she has to go thru another step for getting those, they will likely forget or mess up in other ways. It is here that non-equality of phases' output behaviour comes into picture, because I feel setup and postinst are more equal than others in this regard.

Comment 71 Zac Medico gentoo-dev

2006-09-23 17:13:22 UTC

(In reply to comment #70)
> Do you wanna discuss the output redirection offline on IRC/forums maybe?

Sure, but it's probably best to save the discussion until I have a patch.  You can ping me in #gentoo-portage on irc.freenode.net.

Comment 72 Zac Medico gentoo-dev

2006-09-24 22:43:20 UTC

Perhaps it would be a good idea to have a daemon process that initiates all builds and installs.  When emerge is launched it could spawn a daemon if one isn't running already and then the daemon could automatically exit when there are no more jobs to process.  The advantage of this approach is that the daemon process could centrally manage the state of various portage resources that shouldn't be accessed by concurrent processes, such as the world file and the installed package database.  Users would then be able to submit build and/or install jobs concurrently at any time.  The daemon would track the state of the dependency graph and be able to notify the user if they try to submit a job that somehow conflicts with an existing one.
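The core of the daemon idea, a central owner of the job table and of shared state, can be sketched in-process. This is an assumption-laden sketch: a real daemon would take jobs over IPC, "conflict" here means only "same atom already queued or running", and the installed-package database is modelled as a set. `JobManager` and `JobConflict` are hypothetical names.

```python
import threading
from typing import Dict, Set


class JobConflict(Exception):
    """Raised when a submitted job overlaps one already queued or running."""


class JobManager:
    """In-process stand-in for the central daemon described above.

    One lock guards the job table; a second lock serializes updates to shared
    state (world file, installed-package database), modelled here as a set.
    """

    def __init__(self) -> None:
        self._jobs_lock = threading.Lock()
        self._db_lock = threading.Lock()
        self._jobs: Dict[str, str] = {}   # package atom -> job state
        self.installed: Set[str] = set()  # stand-in for the installed-package db

    def submit(self, atom: str) -> None:
        with self._jobs_lock:
            if atom in self._jobs:
                raise JobConflict(f"{atom} is already {self._jobs[atom]}")
            self._jobs[atom] = "queued"

    def finish(self, atom: str) -> None:
        with self._db_lock:  # only one job updates the shared db at a time
            self.installed.add(atom)
        with self._jobs_lock:
            del self._jobs[atom]

    def idle(self) -> bool:
        """True when no jobs remain: the point at which the daemon could exit."""
        with self._jobs_lock:
            return not self._jobs
```

Because every emerge would talk to the same manager, a second terminal submitting an overlapping job gets an immediate error rather than racing the first one on the world file.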

Comment 73 devsk 2006-09-25 07:43:55 UTC

(In reply to comment #72)
> Perhaps it would be a good idea to have a daemon process that initiates all
> builds and installs.  When emerge is launched it could spawn a daemon if one
> isn't running already and then the daemon could automatically exit when there
> are no more jobs to process.  The advantage of this approach is that the daemon
> process could centrally manage the state of various portage resources that

but this would require major overhaul of bin/emerge and pym/portage.py. They are not written in a way which can easily support this. It would require that we isolate the global access points in a separate module and streamline all global accesses thru it.

This will make the change a big one, because not only will we have to write a daemon, but also the module that fires parallel builds on the daemon for the current emerge, if the depgraph has parallelism in it. I would really like this feature to be automatic instead of the user firing two emerges in two different windows. The advantage of automatically parallelizing the depgraph is that 'emerge -uD world' can be sped up a lot because of the variety of packages. If the user needs to fire multiple emerges, I think it won't be such a big deal.

So, we have two things to do and decide which one to tackle first:

1. Separate out the global access and make the doebuild a daemon based independent job which can be submitted, held or returned immediately because of failing dep checks on running jobs.
2. Take advantage of the inherent parallelism in the depgraph automatically.

These can be done "fairly" independently, because the only thing "1." puts in the way of "2." is the final "merge", which can be serialized in the short term.

"1." can (and probably should) be done  before "2." and even reduce the code needed in "2." but I think we should look forward to both a short as well as a long term solution.
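Option "2." with the short-term serialized merge can be sketched as a topological scheduler: builds whose dependencies are already merged run in a thread pool, and every merge happens in the calling thread, one at a time. `parallel_emerge` and its callbacks are hypothetical; it assumes that dependencies not listed as keys in `deps` are already installed.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set


def parallel_emerge(deps: Dict[str, Set[str]],
                    build: Callable[[str], None],
                    merge: Callable[[str], None],
                    jobs: int = 2) -> List[str]:
    """Build independent packages concurrently; merge them one at a time.

    deps maps each package to the packages that must be merged before it can
    build. Returns the merge order. Raises ValueError on a dependency cycle.
    """
    # Dependencies outside the graph are treated as already installed.
    remaining = {pkg: set(d).intersection(deps) for pkg, d in deps.items()}
    running: Dict[Future, str] = {}
    merged: List[str] = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while remaining or running:
            # Start every package whose dependencies have all been merged.
            for pkg in [p for p, d in remaining.items() if not d]:
                del remaining[pkg]
                running[pool.submit(build, pkg)] = pkg
            if not running:
                raise ValueError("circular dependency among: "
                                 + ", ".join(sorted(remaining)))
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                pkg = running.pop(fut)
                fut.result()   # re-raise a build failure
                merge(pkg)     # serialized: only this thread ever merges
                merged.append(pkg)
                for d in remaining.values():
                    d.discard(pkg)
    return merged
```

Keeping the merge in a single thread is what makes the "serialize the final merge" shortcut safe without touching the global state in the rest of the code.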

Comment 74 Zac Medico gentoo-dev

2006-10-04 14:28:42 UTC

*** Bug 115027 has been marked as a duplicate of this bug. ***

Comment 75 devsk 2006-10-07 12:28:04 UTC

Zac, do we have a status on the patch? I can give it a twirl if you have something.

Comment 76 Zac Medico gentoo-dev

2006-10-07 12:46:07 UTC

We've got a lot of long-standing (and extremely important) bugs fixed in 2.1.2 already.  For examples, see some of the lower bug numbers marked as dependencies of bug 147007.  For this reason, I don't want to merge any disruptive new features during this release cycle.  I plan to release 2.1.2_rc1 in about a week.  I'm sorry about the delay on this bug, but we can't allow it to hold back the release of these other long-standing fixes.

Comment 77 devsk 2006-10-07 12:51:22 UTC

that means we won't see parallel portage in 2.1.2?

Comment 78 Zac Medico gentoo-dev

2006-10-07 18:37:53 UTC

(In reply to comment #77)
> that means we won't see parallel portage in 2.1.2?

That's correct.  Sorry, I thought that I was clear in comment #76.  We have several very long-standing and important bugs that are practically ready to release, and it wouldn't be fair to hold those back by adding any more new features during this release cycle.  Hopefully we'll have this feature in 2.1.3 though.

Comment 79 devsk 2006-10-07 21:03:25 UTC

> Whether you want to continue development of the parallel build feature or not,
> I'm quite certain that this feature will be implemented before the final
> release of portage-2.1.2 final.  You can keep hacking on it or just sit back
> and wait for it.  The choice is yours.

Now, that sounds a little harsh in retrospect, doesn't it? Particularly when I had given you a working patch and was ready to modify it to address your fork and vdb concerns. I had already cleaned up a lot of code but abandoned it because it sounded like you were excited by the idea and were gonna have a patch of your own ready in a couple of weeks...I am a little disappointed...:-(

Comment 80 Alec Warner (RETIRED) archtester

2006-10-07 21:54:12 UTC

(In reply to comment #79)
> > Whether you want to continue development of the parallel build feature or not,
> > I'm quite certain that this feature will be implemented before the final
> > release of portage-2.1.2 final.  You can keep hacking on it or just sit back
> > and wait for it.  The choice is yours.
> 
> now, that sounds little harsh in retrospect, doesn't it? Particularly, when I
> had given you a working patch and was ready to even modify to address your fork
> and vdb concerns. I had already cleaned up a lot of code but abandoned it
> because it sounded like you were excited by the idea and were gonna have a
> patch of your own ready in a couple of weeks...I am a little disappointed...:-(
> 

Are you looking for a handout?

Many of the features going in are YEARS overdue, and yours is all of two weeks?  You have a while to wait before being able to complain buddy ;)

But on a more serious note: sometimes plans don't work out.  Zac seemed not to have much of one (he kept telling me he was "working on the little things") and suddenly he and Jason break out some awesome patches that fix a bunch of long-standing issues.  As a dev, I'd gladly take them now rather than never, and especially now vs 6 weeks from now (6 weeks being a random estimate for testing and deploying parallel merge functionality in 2.1.2).  As far as functionality in the tree goes, you need it in early rather than late.

Comment 81 devsk 2006-10-07 22:55:52 UTC

(In reply to comment #80)
> Many of the features going in are YEARS overdue, and yours is all of two weeks?
>  You have a while to wait before being able to complain buddy ;)

Alec, did you even read what I quoted?

I am in no hurry. I even said "I am disappointed". At no point did I complain. But from the way he claimed it was going into 2.1.2 and snubbed me, I thought he was serious and had a patch ready himself.

Comment 82 devsk 2006-10-07 23:26:43 UTC

(In reply to comment #80)
> Are you looking for a handout?

If I were the type, I wouldn't have attached a working patch while opening the bug.

Comment 83 Mudrii 2006-10-08 10:32:47 UTC

devsk, I think you should contact Zac and try to find a workable way to get the patch included in portage.
I would really like to see your patch in the main tree, but if you look at the bug list for portage you will understand that there are many other critical issues.

I am using your patch and would like to see you working on it with the other portage devs to make it an available solution in the near future.

Comment 84 Zac Medico gentoo-dev

2006-10-08 19:30:17 UTC

(In reply to comment #79)
> I had already cleaned up a lot of code but abandoned it
> because it sounded like you were excited by the idea and were gonna have a
> patch of your own ready in a couple of weeks...I am a little disappointed...:-(

I wouldn't be too disappointed.  I think that most people who are familiar with the pace of portage development are pretty excited about the bugs that have been fixed recently.  Due to limited resources, it's never possible to get all the features people might hope for in a particular release.  If we hold back release cycles for features that aren't ready, then we risk stretching them too long while users suffer unnecessarily due to important fixes being unreleased.

Comment 85 devsk 2006-10-09 01:15:34 UTC

(In reply to comment #84)
> Due to limited resources, it's never possible to get all

That sounds lame. What offends me is the fact that you didn't want the resources that were offered to you. You arrogantly chose to snub me. What did you mean by "you can keep hacking on it or sit back and wait for it, choice is yours"? Can you explain?

If I am familiar enough with the portage code to write a workable patch for this feature, I can most definitely fix the issues that you raised. I am not some Joe user asking for his favorite feature to be included in portage. And no one in my position would like to be treated like one.

Comment 86 Simon Stelling (RETIRED) gentoo-dev

2006-10-09 01:27:23 UTC

Seeing that the discussion is drifting a bit, could you guys please take it somewhere else? The bug is already long enough to read through; there is no need to make it even harder. The bug should only be about the technical arguments.

Thanks

Comment 87 Zac Medico gentoo-dev

2006-10-09 09:06:33 UTC

(In reply to comment #73)
> This will make the change a big one, because not only we will have to write a
> daemon but also the module which fires parallel builds on the daemon for the
> current emerge, if the depgraph has parallelism in it. I would really like this
> feature to be auto instead of user firing two emerges in two different windows.

The way that I imagine it, the daemon will manage the depgraph and parallelize automatically (appropriately for how the user has configured it). That way, the user will be able to submit parallel jobs at any time. Since the daemon will manage the depgraph, it will be in a position to analyze new jobs for possible conflicts with existing jobs and reject them if necessary. If the daemon has already spawned enough builds to consume all of its allotted resources, then it can add submitted jobs to the queue and process them when more resources become available.
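A toy model of the queueing behaviour described above may make the proposal more concrete. This is a minimal sketch under stated assumptions. `JobScheduler`, `submit` and `finish` are hypothetical names. A "slot" stands in for the daemon's allotted resources. The only conflict checked is a package that is already running or queued; real conflict analysis (blockers, slot collisions) would need the depgraph.

```python
from collections import deque


class JobScheduler:
    """Hypothetical model of a build daemon: a fixed number of build slots
    plus a FIFO queue for work that arrives while all slots are busy."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.running = set()   # packages currently being built
        self.queue = deque()   # packages waiting for a free slot

    def submit(self, pkg):
        """Start, queue, or reject a package (duplicate = simplest 'conflict')."""
        if pkg in self.running or pkg in self.queue:
            return "rejected"
        if len(self.running) < self.max_slots:
            self.running.add(pkg)
            return "started"
        self.queue.append(pkg)
        return "queued"

    def finish(self, pkg):
        """Free pkg's slot and start the next queued package, if there is one."""
        self.running.remove(pkg)
        if self.queue:
            nxt = self.queue.popleft()
            self.running.add(nxt)
            return nxt
        return None
```

With two slots, submitting a, b and c starts a and b and queues c. A second submit of a is rejected, and finishing a starts c.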

Comment 88 devsk 2006-10-13 09:53:49 UTC

(In reply to comment #87)
> The way that I imagine it, the daemon will manage the depgraph and parallelize
> automatically (appropriately for how user has configured it).  That way, the
> user will be able to submit parallel jobs at any time.  Since the daemon will
> manage the depgraph, it will be in a position to analyze new jobs for possible
> conflicts with existing jobs and reject them if necessary.  If the daemon has
> already spawned enough builds to consume all of its allotted resources, then it
> can add submitted jobs to the queue and process them when more resources become available.
 
I think the daemon is independent of the parallelization. When a user says 'emerge -uDN world', some logic still has to figure out what requests to send to the daemon, unless you wanna send "emerge -uDN world" literally to the daemon. It can send the linearized mergelist that is generated by the current code, or it can send individual requests one after the other as determined by the depgraph. In both cases we have already created the depgraph, so there is little chance of conflict. We could very well spawn a separate ebuild for each of these in the same process and achieve the same thing.

The more I think about the daemonized approach, the less sense it makes to me. The only gain we get out of it is that users can fire two emerges at the same time without worrying about locking or dependency conflicts. How often do users do that? How often would they do that if we gave them a parallel feature? It's too big a change for very little gain.

Even if you wanna take the daemon approach, there is no harm in writing the parallel feature with ebuild spawn in the short term, because all you will be doing in the future is replacing this spawn with a submit_job_to_daemon() call.
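The argument is that the parallel loop does not care how a single build is executed. Here is a minimal sketch, assuming builds are plain command lists and that one group of mutually independent packages is run at a time. `spawn_locally`, `run_level` and the `max_jobs` parameter are hypothetical. `submit_job_to_daemon` is only the placeholder name used in the comment.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def spawn_locally(cmd):
    """Short-term backend: run one build command as a child process."""
    return subprocess.call(cmd)


def run_level(cmds, run_job, max_jobs=2):
    """Run one group of independent build commands concurrently and return
    their exit codes in submission order.

    run_job is the only backend-specific piece: passing spawn_locally today,
    or a hypothetical submit_job_to_daemon later, leaves this loop unchanged."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_job, cmds))
```

In this design, only the callable changes. For example, `run_level(cmds, spawn_locally)` today, or `run_level(cmds, submit_job_to_daemon)` if a daemon ever exists.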

Comment 89 Zac Medico gentoo-dev

2006-10-13 16:12:03 UTC

(In reply to comment #88)
> I think daemon is independent of the parallelization. When a user says 'emerge
> -uDN world', some logic still has to figure what requests to send to the
> daemon, unless you wanna send "emerge -uDN world" literally to the daemon. It
> can send the linearized mergelist that is generated by the current code or it
> can send individual requests one after the other as determined by the depgraph.

The daemon will have to handle pretty much everything. When the user spawns an emerge process, it will essentially act as a client for the user to communicate with the daemon. All of the dependency calculations need to be handled by the daemon so that it can check for conflicts in submitted jobs or even merge parts of submitted jobs together. For example, if the user starts a world update and then later submits a job requiring one or more packages that have already been scheduled for installation, the daemon will be able to see the overlap between the two jobs and schedule builds appropriately (possibly adjusting build order so that a smaller job will complete sooner than a longer one, if possible).
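The overlap handling described here can be sketched as a scheduling pass over submitted jobs. This sketch makes three assumptions. First, each job has already been resolved to a list of packages. Second, a package needed by several jobs only has to be built once. Third, smaller jobs should be ordered first. `plan_jobs` is a hypothetical helper. It ignores build order between shared packages, which a real resolver would still have to respect.

```python
def plan_jobs(jobs):
    """Order submitted jobs smallest-first and drop packages that an earlier
    job in the plan already builds.

    jobs maps a job name to its resolved package list; the result is a list
    of (job name, packages this job still needs to build) pairs."""
    scheduled = set()
    plan = []
    for name in sorted(jobs, key=lambda n: len(jobs[n])):
        new_pkgs = [p for p in jobs[name] if p not in scheduled]
        scheduled.update(new_pkgs)
        plan.append((name, new_pkgs))
    return plan
```

If a world update needs glibc, gcc, kdelibs and konqueror, and a later job needs gcc and lsof, the smaller job goes first. The world job then only has glibc, kdelibs and konqueror left to build.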

Comment 90 devsk 2006-10-13 23:03:40 UTC

So instead of the current process doing the work, all the work is done in the daemon. What do we gain with this approach? I understand that the deps, conflict resolution and resource allocation can happen in one central place, but those are important only if users fire multiple emerges. I mean, we should be trying to solve some current problem when we say that we will daemonize portage.

Unless you envision remote emerges? Is that what you are seeing? A daemon-based portage farm of identical machines? That too is a long shot for such a huge effort.

Comment 91 Zac Medico gentoo-dev

2006-10-13 23:22:45 UTC

(In reply to comment #90)
> I understand that the deps,
> conflict resolution and resource allocation can happen in one central place,
> but those are important only if users fire multiple emerges.

This is an important use case. Why should we settle for anything less?

Comment 92 Brian Harring (RETIRED) gentoo-dev

2006-10-13 23:27:59 UTC

(In reply to comment #91)
> (In reply to comment #90)
> > I understand that the deps,
> > conflict resolution and resource allocation can happen in one central place,
> > but those are important only if users fire multiple emerges.
> 
> This is an important use case.  Why should we settle for anything less?

Because daemonized handling is a superset of actual build parallelization; iow, do it in steps (folks want parallelization now; not seeing clamoring for a daemon, however ;)

Comment 93 Zac Medico gentoo-dev

2006-10-13 23:44:33 UTC

(In reply to comment #92)
> Because daemonized handling is a superset of actual build parallelization;
> iow, do it in steps (folks want parallelization now, not seeing clamoring for a
> daemon however ;)

Are you kidding? We all know that lots of people run separate emerge instances in parallel.

Comment 94 Brian Harring (RETIRED) gentoo-dev

2006-10-13 23:59:47 UTC

(In reply to comment #93)
> (In reply to comment #92)
> > Because daemonized handling is a superset of actual build parallelization;
> > iow, do it in steps (folks want parallelization now, not seeing clamoring for a
> > daemon however ;)
> 
> Are you kidding? We all know that lots of people run separate emerge instances
> in parallel.

Yep, they do. They have for years without requiring a daemon, despite various portage devs (myself included) scheming to offer one as an option; folks have known that doing so can bite you in the ass, but have gotten by fine.

That said, you're ignoring my point that this bug is about "parallel portage can reduce build times", not "I want my portage daemonized, and want it to be parallelized in build execution".

Daemonizing portage *still* requires being able to parallelize building; as I said, it's a superset. And here's why daemonizing should be separated:

1) portage wasn't designed for it, straight through the code; track the print call usage sometime.
2) Track the sys.exit usage sometime.
3) Got an object for ring-buffering output so that it can be fed to the user's terminal? Got any code to do even the equivalent of tee, forcing output from the daemonized process to the querying process? Forcing all output to a log is a bit hackish...
4) raw_input...
5) Users modifying ebuilds; portage isn't going to handle that in a pretty fashion. It's less of an issue if the portage process isn't hanging around.
6) Dug out all potential deadlocks yet for all portage requests? Simple example: where is the sync locking/notification? What about cache protection, and awareness that the cache just changed under your feet (see #5)? Portage just regenerates it on the fly, since it was designed for short-term invocation...

There's a fair bit more to it; that's also ignoring that daemonizing portage means you *really* should be proxying all requests into the daemonized instance, meaning you have to either grow transparent proxying or write a lot of boilerplate for passing commands to it.

*Meanwhile*, you *still* need to be able to do parallelized builds.

As I said, one step at a time is a bit saner.
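Point 3 asks for two things at once. The first is copying a daemon's output to whichever client is listening, which is what `tee` does. The second is keeping a bounded backlog for clients that attach later, which is what a ring buffer does. Here is a minimal sketch, assuming output arrives as text through a file-like `write` call. `TeeRingBuffer` is a hypothetical class. A real daemon would also have to handle several clients and the transport between daemon and client.

```python
import io
from collections import deque


class TeeRingBuffer:
    """File-like sink that forwards every write to a live stream (like tee)
    and remembers only the last `maxlines` complete lines of output."""

    def __init__(self, sink, maxlines=100):
        self.sink = sink
        self.lines = deque(maxlen=maxlines)  # oldest lines fall off automatically
        self._partial = ""                   # text after the last newline

    def write(self, text):
        self.sink.write(text)
        data = self._partial + text
        *complete, self._partial = data.split("\n")
        self.lines.extend(complete)
        return len(text)

    def flush(self):
        self.sink.flush()


# Usage: keep only the last two lines while everything still reaches the sink.
sink = io.StringIO()
buf = TeeRingBuffer(sink, maxlines=2)
buf.write("a\nb\nc")
buf.write("\nd\n")
```

After these writes, the sink holds all four lines. The buffer keeps only `c` and `d`, which is what a client attaching at that moment could be shown.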

Comment 95 Zac Medico gentoo-dev

2006-10-14 00:22:43 UTC

(In reply to comment #94)
> *Meanwhile*, you *still* need to be able to do parallelized builds.

Meanwhile, we still need an acceptable patch.

Comment 96 devsk 2006-10-14 10:04:28 UTC

One problem I see with following Jason's suggestion is that the mtimedb resume list is a linearized graph. If I get rid of the merge_slots I use, I have to either resume non-parallel or recreate a depgraph from the resume list. I don't see a good solution to this without keeping the merge slots in the resume list itself. One more advantage of the current approach is that the parallel build is separated from the depgraph traversal.

Moreover, "--skipfirst" is now more like "--skipfailed", because with parallel there is no "first" anymore (in the sense of a linearized list).
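The difficulty comes from the fact that grouping packages into parallel "merge slots" needs the dependency edges, and a plain linearized resume list no longer carries them. Here is a minimal sketch of computing such groups with a level-by-level topological sort. It assumes the graph is given as a dict from each package to its direct dependencies, and that every dependency also appears as a key. `merge_slots` here is a hypothetical helper, not the function in the patch.

```python
def merge_slots(deps):
    """Group packages into levels so that every package depends only on
    packages in earlier levels; each level can then be built in parallel.

    deps maps each package to the packages it directly depends on."""
    remaining = {pkg: set(d) for pkg, d in deps.items()}
    levels = []
    while remaining:
        ready = sorted(p for p, d in remaining.items() if not d)
        if not ready:
            raise ValueError("circular dependency among: %s" % sorted(remaining))
        levels.append(ready)
        for p in ready:
            del remaining[p]
        for d in remaining.values():
            d.difference_update(ready)  # these deps are now satisfied
    return levels
```

Consider a graph where b and c both need a, and d needs b and c. There, `merge_slots({"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]})` gives `[["a"], ["b", "c"], ["d"]]`. The flat order a, b, c, d loses the fact that b and c may run together. That is why keeping the groups in the resume data is attractive.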

Comment 97 Thomas Bettler 2006-10-14 10:33:14 UTC

With parallel emerge we should really abandon the --skipfirst kind of parameter. Packages should only be emerged as long as no package they depend on has failed; I guess your slotting mechanism should do that?

But the point of interest might be: should we abort on indirect dependencies too?

Scenario: say we do an "emerge -uD world" and a lower-level package fails (e.g. hal), while intermediate dependencies are in place (e.g. kdelibs). Should higher-level packages (e.g. konqueror) continue to update?

Comment 98 devsk 2006-10-14 11:04:47 UTC

(In reply to comment #97)
> With parallel emerge we should really leave the --skipfirst kind of parameters.

I was merely referring to the internal implementation of --skipfirst, which is complicated by parallel. It will still be called --skipfirst.

> Scenario: say we do an "emerge -uD world" but lower level fail (i.e. hal),
> while intermediate dependencies are in place (i.e. kdelibs), so should higher
> (i.e. konqueror) continue to update?

I don't think we should go beyond the depgraph in the first cut; we should stop as soon as a tree level fails. Further optimizations may be possible, but they are not desirable in the first cut, while we thrash out bugs.
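The "stop as soon as a tree level fails" policy can be expressed as a short loop over dependency levels. This sketch rests on stated assumptions. `levels` is a list of groups in which each group depends only on earlier groups. `build` is a hypothetical callable returning True on success. Packages inside a level are built one after another here for clarity, not concurrently. The current level is allowed to finish, but the next one is never started.

```python
def run_levels(levels, build):
    """Build level by level and stop before the next level once anything in
    the current level has failed. Returns (built, failed)."""
    built, failed = [], []
    for level in levels:
        for pkg in level:
            (built if build(pkg) else failed).append(pkg)
        if failed:
            break
    return built, failed
```

With levels `[["a"], ["b", "c"], ["d"]]` and a build that fails only for b, the result is that a and c are built and b failed. d is never attempted.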

Comment 99 Thomas Bettler 2006-10-14 12:15:59 UTC

Excuse me if I expressed it unclearly:
I wish to leave and abandon the path of the --skipfirst parameter.
Kill this parameter and we will live without it.
It's simply not necessary and invites abuse.
If it fails, it fails, and no packages that need the failing one should be emerged!

[Though for RDEPEND-style dependencies things might be different]

Comment 100 devsk 2006-10-14 14:28:24 UTC

(In reply to comment #99)
> Excuse me if expressed it unclear:
> I wish to leave and abandon the path of the --skipfirst parameter. 
> Kill this parameter and we will live without it.

oops. sorry!

I know it can be abused and can introduce hard-to-track bugs in customer installs, but it's sort of a necessary evil (and that's probably why it was allowed in). I mean, if 'emerge -e world' is failing on a package, we can't expect users to file a bug, wait for the fix, patch the package's ebuild and continue with --resume. Or to post on the forums (firefox may be broken, so they may not even be able to post) and wait for people's responses. Breakages like the recent expat one can heavily limit users' choices for recovery and resumption.

Comment 101 Thomas Bettler 2006-10-15 04:29:03 UTC

Well, I understand your concern, but I would like to discuss the question more theoretically before getting too much into examples.

Situation: package A is in DEPEND for B, B is in DEPEND for C, C -> D, D -> E. We have upgrades available for A, B, C and E (but not D).

So here are some questions I want to ask you. (Since I'm no dev I have no final answer; I leave the decision up to you devs...)
1. Should B upgrade if A fails?
2. Should C upgrade if A fails?
3. Should E upgrade if C fails?

If there is no consensus, but two or more opinions on these questions, we could control portage's behaviour through suitable parameters.
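The three questions differ only in how far a failure should propagate through the graph. Here is a minimal sketch that makes that choice an explicit parameter. It assumes `deps` maps each package to the packages it directly DEPENDs on. `blocked_by` and its `transitive` flag are hypothetical, not existing emerge options.

```python
def blocked_by(failed, deps, transitive=True):
    """Return the packages to skip because `failed` did not build.

    With transitive=False only direct dependents are skipped; with
    transitive=True anything that reaches `failed` through the graph is
    skipped, including paths through packages that are not being upgraded."""
    # Invert the graph: package -> packages that depend on it directly.
    rdeps = {}
    for pkg, ds in deps.items():
        for d in ds:
            rdeps.setdefault(d, set()).add(pkg)
    blocked = set()
    stack = [failed]
    while stack:
        for dependent in rdeps.get(stack.pop(), ()):
            if dependent not in blocked:
                blocked.add(dependent)
                if transitive:
                    stack.append(dependent)
    return blocked
```

Encode the scenario as `deps = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}`, with upgrades for A, B, C and E. A failure of A then blocks only B when `transitive=False` (question 1). With `transitive=True`, it blocks B, C and E among the upgrades (questions 2 and 3). A failure of C blocks E through the unchanged D. Whether D should propagate the failure is exactly question 3. This sketch answers yes in transitive mode.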

Comment 102 Alec Warner (RETIRED) archtester

2006-10-15 09:10:39 UTC

We have a gentoo-portage-dev ML for a reason; bugzilla is not really the place to hash out implementation, as you end up with a bug just like this one with 100+ comments ;)

Comment 103 Marius Mauch (RETIRED) gentoo-dev

2007-01-11 03:57:31 UTC

*** Bug 88837 has been marked as a duplicate of this bug. ***

Comment 104 Marius Mauch (RETIRED) gentoo-dev

2007-01-11 13:06:55 UTC

*** Bug 51414 has been marked as a duplicate of this bug. ***

Comment 105 Matz Rasmus 2007-02-17 15:06:46 UTC

Has there been any progress on this matter? I have happily been using the patch, but I will soon need to upgrade my portage, and I would very much like to keep the functionality of this patch.

Comment 106 devsk 2007-02-19 17:20:52 UTC

I have modified the patch to fix the issues raised:

1. zmedico's issue: got rid of fork(); now I use spawn with --nodeps. This led to some more minor changes (like protecting the resume database).
2. Jason's issue: the cruft that generated merge slots during graph generation is gone.
3. genone's issue: I have convinced myself of the output issue and gotten rid of the 'special casing' cruft.

I will be attaching the updated patch soon.
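Replacing fork() with a spawned child that is told not to resolve dependencies again might look roughly like the sketch below. It assumes the parent passes an exact `=category/package-version` atom, and that successfully built packages are removed from a shared resume list guarded by a lock. `child_command`, `build_one` and `resume_lock` are hypothetical names. `app-misc/foo-1.0` is a made-up atom. `--nodeps` and `--nospinner` are the emerge options mentioned in this bug.

```python
import subprocess
import threading

# Hypothetical lock serialising updates to the shared resume data when
# several builds are driven from threads of the parent process.
resume_lock = threading.Lock()


def child_command(cpv):
    # --nodeps: the parent emerge has already resolved the dependency graph.
    # --nospinner: keep the child's terminal output quiet.
    return ["emerge", "--nodeps", "--nospinner", "=" + cpv]


def build_one(cpv, resume_list, spawn=subprocess.call):
    """Spawn a child emerge for one exact version; on success, drop it from
    the shared resume list so a later --resume will not rebuild it."""
    rc = spawn(child_command(cpv))
    if rc == 0:
        with resume_lock:
            resume_list.remove(cpv)
    return rc
```

The `spawn` parameter exists only so the logic can be exercised without running emerge, for example with `spawn=lambda cmd: 0`.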

Comment 107 devsk 2007-02-23 17:33:59 UTC

Created attachment 111065 [details, diff]
Patch against current stable portage (2.1.2-r9)

The patch is cleaned up and is much shorter. The fork is gone. Output redirection is simpler and requires much less change. Please reconsider for inclusion.

Comment 108 devsk 2007-02-23 18:04:07 UTC

I forgot to add: if '-parallel' is in FEATURES, the behavior and output are the same as before, except perhaps for the --nodeps option, where 'Calculating dependencies...' is not printed. That is fine, because with --nodeps a user would wonder why portage is calculating deps.

Also, just to quantify the gains with parallel:

$ time FEATURES=-parallel  emerge -1 gpm lsof sed grep bash patch gawk

real    2m24.704s
user    1m55.537s
sys     0m37.618s

$ time emerge -1 gpm lsof sed grep bash patch gawk

real    1m41.878s
user    1m58.901s
sys     0m37.089s

No ccache. Repeated each 3 times to remove caching effects.

That's a 30% gain from using parallel, or a 42% loss from not using it, depending on how you look at it.
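The two percentages follow from the `real` times above. 2m24.704s is 144.704 s and 1m41.878s is 101.878 s, so the parallel run saves 42.826 s. The short check below only redoes that arithmetic.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


serial = to_seconds(2, 24.704)    # FEATURES=-parallel
parallel = to_seconds(1, 41.878)  # parallel enabled
saved = serial - parallel

gain = saved / serial    # share of the serial wall-clock time that is saved
loss = saved / parallel  # extra time relative to the parallel run
print(f"saved {saved:.3f}s, gain {gain:.0%}, loss {loss:.0%}")
```

It prints `saved 42.826s, gain 30%, loss 42%`. The two figures differ only in the denominator. The `user` time is slightly higher in the parallel run (1m58.901s vs 1m55.537s). So the speedup comes from overlapping work, not from doing less of it.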

Comment 109 devsk 2007-02-23 21:02:32 UTC

Created attachment 111081 [details, diff]
The incremental patch to fix a copy-paste bug

Comment 110 devsk 2007-02-23 21:18:48 UTC

Created attachment 111086 [details, diff]
Patch for 2.1.2-r10 in the tree

Because I had included the patch for #167450 in my patch, it created confusion. This patch is built against the latest portage in the tree to avoid hassles when applying it.

Comment 111 devsk 2007-02-23 23:21:02 UTC

Created attachment 111095 [details, diff]
Fix a --fetchonly related bug

Comment 112 devsk 2007-02-24 02:53:15 UTC

Created attachment 111106 [details, diff]
Add more useful info in the status

Sorry for the flurry of updates. This is probably it for now.

Comment 113 devsk 2007-02-24 16:57:11 UTC

Created attachment 111136 [details, diff]
Fix minor color bug.

Comment 114 devsk 2007-02-27 16:31:43 UTC

Created attachment 111434 [details, diff]
Update patch to r11, pass nospinner to child

I was wondering if the patch could be released as a hardmasked version to get some testing and review feedback from the general public. There is nothing drastically ugly in the code changes. I think I have gotten the flow right, and the rest is just cleaning up the code and maybe optimizing some Python usages.

Comment 115 devsk 2007-03-01 19:14:27 UTC

Created attachment 111724 [details, diff]
Further cleanups, size reduction

move to r12, clean up code and reduce the patch size.

Comment 116 devsk 2007-03-20 06:05:16 UTC

A gentle ping: how are we doing on the review of this patch? It's been a long time since I posted it.

Comment 117 devsk 2007-03-26 01:22:55 UTC

Is there any hope of seeing this thing in ever? My recent attempt to bring it forward to 2.1.2.2 resulted in a huge reject file. It's getting progressively difficult to maintain this patch, because a few-byte change (even whitespace) can trigger a huge reject, since the patch moves big chunks of code around (without changing them much, though).

Comment 118 devsk 2007-03-26 02:56:28 UTC

Created attachment 114444 [details, diff]
Patch against 2.1.2.2

being totally shameless, here is another 'bring-it-fwd'.

1. Bring fwd to 2.1.2.2
2. Fix 'qsize' undefined.
3. Fix a remnant of '--resume --skipfirst not working'.

Comment 119 devsk 2007-03-26 15:29:31 UTC

Created attachment 114501 [details, diff]
Some more misc fixes.

Comment 120 Zac Medico gentoo-dev

2007-04-01 09:50:22 UTC

(In reply to comment #117)
> Is there any hope of seeing this thing in ever?

I'm working on a new resolver at the moment, so I don't want to merge your patch right now. I'd certainly like to integrate it into the new resolver pretty soon, though.

Comment 121 devsk 2007-04-02 02:45:28 UTC

Created attachment 115212 [details, diff]
Fix broken world update

I broke world file updates...my bad! Now fixed.
Also, got rid of the 'whitespace' diff...:)

Comment 122 devsk 2007-04-03 23:07:05 UTC

Created attachment 115395 [details, diff]
Fix for a small typo

Comment 123 devsk 2007-04-11 18:28:00 UTC

Created attachment 115986 [details, diff]
Fix a small bug reported in the forums

Comment 124 devsk 2007-05-02 17:40:38 UTC

Zac, how are we doing on the resolver? Just a friendly neighborhood ping...:)

Comment 125 devsk 2007-05-30 23:58:16 UTC

Zac, can we get this thing in? It's been two months since I got an update from you. I don't see any reason why this patch needs to wait for the resolver anyway.

Comment 126 devsk 2007-06-05 15:30:51 UTC

Created attachment 121259 [details, diff]
Another painful "bring it fwd"

Can someone please explain what's wrong with this patch going in without the resolver? The resolver has been in the works for the last 3 months (or maybe more), and this patch has just been sitting there for no reason for such a long time. I don't think these intersect with each other at all.

Comment 127 devsk 2007-06-06 19:03:15 UTC

Created attachment 121347 [details, diff]
Fix a bug triggered by circular deps and reported on forums

Comment 128 devsk 2007-06-08 03:43:09 UTC

Created attachment 121467 [details, diff]
Fix a resume bug with --nodeps

Comment 129 Steve L 2007-06-12 19:12:26 UTC

(In reply to comment #126)
> Another painful "bring it fwd"
> 
> <snip whinge;>
> I don't think these intersect with each other at all.

You're right IMO; make it work with pkgcore and you're laughing. Have there been any other reported problems from users?

Comment 130 devsk 2007-06-12 19:39:18 UTC

> Have there been any other reported problems from users?

nothing apart from what you see fixed here in this bug. The users typically post their problems on that thread and I release a fix here (typically within a day). I wish more people could use it and find problems. I have been using the patch for the longest time and find it pretty mature now. And I find it useful enough (for creating livecds and new installs or doing world updates after a long gap on a dual core) to maintain it out of tree for such a long time.

I haven't given pkgcore a serious look. Maybe I should.

Comment 131 Marius Mauch (RETIRED) gentoo-dev

2007-06-13 23:58:05 UTC

For your information:
As this patch is quite invasive (not a criticism, just a fact) it would only go into trunk, however trunk will almost certainly not see any release before the new resolver gets in, so there is little reason to merge this now.

Comment 132 Stefan Schweizer (RETIRED) gentoo-dev

2007-06-14 05:40:43 UTC

I find it a shame to see how unable the portage people are to merge this patch. Heck, then make an extra release just for this bug; release early and often, they say. You sound like it would cost you a lot to get a new release out. I tell you: it is easy, just increment a version number, make a tarball, upload it somewhere and put an ebuild into the tree.

Just get it done and make people happy.

Comment 133 Zac Medico gentoo-dev

2007-06-14 05:47:19 UTC

(In reply to comment #132)
> : it is easy, just increment a version number and make a tarball and upload it
> somewhere and put an ebuild into the tree.

It's not as simple as that.  When we release something with a certain feature, then users expect that we provide support for that feature.  I have no intention of supporting the feature as it is implemented in the current patch.  I'm working on a different implementation.  Please have patience.

Comment 134 devsk 2007-06-14 14:55:15 UTC

(In reply to comment #133)
> (In reply to comment #132)
> > : it is easy, just increment a version number and make a tarball and upload it
> > somewhere and put an ebuild into the tree.
> 
> It's not as simple as that.  When we release something with a certain feature,
> then users expect that we provide support for that feature.  I have no
> intention of supporting the feature as it is implemented in the current patch. 
> I'm working on a different implementation.  Please have patience.
> 
you only have to look at this bug report to see how good the support for this feature has already been. I have fixed every single little problem that has been observed by the early testers.

I don't think support is an issue. Nobody expects you to put it into ARCH right away.

<start rant>

What annoys me the most is that you make claims about this new super-duper resolver, of which I haven't seen anything in the last 6 months. Nobody knows what it is or how it differs from the current implementation. It has always appeared to me that you wanted to do this parallel thing yourself (since Sep 2006) but never really worked on it. I have no problem with you doing it from scratch or whatever. But at least put a freaking time frame on it, respect that time frame and deliver on it.

The current resolver and parallel implementation are good enough for me and for many users who have tested them. It's here now and it works. Your resolver is the future, with an unspecified delivery date, and has been like that for quite some time. And from what I have observed on irc (I hang out but I am not much of a chatter), you don't have much of it done at this time. You are acting like Pavel, who won't let the useful suspend2 get in but won't improve suspend himself either.

<end rant>

Comment 135 Zac Medico gentoo-dev

2007-06-14 21:23:54 UTC

(In reply to comment #134)
> I don't think support is an issue.

It is very much a support issue. Portage (the package manager) has many users, but the number of developers actively working on it and closing bugs is actually quite small. Because of this, there are lots of bugs that have gone unfixed for long periods of time. Even if your patch is absolutely perfect, I'd still have to give it a complete review before merging and releasing it. Due to limited resources, and the fact that your patch affects code that I intend to completely replace, merging and releasing your patch is not an option. However, I will certainly consider the features of your patch while implementing the new resolver.

Comment 136 devsk 2007-06-14 23:09:52 UTC

(In reply to comment #135)
> Due to limited resources, and the fact that your patch affects code that I
> intend to completely replace, merging and releasing your patch is not an option.

Which part of the code is that? What has the resolver got to do with the parallelization loop anyway? This only means that you haven't really looked at and reviewed the patch at all in the last 9 months.

If I were you and someone submitted a useful patch, I would apply it on my local system and test it to see if it does what it's supposed to do. Then I would review the code line by line (which Jason Stubbs actually did) and ask the submitter to fix anything I found objectionable and resubmit, repeating until all objections were fixed. And then I would put the damn thing in.

Nine months and you haven't done any of these and you still reject the patch.

Moreover, you claim that you don't have resources and there are not many people working on portage. Rejecting other people's contributions without a reason is a good way to get more hands. I am offering a dev resource to you, and what have you done to grab it in the last 9 months? Have you made a single positive remark or shown any appreciation for what I have done? I personally think you are actually happy that nobody works on portage, so you can have a monopoly over all decisions. This is not how open source works. It works the way I described in paragraph 2 above.

Comment 137 Zac Medico gentoo-dev

2007-06-14 23:55:39 UTC

(In reply to comment #136)
> which part of the code is that? What's resolver got to do with parallelization
> loop anyway?

See comment #72.  Your patch doesn't allow the user to submit new jobs and have the resolver integrate them with any existing jobs that may already be scheduled.

> If I was you and someone submitted a useful patch, I would apply it in my local
> system, test it to see if it does what it's supposed to do. Then, review the
> code line by line (which Jason Stubbs actually did) and ask the submitter to
> fix anything I find objectionable and ask to resubmit. Repeat until all
> objections are fixed. And then, put the damn thing in.

I want the parallel tasks to run within detachable sessions such as those created by dtach (http://dtach.sourceforge.net/).

Bugzilla isn't a good way to collaborate on this. Please, let's continue any further discussion in the #gentoo-portage irc channel or on the gentoo-portage-dev mailing list.

Comment 138 devsk 2007-08-04 04:20:17 UTC

Created attachment 126839 [details, diff]
Bring it fwd, fix exception on new install

2.1.2.7 has fallen off the tree, so bring it forward.

Comment 139 Arthur Castro 2007-10-26 19:41:04 UTC

Are there no more updates? Or any news?

Comment 140 Zac Medico gentoo-dev

2007-10-26 19:45:42 UTC

I'm planning to integrate this feature pretty soon. Probably within the next week or two.

Comment 141 Arthur Castro 2007-10-26 20:03:54 UTC

Will it work just like this patch? or are you using dtach?

And where can I see when this will be integrated?

thanks.

Comment 142 Zac Medico gentoo-dev

2007-10-26 20:17:55 UTC

(In reply to comment #141)
> Will it work just like this patch? or are you using dtach?

It won't actually use dtach but will work in a similar way.

> And where can I see when this will be integrated?

It will be integrated in trunk, and there are instructions for checking that out and using it here:

http://www.gentoo.org/proj/en/portage/doc/testing.xml

Please direct any more questions or discussion to the gentoo-portage-dev@gentoo.org mailing list or the #gentoo-portage channel on irc.freenode.net.

Comment 143 devsk 2008-01-04 00:24:27 UTC

Do we have something we can test? It's been a couple of months since the last update.

Comment 144 devsk 2008-01-05 18:25:24 UTC

Created attachment 140211 [details, diff]
Bring it fwd to 2.1.2.12

No >=2.1.3 for now...:-(

Comment 145 devsk 2008-01-29 07:24:25 UTC

Created attachment 142083 [details, diff]
Bring it fwd to 2.1.4

It kinda has become a necessity.

Comment 146 devsk 2008-02-06 20:42:29 UTC

Created attachment 142841 [details, diff]
Fix resume and update to 2.1.4.1

Comment 147 Anielkis Herrera 2008-02-15 16:32:28 UTC

Created attachment 143576 [details, diff]
patch por lastest portage version

I'm trying it now, and I brought it forward to portage-2.1.4.4; here is the patch I created.

Comment 148 Anielkis Herrera 2008-02-15 16:38:23 UTC

(In reply to comment #147)
> Created an attachment (id=143576) [edit]
> patch por lastest portage version
> 
> I'm trying it now.. and I bring it to version portage-2.1.4.4, here is the
> patch I created
> 

Sorry, I used vim with 4 spaces and without tabs, so there are problems with the indentation in the patch. There is also a modification (sorry again) in portage.py, in the function debug_print, to find the recursive dependencies (the default function shows too much information, and it is difficult to process).

Comment 149 devsk 2008-02-26 19:13:26 UTC

Created attachment 144689 [details]
updated to 2.1.4.4

Comment 150 Anielkis Herrera 2008-03-25 03:06:57 UTC

I added an epatch line to the ebuild to apply the latest patch, but it failed.
Later I tried to edit the files by hand, guided by the patch file.

Without "parallel" in FEATURES it raises this exception at the end:

...
>>> sys-apps/portage-2.1.4.4 merged.

>>> No packages selected for removal by clean
Traceback (most recent call last):
  File "/usr/bin/emerge", line 7403, in <module>
    retval = emerge_main()
  File "/usr/bin/emerge", line 7397, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/usr/bin/emerge", line 6826, in action_build
    retval = mergetask.merge(pkglist, favorites, mtimedb, merge_slots)
  File "/usr/bin/emerge", line 3997, in merge
    return self._merge(mylist, favorites, mtimedb, m_slots)
  File "/usr/bin/emerge", line 4150, in _merge
    del mtimedb["resume"]["mergelist"][0]
KeyError: 'resume'

 * Messages for package sys-apps/portage-2.1.4.4:

 * If you have an overlay then you should remove **/files/digest-* files
....
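The traceback shows `_merge` deleting the head of `mtimedb["resume"]["mergelist"]` after the "resume" entry is already gone. Below is a defensive sketch, assuming `mtimedb` behaves like a plain dict. `pop_resume_head` is a hypothetical helper, not the fix that was actually applied to the patch. It only avoids the crash; it does not explain why the entry disappeared.

```python
def pop_resume_head(mtimedb):
    """Drop the first entry of the resume merge list if there is one.

    Returns True if an entry was removed, False if there was no resume data
    (instead of raising KeyError as in the traceback above)."""
    resume = mtimedb.get("resume")
    if resume and resume.get("mergelist"):
        del resume["mergelist"][0]
        return True
    return False
```

Failing soft here would let the run end normally. The underlying question of what removed the "resume" entry mid-merge would still need to be tracked down.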

Comment 151 Pacho Ramos gentoo-dev

2008-04-09 20:52:12 UTC

Any news on this? Maybe it's too late for 2.1.5, but maybe for the next version... :-/

Thanks a lot

Comment 152 Brandon Mintern 2008-06-21 00:04:51 UTC

Wow, it really is a shame that nearly two years later this is not in portage yet. With all the time you spent updating your patch, devsk, you could have written "newportage".

Comment 153 Matt Whitlock 2008-06-21 19:39:15 UTC

I'm glad to see someone else has recognized that this is a major deficiency of Portage.  I have a quad-core system, and it makes me cry to see Portage trudging through ebuilds at 25% CPU usage.  This problem is only going to get worse as we get 8- and 16-core processors.

Comment 154 Zac Medico gentoo-dev

2008-06-21 21:52:12 UTC

Please refrain from "me too" posts. This feature is certainly slated for inclusion. Any discussion about how to expedite the process should be directed to the gentoo-portage-dev@gentoo.org mail list. Thanks in advance for your cooperation.

Comment 155 Will Saxon 2008-06-22 00:55:11 UTC

(In reply to comment #154)
> Please refrain from "me too" posts. This feature is certainly slated for
> inclusion. Any discussion about how to expedite the process should be directed
> to the gentoo-portage-dev@gentoo.org mail list. Thanks in advance for your
> cooperation.
> 

You're obviously the gatekeeper for this feature, and this is obviously the focal point of discussion on the issue, so asking people to complain somewhere else doesn't make a lot of sense.

Your argument that the patch should not be implemented because you will be replacing it completely doesn't hold much water IMO. It should not matter at all whether a separate implementation is put into place as a stopgap measure today. You'll still be improving the implementation tomorrow, while people will get some/all of the benefit today. 

Instead, you're aggravated that there is continued debate about something you're already planning to do, while all the people who want the feature yesterday are aggravated that you have a life and choose to implement the feature on your schedule.

Comment 156 Zac Medico gentoo-dev

2008-06-22 01:39:47 UTC

Please direct all discussion about this feature to the gentoo-portage-dev@gentoo.org mailing list. Thanks in advance for your cooperation.

Comment 157 Patrick Borjesson 2008-06-22 02:53:58 UTC

(In reply to comment #155)
> (In reply to comment #154)
> > Please refrain from "me too" posts. This feature is certainly slated for
> > inclusion. Any discussion about how to expedite the process should be directed
> > to the gentoo-portage-dev@gentoo.org mail list. Thanks in advance for your
> > cooperation.
> >
> Your argument that the patch should not be implemented because you will be
> replacing it completely doesn't hold much water IMO. It should not matter at
> all whether a separate implementation is put into place as a stopgap measure
> today. You'll still be improving the implementation tomorrow, while people will
> get some/all of the benefit today. 
> 
> Instead, you're aggravated that there is continued debate about something
> you're already planning to do, while all the people who want the feature
> yesterday are aggravated that you have a life and choose to implement the
> feature on your schedule.

So make a patch, use it, and be happy. I don't get the big deal about it not being integrated into portage _right_now_. The portage developers obviously have more important things to do. If you want the current patch discussed, take it to the gentoo-portage-dev ml (as pointed out, multiple times)!

Comment 158 Zac Medico gentoo-dev

2008-07-11 23:17:16 UTC

There is parallel build support in svn now. It's controlled by the --jobs and --load-average options (analogous to make's options). There is also parallel --regen support. There is a new --keep-going option that people here may also be interested in.

--jobs JOBS
		Specifies the number of packages to build simultaneously.
		Also see the related --load-average option.

--keep-going
		Continue as much as possible after an error. When an error
		occurs, dependencies are recalculated for remaining packages
		and any with unsatisfied dependencies are automatically
		dropped. Also see the related --skipfirst option.

--load-average LOAD
		Specifies that no new builds should be started if there are
		other builds running and the load average is at least LOAD (a
		floating-point number).
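The documented semantics of the two limits can be sketched as a single admission check. This is an illustration under assumptions, not portage's Scheduler code: `may_start_job` is a hypothetical name, `None` stands for an unlimited --jobs, and the 1-minute load average is assumed to be the figure compared against LOAD.

```python
import os
from typing import Callable, Optional, Tuple

LoadFn = Callable[[], Tuple[float, float, float]]


def may_start_job(running: int,
                  max_jobs: Optional[int],
                  max_load: Optional[float] = None,
                  getloadavg: LoadFn = os.getloadavg) -> bool:
    """Return True if one more build may be started right now."""
    # --jobs: never exceed the job limit (None is treated as "unlimited").
    if max_jobs is not None and running >= max_jobs:
        return False
    # --load-average: only consulted when other builds are already running,
    # so a single build can always start even on a loaded machine.
    if max_load is not None and running > 0 and getloadavg()[0] >= max_load:
        return False
    return True
```

Checking the load only when other builds are running mirrors the "if there are other builds running" clause, so a busy machine still makes progress one package at a time.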

Comment 159 devsk 2008-07-12 00:56:37 UTC

Is it going to make it into the 2.2 final release?

BTW, I like all the work you have put into 2.2. Really appreciate it! Great job!

Comment 160 devsk 2008-07-12 06:57:24 UTC

OK, I gave the portage in svn a run. Here are a few things:

1. Why do I need to see tons of "--- replaced obj", ">>> /usr/lib64/perl5" or "strip ..." merge/unmerge messages? Not good! Let these go to the log file.

2. It doesn't seem to queue the jobs, i.e. it starts 3 jobs and waits for all of them to finish before starting the next three, whereas the 2.1.4.4 patch I had would start the next job in the queue as soon as any one of those 3 jobs finished. Not good! Let's track job status for reporting as well as for queuing the next one.

3. There is no consistent on-screen status output about which jobs, and how many, are running, have failed, or have finished successfully. All this is clobbered by the messages in point 2. above. I think this is another thing we can take from the patch here.

Apart from that, I like several things in this work. So, once again, great job!
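The difference between the two behaviours in point 2 shows up even in a toy model. A sketch, assuming all packages are independent (no dependency constraints) and that each build's duration is known in advance; both function names are hypothetical:

```python
import heapq
from typing import List


def batch_makespan(durations: List[float], jobs: int) -> float:
    """Start `jobs` builds, wait for all of them, then start the next group."""
    total = 0.0
    for i in range(0, len(durations), jobs):
        total += max(durations[i:i + jobs])
    return total


def queue_makespan(durations: List[float], jobs: int) -> float:
    """Start the next queued build as soon as any running build finishes."""
    finish_times: List[float] = []  # min-heap of finish times of running builds
    for d in durations:
        if len(finish_times) < jobs:
            heapq.heappush(finish_times, d)
        else:
            # The earliest-finishing build frees a slot; the next one starts then.
            start = heapq.heappop(finish_times)
            heapq.heappush(finish_times, start + d)
    return max(finish_times, default=0.0)
```

With durations `[10, 1, 1, 1, 1, 1]` and 3 jobs, the batch model takes 11 time units and the queue model takes 10, because the short builds keep starting while the long build from the first group is still running.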

Comment 161 devsk 2008-07-12 07:13:36 UTC

s/messages in point 2/messages in point 1/

Comment 162 devsk 2008-07-12 07:23:47 UTC

After installing 2.2 and running 'emerge --metadata', 'emerge -puDvN world' took 9 seconds, while it took 6 seconds with 2.1.4.4 on the same world set (the output showed the same 26 packages to be updated). Why the slowdown compared to 2.1.4.4?

Comment 163 devsk 2008-07-12 19:11:19 UTC

# emerge -vp1 HTML-Parser HTML-Tagset Digest-SHA1 XML-LibXML Socket6 Net-SSLeay extutils-parsexs module-build

A simple example to demonstrate that job queuing is not working as intended. There are at least 5 packages which can be emerged at the same time to begin with, but with --jobs=3 the scheduler merges 3, waits for all of them to finish, and then starts the next 3.

I just picked some random perl modules to get a fast run. Here is the information from depgraph in 2.1.4.4 that my patch uses to start jobs:

Package list for slot = 0
        ['ebuild', '/', 'perl-core/digest-base-1.15', 'merge', '0']
        ['ebuild', '/', 'dev-perl/HTML-Tagset-3.20', 'merge', '0']
        ['ebuild', '/', 'dev-perl/XML-LibXML-1.66', 'merge', '0']
        ['ebuild', '/', 'dev-perl/Socket6-0.20', 'merge', '0']
        ['ebuild', '/', 'dev-perl/Net-SSLeay-1.30', 'merge', '0']
Package list for slot = 1
        ['ebuild', '/', 'virtual/perl-digest-base-1.15', 'merge', '1']
        ['ebuild', '/', 'dev-perl/HTML-Parser-3.56', 'merge', '1']
Package list for slot = 2
        ['ebuild', '/', 'dev-perl/Digest-SHA1-2.11', 'merge', '2']
Package list for slot = 3
        ['ebuild', '/', 'dev-perl/module-build-0.28.08', 'merge', '3']
Package list for slot = 4
        ['ebuild', '/', 'dev-perl/extutils-parsexs-2.19', 'merge', '4']
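One way to read the slot numbers above is as dependency depth: slot 0 holds packages with no dependencies inside the merge list, and every other package sits one slot above its deepest such dependency. That reading is an assumption, not something the patch states. The sketch uses a hand-written subset of edges chosen to be consistent with the listing, assumes the graph is acyclic, and `slot_levels` is a hypothetical name:

```python
from typing import Dict, List


def slot_levels(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each package to its depth among dependencies in the merge list.

    Assumes an acyclic graph; a dependency cycle would recurse forever
    (RecursionError). Dependencies outside `deps` are ignored.
    """
    levels: Dict[str, int] = {}

    def level(pkg: str) -> int:
        if pkg not in levels:
            inside = [d for d in deps.get(pkg, []) if d in deps]
            levels[pkg] = 1 + max((level(d) for d in inside), default=-1)
        return levels[pkg]

    for pkg in deps:
        level(pkg)
    return levels


# Illustrative subset of the merge list above (edges are assumptions).
example_deps = {
    "perl-core/digest-base-1.15": [],
    "virtual/perl-digest-base-1.15": ["perl-core/digest-base-1.15"],
    "dev-perl/Digest-SHA1-2.11": ["virtual/perl-digest-base-1.15"],
    "dev-perl/HTML-Tagset-3.20": [],
    "dev-perl/HTML-Parser-3.56": ["dev-perl/HTML-Tagset-3.20"],
}
```

Under this reading, everything in one slot could build concurrently once all lower slots have merged.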

Comment 164 Zac Medico gentoo-dev

2008-07-12 21:47:29 UTC

(In reply to comment #160)
> Ok, I gave the portage in svn a run. And here are few things:
> 
> 1. Why do I need to see tonnes of "--- replaced obj" or ">>> /usr/lib64/perl5"
> or "strip ..." merge/unmerge messages. Not good! Let these go to the log file.

The merge is currently executed in the main process. If we execute it in a background process like the other tasks, that will solve it.

> 2. It seems to not queue the jobs i.e. it starts 3 jobs and waits for them to
> finish before starting the next three, whereas 2.1.4.4 patch I had would start
> the next in queue as soon as one of those 3 jobs finished. Not good! Let's
> track job status for reporting as well as queuing the next one.

The current algorithm is intentionally as conservative as possible, in the sense that it will not execute a package if there are any packages in its subgraph of deep dependencies scheduled to be executed. We can add one or more options to control the criteria for choosing packages. Those options will modify the behavior of Scheduler._choose_pkg(). The reasoning for the current conservative behavior is that in many cases it's beneficial (it avoids breakage) to ensure that a package's subgraph of deep dependencies is up to date before executing the package itself.
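The conservative rule described above can be sketched as follows. The names `deep_deps` and `choose_pkg` are hypothetical and this is not Scheduler._choose_pkg(); it only illustrates the criterion that a queued package is eligible only when nothing in its transitive dependency set is still pending.

```python
from typing import Dict, List, Optional, Set


def deep_deps(pkg: str, deps: Dict[str, List[str]]) -> Set[str]:
    """All packages reachable through dependency edges, excluding pkg itself."""
    seen: Set[str] = set()
    stack = list(deps.get(pkg, []))
    while stack:
        d = stack.pop()
        if d not in seen:  # the visited set also makes cycles terminate
            seen.add(d)
            stack.extend(deps.get(d, []))
    seen.discard(pkg)
    return seen


def choose_pkg(queue: List[str], deps: Dict[str, List[str]],
               pending: Set[str]) -> Optional[str]:
    """Return the first queued package whose deep dependencies contain
    nothing still pending (queued or running), or None if none qualifies."""
    for pkg in queue:
        if not (deep_deps(pkg, deps) & pending):
            return pkg
    return None
```

A less conservative option, along the lines discussed, could test only `deps.get(pkg, [])` (direct dependencies) instead of the whole deep subgraph.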

> 3. There is no consistent status output on the screen about which and how many
> jobs are running, failed or done successfully. All this is clobbered with
> messages in point 2. above. I think this is another thing we can use from the
> patch here.

Scheduler._schedule_tasks() seems like a logical place to hook a status report.

(In reply to comment #162)
> after installing 2.2 and 'emerge --metadata', emerge -puDvN world took 9secs
> while it took 6secs with 2.1.4.4 with the same world set (the output showed
> same 26 packages to be updated). Why this slowdown compared to 2.1.4.4?

There are a lot more objects and there are also lots of changes in the way the dependency resolver works. We may be able to optimize some things to improve performance. Also note that 2.2_rc1 has a memory leak (bug #229069) which is fixed in svn.

Comment 165 devsk 2008-07-14 00:30:08 UTC

Created attachment 160298
2.1.4.4 with python 2.5.2

I just ran 'emerge -puDvN world' through profile.py, and I am attaching the outputs.

The profiler in Python 2.5.2 seems to be twice as slow as the one in 2.4.4, so the results may only be meaningful relative to each other. But 2.1.4.4 with 2.5.2 and 2.2 with 2.5.2 should be comparable. I can't really make much sense out of the data. See if you can.

Comment 166 devsk 2008-07-14 00:30:55 UTC

Created attachment 160299
portage 2.2 with python 2.5.2

Comment 167 devsk 2008-07-14 00:31:41 UTC

Created attachment 160301
2.1.4.4 with python 2.4.4

Comment 168 devsk 2008-07-14 03:42:44 UTC

Zac, the failed jobs are not handled correctly. Look at this:

# emerge -v -1 \<app-emulation/vmware-modules-1.0.0.16

These are the packages that would be merged, in order:

Calculating dependencies ... done!
[ebuild   R   ] app-emulation/vmware-modules-1.0.0.15-r1  0 kB

Total: 1 package (1 reinstall), Size of downloads: 0 kB

>>> Verifying ebuild Manifests...

>>> Emerging (1 of 1) app-emulation/vmware-modules-1.0.0.15-r1 to /

#

It just gives me the prompt back. I have to dig through the logs to figure out that the compile actually failed. Can we print a message saying that package <blah> failed, with the last 20 lines from its log file, and then point the user to the log file path?
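Printing the tail of a failed build's log is cheap to do. A sketch, assuming the build log is a plain text file at a known path; `failure_report` and its message format are hypothetical:

```python
from collections import deque


def failure_report(pkg: str, log_path: str, lines: int = 20) -> str:
    """Build a short failure message: the last `lines` lines of the log,
    followed by the log path so the user can open the full file."""
    # A deque with maxlen keeps only the final lines while streaming the
    # file, so a very large build log is never held in memory at once.
    with open(log_path, errors="replace") as f:
        tail = deque(f, maxlen=lines)
    return "\n".join([
        f"!!! {pkg} failed; last {len(tail)} lines of its build log:",
        "".join(tail).rstrip("\n"),
        f"!!! Full log: {log_path}",
    ])
```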

Comment 169 Zac Medico gentoo-dev

2008-07-15 11:26:49 UTC

Error handling should be much better now. If the "echo" elog module isn't enabled then it will emulate it for the error messages (including the die messages). There's also a summary near the end listing the names of the failed packages.

The preinst, postinst, prerm, and postrm phases now run asynchronously. This allows the scheduler's poll loop to run so that other parallel tasks aren't starved for output handling while those phases are executing.

Next on the TODO list:

* Pass a scheduler callback into portage.movefile() for cases when mv needs to be spawned, allowing the scheduler to run while mv executes asynchronously.

* Pass a callback into the merge/unmerge code, for sending output to the log instead of stdout.
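The second TODO item amounts to passing the output function in rather than hard-coding stdout. A sketch of the idea; `merge_files` is a hypothetical stand-in for the merge/unmerge code, `log_writer` is a hypothetical helper, and the ">>> path" message format is illustrative only:

```python
from typing import Callable, Iterable, Optional, TextIO

Output = Callable[[str], None]


def merge_files(paths: Iterable[str], out: Optional[Output] = None) -> None:
    """Stand-in for merge code that reports each installed object via `out`."""
    if out is None:
        out = print  # previous behaviour: everything goes to stdout
    for path in paths:
        # ... the real work of installing `path` would happen here ...
        out(f">>> {path}")


def log_writer(log: TextIO) -> Output:
    """Return a callback that appends each message to an open log file."""
    def write(msg: str) -> None:
        log.write(msg + "\n")
    return write
```

A parallel scheduler could then hand each background merge a callback bound to that package's own log file, while a plain serial emerge keeps the default.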

Comment 170 devsk 2008-07-15 22:23:20 UTC

(In reply to comment #169)
> Error handling should be much better now. If the "echo" elog module isn't
> enabled then it will emulate it for the error messages (including the die
> messages). There's also a summary near the end listing the names of the failed
> packages.
> 
> The preinst, postinst, prerm, and postrm phases now run asynchronously, This
> allows the scheduler's poll loop to run so that other parallel tasks aren't
> starved for output handling while those phases are executing.
> 
> Next on the TODO list:
> 
> * Pass a scheduler callback into portage.movefile() for cases when mv needs to
> be spawned, allowing the scheduler to run while mv executes asynchronously.
> 
> * Pass a callback into the merge/unmerge code, for sending output to the log
> instead of stdout.
> 

I think output processing and an accurate overall status of jobs, in the terminal window as well as in the title, should be a priority.

Performance is a bigger beast. And I think it's not just about --jobs.

Comment 171 devsk 2008-07-16 19:01:41 UTC

You did it, dog!

A couple of minor issues:

0. We should print the log location of the package being emerged, in case a user wants to look at it, like:

Emerging (1 of 1) sys-apps/portage-2.2_rc3 to /, log at /var/log/portage/<blah>.log

1.
>>> Emerging (1 of 1) sys-apps/portage-2.2_rc3 to /
>>> Jobs: 0 of 1 complete, 1 running, 0 merges, load average: 0.4, 0.2, 0.2
>>> Merging sys-apps/portage-2.2_rc3 to /
>>> sys-apps/portage-2.2_rc3 merged.
>>> Jobs: 1 of 1 complete, 1 running, 1 merges, load average: 0.6, 0.3, 0.2
>>> Auto-cleaning packages...

When 1 of 1 is complete, 1 cannot be running... :-)

2. Why is there an extra newline before ">>> Emerging (" and in:

>>> Auto-cleaning packages...

>>> No outdated packages were found on your system.

Apart from that: AWESOME!
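The inconsistent counts suggest the same job is being tallied in two categories. A sketch of one way to avoid that, assuming each job is tracked with exactly one state at a time; `JobState` and `status_line` are hypothetical names and the line format only imitates the output above:

```python
from collections import Counter
from enum import Enum
from typing import Dict, Tuple


class JobState(Enum):
    QUEUED = 1
    BUILDING = 2
    MERGING = 3
    DONE = 4
    FAILED = 5


def status_line(jobs: Dict[str, JobState],
                load: Tuple[float, float, float]) -> str:
    """Derive every count from one state per job, so a finished job can
    never also be counted as running or merging."""
    n = Counter(jobs.values())  # missing states count as 0
    return (f">>> Jobs: {n[JobState.DONE]} of {len(jobs)} complete, "
            f"{n[JobState.BUILDING]} running, {n[JobState.MERGING]} merges, "
            f"{n[JobState.FAILED]} failed, "
            f"load average: {load[0]:.1f}, {load[1]:.1f}, {load[2]:.1f}")
```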

Comment 172 devsk 2008-07-16 19:17:20 UTC

Zac, I have to apply this patch every time I want to create a tarball of a new release that I want to test. Can we do something that makes it easy for end users as well as devs to use the same script?

--- mkrelease.sh.orig   2008-07-16 11:41:59.000000000 -0700
+++ mkrelease.sh        2008-07-16 11:42:46.000000000 -0700
@@ -3,7 +3,8 @@
 RELEASE_BUILDDIR=${RELEASE_BUILDDIR:-/var/tmp/portage-release}
 SOURCE_DIR=${RELEASE_BUILDDIR}/checkout
 BRANCH=${BRANCH:-trunk}
-REPOSITORY=svn+ssh://cvs.gentoo.org/var/svnroot/portage/main
+# REPOSITORY=svn://cvs.gentoo.org/var/svnroot/portage/main
+REPOSITORY=svn://anonsvn.gentoo.org/portage/main
 SVN_LOCATION=${REPOSITORY}/${BRANCH}

 die() {
@@ -57,7 +58,7 @@

 echo ">>> Building release tree"
 cp -a "${SOURCE_DIR}/"{bin,cnf,doc,man,pym,src} "${RELEASE_DIR}/" || die "directory copy failed"
-cp "${SOURCE_DIR}/"{ChangeLog,DEVELOPING,NEWS,RELEASE-NOTES,TEST-NOTES,TODO} "${RELEASE_DIR}/" || die "file copy failed"
+cp "${SOURCE_DIR}/"{ChangeLog,DEVELOPING,NEWS,RELEASE-NOTES,TEST-NOTES} "${RELEASE_DIR}/" || die "file copy failed"

 cd "${RELEASE_BUILDDIR}"

Comment 173 Marius Mauch (RETIRED) gentoo-dev

2008-07-17 01:31:37 UTC

(In reply to comment #172)
> Zac, I have to apply this patch every time I want to create a tar for new
> release that I want to test. Can we do something which makes it easy for end
> users as well as devs to use the same script?
> 
> --- mkrelease.sh.orig   2008-07-16 11:41:59.000000000 -0700
> +++ mkrelease.sh        2008-07-16 11:42:46.000000000 -0700
> @@ -3,7 +3,8 @@
>  RELEASE_BUILDDIR=${RELEASE_BUILDDIR:-/var/tmp/portage-release}
>  SOURCE_DIR=${RELEASE_BUILDDIR}/checkout
>  BRANCH=${BRANCH:-trunk}
> -REPOSITORY=svn+ssh://cvs.gentoo.org/var/svnroot/portage/main
> +# REPOSITORY=svn://cvs.gentoo.org/var/svnroot/portage/main
> +REPOSITORY=svn://anonsvn.gentoo.org/portage/main
>  SVN_LOCATION=${REPOSITORY}/${BRANCH}

That's a bit of a problem, as the anon service always lags a bit behind (half an hour or so), so we'd always have to make sure that the exported revision contains all the fixes we want in the release (including the ones that were committed just before running the script).

Comment 174 devsk 2008-07-17 01:44:01 UTC

(In reply to comment #173)
> That's a little problem as the anon service always lags a bit behind (half an
> hour or so), so we we'd always have to make sure that the exported revision
> contains all fixes we want in the release (including the ones that were
> committed just before running the script)

The script could take an additional argument like --access=[anon | ssh], defaulting to ssh and the original REPOSITORY, to smooth this out and make it work for both cases.

What about the TODO file? Why is it always missing when I do an svn checkout of trunk?

Comment 175 Andrew Gaffney (RETIRED) gentoo-dev

2008-07-17 03:00:17 UTC

(In reply to comment #173)
> That's a little problem as the anon service always lags a bit behind (half an
> hour or so), so we we'd always have to make sure that the exported revision
> contains all fixes we want in the release (including the ones that were
> committed just before running the script)

Actually, it's 5 minutes these days. Robin bumped it up for me a few months ago when I was doing testing with the catalyst-9999 ebuild for the release.

Comment 176 devsk 2008-07-18 02:17:10 UTC

Zac, is it possible to print info about which packages will be emerged concurrently in verbose mode? We could group those packages together in the emerge order shown in verbose mode.

Comment 177 Zac Medico gentoo-dev

2008-07-18 06:50:53 UTC

The build order is intentionally non-deterministic (within constraints) since one can never know in advance exactly how much time a given job will take. The scheduler makes spontaneous decisions based on the state at a given time, so in general it's not possible to know exactly which packages will be built concurrently, especially if you use the --load-average option.
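Why the grouping cannot be printed in advance can be seen from a toy simulation: the same merge order yields different sets of concurrently built packages when only the build durations change. The sketch assumes independent packages and ignores --load-average; `concurrent_pairs` is a hypothetical name:

```python
import heapq
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def concurrent_pairs(order: List[str], durations: Dict[str, float],
                     jobs: int) -> Set[FrozenSet[str]]:
    """Simulate a queue scheduler over independent packages and return the
    pairs of packages whose build intervals overlapped."""
    running: List[Tuple[float, str]] = []  # min-heap of (finish_time, pkg)
    spans: Dict[str, Tuple[float, float]] = {}
    for pkg in order:
        start = 0.0
        if len(running) == jobs:
            # Wait for the earliest-finishing build to free a slot.
            start, _ = heapq.heappop(running)
        end = start + durations[pkg]
        spans[pkg] = (start, end)
        heapq.heappush(running, (end, pkg))
    return {frozenset((a, b)) for a, b in combinations(order, 2)
            if spans[a][0] < spans[b][1] and spans[b][0] < spans[a][1]}
```

With order `A, B, C` and 2 jobs, durations A=1, B=5, C=1 pair C with B, while A=5, B=1, C=1 pair C with A: the same plan, different concurrency.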

Comment 178 Zac Medico gentoo-dev

2008-07-23 07:54:02 UTC

This is fixed in 2.2_rc2.

Comment 179 devsk 2008-07-24 06:18:15 UTC

(In reply to comment #178)
> This is fixed in 2.2_rc2.
> 

Can we please output the name of the log file for the package if LOGDIR is defined?

Comment 180 Zac Medico gentoo-dev

2008-07-24 06:48:51 UTC

I'm afraid that log paths will clutter the output (they might not be displayable in 80 columns). Isn't it easy enough to locate the logs as it is? Besides, in the event of a build failure, the path of the build log is included in the die message that's displayed.

Comment 181 devsk 2008-07-24 08:02:07 UTC

(In reply to comment #180)
> I'm afraid that log paths will clutter the output (they might not be
> displayable in 80 columns). Isn't it easy enough to locate the logs as it is?
> Besides, in the event of a build failure, the path of the build log is included
> in the die message that's displayed.
> 

The full path isn't necessary; the log file name alone should fit in 80 columns for most packages.
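A sketch of the compromise being suggested: show only the log file's base name, and move it to its own line when the combined line would exceed 80 columns. `emerging_line` and the ", log:" wording are hypothetical:

```python
import os


def emerging_line(index: int, total: int, pkg: str, log_path: str,
                  width: int = 80) -> str:
    """Format the "Emerging" status line with the log file's base name."""
    head = f">>> Emerging ({index} of {total}) {pkg}"
    name = os.path.basename(log_path)
    one_line = f"{head}, log: {name}"
    if len(one_line) <= width:
        return one_line
    # Too wide for the terminal: put the log file name on its own line.
    return f"{head}\n>>> Log: {name}"
```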

Comment 182 devsk 2008-08-15 02:42:03 UTC

Why would 2.2_rc8 do this? This looks really bad! It seems like all the packages are blocked from merging. It hasn't merged khelpcenter yet.

>>> Starting parallel fetch
>>> Emerging (1 of 52) kde-base/khelpcenter-9999
>>> Emerging (2 of 52) kde-base/kiconfinder-9999
>>> Emerging (3 of 52) kde-base/kinfocenter-9999
>>> Emerging (4 of 52) kde-base/kioclient-9999
>>> Emerging (5 of 52) kde-base/kmenuedit-9999
>>> Emerging (6 of 52) kde-base/kmimetypefinder-9999
>>> Emerging (7 of 52) kde-base/knetattach-9999
>>> Emerging (8 of 52) kde-base/knewstuff-9999
>>> Emerging (9 of 52) kde-base/kpasswdserver-9999
>>> Emerging (10 of 52) kde-base/kquitapp-9999
>>> Emerging (11 of 52) kde-base/ksnapshot-9999
>>> Emerging (12 of 52) kde-base/kstart-9999
>>> Emerging (13 of 52) kde-base/ksystraycmd-9999
>>> Emerging (14 of 52) kde-base/ktimezoned-9999
>>> Emerging (15 of 52) kde-base/ktraderclient-9999
>>> Emerging (16 of 52) kde-base/kuiserver-9999
>>> Emerging (17 of 52) kde-base/kwrite-9999
>>> Emerging (18 of 52) kde-base/renamedlg-plugins-9999
>>> Emerging (19 of 52) kde-base/solid-hardware-9999
>>> Emerging (20 of 52) kde-base/kbounce-9999
>>> Emerging (21 of 52) kde-base/kdemultimedia-kioslaves-9999
>>> Emerging (22 of 52) kde-base/keditbookmarks-9999
>>> Emerging (23 of 52) kde-base/khotkeys-9999
>>> Emerging (24 of 52) kde-base/klipper-9999
>>> Jobs: 0 of 52 complete, 6 running               Load avg: 6.26, 6.16, 5.39

Comment 183 devsk 2008-08-15 02:44:24 UTC

I think it might have something to do with the resume code path after a failure, i.e. with the --keep-going option. These 52 packages are being emerged as the resume after that failure.

Comment 184 devsk 2008-08-15 02:54:58 UTC

So, after a long time, it came back with this:

>>> Emerging (33 of 52) kde-base/kdm-9999
>>> Installing kde-base/kmimetypefinder-9999
>>> Installing kde-base/kmenuedit-9999
>>> Installing kde-base/khelpcenter-9999
>>> Installing kde-base/kioclient-9999
>>> Installing kde-base/knetattach-9999
>>> Installing kde-base/kinfocenter-9999
>>> Installing kde-base/kpasswdserver-9999
>>> Installing kde-base/kquitapp-9999
>>> Installing kde-base/ksnapshot-9999
>>> Installing kde-base/kstart-9999
>>> Installing kde-base/ksystraycmd-9999
>>> Installing kde-base/ktraderclient-9999
>>> Installing kde-base/ktimezoned-9999
>>> Installing kde-base/kuiserver-9999
>>> Installing kde-base/kwrite-9999
>>> Installing kde-base/renamedlg-plugins-9999
>>> Installing kde-base/kbounce-9999
>>> Installing kde-base/solid-hardware-9999
>>> Installing kde-base/keditbookmarks-9999
>>> Installing kde-base/knewstuff-9999
>>> Installing kde-base/kdemultimedia-kioslaves-9999
>>> Installing kde-base/klipper-9999
>>> Installing kde-base/kiconfinder-9999
>>> Installing kde-base/khotkeys-9999
>>> Installing kde-base/ksudoku-9999
>>> Installing kde-base/kmahjongg-9999
>>> Installing kde-base/kweather-9999
>>> Installing kde-base/kscreensaver-9999
>>> Installing kde-base/konsole-9999
>>> Installing kde-base/ksysguard-9999
>>> Installing kde-base/kshisen-9999
>>> Installing kde-base/kdm-9999
>>> Installing kde-base/kmix-9999

This was all serial and took a long time.

Comment 185 Zac Medico gentoo-dev

2008-08-15 02:58:35 UTC

(In reply to comment #182)
> Why would 2.2_rc8 do this? This looks real bad! It seems like all the packages
> are blocked for merging. It hasn't merged khelpcenter yet.

Can you reproduce that with --debug enabled?  What was your --jobs setting, unlimited? If so then it's possible for it to go out of control like that, especially if you haven't set --load-average or it's set too high. Considering that all the installs succeeded it seems like the dep calculation worked correctly.

Comment 186 Zac Medico gentoo-dev

2008-08-15 03:05:24 UTC

(In reply to comment #185)
> Can you reproduce that with --debug enabled?  What was your --jobs setting,
> unlimited?

There was a bug in earlier versions of portage (less than rc8) which caused --jobs to be unlimited when using --resume. If /var/cache/edb/mtimedb had been written with a lower version of portage then the problem would still affect rc8 after it read the corrupt merge list from the mtimedb.

Comment 187 devsk 2008-08-15 03:13:09 UTC

(In reply to comment #186)
> (In reply to comment #185)
> > Can you reproduce that with --debug enabled?  What was your --jobs setting,
> > unlimited?
> 
> There was a bug in earlier versions of portage (less than rc8) which caused
> --jobs to be unlimited when using --resume. If /var/cache/edb/mtimedb had been
> written with a lower version of portage then the problem would still affect rc8
> after it read the corrupt merge list from the mtimedb.
> 

I updated portage to rc8 before starting on 101 KDE packages. It failed partway through because TMPDIR ran out of inodes. After removing some junk, I resumed, and it correctly resumed the remaining 72 packages.

--jobs=6 and --load-average=8.

Comment 188 devsk 2008-08-15 03:19:05 UTC

Another issue: multiple failures and resumes lead to this towards the end:

Traceback (most recent call last):
  File "/usr/bin/emerge", line 18, in <module>
    retval = _emerge.emerge_main()
  File "/usr/lib64/portage/pym/_emerge/__init__.py", line 13662, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/usr/lib64/portage/pym/_emerge/__init__.py", line 12787, in action_build
    retval = mergetask.merge()
  File "/usr/lib64/portage/pym/_emerge/__init__.py", line 9378, in merge
    mergelist.remove(list(failed_pkg.pkg))
ValueError: list.remove(x): x not in list
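The ValueError comes from `mergelist.remove(list(failed_pkg.pkg))` when that entry is no longer in the list, presumably because an earlier failure and resume already dropped it. A sketch of tolerating that case; `drop_failed` is a hypothetical helper, and a real fix might instead address why the entry went missing:

```python
from typing import Any, List


def drop_failed(mergelist: List[Any], entry: Any) -> bool:
    """Remove `entry` from the resume merge list if it is still there.

    Returns True when an entry was removed and False when it was already
    gone, instead of letting list.remove() raise ValueError.
    """
    try:
        mergelist.remove(entry)
    except ValueError:
        return False
    return True
```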

Comment 189 devsk 2008-08-24 18:22:31 UTC

Zac, can we output the overall status every 10 packages (or every 5 minutes)? This will leave some trace of the load averages at various stages of the world update, as well as let us know if we are hitting any bottlenecks in our algorithm. Currently, midway through 'emerge -e world', it has permanently (for at least the last 30 packages or so) switched to 1 job at a time, and I don't know which package triggered this or why it happened. If I weren't watching it, I wouldn't have noticed (at least I wouldn't have any proof that it was doing 1 job at a time).

>>> Jobs: 55 of 355 complete, 1 running             Load avg: 1.46, 1.76, 2.00

I think the repeated sequence of

>>> Installing <blah>
>>> Emerging (58 of 355) <blah>

is too minimal of a status to be useful. We need to know how many have failed, and which ones have failed. Many times the investigation into which one failed has to be postponed to the end of the update (which can be long) because we don't know which package failed.

And I think putting the log file name on the next line after "Emerging .." is very useful and not that verbose. We should do that. Otherwise it's a sort by time in LOGDIR every time, and my LOGDIR has too many files in it.
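The "every 10 packages or every 5 minutes" request can be sketched as a small throttle that the scheduler would consult after each completed package. `StatusThrottle` is a hypothetical name and the thresholds are just the values suggested above; the clock is injectable so the logic can be tested without waiting:

```python
import time
from typing import Callable


class StatusThrottle:
    """Decide when to print the overall status line: after every
    `every_pkgs` completed packages or `every_secs` seconds, whichever
    comes first."""

    def __init__(self, every_pkgs: int = 10, every_secs: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.every_pkgs = every_pkgs
        self.every_secs = every_secs
        self.clock = clock
        self._last_count = 0
        self._last_time = clock()

    def should_report(self, completed: int) -> bool:
        now = self.clock()
        if (completed - self._last_count >= self.every_pkgs
                or now - self._last_time >= self.every_secs):
            # Reset both counters so the next report is measured from here.
            self._last_count = completed
            self._last_time = now
            return True
        return False
```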

Comment 190 Zac Medico gentoo-dev

2008-08-24 18:55:20 UTC

(In reply to comment #189)
> Currently, mid-way through 'emerge -e world', it has permanently (at
> least last 30 packages or so) turned to 1 job at a time and I don't know which
> package triggered this or why it happened? If I wasn't watching it, I wouldn't
> have noticed (at least I won't have any proof that it was doing 1 job at a
> time).

This is a known issue that's been fixed in svn for some time. See bug 235542.

Comment 191 Zac Medico gentoo-dev

2008-12-21 00:03:32 UTC

This feature is included in portage-2.1.6, so let's close this bug now. If you've got some remaining enhancements to add then please file separate bugs. Thanks.