Summary: | sys-apps/portage-2.1.9.7: endless ebuild-ipc timed out during write after 15 seconds | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Juergen Rose <rose> |
Component: | Core | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | esigra, graphics+disabled, oss.elmar, shiningarcanine |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=337465 | ||
Whiteboard: | |||
Package list: | | Runtime testing required: | --- |
Bug Depends on: | |||
Bug Blocks: | 335925 |
Description
Juergen Rose
2010-09-17 08:58:20 UTC
After erasing gqview the next 'emerge gqview' was successful.

I have seen this when repeatedly emerging portage and when initially emerging groff. In case of groff it occurred immediately after unpacking of the sources.

(In reply to comment #2)
> I have seen this when repeatedly emerging portage and when initially emerging
> groff. In case of groff it occurred immediately after unpacking of the sources.

My workaround is to run ebuild xxx.ebuild install.

I think this bug is related to bug #337465

Hopefully it's fixed with portage-2.1.9.10. Please test.

(In reply to comment #4)
> Hopefully it's fixed with portage-2.1.9.10. Please test.

Unfortunately with portage-2.2.01.16616 (Prefix overlay on Cygwin):
>>> Source unpacked in /home/prefix/gentoo/var/tmp/portage/sys-apps/groff-1.20.1-r3/work
ebuild-ipc timed out during write after 15 seconds, retrying...
...
ebuild-ipc timed out during write after 225 seconds, retrying...

(In reply to comment #5)
> Unfortunately with portage-2.2.01.16616 (Prefix overlay on Cygwin):

Please file a separate bug. Each kernel behaves differently, and we can't handle them all in one bug. Your issue may be fixed already, but the fix may not have made it into a prefix release yet (latest fixes have been mentioned on bug 337465).

> Please file a separate bug. Each kernel behaves differently, and we can't
> handle them all in one bug. Your issue may be fixed already, but the fix may
> not have made it into a prefix release yet (latest fixes have been
> mentioned on bug 337465).

Yes, you asked for portage-2.1.9.10 and portage-2.2.01.16616 should be thereafter, even on Prefix. Hence, I thought to report that info.
I'll wait a little before opening a new bug, to see if it goes away on its own.
Nah, I'll commit a new portage in a few, with an ipc USE flag to enable or disable it. It'll be disabled by default (in Prefix), to be enabled in profiles whose configurations actually support the code. I'm not sure yet what it's useful for, but for Prefix it's going to cause more trouble than for anybody else.

(In reply to comment #7)
> Yes, you asked for portage-2.1.9.10 and portage-2.2.01.16616 should be
> thereafter, even on Prefix. Hence, I thought to report that info.

Still, if you have a different kernel then you can get wildly different results. Comment #0 clearly refers to a Linux kernel, and there have not been any ebuild-ipc issues reported for the Linux kernel since portage-2.1.9.10 (which is not true of other kernels, as evidenced by comments in bug 337465).

> Still, if you have a different kernel then you can get wildly different
> results. Comment #0 clearly refers to a Linux kernel, and there have not been
> any ebuild-ipc issues reported for the Linux kernel since portage-2.1.9.10
> (which is not true of other kernels, as evidenced by comments in bug 337465).

OK, I'll remember for the future that a bug is always tied to one kernel.
Al
I can trigger this problem in portage 2.1.10.3 using GNU Screen. It seems that if you put screen into scrollback mode (Ctrl+A Esc) while emerge is doing something, this happens. I have seen this twice so far. One time was when emerging packages. My computer did nothing for 7 hours until I put screen back into normal operation. I assume this happened there, but I was using --jobs, so I didn't check the ebuild logs. Another time was today when doing --depclean. I clearly saw this message occurring.

I have had this happen in the past with --jobs and even once filed a bug report for which a patch was written. I don't think every instance of this involves screen, but it seems that investigating the screen issue might give some insight into this problem. I would investigate this myself before reporting it, but I do not have time to do much with this.

(In reply to comment #11)
> I can trigger this problem in portage 2.1.10.3 using GNU Screen. It seems that
> if you put screen into scrollback mode (Ctrl+A Esc) while emerge is doing
> something, this happens. I have seen this twice so far. One time was when
> emerging packages. My computer did nothing for 7 hours until I put screen back
> into normal operation. I assume this happened there, but I was using --jobs, so
> I didn't check the ebuild logs. Another time was today when doing --depclean. I
> clearly saw this message occurring.

A likely explanation is that screen stops processing stdout/stderr of attached processes when it is in scrollback mode. In this case, there's nothing that attached processes like emerge can do except to wait for stdout/stderr to unblock. This type of issue can be fixed in screen by making it run a select/poll loop to process stdout/stderr of attached processes while it's in scrollback mode.

(In reply to comment #12)
> (In reply to comment #11)
> > I can trigger this problem in portage 2.1.10.3 using GNU Screen. It seems that
> > if you put screen into scrollback mode (Ctrl+A Esc) while emerge is doing
> > something, this happens. I have seen this twice so far. One time was when
> > emerging packages. My computer did nothing for 7 hours until I put screen back
> > into normal operation. I assume this happened there, but I was using --jobs, so
> > I didn't check the ebuild logs. Another time was today when doing --depclean. I
> > clearly saw this message occurring.
>
> A likely explanation is that screen stops processing stdout/stderr of attached
> processes when it is in scrollback mode. In this case, there's nothing that
> attached processes like emerge can do except to wait for stdout/stderr to
> unblock. This type of issue can be fixed in screen by making it run a
> select/poll loop to process stdout/stderr of attached processes while it's in
> scrollback mode.

That is a good explanation. Unfortunately, screen has to do this in a way that does not cause it to use an infinite scrollback buffer, which would be a memory leak. Maybe a COW buffer would work, where it basically keeps a temporary buffer so that the scrollback works and the actual buffer can keep going. I will try to file a separate bug for screen later when I have more time.

With that said, the idea that the ebuild-ipc timeouts occur when stdout/stderr stops being processed suggests to me that a race condition is occurring during the build. In the past, these ebuild-ipc issues have usually been accompanied by heavy load from --jobs, which is consistent with the idea that a race condition is being triggered. Is it possible to get emerge to output diagnostic information on the recipient(s) of stdout and stderr for the entire process tree when the ebuild-ipc timeout occurs?

(In reply to comment #13)
> With that said, the idea that the ebuild-ipc timeouts occur when stdout/stderr
> stops being processed suggests to me that a race condition is occurring during
> the build. In the past, these ebuild-ipc issues have usually been accompanied
> by heavy load from --jobs, which is consistent with the idea that a race
> condition is being triggered.

If it times out endlessly, even after the load decreases or after stdout/stderr unblock, then that would probably indicate some kind of bug (possibly a race condition). However, if it eventually stops timing out after load decreases or stdout/stderr unblock, then it would be behaving as designed.

> Is it possible to get emerge to output diagnostic
> information on the recipient(s) of stdout and stderr for the entire process
> tree when the ebuild-ipc timeout occurs?

Currently, there is no special diagnostic output like that, and I'm not sure exactly what you'd want to see in that output. Before we go there, I'd first like to clarify whether it's behaving as designed or not. Does it stop timing out after the load decreases or after stdout/stderr unblock, or not?

It has been a while since I have done a full system emerge to witness these problems in that context. With the first incident involving screen, it resumed after leaving scrollback mode. In instances not involving screen's scrollback mode, the builds failed, although those happened 2 to 3 months ago.

Since there's no activity here lately, I think we can assume that this was fixed by the changes from bug 337465.

See bug 524328 for a similar issue that has been reported with more recent versions of portage.
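
The blocked-stdout explanation given in reply to comment #11 can be illustrated with a small Python sketch. This is not portage or screen code. It only uses an ordinary pipe to stand in for the terminal, and the helper names (`fill_pipe`, `writable_within`, `drain`) and the 0.2 second timeout are made up for the illustration. It shows that a writer cannot make progress while the reader has stopped draining the pipe, and that writes become possible again once the reader resumes, which matches the "behaving as designed" case described above.

```python
import os
import select


def fill_pipe(write_fd):
    """Write until the kernel pipe buffer is full; return the bytes written."""
    os.set_blocking(write_fd, False)
    total = 0
    chunk = b"x" * 4096
    while True:
        try:
            total += os.write(write_fd, chunk)
        except BlockingIOError:
            # The reader is not draining, so the buffer is full.
            return total


def writable_within(write_fd, timeout):
    """Return True if write_fd accepts more data within `timeout` seconds."""
    _, ready, _ = select.select([], [write_fd], [], timeout)
    return bool(ready)


def drain(read_fd, nbytes):
    """Read nbytes, as a terminal that resumed processing output would."""
    remaining = nbytes
    while remaining > 0:
        remaining -= len(os.read(read_fd, min(remaining, 65536)))


# The pipe stands in for the stdout of an attached process (assumption).
read_fd, write_fd = os.pipe()

buffered = fill_pipe(write_fd)             # reader "in scrollback": nothing is read
stalled = writable_within(write_fd, 0.2)   # the write side times out
drain(read_fd, buffered)                   # reader resumes processing output
resumed = writable_within(write_fd, 0.2)   # the write side unblocks

print("bytes buffered before stalling:", buffered)
print("writable while reader stalled:", stalled)
print("writable after reader resumed:", resumed)
```

A writer that keeps timing out even after the reader has resumed would not fit this picture, and would point toward an actual bug rather than a stalled terminal.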