Summary: | sys-apps/portage-2.2.0_alpha116: Ctrl+C during --jobs simply halts progress w/o quitting | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Jacob Godserv <jacobgodserv> |
Component: | [OLD] Core system | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | minor | CC: | 4glitch, esigra, mrueg, pacho, rdalek1967, SebastianLuther, tomboy64 |
Priority: | Normal | Keywords: | InVCS |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | http://www.cons.org/cracauer/sigint.html | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=617550 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 184128, 604854 | ||
Attachments: | lsof, pstree, and pdb's backtrace output |
Description
Jacob Godserv
2012-07-10 00:30:48 UTC
Created attachment 317738
lsof, pstree, and pdb's backtrace output
*** Bug 433958 has been marked as a duplicate of this bug. ***

Why is this still not confirmed? Should I provide additional information?

When you ^C to emerge, the Scheduler._terminate_tasks method sends SIGTERM to direct child processes, and then it waits for them to terminate. It may help if we also add a timeout, and send a SIGKILL if the SIGTERM does not cause them to terminate within a reasonable amount of time (10 seconds or so).

Actually, I was about to file a bug when I found this one about... why not turn it into a feature?
My suggestion:
Ctrl+C kills emerge and all of its subprocesses. Sending it a signal (SIGUSR1, for example) could instead make it finish the currently running compiles/merges and then *not* proceed with the scheduled merges, i.e. make it finish gracefully.

The current behavior (for which this bug was opened) also happens in 2.1.11.62. And I find it quite a nuisance, since it needs an explicit kill -9 to stop.

(In reply to comment #5)
> The current behavior (for which this bug was opened) also happens in
> 2.1.11.62. And I find it quite a nuisance, since it needs an explicit kill
> -9 to stop.

Just FYI, a HUP signal to the Python process does the trick. It seems Python is responsive even if the code it's executing is not. If you Ctrl+Z'ed it, you just have to be sure to run "fg" to let it receive and react to the signal.

(In reply to comment #5)
> My suggestion:
> Ctrl+C kills emerge and all of its subprocesses.

If it sends SIGKILL first, then we risk losing potentially useful output, such as the traceback captured in bug 463960, comment #26. To avoid losing useful information like that, I think it's better to use a timeout. Alternatively, we could send SIGKILL if more than one SIGINT signal is received.

> Sending it a signal (SIGUSR1, for example) could instead make it finish the
> currently running compiles/merges and then *not* proceed with the scheduled
> merges, i.e. make it finish gracefully.
I think the graceful exit is a good default, but it would be better if enhanced with a timeout and with translation of multiple SIGINTs into a SIGKILL.

> The current behavior (for which this bug was opened) also happens in
> 2.1.11.62. And I find it quite a nuisance, since it needs an explicit kill
> -9 to stop.

Yes, it is certainly annoying.

This suggests several different signals should be sent before a SIGKILL is: http://pthree.org/2012/08/14/appropriate-use-of-kill-9-pid/

I am still suffering from this with 2.2.8-r1 (just hit it).

According to the article "Proper handling of SIGINT/SIGQUIT", the only proper way to exit due to SIGINT would be essentially as follows:

    #include <signal.h>
    #include <unistd.h>

    void sigint_handler(int sig)
    {
        /* do some cleanup */
        signal(SIGINT, SIG_DFL);   /* restore the default action */
        kill(getpid(), SIGINT);    /* die by SIGINT so the parent sees it */
    }

Our current code doesn't do that, so that's something to fix. We should certainly avoid calling signal.signal(signal.SIGINT, signal.SIG_IGN) and leaving it in that state (that's what we do now).

I've experienced this bug just now, and used SIGUSR1 + pdb to get some insight into what causes it. The problem is that in Scheduler there are 2 system packages in self._merge_wait_scheduled, and they are also in self._running_tasks (which keeps the main loop running). These tasks have been cancelled by self._terminate_tasks and cleared from self._task_queues.merge, so they're just sitting there in a cancelled state, keeping the loop running. This patch seems to work for me: https://github.com/gentoo/portage/pull/43

Posted for review: https://archives.gentoo.org/gentoo-portage-dev/message/bf102d1a2558b43655bf17251ca9b53d

This is in the master branch: https://gitweb.gentoo.org/proj/portage.git/commit/?id=d54a795615ccb769a25a0f8d6cc15ba930ec428f

Fixed in portage-2.3.3.
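The SIGTERM-then-SIGKILL escalation suggested in this thread (wait a reasonable time after SIGTERM, then SIGKILL) can be sketched in Python with the standard subprocess module. This is a minimal sketch, not Portage's Scheduler code. The helpers `terminate_with_timeout` and `spawn_stubborn_child` are hypothetical, and the 10-second default only mirrors the figure proposed above.

```python
import signal
import subprocess
import sys


def terminate_with_timeout(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Ask `proc` to exit with SIGTERM; if it is still running after
    `timeout` seconds, escalate to SIGKILL. Returns Popen.returncode,
    where a negative value -N means "killed by signal N" (POSIX)."""
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to signal
    proc.send_signal(signal.SIGTERM)  # polite request: allows cleanup/tracebacks
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        return proc.wait()


def spawn_stubborn_child() -> subprocess.Popen:
    """Start a Python child that ignores SIGTERM (a stand-in for a hung task)."""
    script = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    child = subprocess.Popen([sys.executable, "-c", script],
                             stdout=subprocess.PIPE, text=True)
    child.stdout.readline()  # wait until SIGTERM is actually being ignored
    return child


if __name__ == "__main__":
    child = spawn_stubborn_child()
    status = terminate_with_timeout(child, timeout=1.0)
    child.stdout.close()
    print(status == -signal.SIGKILL)
```

Sending SIGTERM first leaves a child the chance to print a traceback or clean up, which addresses the concern about losing output like the traceback in bug 463960. SIGKILL is used only once the timeout has expired.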
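The "Proper handling of SIGINT/SIGQUIT" pattern quoted in this thread translates to Python roughly as follows. This is a hedged sketch, not the code that went into portage-2.3.3. It is POSIX-only (it uses os.fork and signal.pause), and the names `exit_via_sigint`, `install_sigint_handler` and `run_child_and_interrupt` are hypothetical. After cleanup, the process restores SIG_DFL and re-sends SIGINT to itself, so its parent sees death by SIGINT rather than an ordinary exit status, and SIGINT is never left at SIG_IGN.

```python
import os
import signal
import sys


def exit_via_sigint() -> None:
    """Die *by* SIGINT so the parent (e.g. a shell) sees
    WIFSIGNALED(status) with WTERMSIG(status) == SIGINT."""
    # Default-action death skips Python's normal shutdown, so flush first.
    sys.stdout.flush()
    sys.stderr.flush()
    # Restore the default disposition (never leave SIGINT at SIG_IGN) ...
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    # ... and re-send the signal to ourselves; the kernel terminates us.
    os.kill(os.getpid(), signal.SIGINT)


def install_sigint_handler(cleanup) -> None:
    """On SIGINT, run `cleanup()` and then exit via SIGINT."""
    def handler(signum, frame):
        cleanup()
        exit_via_sigint()

    signal.signal(signal.SIGINT, handler)


def run_child_and_interrupt():
    """Fork a child that installs the handler, send it SIGINT, and return
    (raw wait status, bytes the child wrote after signalling "ready")."""
    sys.stdout.flush()  # avoid duplicating buffered output in the child
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(r)
            install_sigint_handler(lambda: os.write(w, b"cleanup\n"))
            os.write(w, b"ready\n")
            while True:
                signal.pause()  # sleep until a signal arrives
        finally:
            os._exit(1)  # never fall back into the parent's code
    os.close(w)
    os.read(r, len(b"ready\n"))  # handler is installed once this arrives
    os.kill(pid, signal.SIGINT)
    _, status = os.waitpid(pid, 0)
    output = os.read(r, 100)
    os.close(r)
    return status, output


if __name__ == "__main__":
    status, output = run_child_and_interrupt()
    print(os.WIFSIGNALED(status), os.WTERMSIG(status) == signal.SIGINT, output)
```

The "translate multiple SIGINTs into SIGKILL" idea from the thread could be layered on top by counting invocations inside the handler, but that is left out here to keep the sketch focused on the exit-status convention.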