alexander@blatt /opt/netscape/plugins $ sudo emerge -vbat x11-plugins/gkrellm-sensors x11-misc/xsensors ksensors These are the packages that I would merge, in reverse order: Calculating dependencies ...done! [ebuild N ] kde-misc/ksensors-0.7.3 +arts -debug +kdeenablefinal -xinerama 844 kB [ebuild N ] x11-misc/xsensors-0.47 115 kB [ebuild N ] x11-plugins/gkrellm-sensors-0.1 6 kB [ebuild NS ] app-admin/gkrellm-1.2.13 +nls 428 kB [ebuild N ] sys-apps/lm_sensors-2.9.1-r1 +sensord 0 kB Total size of downloads: 1,393 kB Do you want me to merge these packages? [Yes/No] [...] >>> md5 src_uri ;-) lm_sensors-2.9.1.tar.gz * Determining the location of the kernel source code * Found kernel source directory: * /usr/src/linux alexander@blatt /opt/netscape/plugins $ As you can see there, emerge dies too soon. It doesn't print the complete error message. When I re-run the command, I often get the complete error message: >>> md5 src_uri ;-) lm_sensors-2.9.1.tar.gz * Determining the location of the kernel source code * Found kernel source directory: * /usr/src/linux * Found sources for kernel version: * 2.6.13-ck1.022 * * lm_sensors-2.9.1 requires CONFIG_I2C_SENSOR to be enabled for non-2.4.x kernels. * This bug is *NOT* about lm_sensors! Other packages (like davfs2, rt2500, unionfs) sometimes die in the same way. What's common is, that all those packages are kernel related.
alexander@blatt /opt/netscape/plugins $ emerge info Portage 2.0.51.22-r2 (default-linux/x86/2005.0, gcc-3.4.4, glibc-2.3.5-r1, 2.6.13-ck1.022 i686) ================================================================= System uname: 2.6.13-ck1.022 i686 Intel(R) Celeron(R) M processor 1.50GHz Gentoo Base System version 1.12.0_pre7 distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled] ccache version 2.4 [enabled] dev-lang/python: 2.3.4-r1, 2.4.1-r1 sys-apps/sandbox: 1.2.12 sys-devel/autoconf: 2.13, 2.59-r7 sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6 sys-devel/binutils: 2.16.1 sys-devel/libtool: 1.5.20 virtual/os-headers: 2.6.11-r2 ACCEPT_KEYWORDS="x86 ~x86" AUTOCLEAN="yes" CBUILD="i686-pc-linux-gnu" CFLAGS="-O2 -mtune=pentium-m -pipe -fomit-frame-pointer" CHOST="i686-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/lib/mozilla/defaults/pref /usr/share/config /var/qmail/control" CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d" CXXFLAGS="-O2 -mtune=pentium-m -pipe -fomit-frame-pointer" DISTDIR="/Gentoo/Portage/distfiles" FEATURES="autoconfig ccache distcc distlocks sandbox sfperms strict" GENTOO_MIRRORS="http://server.bei.digitalprojects.com/gentoo/ http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ http://php2.ath.cx/~askwar/gentoo-files/ http://stuff.alexander.skwar.name/gentoo/ http://mirrors.sec.informatik.tu-darmstadt.de/gentoo/ ftp://ftp.tu-clausthal.de/pub/linux/gentoo/ http://distro.ibiblio.org/pub/linux/distributions/gentoo/ http://distfiles.gentoo.org/" LANG="de_DE.UTF-8" LDFLAGS="-Wl,-O1" LINGUAS="de" MAKEOPTS="-j3" PKGDIR="/Gentoo/Portage/packages" PORTAGE_TMPDIR="/Gentoo/Portage/build" PORTDIR="/Gentoo/Portage/tree" PORTDIR_OVERLAY="/Gentoo/Portage/local-tree/misc" SYNC="rsync://server/gentoo-portage" USE="x86 GAPING_SECURITY_HOLE X acpi alsa amd apm arts artswrappersuid async avi 
bash-completion bdf berkdb bluetooth bootsplash browserplugin cardbus ccache cdda cddb cdio cdparanoia cdr cdrom cle266 crypt css cups curl curlwrappers dbus devmap dillo divx4linux dlloader dvd dvdread emoticon esd exif fam fbcon fbdev fbsplash firefox fping freetype gd gdbm gif gnokii gnome gstreamer gtk gtk2 hal hpn icc id3 idn imagemagick imap imlib imlib2 insecure-drivers insecure-savers java javascript jikes jpeg kde kdeenablefinal libedit libwww logrotate lynxkeymap mad maildir matroska mbox mmx mmxext mozilla moznoirc mozsvg mp3 mpeg mpeg2 mpeg4 mplayer multicall ncurses netboot network new-login nfs nis nls no-old-linux no-suexec noantlr nobcel nobeanutils nobsf nobsh nocommonslogging nocommonsnet nodrm nogg nogulm nojoystick nojsch nojython nolog4j nomac nooro nopri norhino noxalan noxerces nozaptel nptl nptlonly nsplugin offensive ogg oggvorbis opengl openssh pam_console pam_timestamp passfile password patented pccts pcmcia pcre perl perlsuid pic player png pnp qt quicktime rar readline real recode reiserfs samba sendfile sensord sftp slang sms spell spf sse sse2 ssl startup-notification stream subp subtitles suid symlink sysfs syslog transcode truetype truetype-fonts trusted type1-fonts underscores unichrome unicode unsafe usb utf8 uudeview vim vim-pager vlm vorbis wifi win32codecs wma123 x11vnc xine xinetd xml xml2 xmms xpm xscreensaver xv xvid xvmc zlib video_cards_via linguas_de userland_GNU kernel_linux elibc_glibc" Unset: ASFLAGS, CTARGET, LC_ALL
Carlo, I don't think that this is a kernel bug. I rather think that there's something buggy in emerge or somewhere in portage; after all, it is emerge that's dying too soon. Sure, I reported that kernel related packages have these issues, but still, the kernel and those packages work just fine. What can I do to make the portage system (or emerge) print more verbose output, so that the reason for this bug can be discovered?
This really isn't portage's doing. Partial chopping of bash output (without even any error msgs) is indicative of something segfaulting on your box. Doubt it's even strictly the ebuild's doing, tbh.
it isnt a kernel bug, i see it with a bunch of different packages, i just didnt care to figure it out :p
oh, and it isnt anything segfaulting
SpanKY is right ("of course" *G*). I also see it with non-kernel packages, now that I think about it. It's just that I tend to see it most of the time with kernel packages. Changing priority to MINOR: It's annoying, but I can live with it.
iirc, ive seen it with `emerge` and `ebuild` also happens with `epatch` output with something like unpacking gcc and you have a bum patch in the patchset
emerge -d it, and look through the bash output. Doubt it's the case, but disable sandbox and verify you can get this behaviour.
Created attachment 67590 [details] output of sudo emerge -v1d app-mobilephone/gnokii In this attachment, you can find the output of running "emerge -v1d app-mobilephone/gnokii" (another kernel pkg, btw...) two times. The last output of the 2nd run is: + '[' -n 'Please ensure that /usr/src/linux points to a configured set of Linux sources.' ']' The last output of the 1st run is: + exit 1 Further, in the 1st run, the last line of the 2nd run is in line 40 of the attachment. In the 1st run, there are quite a lot more lines after it. Is there a way to make emerge be even more verbose? In the 2nd run, it died when test ([) was run. But why did it die there?
give the exit code, disable sandbox like I said, unset PORT_LOGDIR if set, and try it with ulimit -c flipped on also (yes I still think it's segfaulting). If this is occurring frequently as you indicate, start noting *everyone* it occurs on. The linux-info.eclass qout cruft in your output is funky, but still, there is no indication that this is portage specifically, rather than python/bash being stupid. From where things bailed in your output, there's frankly *really* no way for the bash side of things to bail completely without any indication, and have the python side bail, again, without any indication. Bash bailing without errors, well, that's damn weird. Portage code in *no* way indicating it bailed? Have problems viewing that as possible due to the indeterminate nature of this bug rearing its head; run 1 fails, run 2 fails... Why would run2 succeed in outputting further bash output? It shouldn't, is the answer.... That indicates things being unstable, which for that to occur is bash based, it *has* to be in knowing how ebuild.sh works. Screwups are possible, but I'm personally inclined to think people's installations/hardware are being stupid. Possibilities I can think of are sandbox, and tee (port_logdir); again, doubt that's the case due to the fact there is no python output to go with.
Here's another instance of where emerge failed to print the complete error message. It failed with: i686-pc-linux-gnu-gcc -O2 -mtune=pentium-m -pipe -fomit-frame-pointer -Wall -Wmissing-prototest-ve-config.o -Wl,--export-dynamic -pthread /usr/lib/libglade-2.0.so /usr/lib/libgtk-x11-o /usr/lib/libatk-1.0.so /usr/lib/libgdk_pixbuf-2.0.so /usr/lib/libpangoxft-1.0.so /usr/lib/goft2-1.0.so /usr/lib/libpango-1.0.so /usr/lib/libgnome-2.so /usr/lib/libesd.so -L/usr/lib //libasound.so /usr/lib/libgnomevfs-2.so /usr/lib/libxml2.so -lz -lssl -lcrypto -lresolv -lrtb/libgconf-2.so /usr/lib/libbonobo-activation.so /usr/lib/libORBitCosNaming-2.so /usr/lib/li/usr/lib/libgobject-2.0.so -lm /usr/lib/libgmodule-2.0.so -ldl /usr/lib/libgthread-2.0.so -l-L/Gentoo/Portage/build/portage/gdm-2.8.0.4/work/gdm-2.8.0.4/vicious-extensions -lvicious /usr/lib/gcc/i686-pc-linux-gnu/3.4.4/../../../../i686-pc-linux-gnu/bin/ld: cannot find -lviccollect2: ld returned 1 exit status distcc[22030] ERROR: compile (null) on localhost failed make[3]: *** [test-ve-config] Fehler 1 make[3]: *** Warte auf noch nicht beendete Prozesse... I don't think that this is flaky hardware. I get this on many machines. On some, I get it more often than on others. Sure, it might be because of f*cked up installations - but I doubt this, since many machines act up like this, and they are not installed identically.
Yet another breakage: >>> md5 files ;-) files/digest-pysol-4.82-r1 >>> md5 files ;-) files/pysol-4.82-sound-ok.patch >>> md5 src_uri ;-) pysol-4.82.tar.bz2 >>> md5 src_uri ;-) pysol-4.82-src.tar.bz2 server / # emerge -bvat pysol These are the packages that I would merge, in reverse order: Calculating dependencies ...done! [ebuild N ] games-board/pysol-4.82-r1 0 kB Total size of downloads: 0 kB Do you want me to merge these packages? [Yes/No] >>> emerge (1 of 1) games-board/pysol-4.82-r1 to / >>> md5 files ;-) pysol-4.82-r1.ebuild >>> md5 files ;-) pysol-4.82.ebuild >>> md5 files ;-) files/digest-pysol-4.82 >>> md5 files ;-) files/digest-pysol-4.82-r1 >>> md5 files ;-) files/pysol-4.82-sound-ok.patch >>> md5 src_uri ;-) pysol-4.82.tar.bz2 >>> md5 src_uri ;-) pysol-4.82-src.tar.bz2 * You need to recompile python with Tkinter support. server / # As you can see, in the first run, the "* You need to recompile python with Tkinter support." message was not printed. And even the 2nd run isn't complete. The complete message (which I got in a 3rd run) is: * You need to recompile python with Tkinter support. * That means: USE='tcltk' emerge python !!! ERROR: games-board/pysol-4.82-r1 failed. !!! Function python_tkinter_exists, Line 99, Exitcode 0 !!! missing tkinter support with installed python !!! If you need support, post the topmost build error, NOT this status message. And now I can't get it to screw up again. Every time I run it, I get the complete output. Because of that, I don't think it's flaky hardware. Are there any (temporary) files that get created and survive during "emerge -bvat" runs? It really seems like it, doesn't it (why else would 3rd or 4th runs be successful?)?
Another instance: I ran: "time emerge -v gthumb". It ended with: <ANCHOR id ="GPPORTLIBRARYOPERATIONS" href="gphoto2-port/gphoto2-port-gphoto2-port-library.html#GPPORTLIBRARYOPERATIONS"> <ANCHOR id ="GP-PORT-LIBRARY-TYPE" href="gphoto2-port/gphoto2-port-gphoto2-port-library.html#GP-PORT-LIBRARY-TYPE"> <ANCHOR id ="GP-PORT-LIBRARY-LIST" href="gphoto2-port/gphoto2-port-gphoto2-port-library.html#GP-PORT-LIBRARY-LIST"> <ANCHOR id ="GP-PORT-LIBRARY-OPERATIONS" href="gphoto2-port/gphoto2-port-gphoto2-port-library.html#GP-PORT-LIBRARY-OPERATIONS"> real 4m27.324s user 0m48.882s sys 0m43.318s server / # time emerge -vat gthumb These are the packages that I would merge, in reverse order: Calculating dependencies ...done! [ebuild R ] media-gfx/gthumb-2.6.7 -debug +gphoto2* +jpeg +png +tiff* 0 kB Pretty annoying :(
And another issue: server linux # emerge -bv findutils [...] then mv -f ".deps/version.Tpo" ".deps/version.Po"; else rm -f ".deps/version.Tpo"; exit 1; fi i686-pc-linux-gnu-gcc -O2 -mtune=pentium-m -pipe -fomit-frame-pointer -Wl,-O1 -static -o find find.o fstype.o parser.o pred.o tree.o util.o version.o ../lib/libfind.a ../gnulib/lib/libgnulib.a server linux # This was when trying to emerge findutils-4.2.24. This time, it's not just plain annoying, it's also a problem, as it hides the real error message, which I'd like to paste in bug #107638
Stop flooding this bug with "another instance" before I invalidate it... I already stated I doubt this is portage's fault. Python, glibc, kernel, hardware, they are more apt targets than portage, which just does a "print blah". Any screwups within the print implementation/execution we can't fix, since it's not our source that's screwing up. Meanwhile, do as I said in comment 10. "and another... and another" does nothing to help debug it, besides spam us while we're waiting on a response so we _can_ continue trying to debug it. Reopen when you've triggered failures using what I stated in comment 10.
(In reply to comment #15)
> Stop flooding this bug with "another instance" before I invalidate it...
[...]
> Reopen when you've triggered failures using what I stated in comment 10.

I *did* do as you told me in comment #10. There, you wrote: "start noting *everyone* it occurs on.". That's what I did, didn't I? Hm, or was that NOT a typo and you really meant "everyone" and not "everything"? Where can I find out who's responsible for a package? Further, I don't think that this is the fault of a given package (e.g. findutils in comment #14), as it happens in too many packages, from time to time. Because of that, I rather think that it is the fault of emerge. I don't think Python, glibc, kernel, or hardware is faulty; too many things *DO* work. It's just emerge that's dying too soon, sometimes. Anyway. The bug is most certainly not "resolved" ;-) Reopening. I'll try to get and attach a typescript when I run "PORT_LOGDIR='' FEATURES=-sandbox emerge -d $pkg ; echo exit code $?". This is rather hard, as packages exit just fine when run a 2nd time :)
resolved needinfo is actually accurate, it means "can't do a damn thing without info".... (see comment 10 with testing suggestion, beyond just noting failures) ;-) Being a bit blunt on this, there's pretty much no chance in hell emerge is explicitly doing this. You don't blame a bash script that is valid syntax for crashing bash, do you? No, you look at it and try to figure out what bug it's exposing in the underlying software stack... As per the norm with the "no chance in hell", I may have to eat my words on it, but your earlier -d run pretty much bears out what I'm stating. There's no set -x displayed, which means ebuild.sh died essentially out of the blue. Possible emerge is somehow silently swallowing the exit code and bailing, but that doesn't explain why the output is partially swallowed. That right there points at something further down the stack, *really* far down the stack. Change your kernel to a non ck, stable 2.6.13 release and reopen after testing since 2.6.13 releases had OOMK bugs that acted rather similar to this. Reopen if it continues...
via jstubbs, check your dmesg for messages about oomk waxing things also.
To be specific, there was a lot of breakage in 2.6.13-ck up until -ck5. A lot of people were reporting incorrect OOM kills when "prefetch" was enabled. -ck6 should correct most of these issues, but Con himself still regards it as experimental (as the triple asterisked capitalized statement at the top of his release email shows). I agree with Brian's handling of this bug. If you're going to run anything other than the vanilla kernel (or Gentoo's derivative thereof) you must be prepared to do a little investigation regarding any abnormal system behaviour.
BTW: $ uname -a Linux server 2.6.12-ck6.014.reiser4.nfsv4.inotify.no-posix.bsd IOW: I also get this on non-2.6.13 systems. I also had no out-of-memory OOM messages in the logs. Lastly, I also don't use "prefetch". But, I'll install gentoo sources and will see. Another BTW: Running "emerge -d" doesn't show any differences. It also simply dies. I now removed PORT_LOGDIR from make.conf. But I don't really want to add FEATURES=-sandbox - sandbox is too important, IMO.
I'm now using gentoo-sources kernel and got: >>> Unpacking source... >>> Unpacking bind-9.3.2b1.tar.gz to /Gentoo/Portage/build/portage/bind-9.3.2_beta1/work !!! ERROR: net-dns/bind-9.3.2_beta1 failed. !!! Function src_unpack, Line 59, Exitcode 0 Here, some output is missing again; in an earlier run, I got a message that "USE=idn" doesn't work with net-dns/bind-9.3.2_beta1. [19:03:45 alexander@server:/var/log] $ uname -a Linux server 2.6.13-gentoo-r4.015.reiser4 #1 Fri Oct 14 22:30:13 CEST 2005 i686 AMD Athlon(tm) XP 2000+ AuthenticAMD GNU/Linux At a later run: >>> Unpacking source... >>> Unpacking bind-9.3.2b1.tar.gz to /Gentoo/Portage/build/portage/bind-9.3.2_beta1/work !!! ERROR: net-dns/bind-9.3.2_beta1 failed. !!! Function src_unpack, Line 59, Exitcode 0 !!! idn patch is broken for this beta !!! If you need support, post the topmost build error, NOT this status message. As you can see, the 3rd and 4th !!! lines are missing from the first run. But I had set PORT_LOGDIR and sandbox was enabled. Let's hope that I get emerge to break with sandbox disabled.
(Unintentionally) reproduced here: /bin/sh ../libtool --silent --mode=link --tag=CXX x86_64-pc-linux-gnu-g++ -Wno-long-long -Wundef -ansi -D_XOPEN_SOURCE=500 -D_BSD_SOURCE -Wcast-align -Wconversion -Wchar-subscripts -Wall -W -Wpointer-arith -DNDEBUG -DNO_DEBUG -O2 -march=opteron -O2 -pipe -Wformat-security -Wmissing-format-attribute -Wno-non-virtual-dtor -fno-exceptions -fno-check-new -fno-common -o k3b -L/usr/kde/3.5/lib64 -L/usr/qt/3/lib64 -L/usr/lib64 -R /usr/lib64 -R /usr/kde/3.5/lib64 -R /usr/qt/3/lib64 -R /usr/lib64 k3bwelcomewidget.o k3bapplication.o k3bdiroperator.o k3bfiletreeview.o k3bprojecttabbar.o k3bprojecttabwidget.o k3bsplash.o k3bfileview.o k3bdirview.o k3b.o main.o k3bstatusbarmanager.o k3bfiletreecombobox.o k3binterface.o k3bprojectinterface.o k3bdataprojectinterface.o k3bsystemproblemdialog.o k3bcdcontentsview.o k3bwriterspeedverificationdialog.o k3bsidepanel.o k3bjobprogressdialog.o k3bburnprogressdialog.o k3btempdirselectionwidget.o k3bdatamodewidget.o k3bwritingmodewidget.o k3bwriterselectionwidget.o k3binteractiondialog.o k3bthememanager.o k3bprojectmanager.o k3btrm.o k3bmusicbrainz.o k3baudioprojectinterface.o k3bmixedprojectinterface.o k3bflatbutton.o k3bemptydiscwaiter.o k3bjobprogressosd.o k3brichtextlabel.o k3bdebuggingoutputdialog.o k3bdebuggingoutputfile.o k3binterface_skel.o k3bprojectinterface_skel.o k3bdataprojectinterface_skel.o k3baudioprojectinterface_skel.o k3bmixedprojectinterface_skel.o ./cdinfo/libcdinfo.la ./option/liboption.la ./rip/librip.la ./videoEncoding/libvideoEncoding.la ./projects/libprojects.la ../libk3bdevice/libk3bdevice.la ../libk3b/libk3b.la ./misc/libmisc.la -lkio -lkparts -lm -L/usr/kde/3.5/lib64 -L/usr/qt/3/lib64 -L/usr/lib64 /usr/lib64/libdbus-qt-1.so: undefined reference to `opteron246 ~ # For completeness' sake: Portage 2149-svn (default-linux/amd64/2005.1, gcc-3.4.4, glibc-2.3.5-r1, 2.6.13-ck6 x86_64) ================================================================= System uname: 2.6.13-ck6 x86_64 AMD 
Opteron(tm) Processor 246 Gentoo Base System version 1.12.0_pre9 dev-lang/python: 2.3.5, 2.4.2 sys-apps/sandbox: 1.2.13 sys-devel/autoconf: 2.13, 2.59-r7 sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1 sys-devel/binutils: 2.16.1 sys-devel/libtool: 1.5.20 virtual/os-headers: 2.6.11-r2 ACCEPT_KEYWORDS="amd64 ~amd64" AUTOCLEAN="yes" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-march=opteron -O2 -pipe" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /var/qmail/control" CONFIG_PROTECT_MASK="/etc/gconf /etc/init.d /etc/terminfo /etc/env.d" CXXFLAGS="-march=opteron -O2 -pipe" DISTDIR="/mnt/archive/gentoo/distfiles" FEATURES="livecvsportage noinfo nostrip sign strict userpriv" GENTOO_MIRRORS="http://gentoo.gg3.net/" LC_ALL="ja_JP.UTF-8" LINGUAS="en ja" MAKEOPTS="-j2" PKGDIR="/usr/portage/packages" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/mnt/archive/gentoo/rsync" SYNC="rsync://gentoo.gg3.net/gentoo-portage" USE="amd64 X a52 aac alsa arts cjk dts dvd dvdr dvdread ffmpeg flac gif gstreamer hal immqt-bc jpeg kdeenablefinal mad mp3 nocd nptl nptlonly ogg opengl pdflib png python qt readline sasl sdl smime spell ssl subversion truetype-fonts unicode v4l v4l2 videos vorbis xv xvid zlib linguas_en linguas_ja userland_GNU kernel_linux elibc_glibc" Unset: ASFLAGS, CTARGET, LANG, LDFLAGS, PORTDIR_OVERLAY PORT_LOGDIR is set.
This part of spawn() looks to be the problem to me. Comments inlined.

    # mypid == [pid of tee, pid of bash]
    while len(mypid):
        # wait of pid of bash
        retval=os.waitpid(mypid[-1],0)[1]
        # ebuild fails so retval != 0
        if retval != 0:
            # start killing off everything else (== tee)
            for x in mypid[0:-1]:
                try:
                    # send it a SIGTERM
                    os.kill(x,signal.SIGTERM)
                    # Immediately check if it's dead
                    if os.waitpid(x,os.WNOHANG)[1] == 0:
                        # It's not dead because it hasn't been given any time
                        # to clean up so we SIGKILL it.
                        os.kill(x,signal.SIGKILL)
                        # Now it dies without flushing its buffers.
                        os.waitpid(x,0)
Created attachment 71248 [details, diff] Adds a 1 second timeout after SIGTERM is sent
It is the loop that is the problem but the attached patch doesn't fix it. "tee" seems to respond to SIGTERM by exiting without dumping its buffers. Reworking and testing further at the moment.
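The failure mode described here can be reproduced in isolation. The following is only a sketch, not portage code: `BUFFERING_FILTER` and `run_filter` are hypothetical names, and the stand-in filter deliberately holds everything in memory until end-of-file, which exaggerates what the real "tee" does so that the loss is deterministic. It assumes a POSIX system and Python 3.

```python
import subprocess
import sys

# Hypothetical stand-in for a logging filter: it keeps all input in memory
# and writes it out only once it sees end-of-file on stdin.
BUFFERING_FILTER = "import sys; data = sys.stdin.read(); sys.stdout.write(data)"


def run_filter(text, kill_early):
    """Feed `text` to the filter and return whatever it managed to emit."""
    child = subprocess.Popen(
        [sys.executable, "-c", BUFFERING_FILTER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    child.stdin.write(text.encode())
    child.stdin.flush()
    if kill_early:
        # Kill the filter while its input pipe is still open, so it never
        # reaches end-of-file and never gets to write its buffered data.
        child.kill()
    # Closing our end of the pipe is what signals end-of-file to the filter.
    child.stdin.close()
    output = child.stdout.read().decode()
    child.stdout.close()
    child.wait()
    return output
```

Calling `run_filter(text, kill_early=False)` should return the text unchanged, while `kill_early=True` should return an empty string: the data reached the filter but died with it.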
Created attachment 71258 [details, diff] 0.5 second window before SIGTERM followed by 0.5 second window before SIGKILL The "sleeping" is printed just before time.sleep(0.01). /bin/sh ../libtool --silent --mode=link --tag=CXX x86_64-pc-linux-gnu-g++ -Wno-long-long -Wundef -ansi -D_XOPEN_SOURCE=500 -D_BSD_SOURCE -Wcast-align -Wconversion -Wchar-subscripts -Wall -W -Wpointer-arith -DNDEBUG -DNO_DEBUG -O2 -march=opteron -O2 -pipe -Wformat-security -Wmissing-format-attribute -Wno-non-virtual-dtor -fno-exceptions -fno-check-new -fno-common -o k3b -L/usr/kde/3.5/lib64 -L/usr/qt/3/lib64 -L/usr/lib64 -R /usr/lib64 -R /usr/kde/3.5/lib64 -R /usr/qt/3/lib64 -R /usr/lib64 k3bwelcomewidget.o k3bapplication.o k3bdiroperator.o k3bfiletreeview.o k3bprojecttabbar.o k3bprojecttabwidget.o k3bsplash.o k3bfileview.o k3bdirview.o k3b.o main.o k3bstatusbarmanager.o k3bfiletreecombobox.o k3binterface.o k3bprojectinterface.o k3bdataprojectinterface.o k3bsystemproblemdialog.o k3bcdcontentsview.o k3bwriterspeedverificationdialog.o k3bsidepanel.o k3bjobprogressdialog.o k3bburnprogressdialog.o k3btempdirselectionwidget.o k3bdatamodewidget.o k3bwritingmodewidget.o k3bwriterselectionwidget.o k3binteractiondialog.o k3bthememanager.o k3bprojectmanager.o k3btrm.o k3bmusicbrainz.o k3baudioprojectinterface.o k3bmixedprojectinterface.o k3bflatbutton.o k3bemptydiscwaiter.o k3bjobprogressosd.o k3brichtextlabel.o k3bdebuggingoutputdialog.o k3bdebuggingoutputfile.o k3binterface_skel.o k3bprojectinterface_skel.o k3bdataprojectinterface_skel.o k3baudioprojectinterface_skel.o k3bmixedprojectinterface_skel.o ./cdinfo/libcdinfo.la ./option/liboption.la ./rip/librip.la ./videoEncoding/libvideoEncoding.la ./projects/libprojects.la ../libk3bdevice/libk3bdevice.la ../libk3b/libk3b.la ./misc/libmisc.la -lkio -lkparts -lm -L/usr/kde/3.5/lib64 -L/usr/qt/3/lib64 -L/usr/lib64 sleeping... 
/usr/lib64/libdbus-qt-1.so: undefined reference to `DBusQt::Internal::Integrator::readReady()' /usr/lib64/libdbus-qt-1.so: undefined reference to `vtable for DBusQt::Connection' /usr/lib64/libdbus-qt-1.so: undefined reference to `DBusQt::Internal::Timeout::timeout(DBusTimeout*)' /usr/lib64/libdbus-qt-1.so: undefined reference to `vtable for DBusQt::Internal::Integrator' /usr/lib64/libdbus-qt-1.so: undefined reference to `DBusQt::Internal::Integrator::newConnection(DBusQt::Connection*)' /usr/lib64/libdbus-qt-1.so: undefined reference to `vtable for DBusQt::Internal::Timeout' /usr/lib64/libdbus-qt-1.so: undefined reference to `vtable for DBusQt::Server' collect2: ld returned 1 exit status make[3]: *** [k3b] Error 1 make[3]: Leaving directory `/var/tmp/portage/k3b-0.12.5/work/k3b-0.12.5/src' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/var/tmp/portage/k3b-0.12.5/work/k3b-0.12.5/src' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/var/tmp/portage/k3b-0.12.5/work/k3b-0.12.5' make: *** [all] Error 2 !!! ERROR: app-cdr/k3b-0.12.5 failed. !!! Function kde_src_compile, Line 168, Exitcode 2 !!! died running emake, kde_src_compile:make !!! If you need support, post the topmost build error, NOT this status message.
Created attachment 71278 [details, diff] wait on the correct pid.

mypid during logfile is [tee process, ebuild.sh process]. The code is explicitly testing mypid[-1] as the correct error code. This is the standard for pipe code (last exit code). That said, the waitpid return code doesn't totally do that... if the equivalent of false | true were run, it would pick up the exit code from false, while standard shell doesn't. Digressing a bit, but pointing it out since it's sort of $PIPESTATUS without the usual $? exit code norms for shell.

Either way, patch attached reverses pid ordering so mypid reflects what logfile is really doing; effectively ebuild.sh | tee some-log. The change in pid ordering results in spawn watching the tee process, instead of ebuild.sh. Should fix it.
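As a rough illustration of the "ebuild.sh | tee some-log" shape, here is a sketch that drains the last process in the pipe before collecting exit statuses. It is not the attached patch: `run_logged` is a hypothetical helper, both children are small Python one-liners standing in for ebuild.sh and tee, and the statuses are returned in pipeline order (producer first), which is an assumption of this sketch rather than portage's mypid layout.

```python
import subprocess
import sys

# Hypothetical pass-through filter standing in for "tee".
PASSTHROUGH = "import sys; sys.stdout.write(sys.stdin.read())"


def run_logged(producer_code):
    """Run `producer | filter` and return (output, [producer_rc, filter_rc])."""
    producer = subprocess.Popen(
        [sys.executable, "-c", producer_code],
        stdout=subprocess.PIPE,
    )
    log_filter = subprocess.Popen(
        [sys.executable, "-c", PASSTHROUGH],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
    )
    # Drop our copy of the pipe so the filter sees end-of-file as soon as
    # the producer exits.
    producer.stdout.close()
    # Wait on the *filter* first: it only exits after it has read and
    # forwarded everything the producer wrote, so no output is lost.
    output, _ = log_filter.communicate()
    statuses = [producer.wait(), log_filter.returncode]
    return output.decode(), statuses
```

With a failing producer, `statuses[-1]` is what a shell would report as `$?` (the filter's status), while the full list plays the role of `$PIPESTATUS`; a caller that wants the build result has to look at `statuses[0]` explicitly.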
Sidenote... I really don't want to see time delays introduced into waxing the processes. If the last pid has returned, anything that still is surviving in the chain should be waxed with no delay (the pipe of processing is horked, kill off any processes that are still alive in the chain).
If you don't want any delays, any processes left in the pipe should just be sent a SIGKILL. Sending a SIGTERM and then not waiting for the receiving process to act on it is pointless. Also, using os.waitpid(x, os.WNOHANG) == (0,0) instead of os.waitpid(x, os.WNOHANG)[1] == 0 allows you to get rid of the try/except wrappers. Sidenote, I've said all this on the mailing list already but seeing as they seem to be down... :/
defensive coding for the try/except is the only benefit, would protect the scenario where a pid has been harvested already; that's pretty much a bug if it occurs though, moreso, I can't think of any valid reason for it to occur that's not a bug. regarding delay... could do a run through of known pids, sigterm 'em, then induce a minimal delay (whatever is required to sleep the process and induce a context switch), then try harvesting/killing. Main concern was with the .5 delay, which in hindsight was obviously testing only.
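The "SIGTERM, short delay, then harvest or kill" idea can be sketched as follows. This is an illustration under assumptions, not the code that went into portage: `terminate_with_grace` and `spawn_sleeper` are hypothetical names, the grace and poll intervals are arbitrary, and the sketch is POSIX-only. It relies on `os.waitpid(pid, os.WNOHANG)` returning `(0, 0)` while the child is still running.

```python
import os
import signal
import time


def terminate_with_grace(pid, grace=0.5, poll=0.01):
    """Send SIGTERM, poll for up to `grace` seconds, then fall back to SIGKILL.

    Returns the raw wait status of the reaped child.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:  # (0, 0) means the child has not exited yet
            return status
        time.sleep(poll)  # give the child a chance to run and clean up
    os.kill(pid, signal.SIGKILL)
    return os.waitpid(pid, 0)[1]


def spawn_sleeper(ignore_term):
    """Fork a child that sleeps; optionally make it ignore SIGTERM."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(ready_r)
            disposition = signal.SIG_IGN if ignore_term else signal.SIG_DFL
            signal.signal(signal.SIGTERM, disposition)
            os.write(ready_w, b"x")  # tell the parent the handler is set
            time.sleep(60)
        finally:
            os._exit(0)
    os.close(ready_w)
    os.read(ready_r, 1)  # block until the child is ready
    os.close(ready_r)
    return pid
```

A child with the default SIGTERM disposition is reaped inside the polling loop, so it gets the chance to exit on its own; only a child that ignores SIGTERM for the whole grace period is sent SIGKILL.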
I'm seeing this behavior on a SPARC box with several packages as well.
Created attachment 73618 [details, diff] wait on the pid of tee re comment 27: that doesn't work, it breaks subsequent logic that expects the pid from fork to be the last pid in mypids. mypid indexing needs to be changed throughout, or just altered at the waitpid loop.
*** Bug 114264 has been marked as a duplicate of this bug. ***
*** Bug 114280 has been marked as a duplicate of this bug. ***
I'm beginning to think that this issue is somehow related to PORT_LOGDIR being set. When I enable this setting/feature, I'm seeing those issues (sometimes). With PORT_LOGDIR disabled, I never see this AFAICT.
As noted in comment #25, the problem is that "tee" is killed before its buffer is flushed. Normally, output is only piped through "tee" when PORT_LOGDIR is enabled.
Released in 2.1_pre1.