i have gentoo ~x86, with nptl enabled. when using muine, it frequently freezes (for example when i import my whole music library, around 5-10 gig). i recompiled my glibc + mono with nptl disabled, and then it worked fine. this happens with the newest packages (mono-beta-3 (mono-0.96)).

Reproducible: Always

Steps to Reproduce:
1. rm -r ~/.gconf/apps/muine ~/.gnome2/muine
2. start muine
3. import a BIG music directory (5-10 gigs)

Actual Results:
muine starts to import the music, but then freezes somewhere...

Expected Results:
it should import it

the interesting thing is that i had an nptl-enabled system for a long time, and when i used mono-beta-1 with muine, it worked fine, and i had nptl enabled at that time. it went wrong when i installed mono-beta-2, and it is still bad with mono-beta-3.
there's a discussion on the mono-devel list about this problem, maybe it's of some help... the link to the beginning of the thread: http://lists.ximian.com/archives/public/mono-devel-list/2004-June/006183.html
Hi Paolo, I tried what you described on the mailing list.

(gdb) thread apply all bt
(gdb) bt
#0  0xffffe410 in ?? ()
#1  0xbfffeca4 in ?? ()
#2  0x40d08100 in ?? () from /lib/libc.so.6
#3  0x00000008 in ?? ()
#4  0x40c28eff in sigsuspend () from /lib/libc.so.6
#5  0x40138108 in GC_end_blocking () from /usr/lib/libmono.so.0
#6  <signal handler called>
#7  0xffffe410 in ?? ()
#8  0xbffff018 in ?? ()
#9  0x0000004d in ?? ()
#10 0x00000000 in ?? ()
#11 0x40bce810 in pthread_cond_timedwait () from /lib/libpthread.so.0
#12 0x40111506 in mono_method_full_name () from /usr/lib/libmono.so.0
#13 0x4011f704 in mono_once () from /usr/lib/libmono.so.0
#14 0x4011ff65 in mono_once () from /usr/lib/libmono.so.0
#15 0x400dc206 in mono_install_thread_callbacks () from /usr/lib/libmono.so.0
#16 0x08120d90 in ?? ()
#17 0x00000001 in ?? ()
#18 0xffffffff in ?? ()
#19 0x00000000 in ?? ()
#20 0x4000ae60 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
#21 0x400dc45a in mono_thread_manage () from /usr/lib/libmono.so.0
Previous frame inner to this frame (corrupt stack?)
This is the second thread just-in-case ;)

#0  0xffffe410 in ?? ()
#1  0xbfffeb88 in ?? ()
#2  0xffffffff in ?? ()
#3  0x00000003 in ?? ()
#4  0x40ca7d89 in poll () from /lib/libc.so.6
#5  0x401d3022 in g_main_loop_get_context () from /usr/lib/libglib-2.0.so.0
#6  0x0805c620 in ?? ()
#7  0x00000003 in ?? ()
#8  0xffffffff in ?? ()
#9  0x401d1e6f in g_main_context_query () from /usr/lib/libglib-2.0.so.0
#10 0x00000003 in ?? ()
#11 0x00000003 in ?? ()
#12 0x0805c620 in ?? ()
#13 0x401d252f in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#14 0x080528d8 in ?? ()
#15 0xffffffff in ?? ()
#16 0x7fffffff in ?? ()
#17 0x0805c620 in ?? ()
#18 0x00000003 in ?? ()
#19 0x080528d8 in ?? ()
#20 0xbfffecd8 in ?? ()
#21 0x401a59bf in ?? () from /usr/lib/libgthread-2.0.so.0
#22 0x00000000 in ?? ()
#23 0xbfffebf4 in ?? ()
#24 0x402271c0 in g_thread_use_default_impl () from /usr/lib/libglib-2.0.so.0
#25 0x402271b8 in g_ascii_table () from /usr/lib/libglib-2.0.so.0
#26 0x40227da0 in _g_debug_flags () from /usr/lib/libglib-2.0.so.0
#27 0xffffffff in ?? ()
#28 0x7fffffff in ?? ()
#29 0x4022733c in ?? () from /usr/lib/libglib-2.0.so.0
#30 0x080528d8 in ?? ()
#31 0x401963e0 in mono_debugger_class_init_func () from /usr/lib/libmono.so.0
#32 0xbfffecd8 in ?? ()
#33 0x401d2826 in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#34 0x080528d8 in ?? ()
#35 0x00000001 in ?? ()
#36 0x00000001 in ?? ()
#37 0x0804f050 in ?? ()
#38 0x40194818 in ?? () from /usr/lib/libmono.so.0
#39 0x080528d8 in ?? ()
#40 0x401963e0 in mono_debugger_class_init_func () from /usr/lib/libmono.so.0
#41 0x4010d103 in mono_method_full_name () from /usr/lib/libmono.so.0
Previous frame inner to this frame (corrupt stack?)
you should probably rebuild with 'inherit debug' added, so the bt's get a bit more useful.
inherit debug...is it an USE flag? or a ./configure switch? for mono? for muine?
I'll do that foser.
foser - added debug to the inherit clause on the first line in the mono ebuild. Should it be placed in any particular order? I placed it last for now and I didn't get any more output when using gdb.
in theory it shouldn't matter, not sure.. might be that you need to compile several packages to get relevant output. you probably use omit-frame-pointer ? That's not helpful
Nope, I don't use omit-frame-pointer. I did a FEATURES=nostrip USE=debug. Something is seriously borked...

(gdb) run
Starting program: /usr/bin/mono /usr/lib/muine/muine.exe
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Detaching after fork from child process 7563.
i got these traces:

#0  0xffffe410 in ?? ()
#1  0xbffff044 in ?? ()
#2  0x40d38860 in ?? () from /lib/libc.so.6
#3  0x00000008 in ?? ()
#4  0x40c53667 in sigsuspend () from /lib/libc.so.6
#5  0x4014c48f in GC_end_blocking () from /usr/lib/libmono.so.0
#6  <signal handler called>
#7  0xffffe410 in ?? ()
#8  0xbffff3b8 in ?? ()
#9  0x000000c8 in ?? ()
#10 0x00000000 in ?? ()
#11 0x40bf7ae0 in pthread_cond_timedwait () from /lib/libpthread.so.0
#12 0x40122798 in mono_method_full_name () from /usr/lib/libmono.so.0
#13 0x40131920 in mono_once () from /usr/lib/libmono.so.0
#14 0x401320b3 in mono_once () from /usr/lib/libmono.so.0
#15 0x400e75d1 in mono_thread_manage () from /usr/lib/libmono.so.0
#16 0x400c3b02 in mono_runtime_exec_managed_code () from /usr/lib/libmono.so.0
#17 0x4007bd70 in mono_main () from /usr/lib/libmono.so.0
#18 0x08048f2b in main ()

and

#0  0xffffe410 in ?? ()
#1  0xbfffea48 in ?? ()
#2  0xffffffff in ?? ()
#3  0x00000003 in ?? ()
#4  0x40cd5b8d in poll () from /lib/libc.so.6
#5  0x401ea886 in g_main_loop_get_context () from /usr/lib/libglib-2.0.so.0
#6  0x401ec3fc in g_idle_remove_by_data () from /usr/lib/libglib-2.0.so.0
#7  0x401e9fad in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#8  0x4011cc2a in mono_method_full_name () from /usr/lib/libmono.so.0
#9  0x4012c470 in mono_once () from /usr/lib/libmono.so.0
#10 0x401235d2 in mono_method_full_name () from /usr/lib/libmono.so.0
#11 0x40129b61 in mono_once () from /usr/lib/libmono.so.0
#12 0x40120906 in mono_method_full_name () from /usr/lib/libmono.so.0
#13 0x4012ff33 in mono_once () from /usr/lib/libmono.so.0
#14 0x4012f00d in mono_once () from /usr/lib/libmono.so.0
#15 0x40101643 in mono_assembly_loaded () from /usr/lib/libmono.so.0
#16 0x4010138e in mono_assembly_load () from /usr/lib/libmono.so.0
#17 0x4010213a in mono_init () from /usr/lib/libmono.so.0
#18 0x40069352 in mono_codegen () from /usr/lib/libmono.so.0
#19 0x4007b6bd in mono_main () from /usr/lib/libmono.so.0
#20 0x08048f2b in main ()

is this of any help?
hmm...errr..these stacktraces were from a from-cvs compiled muine..i'll check the emerged muine, if the traces differ
if i use export GC_DONT_GC=1 before starting muine, then it works ok... so the problem seems to be somewhere with the gc. (yes, this is also mentioned in the mono-devel-list thread)
ok, i also tried with the emerged muine (0.6.3). the stack traces are the same as what i reported (with muine-cvs)
I opened a bug at bugs.ximian.com. http://bugs.ximian.com/show_bug.cgi?id=60576
To add to this, I've *not* been having any problems with muine and NPTL enabled glibc. I'm using x86 base, with only the mono items as ~x86. If someone can/has the time, can they please also test this combination and see if things work? If so, we can at least have a starting point to see where these problems might have started appearing in our toolchain. thanks
Okay, CCing the toolchain folks as this seems to be related to glibc versions and NPTL. you guys got any input?
Scratch that again. i *am* having these problems. I've added more info to the ximian bug, but their motivation to look into it is obviously low.
This bug is not going to solve itself any time soon. I recommend people either don't bother with mono, or disable NPTL from glibc.
i updated to newest ~x86:

glibc-2.3.4.20040619
linux26-headers-2.6.6-r1
ck-sources-2.6.7-r1
mono-1.0
(muine from cvs, but should be pretty much like muine-0.6.3)

and now it imported my music directory :))) this was always my test for muine... i will test it more today to see whether it's freezing or not. could other people also make the upgrade and see if it helps?
Hey Gabor, great news, last time i had tried the ~x86 glibc it still had issues. Does further testing still show this seeming to be resolved? I've just upgraded to those 2.6 headers, i'll hopefully be upgrading and testing the new glibc this afternoon/tonight. Please report back if you still have it working or if it's freaking out. Did you change any CFLAGS/stripping/etc ?
i've listened to music for some hours, and it did not deadlock yet ;) my config:

bash-2.05b# emerge info
Portage 2.0.50-r9 (default-x86-2004.0, gcc-3.3.3, glibc-2.3.4.20040619-r0, 2.6.7-ck1)
=================================================================
System uname: 2.6.7-ck1 i686 Intel(R) Pentium(R) M processor 1400MHz
Gentoo Base System version 1.5.1
ccache version 2.3 [enabled]
Autoconf: sys-devel/autoconf-2.59-r4
Automake: sys-devel/automake-1.8.5-r1
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -march=pentium3 -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -march=pentium3 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://gentoo.inode.at"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/home/portage"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage /usr/local/bmg-main /usr/local/gnome-2.7.1"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X acpi alsa apache2 avi berkdb canna cdr cjk composite crypt cups divx dri dvd dvdr encode faad flac foomaticdb gdbm gif gnome gnutls gpm gstreamer gtk gtk2 imlib jpeg ldap libg++ libwww mad mikmod mmx mono mozilla moznocompose moznoirc moznomail mozsvg mpeg ncurses nls nptl oggvorbis opengl pam pdflib png python qt quicktime readline samba sdl slang sse ssl tcpd truetype unicode usb x86 xml2 xmms xprint xv xvid zlib"
Okay, things are definitely not fixed for me using that version of glibc and linux26-headers. gabor, can you please double check and stress test the import on some big*ss media directories and see if it is indeed fixed for you? If so we need to start looking at other places this problem might be originating.
it still works :) i tried to import my music folder (7.5 gig: around 800megs of ogg, 200megs of mpc (well, they are ignored imho :-), and the rest are mp3). first i tried with my very-slightly-modified from-cvs muine. it worked ok. then i tried with a clean from-cvs muine. it worked ok. then i emerged muine-0.6.3 and tried to import the music folder 3 times. it worked ok. between the imports i stopped muine, killed gconfd, and removed ~/.gconf/apps/muine and ~/.gnome2/muine. the cvs versions used the xine backend, and the emerged version used the gstreamer backend, but that should not matter for the indexing part. muine USE flags: +flac +mad +oggvorbis -xine. i'll attach my list of installed programs.
Created attachment 35265 [details] my-installed-packages.txt a list of my installed packages (got it with "epm -qa")
Can you also post the output of "emerge info" from your working configuration? Thanks.
nm, you already did. d'oh.
This seems to be a gentoo only bug, are we the only one of the NPTL distros using 2.3.4 glibc? Does anyone know if NPTL 2.3.3 is ok? I looked at the patches applied to glibc, and they're fairly generic so I guess it's possible that the problem could be somewhere else. I hope this gets fixed soon, I want to play with mono but I don't have the time to recompile my system w/o NPTL.
as you can see nptl with glibc-2.3.4 works for me. try to update to the latest ~x86 (if it's possible for you), and try it out (import a BIG (several gigs) music repository into muine).
I've got 2.3.4, but still no go. I can't get it to work without disabling GC.
ok guys, it is not working again ;) i don't really know what happened, but today i reorganized my mp3's, and deleted muine's config files, and reimported the music, and it froze again ;(((( i don't really know why because last time it worked, and i tested it a lot ;((
to complicate things more: today i imported my music dir 2x (deleting the config files between the imports), and it worked fine. so it seems that either a. something changed on my computer between today and yesterday, or b. the muine-import-my-whole-music-dir test i used to test muine stability is not a good test anymore. in the past muine froze reliably when doing this test :( now it works fine, but yesterday it froze once.
Ok, my problem was slightly different to most people's here (in that I couldn't even start muine; it just returned to the command line with no error or comment) but I have found a way to get it to start ok which may help. Attached are emerge-info.txt, which shows the results of calling `emerge info`, and /var/log/emerge.log. To get NPTL enabled (and checking /lib/libc.so.6 confirms that NPTL is enabled) I put 'nptl' in my USE flags and ran /usr/portage/scripts/bootstrap-2.6.sh. The script fails part way along due to virtual OS headers (on my box at least) but I then re-emerged glibc and mono and everything worked fine. This trick was pointed out to me by a friend, so I'm not sure how well it works long term. I have to add, this was on a *running* system. I hope this will help.
Created attachment 35766 [details] Emerge info and log for a (probably) working system
Okay, i'm just too d*mn stubborn to let this go. Starting this evening, i'm recompiling glibc with NPTL support, and compiling mono from CVS from the date -D "20 May 2004" (a few days after the newer libgc was merged to mono). Every day/evening possible, i will be trying a new date(s) from mono's CVS, to track down exactly when this problem occurred, to at least give a patchset that is responsible for the breakage. Yes, this is ugly and brute force, but as my knowledge of NPTL/thread/GC issues is minimal, and nobody who knows this stuff seems to care about this bug, i'm going with what i've got. If i can find where the problem is introduced, i can at least focus my efforts on something (and maybe learn a little in the process).

PROCESS:

Anybody who wants this fixed can help me do this:

1) emerge glibc with USE="nptl"

2) Grab a CVS snapshot from some date that has not been reported here yet, preferably sometime around the libgc update which was on 2004-05-18.

3) Compile it into /opt/mono or somewhere so you don't zap your real install (not that it matters since your installed mono will have the NPTL problem anyway) by doing

    ./configure --prefix=/opt/mono --enable-nptl=yes
    make
    make install

4) Grab the simple thread test from Chris Haydens comment on http://bugzilla.ximian.com/show_bug.cgi?id=60576 (reproduced here to make life easy):

    using System;
    using System.Threading;

    class Test {
        public static void Main( String[] args ) {
            int i = 0;
            while( true ) {
                Thread t = new Thread( new ThreadStart(Blah) );
                t.Start();
                i++;
                Console.WriteLine( i+" threads" );
            }
        }

        private static void Blah() {
            Console.WriteLine( "starting thread" );
        }
    }

Change your PATH to reference /opt/mono (export PATH="/opt/mono/bin:$PATH") and compile the above with "mcs Foo.cs". Then run it by doing "mono Foo.exe". If you have problems, the program should just hang at some point. If it doesn't hang, and just keeps happily spitting out thread started messages, you probably have a working NPTL enabled mono from some date.

5) Post to this bug stating the exact date of the pull you used, and whether it worked or not.

*IMPORTANT* Once we find a limit date where at one date it works and at some other date it doesn't, we obviously will be focusing on dates *between* those two. When we get to that point, start testing those dates, and don't bother reporting any other dates. When we manage to track down the exact day/patchset where things broke, we'll go from there.

BONUS: To whoever finds the exact date on which things went from working to not working with regards to this NPTL/GC bug, i will personally buy you a beer (if not many, many beers) the next time you are in new york city.

POSSIBILITY: If i find time and it seems worth it, I may script this up to automate the cvs-update/build/test/lather/rinse/repeat stuff.
Okay, on a side note, i've just committed 1.0.1-r1 and 1.0.1-r2 versions of mono. -r1 has the NPTL support removed, and dies nastily if you try to compile it with an NPTL glibc. -r2 is the same as 1.0.1, but package.masked. It includes the NPTL support. I've done this setup so that this bug doesn't hold up stabilizing mono for the linuxthreads folks. People can still test/work on this bug using -r2, but we have a valid candidate in -r1 for a stable mono and friends.
Peter - please add some keywords so this shows up on a mono+nptl search
Scott: The allowed keywords are fixed, neither nptl nor mono is in it. I'll add mono to the subject though.
if you guys want to check to see if a system is using nptl or linuxthreads all you have to do is run `/lib/libc.so.6` ... but like one guy said on the ximian bugzilla, he's seen this on non-gentoo nptl-enabled systems
Ok, I've spent a few hours on this issue and found where the REAL problem lies. Good news: it's NOT Boehm's GC and it's NOT mono. Bad news: it's a problem with glibc itself :-( I started with the "GC is broken with nptl" sample by Peter Johanson and played with it for a few hours. In the end I just removed Boehm's GC completely (just plain old malloc) and... it still deadlocks somewhere. So we should stop playing with mono and try to address the REAL issue: deadlocks somewhere in the nptl library itself :-( Unfortunately I'm not a glibc guru.

Program:

    #define _GNU_SOURCE   /* pthread_yield() is a GNU extension */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>   /* malloc */
    #include <unistd.h>   /* sleep */

    void *thread_function(void *args)
    {
        int j;
        char *str;

        printf("starting thread!\n");
        for (j = 0; j < 24; j++) {
            str = (char *)malloc(240);   /* never freed */
            printf("malloc in thread !\n");
        }
        pthread_yield();
        return NULL;
    }

    int main(void)
    {
        int i;
        pthread_t thread;
        char *str;

        for (i = 0; i < 1000; i++) {
            pthread_create(&thread, NULL, thread_function, (void *)(long)i);
            pthread_yield();
            str = (char *)malloc(240);
            printf("%d threads\n", i);
        }
        sleep(10);   /* give the threads time to finish */
        return 0;
    }

    $ ./testpgm | grep 'malloc in thread' | wc -l
    9168

Without nptl I got the expected 24000 ...
Portage 2.0.50-r11 (default-x86-2004.2, gcc-3.3.4, glibc-2.3.4.20040808-r0, 2.6.9-rc1-mm4)
=================================================================
System uname: 2.6.9-rc1-mm4 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz
Gentoo Base System version 1.5.3
Autoconf: sys-devel/autoconf-2.59-r4
Automake: sys-devel/automake-1.8.5-r1
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-O2 -pipe -march=pentium4 -funroll-loops -ffast-math -fomit-frame-pointer -ffloat-store -fforce-addr -ftracer -mmmx -msse -msse2 -mfpmath=sse"
CHOST="i686-pc-linux-gnu"
COMPILER=""
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -pipe -march=pentium4 -funroll-loops -ffast-math -fomit-frame-pointer -ffloat-store -fforce-addr -ftracer -mmmx -msse -msse2 -mfpmath=sse"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="ftp:///ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ ftp://vlaai.snt.ipv6.utwente.nl/pub/os/linux/gentoo/ ftp://ftp6.uni-erlangen.de/pub/mirrors/gentoo http://gentoo.spb.ru/rsync"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X apm arts avi berkdb bitmap-fonts crypt cups encode foomaticdb gdbm gif gnome gpm gtk gtk2 imlib jpeg kde libg++ libwww mad mikmod motif mpeg ncurses nls nptl oggvorbis opengl oss pam pdflib perl png python qt quicktime readline sdl slang spell ssl svga tcpd truetype x86 xml2 xmms xprint xv zlib"
*** This bug has been marked as a duplicate of 63734 ***
Moved this to a new bug which focuses on the "real" issue here. Thanks to Canal for pointing us in the right directions.
I've attached a patch to upstream (http://bugs.ximian.com/show_bug.cgi?id=60576 for the lazy) which should fix it. I think what is happening is this: libgc is being built with -fexceptions (C++-compatible exception handling). This clobbers the stack unwinding in libpthread used by pthread_cleanup_{push,pop}. As a result the thread cleanup handler in libgc does not remove dead threads from the list of active threads. Those dead threads are then signalled (during the global stop for garbage collection) and expected to signal back, so the garbage collector deadlocks waiting for non-existent threads to signal back. (Actually it's a little more involved, but you get the picture.) As you can probably expect, that took quite a lot of debugging. gdb is not good at debugging zombie threads.
Additional note: from upstream it appears that there is a different bug that affects gcc 3.3. My system is gcc-3.4.2-r2, glibc-2.3.4.20040808.