This issue popped up after I moved to linux26-headers, remerged glibc and remerged lwp/rvm/rpc2/coda. I'm not sure what the actual cause is. It may not even be coda's fault, but until I have done some debugging and found solid evidence that it isn't, I will keep this bug assigned to myself (as coda maintainer). Anyone willing to help me track down the root cause of this problem can use this bug report to add comments.
To try out coda-6.0.6 with linux26-headers, replace the following dependency in the coda ebuild:
>=sys-kernel/linux-headers-2.4
with this one:
virtual/os-headers
Are you referring to venus or vice? And what kind of time frame are we talking about before the failure? I've had venus running for about 10 mins (like you, linux26-headers and remerged glibc) and it seems ok.
Time frame is maybe 2 seconds. I noticed a lot of warnings during compilation. Did you see any? I'm using gcc 3.4. Tonight I will debug some more and hopefully figure out what's causing it.
Created attachment 36422: STDERR output of coda emerge
I'm using gcc-3.3.3-r6. There are some warnings during the compile, but no more than I see from 90% of compiles. :P I'm not a C programmer, though, so what do I know? For your comparison pleasure, I've attached the STDERR of the emerge (do glance at the odd mv error on the last line). BTW, you didn't answer whether you were referring to venus (client) or vice (server), or both. I've had venus running for 12 hours now. Seems fine in my environment (like I said in Bug #57996, though, I'm not actively using it...I'm just keeping it merged for the time being to help you test).
Created attachment 36425: STDERR output of coda emerge
Oops...mis-mimed it. *blush*
I was referring to vice by the way. This is what I normally see in the log:
[...]
22:18:12 Attached 1 volumes; 0 volumes not attached
lqman: Creating LockQueue Manager.....LockQueue Manager starting .....
22:18:12 LockQueue Manager just did a rvmlib_set_thread_data()
done
22:18:12 CallBackCheckLWP just did a rvmlib_set_thread_data()
22:18:12 CheckLWP just did a rvmlib_set_thread_data()
22:18:12 ServerLWP 0 just did a rvmlib_set_thread_data()
22:18:12 ServerLWP 1 just did a rvmlib_set_thread_data()
22:18:12 ServerLWP 2 just did a rvmlib_set_thread_data()
22:18:12 ServerLWP 3 just did a rvmlib_set_thread_data()
22:18:12 ServerLWP 4 just did a rvmlib_set_thread_data()
22:18:12 ServerLWP 5 just did a rvmlib_set_thread_data()
22:18:12 ResLWP-0 just did a rvmlib_set_thread_data()
22:18:12 ResLWP-1 just did a rvmlib_set_thread_data()
22:18:12 VolUtilLWP 0 just did a rvmlib_set_thread_data()
22:18:12 VolUtilLWP 1 just did a rvmlib_set_thread_data()
22:18:12 Starting SmonDaemon timer
22:18:12 File Server started Thu Jul 29 22:18:12 2004
And the last line I see in the log when it crashes:
22:40:56 Attached 1 volumes; 0 volumes not attached
There is nothing in the SrvErr log.
Some more info (what each package was compiled with is given in parentheses):
glibc (linux-headers + gcc3.4.1) + coda (linux-headers + gcc3.3.3) -> ok
glibc (linux-headers + gcc3.4.1) + coda (linux-headers + gcc3.4.1) -> ok
glibc (linux-headers + gcc3.4.1) + coda (linux26-headers + gcc3.4.1) -> ok
glibc (linux26-headers + gcc3.3.3) + coda (linux26-headers + gcc3.3.3) -> ok
glibc (linux26-headers + gcc3.3.3) + coda (linux26-headers + gcc3.4.1) -> ok
glibc (linux26-headers + gcc3.4.1) + coda (linux26-headers + gcc3.3.3) -> fail
glibc (linux26-headers + gcc3.4.1) + coda (linux26-headers + gcc3.4.1) -> fail
Apparently the problem is triggered when using glibc compiled with linux26-headers and gcc 3.4.1.
That's all I have time for today.
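The conclusion drawn from the build matrix can be checked mechanically. The following is only a sketch: the seven rows are copied from the comment above, while the `Build` tuple, its field names and the `predicts` helper are made up for illustration. It asks which combination of build factors fully determines the ok/fail outcome.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type; the field names are not from the bug report.
Build = namedtuple("Build", "glibc_headers glibc_gcc coda_headers coda_gcc ok")

# The seven combinations reported above (True = ok, False = fail).
RESULTS = [
    Build("linux-headers",   "3.4.1", "linux-headers",   "3.3.3", True),
    Build("linux-headers",   "3.4.1", "linux-headers",   "3.4.1", True),
    Build("linux-headers",   "3.4.1", "linux26-headers", "3.4.1", True),
    Build("linux26-headers", "3.3.3", "linux26-headers", "3.3.3", True),
    Build("linux26-headers", "3.3.3", "linux26-headers", "3.4.1", True),
    Build("linux26-headers", "3.4.1", "linux26-headers", "3.3.3", False),
    Build("linux26-headers", "3.4.1", "linux26-headers", "3.4.1", False),
]

def predicts(results, *fields):
    """Return True if the given fields alone determine the outcome,
    i.e. no two builds agree on these fields but differ in ok/fail."""
    outcomes = defaultdict(set)
    for build in results:
        key = tuple(getattr(build, f) for f in fields)
        outcomes[key].add(build.ok)
    return all(len(seen) == 1 for seen in outcomes.values())

if __name__ == "__main__":
    for fields in (("glibc_headers",), ("glibc_gcc",),
                   ("coda_headers", "coda_gcc"),
                   ("glibc_headers", "glibc_gcc")):
        print(fields, "->", predicts(RESULTS, *fields))
```

On these seven rows, neither the glibc headers nor the glibc compiler alone separates the failures, and the coda build settings do not either. Only the pair (glibc headers, glibc gcc) does, which matches the observation that the failing case is glibc built with linux26-headers and gcc 3.4.1. With so few data points this narrows the search rather than proving a cause.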
The segfault occurs during initialisation of the LockQueue Manager. When the stack for the LQM lwp is about to be mmapped, lwp_stackbase is set to 0x15027000. The stack size is 0x2000.
Here is the stack trace at the moment of the crash. I prefixed each line with the value of the stack pointer:
oops
|
V
0x15026874 #0 _IO_vfprintf (s=0x402b7de0, format=0x8110a08 "LockQueue Manager starting .....\n", ap=0x15028f2c ",-./012345678 9:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmn oªÓ\t@0\177\024\b") at vfprintf.c:185
0x15028f10 #1 0x401ecb14 in printf (format=0x402b7de0 "\200<û") at printf.c:34
0x15028f28 #2 0x080c1a1d in lqman::func (this=0x816b008) at lockqueue.cc:92
0x15028fc8 #3 0x080c1845 in LQman_init (c=0x816b008) at lockqueue.cc:65
0x15028fd8 #4 0x4009cc64 in Create_Process_Part2 () at lwp.c:796
0x15028ff8 #5 0x4009dd1f in L1 () at process.S:455
Here is the code of vfprintf() when compiled with 2.6.7-rc4 kernel headers and gcc 3.4:
0x401e3e82 <_IO_vfprintf+1>: mov %esp,%ebp
0x401e3e84 <_IO_vfprintf+3>: push %edi
0x401e3e85 <_IO_vfprintf+4>: push %esi
0x401e3e86 <_IO_vfprintf+5>: push %ebx
0x401e3e87 <_IO_vfprintf+6>: call 0x401be346 <__i686.get_pc_thunk.bx>
0x401e3e8c <_IO_vfprintf+11>: add $0xd623c,%ebx
0x401e3e92 <_IO_vfprintf+17>: sub $0x2688,%esp
0x401e3e98 <_IO_vfprintf+23>: movl $0x0,0xffffda98(%ebp)
0x401e3ea2 <_IO_vfprintf+33>: call 0x401be678 <*__GI___errno_location>
0x401e3ea7 <_IO_vfprintf+38>: mov (%eax),%eax
Here is the same code when compiled with 2.6.7-rc4 headers and gcc 3.3:
0x401e1a17 <vfprintf+0>: push %ebp
0x401e1a18 <vfprintf+1>: mov %esp,%ebp
0x401e1a1a <vfprintf+3>: push %edi
0x401e1a1b <vfprintf+4>: push %esi
0x401e1a1c <vfprintf+5>: push %ebx
0x401e1a1d <vfprintf+6>: call 0x401bd2b8 <__i686.get_pc_thunk.bx>
0x401e1a22 <vfprintf+11>: add $0xcf666,%ebx
0x401e1a28 <vfprintf+17>: sub $0x5c4,%esp
0x401e1a2e <vfprintf+23>: movl $0x0,0xfffffb58(%ebp)
0x401e1a38 <vfprintf+33>: call 0x401bd5f4 <__errno_location>
0x401e1a3d <vfprintf+38>: mov 0xc(%ebp),%edi
0x401e1a40 <vfprintf+41>: mov (%eax),%eax
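The numbers in the trace and the two prologues are enough to check the overflow by hand. Below is a minimal sketch of that arithmetic, with several assumptions: the LWP stack is taken to occupy [lwp_stackbase, lwp_stackbase + size) and to grow downward; the gcc 3.4 prologue is assumed to start with the same `push %ebp` that the gcc 3.3 listing shows at +0; and printf's stack pointer is assumed to be the same in both builds. The constant and function names are made up for illustration.

```python
# Values copied from the trace and disassembly above.
STACK_BASE = 0x15027000    # lwp_stackbase (lowest valid stack address)
STACK_SIZE = 0x2000        # LWP stack size, 8 KiB
SP_IN_PRINTF = 0x15028F10  # stack pointer noted on frame #1 (printf)
SP_AT_CRASH = 0x15026874   # stack pointer noted on frame #0 (_IO_vfprintf)

FRAME_GCC34 = 0x2688       # "sub $0x2688,%esp" in the gcc 3.4 build
FRAME_GCC33 = 0x5C4        # "sub $0x5c4,%esp" in the gcc 3.3 build

# Assumption: the call pushes a 4-byte return address and the prologue
# pushes four 4-byte registers (ebp, edi, esi, ebx) before the "sub".
PROLOGUE_BYTES = 4 + 4 * 4

def sp_after_prologue(sp_caller, frame_size):
    """Stack pointer once vfprintf has reserved its local variables."""
    return sp_caller - PROLOGUE_BYTES - frame_size

def overflow_bytes(sp):
    """How far sp lies below the bottom of the stack (0 if it fits)."""
    return max(0, STACK_BASE - sp)

if __name__ == "__main__":
    for name, frame in (("gcc 3.4", FRAME_GCC34), ("gcc 3.3", FRAME_GCC33)):
        sp = sp_after_prologue(SP_IN_PRINTF, frame)
        print(f"{name}: frame {frame} bytes, sp {sp:#x}, "
              f"overflow {overflow_bytes(sp)} bytes")
```

Under these assumptions the gcc 3.4 frame (9864 bytes of locals) lands exactly on the crashing stack pointer 0x15026874, which is 1932 bytes below lwp_stackbase, while the gcc 3.3 frame (1476 bytes) stays inside the 8 KiB stack. The local-variable area alone is larger than the whole LWP stack in the gcc 3.4 build.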
I'm reassigning this to toolchain. By the way, I only just noticed that in the two code listings at the end of my previous comment, one shows _IO_vfprintf while the other shows vfprintf. Looks like different functions are called after all. In any case 9K+ of stack space for local vars alone seems a bit excessive ;)
glibc version: glibc-2.3.4.20040619
Guys, if you need any extra info, let me know.
Portage 2.0.51_pre13 (default-x86-1.4, gcc-3.4.1, glibc-2.3.4.20040619-r0, 2.6.8-rc2 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz)
=================================================================
System uname: 2.6.8-rc2 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz
Gentoo Base System version 1.5.1
distcc 2.16 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
Autoconf: sys-devel/autoconf-2.59-r4
Automake: sys-devel/automake-1.8.5-r1
Binutils: sys-devel/binutils-2.14.90.0.8-r1
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-march=pentium4 -O0 -pipe -g3 -ggdb3"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.3/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=pentium4 -O0 -pipe -g3 -ggdb3"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache cvs digest fixpackages noclean nostrip sandbox sign userpriv usersandbox"
GENTOO_MIRRORS="http://ftp.snt.utwente.nl/pub/os/linux/gentoo http://ftp.gentoo.skynet.be/pub/gentoo/ http://ftp.uni-erlangen.de/pub/mirrors/gentoo http://mirrors.sec.informatik.tu-darmstadt.de/gentoo http://ftp.easynet.nl/mirror/gentoo/"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/home/griffon26/cvs-wa/gentoo-x86"
SYNC="rsync://griffon26.kfk4ever.com/gentoo-portage"
USE="X alsa apache2 avi bonobo cdr crypt cscope dedicated dga dvd encode ethereal fastcgi fbcon freetds gd gdbm ggi gif gpm gstreamer gtk gtk2 imlib ipv6 jikes joystick jpeg libwww lirc mad mcal memlimit mikmod mmx motif mozilla mpeg mpi mysql ncurses nls nocd oggvorbis opengl pam pdflib perl plotutils png pnp qt quicktime readline samba sdl slang snmp sse ssl svga tcltk tcpd tiff truetype trusted usb wmf wxwindows x86 xml xml2 xmms xosd xv zlib"
Can you rebuild with gcc-3.4.4-r1 and see if it works? This smells like a bug we fixed with 3.4.4-r1 ...
Come to think of it, I haven't seen this in quite a while, even though my system has been using 2.6 headers and gcc 3.4 for ages. It must have been fixed.