Bug 44875 - distcc stalling due to client not sending DONE
Summary: distcc stalling due to client not sending DONE
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Development
Hardware: All All
Importance: High normal
Assignee: Lisa Seelye (RETIRED)
Reported: 2004-03-16 11:43 UTC by Andrew Gaffney (RETIRED)
Modified: 2004-08-17 16:23 UTC
Description Andrew Gaffney (RETIRED) gentoo-dev 2004-03-16 11:43:05 UTC
I set up Gentoo on the ThinkPad 770 that I just got. I'm using distcc to do most of the compiling on my desktop machine (Athlon 1.3GHz w/768MB RAM). I'm running the same versions of GCC, binutils, etc. on both machines.

Most compiles work just fine. Every once in a while (12 times in one day during bootstrap and emerge system) the compile will just hang on the laptop. It looks like the compile finished on the desktop (no cc or gcc processes running) and was sent back to the laptop, but the desktop didn't send the DONE signal. The network connection remains open, so the laptop sits there and waits indefinitely for the DONE signal. If I run '/etc/init.d/distccd restart' on the desktop, it breaks the network connection and I get an error on the laptop that the file descriptor (socket) was unexpectedly closed before it got the DONE token from the desktop. It then proceeds to compile locally.
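
The indefinite wait described above is what happens when a blocking read has no timeout. As a minimal sketch only (this is not distcc's code), the following Python shows a token read that gives up after a deadline instead of hanging. It assumes each token is a 4-character name followed by 8 hexadecimal digits; the function name `read_token` and the timeout values are hypothetical.

```python
import socket

TOKEN_LEN = 12  # assumed layout: 4-character name + 8 hexadecimal digits


def read_token(sock, timeout_s):
    """Read one token from sock, or raise if the peer stays silent.

    The timeout applies to each recv() call, not to the read as a whole.
    """
    sock.settimeout(timeout_s)
    buf = b""
    while len(buf) < TOKEN_LEN:
        chunk = sock.recv(TOKEN_LEN - len(buf))
        if not chunk:
            # recv() returning no data means the peer closed the connection.
            raise ConnectionError("connection closed before a full token arrived")
        buf += chunk
    name = buf[:4].decode("ascii")
    value = int(buf[4:], 16)
    return name, value
```

With a timeout in place, a silent peer produces `socket.timeout` on the waiting side, which a caller could treat the same way as the closed-socket case reported here and fall back to compiling locally.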

I was able to hack my way around this by opening an xterm on the desktop and running 'while `sleep 30`; do /etc/init.d/distccd restart; done', but this is definitely a nasty hack, and it will restart distcc in the middle of compiles.
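
The restart loop above restarts the daemon unconditionally. A slightly gentler variant, sketched below under stated assumptions, skips the restart while a compiler process is still running on the desktop. The process names in `COMPILER_NAMES`, the helper names, and the 30-second interval are assumptions for illustration; `pgrep -x` is used only to check for exactly named processes. It remains a workaround, not a fix for the stall itself.

```python
import subprocess
import time

# Assumed compiler process names; adjust for the toolchain in use.
COMPILER_NAMES = ("cc1", "cc1plus", "gcc", "cc")


def compiler_running(names=COMPILER_NAMES):
    """Return True if pgrep finds a process with one of the exact names."""
    for name in names:
        # pgrep exits with status 0 when at least one process matches.
        result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
        if result.returncode == 0:
            return True
    return False


def restart_distccd():
    """Restart the daemon using the init script named in this report."""
    subprocess.run(["/etc/init.d/distccd", "restart"], check=False)


def watchdog_step(is_busy, restart):
    """Restart only when no compile is in progress. Return True if restarted."""
    if is_busy():
        return False
    restart()
    return True


def main(interval=30):
    """Check every `interval` seconds, forever."""
    while True:
        time.sleep(interval)
        watchdog_step(compiler_running, restart_distccd)
```

Calling `main()` on the desktop starts the loop. There is still a small window in which a compile can begin between the check and the restart, so a job may occasionally be interrupted.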
Comment 1 Andrew Gaffney (RETIRED) gentoo-dev 2004-03-18 08:44:41 UTC
Additional info:

I had originally thought the problem was because of the PCMCIA NIC I was using, but the problems persist with my wireless NIC. It did it during the install with the LiveCD's 2.4 kernel and now with my 2.6.4 kernel.

Is there any additional information I can give to aid in tracking down this problem?
Comment 2 Lisa Seelye (RETIRED) gentoo-dev 2004-03-18 11:47:15 UTC
Follow the relevant bullets in this document and paste the results.

http://distcc.samba.org/problems.html

Include what kernel and NIC modules you're using on each system.
Comment 3 Andrew Gaffney (RETIRED) gentoo-dev 2004-03-18 11:59:29 UTC
Laptop:
gcc 3.3.3
one NIC used the SMC PCMCIA something or other driver and my current one uses the madwifi driver
distcc 2.13

Desktop:
dmfe driver for NIC
gcc 3.3.3
distcc 2.13

I can't duplicate the error message right now, since I'm in the middle of a Mozilla build that's taken over 24 hours already (233MHz processor).
Comment 4 Lisa Seelye (RETIRED) gentoo-dev 2004-03-21 13:39:04 UTC
Are you able to duplicate the error now?
Comment 5 Brian Jackson (RETIRED) gentoo-dev 2004-03-22 10:16:35 UTC
Could you post emerge info for both boxes? Have you tried changing network cables (or NICs in the desktop)?
Comment 6 Andrew Gaffney (RETIRED) gentoo-dev 2004-03-22 10:33:28 UTC
Desktop:
Portage 2.0.50-r1 (default-x86-1.4, gcc-3.3.3, glibc-2.3.3_pre20040207-r0, 2.6.3-gentoo)
=================================================================
System uname: 2.6.3-gentoo i686 AMD Athlon(tm) Processor
Gentoo Base System version 1.4.3.13p1
distcc 2.13 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.2
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-march=athlon-tbird -O2 -pipe -fomit-frame-pointer -mmmx -m3dnow"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.1/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=athlon-tbird -O2 -pipe -fomit-frame-pointer -mmmx -m3dnow"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://gentoo.noved.org/ http://gentoo.chem.wisc.edu/gentoo/ http://mirror.tucdemonic.org/gentoo/ http://gentoo.ccccom.com"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow X aalib aim alsa apm avi berkdb cdr crypt cups dga dvd encode esd ethereal faad ffmpeg foomaticdb gatos gdbm gif gpm gtk gtk2 imap imlib java joystick jpeg libg++ libwww mmx moznoirc mpeg msn ncurses nptl oggvorbis opengl oss pam pdflib perl png python quicktime readline samba sdk sdl slang snmp spell sse ssl svga tcltk truetype v4l x86 xml xml2 xmms xv xvid yahoo zlib"


Laptop:
Portage 2.0.50-r1 (default-x86-2004.0, gcc-3.3.3, glibc-2.3.3_pre20040207-r0, 2.6.4-gentoo)
=================================================================
System uname: 2.6.4-gentoo i586 Mobile Pentium MMX
Gentoo Base System version 1.4.3.13p1
distcc 2.13 i586-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.2
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-O2 -march=pentium -fomit-frame-pointer -pipe -mmmx"
CHOST="i586-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=pentium -fomit-frame-pointer -pipe -mmmx"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache distcc sandbox"
GENTOO_MIRRORS="http://gentoo.oregonstate.edu http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X alsa apm avi berkdb cups encode esd foomaticdb gdbm gif gpm gtk gtk2 imlib jpeg libg++ libwww mad mikmod mmx motif mozilla moznocompose moznoirc moznomail mpeg ncurses nls nptl oggvorbis oss pam pdflib perl png python quicktime readline sdl slang spell ssl svga tcpd tpctlir truetype x86 xml2 xmms xv zlib"


I don't have another cable/NIC I can use for the desktop, but it appears to be working correctly, although all those collisions worry me a bit.

eth0      Link encap:Ethernet  HWaddr 00:80:AD:04:7B:63  
          inet addr:192.168.0.3  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2989512 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2764357 errors:0 dropped:0 overruns:0 carrier:0
          collisions:12579 txqueuelen:1000 
          RX bytes:952605205 (908.4 Mb)  TX bytes:288042987 (274.6 Mb)
          Interrupt:11 Base address:0xc000 

upstairs root # uptime
 12:24:35 up 16:00,  3 users,  load average: 0.21, 0.31, 0.25
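
To put the collision counter above in proportion, the short Python sketch below computes collisions per transmitted packet from the two relevant ifconfig lines quoted in this comment. The function name `collision_ratio` is hypothetical, and the regular expressions assume the old net-tools output format shown here. The result is a little under half a percent of transmitted packets; whether that is enough to explain the stalls is not something the number alone can settle.

```python
import re

# The two relevant lines, copied from the ifconfig output in this comment.
IFCONFIG_OUTPUT = """\
          TX packets:2764357 errors:0 dropped:0 overruns:0 carrier:0
          collisions:12579 txqueuelen:1000
"""


def collision_ratio(text):
    """Return collisions divided by transmitted packets."""
    tx_packets = int(re.search(r"TX packets:(\d+)", text).group(1))
    collisions = int(re.search(r"collisions:(\d+)", text).group(1))
    return collisions / tx_packets


ratio = collision_ratio(IFCONFIG_OUTPUT)
print(f"collisions per transmitted packet: {ratio:.4%}")
```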
Comment 7 Andrew Gaffney (RETIRED) gentoo-dev 2004-03-22 11:07:56 UTC
I'm trying to reproduce the problem now to get some actual output. It doesn't always do it. I saw something else that worried me. I pinged my desktop machine from the laptop (w/ wireless NIC). The first 7 packets were lost and everything after that got a response. 'ifconfig ath0' on the laptop didn't show any packet errors or collisions. I'd want to lean towards a driver problem with the card, but it did it with my wired PCMCIA NIC also.
Comment 8 Brian Jackson (RETIRED) gentoo-dev 2004-03-22 12:08:19 UTC
I just checked all my boxes here, and none of them have any collisions (even the server whose public IP is exposed to my ISP, which gives off horrible traffic). You should probably try tracking down what's causing those (Google should be a good place to check for that).
Comment 9 Andrew Gaffney (RETIRED) gentoo-dev 2004-03-22 12:23:47 UTC
The common piece of hardware is my Speedstream DSL modem/router. I can't really replace that. Although, it is a few years old and runs at full bandwidth about 24/7.
Comment 10 Andrew Gaffney (RETIRED) gentoo-dev 2004-04-07 11:01:18 UTC
I've tried to replicate this bug again. I see it happening, but it's not giving the error anymore when I do '/etc/init.d/distccd restart' on my desktop. The compile on the laptop is still stalling, though.
Comment 11 Lisa Seelye (RETIRED) gentoo-dev 2004-08-15 16:53:54 UTC
Is this still happening?
Comment 12 Andrew Gaffney (RETIRED) gentoo-dev 2004-08-17 16:23:09 UTC
I haven't compiled anything on my laptop since my last comment. I can give it another go in the next few days, but the problem is intermittent.