Bug 57706 - random emerge crash under openmosix
Summary: random emerge crash under openmosix
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: x86 Linux
Importance: High critical
Assignee: Gentoo Cluster Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-07-20 03:53 UTC by kwant
Modified: 2010-09-10 18:59 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
output from omtest (openmosix stress test) (openMosix-stress-test-report-0407201503.txt, 484.05 KB, text/plain)
2004-07-20 07:30 UTC, kwant
configuration of kernels: 2.4.26-openmosix-r4 and 2.4.26-openmosix-r5 (kernels_conf.tar.gz,6.52 KB, application/octet-stream)
2004-07-26 10:43 UTC, kwant

Description kwant 2004-07-20 03:53:51 UTC
Most packages cannot be emerged while openMosix is running; once I stop openMosix, everything works. Some packages can be emerged under openMosix, but only if the compilation processes do not migrate to other nodes or if there is only a single compilation process (MAKEOPTS="-j1" in make.conf).


Reproducible: Sometimes
Steps to Reproduce:
1. Start openMosix (/etc/init.d/openmosix start)
2. Emerge anything (e.g. emerge vim)
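
Because the failure is intermittent, repeating the build and counting failed runs gives a rough failure rate to compare between openMosix running and stopped, or between MAKEOPTS settings. A minimal sketch; the helper name count_failures is hypothetical and not part of Portage or openMosix:

```shell
# count_failures N CMD...: run CMD N times and print how many runs failed
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the build output; only the exit status matters here
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed/$runs runs failed"
}

# Demonstration with a command that always fails
count_failures 3 false
```

On an affected node the call might look like `count_failures 5 emerge --oneshot vim`, run once with openMosix started and once with it stopped.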

Actual Results:  
emerge vim
produces (tail of emerge log):
(...)

!!! ERROR: app-editors/vim-6.3 failed.
!!! Function src_compile, Line 260, Exitcode 2
!!! emake failed

------------------------------------------------

emerge screen
produces (tail of emerge log):
(...)

/usr/lib/portage/bin/emake: line 14: 16375 Segmentation fault    make ${MAKEOPTS} ${EXTRA_EMAKE} "$@"

!!! ERROR: app-misc/screen-4.0.2 failed.
!!! Function src_compile, Line 84, Exitcode 139
!!! emake failed


------------------------------------------------

emerge distcc
produces (tail of emerge log):
(...)

make: *** read jobs pipe: No such file or directory.  Stop.
make: *** Waiting for unfinished jobs....

!!! ERROR: sys-devel/distcc-2.13-r1 failed.
!!! Function src_compile, Line 67, Exitcode 2
!!! emake failed
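
The exit codes in the logs above follow the usual shell convention: a status above 128 means the process was killed by signal (status - 128), so 139 is signal 11 (SIGSEGV), which matches the "Segmentation fault" line in the screen log, while 2 is the status make returns when a build step fails. A minimal sketch of decoding them; the helper name describe_exit is hypothetical:

```shell
# describe_exit STATUS: explain a shell exit status
describe_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # By shell convention, 128 + N means "killed by signal N"
        sig=$((status - 128))
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

describe_exit 139
describe_exit 2
```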




Gentoo Base System version 1.4.16
Portage 2.0.50-r9 (default-x86-2004.0, gcc-3.3.3, glibc-2.3.3.20040420-r0, .4.26-openmosix-r4)
=================================================================
System uname: 2.4.26-openmosix-r4 i686 Intel(R) Pentium(R) 4 CPU 2.60GHz
distcc 2.13 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.3
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-march=pentium4 -O3 -fomit-frame-pointer -mfpmath=sse"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/gconf:/etc/terminfo /etc/env.d"
CXXFLAGS="-march=pentium4 -O3 -fomit-frame-pointer -mfpmath=sse"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://gentoo.prz.rzeszow.pl http://gentoo.zie.pg.gda.pl ftp://mirrors.sec.informatik.tu-darmstadt.de/gentoo/ http://gentoo.oregonstate.edu http://www.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j10"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="NPTL X alsa apache2 apm arts avi berkdb bzlib crypt cups encode esd flash 
foomaticdb gdbm gif gnome gpm gtk gtk2 imlib jack java javascript jpeg kde 
libg++ libwww mad mikmod motif mozilla mpeg ncurses nls oggvorbis opengl oss 
pam pdflib perl php png postgres python qt quicktime readline samba sdl slang 
spell sse ssl svga tcpd truetype x86 xml2 xmms xosd xv zlib"

-----------------------------------------

My openMosix cluster consists of 4 identical nodes, each a Pentium 4 2.6 GHz with HT and 1 GB RAM (Gentoo was installed once and replicated to the other disks).
Applications such as the stress test work fine, and load balancing works fine.

I haven't tested this cluster very hard, but it seems that only emerging/compilation
crashes. I haven't noticed any other problems.
Comment 1 kwant 2004-07-20 07:30:08 UTC
Created attachment 35813 [details]
output from omtest (openmosix stress test)

Output from the openMosix stress test (omtest)
Comment 2 Konstantin Arkhipov (RETIRED) gentoo-dev 2004-07-21 01:00:54 UTC
Can you show the disassembled output of the crashed function?

# ulimit -c unlimited
# emerge somethingBig

After the "Segmentation fault" you will have a core dump:

# gdb -c core.pid segfaultedProgram

(I bet segfaultedProgram will be python.)

(gdb) disassemble
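
The steps above can also be run non-interactively. A minimal sketch, assuming gdb is installed; the helper names write_gdb_script and inspect_core are hypothetical, and the program path must be the binary that actually crashed (gdb names it in its "Core was generated by" line), otherwise frames show up as "?? ()":

```shell
# write_gdb_script FILE: write gdb commands that print a backtrace,
# the registers, and the instructions at the faulting address
write_gdb_script() {
    cat > "$1" <<'EOF'
bt
info registers
x/16i $pc
EOF
}

# inspect_core PROGRAM CORE: run gdb in batch mode on a core file
inspect_core() {
    script=$(mktemp) || return 1
    write_gdb_script "$script"
    gdb -batch -x "$script" "$1" "$2"
    rc=$?
    rm -f "$script"
    return "$rc"
}

# Example (paths are assumptions, not taken from a real run):
#   ulimit -c unlimited
#   inspect_core /usr/bin/make core.4273
```

Batch mode with a command file is used so the output can be pasted into the bug as plain text.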
Comment 3 Konstantin Arkhipov (RETIRED) gentoo-dev 2004-07-26 03:43:00 UTC
Cannot reproduce on .26-r5.
Comment 4 kwant 2004-07-26 10:37:05 UTC
I tested these kernels:
2.4.26-openmosix-r3
2.4.26-openmosix-r4
2.4.26-openmosix-r5

All of these kernels cause similar problems (described above). In addition, 2.4.26-openmosix-r5 freezes my computer: the system hangs during compilation (only a hard reset helps), and only on the node where the emerge was started.

Different types of errors appear (see the three examples above). The failure is not 100% reproducible: sometimes an emerge finishes successfully, sometimes it fails.

core inspection:
# emerge screen
(...)
# cd /var/tmp/portage/screen-4.0.2/work/screen-4.0.2/
# gdb -c core.4273
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
Core was generated by `make -j6'.
Program terminated with signal 11, Segmentation fault.
#0  0x4015bcc2 in ?? ()

--------------------------------------------

Unfortunately I'm not familiar with debugging programs :( Should I attach the core file, or do something else with it? Please tell me what I should do.

Kwant!
Comment 5 kwant 2004-07-26 10:43:36 UTC
Created attachment 36209 [details]
configuration of kernels: 2.4.26-openmosix-r4 and 2.4.26-openmosix-r5

This is the kernel configuration of the tested cluster. Every node runs the same
kernel.
Comment 6 Michael Imhof (RETIRED) gentoo-dev 2004-07-27 07:50:25 UTC
I don't know how familiar you are with openMosix, but openMosix is intended for use on SSI clusters.

So if you have different architectures in your cluster, you will likely see some problems.

If your main node is i686 but the other nodes have older CPUs, then openMosix will migrate the cc calls to the older machines, which can't execute the code and will of course segfault.
Comment 7 kwant 2004-07-27 12:52:49 UTC
As I have already mentioned, all 4 nodes have exactly the same hardware configuration. The library versions are the same too: I installed Gentoo once and replicated it to the remaining 3 nodes. So there is no hardware or software incompatibility.

I've tested the cluster with different kernels, but each time the kernels on all nodes were exactly (binary) identical.

The last kernel tested causes an additional problem: one node hangs. I have no logs or core dump from this crash. It was a hard hang and only a hard reset helped (the keyboard did not respond and I could not ping the machine).
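
One way to back up the claim that every node runs a binary-identical kernel is to compare checksums of the kernel images. A minimal sketch, assuming the images have first been copied to one machine; the helper name same_checksum and the file names in the example are hypothetical:

```shell
# same_checksum FILE...: succeed only if all given files have identical content
same_checksum() {
    # Keep only the checksum column, collapse duplicates, and expect one value
    [ "$(md5sum "$@" | awk '{print $1}' | sort -u | wc -l)" -eq 1 ]
}

# Example (file names are assumptions):
#   same_checksum node1-bzImage node2-bzImage node3-bzImage node4-bzImage \
#       && echo "kernels identical" || echo "kernels differ"
```

The same check works for libraries or compilers suspected of differing between nodes.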