Bug 27615 - OpenMosix Segmentation Faults with emerge.
Summary: OpenMosix Segmentation Faults with emerge.
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High blocker
Assignee: Michael Imhof (RETIRED)
Duplicates: 22736
Reported: 2003-08-30 12:40 UTC by Bill Cavalieri
Modified: 2003-10-06 15:35 UTC

Description Bill Cavalieri 2003-08-30 12:40:56 UTC
When running most emerge commands on an openMosix cluster, I get segmentation
faults at random (more often than not).

I can offer ssh access to my cluster to the person working on the bug (it is a
test system, not production).

Reproducible: Always
Steps to Reproduce:
1. Installed an openMosix kernel on 3 nodes
2. Ran emerge sync or emerge <package>


Actual Results:  
Out of 11 attempts to run emerge -Up world, I got 10 segmentation faults, and
one success.

Expected Results:  
An updated system, if needed.

The cluster does work: when I finally get emerge to run without a segmentation
fault, I can see the process migrate as expected in openmosixview.

Here is a link to 11 log files captured with syscalltrack (10 failures and 1 success).
I don't know whether they will offer any insight, but thought it couldn't hurt.
3 MB: http://www.cavalieri.net/openmosix_log.zip

I compiled the openMosix kernel with gcc 2.95.3 and the rest of the system
with gcc 3.2.3, using the 2.4.20 and 2.4.21 versions of the openMosix kernel.

The openMosix user tools were compiled with the /usr/src/linux link pointing to the
openMosix kernel.
Comment 1 Mikhail 2003-08-30 15:24:27 UTC
I'm having exactly the same issue on a Gentoo cluster (2.4.21-openmosix). I can confirm that
emerge segfaults in 9 cases out of 10 when openMosix is running.
Comment 2 Michael Imhof (RETIRED) gentoo-dev 2003-09-02 08:24:41 UTC
Are you aware of the following? (taken from openmosix.sf.net)

"What is openMosix?
openMosix is a Linux kernel extension for single-system image clustering.  This kernel extension turns a network of ordinary computers into a supercomputer for Linux applications."

That means that openMosix is designed for a cluster of homogeneous nodes.
I assume that you are running an openMosix cluster whose nodes have different architectures (such as P4, P3, Xeon, Athlon, etc.).

In that case you start portage (by calling emerge) on one specific node. All code on that node is optimized for that machine (and its capabilities).
When some of the processes spawned by portage (such as gcc calls) are migrated, the following situation can happen:
Highly optimized code (e.g. built with SSE2 optimizations) is executed on a node that does not support those optimizations --> it'll segfault.

I hope this helps and clarifies the current situation.
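
One way to check for the mismatch described above is to compare the CPU feature flags each node reports in /proc/cpuinfo. The following is a minimal Python sketch, not part of openMosix or portage: the function names, the node names, and the shortened sample flag lines are all made up for illustration. Real input would be the /proc/cpuinfo contents collected from each node.

```python
from typing import Dict, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flags_missing_per_node(nodes: Dict[str, str]) -> Dict[str, Set[str]]:
    """For each node, list the flags that some other node has but this node lacks."""
    per_node = {name: cpu_flags(text) for name, text in nodes.items()}
    all_flags = set().union(*per_node.values())
    return {name: all_flags - flags for name, flags in per_node.items()}


# Hypothetical, heavily shortened cpuinfo excerpts (assumption, not real logs).
P3_INFO = "model name\t: Pentium III\nflags\t\t: fpu mmx sse\n"
P4_INFO = "model name\t: Pentium 4\nflags\t\t: fpu mmx sse sse2\n"

if __name__ == "__main__":
    report = flags_missing_per_node({"node1-p3": P3_INFO, "node2-p4": P4_INFO})
    for name, missing in sorted(report.items()):
        print(name, "lacks:", " ".join(sorted(missing)) or "(nothing)")
```

Any node with a non-empty "lacks" set is a candidate for crashing when it receives a migrated process that was built to use one of those features.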
Comment 3 Michael Imhof (RETIRED) gentoo-dev 2003-09-03 05:18:53 UTC
*** Bug 22736 has been marked as a duplicate of this bug. ***
Comment 4 Bill Cavalieri 2003-09-03 12:18:04 UTC
tantive's response fixed my segmentation faults.  All the nodes in the cluster need to have the same processor arch; I had one P4 as a node in the cluster, and everything else was P3s.

I removed the P4, and the segmentation faults went away.
Comment 5 Mikhail 2003-09-03 13:49:03 UTC
I'm pretty familiar with OM, and this did not remove my segfaults. The Gentoo cluster I'm running is a
4-node cluster, and all machines have the following:

- PII 266 MHz
- 64 MB RAM
- Even the same HDD

All of them were compiled with the same CFLAGS, and all have the same optimizations, arch, and
processor.

I believe this is specific to emerge (it does not like being balanced, though I do not know
why).
Comment 6 Charles Nadeau 2003-09-03 23:44:27 UTC
In /etc/make.conf, use "-mcpu=" instead of "-march=", you won't segfault anymore.
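
For illustration, the change suggested above amounts to the following edit of the CFLAGS string. This is a minimal Python sketch with a hypothetical helper name and a made-up CFLAGS value. It assumes GCC 3.x on x86, where -mcpu= only tunes scheduling for the named CPU while -march= also permits that CPU's instructions, so code built with -mcpu= alone should still run on the older nodes.

```python
import re


def swap_march_for_mcpu(cflags: str) -> str:
    """Replace each -march=<cpu> token with -mcpu=<cpu>, leaving other flags alone."""
    # The lookbehind only matches at the start of a whitespace-separated token.
    return re.sub(r"(?<!\S)-march=", "-mcpu=", cflags)


if __name__ == "__main__":
    # Example value only (assumption); use the CFLAGS from your own /etc/make.conf.
    print(swap_march_for_mcpu("-O2 -march=pentium4 -pipe"))
    # prints: -O2 -mcpu=pentium4 -pipe
```

Packages built before the change keep their old instructions until they are recompiled.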
Comment 7 Robert Moss (RETIRED) gentoo-dev 2003-09-13 15:12:02 UTC
Might I just ask if you've tried setting the following in /etc/make.conf:

CBUILD="i686-pc-linux-gnu"
CHOST="i686-pc-linux-gnu"
CFLAGS="... -march=i686 ..."

Then recompiling your kernel, rebooting and doing an "emerge -e system && emerge -e world". Also, can I ask you if all the relevant hardware is running reliably? The *primary* cause of Gentoo segfaults, apart from user error, is hardware failure. I found that underclocking a P2 350 down to 333MHz stopped all my segfaults on my OM cluster.

Anyway, just a couple of suggestions; I don't like 'blocker' bugs!
Comment 8 Michael Imhof (RETIRED) gentoo-dev 2003-10-06 15:35:06 UTC
As I don't like blocker bugs either...

... I'll close it now.