One of the machines in our cluster crashes during boot, or very soon afterwards, with the following oops:

kernel BUG in skbuff.c:96!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c02da4bb>]    Not tainted
EFLAGS: 00010086
eax: 00000028   ebx: 00000000   ecx: 00000001   edx: 00000000
esi: 00000006   edi: f75c0280   ebp: f75c0280   esp: c03efc94
ds: 0018   es: 0018   ss: 0018
Process swapper: pid 0, stackpage=c03ef00
..
Code: 0f ab 60 00 99 75 38 c0 83 c4 14 c3 89 f6 8d bc 27 00 00 00
 <0> kernel panic, Aie, killing interrupt handler
In interrupt handler: not syncing

linux-2.4.24-openmosix-r7 worked fine.

Reproducible: Always

Relevant kernel configuration:

#
# openMosix
#
CONFIG_MOSIX=y
# CONFIG_MOSIX_TOPOLOGY is not set
CONFIG_MOSIX_SECUREPORTS=y
CONFIG_MOSIX_DISCLOSURE=3
# CONFIG_MOSIX_FS is not set
# CONFIG_MOSIX_PIPE_EXCEPTIONS is not set
# CONFIG_MOSIX_NO_OOM is not set
# CONFIG_MOSIX_LOADLIMIT is not set
Can you please provide the ksymoops output for this oops?
If I run ksymoops, all I get is "Reading Oops report from the terminal". Can you tell me exactly how I should use it? I copied the oops report by hand, because I couldn't find it in the syslog.
You should save the whole oops message in a separate file and then just run `cat oops.file | ksymoops`. If you're able to reproduce the oops, probably the best way to catch the whole message is to watch `tail -f /var/log/messages` from a remote machine through an ssh session.
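The procedure above can be sketched as a small shell script. It is only an illustration under a few assumptions: the oops text (hand-copied or captured remotely) is pasted into a file named oops.file, the name used above, and the sample lines below are excerpts from this report. ksymoops reads the oops on standard input and needs symbol information (System.map, module list) matching the kernel that crashed; check its man page for how to point it at the right files.

```shell
#!/bin/sh
# Sketch: save an oops into a file, then decode it with ksymoops if available.
set -u

OOPS_FILE=oops.file

# Paste the complete oops text between the EOF markers.
# These lines are excerpts from the oops in this report.
cat > "$OOPS_FILE" <<'EOF'
kernel BUG in skbuff.c:96!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c02da4bb>]    Not tainted
EFLAGS: 00010086
Code: 0f ab 60 00 99 75 38 c0 83 c4 14 c3 89 f6 8d bc 27 00 00 00
EOF

# ksymoops reads the oops on stdin; only run it if it is installed.
if command -v ksymoops >/dev/null 2>&1; then
    ksymoops < "$OOPS_FILE"
else
    echo "ksymoops not found; oops saved to $OOPS_FILE" >&2
fi
```

Redirecting the file into ksymoops is equivalent to the `cat oops.file | ksymoops` pipeline; either way ksymoops no longer waits for input from the terminal.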
If I paste my partial hand-copied oops, I get:

>>EIP; c02da4bb <skb_over_panic+3b/50>   <=====
>>edi; f75c0280 <_end+37142db4/383e3b94>
>>ebp; f75c0280 <_end+37142db4/383e3b94>
>>esp; c03efc94 <init_task_union+1c94/2000>

Code;  c02da4bb <skb_over_panic+3b/50>
00000000 <_EIP>:
Code;  c02da4bb <skb_over_panic+3b/50>   <=====
   0:   0f ab 60 00               bts    %esp,0x0(%eax)   <=====
Code;  c02da4bf <skb_over_panic+3f/50>
   4:   99                        cltd
Code;  c02da4c0 <skb_over_panic+40/50>
   5:   75 38                     jne    3f <_EIP+0x3f>

Is this sufficient information? The oops doesn't show up in /var/log/messages. Should I change something in the syslog-ng configuration?
You can also try it without files/openmosix-sources-af_unix.patch.
No, your syslog is OK. That's normal behaviour for oopses and panics: the machine panics before the message can reach the log file.
Without openmosix-sources-af_unix.patch I get the same behaviour. I think I'll try a vanilla openMosix 2.4.26-1, because we can't really afford to reboot our production cluster all the time.
Is this still an issue? Any luck with 2.4.30?
Still no feedback; closing it.
Sorry, I didn't see the request for feedback. We won't be trying 2.4.30: the vanilla openMosix 2.4.26-1 works fine for us, and since we're running a production cluster, we can't use it to test Gentoo patches.