Bug 4897 - Add patch for broken SMP (was: Merging glibc-2.2.5-r4 fails with "Illegal instruction")
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: Highest critical
Assignee: Michael Cohen (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-07-11 22:14 UTC by Ivan Raikov
Modified: 2003-02-04 19:42 UTC
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Description Ivan Raikov 2002-07-11 22:14:29 UTC
Merging the latest version of glibc with `emerge glibc` fails: make spits out
an "Illegal instruction" error. The system is a dual Xeon machine, with
CFLAGS="-mcpu=i686 -O2 -pipe". Getting the source with `ebuild unpack` and then
running configure and make manually works flawlessly, but `ebuild merge` after
that fails, again with "Illegal instruction". Changing the CPU type to i586
doesn't seem to have an effect, other than the build process failing at a
different spot. The same goes for changing the optimization flags (tried -O,
-O2, -O3, and no -O). Here is the error output:

if test -r /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu/abi-tag.h.new; then mv -f /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu/abi-tag.h.new /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu/abi-tag.h; \
else echo >&2 'This configuration not matched in ../abi-tags'; exit 1; fi
/bin/sh ../scripts/move-if-change /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/gnu/lib-names.T /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/gnu/lib-names.h
touch /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/gnu/lib-names.stmp
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
make[2]: *** [/var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu/check_fds.d] Illegal instruction
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [/var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu/version.d] Illegal instruction
make[2]: Leaving directory `/var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/csu'
make[1]: *** [csu/subdir_lib] Error 2
make[1]: Leaving directory `/var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5'
make: *** [all] Error 2

!!! ERROR: The ebuild did not complete successfully.
!!! Function src_compile, Line 14, Exitcode 2
!!! (no error message)

!!! emerge aborting on  /usr/portage/sys-libs/glibc/glibc-2.2.5-r4.ebuild .
Comment 1 Brandon Low (RETIRED) gentoo-dev 2002-07-11 23:48:56 UTC
I'm going to start with questions:  

1) any chance of faulty hardware?
2) if you try again compiling with i686 does it fail in the same place?
3) what compiler are you using?
4) what processor are you running on?
5) is there a chance that your gcc or other library is compiled for a processor
that is not compatible with your hardware?
Comment 2 Ivan Raikov 2002-07-12 07:49:48 UTC
Thanks for responding so quickly.

> 
> 1) any chance of faulty hardware?


        This is a brand new machine,  and I haven't had the chance to
test it with another, binary only distribution; I cannot completely
rule out the possibility of faulty hardware. 

> 2) if you try again compiling with i686 does it fail in the same
> place?

        No, I tried compiling twice, and it failed at the following
two places: 

[.../glibc-2.2.5/buildhere/math/w_acoshf.d] Illegal instruction
[.../glibc-2.2.5/buildhere/sysd-syscalls] Illegal instruction


> 3) what compiler are you using?

gcc version 2.95.3 20010315 (release)

> 4) what processor are you running on?

# cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Xeon(TM) CPU 1.70GHz
stepping        : 2
cpu MHz         : 1680.862
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3355.44

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Xeon(TM) CPU 1.70GHz
stepping        : 2
cpu MHz         : 1680.862
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3355.44



> 5) is there a chance that your gcc or other library is compiled for a processor
> that is not compatible with your hardware?
> 

        I've recompiled gcc, make, and binutils with -mcpu=i686
-O3. I'm going to compile them for i586 and let you know what the 
results are.
Comment 3 Ivan Raikov 2002-07-12 09:21:50 UTC
I recompiled gcc, binutils, make with CFLAGS="-mcpu=i586". It's interesting to
note that the compilation of gcc initially failed with "Illegal instruction" (I
had just unsuccessfully tried to build glibc several times), but it worked on
the second attempt.

Afterwards, building glibc with the same CFLAGS failed the first time. The error
message this time was "mkdir: cannot create directory `buildhere/io': File
exists", and then emerge aborted. I don't know why it decided to abort at that
particular point, because I could see similar errors from mkdir prior to the
`io' directory.

The second time, glibc built successfully. I decided to recompile it, this time
with CFLAGS="-mcpu=i686". Again, the build was successful. I decided to
recompile it again, this time with CFLAGS="-mcpu=i686 -O3" (the original value). 
This time it failed with "Illegal instruction" in the `sys-dcalls' directory.
Another attempt to compile, this time with CFLAGS="-mcpu=i686 -O", failed again
("Illegal instruction" at `image//usr/lib/libanl.so'). So it appears that any
optimization option makes the build fail. I'm going to rebuild my gcc, make, and
binutils for i686 and see if I can still build glibc without -O.
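
Since the failures above come and go between otherwise identical runs, it can help to repeat a build and count how often it dies. The following is a minimal sketch, not something used in this report: `count_sigill` is a hypothetical helper name, and it assumes that a SIGILL shows up either as exit status 132 (128 + 4) or as the text "Illegal instruction" in the command's output, which is how make reports it in the log above.

```shell
#!/bin/sh
# count_sigill RUNS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND RUNS times and prints how many
# runs appear to have hit an illegal-instruction fault.
count_sigill() {
    runs=$1
    shift
    hits=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        i=$((i + 1))
        # Capture stdout and stderr; record the exit status either way.
        if out=$("$@" 2>&1); then
            status=0
        else
            status=$?
        fi
        # 132 = 128 + 4 (SIGILL) when the command itself is killed.
        # make exits with 2 instead, but prints "Illegal instruction"
        # for the child process that was killed.
        if [ "$status" -eq 132 ] ||
           printf '%s\n' "$out" | grep -q 'Illegal instruction'; then
            hits=$((hits + 1))
        fi
    done
    echo "$hits"
}
```

Something like `count_sigill 5 emerge glibc` would then print the number of affected runs out of five. A count that varies between identical invocations points away from the compiler flags and toward the hardware or the kernel.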
Comment 4 Brandon Low (RETIRED) gentoo-dev 2002-07-12 10:22:31 UTC
Ok the only way this can really happen is if you have something like 3dnow
instructions in your gcc that are being called sometimes, which of course your
processor doesn't support... Otherwise you are looking at faulty memory or a
faulty computer... 

carpaski@gentoo.org just suggested that your problem may be a 3dnow use flag
being set, check your use flags to see that 3dnow is NOT set anywhere in
make.globals or use.defaults or make.conf as that could also lead to this.

(Unless another dev has a better idea)
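
Whether the CPU advertises a given instruction set extension can be read from the `flags` line of /proc/cpuinfo. A minimal sketch, assuming the usual /proc/cpuinfo layout shown earlier in this report; `cpu_has_flag` is a hypothetical helper name, not a Portage or kernel tool.

```shell
#!/bin/sh
# cpu_has_flag FLAG [CPUINFO_FILE]
# Hypothetical helper: succeeds if FLAG appears as a whole word on any
# "flags" line of the given file (default: /proc/cpuinfo).
cpu_has_flag() {
    awk -v want="$1" '
        /^flags/ { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
        END { exit found ? 0 : 1 }
    ' "${2:-/proc/cpuinfo}"
}
```

For example, `cpu_has_flag 3dnow || echo "no 3dnow on this CPU"`. The USE side can be checked with a plain `grep -n 3dnow` over make.globals, use.defaults, and make.conf; their locations vary by setup, so the paths are left to the reader.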
Comment 5 Ivan Raikov 2002-07-12 10:42:22 UTC
I'm testing the memory with memtest86 now. Are there any other hardware testing
programs that I might want to run?
Comment 6 Nick Hadaway 2002-07-12 11:40:48 UTC
Another memory tester... This one has some different testing concepts than 
memtest86.
http://panic.et.tudelft.nl/~costar/memmxtest/

I know this is from the evil empire, but here are some PC99 hard drive test
programs for WHQL certification:
http://www.microsoft.com/hwdq/hwtest/devices/devicesub.asp?area=HD

Many manufacturers also supply their own hard drive testing utilities... like 
IBM makes a "drive fitness test" and western digital has "wddiag" etc...

In your first posting of the problem, this error...
".././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-
2.2.5/buildhere/csu"
repeated six times... (first try + 5 retries?) and then an "illegal instruction"
Again, possibly pointing to hardware issues.

Keep us posted with what you find.
Comment 7 Martin Schlemmer (RETIRED) gentoo-dev 2002-07-12 14:01:34 UTC
1)  I cannot see how having "3dnow" in USE could cause problems with glibc, as
    not many (nearly sure none) of the base system packages use it.

2)  Errors like this are usually caused by invalid "-march" or "-mcpu" flags.

3)  Or they are caused by faulty hardware.

4)  Or they are caused by linking with libs that were built with instructions
    not supported by the current cpu (not 100% sure about this point).

5)  I think I also got this when trying to compile glibc with -march=pentium4
    and gcc-3.1.  So it could be due to buggy "gcc arch instruction generation
    code" ... if you understood that, explain to me as well :P

As an afterthought ... I really do not know much about Xeon processors ... are
they 100% x86 compatible?
Comment 8 Ivan Raikov 2002-07-13 12:47:47 UTC
I tested my machine with memtest86 for about 23 hours, and I've been running
memmxtest for a few hours now, without any error. Any other suggestions for
hardware testing? Or testing for buggy GCC output?

The thing I don't understand is that when I compile glibc manually (after doing
`ebuild unpack'), everything works fine, but `ebuild merge' fails afterwards.
What could be the reason for this? By the way, I'm willing to give you guys
remote access to the machine, if that's going to be of any help.
Comment 9 Ivan Raikov 2002-07-13 15:29:15 UTC
I was perusing the Linux-Kernel mailing list archives, and it appears that
someone with the exact same configuration as me was having "Illegal instruction"
problems. The only workaround he mentions is to disable the second CPU. Here is
a link to the post:

http://www.uwsg.iu.edu/hypermail/linux/kernel/0202.2/1077.html
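
One way to try that workaround without touching the hardware is to boot with the kernel parameter `maxcpus=1` (or `nosmp`) and then confirm how many processors the kernel brought up. This is a sketch under that assumption; the linked post does not say how the second CPU was disabled, and `cpu_count` is a hypothetical helper name.

```shell
#!/bin/sh
# cpu_count [CPUINFO_FILE]
# Hypothetical helper: prints the number of "processor" entries in the
# given file (default: /proc/cpuinfo).
cpu_count() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}
```

On the dual Xeon described above, `cpu_count` should print 2 on a normal boot and 1 after booting with `maxcpus=1`.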

Comment 10 Ivan Raikov 2002-07-13 17:02:39 UTC
OK, so it appears that this problem is caused by a bug in the SMP code of Linux
2.4.18. Here's a patch that addresses this issue:

http://hypermail.idiosynkrasia.net/linux-kernel/archived/2002/week20/0681.html


After applying this patch to my kernel source tree (2.4.18), I was successfully
able to merge glibc with CFLAGS="-mcpu=i686 -O3".
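
For anyone repeating this, a patch saved from a mailing list post is usually applied from the top of the kernel source tree. The sketch below makes several assumptions not stated in this report: the patch is saved to a local file, it uses the common `a/` and `b/` path prefixes (hence `-p1`), and GNU patch is installed (for `--dry-run`). `apply_kernel_patch` is a hypothetical helper name.

```shell
#!/bin/sh
# apply_kernel_patch TREE PATCHFILE
# Hypothetical helper: applies PATCHFILE (absolute path) inside TREE,
# but only if a dry run shows that it applies cleanly.
apply_kernel_patch() {
    tree=$1
    patchfile=$2
    # Dry run first, so a patch that does not apply leaves the tree untouched.
    (cd "$tree" && patch -p1 --dry-run < "$patchfile" > /dev/null) || return 1
    # The real run modifies the files in place.
    (cd "$tree" && patch -p1 < "$patchfile")
}
```

A call might look like `apply_kernel_patch /usr/src/linux /tmp/smp-fix.patch`, where both paths are placeholders. The kernel then has to be rebuilt and booted before the fix takes effect.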
Comment 11 Martin Schlemmer (RETIRED) gentoo-dev 2002-07-13 18:47:39 UTC
Well, that explains a thing or two.
Comment 12 Martin Schlemmer (RETIRED) gentoo-dev 2002-07-13 19:35:29 UTC
Hiya MJC .. mind having a look at this patch for your patches ?

( http://hypermail.idiosynkrasia.net/linux-kernel/archived/2002/week20/0681.html )
Comment 13 Michael Cohen (RETIRED) gentoo-dev 2002-07-16 21:44:09 UTC
Fixed in upcoming mjc-sources.