Merging the latest version of glibc with `emerge glibc` fails: make aborts with an "Illegal instruction" error. The system is a dual Xeon machine with CFLAGS="-mcpu=i686 -O2 -pipe". Getting the source with `ebuild unpack` and then running configure and make manually works flawlessly, but `ebuild merge` after that fails, again with "Illegal instruction". Changing the CPU type to i586 doesn't seem to have any effect, other than the build failing at a different spot. The same goes for changing the optimization flags (tried -O, -O2, -O3, and no -O).

Here is the error output:

if test -r /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu/abi-tag.h.new; then mv -f /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu/abi-tag.h.new /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu/abi-tag.h; \
else echo >&2 'This configuration not matched in ../abi-tags'; exit 1; fi
/bin/sh ../scripts/move-if-change /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/gnu/lib-names.T /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/gnu/lib-names.h
touch /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/gnu/lib-names.stmp
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
.././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu
make[2]: *** [/var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu/check_fds.d] Illegal instruction
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [/var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu/version.d] Illegal instruction
make[2]: Leaving directory `/var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/csu'
make[1]: *** [csu/subdir_lib] Error 2
make[1]: Leaving directory `/var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5'
make: *** [all] Error 2

!!! ERROR: The ebuild did not complete successfully.
!!! Function src_compile, Line 14, Exitcode 2
!!! (no error message)
!!! emerge aborting on /usr/portage/sys-libs/glibc/glibc-2.2.5-r4.ebuild .
I'm going to start with questions:

1) any chance of faulty hardware?
2) if you try again compiling with i686 does it fail in the same place?
3) what compiler are you using?
4) what processor are you running on?
5) is there a chance that your gcc or other library is compiled for a processor that is not compatible with your hardware?
Thanks for responding so quickly.

> 1) any chance of faulty hardware?

This is a brand new machine, and I haven't had the chance to test it with another, binary only distribution; I cannot completely rule out the possibility of faulty hardware.

> 2) if you try again compiling with i686 does it fail in the same
> place?

No, I tried compiling twice, and it failed at the following two places:

[.../glibc-2.2.5/buildhere/math/w_acoshf.d] Illegal instruction
[.../glibc-2.2.5/buildhere/sysd-syscalls] Illegal instruction

> 3) what compiler are you using?

gcc version 2.95.3 20010315 (release)

> 4) what processor are you running on?

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.70GHz
stepping : 2
cpu MHz : 1680.862
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 3355.44

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.70GHz
stepping : 2
cpu MHz : 1680.862
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 3355.44

> 5) is there a chance that your gcc or other library is compiled for a processor
> that is not compatible with your hardware?

I've recompiled gcc, make, and binutils with -mcpu=i686 -O3. I'm going to compile them for i586 and let you know what the results are.
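For anyone who wants to check a `flags` line like the one above mechanically, here is a minimal Python sketch. The function names and the list of wanted features are illustrative assumptions, not part of any tool mentioned in this thread; it only reports which features a CPU does not advertise and says nothing about what a given compiler actually emits.

```python
# Sample text in the /proc/cpuinfo layout shown in the post above
# (only the lines this sketch looks at are kept).
SAMPLE = """\
processor : 0
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
processor : 1
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
"""


def cpu_flags(cpuinfo_text):
    """Return one set of feature flags per 'flags' line in the text."""
    per_cpu = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            per_cpu.append(set(value.split()))
    return per_cpu


def missing_features(cpuinfo_text, wanted):
    """Return the wanted flags that at least one listed CPU lacks.

    If the text has no 'flags' lines at all, the result is empty,
    so check cpu_flags() first when the input is untrusted.
    """
    missing = set()
    for flags in cpu_flags(cpuinfo_text):
        missing |= set(wanted) - flags
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical feature list; adjust to whatever the build might use.
    print(missing_features(SAMPLE, ["cmov", "mmx", "sse", "sse2", "3dnow"]))
```

On a live system the same functions could be fed the contents of /proc/cpuinfo instead of the sample string.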
I recompiled gcc, binutils, and make with CFLAGS="-mcpu=i586". It's interesting to note that the compilation of gcc initially failed with "Illegal instruction" (I had just unsuccessfully tried to build glibc several times), but it worked on the second attempt.

Afterwards, building glibc with the same CFLAGS failed the first time. The error message this time was "mkdir: cannot create directory `buildhere/io': File exists", and then emerge aborted. I don't know why it decided to abort at that particular point, because I could see similar errors from mkdir prior to the `io' directory. The second time, glibc built successfully.

I decided to recompile it, this time with CFLAGS="-mcpu=i686". Again, the build was successful. I decided to recompile it again, this time with CFLAGS="-mcpu=i686 -O3" (the original value). This time it failed with "Illegal instruction" in the `sys-dcalls' directory. Another attempt to compile, this time with CFLAGS="-mcpu=i686 -O", failed again ("Illegal instruction" at `image//usr/lib/libanl.so').

So it appears that the build fails whenever any optimization option is set. I'm going to rebuild my gcc, make, and binutils for i686 and see if I can still build glibc without -O.
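Since the same build sometimes passes and sometimes dies in a different place, one way to put numbers on that is to run an identical command many times and count how often it is killed by SIGILL. The following is a minimal Python sketch under that assumption; `count_outcomes` is a hypothetical helper, and the command in the demo is a harmless placeholder rather than a real build step.

```python
import signal
import subprocess
import sys


def count_outcomes(cmd, runs):
    """Run cmd `runs` times; tally clean exits, SIGILL deaths, other failures."""
    tally = {"ok": 0, "sigill": 0, "other": 0}
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode == 0:
            tally["ok"] += 1
        elif result.returncode == -signal.SIGILL:
            # On POSIX, a negative return code means the child was
            # terminated by that signal number.
            tally["sigill"] += 1
        else:
            tally["other"] += 1
    return tally


if __name__ == "__main__":
    # Placeholder command: substitute a deterministic compile step.
    print(count_outcomes([sys.executable, "-c", "pass"], 5))
```

A deterministic command that sometimes ends up in the "sigill" bucket and sometimes in "ok" points away from the compiler flags themselves and toward the hardware or the kernel.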
OK, the only way this can really happen is if your gcc contains something like 3dnow instructions that get executed some of the time, which of course your processor doesn't support. Otherwise you are looking at faulty memory or a faulty computer.

carpaski@gentoo.org just suggested that your problem may be a 3dnow USE flag being set. Check your USE flags to see that 3dnow is NOT set anywhere in make.globals, use.defaults, or make.conf, as that could also lead to this. (Unless another dev has a better idea.)
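A quick way to run that check is to scan the three files for a bare `3dnow` token. This is a rough Python sketch: the file paths are assumptions about where these files live on a typical install (the post names the files but not their locations), and `lines_enabling` is a hypothetical helper that does plain token matching, not real Portage parsing.

```python
import os
import re


def lines_enabling(text, flag):
    """Return (line number, line) pairs where `flag` appears as a bare token.

    Comments are ignored, and "-3dnow" (a disabled flag) does not match
    because it is a different token.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop trailing comments
        tokens = re.split(r"[\s\"'=]+", code)
        if flag in tokens:
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed locations; adjust for the system being checked.
    candidates = [
        "/etc/make.conf",
        "/etc/make.globals",
        "/etc/make.profile/use.defaults",
    ]
    for path in candidates:
        if not os.path.exists(path):
            continue
        with open(path) as f:
            for lineno, line in lines_enabling(f.read(), "3dnow"):
                print("%s:%d: %s" % (path, lineno, line))
```

No output means none of the files that exist mention the flag as an enabled token.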
I'm testing the memory with memtest86 now. Are there any other hardware testing programs that I might want to run?
Another memory tester... This one has some different testing concepts than memtest86.
http://panic.et.tudelft.nl/~costar/memmxtest/

I know this is from the evil empire... but some PC99 hard drive test programs for WHQL certification.
http://www.microsoft.com/hwdq/hwtest/devices/devicesub.asp?area=HD

Many manufacturers also supply their own hard drive testing utilities... like IBM makes a "drive fitness test" and Western Digital has "wddiag" etc...

In your first posting of the problem, this error...

".././scripts/mkinstalldirs /var/tmp/portage/glibc-2.2.5-r4/work/glibc-2.2.5/buildhere/csu"

repeated six times... (first try + 5 retries?) and then an "illegal instruction". Again, possibly pointing to hardware issues.

Keep us posted with what you find.
1) I cannot see how having "3dnow" in USE can cause problems with glibc, as not many (nearly sure none) of the base system packages use it.
2) Errors like this are usually caused by invalid "-march" or "-mcpu" flags.
3) Caused by faulty hardware.
4) Or caused when linking with libs that were built with instructions not supported by the current CPU (not 100% sure about this point).
5) I think I also got this when trying to compile glibc with -march=pentium4 and gcc-3.1. So it could be due to buggy "gcc arch instruction generation code" ... if you understood that, explain it to me as well :P

As an afterthought ... I really do not know much about Xeon processors ... are they 100% x86 compatible?
I tested my machine with memtest86 for about 23 hours, and I've been running memmxtest for a few hours now, without any error. Any other suggestions for hardware testing? Or testing for buggy GCC output? The thing I don't understand is that when I compile glibc manually (after doing `ebuild unpack'), everything works fine, but `ebuild merge' fails afterwards. What could be the reason for this? By the way, I'm willing to give you guys remote access to the machine, if that's going to be of any help.
I was perusing the Linux-Kernel mailing list archives, and it appears that someone with the exact same configuration as me was having "Illegal instruction" problems. The only workaround he mentions is to disable the second CPU. Here is a link to the post: http://www.uwsg.iu.edu/hypermail/linux/kernel/0202.2/1077.html
OK, so it appears that this problem is caused by a bug in the SMP code of Linux 2.4.18. Here's a patch that addresses this issue: http://hypermail.idiosynkrasia.net/linux-kernel/archived/2002/week20/0681.html After applying this patch to my kernel source tree (2.4.18), I was successfully able to merge glibc with CFLAGS="-mcpu=i686 -O3".
Well, that explains a thing or two.
Hiya MJC .. mind having a look at this patch for your patches ? ( http://hypermail.idiosynkrasia.net/linux-kernel/archived/2002/week20/0681.html )
Fixed in upcoming mjc-sources.