This is specific to a system running under QEMU/KVM where I downgraded the emulated processor from Broadwell to SandyBridge (using the qemu -cpu parameter) to allow migrating between AMD EPYC and Intel Xeon v4 hardware. The VM showing this behavior was started on an AMD EPYC 7301 and is currently running on an Intel(R) Xeon(R) CPU E5-2697 v4.

Although the running kernel has CONFIG_IA32_EMULATION=y (verified with /proc/config.gz), trying to emerge glibc fails with "Illegal instruction". There is no CPU-specific optimization (and never was) in /etc/portage/make.conf, which is pretty much the default. I tried to reuse the glibc 32-bit ELF test (gcc -m32 check-ia32-emulation.c -o check-ia32-emulation) and it indeed fails with "Illegal instruction".

I'm a bit puzzled by the situation. I suspect two possibilities:
- something in the build chain is using an illegal instruction for reasons I can't imagine,
- something in the kernel configuration prevents the test from succeeding.

The latter would be odd: the kernel hasn't changed since January according to the /boot timestamps, and glibc was updated recently, although at that time the VM was still running on the host it was started on (no migration had happened yet).

Another possibility is that the CPU vendor_id triggers the problem. QEMU doesn't seem to emulate the vendor_id when using the -cpu parameter, and the kernel currently reports AuthenticAMD in /proc/cpuinfo. That is the vendor_id of the host that started the VM, not that of the host the VM is currently running on, nor the one consistent with the CPU model.
I fired up gdb on the core file generated by my attempt at reusing the glibc test and found the following (gdb> layout asm):

0xf77d6c65 <__kernel_vsyscall+5>  syscall

I can make do without the 32-bit ABI on this system (and I'm currently testing a migration to no-multilib on another system), but it seems there's something to be fixed here (in QEMU, my kernel configuration and/or the toolchain). Running and migrating VMs in a mixed Intel/AMD cluster shouldn't prevent the 32-bit ABI from working (and it may be a sign of other problems to come).

Reproducible: Always

Steps to Reproduce:
1. Install a Gentoo VM on QEMU with -cpu Broadwell on Intel hardware
2. Stop the VM
3. Restart it with -cpu SandyBridge on AMD hardware
4. Live migrate it to Intel hardware
5. emerge -1 glibc

Actual Results:
Failure with "Illegal instruction"

Expected Results:
glibc emerged

== emerge --info ==
Portage 2.3.49 (python 3.6.5-final-0, default/linux/amd64/17.0, gcc-7.3.0, glibc-2.26-r7, 4.9.76-gentoo-r1 x86_64)
=================================================================
System uname: Linux-4.9.76-gentoo-r1-x86_64-Intel_Xeon_E312xx_-Sandy_Bridge-with-gentoo-2.6
KiB Mem: 64431248 total, 4102432 free
KiB Swap: 2097148 total, 2084972 free
Timestamp of repository gentoo: Sun, 25 Nov 2018 07:00:01 +0000
Head commit of repository gentoo: f78b3b7f82c4de98c22ca940891804f4784f5cd4
sh bash 4.4_p12
ld GNU ld (Gentoo 2.30 p5) 2.30.0
app-shells/bash: 4.4_p12::gentoo
dev-lang/perl: 5.24.3-r1::gentoo
dev-lang/python: 2.7.15::gentoo, 3.6.5::gentoo
dev-util/cmake: 3.9.6::gentoo
dev-util/pkgconfig: 0.29.2::gentoo
sys-apps/baselayout: 2.6-r1::gentoo
sys-apps/openrc: 0.38.2::gentoo
sys-apps/sandbox: 2.13::gentoo
sys-devel/autoconf: 2.69-r4::gentoo
sys-devel/automake: 1.15.1-r2::gentoo
sys-devel/binutils: 2.30-r4::gentoo
sys-devel/gcc: 7.3.0-r3::gentoo
sys-devel/gcc-config: 1.8-r1::gentoo
sys-devel/libtool: 2.4.6-r3::gentoo
sys-devel/make: 4.2.1-r4::gentoo
sys-kernel/linux-headers: 4.13::gentoo (virtual/os-headers)
sys-libs/glibc: 2.26-r7::gentoo

Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-max-age: 24
    sync-rsync-extra-opts:
    sync-rsync-verify-jobs: 1

x-portage
    location: /usr/local/portage
    masters: gentoo
    priority: 0

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="-j --load-average=8"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync multilib-strict news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="fr_FR.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j4 --load-average=6"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="acl amd64 bash-completion berkdb bzip2 cli crypt cxx dri gdbm iconv ipv6 libtirpc mmx multilib ncurses nls nptl openmp pam pcre readline rtc seccomp sse ssl threads unicode xattr zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias proxy proxy_http" CALLIGRA_FEATURES="karbon plan sheets stage words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6 php7-1" POSTGRES_TARGETS="postgres9_5 postgres10" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" RUBY_TARGETS="ruby23 ruby24" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, LINGUAS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

== Processor related kernel configuration extracted from /proc/config.gz ==
#
# Processor type and features
#
CONFIG_ZONE_DMA=y
CONFIG_SMP=y
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_FAST_FEATURE_TESTS=y
CONFIG_X86_X2APIC=y
# CONFIG_X86_MPPARSE is not set
# CONFIG_GOLDFISH is not set
# CONFIG_X86_EXTENDED_PLATFORM is not set
# CONFIG_X86_INTEL_LPSS is not set
# CONFIG_X86_AMD_PLATFORM_DEVICE is not set
# CONFIG_IOSF_MBI is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
# CONFIG_PARAVIRT_SPINLOCKS is not set
# CONFIG_XEN is not set
CONFIG_KVM_GUEST=y
# CONFIG_KVM_DEBUG_FS is not set
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_PARAVIRT_CLOCK=y
CONFIG_NO_BOOTMEM=y
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
CONFIG_MCORE2=y
# CONFIG_MATOM is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_P6_NOP=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_GART_IOMMU is not set
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=64
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
# CONFIG_X86_MCE is not set

== /proc/cpuinfo extract (last processor) ==
processor : 15
vendor_id : AuthenticAMD
cpu family : 6
model : 42
model name : Intel Xeon E312xx (Sandy Bridge)
stepping : 1
microcode : 0x1000065
cpu MHz : 2199.876
cache size : 512 KB
physical id : 1
siblings : 8
core id : 7
cpu cores : 8
apicid : 15
initial apicid : 15
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm nopl eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm cmp_legacy 3dnowprefetch vmmcall xsaveopt arat
bugs : fxsave_leak sysret_ss_attrs null_seg
bogomips : 4399.75
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
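
The mismatch visible in the extract above (vendor_id AuthenticAMD next to an Intel model name) can be checked mechanically. What follows is only a minimal Python sketch operating on /proc/cpuinfo-style text. The MODEL_HINTS table and the function names are illustrative assumptions, not part of any existing tool, and the substring matching is a rough heuristic.

```python
import re

# Substrings of "model name" that hint at a vendor. This table is an
# illustrative assumption, not an exhaustive or authoritative list.
MODEL_HINTS = {
    "GenuineIntel": ("intel", "xeon"),
    "AuthenticAMD": ("amd", "epyc", "opteron"),
}


def parse_cpuinfo(text):
    """Return one dict per processor block of /proc/cpuinfo-style text."""
    cpus = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            cpus.append(fields)
    return cpus


def vendor_mismatch(cpu):
    """True when the model name hints at a vendor other than vendor_id."""
    vendor = cpu.get("vendor_id", "")
    model = cpu.get("model name", "").lower()
    for hinted_vendor, needles in MODEL_HINTS.items():
        if hinted_vendor != vendor and any(n in model for n in needles):
            return True
    return False


def fast_syscall_flags(cpu):
    """Return which of the 'sep' (SYSENTER) and 'syscall' flags are set."""
    return {"sep", "syscall"} & set(cpu.get("flags", "").split())
```

On the guest itself, one would feed it the real file, for example `parse_cpuinfo(open("/proc/cpuinfo").read())`, and look at `vendor_mismatch()` for each entry.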
By default, glibc attempts to infer the best instruction set from the runtime CPU at application startup. Does dmesg show which instruction it attempted to execute?

gdb should show you what causes the SIGILL.
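
For reference, the faulting instruction can also be read from a core file without an interactive session. This is a small Python sketch that only assembles the gdb command line. The helper name is made up for illustration, and the core file name "core" is an assumption (the actual name depends on the system's core_pattern).

```python
import shlex


def gdb_fault_command(binary, core):
    """Build a non-interactive gdb invocation that prints the instruction
    at the program counter and a backtrace for the given core file."""
    return ["gdb", "-batch", "-ex", "x/i $pc", "-ex", "bt", binary, core]


# "./check-ia32-emulation" is the test binary named in the report;
# "core" is an assumed core file name.
print(shlex.join(gdb_fault_command("./check-ia32-emulation", "core")))
```

The printed command can be pasted into a shell, or the list passed to `subprocess.run()`.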
(In reply to Sergei Trofimovich from comment #1)
> by default glibc attempts to infer best instruction set from runtime CPU at
> application startup. Does dmesg show you which instruction is attempted to
> be executed?
>
> gdb should show you what causes SIGILL.

As I wrote, it reports this (gdb> layout asm):

0xf77d6c65 <__kernel_vsyscall+5>  syscall

Nothing in dmesg. I thought about restarting qemu without detaching it from the terminal in case it reports attempts to use unsupported opcodes, but this is a production system, which doesn't make that easy.
Then my guess would be that one of your host CPUs does not support SYSCALL emulation at least in 32-bit mode.
(In reply to Sergei Trofimovich from comment #3)
> Then my guess would be that one of your host CPUs does not support SYSCALL
> emulation at least in 32-bit mode.

I started reading about SYSCALL emulation and it seems like a mess, with several implementations depending on which CPU vendor and model you are using. I found one KVM patch addressing this:
https://lists.ubuntu.com/archives/kernel-team/2012-March/018646.html

Extract:
"Depending on the architecture (AMD or Intel) pretended by guests, various checks according to vendor's documentation are implemented to overcome the current issue and behave like the CPUs physical counterparts."

I'm not familiar with this subject at all, but in my case the CPU identifies as AuthenticAMD although QEMU was asked to emulate SandyBridge (it probably copied the vendor_id from the initial host before live migration). So this could be triggered by a mismatch between the vendor_id QEMU reports and how it handles SYSCALL emulation. That would make it a QEMU bug (the hosts run qemu-2.10.0; I couldn't find any recent bugfix for syscall emulation).

Should I report this to the QEMU devs directly (I didn't find a relevant bug report on https://launchpad.net/qemu at first glance), or do you think there's another possibility?
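
To make the suspected mismatch concrete, here is a hedged Python sketch of the vendor dependence. It assumes, based on my reading of the vendor documentation and not on anything verified on these hosts, that Intel CPUs execute SYSENTER but not SYSCALL in 32-bit compatibility mode, and that AMD CPUs execute SYSCALL but not SYSENTER there. The table and function names are illustrative only.

```python
# Vendor -> fast system-call instruction assumed to be usable by 32-bit
# code under a 64-bit kernel (compatibility mode). This mapping is an
# assumption taken from vendor documentation, not measured here.
COMPAT_FAST_SYSCALL = {
    "GenuineIntel": "sysenter",
    "AuthenticAMD": "syscall",
}

LEGACY_FALLBACK = "int $0x80"


def expected_compat_instruction(vendor_id):
    """Instruction a guest would plausibly pick for the given vendor_id."""
    return COMPAT_FAST_SYSCALL.get(vendor_id, LEGACY_FALLBACK)


def is_consistent(guest_vendor_id, host_vendor_id):
    """True when the instruction implied by the guest's reported vendor
    is one the host CPU is assumed to execute natively."""
    chosen = expected_compat_instruction(guest_vendor_id)
    native = expected_compat_instruction(host_vendor_id)
    return chosen in (native, LEGACY_FALLBACK)


# The situation in this report: the guest reports AuthenticAMD while
# running on an Intel host.
print(is_consistent("AuthenticAMD", "GenuineIntel"))
```

Under these assumptions, the `syscall` seen at `__kernel_vsyscall+5` fits a guest that believes it runs on AMD, which would then depend on KVM emulating the instruction once the VM is on an Intel host.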
(In reply to Lionel Bouton from comment #4)
> Should I report this to QEMU devs directly (I didn't find a relevant bug
> report on https://launchpad.net/qemu at first glance) or do you think
> there's another possibility ?

Sounds like a good next step. I also suggest asking qemu upstream explicitly whether migrating like that is a supported path. It might be that you need to pass more options to qemu (like disabling more host leaks) to make it work.

Making the vendor string match the emulated SandyBridge would make some sense on the qemu side. I don't know if it's feasible or whether it would fix your problem. The host kernel, guest kernel and guest userspace can all subtly persist the initial VM CPU.

CCing our qemu maintainers in case they already know.
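
As a sketch of what "passing more options" could look like: QEMU's x86 CPU models accept a vendor property on the -cpu option, so the vendor string can be pinned explicitly instead of being inherited from the host. The Python helper below only builds that argument. Its name is made up for illustration, and whether pinning the vendor fixes this particular failure is untested.

```python
def cpu_option(model, vendor=None, extra=()):
    """Build the value of QEMU's -cpu option, optionally pinning the
    CPUID vendor string and appending further properties."""
    parts = [model]
    if vendor is not None:
        parts.append(f"vendor={vendor}")
    parts.extend(extra)
    return ",".join(parts)


def qemu_cpu_args(model, vendor=None, extra=()):
    """Return the argv fragment to splice into a qemu command line."""
    return ["-cpu", cpu_option(model, vendor, extra)]


# SandyBridge is the model used in this report; GenuineIntel is the
# vendor string that would be consistent with it.
print(" ".join(qemu_cpu_args("SandyBridge", vendor="GenuineIntel")))
```

The same vendor value would have to be used on every host in the cluster so that the guest sees an identical CPU before and after migration.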
Closing as INVALID, assuming qemu is able to change CPU features underneath the running glibc/kernel.