Bug 671878

Summary: sys-libs/glibc-2.27-r6 build failure : "Illegal instruction" testing 32bit in mixed AMD Intel virtualization environment
Product: Gentoo Linux
Component: Current packages
Status: RESOLVED INVALID
Severity: normal
Priority: Normal
Version: unspecified
Hardware: AMD64
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---
Reporter: Lionel Bouton <lionel-dev>
Assignee: Gentoo Toolchain Maintainers <toolchain>
CC: slyfox, tamiko, virtualization

Description Lionel Bouton 2018-11-25 15:34:37 UTC

This is specific to a system running under QEMU/KVM where I downgraded the processor emulated from Broadwell to SandyBridge (using qemu -cpu parameter) to allow migrating between EPYC and Intel Xeon v4 hardware.

The VM with this behavior has been started on an AMD EPYC 7301 and is currently running on an Intel(R) Xeon(R) CPU E5-2697 v4 CPU.

Although the running kernel has CONFIG_IA32_EMULATION=y (verified with /proc/config.gz), trying to emerge glibc fails with "Illegal instruction". There are no CPU-specific optimizations (and never were) in /etc/portage/make.conf, which is pretty much the default.
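
For reference, a minimal Python sketch of the config check described above. It assumes the kernel exposes its configuration at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC); the helper names `config_option` and `read_running_config` are made up for this illustration.

```python
import gzip


def config_option(config_text, name):
    """Return the value of a kernel config option, or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        # Set options look like "CONFIG_FOO=y"; unset ones appear only as
        # comments ("# CONFIG_FOO is not set") and are skipped here.
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (assumes CONFIG_IKCONFIG_PROC=y)."""
    with gzip.open(path, "rt") as handle:
        return handle.read()
```

On the affected VM, `config_option(read_running_config(), "CONFIG_IA32_EMULATION")` would be expected to return "y", matching what was verified by hand.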

I reused the glibc 32-bit ELF test (gcc -m32 check-ia32-emulation.c -o check-ia32-emulation) and it does fail with "Illegal instruction". I'm a bit puzzled by the situation. I see two possibilities:
- something in the build chain is using an illegal instruction for reasons I can't imagine,
- something in the kernel configuration prevents the test from succeeding.
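
The manual test above can be scripted. The sketch below is an illustration only: it assumes check-ia32-emulation has already been built with the gcc command quoted above, and `describe_exit` and `run_and_describe` are hypothetical helper names. It relies on the documented subprocess behaviour that, on POSIX, a child killed by signal N reports return code -N.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, death by signal N is reported as return code -N.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_and_describe(argv):
    """Run a command and describe how it terminated."""
    return describe_exit(subprocess.run(argv).returncode)
```

On the affected VM, `run_and_describe(["./check-ia32-emulation"])` would be expected to report "killed by SIGILL", which is what the shell prints as "Illegal instruction".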

The latter would be odd: according to the /boot timestamps the kernel hasn't changed since January, and glibc was updated recently, though at that time the VM was still running on the host it was started on (no migration had happened yet). What might be possible is that the CPU vendor_id triggers the problem. QEMU doesn't seem to emulate the vendor_id when using the -cpu parameter, and the kernel currently reports AuthenticAMD in /proc/cpuinfo. That is the vendor_id of the host that started the VM, not that of the host the VM is currently running on, nor the one consistent with the emulated CPU model.
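
The vendor/model inconsistency can be spotted mechanically. This is a rough Python sketch, with hypothetical helper names, that parses one processor block of /proc/cpuinfo and flags a vendor_id that disagrees with the brand named in "model name". The string matching is a crude heuristic chosen for this illustration, not an official detection method.

```python
def parse_cpuinfo_block(text):
    """Parse "key : value" lines of one /proc/cpuinfo processor block."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def vendor_mismatch(fields):
    """True when vendor_id and the brand in the model name disagree."""
    vendor = fields.get("vendor_id", "")
    model = fields.get("model name", "")
    if vendor == "AuthenticAMD" and "Intel" in model:
        return True
    if vendor == "GenuineIntel" and "AMD" in model:
        return True
    return False
```

Fed the /proc/cpuinfo extract from this report (vendor_id AuthenticAMD, model name "Intel Xeon E312xx (Sandy Bridge)"), `vendor_mismatch` returns True.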

I fired up gdb on the core file generated by my attempt at reusing the glibc test and found the following (gdb> layout asm) :

0xf77d6c65 <__kernel_vsyscall+5>        syscall

I can make do without the 32-bit ABI on this system (and I'm currently testing a migration to no-multilib on another system), but it seems there's something to be fixed here (in QEMU, my kernel configuration and/or the toolchain). Running and migrating VMs in a mixed Intel/AMD cluster shouldn't prevent the 32-bit ABI from working, and this may be a sign of other problems to come.

Reproducible: Always

Steps to Reproduce:
1. Install a Gentoo VM on QEMU with -cpu Broadwell on Intel hardware
2. Stop the VM
3. Restart it with -cpu SandyBridge on AMD hardware
4. Live migrate it to Intel hardware
5. emerge -1 glibc
Actual Results:  
Failure with "Illegal instruction"

Expected Results:  
glibc emerged

== emerge --info ==

Portage 2.3.49 (python 3.6.5-final-0, default/linux/amd64/17.0, gcc-7.3.0, glibc-2.26-r7, 4.9.76-gentoo-r1 x86_64)                                                                                               
=================================================================
System uname: Linux-4.9.76-gentoo-r1-x86_64-Intel_Xeon_E312xx_-Sandy_Bridge-with-gentoo-2.6
KiB Mem:    64431248 total,   4102432 free
KiB Swap:    2097148 total,   2084972 free
Timestamp of repository gentoo: Sun, 25 Nov 2018 07:00:01 +0000
Head commit of repository gentoo: f78b3b7f82c4de98c22ca940891804f4784f5cd4
sh bash 4.4_p12   
ld GNU ld (Gentoo 2.30 p5) 2.30.0
app-shells/bash:          4.4_p12::gentoo
dev-lang/perl:            5.24.3-r1::gentoo
dev-lang/python:          2.7.15::gentoo, 3.6.5::gentoo
dev-util/cmake:           3.9.6::gentoo                                                                 
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.6-r1::gentoo
sys-apps/openrc:          0.38.2::gentoo
sys-apps/sandbox:         2.13::gentoo
sys-devel/autoconf:       2.69-r4::gentoo
sys-devel/automake:       1.15.1-r2::gentoo
sys-devel/binutils:       2.30-r4::gentoo
sys-devel/gcc:            7.3.0-r3::gentoo
sys-devel/gcc-config:     1.8-r1::gentoo
sys-devel/libtool:        2.4.6-r3::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 4.13::gentoo (virtual/os-headers)
sys-libs/glibc:           2.26-r7::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-max-age: 24
    sync-rsync-extra-opts:
    sync-rsync-verify-jobs: 1

x-portage                               
    location: /usr/local/portage            
    masters: gentoo                                 
    priority: 0                                            
                                      
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"                                 
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"                                                 
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"        
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"                         
CXXFLAGS="-O2 -pipe"                                               
DISTDIR="/usr/portage/distfiles"                          
EMERGE_DEFAULT_OPTS="-j --load-average=8"                            
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"                        
FCFLAGS="-O2 -pipe"                                              
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync multilib-strict news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"                     
FFLAGS="-O2 -pipe"                                             
GENTOO_MIRRORS="http://distfiles.gentoo.org"                              
LANG="fr_FR.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j4 --load-average=6"          
PKGDIR="/usr/portage/packages"             
PORTAGE_CONFIGROOT="/"                                 
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"               
USE="acl amd64 bash-completion berkdb bzip2 cli crypt cxx dri gdbm iconv ipv6 libtirpc mmx multilib ncurses nls nptl openmp pam pcre readline rtc seccomp sse ssl threads unicode xattr zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias proxy proxy_http" CALLIGRA_FEATURES="karbon plan sheets stage words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6 php7-1" POSTGRES_TARGETS="postgres9_5 postgres10" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" RUBY_TARGETS="ruby23 ruby24" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, LINGUAS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS                                  

== Processor related kernel configuration extracted from /proc/config.gz ==

# Processor type and features
#
CONFIG_ZONE_DMA=y
CONFIG_SMP=y
CONFIG_X86_FEATURE_NAMES=y
CONFIG_X86_FAST_FEATURE_TESTS=y
CONFIG_X86_X2APIC=y
# CONFIG_X86_MPPARSE is not set
# CONFIG_GOLDFISH is not set
# CONFIG_X86_EXTENDED_PLATFORM is not set
# CONFIG_X86_INTEL_LPSS is not set
# CONFIG_X86_AMD_PLATFORM_DEVICE is not set
# CONFIG_IOSF_MBI is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
# CONFIG_PARAVIRT_SPINLOCKS is not set
# CONFIG_XEN is not set
CONFIG_KVM_GUEST=y
# CONFIG_KVM_DEBUG_FS is not set
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_PARAVIRT_CLOCK=y
CONFIG_NO_BOOTMEM=y
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
CONFIG_MCORE2=y
# CONFIG_MATOM is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_P6_NOP=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_GART_IOMMU is not set
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=64
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
# CONFIG_X86_MCE is not set

== /proc/cpuinfo extract (last processor) ==

processor       : 15
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 42
model name      : Intel Xeon E312xx (Sandy Bridge)
stepping        : 1
microcode       : 0x1000065
cpu MHz         : 2199.876
cache size      : 512 KB
physical id     : 1
siblings        : 8
core id         : 7
cpu cores       : 8
apicid          : 15
initial apicid  : 15
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm nopl eagerfpu pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm cmp_legacy 3dnowprefetch vmmcall xsaveopt arat
bugs            : fxsave_leak sysret_ss_attrs null_seg
bogomips        : 4399.75
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Comment 1 Sergei Trofimovich (RETIRED) gentoo-dev 2018-11-25 20:09:28 UTC

by default glibc attempts to infer best instruction set from runtime CPU at application startup. Does dmesg show you which instruction is attempted to be executed?

gdb should show you what causes SIGILL.

Comment 2 Lionel Bouton 2018-11-25 20:31:28 UTC

(In reply to Sergei Trofimovich from comment #1)
> by default glibc attempts to infer best instruction set from runtime CPU at
> application startup. Does dmesg show you which instruction is attempted to
> be executed?
> 
> gdb should show you what causes SIGILL.

As I wrote, it reports this (gdb> layout asm) :

0xf77d6c65 <__kernel_vsyscall+5>        syscall

Nothing in dmesg. I thought about restarting qemu without detaching from the terminal in case it reports attempts to use unsupported opcodes but this is a production system which doesn't make it easy.

Comment 3 Sergei Trofimovich (RETIRED) gentoo-dev 2018-11-25 22:45:15 UTC

Then my guess would be that one of your host CPUs does not support SYSCALL emulation at least in 32-bit mode.
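
To illustrate this guess, here is a deliberately simplified Python model. It is not kernel or KVM code. It assumes that in 32-bit compatibility mode AMD CPUs execute SYSCALL but not SYSENTER, that Intel CPUs execute SYSENTER but not SYSCALL, and that the guest kernel picks the 32-bit vDSO entry instruction from the vendor it saw at boot. All names are hypothetical.

```python
# Assumption: instruction the 32-bit vDSO uses, keyed by the vendor the
# guest kernel detected at boot. Unknown vendors fall back to int 0x80.
FAST_PATH_BY_BOOT_VENDOR = {
    "AuthenticAMD": "syscall",
    "GenuineIntel": "sysenter",
}

# Assumption: entry instructions a physical CPU of each vendor executes
# natively in 32-bit compatibility mode.
NATIVE_COMPAT_SUPPORT = {
    "AuthenticAMD": {"syscall", "int80"},
    "GenuineIntel": {"sysenter", "int80"},
}


def vdso32_entry(guest_vendor):
    """Instruction the model expects __kernel_vsyscall to use."""
    return FAST_PATH_BY_BOOT_VENDOR.get(guest_vendor, "int80")


def needs_emulation(guest_vendor, host_vendor):
    """True when the chosen instruction is not native on the host CPU,
    so it only works if the hypervisor traps and emulates it."""
    return vdso32_entry(guest_vendor) not in NATIVE_COMPAT_SUPPORT[host_vendor]
```

With the values from this report (guest booted seeing AuthenticAMD, now running on an Intel host), `needs_emulation("AuthenticAMD", "GenuineIntel")` is True: under these assumptions the `syscall` at __kernel_vsyscall+5 is not native on the current host and can only work if the hypervisor emulates it.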

Comment 4 Lionel Bouton 2018-11-25 23:22:15 UTC

(In reply to Sergei Trofimovich from comment #3)
> Then my guess would be that one of your host CPUs does not support SYSCALL
> emulation at least in 32-bit mode.

I started reading about SYSCALL emulation and this seems like a mess with several implementations according to which cpu vendor and model you are using. I've found one KVM patch addressing this : https://lists.ubuntu.com/archives/kernel-team/2012-March/018646.html.

Extract :

Depending on the architecture (AMD or Intel) pretended by
guests, various checks according to vendor's documentation
are implemented to overcome the current issue and behave
like the CPUs physical counterparts.

I'm not familiar with this subject at all but in my case the CPU identifies as AuthenticAMD although QEMU was asked to emulate SandyBridge (probably copying the vendor_id from the initial host before live migration). So this could be triggered because of a mismatch between the vendor_id QEMU reports and how it handles the syscall emulation. This would make it a QEMU bug (running qemu-2.10.0 on hosts, couldn't find any recent bugfix for syscall emulation).

Should I report this to QEMU devs directly (I didn't find a relevant bug report on https://launchpad.net/qemu at first glance) or do you think there's another possibility ?

Comment 5 Sergei Trofimovich (RETIRED) gentoo-dev 2018-11-26 08:11:02 UTC

(In reply to Lionel Bouton from comment #4)
> Should I report this to QEMU devs directly (I didn't find a relevant bug
> report on https://launchpad.net/qemu at first glance) or do you think
> there's another possibility ?

Sounds like a good next step.

I also suggest asking qemu upstream explicitly if it's a supported path to migrate like that. It might be that you need to pass more options to qemu (like, disable more host leaks) to make it work.

Making the vendor string match the emulated SandyBridge would make some sense on the qemu side. I don't know if it's feasible or whether it would fix your problem. The host kernel, guest kernel and guest userspace can all subtly persist the initial VM CPU.
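
As a sketch of what pinning the vendor string could look like, the Python helper below builds a QEMU -cpu argument. It assumes QEMU's x86 CPU `vendor` property (for example `-cpu SandyBridge,vendor=GenuineIntel`); whether setting it resolves this report is untested, and `cpu_option` is a hypothetical helper name.

```python
def cpu_option(model, vendor=None, extra_flags=()):
    """Build the "-cpu" argument pair for a QEMU command line."""
    parts = [model]
    if vendor is not None:
        # Assumption: "vendor=" overrides the CPUID vendor string that
        # would otherwise follow the host the VM was started on.
        parts.append("vendor=" + vendor)
    parts.extend(extra_flags)
    return ["-cpu", ",".join(parts)]
```

For instance, `cpu_option("SandyBridge", vendor="GenuineIntel")` yields `["-cpu", "SandyBridge,vendor=GenuineIntel"]`, which can be appended to the rest of the qemu argument list.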

CCing our qemu maintainers in case they might already know.

Comment 6 Sergei Trofimovich (RETIRED) gentoo-dev 2019-07-21 09:42:04 UTC

Closing as INVALID assuming qemu is able to change CPU features underneath running glibc/kernel.