Created attachment 450752 [details] kern.log

I just upgraded hardened-sources (stable) to the latest 4.7.6 release. On my HP ProLiant server, the machine boots, but the moment I try to start a KVM guest, the kernel panics (see the attached kern.log). I am falling back to 4.4.8-hardened-r1.
Created attachment 450754 [details] lspci -vvxxx
Created attachment 450756 [details] dmidecode
Created attachment 450758 [details] Kernel configuration
The following info is for the "running" kernel! emerge --info Portage 2.3.0 (python 2.7.10-final-0, hardened/linux/amd64/no-multilib, gcc-4.9.3, glibc-2.22-r4, 4.4.8-hardened-r1 x86_64) ================================================================= System uname: Linux-4.4.8-hardened-r1-x86_64-Intel-R-_Xeon-R-_CPU_L5640_@_2.27GHz-with-gentoo-2.2 KiB Mem: 49452540 total, 22810492 free KiB Swap: 16777212 total, 16777212 free Timestamp of repository gentoo: Tue, 18 Oct 2016 21:15:01 +0000 sh bash 4.3_p48 ld GNU ld (Gentoo 2.25.1 p1.1) 2.25.1 ccache version 3.2.4 [enabled] app-shells/bash: 4.3_p48::gentoo dev-lang/perl: 5.22.2::gentoo dev-lang/python: 2.7.10-r1::gentoo, 3.4.3-r1::gentoo dev-util/ccache: 3.2.4::gentoo dev-util/cmake: 3.5.2-r1::gentoo dev-util/pkgconfig: 0.28-r2::gentoo sys-apps/baselayout: 2.2::gentoo sys-apps/openrc: 0.21.7::gentoo sys-apps/sandbox: 2.10-r1::gentoo sys-devel/autoconf: 2.69::gentoo sys-devel/automake: 1.14.1::gentoo, 1.15::gentoo sys-devel/binutils: 2.25.1-r1::gentoo sys-devel/gcc: 4.9.3::gentoo sys-devel/gcc-config: 1.7.3::gentoo sys-devel/libtool: 2.4.6::gentoo sys-devel/make: 4.1-r1::gentoo sys-kernel/linux-headers: 4.3::gentoo (virtual/os-headers) sys-libs/glibc: 2.22-r4::gentoo Repositories: gentoo location: /usr/portage sync-type: rsync sync-uri: rsync://rsync.europe.gentoo.org/gentoo-portage priority: -1000 croessner location: /usr/local/portage masters: gentoo priority: 0 ACCEPT_KEYWORDS="amd64" ACCEPT_LICENSE="* -@EULA" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-O2 -pipe" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/share/easy-rsa /usr/share/gnupg/qualified.txt" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo" CXXFLAGS="-O2 -pipe" DISTDIR="/usr/portage/distfiles" EMERGE_DEFAULT_OPTS="--keep-going --with-bdeps=y --binpkg-respect-use=y --binpkg-changed-deps=y --usepkg=y --rebuilt-binaries=y --rebuilt-binaries-timestamp=20140405050000" 
FCFLAGS="-O2 -pipe" FEATURES="assume-digests binpkg-logs ccache compressdebug config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr" FFLAGS="-O2 -pipe" GENTOO_MIRRORS="http://de-mirror.org/gentoo/ rsync://de-mirror.org/gentoo/" LANG="en_US.UTF-8" LC_ALL="en_US.UTF-8" LDFLAGS="-Wl,-O1 -Wl,--as-needed" MAKEOPTS="-j25" PKGDIR="/export/packages" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git" PORTAGE_TMPDIR="/var/tmp" USE="acl adns aio amd64 bacula-clientonly bacula-console bash-completion berkdb bindist btrfs bzip2 caps cli cracklib crypt curl cxx device-mapper dri gdbm hardened iconv ipv6 justify logrotate loop-aes lzo mmap mmx mmxext modules ncurses nls nptl nscd ntp openmp openssl pam pax_kernel pcre pie readline seccomp session sse sse2 ssl ssp tcpd threads unicode urandom vim-syntax xattr xtpax zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" 
CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog aggregation cgroups contextswitch cpu cpufreq curl curl_json curl_xml disk email entropy ethstat exec filecount fscache hddtemp ipmi iptables logfile log_logstash multimeter netlink network nfs nginx ntpd numa openvpn ping processes protocols python sensors snmp uptime users uuid virt" CPU_FLAGS_X86="mmx sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" L10N="de en" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LINGUAS="de en" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_4" QEMU_SOFTMMU_TARGETS="x86_64 i386" QEMU_USER_TARGETS="x86_64 i386" RUBY_TARGETS="ruby20 ruby21" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" Unset: CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
Thanks for the report. I try my best to check many situations but I can't check everything. I didn't test for this scenario before stabilizing. I'll let the grsec/pax people know.
1. can you try a vanilla kernel?
2. what's the qemu command line you use to start the vm?
(In reply to PaX Team from comment #6) > 1. can you try a vanilla kernel? I will send feedback after this comment. > 2. what's the qemu command line you use to start the vm? /usr/bin/qemu-system-x86_64 -name guest=db.roessner-net.de,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-db.roessner-net.de/master-key.aes -machine pc-i440fx-2.5,accel=kvm,usb=off,vmport=off -cpu Westmere,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+pclmuldq,+dtes64,+monitor,+ds_cpl,+vmx,+smx,+est,+tm2,+xtpr,+pdcm,+pcid,+dca,+arat,+pdpe1gb,+rdtscp -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 3e7575b2-266d-497f-b933-9c1ddaa24cf1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-db.roessner-net.de/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot menu=on,strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive if=none,id=drive-ide0-0-1,readonly=on -device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -drive file=/var/lib/libvirt/images/db.roessner-net.de.img,format=raw,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:17:02:30,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/domain-1-db.roessner-net.de/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device i6300esb,id=watchdog0,bus=pci.0,addr=0x9 -watchdog-action reset -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on It is started with libvirt virsh start db.roessner-net.de The XML looks like this: <domain type='kvm'> [42/82] <name>db.roessner-net.de</name> <uuid>3e7575b2-266d-497f-b933-9c1ddaa24cf1</uuid> <memory unit='KiB'>1048576</memory> <currentMemory unit='KiB'>1048576</currentMemory> <vcpu placement='static'>2</vcpu> <os> <type arch='x86_64' machine='pc-i440fx-2.5'>hvm</type> <bootmenu enable='yes'/> </os> <features> <acpi/> <apic/> <vmport state='off'/> </features> <cpu mode='host-model'> <model fallback='allow'/> </cpu> <clock offset='utc'> <timer name='rtc' tickpolicy='catchup'/> <timer name='pit' tickpolicy='delay'/> <timer name='hpet' present='no'/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <pm> <suspend-to-mem enabled='no'/> <suspend-to-disk enabled='no'/> </pm> <devices> <emulator>/usr/bin/qemu-system-x86_64</emulator> <disk type='block' device='cdrom'> <driver name='qemu' type='raw'/> <target 
dev='hdb' bus='ide'/> <readonly/> <address type='drive' controller='0' bus='0' target='0' unit='1'/> </disk> <disk type='file' device='disk'> <driver name='qemu' type='raw'/> <source file='/var/lib/libvirt/images/db.roessner-net.de.img'/> <target dev='vda' bus='virtio'/> <boot order='1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </disk> <controller type='usb' index='0' model='ich9-ehci1'> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/> </controller> <controller type='usb' index='0' model='ich9-uhci1'> <master startport='0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/> </controller> <controller type='usb' index='0' model='ich9-uhci2'> <master startport='2'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/> </controller> <controller type='usb' index='0' model='ich9-uhci3'> <master startport='4'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/> </controller> <controller type='pci' index='0' model='pci-root'/> <controller type='ide' index='0'> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> </controller> <controller type='virtio-serial' index='0'> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </controller> <interface type='bridge'> <mac address='52:54:00:17:02:30'/> <source bridge='br0'/> <model type='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </interface> <serial type='pty'> <target port='0'/> </serial> <console type='pty'> <target type='serial' port='0'/> </console> <channel type='spicevmc'> <target type='virtio' name='com.redhat.spice.0'/> <address type='virtio-serial' controller='0' bus='0' port='1'/> </channel> <channel type='unix'> <source mode='bind'/> <target type='virtio' name='org.qemu.guest_agent.0'/> <address type='virtio-serial' controller='0' bus='0' port='2'/> </channel> <input 
type='mouse' bus='ps2'/> <input type='keyboard' bus='ps2'/> <graphics type='spice' autoport='yes'> <listen type='address'/> </graphics> <video> <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </video> <redirdev bus='usb' type='spicevmc'> </redirdev> <redirdev bus='usb' type='spicevmc'> </redirdev> <watchdog model='i6300esb' action='reset'> <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> </watchdog> <memballoon model='virtio'> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </memballoon> <rng model='virtio'> <backend model='random'>/dev/random</backend> <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> </rng> </devices> </domain>
(In reply to PaX Team from comment #6)
> 1. can you try a vanilla kernel?

Tested right now: =vanilla-sources-4.7.8 is working perfectly.
It works for me with qemu-2.5.1
(In reply to Agostino Sarubbo from comment #9)
> It works for me with qemu-2.5.1

I use the stable version 2.7.0-r4, which does not work. Older versions are no longer available. I also run the latest stable libvirt; in short, my system is up to date with stable packages ;-)
PaX Team suggested to me, in the topic "linux-grsec-4.7.7 locks up within 30 minutes" (https://forums.grsecurity.net/viewtopic.php?f=3&t=4586&start=15#p16689), that the bug I encountered and reported in "banning user... until system restart for ... kernel crash w/ Qemu" (https://forums.grsecurity.net/viewtopic.php?f=3&t=4593) is the one tracked on this page. I posted a few Call Traces there, in case they can be of any help.
Just rsync'd and tried 4.7.8-hardened. Same crash (just no Call Trace, log corrupted instead).
Actually 4.7.5 crashes like the .6 and .7 if compiled with the .config of my 4.7.7 that crashed. Here are my two configs of 4.7.5, with the localversion -160929 and the localversion -161022:

$ ls -lABRgo /mnt/sr0/config-4.7.5-hardened-16*[2-9]
-rw-r--r-- 1 117369 2016-09-29 17:17 /mnt/sr0/config-4.7.5-hardened-160929
-rw-r--r-- 1 120012 2016-10-22 16:34 /mnt/sr0/config-4.7.5-hardened-161022
$

The older one runs kvm guests fine, the newer one crashes as shown. The diff btwn the two:

$ diff /mnt/sr0/config-4.7.5-hardened-16*[2-9]
56c56
< CONFIG_LOCALVERSION="-160929"
---
> CONFIG_LOCALVERSION="-161022"
90a91
> # CONFIG_IRQ_DOMAIN_DEBUG is not set
146c147,161
< # CONFIG_CGROUPS is not set
---
> CONFIG_CGROUPS=y
> CONFIG_PAGE_COUNTER=y
> CONFIG_MEMCG=y
> CONFIG_MEMCG_SWAP=y
> CONFIG_MEMCG_SWAP_ENABLED=y
> # CONFIG_BLK_CGROUP is not set
> # CONFIG_CGROUP_SCHED is not set
> # CONFIG_CGROUP_PIDS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_HUGETLB is not set
> # CONFIG_CPUSETS is not set
> # CONFIG_CGROUP_DEVICE is not set
> # CONFIG_CGROUP_CPUACCT is not set
> # CONFIG_CGROUP_PERF is not set
> # CONFIG_CGROUP_DEBUG is not set
281a297
> # CONFIG_GCOV_KERNEL is not set
367a384
> # CONFIG_IOSF_MBI_DEBUG is not set
570a588
> # CONFIG_ACPI_CUSTOM_METHOD is not set
709a728
> CONFIG_NET_EGRESS=y
844c863,864
< # CONFIG_NETFILTER_NETLINK_GLUE_CT is not set
---
> CONFIG_NF_CT_NETLINK_HELPER=y
> CONFIG_NETFILTER_NETLINK_GLUE_CT=y
884,885c904,905
< # CONFIG_NETFILTER_XT_CONNMARK is not set
< # CONFIG_NETFILTER_XT_SET is not set
---
> CONFIG_NETFILTER_XT_CONNMARK=y
> CONFIG_NETFILTER_XT_SET=y
890c910
< # CONFIG_NETFILTER_XT_TARGET_CHECKSUM is not set
---
> CONFIG_NETFILTER_XT_TARGET_CHECKSUM=y
920a941
> # CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
950c971
< # CONFIG_NETFILTER_XT_MATCH_PHYSDEV is not set
---
> CONFIG_NETFILTER_XT_MATCH_PHYSDEV=y
1083c1104
< # CONFIG_BRIDGE_EBT_MARK_T is not set
---
> CONFIG_BRIDGE_EBT_MARK_T=y
1109c1130,1194
< # CONFIG_NET_SCHED is not set
---
> CONFIG_NET_SCHED=y
>
> #
> # Queueing/Scheduling
> #
> # CONFIG_NET_SCH_CBQ is not set
> CONFIG_NET_SCH_HTB=y
> # CONFIG_NET_SCH_HFSC is not set
> # CONFIG_NET_SCH_PRIO is not set
> # CONFIG_NET_SCH_MULTIQ is not set
> # CONFIG_NET_SCH_RED is not set
> # CONFIG_NET_SCH_SFB is not set
> CONFIG_NET_SCH_SFQ=y
> # CONFIG_NET_SCH_TEQL is not set
> # CONFIG_NET_SCH_TBF is not set
> # CONFIG_NET_SCH_GRED is not set
> # CONFIG_NET_SCH_DSMARK is not set
> # CONFIG_NET_SCH_NETEM is not set
> # CONFIG_NET_SCH_DRR is not set
> # CONFIG_NET_SCH_MQPRIO is not set
> # CONFIG_NET_SCH_CHOKE is not set
> # CONFIG_NET_SCH_QFQ is not set
> # CONFIG_NET_SCH_CODEL is not set
> # CONFIG_NET_SCH_FQ_CODEL is not set
> # CONFIG_NET_SCH_FQ is not set
> # CONFIG_NET_SCH_HHF is not set
> # CONFIG_NET_SCH_PIE is not set
> CONFIG_NET_SCH_INGRESS=y
> # CONFIG_NET_SCH_PLUG is not set
>
> #
> # Classification
> #
> CONFIG_NET_CLS=y
> # CONFIG_NET_CLS_BASIC is not set
> # CONFIG_NET_CLS_TCINDEX is not set
> # CONFIG_NET_CLS_ROUTE4 is not set
> CONFIG_NET_CLS_FW=y
> CONFIG_NET_CLS_U32=y
> # CONFIG_CLS_U32_PERF is not set
> # CONFIG_CLS_U32_MARK is not set
> # CONFIG_NET_CLS_RSVP is not set
> # CONFIG_NET_CLS_RSVP6 is not set
> # CONFIG_NET_CLS_FLOW is not set
> # CONFIG_NET_CLS_CGROUP is not set
> # CONFIG_NET_CLS_BPF is not set
> # CONFIG_NET_CLS_FLOWER is not set
> # CONFIG_NET_EMATCH is not set
> CONFIG_NET_CLS_ACT=y
> CONFIG_NET_ACT_POLICE=y
> CONFIG_NET_ACT_GACT=y
> # CONFIG_GACT_PROB is not set
> # CONFIG_NET_ACT_MIRRED is not set
> # CONFIG_NET_ACT_IPT is not set
> # CONFIG_NET_ACT_NAT is not set
> # CONFIG_NET_ACT_PEDIT is not set
> # CONFIG_NET_ACT_SIMP is not set
> # CONFIG_NET_ACT_SKBEDIT is not set
> # CONFIG_NET_ACT_CSUM is not set
> CONFIG_NET_ACT_VLAN=y
> # CONFIG_NET_ACT_BPF is not set
> # CONFIG_NET_ACT_CONNMARK is not set
> # CONFIG_NET_ACT_IFE is not set
> # CONFIG_NET_CLS_IND is not set
> CONFIG_NET_SCH_FIFO=y
1123a1209,1210
> # CONFIG_CGROUP_NET_PRIO is not set
> # CONFIG_CGROUP_NET_CLASSID is not set
1476a1564
> # CONFIG_IFB is not set
1478c1566,1567
< # CONFIG_MACVLAN is not set
---
> CONFIG_MACVLAN=y
> CONFIG_MACVTAP=y
1539a1629
> # CONFIG_SKY2_DEBUG is not set
3509a3600
> # CONFIG_MCE_AMD_INJ is not set
3699a3791
> # CONFIG_NFSD_FAULT_INJECTION is not set
3786a3879
> # CONFIG_DYNAMIC_DEBUG is not set
3799c3892
< # CONFIG_DEBUG_FS is not set
---
> CONFIG_DEBUG_FS=y
3919a4013
> # CONFIG_LKDTM is not set
3973a4068
> # CONFIG_DEBUG_BOOT_PARAMS is not set
$

Regards!
Miroslav Rovis
https://www.CroatiaFidelis.hr
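A long config diff like the one in this comment is easier to scan when it is reduced to the options that actually changed state. The following is a minimal sketch of that idea, not something used in this report: the function names are invented for illustration, and the two sample texts are cut down from the diff in this comment. It assumes the usual .config line shapes, `CONFIG_X=value` and `# CONFIG_X is not set`.

```python
import re

# The two line shapes found in a kernel .config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")

def parse_config(text):
    """Map option name to value; 'n' stands for '# ... is not set'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options

def changed_options(old_text, new_text):
    """Return {name: (old_value, new_value)} for every option that differs."""
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes

def newly_enabled(old_text, new_text):
    """Options that are y/m in the new config but absent or unset in the old."""
    return [name
            for name, (before, after) in changed_options(old_text, new_text).items()
            if after in ("y", "m") and before in (None, "n")]

# Sample data trimmed from the diff in this comment (hypothetical file contents).
OLD = """CONFIG_LOCALVERSION="-160929"
# CONFIG_CGROUPS is not set
# CONFIG_DEBUG_FS is not set
"""
NEW = """CONFIG_LOCALVERSION="-161022"
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
# CONFIG_IFB is not set
CONFIG_DEBUG_FS=y
"""

enabled = newly_enabled(OLD, NEW)
print(enabled)
```

The list of newly enabled options is a natural starting set when looking for the option that makes the newer config crash; options that only appear as "not set" in the new file are left out on purpose.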
Created attachment 451046 [details, diff] diff btwn old no-crash 4.7.5 and new crashing 4.7.5 I should have attached that diff... Correcting that now.
Created attachment 451082 [details] syslog Call Trace 4.7.9-hardened For clarity I attach the Call Trace of the syslog with 4.7.9-hardened kernel. Fallback to 4.4.8-hardened-r1 works here too, with all the netfilter advanced frills, with libvirt and all.
i tried to reproduce this without success so let me ask you guys for a few more tests. note that we don't touch any related code and the irqfd list handling itself is simple enough that i don't see how it would be wrong so my guess is that there's probably a higher level race and/or use-after-free condition somewhere that clears the irqfds.items list pointers to NULL (which isn't a valid state even for an otherwise empty list, hence the oops/NULL-deref).

so the tests:

1. try to disable SANITIZE
2. try to disable everything in grsec (but still patch it in)
3. try a vanilla kernel (no grsec patched in) but with PAGE_POISONING (new feature, imitates a subset of SANITIZE) with and then without PAGE_POISONING_ZERO as well.

these tests will hopefully narrow the problem down a bit. also if you can come up with a simpler reproducer than what Christian and Miro posted, i'd like to know.
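The remark that NULL pointers are not a valid state even for an empty list can be illustrated with a small model. This is a hedged Python sketch of a circular doubly linked list in the style of the kernel's list heads, not the kernel code itself; `ListHead`, `list_add_tail`, `list_empty` and `walk` are stand-ins written for this example. An empty head points at itself, so a walk over it terminates at once, whereas a head whose pointers were wiped to NULL (None here) makes the walk dereference nothing.

```python
class ListHead:
    """Python stand-in for a kernel-style list head (illustrative only)."""

    def __init__(self):
        # An initialized head points at itself in both directions.
        self.next = self
        self.prev = self

def list_add_tail(new, head):
    """Link `new` in just before `head`, i.e. at the end of the list."""
    new.prev = head.prev
    new.next = head
    head.prev.next = new
    head.prev = new

def list_empty(head):
    """A list is empty when the head still points at itself."""
    return head.next is head

def walk(head):
    """Visit every entry, stopping when the walk arrives back at the head."""
    visited = []
    node = head.next
    while node is not head:
        following = node.next  # dereferences node; fails if node is None
        visited.append(node)
        node = following
    return visited

# A healthy empty list: the walk ends immediately.
healthy = ListHead()
empty_walk = walk(healthy)

# A healthy list with one entry.
entry = ListHead()
list_add_tail(entry, healthy)
one_entry_walk = walk(healthy)

# Simulate memory that was zeroed underneath the list.
wiped = ListHead()
wiped.next = None
wiped.prev = None
try:
    walk(wiped)
    outcome = "no error"
except AttributeError:
    outcome = "NULL dereference"

print(outcome)
```

In the model the failure shows up as an AttributeError on None; in the kernel the equivalent access is a read through a NULL pointer, which is consistent with the CR2: 0000000000000000 seen in the posted traces.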
(In reply to PaX Team from comment #16)
> 3. try a vanilla kernel (no grsec patched in) but with PAGE_POISONING (new
> feature, imitates a subset of SANITIZE) with and then without
> PAGE_POISONING_ZERO as well.

for the above tests you'll also have to pass page_poison=on on the kernel command line to actually activate poisoning.
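Before running a poisoning test it can help to confirm that the parameter really reached the running kernel. The sketch below is an assumption-laden helper, not part of the report: `page_poison_requested` is a made-up name, it only looks for the literal token `page_poison=on` mentioned above, and it treats the last occurrence as the effective one, which is a simplification.

```python
def page_poison_requested(cmdline):
    """Return True if the last page_poison= token on the command line is 'on'."""
    value = None
    for token in cmdline.split():
        if token.startswith("page_poison="):
            value = token.split("=", 1)[1]
    return value == "on"

# Hypothetical command lines, for illustration only.
with_poison = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro page_poison=on"
without_poison = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro"

print(page_poison_requested(with_poison), page_poison_requested(without_poison))
```

On a booted system the string to check can be read from /proc/cmdline, for example with `open("/proc/cmdline").read()`. The check only shows that the parameter was passed; the kernel must also have been built with PAGE_POISONING for it to have any effect.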
(In reply to PaX Team from comment #17)
> (In reply to PaX Team from comment #16)
> > 3. try a vanilla kernel (no grsec patched in) but with PAGE_POISONING (new
> > feature, imitates a subset of SANITIZE) with and then without
> > PAGE_POISONING_ZERO as well.
> for the above tests you'll also have to pass page_poison=on on the kernel
> command line to actually activate poisoning.

I'll do what I can to run the tests you suggest. It will be slow, because I'm not advanced enough and my systems are all slow (and for other reasons), but I'll be working at this. If anyone else can do it, pls. do: for the reasons above I am unlikely to have results from any of the tests soon (a small number of hours with great luck, more likely 10-20 hours, I can't tell).
Created attachment 451296 [details] The emerge--info.txt , complete, as root. Also for later. First test (no SANITIZE) done. Kernel config and Call Trace should follow.
Created attachment 451298 [details] config of 4.7.9-hardened w/o SANITIZE The config of 4.7.9-hardened of 16-10-24 at 9h, complete.
Created attachment 451300 [details] syslog w/ Call Trace

This is messages_161024_1110_g5n, covering the time when the Call Trace happened, at 11:10. I proceeded exactly as in my previous tries, i.e. I followed the https://wiki.gentoo.org/wiki/QEMU/Linux_guest guide, and it happened just like most of the last times.

Next, I'll try test 2) that PaX Team suggested (disable everything in grsec, but still patch it in). However, while I taught beginners to grsec-patch a vanilla kernel in Debian Forums, I have never yet patched a hardened-sources ;-) The big boys always did it for me... If I get in trouble, maybe you could, blueness and swift, tell the Forum people to let me in the forums, so I can ask for help (I'm still banned since a few months ago)? Could you?

Regards!
---
Miroslav Rovis
Zagreb, Croatia
http://www.CroatiaFidelis.hr
Try refute: [url=http://www.crmbuyer.com/story/39565.html]rootkit hooks in kernel[/url], [url=https://forums.grsecurity.net/viewtopic.php?f=7&t=2522]linux capabilities for intrusion[/url]? (Linus?)
I thought about this:

> However, while I taught beginners to grsec-patch vanilla kernel in Debian
> Forums, I have never yet patched a hardened-sources ;-) The big boys always
> did it for me... If I get in trouble, maybe you could, blueness and swift,
> tell the Forum people to let me in the forums, so I can ask for help (I'm
> still banned since a few months ago)? Could you?

and I figured out that I do not need to do any special ebuild rewriting or the like. I can use the same kernel source, in which grsec is already patched in (I just looked it up in the build logs in portage/logs of hardened-sources-4.7.9), and only disable everything in grsec in it. So I'll be doing test 2) now.
Created attachment 451310 [details, diff] config-4.7.9-hardened-161024_12.diff This is the diff btwn the: config-4.7.9-hardened-161024_09 (posted complete, 2-3 or so comments above) and: config-4.7.9-hardened-161024_12 (being compiled as I write) PaX Team, if that is not quite what you meant, pls. do tell! Once it compiles, I'll reboot into it and run same commands as quite a few times previously by now. If anyone can think of a better way to test for this bug, pls. tell!
(In reply to miro.rovis from comment #23) > This is the diff btwn the: > > config-4.7.9-hardened-161024_09 (posted complete, 2-3 or so comments above) > and: > > config-4.7.9-hardened-161024_12 (being compiled as I write) > > PaX Team, if that is not quite what you meant, pls. do tell! this diff is somewhat confusing, can you try 'diff -u old_file new_file' next time please?
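For anyone who wants the unified format without the diff tool at hand, the same kind of output can be produced with Python's standard difflib module. This is a small sketch under stated assumptions: the two sample texts are invented stand-ins for an old and a new config, and `unified_config_diff` is a helper name made up here.

```python
import difflib

def unified_config_diff(old_text, new_text, old_name="old_file", new_name="new_file"):
    """Return the unified diff of two texts as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # input lines carry no newline, so headers should not either
    ))

# Invented sample contents; only the DEBUG_FS change is taken from this thread.
OLD_CONFIG = """CONFIG_NET=y
# CONFIG_DEBUG_FS is not set
CONFIG_KVM=y
"""
NEW_CONFIG = """CONFIG_NET=y
CONFIG_DEBUG_FS=y
CONFIG_KVM=y
"""

diff_lines = unified_config_diff(OLD_CONFIG, NEW_CONFIG)
print("\n".join(diff_lines))
```

The unified format keeps a few unchanged lines around each change and marks removed and added lines with `-` and `+`, which makes it clearer which file each line came from.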
(In reply to PaX Team from comment #24) > this diff is somewhat confusing, can you try 'diff -u old_file new_file' > next time please? I will. I'll attach the results (qemu booted fine). Next.
Created attachment 451314 [details] messages_161024_1325_g5n

These are the messages roughly during the time of the qemu command run. The Linux_guest wiki page test went fine. I booted into the guest Gentoo minimal amd64 install CD, then rebooted and quit, all with the same commands as previously. Here are the logs. With the ' port 0 '. And it's all happening on the Air-Gapped machine that never goes online (I still have to add more to the topics on the grsec Forums about it, but I guess it's unrelated to this bug). Next I'll try test 3) that you asked for.
(In reply to PaX Team from comment #16)
> 3. try a vanilla kernel (no grsec patched in) but with PAGE_POISONING (new
> feature, imitates a subset of SANITIZE) with and then without
> PAGE_POISONING_ZERO as well.

Does the order matter? I've just compiled (but not yet booted into) 4.8.3 without PAGE_POISONING: config-4.8.3-161024_14. I'm now compiling config-4.8.3-161024_1430, with PAGE_POISONING and PAGE_POISONING_ZERO. I'll give the diff next. If the order matters, which one do I run first? Once the compile finishes, in some 15-20 minutes, I'll try the one with PAGE_POISONING and PAGE_POISONING_ZERO first.
Created attachment 451318 [details, diff] diff config-4.8.3-161024_14 config-4.8.3-161024_1430
(In reply to miro.rovis from comment #26)
> These are the messages roughly during the time of the qemu command run.
>
> The Linux_guest wiki page test went fine. I booted into guest gentoo minimal
> amd64 install CD, and rebooted and quit. All with the same commands as
> previously.

so it means that it's one of the grsecurity features that triggers the oops. this is good news because now you can do a binary search on these options to find out which one.
(In reply to miro.rovis from comment #27) > (In reply to PaX Team from comment #16) > > 3. try a vanilla kernel (no grsec patched in) but with PAGE_POISONING (new > > feature, imitates a subset of SANITIZE) with and then without > > PAGE_POISONING_ZERO as well. > Does the order matter? I've just compiled (but not yet booted into) 4.8.3, > but without PAGE_POISONING. no, you can try them in any order, they're independent tests.
I went for the test with PAGE_POISONING (and _ZERO) first. I was just about expecting all to go fine, because the qemu script from the wiki page ran fine: it booted the install-gentoo-minimal-something.iso, rebooted and quit. But then, upon issuing:

# shutdown -r 0

to reboot and try the kernel without PAGE_POISONING (and _ZERO), there was the panic! Since it is not guaranteed that it will be in the system log, I had better copy it manually (my cellphone with the camera is broken), in the next comment or as an attachment. In short, there is no string "kvm_irqfd_release" there, nor "kvm" on its own; it has to do with my old Hauppauge HVR3000 Hybrid TV card, because there are cx8800 strings... Is it useful if I manually type and post an attachment with a likely pretty accurate Call Trace? Or is it not needed?
Created attachment 451322 [details] config-4.8.3-161024_1430_CallTrace.txt

A few lines at the start should have uppercase changed to lowercase and lowercase to uppercase. This is not necessarily the final transcription, if important parts are not clear enough. So, PaX Team, pls. do tell me: do I need to go through the whole screen again (eyes hurting a bit, but I will if it is necessary), and if so, which lines exactly should I check? The screen will be waiting for your reply. I'm not rebooting till then.
(In reply to miro.rovis from comment #32)
> So, PaX Team, pls. do tell me, do I need to go again through all of the
> screen (eyes hurting a bit, but I will if it is necessary) to check on
> exactly which lines?

it's a different problem that you should perhaps report to kernel devs. so for now i'd suggest to bisect grsec options to find which one causes the irqfd oops.
(In reply to PaX Team from comment #33)
> (In reply to miro.rovis from comment #32)
> > So, PaX Team, pls. do tell me, do I need to go again through all of the
> > screen (eyes hurting a bit, but I will if it is necessary) to check on
> > exactly which lines?
> it's a different problem that you should perhaps report to kernel devs.

I see. Hmmmh... Anyway, first I'll post the corrected version of the Call Trace, which I have already prepared (in case it isn't in the logs). Next.

> so
> for now i'd suggest to bisect grsec options to find which one causes the
> irqfd oops.

And then, how do I do that? By enabling something like half the options to see if the problem is in that half, and going on like that (I looked up "binary search" on duckduckgo.com)?
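The halving procedure described in this comment can be written down as a short routine. This is only a sketch under strong assumptions: exactly one option triggers the oops, options can be switched on and off independently (in practice Kconfig dependencies may force some of them together), and `crashes_with` stands for the manual cycle of building a kernel with only the given options enabled, booting it and starting the guest. The option names are placeholders, not real grsecurity options.

```python
def find_culprit(options, crashes_with):
    """Narrow a list of options down to the single one that triggers the crash.

    crashes_with(enabled) must return True when a kernel built with only
    `enabled` switched on (from the candidate set) reproduces the crash.
    """
    candidates = list(options)
    tests_run = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        tests_run += 1
        if crashes_with(half):
            # The culprit is among the options that were enabled.
            candidates = half
        else:
            # The culprit is among the options that were left disabled.
            candidates = candidates[len(half):]
    return candidates[0], tests_run

# Placeholder option names and a simulated test, for illustration only.
OPTIONS = ["OPT_A", "OPT_B", "OPT_C", "OPT_D", "OPT_E", "OPT_F", "OPT_G", "OPT_H"]
HIDDEN_CULPRIT = "OPT_F"

def simulated_test(enabled):
    return HIDDEN_CULPRIT in enabled

culprit, tests_run = find_culprit(OPTIONS, simulated_test)
print(culprit, tests_run)
```

With eight candidates the simulated search needs three build-and-boot cycles instead of up to eight, which is the point of bisecting. If more than one option is involved, or the crash only appears with a combination of options, this simple scheme can point at the wrong option and the results need to be double-checked.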
Created attachment 451328 [details] config-4.8.3-161024_1430_CallTrace_v2.txt This will be the final version in case the Call Trace is not to be found in the syslog.
(In reply to miro.rovis from comment #34) > (In reply to PaX Team from comment #33) > > (In reply to miro.rovis from comment #32) > > so > > for now i'd suggest to bisect grsec options to find which one causes the > > irqfd oops. > And then, how do I do that? Using something like half the options to see if > the problem is in that half, and go on like that (I looked up "binary > search" in duckduckgo.com)? I've slept very little last night. I'll probably be off dozing away soon. Else I'm sick for certain. And also, I need to study Qemu and its options first, to make reasonable command lines out of the few that are there. So I need time now. If anyone else is reading here and wants to step in and try, great! Ah, first I'll go once without PAGE_POISONING, to finish the proposed test, and post it.
I booted into 4.8.3 vanilla compiled without PAGE_POISONING, and I saw the same Call Trace as with 4.8.3 vanilla compiled with PAGE_POISONING (well, with minor differences, but lots of hex numbers were the same). Of course the main difference was that this one had localversion 14, not 1430. I was right to take it down manually: there is nothing in the system log for either the with or the without PAGE_POISONING vanilla kernel test. Tests done. I need time now.
Created attachment 451342 [details] GentooVM

To make (belatedly) clearer which command I used in all the examples on this bug page, I'll first post the GentooVM command. It differs from the current command of that name on the Wiki page because it uses the -netdev syntax, as per the current note on that Wiki page: https://wiki.gentoo.org/wiki/QEMU/Linux_guest
Created attachment 451344 [details] GentooVM_PART1.sh Sorry that the previous GentooVM attachment doesn't show in Bugzilla but has to be downloaded. I'll add '.sh' to the next command to try and see if it then will. And I cut the last part from the command that causes panic. That is the first next test that I will run. I am now back at: 4.7.9-hardened with all the cgroup and advanced netfilter frills that libvirt, if they are not configured when libvirt compiles, says, "is not set when it should be".
The command: GentooVM_PART1.sh -boot d -cdrom install-amd64-minimal-20161020.iso ran fine, except moaning about "no space left" and dropping me to an emergency shell. (necessary to post the system logs to confirm this?) So I now will run the command like the GentooVM_PART1.sh, but with the memory option added, next.
Created attachment 451348 [details] GentooVM_PART1_mem.sh GentooVM_PART1_mem.sh booted without "no space left" complaint.
Created attachment 451350 [details] GentooVM_PART1_monitor.sh The command: GentooVM_PART1_monitor.sh -boot d -cdrom install-amd64-minimal-20161020.iso also ran fine.
I'm sorry I messed up again. Pls forget all these wrong last three tests or so. I ran them with the grsec patched in but all options disabled. Which means I ran them with the kernel of the localversion -161024_12 instead of -161024_09 (see attachment https://597554.bugs.gentoo.org/attachment.cgi?id=451310 with the diff of those two) Rerunning those three last tests next, but with the 4.7.9-hardened, localvesion -161024_09.
Nope, but I was closer. First I just ran yesterday's (European time) kernel, which differs from the 4.7.9-hardened, localversion -161024_09 only in that SANITIZE is enabled in it. I don't want to lose the Call Trace, but I have neither the patience nor the time to type it in manually again (lately, with the grsec-hardened kernel, Call Traces seem to be mostly saved in the logs anyway). I'll try to describe how it differs from the Call Traces already posted.

Compared to this one:
https://597554.bugs.gentoo.org/attachment.cgi?id=451082
it has similar text (not the same!) to:

Oct 23 00:47:39 g5n kernel: [ 170.480825] RAX: 0000000000000000 RBX: ffff8803f88b4000 RCX: 0000000000000001
Oct 23 00:47:39 g5n kernel: [ 170.480974] RDX: 0000000000000001 RSI: ffff880427845000 RDI: ffff8803f88b4a50
Oct 23 00:47:39 g5n kernel: [ 170.481122] RBP: ffffc9000b423d50 R08: 0000000000000000 R09: 0000000000000000
Oct 23 00:47:39 g5n kernel: [ 170.481271] R10: ffff8804278450d0 R11: 0000039286840000 R12: ffff8803f88b4a58
Oct 23 00:47:39 g5n kernel: [ 170.481419] R13: ffff8803f88b4a50 R14: ffff8800b999d9c0 R15: ffff8800b9d441a8
Oct 23 00:47:39 g5n kernel: [ 170.481569] FS: 0000039286640b00(0000) GS:ffff88043fc80000(0000) knlGS:0000000000000000
Oct 23 00:47:39 g5n kernel: [ 170.481738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 23 00:47:39 g5n kernel: [ 170.481882] CR2: 0000000000000000 CR3: 0000000002187000 CR4: 00000000000006f0
Oct 23 00:47:39 g5n kernel: [ 170.482029] Stack:
Oct 23 00:47:39 g5n kernel: [ 170.482075] ffff8803f88b4000 0000000000000008 ffff8800b9808b40 ffffc9000b423d68
Oct 23 00:47:39 g5n kernel: [ 170.482248] ffffffff8101a334 ffff880427845000 ffffc9000b423da8 ffffffff8119ce39
Oct 23 00:47:39 g5n kernel: [ 170.482419] ffff8804278450d0 ffff880427845080 ffff8803f883e400 ffff8803f883e840
Oct 23 00:47:39 g5n kernel: [ 170.482590] Call Trace:
Oct 23 00:47:39 g5n kernel: [ 170.482650] [<ffffffff8101a334>] kvm_vm_release+0x17/0x32
Oct 23 00:47:39 g5n kernel: [ 170.482768] [<ffffffff8119ce39>] __fput+0x121/0x1d5
Oct 23 00:47:39 g5n kernel: [ 170.482874] [<ffffffff8119cf33>] ____fput+0x11/0x22
Oct 23 00:47:39 g5n kernel: [ 170.482981] [<ffffffff810d347e>] task_work_run+0x8c/0xb3
Oct 23 00:47:39 g5n kernel: [ 170.483111] [<ffffffff810bb9c1>] do_exit+0x40f/0x999
Oct 23 00:47:39 g5n kernel: [ 170.483220] [<ffffffff810dc364>] ? wake_up_state+0x1d/0x2d
Oct 23 00:47:39 g5n kernel: [ 170.483341] [<ffffffff810c4c04>] ? signal_wake_up_state+0x2c/0x4b
Oct 23 00:47:39 g5n kernel: [ 170.483471] [<ffffffff810bbfdd>] do_group_exit+0x48/0xb8
Oct 23 00:47:39 g5n kernel: [ 170.483586] [<ffffffff810bc05f>] sys_exit_group+0x12/0x1a
Oct 23 00:47:39 g5n kernel: [ 170.483705] [<ffffffff81b63924>] entry_SYSCALL_64_fastpath+0x13/0xa3
Oct 23 00:47:39 g5n kernel: [ 170.483841] Code: 00 00 00 00 55 48 89 e5 41 55 41 54 49 89 fc 4d 8d ac 24 50 0a 00 00 49 81 c4 58 0a 00 00 53 4c 89 ef e8 b7 44 b4 00 49 8b 04 24 <48> 8b 18 48 8d b8 40 ff ff ff 48 81 eb c0 00 00 00 48 8d 87 c0
Oct 23 00:47:39 g5n kernel: [ 170.484609] RIP [<ffffffff8101ec78>] kvm_irqfd_release+0x27/0x85
Oct 23 00:47:39 g5n kernel: [ 170.484744] RSP <ffffc9000b423d38>
Oct 23 00:47:39 g5n kernel: [ 170.484818] CR2: 0000000000000000

and does not have the rest, including the ... kvm_irqfd_release ...

This is just so that, if I lose it, it is approximately clear what it looked like.
Created attachment 451352 [details] config-4.7.9-161023_CallTrace.txt
Of course I was wrong. It was there; it just didn't show on the screen (it wouldn't fit at 1024x768?). Should I now try the same with 4.7.9-hardened, localversion -161024_09 (SANITIZE disabled)? I think I should.
Created attachment 451354 [details] kernel-4.7.9-hardened-161024_09_at_161024_2353.syslog
This is all that happened:

$ GentooVM_PART1.sh -boot d -cdrom install-amd64-minimal-20161020.iso
ioctl(KVM_CREATE_VM) failed: 12 Cannot allocate memory
failed to initialize KVM: Cannot allocate memory
$

So I guess I should try the GentooVM_PART1_mem.sh command. PaX Team, please do tell me if I'm still making wrong choices.
And I now get that same "failed to initialize KVM: Cannot allocate memory" error with any of GentooVM_PART1.sh, GentooVM_PART1_mem.sh, and even GentooVM (see previous attachments). What's happening? I'm lost...
The same kernel as in Comment 21, and the same command:
GentooVM -boot d -cdrom install-amd64-minimal-20161020.iso
as in https://597554.bugs.gentoo.org/attachment.cgi?id=451300 (the attachment of that comment), and also the later GentooVM_<something> variants. But now it all flops: nothing happens, there are no errors in the logs, and qemu doesn't start. And nothing from outside can have influenced it, since this is an Air-Gapped machine...
I'll rerun the tests with, first, my regular kernel from before PaX Team's call for tests, which has a more or less complete set of grsec features, and second, the no-SANITIZE one. I'll include a kernel config comparison with the one posted in https://bugs.gentoo.org/show_bug.cgi?id=597554#c20

I can confirm that the one on my machine that I posted in comment 20 is:
bc5961bb27839b11202b95e6ddd189d7fab3444970e259fa2a9725097e3e852a config-4.7.9-hardened-161024_09
just as it should be when you download it. Likewise, the first of the attachments that follow should be:
11e03066799548c1680a288a935579d64e76bc5acd837b71e17e3661d5552397 config-4.7.9-hardened-161023

I'll post config-4.7.9-hardened-161023 for comparison, then the syslog to confirm it, then the test itself for each kernel, with the Call Trace or, if there is none, with analysis.

I have a few hours at most to work on this. I have important things to do afterwards, for maybe two or three days, so after a few more tests I'll be off.

So there are now two tests, complete. These are the files I'll post:
config-4.7.9-hardened-161023
hardened-4.7.9-161023-161025_0332_test_syslog_kernel_1_boot.log
hardened-4.7.9-161023-161025_0332_test_syslog_kernel_2_CallTrace.log
hardened-4.7.9-161024_09_161025_0347_test_syslog_kernel_1_boot.log
hardened-4.7.9-161024_09_161025_0347_test_syslog_kernel_2_CallTrace_NONE.log
hardened-4.7.9-161024_09_161025_0347_test_syslog_kernel_3.stout

I'll attach these files one by one, in the above order.
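For anyone downloading the attachments, the checksum comparison described above could be scripted roughly as follows. This is only a sketch: it assumes the 64-character hex strings are SHA-256 digests (the comment does not say which tool produced them), and verify_sha256 is a hypothetical helper name.

```sh
#!/bin/sh
# Compare a file's SHA-256 digest against an expected value.
# verify_sha256 is a hypothetical helper, not part of any existing tool.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)"
        return 1
    fi
}

# Example, using the digest quoted in the comment above:
# verify_sha256 config-4.7.9-hardened-161024_09 \
#     bc5961bb27839b11202b95e6ddd189d7fab3444970e259fa2a9725097e3e852a
```

The function returns non-zero on a mismatch, so it can be chained with && before running a test against that config.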
Created attachment 451360 [details] config-4.7.9-hardened-161023
Created attachment 451362 [details] hardened-4.7.9-161023-161025_0332_test_syslog_kernel_1_boot.log
Created attachment 451364 [details] hardened-4.7.9-161023-161025_0332_test_syslog_kernel_2_CallTrace.log
Created attachment 451366 [details] hardened-4.7.9-161024_09_161025_0347_test_syslog_kernel_1_boot.log
Created attachment 451368 [details] hardened-4.7.9-161024_09_161025_0347_test_syslog_kernel_2_CallTrace_NONE.log
Created attachment 451370 [details] hardened-4.7.9-161024_09_161025_0347_test_syslog_kernel_3.stout
Now some kind advice would be great to hear, so that I can (with the little time I have left for now) try to understand this and, if advised soon, run more tests. What does this mean? The only difference between the two kernels, one panicking, the other not doing the job, is:

# diff -u config-4.7.9-hardened-161023 config-4.7.9-hardened-161024_09
--- config-4.7.9-hardened-161023 2016-10-23 00:40:35.000000000 +0200
+++ config-4.7.9-hardened-161024_09 2016-10-25 04:10:47.379531793 +0200
@@ -53,7 +53,7 @@
 CONFIG_INIT_ENV_ARG_LIMIT=32
 CONFIG_CROSS_COMPILE=""
 # CONFIG_COMPILE_TEST is not set
-CONFIG_LOCALVERSION="-161023"
+CONFIG_LOCALVERSION="-161024_09"
 # CONFIG_LOCALVERSION_AUTO is not set
 CONFIG_HAVE_KERNEL_GZIP=y
 CONFIG_HAVE_KERNEL_BZIP2=y
@@ -3905,6 +3905,7 @@
 # Memory Debugging
 #
 # CONFIG_PAGE_EXTENSION is not set
+# CONFIG_DEBUG_PAGEALLOC is not set
 # CONFIG_PAGE_POISONING is not set
 # CONFIG_DEBUG_OBJECTS is not set
 # CONFIG_SLUB_DEBUG_ON is not set
@@ -4153,7 +4154,7 @@
 #
 # Miscellaneous hardening features
 #
-CONFIG_PAX_MEMORY_SANITIZE=y
+# CONFIG_PAX_MEMORY_SANITIZE is not set
 CONFIG_PAX_MEMORY_STACKLEAK=y
 CONFIG_PAX_MEMORY_STRUCTLEAK=y
 # CONFIG_PAX_MEMORY_UDEREF is not set
#

which means: with SANITIZE, crash; without SANITIZE, no work... If that's the only difference, what is the next test to do? I'm still at a loss.
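When comparing two kernel configs like this, it can help to strip the context lines and look only at the option lines that changed. A minimal sketch, assuming a diff that supports -U (GNU diffutils does); config_changes is a hypothetical helper name:

```sh
#!/bin/sh
# Print only the CONFIG_ lines that differ between two kernel configs.
# config_changes is a hypothetical helper name.
config_changes() {
    old=$1
    new=$2
    # -U0 drops the context lines; grep keeps the added/removed lines that
    # set an option or mark it "is not set", and skips the ---/+++/@@ headers.
    diff -U0 "$old" "$new" | grep -E '^[+-](# )?CONFIG_'
}

# Example:
# config_changes config-4.7.9-hardened-161023 config-4.7.9-hardened-161024_09
```

A line that is only added as "+# CONFIG_... is not set", with no matching removed line, is an option that was simply absent from the older file and is still off, so it is not a functional change.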
While I wait just a little longer for advice on what to do now, I'll revisit from comment: https://bugs.gentoo.org/show_bug.cgi?id=597554#c19 and also comment 20 and comment 21. I will check out again my logs, and look up more closely because it is unbelievable that back then there was, with: hardened-161024_09 a panic and a Call Trace, and after more tests, and after the install of vanilla kernel, there was only, with the same: hardened-161024_09 [there was only] qemu no-work. That discrepancy is striking to me. But I won't post about it if there is no misposting of my attachments and some error of some kind in my presenting of them, and if there are no more replies to read here... Would shutting down the machine, and disconnecting it from the mains, and pressing the on-switch to discharge, and restarting after a few more minutes and rerunning the test with the no-SANITIZE hardened-161024_09 kernel make any difference? I'll try that too...
(In reply to miro.rovis from comment #57)
> While I wait just a little longer for advice on what to do now, I'll revisit
> from comment:
> https://bugs.gentoo.org/show_bug.cgi?id=597554#c19
> and also comment 20 and comment 21.
> I will check out again my logs, and look up more closely because it is
> unbelievable that back then there was, with:
> hardened-161024_09
> a panic and a Call Trace, and after more tests, and after the install of
> vanilla kernel, there was only, with the same:
> hardened-161024_09
> [there was only] qemu no-work.
>
> That discrepancy is striking to me. But I won't post about it if there is no
> misposting of my attachments and some error of some kind in my presenting of
> them, and if there are no more replies to read here...

OK, I'll still confirm that all those logs are correct and correspond to what I have archived. I'll keep the archive a little longer, in case someone else needs to verify.

> Would shutting down the machine, and disconnecting it from the mains, and
> pressing the on-switch to discharge, and restarting after a few more minutes
> and rerunning the test with the no-SANITIZE
> hardened-161024_09
> kernel make any difference? I'll try that too...

No, it didn't make any difference.

We are at a consistent crash for hardened-161023 and a now newly consistent "ioctl(KVM_CREATE_VM) failed: 12 Cannot allocate memory", as in https://bugs.gentoo.org/show_bug.cgi?id=597554#c46 , for hardened-161024_09.

And we are still at (only) 4.4.8-hardened-r1 doing the job with libvirt and qemu and guests.

I wish everybody all the best!
A little googling around revealed a use-after-free in kvm_irqfd_release, but no patches or comments on it :(
https://lkml.org/lkml/2016/6/21/541

My guess is that this is caused by some issue in the upstream kernel, and that the only reason it is noticed more readily with memory sanitization is that the freed memory is filled with 0s, which triggers a NULL pointer dereference instead of something else.

Just throwing around some pointers (pun intended) in case they are of help.
(In reply to Francisco Blas Izquierdo Riera from comment #59)
> Little googling around revealed a use after free on kvm_irqfd_release but no
> patches nor comments on it :(
> https://lkml.org/lkml/2016/6/21/541
>
> My guess is that this is caused by some issue on the upstream kernel and the
> only reason why this is noticed more actively with memory sanitization is
> because the memory is filled with 0s triggering a null pointer dereference
> instead of something else.
>
> Just throwing around some pointers (pun intended) in case they are of help.

kernel with kasan compiled in fails to boot #41
https://github.com/google/kasan/issues/41

I like helping to get whitehats onto use-after-free bugs... If only we get the perpetrators, willing or unwilling perpetrators...
Created attachment 452712 [details] messages_161108_125631_g5n with Call Traces
This bug appears to carry on to 4.7.10-hardened-r2. What happened is recorded in the syslog that I attach.

Nov 8 12:52:19
$ qemu-img create -f qcow2 GentooVM.img 15G

Nov 8 12:56:01 (the command was entered now, but the cat itself ran 30 seconds later; its exec is recorded at the bottom of messages_161108_125631_g5n)
# sleep 30 && cat /var/log/messages | grep -aE -A30000 68798.280977 \
    > messages_$(date +%y%m%d_%H%M%S)_g5n
( NOTE: I broke the line for readability in email )

Nov 8 12:56:09
$ GentooVM -boot d -cdrom install-amd64-minimal-20161020.iso
(where GentooVM is the attachment https://597554.bugs.gentoo.org/attachment.cgi?id=451342 at comment https://bugs.gentoo.org/show_bug.cgi?id=597554#c38)

Nov 8 12:56:09 (the command line of the GentooVM script, as reported by grsec: qemu-system-x86_64 -enable-kvm -cpu host -drive file=GentooVM.img ... )

Nov 8 12:56:11 (reported by the kernel: BUG: unable to handle kernel NULL pointer... and two Call Traces, all in that second). Please see the attachment for that.

Nov 8 12:56:31 (grsec reports the execution of the cat, date and grep commands from close to the top of this post; I only cut the excess trailing lines (-A30000) from the attachment)

I can't let this go on without trying more testing, and maybe getting you seniors to identify the roots of what causes this issue and then fix it.

I always build my system Air-Gapped. While I cannot be sure, if a "higher level race and/or use-after-free condition" (which might be what breaks it here, as PaX Team said in https://bugs.gentoo.org/show_bug.cgi?id=597554#c16 ) is causing this, it is perfectly possible that it was brought into my Air-Gapped system by some package(s) that were signed into one of the portage snapshots, which I use exclusively for my emerge-webrsync Air-Gapped updates of my Gentoo (I keep all the portage snapshots for a long time, just in case, as well as all the packages that I have installed since I went Air-Gapped from scratch, maybe three years ago now).

Should I follow PaX Team's suggestions by first repeating the steps from his advice at https://bugs.gentoo.org/show_bug.cgi?id=597554#c16 , this time with updated kernels, both hardened (the 4.4.x and 4.7.x series, or 4.{7,8}.x if it gets there in the meantime) and vanilla, and then go for his suggestion at https://bugs.gentoo.org/show_bug.cgi?id=597554#c29 and do a binary search on the options given to qemu-system-x86_64, to hopefully narrow down the root causes of this?

Kind regards!
Miroslav Rovis
Zagreb, Croatia
http://www.CroatiaFidelis.hr
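The log capture used in the comment above (everything from a given kernel timestamp onward) can also be done without guessing a large enough -A value. A minimal sketch; log_from is a hypothetical helper name, and the marker is whatever timestamp precedes the event of interest:

```sh
#!/bin/sh
# Print a log file from the first line containing a marker through to the end.
# log_from is a hypothetical helper name.
log_from() {
    marker=$1
    logfile=$2
    # index() is a literal substring match, so the dot in a timestamp such as
    # 68798.280977 matches only a dot (in a grep regex it matches any character).
    awk -v m="$marker" 'found || index($0, m) { found = 1; print }' "$logfile"
}

# Example, mirroring the command in the comment above:
# log_from 68798.280977 /var/log/messages > "messages_$(date +%y%m%d_%H%M%S)_g5n"
```

The original command used grep -a because the log may contain bytes that grep would treat as binary; whether a given awk copes with such bytes depends on the implementation, so that part is untested here.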
Created attachment 452714 [details] config-4.7.10-hardened-r2-161107_06.diff from config-4.7.9-hardened-161024_09
The kernel config diff for today's Call Traces. The reference config is https://bugs.gentoo.org/attachment.cgi?id=451298 at comment https://bugs.gentoo.org/show_bug.cgi?id=597554#c20

I hope the old emerge --info is still fine, as in https://bugs.gentoo.org/attachment.cgi?id=451296 at comment https://bugs.gentoo.org/show_bug.cgi?id=597554#c19 , but if anything more is needed, you seniors please do tell!
After being hit by this bug with hardened-sources-4.7.10 (kvm going bad with a similar stack trace; other parts of the system looked fine) on my workstation, I ran some tests on a spare box. I could not reproduce the issue on the test box, whatever the kernel version (tried vanilla 4.7.6 with page poisoning enabled & 4.4.8, hardened-4.7.10 with and without SANITIZE). Thus, I'm inclined to think the bug happens only on some hardware configurations.

The test box (where I couldn't reproduce the bug) dates from 2008 and has 8GB of RAM. The workstation (where the bug happens) has 32GB of RAM.

To other people who experience the bug, can you report a bit more about your hardware?

FWIW: looking at the kvm_irqfd_release source, the generated code, and the stack trace, it looks like the whole kvm struct is freed/zeroed.
(In reply to Étienne Buira from comment #63)
> To other people who experience the bug, can you report a bit more about your
> hardware?

My hardware is as old as this Gentoo Forum post of mine (posted right when I bought it):
https://forums.gentoo.org/viewtopic-t-940916-postdays-0-postorder-asc-start-0.html#7173430
Only the HDDs and some peripherals may be newer now.
Created attachment 459184 [details] Qemu_GentooVM_170108_emerge--info.txt
There have been some changes with this bug (it is just not solved yet). I've followed (completely correctly this time around) the procedure outlined by PaX Team in Comment 16:
> i tried to reproduce this without success so let me ask you guys for a few
> more tests. note that we don't touch any related code and the irqfd list
> handling itself is simple enough that i don't see how it would be wrong so
> my guess is that there's probably a higher level race and/or use-after-free
> condition somewhere that clears the irqfds.items list pointers to NULL
> (which isn't a valid state even for an otherwise empty list, hence the
> oops/NULL-deref). so the tests:
>
> 1. try to disable SANITIZE
> 2. try to disable everything in grsec (but still patch it in)
> 3. try a vanilla kernel (no grsec patched in) but with PAGE_POISONING (new
> feature, imitates a subset of SANITIZE) with and then without
> PAGE_POISONING_ZERO as well.
>
> these tests will hopefully narrow the problem down a bit. also if you can
> come up with a simpler reproducer than what Christian and Miro posted, i'd
> like to know.
and in Comment 17:
> for the above tests you'll also have to pass page_poison=on on the
> kernel command line to actually activate poisoning.

The attachment to this comment is: Qemu_GentooVM_170108_emerge--info.txt

Here are the kernels that I did the testing with: first the names of the kernels (well, of their configs), then the configs themselves for the two "master" kernels, and the diffs from their respective "master" for the remaining three. Unless the internet misbehaves (my Gentoo works just fine; I only mean that I don't control the future, and least of all my provider), I'll attach the configs and config diffs next, in that order, to this Bugzilla.
But there's one thing that I don't need to attach, because it's already here, and I carefully checked it: I used the same script in all of today's tests. It's the script that I named "GentooVM", just as in https://wiki.gentoo.org/wiki/QEMU/Linux_guest#Configuration (the wiki page which I have never yet been able to complete _with_ the _hardened_), and it is at https://bugs.gentoo.org/attachment.cgi?id=451342 near Comment 38 above.

The command line I used in all five tests was, again (just with the current ISO from https://www.gentoo.org/downloads/):
$ GentooVM -boot d -cdrom install-amd64-minimal-20170105.iso

So these are the kernels that I tested with today:
-rw-r--r-- 1 120981 2017-01-06 05:42 config-4.8.15-hardened-r2-170106_05
-rw-r--r-- 1 118033 2017-01-08 04:28 config-4.9.1-170108_04
-rw-r--r-- 1 118044 2017-01-08 06:23 config-4.9.1-170108_05
-rw-r--r-- 1 121028 2017-01-08 07:12 config-4.8.15-hardened-r2-170108_05
-rw-r--r-- 1 120082 2017-01-08 11:24 config-4.8.15-hardened-r2-170108_10

And accordingly these will be the attachments to follow, in this order:
config-4.8.15-hardened-r2-170106_05
config-4.9.1-170108_04
config-4.9.1-170108_05.diff
config-4.8.15-hardened-r2-170108_05.diff
config-4.8.15-hardened-r2-170108_10.diff

I obtained the diffs this way:
diff -u config-4.9.1-170108_04 config-4.9.1-170108_05 > \
    config-4.9.1-170108_05.diff
diff -u config-4.8.15-hardened-r2-170106_05 \
    config-4.8.15-hardened-r2-170108_05 > \
    config-4.8.15-hardened-r2-170108_05.diff
diff -u config-4.8.15-hardened-r2-170106_05 \
    config-4.8.15-hardened-r2-170108_10 > \
    config-4.8.15-hardened-r2-170108_10.diff

After I have, hopefully, posted them, I will tell how each of the five tests fared.
And I'm bracing (teeth gnashing ;-) ) for the binary search on which of the grsecurity options clashes with the likely race and/or use-after-free condition we (likely) have here. In short, only the kernels corresponding to config-4.8.15-hardened-r2-170106_05 and config-4.8.15-hardened-r2-170108_05 (the ones with and without SANITIZE) didn't start the VM at all, with the error "ioctl(KVM_CREATE_VM) failed: 12 Cannot allocate memory", while config-4.8.15-hardened-r2-170108_10 is grsec compiled in but with no options whatsoever selected.
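Since PaX Team's Comment 17 notes that poisoning is only active when page_poison=on is on the kernel command line, it may be worth checking that before each vanilla test run. A minimal sketch; has_boot_param is a hypothetical helper that takes the command line as a string so it can be tried on any input:

```sh
#!/bin/sh
# Succeed if a kernel command line contains the given parameter as a whole word.
# has_boot_param is a hypothetical helper name.
has_boot_param() {
    param=$1
    cmdline=$2
    # Pad with spaces so that only complete, space-separated parameters match
    # (page_poison=on must not match page_poison=only, for instance).
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, on the running kernel:
# has_boot_param page_poison=on "$(cat /proc/cmdline)" && echo "poisoning requested"
```

This only confirms that the parameter was passed at boot; the kernel must also have been built with PAGE_POISONING for it to have any effect.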
Created attachment 459188 [details] config-4.8.15-hardened-r2-170106_05
Created attachment 459190 [details] config-4.9.1-170108_04
Created attachment 459192 [details] config-4.9.1-170108_05.diff
Created attachment 459194 [details] config-4.8.15-hardened-r2-170108_05.diff
Created attachment 459196 [details, diff] config-4.8.15-hardened-r2-170108_10.diff
More in detail now, how those kernels fared. Where the text "kvm: zapping shadow pages" from /var/log/messages appears, the VM started and booted, all fine. So only config-4.8.15-hardened-r2-170106_05 and config-4.8.15-hardened-r2-170108_05 did not start, let alone boot.

config-4.8.15-hardened-r2-170106_05
$ GentooVM -boot d -cdrom install-amd64-minimal-20170105.iso
ioctl(KVM_CREATE_VM) failed: 12 Cannot allocate memory
failed to initialize KVM: Cannot allocate memory
$
---
config-4.9.1-170108_04 page_poison=on both POISON, and POISON_ZERO
Jan 8 06:27:07 g5n login[3983]: ROOT LOGIN on '/dev/tty5'
Jan 8 06:28:20 g5n kernel: [ 147.964388] kvm [4390]: vcpu0, guest rIP: 0xffffffff8103a831 unhandled rdmsr: 0xc0010048
Jan 8 06:28:20 g5n kernel: [ 148.006688] kvm: zapping shadow pages for mmio generation wraparound
Jan 8 06:28:20 g5n kernel: [ 148.024985] kvm: zapping shadow pages for mmio generation wraparound
---
config-4.9.1-170108_05 page_poison=on both POISON, *no* POISON_ZERO
Jan 8 06:34:56 g5n sudo: miro : TTY=pts/11 ; PWD=/home/miro ; USER=root ; COMMAND=/bin/bash
Jan 8 06:35:39 g5n kernel: [ 211.424682] kvm [4404]: vcpu0, guest rIP: 0xffffffff8103a831 unhandled rdmsr: 0xc0010048
Jan 8 06:35:39 g5n kernel: [ 211.467392] kvm: zapping shadow pages for mmio generation wraparound
Jan 8 06:35:39 g5n kernel: [ 211.487056] kvm: zapping shadow pages for mmio generation wraparound
---
config-4.8.15-hardened-r2-170108_05
$ GentooVM -boot d -cdrom install-amd64-minimal-20170105.iso
ioctl(KVM_CREATE_VM) failed: 12 Cannot allocate memory
failed to initialize KVM: Cannot allocate memory
$
---
config-4.8.15-hardened-r2-170108_10
Jan 8 11:31:10 g5n sudo: miro : TTY=pts/12 ; PWD=/home/miro ; USER=root ; COMMAND=/bin/bash
Jan 8 11:32:50 g5n kernel: [ 329.157388] kvm [4450]: vcpu0, guest rIP: 0xffffffff8103a831 unhandled rdmsr: 0xc0010048
Jan 8 11:32:50 g5n kernel: [ 329.200026] kvm: zapping shadow pages for mmio generation wraparound
Jan 8 11:32:50 g5n kernel: [ 329.218557] kvm: zapping shadow pages for mmio generation wraparound
Jan 8 11:34:49 g5n kernel: [ 447.564079] kvm [4450]: vcpu0, guest rIP: 0xffffffff8103a831 unhandled rdmsr: 0xc0010048

Next comes the binary search for which grsecurity option it is that clashes with virtualization (or reveals things...). This has already been half a day's work, so please bear with me a little longer.
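With several kernels to go through, the three outcomes seen in this report can be told apart mechanically from a saved console transcript or syslog excerpt. A rough sketch; classify_run is a hypothetical helper, and the strings it looks for are simply the ones quoted in this thread, so other kernel versions may word them differently:

```sh
#!/bin/sh
# Classify one test run from a saved log or console transcript.
# classify_run is a hypothetical helper; the patterns are the messages quoted
# in this report.
classify_run() {
    file=$1
    if grep -qF 'unable to handle kernel NULL pointer' "$file"; then
        echo "oops"        # kernel oops, as with the SANITIZE kernels
    elif grep -qF 'ioctl(KVM_CREATE_VM) failed' "$file"; then
        echo "no-start"    # qemu refused to start
    elif grep -qF 'kvm: zapping shadow pages' "$file"; then
        echo "booted"      # the guest got far enough to run
    else
        echo "unknown"
    fi
}

# Example:
# classify_run hardened-4.7.9-161023-161025_0332_test_syslog_kernel_2_CallTrace.log
```

The oops check comes first because a log could contain both a successful start and a later oops, and the oops is the more important result.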
KVM gives me problems when "CONFIG_GRKERNSEC_SYSFS_RESTRICT" is enabled in the kernel; you should try turning that off. It gives me the exact same error as you stated above.
https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Sysfs.2Fdebugfs_restriction
Created attachment 459214 [details, diff] config-4.8.15-hardened-r2-170108_18.diff
(In reply to yandereson from comment #72)
> KVM gives me problems with this enabled in kernel
> "CONFIG_GRKERNSEC_SYSFS_RESTRICT" should try and turn that off.
> It gives me exact same error as you stated above.
>
> https://en.wikibooks.org/wiki/Grsecurity/Appendix/
> Grsecurity_and_PaX_Configuration_Options#Sysfs.2Fdebugfs_restriction

Apparently that is it! Thanks!

While a kernel whose only difference from my full, security-optimized grsecurity-hardened config is that GRKERNSEC_SYSFS option being disabled is compiling, here's the result from my initial binary search.

The attachment config-4.8.15-hardened-r2-170108_18.diff is derived like this:
diff -u config-4.8.15-hardened-r2-170106_05 \
    config-4.8.15-hardened-r2-170108_18 > \
    config-4.8.15-hardened-r2-170108_18.diff

And qemu booted fine:
Jan 8 19:17:51 g5n kernel: [ 264.064129] grsec: (miro:U:/usr/bin/qemu-system-x86_64) exec of /usr/bin/qemu-system-x86_64 (qemu-system-x86_64 -enable-kvm -cpu host -drive file=GentooVM.img,if=virtio -netdev user,id=vmnic,hostname=gentoovm -device virt) by /usr/bin/qemu-system-x86_64[GentooVM:4437] uid/euid:1000/1000 gid/egid:1000/1000, parent /bin/bash[bash:4306] uid/euid:1000/1000 gid/egid:1000/1000
...
Jan 8 19:17:59 g5n kernel: [ 272.513073] kvm [4437]: vcpu0, guest rIP: 0xffffffff8103a831 unhandled rdmsr: 0xc0010048
Jan 8 19:17:59 g5n kernel: [ 272.556214] kvm: zapping shadow pages for mmio generation wraparound
Jan 8 19:17:59 g5n kernel: [ 272.573557] kvm: zapping shadow pages for mmio generation wraparound

Likely because, if you grep the diff for that string, you get:
$ grep GRKERNSEC_SYSFS config-4.8.15-hardened-r2-170108_18.diff
-CONFIG_GRKERNSEC_SYSFS_RESTRICT=y
+# CONFIG_GRKERNSEC_SYSFS_RESTRICT is not set
$

I will post the (likely) confirmation once that (likely) final compilation is done, as the binary search might not be necessary any more.
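To see at a glance how a given option is set in any one config file (rather than in a diff), something like the following could be used. A minimal sketch; config_state is a hypothetical helper name, and the config path in the example is only an assumption about where the file lives:

```sh
#!/bin/sh
# Report how a kernel config file sets one option:
# prints the value (y, m, a string...), "n" for "is not set", or "absent".
# config_state is a hypothetical helper name.
config_state() {
    option=$1
    config=$2
    # Take the text after "OPTION=" from the first matching line, if any.
    value=$(sed -n "s/^${option}=//p" "$config" | head -n 1)
    if [ -n "$value" ]; then
        echo "$value"
    elif grep -q "^# ${option} is not set\$" "$config"; then
        echo "n"
    else
        echo "absent"
    fi
}

# Example (path is an assumption; use whichever config the kernel was built from):
# config_state CONFIG_GRKERNSEC_SYSFS_RESTRICT /usr/src/linux/.config
```

"absent" usually means the option is hidden by an unmet dependency or does not exist in that kernel tree at all, which is different from an explicit "is not set".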
Created attachment 459218 [details, diff] config-4.8.15-hardened-r2-170108_20.diff
Please see config-4.8.15-hardened-r2-170108_20.diff, whose sole difference from my security-optimized grsecurity-hardened config is:
+# CONFIG_GRKERNSEC_SYSFS_RESTRICT is not set

And qemu booted the CD image just fine!
Jan 8 20:47:58 g5n kernel: [ 192.256308] grsec: (miro:U:/usr/bin/qemu-system-x86_64) exec of /usr/bin/qemu-system-x86_64 (qemu-system-x86_64 -enable-kvm -cpu host -drive file=GentooVM.img,if=virtio -netdev user,id=vmnic,hostname=gentoovm -device virt) by /usr/bin/qemu-system-x86_64[GentooVM:4208] uid/euid:1000/1000 gid/egid:1000/1000, parent /bin/bash[bash:4079] uid/euid:1000/1000 gid/egid:1000/1000
...
Jan 8 20:48:03 g5n kernel: [ 197.427606] kvm [4208]: vcpu0, guest rIP: 0xffffffff8103a831 unhandled rdmsr: 0xc0010048
Jan 8 20:48:03 g5n kernel: [ 197.476991] kvm: zapping shadow pages for mmio generation wraparound
Jan 8 20:48:03 g5n kernel: [ 197.509899] kvm: zapping shadow pages for mmio generation wraparound

Phew!

PaX Team, spender, blueness, what are the security considerations here? Is virtualization stripping us of another protective layer? How badly will we miss that protection, if I may ask, and if any of you have time to tell us about it?
I can confirm CONFIG_GRKERNSEC_SYSFS_RESTRICT exposes the bug. From an strace of qemu (compared with and w/o sysfs_restrict), everything looks the same until ioctl VM creation.
Can you paste me from the log what sysfs entries qemu is requiring access to? Thanks, -Brad
hardened-sources-4.8.17-r2 fails in a cleaner way (ENOMEM).

@Brad: no sysfs access is required. I have
open("/dev/kvm", O_RDWR|O_CLOEXEC) = 9,
followed by some ioctls on fd 9, then
ioctl(9, KVM_CREATE_VM, 0) = -1 ENOMEM (Cannot allocate memory)

Reading the sources, the only calls I see that can return ENOMEM are kzalloc(sizeof(struct whatever), GFP_KERNEL), kvm_kvzalloc, alloc_percpu, and kmalloc(sizeof(struct whatever), GFP_KERNEL).

Hard to believe it has something to do with sysfs protection!
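For others who want to check whether they hit the same failure point, the failing calls can be pulled out of a saved strace log. A minimal sketch, assuming the trace was written to a file with strace's -o option (for example strace -f -o qemu.strace followed by the usual qemu command); failed_ioctls and the file name are hypothetical:

```sh
#!/bin/sh
# List the ioctl calls in a saved strace log that returned -1.
# failed_ioctls is a hypothetical helper name.
failed_ioctls() {
    # In a basic regular expression the parentheses are literal characters,
    # so this matches lines such as
    #   ioctl(9, KVM_CREATE_VM, 0) = -1 ENOMEM (Cannot allocate memory)
    grep 'ioctl(.*) = -1 ' "$1"
}

# Example:
# failed_ioctls qemu.strace
```

Comparing this output between a kernel with and without SYSFS_RESTRICT should show KVM_CREATE_VM as the first call that differs, which matches what is described above.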
Created attachment 464872 [details] Strace of failing qemu on hardened-sources-4.8.17-r2 with sysfs protection
Can you apply https://grsecurity.net/~spender/debugfs_debug.diff and give me the kernel logs it produces? I'm not sure how SYSFS_RESTRICT would be causing the failure of any of that code, but we'll get it figured out and fixed. Thanks! -Brad
"failed to create directory", nice catch :)
I tried to grasp a bit more about this issue; here is what I got:

dentry creation failed in fs/debugfs/inode.c:start_creation because of EACCES
/sys/kernel/debug/kvm is rwx------ root:root
I start qemu as a user (member of the kvm group)
Hence, the task had no right to create a dentry there.

After
pth="/sys/kernel/debug/kvm"; chown :kvm $pth; chmod 770 $pth
the directory can be created, but it then fails to create pf_fixed: "failed to create debugfs file for pf_fixed" (i.e. the first entry).
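The ownership and mode quoted above can be read in one line before and after such a change. A minimal sketch, assuming GNU stat (the -c format option is not portable to BSD stat); show_perms is a hypothetical helper name:

```sh
#!/bin/sh
# Print permission string, owner:group and name for a path.
# show_perms is a hypothetical helper; it relies on GNU stat's -c option.
show_perms() {
    # %A = permissions as in ls -l, %U/%G = owner and group names, %n = path
    stat -c '%A %U:%G %n' "$1"
}

# Example (typically needs root, since debugfs itself is usually mode 0700):
# show_perms /sys/kernel/debug/kvm
```

As the comment above notes, loosening the directory permissions was not enough on its own, so this is only useful for inspecting the state, not as a fix.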
Created attachment 464920 [details, diff] Remove grsec protection from debugfs Also, as debugfs mountpoint is hard enough to reach, i considered removing its grsec protection (done with attached patch). Works nice with this patch applied (with still some sysfs protection).
Apply this instead: https://grsecurity.net/~spender/kvm.diff I'll be applying it to the next patches. -Brad
Just to add to this, I believe GRKERNSEC_KMEM being enabled would prevent this problem from happening as well, as it'd force DEBUG_FS off and then make kvm's check for debugfs presence fail, letting the function return 0 and continuing without error. -Brad
Your patch is better (despite generating a warning) and works. Thanks!