Bug 558138 - =sys-kernel/hardened-sources-4.1.4 intel_iommu deadlocks server
Summary: =sys-kernel/hardened-sources-4.1.4 intel_iommu deadlocks server
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system (show other bugs)
Hardware: All Linux
Importance: Normal normal
Assignee: Anthony Basile
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-19 08:27 UTC by Christian Roessner
Modified: 2015-12-23 08:16 UTC (History)
5 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
lspci -vvvxx (lspci.gz,6.29 KB, application/x-gzip)
2015-08-19 08:28 UTC, Christian Roessner
Details
lstopo (lstopo.gz,423 bytes, application/x-gzip)
2015-08-19 08:28 UTC, Christian Roessner
Details
kernel config (kernel-config.gz,23.48 KB, application/x-gzip)
2015-08-19 08:29 UTC, Christian Roessner
Details
dmidecode (dmidecode.gz,4.15 KB, application/x-gzip)
2015-08-19 08:29 UTC, Christian Roessner
Details
Spec file of the server (SE316M1_TechSpecs.pdf,248.40 KB, application/pdf)
2015-08-19 08:30 UTC, Christian Roessner
Details
Screenshot from iLO2 while booting (Boot.png,83.61 KB, image/png)
2015-08-25 08:42 UTC, Christian Roessner
Details

Description Christian Roessner 2015-08-19 08:27:39 UTC
I tried very hard to get VT-d support in my KVM guests. So I built a kernel with IOMMU support. I started the new kernel by giving intel_iommu=on at the grub command line.

The kernel boots and, if I am fast enough, I can log in to the server. If I am fast enough, I may even be able to start htop. What I see is that the server load keeps rising until the server is completely stuck (like a fork bomb).

Unfortunately I can not see any process that could cause this problem. It all looks normal. What I found in the logs is this:

Aug 18 20:10:48 mon kernel: NMI: PCI system error (SERR) for reason a1 on CPU 0.
Aug 18 20:10:48 mon kernel: Dazed and confused, but trying to continue
Aug 18 20:10:48 mon kernel: dmar: DRHD: handling fault status reg 2
Aug 18 20:10:48 mon kernel: dmar: DMAR:[DMA Read] Request device [00:1e.0] fault addr 1000
                            DMAR:[fault reason 06] PTE Read access is not set


Aug 18 20:10:49 mon kernel: dmar: DRHD: handling fault status reg 102
Aug 18 20:10:49 mon kernel: dmar: DMAR:[DMA Read] Request device [00:1e.0] fault addr e763e000
                            DMAR:[fault reason 06] PTE Read access is not set


The device that causes trouble:

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
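
The faulting device can also be pulled out of the log mechanically. The following is a minimal sketch, assuming the two-line DMAR message layout seen in the excerpt above; the regular expression and the function name parse_dmar_faults are illustrative and not part of any kernel tooling.

```python
import re

# Pattern derived only from the sample lines in this report (an assumption):
# the first line names the requesting device and fault address, the
# continuation line carries the fault reason.
FAULT_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+)\s+"
    r"DMAR:\[fault reason (?P<reason>\d+)\] (?P<text>.+)"
)

def parse_dmar_faults(log_text):
    """Return one dict per DMAR fault found in log_text."""
    return [m.groupdict() for m in FAULT_RE.finditer(log_text)]

sample = """\
Aug 18 20:10:48 mon kernel: dmar: DMAR:[DMA Read] Request device [00:1e.0] fault addr 1000
                            DMAR:[fault reason 06] PTE Read access is not set
Aug 18 20:10:49 mon kernel: dmar: DMAR:[DMA Read] Request device [00:1e.0] fault addr e763e000
                            DMAR:[fault reason 06] PTE Read access is not set
"""

faults = parse_dmar_faults(sample)
for f in faults:
    # Print device, operation, fault address, and reason text for each fault.
    print(f["dev"], f["op"], "0x" + f["addr"], f["reason"], f["text"])
```

Grouping the results by the dev field shows whether all faults come from the same bus address, as they do here with 00:1e.0.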

I attached several files describing my server and its configuration. The server is an HP ProLiant SE316M1. See the spec file, if interested.

Reproducible: Always

Steps to Reproduce:
1. Build kernel (hardened-sources) with IOMMU support
2. Add intel_iommu=on on command line
3. Boot the server.
Actual Results:  
Server load rises until it freezes completely.

Expected Results:  
Normal work. A stable running system with VT-d support (for KVM guests)
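
Before reproducing, it may help to confirm that steps 1 and 2 actually took effect. This is a rough sketch, assuming the Intel IOMMU driver is selected by CONFIG_INTEL_IOMMU and that the boot parameter appears verbatim on the kernel command line; the sample config text, the root= value, and the function name iommu_preflight are hypothetical.

```python
def iommu_preflight(config_text, cmdline):
    """Report whether the Intel IOMMU driver is built in and requested at boot."""
    built_in = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and "# CONFIG_FOO is not set" comment lines.
        if line and not line.startswith("#") and "=" in line:
            name, value = line.split("=", 1)
            if value == "y":
                built_in.add(name)
    return {
        "driver_built_in": "CONFIG_INTEL_IOMMU" in built_in,
        "enabled_on_cmdline": "intel_iommu=on" in cmdline.split(),
    }

# Hypothetical inputs for illustration. On a live system the command line
# can be read from /proc/cmdline and the config from the kernel build tree.
example_config = (
    "CONFIG_IOMMU_SUPPORT=y\n"
    "CONFIG_INTEL_IOMMU=y\n"
    "# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set\n"
)
example_cmdline = "root=/dev/sda2 ro intel_iommu=on"

result = iommu_preflight(example_config, example_cmdline)
print(result)
```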

emerge --info hardened-sources libvirt qemu
Portage 2.2.20 (python 2.7.9-final-0, hardened/linux/amd64/no-multilib, gcc-4.8.4, glibc-2.20-r2, 4.1.4-hardened x86_64)
=================================================================
                         System Settings
=================================================================
System uname: Linux-4.1.4-hardened-x86_64-Intel-R-_Xeon-R-_CPU_L5520_@_2.27GHz-with-gentoo-2.2
KiB Mem:    49453524 total,  10976440 free
KiB Swap:    2097148 total,   2097148 free
Timestamp of repository gentoo: Tue, 18 Aug 2015 21:15:01 +0000
sh bash 4.3_p33-r2
ld GNU ld (Gentoo 2.24 p1.4) 2.24
ccache version 3.1.9 [enabled]
app-shells/bash:          4.3_p33-r2::gentoo
dev-lang/perl:            5.20.2::gentoo
dev-lang/python:          2.7.9-r1::gentoo, 3.4.1::gentoo
dev-util/ccache:          3.1.9-r4::gentoo
dev-util/cmake:           3.2.2::gentoo
dev-util/pkgconfig:       0.28-r2::gentoo
sys-apps/baselayout:      2.2::gentoo
sys-apps/openrc:          0.17::gentoo
sys-apps/sandbox:         2.6-r1::gentoo
sys-devel/autoconf:       2.69::gentoo
sys-devel/automake:       1.15::gentoo
sys-devel/binutils:       2.24-r3::gentoo
sys-devel/gcc:            4.8.4::gentoo
sys-devel/gcc-config:     1.7.3::gentoo
sys-devel/libtool:        2.4.6::gentoo
sys-devel/make:           4.1-r1::gentoo
sys-kernel/linux-headers: 3.18::gentoo (virtual/os-headers)
sys-libs/glibc:           2.20-r2::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.europe.gentoo.org/gentoo-portage
    priority: -1000

x-portage
    location: /usr/local/portage
    masters: gentoo
    priority: 0

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/easy-rsa /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.6/ext-active/ /etc/php/cgi-php5.6/ext-active/ /etc/php/cli-php5.6/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--keep-going --with-bdeps=y --binpkg-respect-use=y --binpkg-changed-deps=y --usepkg=y --rebuilt-binaries=y --rebuilt-binaries-timestamp=20140405050000"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs ccache compressdebug config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://de-mirror.org/gentoo/ rsync://de-mirror.org/gentoo/"
LANG="en_US.utf8"
LC_ALL="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j17"
PKGDIR="/export/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
USE="acl adns aio amd64 bacula-clientonly bacula-console bash-completion berkdb bindist btrfs bzip2 caps cli cracklib crypt curl cxx device-mapper dri gdbm hardened iconv ipv6 justify logrotate loop-aes lzo mmap mmx mmxext modules ncurses nls nptl nscd ntp openmp openssl pam pax_kernel pcre pie readline seccomp session sse sse2 ssl ssp systemd tcpd threads unicode urandom vim-syntax xattr xtpax zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LINGUAS="de en" NGINX_MODULES_HTTP="access auth_basic autoindex browser charset dav empty_gif fastcgi geo gzip headers_more limit_conn limit_req map memcached proxy referer rewrite scgi spdy split_clients ssi upstream_ip_hash userid uwsgi" OFFICE_IMPLEMENTATION="libreoffice" 
PHP_TARGETS="php5-6" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_4" QEMU_SOFTMMU_TARGETS="x86_64 i386" QEMU_USER_TARGETS="x86_64 i386" RUBY_TARGETS="ruby19 ruby20" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, INSTALL_MASK, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON

=================================================================
                        Package Settings
=================================================================

sys-kernel/hardened-sources-4.0.8::gentoo was built with the following:
USE="symlink -build -deblob"


sys-kernel/hardened-sources-4.1.4::gentoo was built with the following:
USE="symlink -build -deblob"


app-emulation/libvirt-1.2.15-r1::gentoo was built with the following:
USE="audit caps fuse iscsi libvirtd lvm lxc macvtap nfs nls numa parted pcap qemu sasl systemd udev vepa -avahi -firewalld (-glusterfs) -openvz -phyp -policykit -rbd (-selinux) -uml -virt-network -virtualbox (-wireshark-plugins) -xen"


app-emulation/qemu-2.3.0-r5::gentoo was built with the following:
USE="aio caps curl fdt filecaps jpeg lzo ncurses nls numa pin-upstream-blobs png python sasl seccomp spice threads tls uuid vhost-net vnc xattr -accessibility -alsa -bluetooth -debug (-glusterfs) -gtk -gtk2 -infiniband -iscsi -nfs -opengl -pulseaudio -rbd -sdl (-selinux) -smartcard -snappy -ssh -static -static-softmmu -static-user -systemtap -tci -test -usb -usbredir -vde -virtfs -xen -xfs" PYTHON_TARGETS="python2_7" QEMU_SOFTMMU_TARGETS="i386 x86_64 -aarch64 (-alpha) (-arm) -cris -lm32 (-m68k) -microblaze -microblazeel (-mips) -mips64 -mips64el -mipsel -moxie -or32 (-ppc) (-ppc64) -ppcemb -s390x -sh4 -sh4eb (-sparc) -sparc64 -unicore32 -xtensa -xtensaeb" QEMU_USER_TARGETS="i386 x86_64 -aarch64 (-alpha) (-arm) -armeb -cris (-m68k) -microblaze -microblazeel (-mips) -mips64 -mips64el -mipsel -mipsn32 -mipsn32el -or32 (-ppc) (-ppc64) -ppc64abi32 -s390x -sh4 -sh4eb (-sparc) -sparc32plus -sparc64 -unicore32"

>>> Attempting to run pkg_info() for 'app-emulation/qemu-2.3.0-r5'
Using:
  app-emulation/spice-protocol-0.12.3
  sys-firmware/ipxe-1.0.0_p20130925
  sys-firmware/seabios-1.7.5
    USE=binary
  sys-firmware/vgabios-0.7a
Comment 1 Christian Roessner 2015-08-19 08:28:24 UTC
Created attachment 409410 [details]
lspci -vvvxx
Comment 2 Christian Roessner 2015-08-19 08:28:51 UTC
Created attachment 409412 [details]
lstopo
Comment 3 Christian Roessner 2015-08-19 08:29:18 UTC
Created attachment 409414 [details]
kernel config
Comment 4 Christian Roessner 2015-08-19 08:29:41 UTC
Created attachment 409416 [details]
dmidecode
Comment 5 Christian Roessner 2015-08-19 08:30:10 UTC
Created attachment 409418 [details]
Spec file of the server
Comment 6 Attila Tóth 2015-08-20 15:21:08 UTC
I've been seeing intel-iommu.c related log messages for many kernel versions, but without any deadlocks. The system is a custom-built server based on an Asus Z8PE-D12X mobo housing two Xeon 5620 CPUs. I could not get closer to the problem since it's a really early log message.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at drivers/iommu/intel-iommu.c:3214 intel_unmap+0x146/0x200()
Driver unmaps unmatched page at PFN 0
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.5-hardened-r1 #1
Hardware name: System manufacturer System Product Name/Z8P(N)E-D12(X), BIOS 1302    06/25/2012
 0000000000000000 fafc4ab8f91ddaac ffffffff9819efbf 0000000000000000
 ffffffff9819efbf ffffffff91e59d45 ffff880237c03d80 ffffffff910a9757
 ffffffff9819efbf 0000000000000c8e ffffffff981de5b0 ffff8802361fcf68
Call Trace:
 <IRQ>  [<ffffffff91e59d45>] ? dump_stack+0x40/0x56
 [<ffffffff910a9757>] ? warn_slowpath_common+0x77/0xb0
 [<ffffffff910a97fc>] ? warn_slowpath_fmt+0x6c/0x90
 [<ffffffff916a7716>] ? intel_unmap+0x146/0x200
 [<ffffffff9177a45e>] ? twa_interrupt+0x48e/0x780
 [<ffffffff910f9c63>] ? handle_irq_event_percpu+0x73/0x120
 [<ffffffff910f9d40>] ? handle_irq_event+0x30/0x50
 [<ffffffff910fce58>] ? handle_fasteoi_irq+0x88/0x180
 [<ffffffff91005385>] ? handle_irq+0x85/0x160
 [<ffffffff910ce014>] ? atomic_notifier_call_chain+0x24/0x30
 [<ffffffff91004c01>] ? do_IRQ+0x41/0xf0
 [<ffffffff91e65c57>] ? common_interrupt+0x97/0x97
 <EOI>  [<ffffffff919d82c7>] ? cpuidle_enter_state+0xb7/0x160
 [<ffffffff910eebfb>] ? cpu_startup_entry+0x27b/0x300
 [<ffffffff9cc1507a>] ? start_kernel+0x4a9/0x4ca
 [<ffffffff9cc14120>] ? early_idt_handler_array+0x120/0x120
 [<ffffffff9cc145f7>] ? x86_64_start_kernel+0x10b/0x12f
---[ end trace a7906508600bc5fc ]---
Comment 7 Anthony Basile gentoo-dev 2015-08-22 19:46:12 UTC
I just yanked 4.1.4 off the tree and rapid-stabilized 4.1.6 because of bug #558282.  Can you test to see if you hit the same issue with 4.1.6, which has the very latest grsecurity patches?  If so, I'll push this upstream.  I hope we don't get into a succession of rapid stabilizations.
Comment 8 Christian Roessner 2015-08-22 23:12:42 UTC
I just tested 4.1.6. With intel_iommu=on the kernel boots. I can log in remotely with ssh, but after that the system gets stuck again, though in a slightly different way.

With 4.1.4, running a command froze bash completely; with 4.1.6 some commands still work. This is really interesting.

It works: ls, cd /
It does not work: top, top, lspci

But I can interrupt them now with Ctrl+C. This is new in 4.1.6.

I could not get ls -la working.

So this is somewhat funny. Ctrl+Alt+Del did not succeed. I had to use iLO to poweroff the server.

With intel_iommu disabled, the server returns to normal behavior (without VT-d).
Comment 9 Anthony Basile gentoo-dev 2015-08-22 23:15:44 UTC
Okay cc-ing upstream.

@pageexec, hardened-sources-4.1.6 = latest patches.
Comment 10 Christian Roessner 2015-08-23 07:26:53 UTC
This morning I quickly tested vanilla-sources-4.1.6 with intel_iommu=on. The kernel gets stuck while booting. So I guess the problem with IOMMU is in the kernel itself and not in the patches.
Comment 11 Attila Tóth 2015-08-23 14:38:44 UTC
If I search for intel_iommu related problems, there are several reports, but many of those are several years old. It seems to me that the actual problem can be related to some system-specific hardware, for example iwlwifi.
In my case:
 [<ffffffff916a7716>] ? intel_unmap+0x146/0x200
 [<ffffffff9177a45e>] ? twa_interrupt+0x48e/0x780
 [<ffffffff910f9c63>] ? handle_irq_event_percpu+0x73/0x120
 [<ffffffff910f9d40>] ? handle_irq_event+0x30/0x50
Here twa_interrupt belongs to the driver for a 3ware hardware RAID controller. The driver may misbehave in a way the IOMMU code does not expect...

I'm interested in whether the reporter can fetch an oops from his logs as he can now log in to the server before an eventual deadlock.
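
To see at a glance which driver function sits directly below intel_unmap, the frames can be split into symbol and offset. A small sketch, assuming the "[<address>] ? symbol+0xoffset/0xsize" frame layout shown in the trace above; parse_frames is a made-up helper name. The "?" marks an address the unwinder found on the stack but could not verify.

```python
import re

# One stack frame: "[<address>] [? ]symbol+0xoffset/0xsize"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\? )?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

def parse_frames(trace_text):
    """Return (symbol, offset, unreliable) tuples in the order they appear."""
    return [
        (m.group("symbol"), int(m.group("offset"), 16), m.group("unreliable") is not None)
        for m in FRAME_RE.finditer(trace_text)
    ]

trace = """\
 [<ffffffff916a7716>] ? intel_unmap+0x146/0x200
 [<ffffffff9177a45e>] ? twa_interrupt+0x48e/0x780
 [<ffffffff910f9c63>] ? handle_irq_event_percpu+0x73/0x120
 [<ffffffff910f9d40>] ? handle_irq_event+0x30/0x50
"""

frames = parse_frames(trace)
symbols = [symbol for symbol, _offset, _unreliable in frames]
# The entry right after intel_unmap is the next frame down the stack.
next_frame = symbols[symbols.index("intel_unmap") + 1]
print(next_frame)
```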
Comment 12 Christian Roessner 2015-08-23 14:44:34 UTC
Unfortunately I am also following a different bug that might be kernel related, and this takes a lot of time for testing. So I have to be patient and cannot interrupt this at present.

All I can say is that I could log in with the hardened sources kernel, but nothing else. With a pure vanilla kernel, the kernel does not even boot, because it loops endlessly with errors. No Oopses. The messages are already shown above in this thread.
Comment 13 Anthony Basile gentoo-dev 2015-08-24 19:55:04 UTC
(In reply to Christian Roessner from comment #12)
> Unfortunately I also follow a different bug that might be kernel related and
> this takes a lot of time for testing. So I have to be patient and can not
> interrupt this at the present.
> 
> All I can say is that I could log in with the hardened sources kernel, but
> nothing else. With a pure vanilla kernel, the kernel does not even boot,
> because it loops endlessly with errors. No Oopses. The messages are already
> shown above in this thread.

I just added 4.1.6-r1 to the tree with the latest grsec patches.  You may want to test those.
Comment 14 Christian Roessner 2015-08-25 08:42:29 UTC
Created attachment 410182 [details]
Screenshot from iLO2 while booting

I tested 4.1.6-r1

intel_iommu=yes does not boot. The system hangs forever. See screenshot
Comment 15 PaX Team 2015-08-25 09:28:27 UTC
(In reply to Christian Roessner from comment #14)
> I tested 4.1.6-r1
> 
> intel_iommu=yes does not boot. The system hangs forever. See screenshot

can you capture the kernel logs via netconsole or normal remote logging?
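
For readers who have not used netconsole, here is a hedged sketch of both ends. It assumes the documented boot parameter layout src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac with the default ports 6665 and 6666. The IP addresses, the interface name eth0, and both function names are placeholders, not values from this report.

```python
import socket

def netconsole_param(src_ip, dev, tgt_ip, tgt_mac="", src_port=6665, tgt_port=6666):
    """Build a netconsole= boot parameter for the machine under test."""
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"

def receive_kernel_log(port=6666, bind_addr="0.0.0.0"):
    """Print UDP datagrams sent by netconsole until interrupted (Ctrl+C)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            data, _sender = sock.recvfrom(65535)
            print(data.decode("utf-8", errors="replace"), end="")

# Hypothetical addresses and interface name, for illustration only.
param = netconsole_param("192.0.2.10", "eth0", "192.0.2.20")
print(param)
```

The printed string would be appended to the kernel command line of the failing server, while receive_kernel_log() would run on a second machine at the target address. An empty MAC field leaves the target MAC at its default, which is the broadcast address.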
Comment 16 Christian Roessner 2015-08-25 13:22:35 UTC
https://www.roessner-network-solutions.com/wp-content/uploads/hardened-sources-4.1.6-r1-boot.pdf

The file is too big for this bug tracker and bzip2 -9 did not really shrink it enough.
Comment 17 Christian Roessner 2015-08-25 13:53:10 UTC
By looking deeper into the screenshots, I find one difference between booting with and without intel_iommu:

WITH intel_iommu, I get this NMI error, which might be the radeon graphics card. I also see no [drm] lines with this.

WITHOUT intel_iommu, there are no NMI errors and the radeon card prints out messages:

[    8.226144] [drm] radeon kernel modesetting enabled.
[    8.227037] [drm] initializing kernel modesetting (RV100 0x1002:0x515E 0x103C:0x31FB).
[    8.227049] [drm] register mmio base: 0xFB8F0000
[    8.227051] [drm] register mmio size: 65536
[    8.227118] radeon 0000:01:03.0: VRAM: 128M 0x00000000F0000000 - 0x00000000F7FFFFFF (64M used)
[    8.227121] radeon 0000:01:03.0: GTT: 512M 0x00000000D0000000 - 0x00000000EFFFFFFF
[    8.227129] [drm] Detected VRAM RAM=128M, BAR=128M
[    8.227130] [drm] RAM width 16bits DDR
[    8.227269] [TTM] Zone  kernel: Available graphics memory: 24726852 kiB
[    8.227271] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    8.227273] [TTM] Initializing pool allocator
[    8.227282] [TTM] Initializing DMA pool allocator
[    8.227306] [drm] radeon: 64M of VRAM memory ready
[    8.227308] [drm] radeon: 512M of GTT memory ready.
[    8.227321] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    8.247922] [drm] PCI GART of 512M enabled (table at 0x00000000E2700000).
[    8.248044] radeon 0000:01:03.0: WB disabled
[    8.248049] radeon 0000:01:03.0: fence driver on ring 0 use gpu addr 0x00000000d0000000 and cpu addr 0xffff8800e2f65000
[    8.248052] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    8.248053] [drm] Driver supports precise vblank timestamp query.
[    8.248074] [drm] radeon: irq initialized.
[    8.248082] [drm] Loading R100 Microcode
[    8.298358] [drm] radeon: ring at 0x00000000D0001000
[    8.298385] [drm] ring test succeeded in 1 usecs
[    8.359619] kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
[    8.431227] scsi_id (2832) used greatest stack depth: 11744 bytes left
[    8.756651] EXT4-fs (sdb1): warning: maximal mount count reached, running e2fsck is recommended
[    8.760880] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[    8.772386] Adding 2097148k swap on /dev/sda1.  Priority:-1 extents:1 across:2097148k FS
[    8.799795] [drm] ib test succeeded in 0 usecs
[    8.800105] [drm] No TV DAC info found in BIOS
[    8.800144] [drm] Radeon Display Connectors
[    8.800146] [drm] Connector 0:
[    8.800147] [drm]   VGA-1
[    8.800150] [drm]   DDC: 0x60 0x60 0x60 0x60 0x60 0x60 0x60 0x60
[    8.800151] [drm]   Encoders:
[    8.800152] [drm]     CRT1: INTERNAL_DAC1
[    8.800154] [drm] Connector 1:
[    8.800155] [drm]   VGA-2
[    8.800157] [drm]   DDC: 0x6c 0x6c 0x6c 0x6c 0x6c 0x6c 0x6c 0x6c
[    8.800158] [drm]   Encoders:
[    8.800160] [drm]     CRT2: INTERNAL_DAC2
[    8.873071] [drm] fb mappable at 0xF0040000
[    8.873073] [drm] vram apper at 0xF0000000
[    8.873075] [drm] size 786432
[    8.873076] [drm] fb depth is 8
[    8.873076] [drm]    pitch is 1024
[    8.873158] fbcon: radeondrmfb (fb0) is primary device
[    9.030904] Console: switching to colour frame buffer device 128x48
[    9.039514] radeon 0000:01:03.0: fb0: radeondrmfb frame buffer device
[    9.039516] radeon 0000:01:03.0: registered panic notifier
[    9.046952] [drm] Initialized radeon 2.42.0 20080528 for 0000:01:03.0 on minor 0

Do you think IOMMU conflicts with the radeon stuff?
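
The comparison described above (which subsystems log in one boot but stay silent in the other) can be automated roughly as follows. This is only a sketch: the prefix list and function names are assumptions, and the "with IOMMU" sample reuses the NMI line from the bug description as a stand-in, since the full failing log is only available as a PDF.

```python
import re

# "[    8.226144] [drm] radeon ..." -> message without the dmesg timestamp
LINE_RE = re.compile(r"^\[\s*\d+\.\d+\]\s(.*)$")

def messages(dmesg_text):
    """Return log messages with any leading dmesg timestamp removed."""
    out = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        out.append(m.group(1) if m else line)
    return out

def missing_prefixes(reference, candidate, prefixes=("[drm]", "radeon", "[TTM]")):
    """Prefixes that occur in the reference boot log but never in the candidate."""
    ref, cand = messages(reference), messages(candidate)
    return [
        p for p in prefixes
        if any(l.startswith(p) for l in ref) and not any(l.startswith(p) for l in cand)
    ]

without_iommu = (
    "[    8.226144] [drm] radeon kernel modesetting enabled.\n"
    "[    8.227273] [TTM] Initializing pool allocator\n"
)
# Stand-in for the failing boot, taken from the description of this bug.
with_iommu = "NMI: PCI system error (SERR) for reason a1 on CPU 0.\n"

silent = missing_prefixes(without_iommu, with_iommu)
print(silent)
```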
Comment 18 PaX Team 2015-08-25 15:03:31 UTC
based on page 3 the kernel thinks that your motherboard has all kinds of problems, you should probably not be using VT-d with it (or performance counters for that matter).
Comment 19 Christian Roessner 2015-08-25 16:43:44 UTC
(In reply to PaX Team from comment #18)
> based on page 3 the kernel thinks that your motherboard has all kinds of
> problems, you should probably not be using VT-d with it (or performance
> counters for that matter).

I have seen this. The interesting thing is that the same server hardware running Debian Jessie works perfectly with intel_iommu=on, even though the server logs exactly the same messages:

[    0.034929] This system BIOS has enabled interrupt remapping
on a chipset that contains an erratum making that
feature unstable.  To maintain system stability
interrupt remapping is being disabled.  Please
contact your BIOS vendor for an update
[    0.035405] Switched APIC routing to physical flat.
[    0.036051] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.075871] smpboot: CPU0: Intel(R) Xeon(R) CPU           L5640  @ 2.27GHz (fam: 06, model: 2c, stepping: 02)
[    0.182131] Performance Events: PEBS fmt1+, 16-deep LBR, Westmere events, Broken BIOS detected, complain to your hardware vendor.
[    0.182477] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)

And this server is not freezing or having any problems at all. The only difference is that it has 2x Xeon L5640 and mine has 2x Xeon L5520.

So the server should not get stuck on this.
Comment 20 Anthony Basile gentoo-dev 2015-10-25 11:43:38 UTC
(In reply to Christian Roessner from comment #19)
> (In reply to PaX Team from comment #18)
> > based on page 3 the kernel thinks that your motherboard has all kinds of
> > problems, you should probably not be using VT-d with it (or performance
> > counters for that matter).
> 
> I have seen this. The interesting thing is that the same server hardware
> running Debian Jessie works perfectly with intel_iommu=on, even though the
> server logs exactly the same messages:
> 
> [    0.034929] This system BIOS has enabled interrupt remapping
> on a chipset that contains an erratum making that
> feature unstable.  To maintain system stability
> interrupt remapping is being disabled.  Please
> contact your BIOS vendor for an update
> [    0.035405] Switched APIC routing to physical flat.
> [    0.036051] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.075871] smpboot: CPU0: Intel(R) Xeon(R) CPU           L5640  @
> 2.27GHz (fam: 06, model: 2c, stepping: 02)
> [    0.182131] Performance Events: PEBS fmt1+, 16-deep LBR, Westmere events,
> Broken BIOS detected, complain to your hardware vendor.
> [    0.182477] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR
> 38d is 330)
> 
> And this server is not freezing or having any problems at all. The only
> difference is that it has 2x Xeon L5640 and mine has 2x Xeon L5520.
> 
> So the server should not get stuck on this.

Did you ever test kernels after this?  If you can, try hardened-sources-4.2.4.
Comment 21 Christian Roessner 2015-11-02 16:42:51 UTC
I will try to upgrade the firmware of my server this evening. If that succeeds, I will change my CPUs from L5520 to L5640.

After that I can do new tests with the kernel.

I will come back some time later and update this report.
Comment 22 Anthony Basile gentoo-dev 2015-11-09 09:05:52 UTC
(In reply to Christian Roessner from comment #21)
> I will try to upgrade the firmware of my server this evening. If that succeeds, I
> will change my CPUs from L5520 to L5640.
> 
> After that I can do new tests with the kernel.
> 
> I will come back later some time and update this report.

Any news on this?  FYI, I'm looking at stabilizing hardened-sources-4.2.5-r1 so you may want to test that kernel.
Comment 23 Christian Roessner 2015-11-09 12:16:10 UTC
Not so good news.

I could not upgrade the BIOS. HP does not offer a Service Pack that detects my server. I _have_ the BIOS here as a ROM file, but no software under Linux that could upgrade it :-( I spent days on tests and downloaded many gigabytes of service packs and other stuff. No success. In the end I could not even exchange the CPUs, as the BIOS is too old and the server does not turn on.

I tested the new kernel and received errors like this:

[   22.499979] PAX: size overflow detected in function em_call arch/x86/kvm/emulate.c:3333 cicus.720_78 max, count: 13, decl: _eip; num: 0; context: x86_emulate_ctxt;
[   22.500268] CPU: 15 PID: 3683 Comm: qemu-system-x86 Tainted: G          I     4.2.5-hardened-r1 #1
[   22.500270] Hardware name: HP ProLiant SE316M1   , BIOS R02 10/02/2009
[   22.500272]  ffffffffa073b67e ffffc90014d13a28 ffffffff81594584 ffff880c17aeeb70
[   22.500275]  ffffffffa073b6a0 ffffc90014d13a58 ffffffff811908f6 ffff880bec641570
[   22.500277]  0000000000000006 000000000000957e ffffffffffffdf61 ffffc90014d13aa8
[   22.500279] Call Trace:
[   22.500292]  [<ffffffff81594584>] dump_stack+0x45/0x57
[   22.500298]  [<ffffffff811908f6>] report_size_overflow+0x56/0x60
[   22.500310]  [<ffffffffa07979ab>] em_call+0x4b/0x280 [kvm]
[   22.500318]  [<ffffffffa0797960>] ? em_call_near_abs+0x150/0x150 [kvm]
[   22.500326]  [<ffffffffa079a099>] x86_emulate_insn+0x279/0xe20 [kvm]
[   22.500334]  [<ffffffffa077efed>] x86_emulate_instruction+0x1bd/0x740 [kvm]
[   22.500340]  [<ffffffffa1076100>] ? vmx_sync_dirty_debug_regs+0x40/0x70 [kvm_intel]
[   22.500343]  [<ffffffffa107c55b>] vmx_handle_exit+0x1cb/0x12c0 [kvm_intel]
[   22.500346]  [<ffffffffa1077fc9>] ? vmx_set_cr3+0xa9/0x130 [kvm_intel]
[   22.500357]  [<ffffffffa079d3ff>] ? kvm_lapic_find_highest_irr+0x4f/0x70 [kvm]
[   22.500360]  [<ffffffffa107359e>] ? vmx_handle_external_intr+0x5e/0x60 [kvm_intel]
[   22.500367]  [<ffffffffa0782efb>] kvm_arch_vcpu_ioctl_run+0x66b/0x12a0 [kvm]
[   22.500375]  [<ffffffffa077c5ca>] ? kvm_arch_vcpu_load+0x14a/0x1d0 [kvm]
[   22.500381]  [<ffffffffa076c57e>] kvm_vcpu_ioctl+0x34e/0x610 [kvm]
[   22.500384]  [<ffffffffa1077323>] ? vmx_vcpu_put+0x43/0x50 [kvm_intel]
[   22.500391]  [<ffffffffa07814f3>] ? kvm_arch_vcpu_put+0x23/0x40 [kvm]
[   22.500394]  [<ffffffff8119f972>] do_vfs_ioctl+0x422/0x740
[   22.500398]  [<ffffffff810e140c>] ? __audit_syscall_entry+0xac/0xf0
[   22.500400]  [<ffffffff8119fccc>] SyS_ioctl+0x3c/0x70
[   22.500403]  [<ffffffff8159991b>] entry_SYSCALL_64_fastpath+0x12/0x79
[   22.501312] br0: port 3(vnet1) entered forwarding state
[   22.501329] br0: port 3(vnet1) entered forwarding state
[   22.529688] PAX: size overflow detected in function em_call arch/x86/kvm/emulate.c:3333 cicus.720_78 max, count: 13, decl: _eip; num: 0; context: x86_emulate_ctxt;
[   22.529903] CPU: 15 PID: 3683 Comm: qemu-system-x86 Tainted: G          I     4.2.5-hardened-r1 #1
[   22.529904] Hardware name: HP ProLiant SE316M1   , BIOS R02 10/02/2009
[   22.529906]  ffffffffa073b67e ffffc90014d13a08 ffffffff81594584 ffff880c17aeeb70
[   22.529908]  ffffffffa073b6a0 ffffc90014d13a38 ffffffff811908f6 ffff880bec641570
[   22.529910]  0000000000000006 000000000000957e ffffffffffffdf61 ffffc90014d13a88
[   22.529912] Call Trace:
[   22.529918]  [<ffffffff81594584>] dump_stack+0x45/0x57
[   22.529920]  [<ffffffff811908f6>] report_size_overflow+0x56/0x60
[   22.529929]  [<ffffffffa07979ab>] em_call+0x4b/0x280 [kvm]
[   22.529937]  [<ffffffffa0797960>] ? em_call_near_abs+0x150/0x150 [kvm]
[   22.529944]  [<ffffffffa079a099>] x86_emulate_insn+0x279/0xe20 [kvm]
[   22.529952]  [<ffffffffa077efed>] x86_emulate_instruction+0x1bd/0x740 [kvm]
[   22.529955]  [<ffffffffa107c55b>] vmx_handle_exit+0x1cb/0x12c0 [kvm_intel]
[   22.529958]  [<ffffffffa1077fc9>] ? vmx_set_cr3+0xa9/0x130 [kvm_intel]
[   22.529965]  [<ffffffffa079d3ff>] ? kvm_lapic_find_highest_irr+0x4f/0x70 [kvm]
[   22.529973]  [<ffffffffa0782efb>] kvm_arch_vcpu_ioctl_run+0x66b/0x12a0 [kvm]
[   22.529981]  [<ffffffffa077c5ca>] ? kvm_arch_vcpu_load+0x14a/0x1d0 [kvm]
[   22.529986]  [<ffffffffa076c57e>] kvm_vcpu_ioctl+0x34e/0x610 [kvm]
[   22.529989]  [<ffffffffa1077323>] ? vmx_vcpu_put+0x43/0x50 [kvm_intel]
[   22.529997]  [<ffffffffa07814f3>] ? kvm_arch_vcpu_put+0x23/0x40 [kvm]
[   22.529999]  [<ffffffff8119f972>] do_vfs_ioctl+0x422/0x740
[   22.530001]  [<ffffffff8119f972>] ? do_vfs_ioctl+0x422/0x740
[   22.530003]  [<ffffffff8119f972>] ? do_vfs_ioctl+0x422/0x740
[   22.530005]  [<ffffffff810e140c>] ? __audit_syscall_entry+0xac/0xf0
[   22.530017]  [<ffffffff8119fccc>] SyS_ioctl+0x3c/0x70
[   22.530019]  [<ffffffff8159991b>] entry_SYSCALL_64_fastpath+0x12/0x79
[   22.530032]  [<ffffffff810e162c>] ? __audit_syscall_exit+0x1dc/0x270
[   22.530097] PAX: size overflow detected in function em_call arch/x86/kvm/emulate.c:3333 cicus.720_78 max, count: 13, decl: _eip; num: 0; context: x86_emulate_ctxt;
[   22.530309] CPU: 15 PID: 3683 Comm: qemu-system-x86 Tainted: G          I     4.2.5-hardened-r1 #1
[   22.530310] Hardware name: HP ProLiant SE316M1   , BIOS R02 10/02/2009
[   22.530311]  ffffffffa073b67e ffffc90014d139f8 ffffffff81594584 0000000000000007
[   22.530314]  ffffffffa073b6a0 ffffc90014d13a28 ffffffff811908f6 ffff880bec641570
[   22.530315]  0000000000000006 000000000000957e ffffffffffffdf61 ffffc90014d13a78
[   22.530317] Call Trace:
[   22.530320]  [<ffffffff81594584>] dump_stack+0x45/0x57
[   22.530323]  [<ffffffff811908f6>] report_size_overflow+0x56/0x60
[   22.530331]  [<ffffffffa07979ab>] em_call+0x4b/0x280 [kvm]
[   22.530338]  [<ffffffffa0797960>] ? em_call_near_abs+0x150/0x150 [kvm]
[   22.530346]  [<ffffffffa079a099>] x86_emulate_insn+0x279/0xe20 [kvm]
[   22.530353]  [<ffffffffa077efed>] x86_emulate_instruction+0x1bd/0x740 [kvm]
[   22.530357]  [<ffffffffa107c55b>] vmx_handle_exit+0x1cb/0x12c0 [kvm_intel]
[   22.530360]  [<ffffffffa1077fc9>] ? vmx_set_cr3+0xa9/0x130 [kvm_intel]
[   22.530367]  [<ffffffffa079d3ff>] ? kvm_lapic_find_highest_irr+0x4f/0x70 [kvm]
[   22.530374]  [<ffffffffa0782efb>] kvm_arch_vcpu_ioctl_run+0x66b/0x12a0 [kvm]
[   22.530382]  [<ffffffffa077c5ca>] ? kvm_arch_vcpu_load+0x14a/0x1d0 [kvm]
[   22.530387]  [<ffffffffa076c57e>] kvm_vcpu_ioctl+0x34e/0x610 [kvm]
[   22.530390]  [<ffffffffa1077323>] ? vmx_vcpu_put+0x43/0x50 [kvm_intel]
[   22.530398]  [<ffffffffa07814f3>] ? kvm_arch_vcpu_put+0x23/0x40 [kvm]
[   22.530400]  [<ffffffff8119f972>] do_vfs_ioctl+0x422/0x740
[   22.530406]  [<ffffffffa076c363>] ? kvm_vcpu_ioctl+0x133/0x610 [kvm]
[   22.530408]  [<ffffffff8119f972>] ? do_vfs_ioctl+0x422/0x740
[   22.530410]  [<ffffffff810e140c>] ? __audit_syscall_entry+0xac/0xf0
[   22.530412]  [<ffffffff8119fccc>] SyS_ioctl+0x3c/0x70
[   22.530414]  [<ffffffff8119f972>] ? do_vfs_ioctl+0x422/0x740
[   22.530416]  [<ffffffff8159991b>] entry_SYSCALL_64_fastpath+0x12/0x79
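
Since the same report repeats several times, a short summary can make clear that all of them point at one location. A minimal sketch, assuming the "PAX: size overflow detected in function NAME FILE:LINE" wording shown above; summarize_overflows is an illustrative name and not part of the PaX tooling.

```python
import re
from collections import Counter

PAX_RE = re.compile(
    r"PAX: size overflow detected in function (?P<func>\S+) "
    r"(?P<file>[^\s:]+):(?P<line>\d+)"
)

def summarize_overflows(log_text):
    """Count size-overflow reports per (function, file, line)."""
    return Counter(
        (m.group("func"), m.group("file"), int(m.group("line")))
        for m in PAX_RE.finditer(log_text)
    )

# The three report headers from the log above.
log = """\
[   22.499979] PAX: size overflow detected in function em_call arch/x86/kvm/emulate.c:3333 cicus.720_78 max, count: 13, decl: _eip; num: 0; context: x86_emulate_ctxt;
[   22.529688] PAX: size overflow detected in function em_call arch/x86/kvm/emulate.c:3333 cicus.720_78 max, count: 13, decl: _eip; num: 0; context: x86_emulate_ctxt;
[   22.530097] PAX: size overflow detected in function em_call arch/x86/kvm/emulate.c:3333 cicus.720_78 max, count: 13, decl: _eip; num: 0; context: x86_emulate_ctxt;
"""

summary = summarize_overflows(log)
for (func, path, line), count in summary.items():
    print(f"{count}x {func} at {path}:{line}")
```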
Comment 24 PaX Team 2015-11-09 14:39:21 UTC
thanks, the next patch will fix the size overflow report (in the future it'd be better to open a separate bug and not mix it with other issues ;).
Comment 25 Christian Roessner 2015-11-09 16:01:49 UTC
You are absolutely right. I shouldn't have mixed these issues :-) I think I posted it here because you planned on stabilizing the next kernel and I wanted to be quick :-)
Comment 26 Anthony Basile gentoo-dev 2015-11-09 23:02:01 UTC
(In reply to Christian Roessner from comment #25)
> You are absolutely right. I shouldn't have mixed these issues :-) I think I
> posted it here because you planned on stabilizing the next kernel and I
> wanted to be quick :-)

hardened-sources-4.2.5-r2 is in the tree with grsecurity-3.1-4.2.5-201511081815

I don't know if pipacs' fix made it in there.
Comment 27 PaX Team 2015-11-09 23:14:58 UTC
(In reply to Anthony Basile from comment #26)
> I don't know if pipacs' fix made it in there.

no, not yet, i'm about to release that patch.
Comment 28 Anthony Basile gentoo-dev 2015-12-16 00:50:44 UTC
(In reply to PaX Team from comment #27)
> (In reply to Anthony Basile from comment #26)
> > I don't know if pipacs' fix made it in there.
> 
> no, not yet, i'm about to release that patch.

@Roessner and pipacs:  I assume this is fixed in the latest hardened-sources 4.2.7 in the tree?  Can I close this bug or not?
Comment 29 PaX Team 2015-12-16 03:06:18 UTC
the size overflow report was fixed long ago, but i don't know about the iommu one, doesn't look like it's related to grsec.
Comment 30 Christian Roessner 2015-12-16 10:02:58 UTC
I ordered a new server. Same hardware, BUT with the newest BIOS, and this server can deal with the newer CPUs. It should arrive this week, and then I will copy the file system from the old server to the new machine. After that I will give IOMMU a new try, with both the current "stable" and the latest kernel. If you want to leave the report open, I can come back here and report the results.
Comment 31 Anthony Basile gentoo-dev 2015-12-23 08:16:13 UTC
(In reply to Christian Roessner from comment #30)
> I ordered a new server. Same hardware, BUT newest BIOS and this server can
> deal with the newer CPUs. It should arrive this week and then I will copy
> the file system from the old server to the new machine. After that I will
> give IOMMU a new try and if you want to leave the report open, I can come
> back here and report the results. With the current "stable" and latest kernel

Okay, reopen this if it's still an issue.