Multiple page allocation failure errors

I upgraded the kernel from 2.6.31-r6 to 2.6.32-r7, and this seems to have reduced the number of page allocation failures. The vast majority of the time they take the form (swapper: page allocation failure. order:4, mode:0x20), occasionally from sshd.

With kernel 2.6.31-r6 they never occurred (the machine has been running this install for 1 year) until I did an emerge --sync about a week ago and then emerge --update --deep --newuse world. After that I started to get page allocation failures. So many packages were updated in that process that I will never know which upgrade caused the issue. (Before the package upgrade, the machine ran 67 days without a single error or issue since the last reboot, and with no errors since the Gentoo install.)

I thought that upgrading the kernel to the most recent version at the time, 2.6.32-r7, would alleviate the issue. The page allocation failures happen less frequently but still occur. At this time they do not appear to affect my machine/server in any negative way, but it is important to me to have everything running tip-top, so to speak. I have spent considerable time searching, to no avail.

Reproducible: Always

Steps to Reproduce:
Not sure I can purposely initiate the error.

Actual Results:
swapper: page allocation failure. 
order:4, mode:0x20 Pid: 0, comm: swapper Not tainted 2.6.32-gentoo-r7 #1 Call Trace: <IRQ> [<ffffffff8107f082>] 0xffffffff8107f082 [<ffffffff810a29a4>] 0xffffffff810a29a4 [<ffffffff810a2b9f>] 0xffffffff810a2b9f [<ffffffff810a2d35>] 0xffffffff810a2d35 [<ffffffff810a2de3>] 0xffffffff810a2de3 [<ffffffff810a2e4e>] 0xffffffff810a2e4e [<ffffffff813ed961>] 0xffffffff813ed961 [<ffffffff813ee21b>] 0xffffffff813ee21b [<ffffffffa07d8499>] 0xffffffffa07d8499 [<ffffffff813f4a3d>] 0xffffffff813f4a3d [<ffffffff81406417>] 0xffffffff81406417 [<ffffffff813f4f57>] 0xffffffff813f4f57 [<ffffffff8143217d>] 0xffffffff8143217d [<ffffffff81432294>] 0xffffffff81432294 [<ffffffff8143122f>] 0xffffffff8143122f [<ffffffff81431aba>] 0xffffffff81431aba [<ffffffff8107e00e>] ? 0xffffffff8107e00e [<ffffffff81443473>] 0xffffffff81443473 [<ffffffff81445a4b>] 0xffffffff81445a4b [<ffffffff813ec99d>] ? 0xffffffff813ec99d [<ffffffff81445b8b>] 0xffffffff81445b8b [<ffffffff8144162a>] 0xffffffff8144162a [<ffffffff81468393>] ? 0xffffffff81468393 [<ffffffff81448067>] 0xffffffff81448067 [<ffffffff8103d570>] ? 0xffffffff8103d570 [<ffffffff81448710>] 0xffffffff81448710 [<ffffffff81412728>] ? 0xffffffff81412728 [<ffffffff8142d43c>] ? 0xffffffff8142d43c [<ffffffff8142d57a>] 0xffffffff8142d57a [<ffffffff8142d6e9>] 0xffffffff8142d6e9 [<ffffffff8142d138>] 0xffffffff8142d138 [<ffffffff8142d405>] 0xffffffff8142d405 [<ffffffffa07d84e5>] ? 0xffffffffa07d84e5 [<ffffffff813f3e17>] 0xffffffff813f3e17 [<ffffffff813f3f9b>] 0xffffffff813f3f9b [<ffffffff813f4426>] 0xffffffff813f4426 [<ffffffff813b096e>] 0xffffffff813b096e [<ffffffff813f4504>] 0xffffffff813f4504 [<ffffffff812fd555>] ? 0xffffffff812fd555 [<ffffffff8103d691>] 0xffffffff8103d691 [<ffffffff812fd624>] ? 0xffffffff812fd624 [<ffffffff8100cb2c>] 0xffffffff8100cb2c [<ffffffff8100e583>] 0xffffffff8100e583 [<ffffffff8103d38c>] 0xffffffff8103d38c [<ffffffff8100dc88>] 0xffffffff8100dc88 [<ffffffff8100c393>] 0xffffffff8100c393 <EOI> [<ffffffff81012828>] ? 
0xffffffff81012828 [<ffffffff81012b69>] ? 0xffffffff81012b69 [<ffffffff8104f12f>] ? 0xffffffff8104f12f [<ffffffff8100ae00>] ? 0xffffffff8100ae00 [<ffffffff814de83f>] ? 0xffffffff814de83f Mem-Info: Node 0 DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 CPU 2: hi: 0, btch: 1 usd: 0 CPU 3: hi: 0, btch: 1 usd: 0 Node 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 163 CPU 1: hi: 186, btch: 31 usd: 4 CPU 2: hi: 186, btch: 31 usd: 29 CPU 3: hi: 186, btch: 31 usd: 2 Node 1 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 3 CPU 1: hi: 186, btch: 31 usd: 22 CPU 2: hi: 186, btch: 31 usd: 2 CPU 3: hi: 186, btch: 31 usd: 59 active_anon:49380 inactive_anon:93037 isolated_anon:27 active_file:23029 inactive_file:64869 isolated_file:0 unevictable:231748 dirty:33 writeback:0 unstable:0 free:7864 slab_reclaimable:5126 slab_unreclaimable:6956 mapped:25610 shmem:19 pagetables:2623 bounce:0 Node 0 DMA free:4020kB min:40kB low:48kB high:60kB active_anon:1172kB inactive_anon:1756kB active_file:3716kB inactive_file:4308kB unevictable:776kB isolated(anon):0kB isolated(file):0kB present:15368kB mlocked:776kB dirty:0kB writeback:0kB mapped:80kB shmem:0kB slab_reclaimable:92kB slab_unreclaimable:108kB kernel_stack:0kB pagetables:4kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 994 994 994 Node 0 DMA32 free:12212kB min:2828kB low:3532kB high:4240kB active_anon:108424kB inactive_anon:154224kB active_file:40036kB inactive_file:75924kB unevictable:546228kB isolated(anon):0kB isolated(file):0kB present:1018080kB mlocked:546228kB dirty:36kB writeback:0kB mapped:38020kB shmem:16kB slab_reclaimable:8480kB slab_unreclaimable:15628kB kernel_stack:2312kB pagetables:4552kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:35 all_unreclaimable? 
no lowmem_reserve[]: 0 0 0 0 Node 1 DMA32 free:15224kB min:2872kB low:3588kB high:4308kB active_anon:87924kB inactive_anon:216168kB active_file:48364kB inactive_file:179244kB unevictable:379988kB isolated(anon):108kB isolated(file):0kB present:1033048kB mlocked:379988kB dirty:96kB writeback:0kB mapped:64340kB shmem:60kB slab_reclaimable:11932kB slab_unreclaimable:12088kB kernel_stack:784kB pagetables:5936kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:43 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Node 0 DMA: 1*4kB 0*8kB 3*16kB 2*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4020kB Node 0 DMA32: 1917*4kB 412*8kB 72*16kB 3*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12212kB Node 1 DMA32: 2138*4kB 658*8kB 88*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15224kB 93483 total pagecache pages 5603 pages in swap cache Swap cache stats: add 308963, delete 303359, find 276600/282629 Free swap = 4016516kB Total swap = 4200988kB 523986 pages RAM 31808 pages reserved 352863 pages shared 143308 pages non-shared I can include additional examples if needed.... Expected Results: No page allocation failures Linux 2.6.32-r7 make.conf CFLAGS="-march=k8 -O2 -pipe" CXXFLAGS="${CFLAGS}" # WARNING: Changing your CHOST is not something that should be done lightly. # Please consult http://www.gentoo.org/doc/en/change-chost.xml before changing. CHOST="x86_64-pc-linux-gnu" # These are the USE flags that were used in addition to what is provided by the # profile used for building. 
USE="policykit sqlite gnome-keyring consolekit crypt ctype imap libwww maildir sasl cairo xulrunner apache2 mysql php pam pcre session multilib hal dbus java png svg server jpeg jpeg2k java png truetype X opengl gtk gnome ipv6 unicode xorg dri nptl xml xml2 smp sockets ssl samba mmx sse sse2 -cups" MAKEOPTS="-j5" ACCEPT_LICENSE="*" GENTOO_MIRRORS="ftp://mirror.mcs.anl.gov/pub/gentoo/ " SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage" INPUT_DEVICE="evdev" VIDEO_CARDS="vesa" No known or observed hardware errors/failures The machine has: 2 AMD dual core Opteron 2212 2 GiB Ram lspci (can make more verbose if needed) 00:01.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge 00:02.0 Host bridge: Broadcom BCM5785 [HT1000] Legacy South Bridge 00:02.2 ISA bridge: Broadcom BCM5785 [HT1000] LPC 00:03.0 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01) 00:03.1 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01) 00:03.2 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01) 00:04.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02) 00:07.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2) 00:08.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2) 00:09.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2) 00:0a.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2) 00:0b.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:19.2 Host bridge: Advanced Micro 
Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21) 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21) 03:0d.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge (rev c0) 03:0e.0 IDE interface: Broadcom BCM5785 [HT1000] SATA (PATA/IDE Mode) Using virtualbox I host three vm's on this box, two ubuntu 64 bit server virtual machine and one Vista Business 32 bit virtual machine. I use Virtualbox 3.1.8 previously I used 3.0.12 Host created tap driver for two vm's sharing a static ip and the third vm with its own static ip. The machine also a VPN server using OpenVPN. The vm's use a total of 1320 Ram the rest about 688 is left for the host. The machine rarely uses more than 100MB swap space, my swap size is 4 GiB Software/services running at all times: Virtualbox (headless) Postfix Bind Dovecot OpenSSH (sshd) key file access only Xvnc OpenVPN syslog-ng vixie-cron dbus hald webmin (sometimes, not really a fan) mysql dkim-filter apache (apache2) emerge --info Portage 2.1.8.3 (default/linux/amd64/10.0, gcc-4.3.4, glibc-2.10.1-r1, 2.6.32-gentoo-r7 x86_64) ================================================================= System uname: Linux-2.6.32-gentoo-r7-x86_64-Dual-Core_AMD_Opteron-tm-_Processor_2212-with-gentoo-1.12.13 Timestamp of tree: Thu, 20 May 2010 03:45:01 +0000 app-shells/bash: 4.0_p37 dev-java/java-config: 2.1.10 dev-lang/python: 2.5.4-r2, 2.6.4-r1 dev-python/pycrypto: 2.1.0_beta1 dev-util/cmake: 2.6.4-r3 sys-apps/baselayout: 1.12.13 sys-apps/sandbox: 1.6-r2 sys-devel/autoconf: 2.13, 2.63-r1 sys-devel/automake: 1.4_p6, 1.5, 1.8.5-r3, 1.9.6-r2, 1.10.3, 1.11.1 sys-devel/binutils: 2.18-r3 sys-devel/gcc: 4.3.4 sys-devel/gcc-config: 1.4.1 sys-devel/libtool: 2.2.6b virtual/os-headers: 2.6.30-r1 
ACCEPT_KEYWORDS="amd64" ACCEPT_LICENSE="*" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-march=k8 -O2 -pipe" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/share/X11/xkb /var/bind /var/lib/hsqldb" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo" CXXFLAGS="-march=k8 -O2 -pipe" DISTDIR="/usr/portage/distfiles" FEATURES="assume-digests distlocks fixpackages news parallel-fetch protect-owned sandbox sfperms strict unmerge-logs unmerge-orphans userfetch" GENTOO_MIRRORS="ftp://mirror.mcs.anl.gov/pub/gentoo/ " LDFLAGS="-Wl,-O1" MAKEOPTS="-j5" PKGDIR="/usr/portage/packages" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage" USE="X acl amd64 apache2 berkdb bzip2 cairo cli consolekit cracklib crypt ctype cxx dbus dri fortran gdbm gnome gnome-keyring gpm gtk hal iconv imap ipv6 java jpeg jpeg2k libwww maildir mmx modules mudflap multilib mysql ncurses nls nptl nptlonly opengl openmp pam pcre perl php png policykit pppd python readline reflection samba sasl server session smp sockets spl sqlite sse sse2 ssl svg sysfs tcpd truetype unicode xml xml2 xorg xulrunner zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon 
authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="vesa" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
All occurrences since Monday 05/17/2010 (last reboot), first line of each only (I can include the full log if needed):

swapper: page allocation failure. order:4, mode:0x20
swapper: page allocation failure. order:4, mode:0x20
sshd: page allocation failure. order:3, mode:0x20
swapper: page allocation failure. order:4, mode:0x20
sshd: page allocation failure. order:3, mode:0x20
VBoxHeadless: page allocation failure. order:3, mode:0x20
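For anyone reading these lines, the "order" field is the base-2 logarithm of the number of contiguous pages requested, so order:4 is 16 pages. The sketch below decodes the failure lines into a size, assuming 4 KiB pages (the usual x86_64 page size; the report does not state it). On kernels of this era mode 0x20 should correspond to GFP_ATOMIC, an allocation that cannot sleep or reclaim, but treat that as my reading and not something stated in this report. The function and constant names are made up for illustration.

```python
import re

PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages, as is usual on x86_64

# Matches e.g. "swapper: page allocation failure. order:4, mode:0x20"
FAILURE_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def parse_failures(log_text):
    """Return (comm, order, contiguous_kib, mode) for each failure line."""
    results = []
    for match in FAILURE_RE.finditer(log_text):
        order = int(match.group("order"))
        # An order-N request asks for 2**N physically contiguous pages.
        contiguous_kib = PAGE_SIZE_KIB << order
        mode = int(match.group("mode"), 16)
        results.append((match.group("comm"), order, contiguous_kib, mode))
    return results


sample = (
    "swapper: page allocation failure. order:4, mode:0x20\n"
    "sshd: page allocation failure. order:3, mode:0x20\n"
)
for comm, order, kib, mode in parse_failures(sample):
    print(f"{comm}: order {order} = {kib} KiB contiguous, mode {mode:#x}")
```

With these assumptions an order:4 request needs 64 KiB of physically contiguous memory and an order:3 request needs 32 KiB.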
Let's have a look at the output of your "free -l" and the contents of your /proc/meminfo.
(In reply to comment #2)
> Let's have a look at the output of your "free -l" and the contents of your
> /proc/meminfo.

First, thanks for your reply; some time has gone by. Nothing new to report: sometimes I get one page allocation failure a day and sometimes many more.

free -l

             total       used       free     shared    buffers     cached
Mem:       2059028    1931972     127056          0      10152     330576
Low:       2059028    1931972     127056
High:            0          0          0
-/+ buffers/cache:    1591244     467784
Swap:      4200988     233028    3967960

/proc/meminfo

MemTotal:        2059028 kB
MemFree:          127772 kB
Buffers:           10208 kB
Cached:           330740 kB
SwapCached:        52884 kB
Active:           203444 kB
Inactive:         238992 kB
Active(anon):      14528 kB
Inactive(anon):    87068 kB
Active(file):     188916 kB
Inactive(file):   151924 kB
Unevictable:     1313680 kB
Mlocked:         1313680 kB
SwapTotal:       4200988 kB
SwapFree:        3968000 kB
Dirty:                16 kB
Writeback:             0 kB
AnonPages:       1371252 kB
Mapped:           103028 kB
Shmem:                76 kB
Slab:              44208 kB
SReclaimable:      18940 kB
SUnreclaim:        25268 kB
KernelStack:        3016 kB
PageTables:        10064 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5230500 kB
Committed_AS:    2487752 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      102756 kB
VmallocChunk:   34358560396 kB
DirectMap4k:        2888 kB
DirectMap2M:     2093056 kB

Thank you for your assistance
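One figure that stands out in the /proc/meminfo output is Mlocked, which equals Unevictable. The sketch below parses meminfo-style text and reports locked memory as a share of MemTotal, using a few values copied from the output above. The helper name is made up for illustration. The locked amount is in the same range as the RAM the reporter says the guests use, so it is plausibly VirtualBox guest memory, but nothing in this thread confirms that.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip blank or malformed lines
        fields[name.strip()] = int(rest.split()[0])
    return fields


# Values copied from the /proc/meminfo output in this comment.
sample = """MemTotal: 2059028 kB
MemFree: 127772 kB
Unevictable: 1313680 kB
Mlocked: 1313680 kB
"""

info = parse_meminfo(sample)
locked_pct = 100.0 * info["Mlocked"] / info["MemTotal"]
print(f"Mlocked is {locked_pct:.1f}% of MemTotal")
```

On a live system the same function can be fed the contents of /proc/meminfo directly.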
It may sound a bit silly, but can you check if your NIC is in jumbo frames mode (MTU > 1500)? You can see that by running `ifconfig -a` and checking the options of your NIC.
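As an alternative to reading `ifconfig -a` by eye, the MTU of each interface can be read from sysfs on Linux. This is a minimal sketch assuming the standard /sys/class/net/<interface>/mtu files are present; the function names are made up for illustration, and 1500 is taken as the non-jumbo threshold as in the question above.

```python
from pathlib import Path

STANDARD_MTU = 1500  # anything above this is treated as jumbo frames


def read_mtus(sys_net="/sys/class/net"):
    """Return {interface: mtu} read from sysfs; empty if sysfs is unavailable."""
    mtus = {}
    base = Path(sys_net)
    if not base.is_dir():
        return mtus
    for iface in sorted(base.iterdir()):
        mtu_file = iface / "mtu"
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus


def jumbo_interfaces(mtus):
    """Return the names of interfaces whose MTU exceeds the standard 1500."""
    return [name for name, mtu in mtus.items() if mtu > STANDARD_MTU]


if __name__ == "__main__":
    current = read_mtus()
    print("MTUs:", current)
    print("Jumbo-frame interfaces:", jumbo_interfaces(current))
```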
MTU is 1500; the NICs are not in jumbo frame mode.

eth0      Link encap:Ethernet  HWaddr OBFUSCATED
          inet addr:OBFUSCATED  Bcast:OBFUSCATED  Mask:255.255.224.0
          inet6 addr: OBFUSCATED Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:128134569 errors:0 dropped:0 overruns:0 frame:0
          TX packets:54245009 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20527700983 (19.1 GiB)  TX bytes:9734678389 (9.0 GiB)
          Interrupt:33

eth0:0    Link encap:Ethernet  HWaddr OBFUSCATED
          inet addr:OBFUSCATED  Bcast:OBFUSCATED  Mask:255.255.224.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:33

tapl2w    Link encap:Ethernet  HWaddr OBFUSCATED
          inet addr:OBFUSCATED  Bcast:OBFUSCATED  Mask:255.255.255.0
          inet6 addr: OBFUSCATED Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:OBFUSCATED  P-t-P:OBFUSCATED  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:11403856 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12293875 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:598856521 (571.1 MiB)  TX bytes:3588866282 (3.3 GiB)
Alright, could you pass that calltrace (or even any other calltrace that you have saved) through call2sym so that we can get something meaningful out of it?
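For context, turning a raw call-trace address into a symbol name amounts to finding the nearest symbol at or below that address in the kernel's System.map. The sketch below shows that lookup in general terms; it is not the call2sym implementation, and the function names and the two sample map entries are illustrative. The lookup only gives meaningful names if the System.map belongs to the exact kernel build that produced the trace.

```python
import bisect


def load_system_map(text):
    """Parse System.map lines ('address type name') into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset' for the nearest symbol at or below address."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    base, name = symbols[index]
    return f"{name}+{address - base:#x}"


# Illustrative System.map excerpt; not copied from the reporter's file.
sample_map = """ffffffff813ef29a T __alloc_skb
ffffffff813efb8f T skb_copy
"""

symbols = load_system_map(sample_map)
print(resolve(symbols, 0xFFFFFFFF813EF305))
print(resolve(symbols, 0xFFFFFFFF813EFBBF))
```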
Interestingly enough, call2sym (which I had to search for) produced no output. I suspect this is because System.map does not match the call trace; why that is, I have no idea at the moment. Perhaps you have some insight into this. Would a recompile of the kernel followed by a reboot help?

I put the System.map at http://gsxm.net/sm; on my system it was located at /boot/System.map-genkernel-x86_64-2.6.32-gentoo-r7

I put some call traces at http://gsxm.net/ct

My kernel is still a genkernel, but I have customized it to compile in all the drivers and networking components that I know my system needs. The system would probably run without the genkernel kernel and initramfs; this is actually the last genkernel I was going to use before compiling my own kernel entirely. However, this page allocation failure issue is making me hesitant to do so, as you might imagine.

Thanks for your help. How should I proceed next?
Even though I tried to find a solution to avoid making you reboot the machine, I think that in the end you will have to do it. Just enable CONFIG_KALLSYMS in your kernel config and the next time you reboot you will get meaningful call traces. Unfortunately, I'm totally clueless regarding genkernel so I can't help you with the procedure of enabling CONFIG_KALLSYMS. Shouldn't be too hard though.
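Before rebooting, it may be worth confirming that the option really ended up in the kernel config. Below is a minimal sketch that looks up an option in .config-style text; the function name is made up, and the sample text is illustrative and not the reporter's config. Where the config file lives (for example /usr/src/linux/.config, or /proc/config.gz when the kernel exposes it) depends on the setup and is an assumption here.

```python
def config_option(config_text, name):
    """Return the value of a kernel config option, or None if unset or absent."""
    prefix = name + "="
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set" and never match.
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


# Illustrative excerpt in the usual .config format.
sample_config = """CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
"""

print("CONFIG_KALLSYMS:", config_option(sample_config, "CONFIG_KALLSYMS"))
print("CONFIG_KALLSYMS_ALL:", config_option(sample_config, "CONFIG_KALLSYMS_ALL"))
```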
I will compile in CONFIG_KALLSYMS (it is easy to modify the kernel using genkernel) and reboot tonight. I used genkernel to get the system deployed while I learned the Linux kernel. I now have a highly customized kernel that is almost ready. Of course, I want to make sure the kernel I compile has the necessary drivers and is stable. Rebooting frequently is not a luxury I have. The goal with the next kernel release is to not use genkernel, which hinges to some degree on solving this issue. Thanks for your help; I will post when I have recompiled the kernel and rebooted.
Kernel recompiled (with CONFIG_KALLSYMS) and the system is rebooted. In case it's needed: Kernel Config http://gsxm.net/cc System.map http://gsxm.net/sm I will post the first new page allocation failure that occurs.
Update: since the recompile of the kernel and the reboot there have been no page allocation failures so far. On occasion it has taken a few days for the first page allocation failure to appear. Typically the rate picks up after the initial occurrence. I also updated a few packages before the reboot. If you do not mind, leave this bug open for a few more days to see if another page allocation failure is generated. Thank you for your help so far.
It took three days to start, but here are the page allocation errors. We now have meaningful call traces with CONFIG_KALLSYMS compiled into the kernel. HELP!!!

Below is the first line of each page allocation failure; the full call traces can be viewed at http://gsxm.net/ct

Jun 14 00:02:02 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: imap-login: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: VBoxHeadless: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
Alright, it seems like all the call traces lead to vboxNetFltOsInitInstance() before spitting the error. I reported the issue in the VirtualBox bugzilla here: http://www.virtualbox.org/ticket/6997 By the way, if you are not content with http://gsxm.net/ct appearing in the vbox report, just remove the file and we will find another way of linking to the call trace.
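One way to spot the common element across many traces is to pick out the frames that belong to a loadable module, which the kernel marks with a trailing [module] tag. The sketch below does that for symbolized trace text; the regular expression and function name are my own, and the three sample frames follow the format of the traces posted in this bug.

```python
import re

# Matches frames such as
# "[<ffffffffa09803d9>] vboxNetFltOsInitInstance+0x35d/0xa00 [vboxnetflt]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def module_frames(trace_text):
    """Return (symbol, module) for every frame that carries a [module] tag."""
    return [
        (match.group("sym"), match.group("module"))
        for match in FRAME_RE.finditer(trace_text)
        if match.group("module")
    ]


sample_trace = (
    "[<ffffffff813efbbf>] skb_copy+0x30/0x97\n"
    "[<ffffffffa09803d9>] vboxNetFltOsInitInstance+0x35d/0xa00 [vboxnetflt]\n"
    "[<ffffffff813f63e1>] dev_hard_start_xmit+0x152/0x2fa\n"
)
print(module_frames(sample_trace))
```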
Once again, thanks! During the time frame I have been getting page allocation failures, I have used several VirtualBox versions, including 3.0.12, 3.1.8, and 3.2.2, as well as others for short periods of time. I am fine with http://gsxm.net/ct appearing in the report. Perhaps at some point in the future, for the long-term archival of this bug, it could be included as an attachment, but that is up to you guys. There were issues getting the Gentoo bugzilla attachment capability to work, which is why I decided to host the attachment. You seem pretty confident that VirtualBox is causing the issue; I wonder how many others are having similar issues. Please let me know if there is any more information you need.
Yes, I'm indeed a bit overconfident on the fact that virtualbox is to blame, but it's the common starting point in all call traces. I also found another vbox bug report that seems of interest [1]. Can you try to disable GSO on your NIC, like this?

ethtool -K <interface> gso off

[1]: http://www.virtualbox.org/ticket/5260
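After changing the setting, `ethtool -k <interface>` (lowercase k) lists the offload state, including a generic-segmentation-offload line. The sketch below parses that kind of output into a dictionary so the GSO state can be checked by a script. The sample text is illustrative and not taken from the reporter's machine, the exact header and set of lines vary between ethtool versions, and the function name is made up.

```python
def offload_settings(ethtool_k_output):
    """Parse 'ethtool -k' style output into {feature: 'on' or 'off'}."""
    settings = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            # Keep only the first word so suffixes like "[fixed]" are dropped.
            settings[name.strip()] = value.split()[0]
    return settings


# Illustrative output only; real output differs by ethtool version and driver.
sample_output = """Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
generic-segmentation-offload: off
"""

settings = offload_settings(sample_output)
print("GSO is", settings.get("generic-segmentation-offload", "unknown"))
```

Note that a setting changed with ethtool -K does not normally survive a reboot unless it is reapplied from a startup script.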
I will give the 'ethtool -K <interface> gso off' a shot.

My setup is as follows, in case it's helpful:

Three virtual machines, all run headless, on a Gentoo 64-bit host with 2 GiB RAM (will be expanding soon enough). AMD-V is enabled in the BIOS for VirtualBox virtualization support. The CPUs are two dual-core AMD Opteron 2212s, for a total of four cores. This Gentoo install is over a year old; it was the first Gentoo system I built. I have upgraded the kernel several times.

I have used both the OSE and the PUEL versions of VirtualBox, currently PUEL.

One Ubuntu 64-bit guest (384MB RAM, 1 CPU, Intel PRO/1000 MT Desktop) is bridged via VirtualBox to the host Ethernet adapter (Broadcom NIC, tg3 driver) and has its own WAN static IP.

Two other guests, Vista Business 32-bit (512MB, 2 CPUs, PCnet-FAST III) and another Ubuntu 64-bit (384MB RAM, 1 CPU, PCnet-FAST III), are bridged via VirtualBox to a TAP adapter created on the host. These two machines share a WAN static IP via NAT/SNAT running on the host using iptables, so basically I set up a small virtual LAN for these two guests.

As a side note, I have experimented with all of the VirtualBox guest network adapters; it seems to make no difference to the page allocation errors which one is used.

This is basically a production machine (currently testing a hosting concept, but it needs high availability). The host has its own WAN static IP as well.

Interestingly enough, my machine defaulted to the tg3 driver. I noticed at the link you sent, http://www.virtualbox.org/ticket/5260, that someone discusses your suggestion specifically for the tg3.

If this fix works, how 'good' of a fix is it, and what are the ramifications? I will research this as well.

It was not my intent to insinuate you were overconfident ;)

Thanks for your help!
(In reply to comment #16)
> If this fix works how 'good' of a fix is this? and what are the ramifications?
> I will research this as well.

If that is indeed your problem, it is simply a matter of a memory leak when GSO is enabled. There seems to be a patch to plug the leak in that same vbox bug report [1], which means that, by patching vbox, you will be able to use GSO without page allocation failures.

tl;dr: it's a clean solution, not a dirty hack.

[1]: http://www.virtualbox.org/attachment/ticket/5260/diff_vboxnetflt_linux
Ok new page allocation failure, remember that gso has been off.

Jun 23 00:42:44 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
Jun 23 00:42:44 gsrv kernel: Pid: 0, comm: swapper Not tainted 2.6.32-gentoo-r7 #1
Jun 23 00:42:44 gsrv kernel: Call Trace:
Jun 23 00:42:44 gsrv kernel: <IRQ> [<ffffffff81080a26>] __alloc_pages_nodemask+0x5ad/0x5f7
Jun 23 00:42:44 gsrv kernel: [<ffffffff81034676>] ? select_task_rq_fair+0x6e9/0x748
Jun 23 00:42:44 gsrv kernel: [<ffffffff810a43b0>] kmem_getpages+0x53/0x119
Jun 23 00:42:44 gsrv kernel: [<ffffffff810a45ab>] fallback_alloc+0x135/0x1ab
Jun 23 00:42:44 gsrv kernel: [<ffffffff810a4741>] ____cache_alloc_node+0x120/0x135
Jun 23 00:42:44 gsrv kernel: [<ffffffff810a47ef>] kmem_cache_alloc_node+0x99/0xc1
Jun 23 00:42:44 gsrv kernel: [<ffffffff810a485a>] __kmalloc_node+0x43/0x45
Jun 23 00:42:49 gsrv kernel: [<ffffffff813ef305>] __alloc_skb+0x6b/0x164
Jun 23 00:42:49 gsrv kernel: [<ffffffff813efbbf>] skb_copy+0x30/0x97
Jun 23 00:42:49 gsrv kernel: [<ffffffffa09803d9>] vboxNetFltOsInitInstance+0x35d/0xa00 [vboxnetflt]
Jun 23 00:42:49 gsrv kernel: [<ffffffff813ea3e0>] ? sock_def_readable+0x6a/0x6f
Jun 23 00:42:49 gsrv kernel: [<ffffffff813ec821>] ? __skb_clone+0x29/0xf1
Jun 23 00:42:49 gsrv kernel: [<ffffffff813f63e1>] dev_hard_start_xmit+0x152/0x2fa
Jun 23 00:42:49 gsrv kernel: [<ffffffff81407dbb>] sch_direct_xmit+0x5e/0x160
Jun 23 00:42:49 gsrv kernel: [<ffffffff813f68fb>] dev_queue_xmit+0x25f/0x3ca
Jun 23 00:42:49 gsrv kernel: [<ffffffff81433b21>] ip_finish_output+0x27f/0x2c5
Jun 23 00:42:49 gsrv kernel: [<ffffffff81433c38>] ip_output+0xd1/0xde
Jun 23 00:42:49 gsrv kernel: [<ffffffff81432bd3>] ip_local_out+0x20/0x24
Jun 23 00:42:49 gsrv kernel: [<ffffffff8143345e>] ip_queue_xmit+0x2e5/0x35e
Jun 23 00:42:49 gsrv kernel: [<ffffffff8107f99b>] ? free_hot_cold_page+0x1aa/0x22e
Jun 23 00:42:49 gsrv kernel: [<ffffffff8107fa67>] ? free_hot_page+0xb/0xd
Jun 23 00:42:49 gsrv kernel: [<ffffffff81444e17>] tcp_transmit_skb+0x635/0x674
Jun 23 00:42:49 gsrv kernel: [<ffffffff814473ef>] tcp_write_xmit+0x83f/0x924
Jun 23 00:42:49 gsrv kernel: [<ffffffff814e5bd1>] ? _spin_lock+0x16/0x2e
Jun 23 00:42:49 gsrv kernel: [<ffffffff8144752f>] __tcp_push_pending_frames+0x2a/0x81
Jun 23 00:42:49 gsrv kernel: [<ffffffff814427ff>] tcp_rcv_established+0x10b/0xacd
Jun 23 00:42:49 gsrv kernel: [<ffffffff81469d37>] ? ipv4_confirm+0x161/0x179
Jun 23 00:42:50 gsrv kernel: [<ffffffff81449a0b>] tcp_v4_do_rcv+0x31/0x1d7
Jun 23 00:42:50 gsrv kernel: [<ffffffff81436186>] ? __inet_lookup_established+0x1e1/0x263
Jun 23 00:42:50 gsrv kernel: [<ffffffff8103d555>] ? local_bh_enable+0x82/0x9b
Jun 23 00:42:50 gsrv kernel: [<ffffffff8144a0b4>] tcp_v4_rcv+0x503/0x79a
Jun 23 00:42:50 gsrv kernel: [<ffffffff814140b5>] ? nf_hook_slow+0xcc/0xf4
Jun 23 00:42:50 gsrv kernel: [<ffffffff8142ede0>] ? ip_local_deliver_finish+0x0/0x23b
Jun 23 00:42:50 gsrv kernel: [<ffffffff8142ef1e>] ip_local_deliver_finish+0x13e/0x23b
Jun 23 00:42:50 gsrv kernel: [<ffffffff8142f08d>] ip_local_deliver+0x72/0x7a
Jun 23 00:42:50 gsrv kernel: [<ffffffff8142eadc>] ip_rcv_finish+0x37c/0x396
Jun 23 00:42:50 gsrv kernel: [<ffffffff8142eda9>] ip_rcv+0x2b3/0x2ea
Jun 23 00:42:50 gsrv kernel: [<ffffffff813f57bb>] netif_receive_skb+0x4a1/0x4e7
Jun 23 00:42:50 gsrv kernel: [<ffffffff813f593f>] napi_skb_finish+0x2b/0x42
Jun 23 00:42:50 gsrv kernel: [<ffffffff813f5dca>] napi_gro_receive+0x2a/0x2f
Jun 23 00:42:50 gsrv kernel: [<ffffffff813b2312>] tg3_poll+0x711/0x965
Jun 23 00:42:50 gsrv kernel: [<ffffffff8139c3b1>] ? ata_hsm_qc_complete+0xf1/0x114
Jun 23 00:42:50 gsrv kernel: [<ffffffff813f5ea8>] net_rx_action+0x74/0x145
Jun 23 00:42:50 gsrv kernel: [<ffffffff812feed1>] ? add_timer_randomness+0x129/0x14b
Jun 23 00:42:50 gsrv kernel: [<ffffffff8103d68d>] __do_softirq+0xa9/0x134
Jun 23 00:42:50 gsrv kernel: [<ffffffff812fefa0>] ? add_interrupt_randomness+0x24/0x28
Jun 23 00:42:50 gsrv kernel: [<ffffffff8100cb2c>] call_softirq+0x1c/0x28
Jun 23 00:42:50 gsrv kernel: [<ffffffff8100e583>] do_softirq+0x33/0x6b
Jun 23 00:42:50 gsrv kernel: [<ffffffff8103d388>] irq_exit+0x36/0x7e
Jun 23 00:42:50 gsrv kernel: [<ffffffff8100dc88>] do_IRQ+0xa6/0xbd
Jun 23 00:42:50 gsrv kernel: [<ffffffff8100c393>] ret_from_intr+0x0/0xa
Jun 23 00:42:50 gsrv kernel: <EOI> [<ffffffff81012840>] ? default_idle+0x22/0x37
Jun 23 00:42:50 gsrv kernel: [<ffffffff81012b81>] ? c1e_idle+0xde/0xe5
Jun 23 00:42:50 gsrv kernel: [<ffffffff8104f13b>] ? atomic_notifier_call_chain+0xf/0x11
Jun 23 00:42:50 gsrv kernel: [<ffffffff8100ae00>] ? cpu_idle+0x52/0x9e
Jun 23 00:42:50 gsrv kernel: [<ffffffff814e01ee>] ? start_secondary+0x19c/0x1a1
Jun 23 00:42:50 gsrv kernel: Mem-Info:
Jun 23 00:42:50 gsrv kernel: Node 0 DMA per-cpu:
Jun 23 00:42:50 gsrv kernel: CPU 0: hi: 0, btch: 1 usd: 0
Jun 23 00:42:50 gsrv kernel: CPU 1: hi: 0, btch: 1 usd: 0
Jun 23 00:42:50 gsrv kernel: CPU 2: hi: 0, btch: 1 usd: 0
Jun 23 00:42:50 gsrv kernel: CPU 3: hi: 0, btch: 1 usd: 0
Jun 23 00:42:50 gsrv kernel: Node 0 DMA32 per-cpu:
Jun 23 00:42:50 gsrv kernel: CPU 0: hi: 186, btch: 31 usd: 157
Jun 23 00:42:50 gsrv kernel: CPU 1: hi: 186, btch: 31 usd: 182
Jun 23 00:42:50 gsrv kernel: CPU 2: hi: 186, btch: 31 usd: 42
Jun 23 00:42:50 gsrv kernel: CPU 3: hi: 186, btch: 31 usd: 183
Jun 23 00:42:50 gsrv kernel: Node 1 DMA32 per-cpu:
Jun 23 00:42:50 gsrv kernel: CPU 0: hi: 186, btch: 31 usd: 42
Jun 23 00:42:50 gsrv kernel: CPU 1: hi: 186, btch: 31 usd: 172
Jun 23 00:42:50 gsrv kernel: CPU 2: hi: 186, btch: 31 usd: 77
Jun 23 00:42:50 gsrv kernel: CPU 3: hi: 186, btch: 31 usd: 143
Jun 23 00:42:50 gsrv kernel: active_anon:9680 inactive_anon:20728 isolated_anon:0
Jun 23 00:42:50 gsrv kernel: active_file:23072 inactive_file:22755 isolated_file:0
Jun 23 00:42:50 gsrv kernel: unevictable:0 dirty:121 writeback:518 unstable:0
Jun 23 00:42:50 gsrv kernel: free:62747 slab_reclaimable:4175 slab_unreclaimable:5758
Jun 23 00:42:50 gsrv kernel: mapped:353410 shmem:12 pagetables:3102 bounce:0
Jun 23 00:42:50 gsrv kernel: Node 0 DMA free:4016kB min:40kB low:48kB high:60kB active_anon:4kB inactive_anon:180kB active_file:7108kB inactive_file:4256kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15368kB mlocked:0kB dirty:0kB writeback:0kB mapped:352kB shmem:0kB slab_reclaimable:68kB slab_unreclaimable:12kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 23 00:42:50 gsrv kernel: lowmem_reserve[]: 0 994 994 994
Jun 23 00:42:50 gsrv kernel: Node 0 DMA32 free:49704kB min:2828kB low:3532kB high:4240kB active_anon:38636kB inactive_anon:82280kB active_file:85028kB inactive_file:85664kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018080kB mlocked:0kB dirty:484kB writeback:2072kB mapped:632228kB shmem:48kB slab_reclaimable:8860kB slab_unreclaimable:12720kB kernel_stack:1816kB pagetables:4888kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 23 00:42:50 gsrv kernel: lowmem_reserve[]: 0 0 0 0
Jun 23 00:42:50 gsrv kernel: Node 1 DMA32 free:197268kB min:2872kB low:3588kB high:4308kB active_anon:80kB inactive_anon:452kB active_file:152kB inactive_file:1100kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1033048kB mlocked:0kB dirty:0kB writeback:0kB mapped:781060kB shmem:0kB slab_reclaimable:7772kB slab_unreclaimable:10300kB kernel_stack:1624kB pagetables:7520kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 23 00:42:50 gsrv kernel: lowmem_reserve[]: 0 0 0 0
Jun 23 00:42:50 gsrv kernel: Node 0 DMA: 10*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4016kB
Jun 23 00:42:50 gsrv kernel: Node 0 DMA32: 7030*4kB 2352*8kB 151*16kB 5*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49704kB
Jun 23 00:42:50 gsrv kernel: Node 1 DMA32: 33319*4kB 7925*8kB 35*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 197268kB
Jun 23 00:42:50 gsrv kernel: 55491 total pagecache pages
Jun 23 00:42:50 gsrv kernel: 9644 pages in swap cache
Jun 23 00:42:50 gsrv kernel: Swap cache stats: add 1332611, delete 1322967, find 1158489/1284463
Jun 23 00:42:50 gsrv kernel: Free swap = 4012816kB
Jun 23 00:42:50 gsrv kernel: Total swap = 4200988kB
Jun 23 00:42:50 gsrv kernel: 523986 pages RAM
Jun 23 00:42:50 gsrv kernel: 360536 pages reserved
Jun 23 00:42:50 gsrv kernel: 47358 pages shared
Jun 23 00:42:50 gsrv kernel: 75580 pages non-shared
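The per-zone free-list lines in this dump ("Node 1 DMA32: 33319*4kB 7925*8kB ...") show how free memory is split by block size, which matters because an order:5 request needs a single free block of 128 kB (assuming 4 kB pages). The sketch below parses one such line, copied from the dump above, and counts the blocks large enough for a given order. The function names are made up, and reading this as fragmentation is my interpretation, not a conclusion anyone in the thread has confirmed.

```python
import re


def parse_buddy_line(line):
    """Return {block_size_kb: free_block_count} from a zone free-list line."""
    return {
        int(size_kb): int(count)
        for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line)
    }


def free_blocks_at_or_above(blocks, order, page_kb=4):
    """Count free blocks big enough to satisfy an allocation of this order."""
    min_kb = page_kb << order  # order 5 with 4 kB pages is 128 kB
    return sum(count for size_kb, count in blocks.items() if size_kb >= min_kb)


# Copied from the Node 1 DMA32 line in the log above.
line = (
    "Node 1 DMA32: 33319*4kB 7925*8kB 35*16kB 1*32kB 0*64kB 0*128kB "
    "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 197268kB"
)
blocks = parse_buddy_line(line)
total_kb = sum(size_kb * count for size_kb, count in blocks.items())
print("total free kB:", total_kb)
print("blocks usable for order 5:", free_blocks_at_or_above(blocks, 5))
```

For this zone the block counts add up to the reported 197268 kB of free memory, yet none of it is in blocks of 128 kB or larger.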
I updated the vbox bug report: http://www.virtualbox.org/ticket/6997 but apart from that, I'm afraid there is not much more I can do.