Bug 320677

Summary:	page allocation failure (swapper: page allocation failure. order:4, mode:0x20) sometimes sshd as well
Product:	Gentoo Linux	Reporter:	Gabriel LePage <g>
Component:	[OLD] Core system	Assignee:	Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status:	RESOLVED UPSTREAM
Severity:	normal	CC:	g
Priority:	High
Version:	unspecified
Hardware:	AMD64
OS:	Linux
URL:	http://www.virtualbox.org/ticket/6997
Whiteboard:
Package list:		Runtime testing required:	---

Description Gabriel LePage 2010-05-20 04:49:16 UTC

Multiple page allocation failure errors
I upgraded the kernel from 2.6.31-r6 to 2.6.32-r7, and this seems to have reduced the number of page allocation failures.

The vast majority of the time they take the form "swapper: page allocation failure. order:4, mode:0x20"; occasionally they come from sshd instead.

When I was using kernel 2.6.31-r6 they never occurred (the machine has been running this install for 1 year) until I did an emerge --sync about a week ago, followed by emerge --update --deep --newuse world.

After that I started to get page allocation failures. So many packages were updated in that process that I will never know which upgrade caused the issue. (Before the package upgrade, the machine had run 67 days since the last reboot without a single error or issue, and there had been no errors since the Gentoo install.)

I thought that upgrading to the most recent kernel at the time, 2.6.32-r7, would alleviate the issue. The page allocation failures happen less frequently but still occur.

At this time this does not appear to impact my machine/server in any negative way. But it is important to me to have everything running tip top so to speak.

I have spent considerable time searching, to no avail.....



Reproducible: Always

Steps to Reproduce:
Not sure I can purposely initiate the error.
Actual Results:  
swapper: page allocation failure. order:4, mode:0x20
Pid: 0, comm: swapper Not tainted 2.6.32-gentoo-r7 #1
Call Trace:
 <IRQ>  [<ffffffff8107f082>] 0xffffffff8107f082
 [<ffffffff810a29a4>] 0xffffffff810a29a4
 [<ffffffff810a2b9f>] 0xffffffff810a2b9f
 [<ffffffff810a2d35>] 0xffffffff810a2d35
 [<ffffffff810a2de3>] 0xffffffff810a2de3
 [<ffffffff810a2e4e>] 0xffffffff810a2e4e
 [<ffffffff813ed961>] 0xffffffff813ed961
 [<ffffffff813ee21b>] 0xffffffff813ee21b
 [<ffffffffa07d8499>] 0xffffffffa07d8499
 [<ffffffff813f4a3d>] 0xffffffff813f4a3d
 [<ffffffff81406417>] 0xffffffff81406417
 [<ffffffff813f4f57>] 0xffffffff813f4f57
 [<ffffffff8143217d>] 0xffffffff8143217d
 [<ffffffff81432294>] 0xffffffff81432294
 [<ffffffff8143122f>] 0xffffffff8143122f
 [<ffffffff81431aba>] 0xffffffff81431aba
 [<ffffffff8107e00e>] ? 0xffffffff8107e00e
 [<ffffffff81443473>] 0xffffffff81443473
 [<ffffffff81445a4b>] 0xffffffff81445a4b
 [<ffffffff813ec99d>] ? 0xffffffff813ec99d
 [<ffffffff81445b8b>] 0xffffffff81445b8b
 [<ffffffff8144162a>] 0xffffffff8144162a
 [<ffffffff81468393>] ? 0xffffffff81468393
 [<ffffffff81448067>] 0xffffffff81448067
 [<ffffffff8103d570>] ? 0xffffffff8103d570
 [<ffffffff81448710>] 0xffffffff81448710
 [<ffffffff81412728>] ? 0xffffffff81412728
 [<ffffffff8142d43c>] ? 0xffffffff8142d43c
 [<ffffffff8142d57a>] 0xffffffff8142d57a
 [<ffffffff8142d6e9>] 0xffffffff8142d6e9
 [<ffffffff8142d138>] 0xffffffff8142d138
 [<ffffffff8142d405>] 0xffffffff8142d405
 [<ffffffffa07d84e5>] ? 0xffffffffa07d84e5
 [<ffffffff813f3e17>] 0xffffffff813f3e17
 [<ffffffff813f3f9b>] 0xffffffff813f3f9b
 [<ffffffff813f4426>] 0xffffffff813f4426
 [<ffffffff813b096e>] 0xffffffff813b096e
 [<ffffffff813f4504>] 0xffffffff813f4504
 [<ffffffff812fd555>] ? 0xffffffff812fd555
 [<ffffffff8103d691>] 0xffffffff8103d691
 [<ffffffff812fd624>] ? 0xffffffff812fd624
 [<ffffffff8100cb2c>] 0xffffffff8100cb2c
 [<ffffffff8100e583>] 0xffffffff8100e583
 [<ffffffff8103d38c>] 0xffffffff8103d38c
 [<ffffffff8100dc88>] 0xffffffff8100dc88
 [<ffffffff8100c393>] 0xffffffff8100c393
 <EOI>  [<ffffffff81012828>] ? 0xffffffff81012828
 [<ffffffff81012b69>] ? 0xffffffff81012b69
 [<ffffffff8104f12f>] ? 0xffffffff8104f12f
 [<ffffffff8100ae00>] ? 0xffffffff8100ae00
 [<ffffffff814de83f>] ? 0xffffffff814de83f
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 163
CPU    1: hi:  186, btch:  31 usd:   4
CPU    2: hi:  186, btch:  31 usd:  29
CPU    3: hi:  186, btch:  31 usd:   2
Node 1 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:   3
CPU    1: hi:  186, btch:  31 usd:  22
CPU    2: hi:  186, btch:  31 usd:   2
CPU    3: hi:  186, btch:  31 usd:  59
active_anon:49380 inactive_anon:93037 isolated_anon:27
 active_file:23029 inactive_file:64869 isolated_file:0
 unevictable:231748 dirty:33 writeback:0 unstable:0
 free:7864 slab_reclaimable:5126 slab_unreclaimable:6956
 mapped:25610 shmem:19 pagetables:2623 bounce:0
Node 0 DMA free:4020kB min:40kB low:48kB high:60kB active_anon:1172kB inactive_anon:1756kB active_file:3716kB inactive_file:4308kB unevictable:776kB isolated(anon):0kB isolated(file):0kB present:15368kB mlocked:776kB dirty:0kB writeback:0kB mapped:80kB shmem:0kB slab_reclaimable:92kB slab_unreclaimable:108kB kernel_stack:0kB pagetables:4kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 994 994 994
Node 0 DMA32 free:12212kB min:2828kB low:3532kB high:4240kB active_anon:108424kB inactive_anon:154224kB active_file:40036kB inactive_file:75924kB unevictable:546228kB isolated(anon):0kB isolated(file):0kB present:1018080kB mlocked:546228kB dirty:36kB writeback:0kB mapped:38020kB shmem:16kB slab_reclaimable:8480kB slab_unreclaimable:15628kB kernel_stack:2312kB pagetables:4552kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:35 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 1 DMA32 free:15224kB min:2872kB low:3588kB high:4308kB active_anon:87924kB inactive_anon:216168kB active_file:48364kB inactive_file:179244kB unevictable:379988kB isolated(anon):108kB isolated(file):0kB present:1033048kB mlocked:379988kB dirty:96kB writeback:0kB mapped:64340kB shmem:60kB slab_reclaimable:11932kB slab_unreclaimable:12088kB kernel_stack:784kB pagetables:5936kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:43 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 3*16kB 2*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4020kB
Node 0 DMA32: 1917*4kB 412*8kB 72*16kB 3*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12212kB
Node 1 DMA32: 2138*4kB 658*8kB 88*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15224kB
93483 total pagecache pages
5603 pages in swap cache
Swap cache stats: add 308963, delete 303359, find 276600/282629
Free swap  = 4016516kB
Total swap = 4200988kB
523986 pages RAM
31808 pages reserved
352863 pages shared
143308 pages non-shared


I can include additional examples if needed....
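
Note on reading the Mem-Info block above: order:4 means the kernel wanted 2^4 = 16 contiguous 4 kB pages (64 kB), and the per-zone lines such as "Node 0 DMA32: 1917*4kB ..." list how many free blocks of each size existed at that moment. mode:0x20 should correspond to GFP_ATOMIC on 2.6.32 (an assumption worth checking against include/linux/gfp.h), which would fit the <IRQ> context of the trace. The following is a minimal Python sketch, assuming the line format shown above; the function names are made up for illustration.

```python
import re

PAGE_KB = 4  # base page size on x86_64


def parse_free_list(line):
    """Return {block_size_kB: count} from one Mem-Info free-list line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def blocks_satisfying(line, order):
    """Count free blocks at least as large as 2**order pages."""
    need_kb = PAGE_KB << order
    return sum(n for size, n in parse_free_list(line).items() if size >= need_kb)


# Line copied from the Mem-Info output in this report.
node0_dma32 = ("Node 0 DMA32: 1917*4kB 412*8kB 72*16kB 3*32kB 0*64kB 0*128kB "
               "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12212kB")

print(blocks_satisfying(node0_dma32, 4))  # order 4 = 64 kB
```

For this line the count is 0: the zone has about 12 MB free, but none of it in blocks of 64 kB or larger, so an order:4 request that cannot wait for reclaim fails even though memory is not exhausted.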

Expected Results:  
No page allocation failures

Linux 2.6.32-r7

make.conf

CFLAGS="-march=k8 -O2 -pipe"
CXXFLAGS="${CFLAGS}"
# WARNING: Changing your CHOST is not something that should be done lightly.
# Please consult http://www.gentoo.org/doc/en/change-chost.xml before changing.
CHOST="x86_64-pc-linux-gnu"
# These are the USE flags that were used in addition to what is provided by the
# profile used for building.
USE="policykit sqlite gnome-keyring consolekit crypt ctype imap libwww maildir sasl cairo xulrunner apache2 mysql php pam pcre session multilib hal dbus java png svg server jpeg jpeg2k java png truetype X opengl gtk gnome ipv6 unicode xorg dri nptl xml xml2 smp sockets ssl samba mmx sse sse2 -cups"
MAKEOPTS="-j5"
ACCEPT_LICENSE="*"

GENTOO_MIRRORS="ftp://mirror.mcs.anl.gov/pub/gentoo/ "

SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
INPUT_DEVICE="evdev"
VIDEO_CARDS="vesa"

No known or observed hardware errors/failures

The machine has:
2 AMD dual core Opteron 2212
2 GiB Ram

lspci (can make more verbose if needed)

00:01.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge
00:02.0 Host bridge: Broadcom BCM5785 [HT1000] Legacy South Bridge
00:02.2 ISA bridge: Broadcom BCM5785 [HT1000] LPC
00:03.0 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:03.1 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:03.2 USB Controller: Broadcom BCM5785 [HT1000] USB (rev 01)
00:04.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
00:07.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:08.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:09.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:0a.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:0b.0 PCI bridge: Broadcom HT2100 PCI-Express Bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 21)
03:0d.0 PCI bridge: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge (rev c0)
03:0e.0 IDE interface: Broadcom BCM5785 [HT1000] SATA (PATA/IDE Mode)

Using VirtualBox I host three VMs on this box: two Ubuntu 64-bit server virtual machines and one Vista Business 32-bit virtual machine. I use VirtualBox 3.1.8; previously I used 3.0.12. A host-created TAP driver serves two VMs sharing a static IP; the third VM has its own static IP.

The machine is also a VPN server using OpenVPN.

The VMs use a total of 1320 MB of RAM; the rest, about 688 MB, is left for the host.
The machine rarely uses more than 100 MB of swap space; my swap size is 4 GiB.

Software/services running at all times:

Virtualbox (headless)
Postfix
Bind
Dovecot
OpenSSH (sshd) key file access only
Xvnc 
OpenVPN
syslog-ng
vixie-cron
dbus
hald
webmin (sometimes, not really a fan)
mysql
dkim-filter
apache (apache2)


emerge --info

Portage 2.1.8.3 (default/linux/amd64/10.0, gcc-4.3.4, glibc-2.10.1-r1, 2.6.32-gentoo-r7 x86_64)
=================================================================
System uname: Linux-2.6.32-gentoo-r7-x86_64-Dual-Core_AMD_Opteron-tm-_Processor_2212-with-gentoo-1.12.13
Timestamp of tree: Thu, 20 May 2010 03:45:01 +0000
app-shells/bash:     4.0_p37
dev-java/java-config: 2.1.10
dev-lang/python:     2.5.4-r2, 2.6.4-r1
dev-python/pycrypto: 2.1.0_beta1
dev-util/cmake:      2.6.4-r3
sys-apps/baselayout: 1.12.13
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.13, 2.63-r1
sys-devel/automake:  1.4_p6, 1.5, 1.8.5-r3, 1.9.6-r2, 1.10.3, 1.11.1
sys-devel/binutils:  2.18-r3
sys-devel/gcc:       4.3.4
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6b
virtual/os-headers:  2.6.30-r1
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb /var/bind /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=k8 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests distlocks fixpackages news parallel-fetch protect-owned sandbox sfperms strict unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://mirror.mcs.anl.gov/pub/gentoo/ "
LDFLAGS="-Wl,-O1"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="X acl amd64 apache2 berkdb bzip2 cairo cli consolekit cracklib crypt ctype cxx dbus dri fortran gdbm gnome gnome-keyring gpm gtk hal iconv imap ipv6 java jpeg jpeg2k libwww maildir mmx modules mudflap multilib mysql ncurses nls nptl nptlonly opengl openmp pam pcre perl php png policykit pppd python readline reflection samba sasl server session smp sockets spl sqlite sse sse2 ssl svg sysfs tcpd truetype unicode xml xml2 xorg xulrunner zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="vesa" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY

Comment 1 Gabriel LePage 2010-05-20 05:12:32 UTC

All occurrences since Monday 05/17/2010 (last reboot).
Just including the first line of each (can include the full log if needed).

swapper: page allocation failure. order:4, mode:0x20
swapper: page allocation failure. order:4, mode:0x20
sshd: page allocation failure. order:3, mode:0x20
swapper: page allocation failure. order:4, mode:0x20
sshd: page allocation failure. order:3, mode:0x20
VBoxHeadless: page allocation failure. order:3, mode:0x20

Comment 2 George Kadianakis (RETIRED) gentoo-dev 2010-06-09 03:49:57 UTC

Let's have a look at the output of your "free -l" and the contents of your /proc/meminfo.

Comment 3 Gabriel LePage 2010-06-09 04:02:52 UTC

(In reply to comment #2)
> Let's have a look at the output of your "free -l" and the contents of your
> /proc/meminfo.
> 

First, thanks for your reply; some time has gone by.
Nothing new to report. Sometimes I get one page allocation failure a day and sometimes many more.

free -l

             total       used       free     shared    buffers     cached
Mem:       2059028    1931972     127056          0      10152     330576
Low:       2059028    1931972     127056
High:            0          0          0
-/+ buffers/cache:    1591244     467784
Swap:      4200988     233028    3967960


/proc/meminfo

MemTotal:        2059028 kB
MemFree:          127772 kB
Buffers:           10208 kB
Cached:           330740 kB
SwapCached:        52884 kB
Active:           203444 kB
Inactive:         238992 kB
Active(anon):      14528 kB
Inactive(anon):    87068 kB
Active(file):     188916 kB
Inactive(file):   151924 kB
Unevictable:     1313680 kB
Mlocked:         1313680 kB
SwapTotal:       4200988 kB
SwapFree:        3968000 kB
Dirty:                16 kB
Writeback:             0 kB
AnonPages:       1371252 kB
Mapped:           103028 kB
Shmem:                76 kB
Slab:              44208 kB
SReclaimable:      18940 kB
SUnreclaim:        25268 kB
KernelStack:        3016 kB
PageTables:        10064 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5230500 kB
Committed_AS:    2487752 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      102756 kB
VmallocChunk:   34358560396 kB
DirectMap4k:        2888 kB
DirectMap2M:     2093056 kB

Thank you for your assistance
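
Note on the /proc/meminfo figures in the comment above: Unevictable and Mlocked are both 1313680 kB, which is in the same range as the guest RAM figure given in the description, so most of this host's memory looks pinned (presumably by the VirtualBox guests, though that is an inference, not something the output proves). Here is a minimal Python sketch for pulling the ratio out of the text, assuming the usual "Name: value kB" layout; parse_meminfo is a hypothetical helper name.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name.strip()] = int(rest.split()[0])
    return fields


# Subset of the values posted in this report.
sample = """MemTotal:        2059028 kB
MemFree:          127772 kB
Unevictable:     1313680 kB
Mlocked:         1313680 kB
"""

info = parse_meminfo(sample)
locked_share = info["Mlocked"] / info["MemTotal"]
print(f"{locked_share:.1%} of RAM is mlocked")
```

With roughly 64% of RAM locked, the kernel has comparatively little movable or reclaimable memory to rebuild large contiguous blocks from, which is consistent with high-order allocations failing while MemFree still looks healthy.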

Comment 4 George Kadianakis (RETIRED) gentoo-dev 2010-06-09 13:21:47 UTC

It may sound a bit silly, but can you check if your NIC is in jumbo frames mode (MTU > 1500)?
You can see that by running `ifconfig -a` and checking the options of your NIC.

Comment 5 Gabriel LePage 2010-06-09 15:29:05 UTC

MTU is 1500; the NICs are not in jumbo frame mode.

eth0      Link encap:Ethernet  HWaddr OBFUSCATED
          inet addr:OBFUSCATED  Bcast:OBFUSCATED  Mask:255.255.224.0
          inet6 addr: OBFUSCATED Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:128134569 errors:0 dropped:0 overruns:0 frame:0
          TX packets:54245009 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:20527700983 (19.1 GiB)  TX bytes:9734678389 (9.0 GiB)
          Interrupt:33 

eth0:0    Link encap:Ethernet  HWaddr OBFUSCATED
          inet addr:OBFUSCATED  Bcast:OBFUSCATED  Mask:255.255.224.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:33 

tapl2w    Link encap:Ethernet  HWaddr OBFUSCATED
          inet addr:OBFUSCATED  Bcast:OBFUSCATED  Mask:255.255.255.0
          inet6 addr: OBFUSCATED Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:OBFUSCATED  P-t-P:OBFUSCATED  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:11403856 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12293875 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:598856521 (571.1 MiB)  TX bytes:3588866282 (3.3 GiB)
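
The jumbo-frame check above can also be scripted. This is a minimal Python sketch that assumes the older net-tools output format shown in this comment (an "MTU:1500" token on an indented line under each interface name); newer ifconfig builds print "mtu 1500" instead and would need a different pattern. The function name is made up for illustration.

```python
import re


def mtus_from_ifconfig(output):
    """Return {interface: mtu} parsed from old-style `ifconfig -a` text."""
    mtus, current = {}, None
    for line in output.splitlines():
        if line and not line[0].isspace():
            current = line.split()[0]  # interface name starts in column 0
        match = re.search(r"MTU:(\d+)", line)
        if match and current:
            mtus[current] = int(match.group(1))
    return mtus


# Shortened excerpt in the same format as the output in this report.
sample = """eth0      Link encap:Ethernet  HWaddr OBFUSCATED
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
tun0      Link encap:UNSPEC  HWaddr 00-00
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
"""

jumbo = {name: mtu for name, mtu in mtus_from_ifconfig(sample).items() if mtu > 1500}
print(jumbo)
```

An empty result means no interface is above the standard 1500-byte MTU, which matches what is reported here.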

Comment 6 George Kadianakis (RETIRED) gentoo-dev 2010-06-09 15:44:45 UTC

Alright, could you pass that calltrace (or even any other calltrace that you have saved) through call2sym so that we can get something meaningful out of it?
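
For anyone unfamiliar with what such a tool does: resolving a raw trace address means finding the nearest System.map symbol at or below it and printing the offset. The following is a minimal Python sketch of that general idea, not call2sym itself. The three sample entries are illustrative: their start addresses are back-computed from address and offset pairs in a symbolized trace posted later in this report, and the type letters are assumptions.

```python
import bisect


def load_system_map(text):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    return sorted(symbols)


def resolve(symbols, address):
    """Return 'name+0xoffset' for the nearest symbol at or below address."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    start, name = symbols[index]
    return f"{name}+{hex(address - start)}"


# Hypothetical System.map excerpt (see the note above on where the values come from).
system_map = """ffffffff81080479 T __alloc_pages_nodemask
ffffffff810a435d t kmem_getpages
ffffffff813ef29a T __alloc_skb
"""

symbols = load_system_map(system_map)
print(resolve(symbols, 0xffffffff81080a26))
```

This only gives sensible names when the System.map belongs to exactly the kernel build that produced the trace; a map from a different build resolves to wrong symbols or to nothing.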

Comment 7 Gabriel LePage 2010-06-09 16:22:05 UTC

Interestingly enough, call2sym (which I had to search for) produced no output. I suspect this is because System.map does not match the call trace; why that is I have no idea at the moment. Perhaps you have some insight into this.
Would a recompile of the kernel and then a reboot help?

I put the System.map at http://gsxm.net/sm; on my system it was located at /boot/System.map-genkernel-x86_64-2.6.32-gentoo-r7

I put some call traces at http://gsxm.net/ct

My kernel is still a genkernel, but I have customized it to compile in all the drivers and networking components that I know my system needs. The system would probably run without the genkernel kernel and initramfs; this is actually the last genkernel I was going to use before compiling my own kernel entirely.

However this page allocation failure issue is making me hesitant to do so as you might imagine.

Thanks for your help. How should we proceed next?

Comment 8 George Kadianakis (RETIRED) gentoo-dev 2010-06-09 20:04:52 UTC

Even though I tried to find a solution to avoid making you reboot the machine, I think that in the end you will have to do it.

Just enable CONFIG_KALLSYMS in your kernel config and the next time you reboot you will get meaningful call traces.

Unfortunately, I'm totally clueless regarding genkernel so I can't help you with the procedure of enabling CONFIG_KALLSYMS. 
Shouldn't be too hard though.

Comment 9 Gabriel LePage 2010-06-09 21:34:05 UTC

I will compile in CONFIG_KALLSYMS (it is easy to modify the kernel using genkernel) and reboot tonight.

I used genkernel to get the system deployed, while I learned the Linux kernel. I now have a highly customized kernel that is almost ready. Of course I want to make sure the kernel I compile has the necessary drivers and is stable. Rebooting frequently is not a luxury I have.

The goal with the next kernel release is to not use genkernel. This kind of hinges on solving this issue to some degree.

Thanks for your help, I will post when I have recompiled the kernel and have rebooted.

Comment 10 Gabriel LePage 2010-06-10 04:58:14 UTC

Kernel recompiled (with CONFIG_KALLSYMS) and the system is rebooted.

In case it's needed:

Kernel Config
http://gsxm.net/cc

System.map
http://gsxm.net/sm

I will post the first new page allocation that occurs.

Comment 11 Gabriel LePage 2010-06-13 05:38:19 UTC

Update:
Since the recompile of the kernel and the reboot there have been no page allocation failures so far. On occasion it has taken a few days for the first page allocation failure to appear. Typically the rate picks up after the initial occurrence.

I also updated a few packages before the reboot. If you do not mind, leave this bug open for a few more days, to see if another page allocation failure is generated.

Thank you for your help so far...

Comment 12 Gabriel LePage 2010-06-14 04:47:00 UTC

It took three days to start, but here are the page allocation errors. We have meaningful call traces with CONFIG_KALLSYMS compiled in the kernel.
I included only part of the calltrace here.
The rest can be seen at http://gsxm.net/ct
HELP!!!

Below is the first line of each page allocation failure that can be viewed in full at http://gsxm.net/ct

Jun 14 00:02:02 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: imap-login: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: VBoxHeadless: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
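
To summarize a long syslog rather than reading first lines by eye, the failures can be tallied per process and order. This is a minimal Python sketch, assuming the message format shown above; the pattern and function name are illustrative, and the sample text is the five lines from this comment.

```python
import re
from collections import Counter

FAILURE = re.compile(
    r"(\S+): page allocation failure\. order:(\d+), mode:(0x[0-9a-f]+)")


def tally(log_text):
    """Count failures by (process name, order)."""
    counts = Counter()
    for match in FAILURE.finditer(log_text):
        comm, order, _mode = match.groups()
        counts[(comm, int(order))] += 1
    return counts


log = """Jun 14 00:02:02 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: imap-login: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: VBoxHeadless: page allocation failure. order:5, mode:0x20
Jun 14 00:02:08 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
"""

for (comm, order), n in sorted(tally(log).items()):
    print(f"{comm} order:{order} x{n}")
```

The process named in each line is only whatever happened to be running when the interrupt fired, so the spread across swapper, imap-login and VBoxHeadless does not by itself say which code made the failing request; the call trace does.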

Comment 13 George Kadianakis (RETIRED) gentoo-dev 2010-06-14 13:00:01 UTC

Alright, it seems like all the call traces lead to vboxNetFltOsInitInstance() before spitting the error.

I reported the issue in the VirtualBox bugzilla here: 
http://www.virtualbox.org/ticket/6997

By the way, if you are not content with http://gsxm.net/ct appearing in the vbox report, just remove the file and we will find another way of linking to the call trace.
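
One way to check a set of traces for a common module frame is to pick out the frames that carry a "[module]" tag and are not marked unreliable with "?". The following is a minimal Python sketch, assuming the frame format produced with CONFIG_KALLSYMS; the regular expression and function name are illustrative, and the sample lines are an excerpt from a trace posted in this report.

```python
import re

FRAME = re.compile(
    r"\[<([0-9a-f]+)>\] (\? )?(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def module_frames(trace_text):
    """Return (symbol, module) for reliable frames that belong to a loadable module."""
    frames = []
    for match in FRAME.finditer(trace_text):
        _addr, unreliable, symbol, module = match.groups()
        if module and not unreliable:
            frames.append((symbol, module))
    return frames


trace = """[<ffffffff813ef305>] __alloc_skb+0x6b/0x164
[<ffffffff813efbbf>] skb_copy+0x30/0x97
[<ffffffffa09803d9>] vboxNetFltOsInitInstance+0x35d/0xa00 [vboxnetflt]
[<ffffffff813ea3e0>] ? sock_def_readable+0x6a/0x6f
[<ffffffff813f63e1>] dev_hard_start_xmit+0x152/0x2fa
"""

print(module_frames(trace))
```

In this excerpt the only module frame is the one in vboxnetflt, sitting directly below skb_copy and __alloc_skb.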

Comment 14 Gabriel LePage 2010-06-14 15:02:04 UTC

Once again thanks!
During the time frame I have been getting page allocation failures, I have used several VirtualBox versions, including 3.0.12, 3.1.8, and 3.2.2, as well as others for short periods of time.

I am fine with http://gsxm.net/ct appearing in the report. Perhaps at some point in the future, for the long-term archival of this bug, it could be included as an attachment, but that is up to you guys.

There were issues getting the gentoo bugzilla attachment capability to work, that is why I decided to host the attachment.

You seem pretty confident that VirtualBox is causing the issue. I wonder how many others are having similar issues.

Please let me know if there is any more information you need.

Comment 15 George Kadianakis (RETIRED) gentoo-dev 2010-06-14 20:42:09 UTC

Yes, I am indeed a bit overconfident that VirtualBox is to blame, but it is the common starting point in all call traces.

I also found another vbox bug report that seems of interest [1]. Can you try disabling GSO on your NIC, like this: ethtool -K <interface> gso off

[1]: http://www.virtualbox.org/ticket/5260
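
To confirm the setting took effect, the current offload state can be read back with `ethtool -k <interface>` (lower-case k). This is a minimal Python sketch that parses that output, with two assumptions: the sample text below is illustrative and not taken from the reporter's machine, and the label may be spelled with spaces or hyphens depending on the ethtool version, so the sketch accepts both.

```python
def gso_enabled(ethtool_k_output):
    """Return True or False for the GSO line of `ethtool -k`, or None if absent."""
    for line in ethtool_k_output.splitlines():
        name, _, value = line.partition(":")
        if name.strip().replace("-", " ") == "generic segmentation offload":
            words = value.split()
            return words[0] == "on" if words else None
    return None


# Illustrative output only; real output lists more features.
sample = """Offload parameters for eth0:
scatter-gather: on
tcp segmentation offload: on
generic segmentation offload: off
"""

print(gso_enabled(sample))
```

Settings made with ethtool -K do not survive a reboot on their own, so the command needs to be reapplied from a startup script if it turns out to help.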

Comment 16 Gabriel LePage 2010-06-14 22:44:00 UTC

I will give the 'ethtool -K <interface> gso off' a shot. 

My setup is as follows, in case it's helpful:

Three virtual machines, all run headless, on a Gentoo 64-bit host with 2 GiB RAM (will be expanding soon enough) and AMD-V enabled in the BIOS for VirtualBox virtualization support. The CPUs are two dual-core AMD Opteron 2212s, for a total of four cores. This Gentoo install is over a year old; it was the first Gentoo system I built. I have upgraded the kernel several times.

I have used both the OSE and the PUEL versions of VirtualBox, currently PUEL.

One Ubuntu 64-bit guest (384MB RAM, 1 CPU, Intel PRO/1000 MT Desktop) is bridged via VirtualBox to the host Ethernet adapter (Broadcom NIC, tg3 driver) and has its own WAN static IP.

Two other guests, Vista Business 32-bit (512MB, 2 CPUs, PCnet-FAST III) and another Ubuntu 64-bit (384MB RAM, 1 CPU, PCnet-FAST III), are bridged via VirtualBox to a TAP adapter created on the host. These two machines share a WAN static IP via NAT/SNAT running on the host using iptables, so basically I set up a small virtual LAN for these two guests.

As a side note, I have experimented with all of the VirtualBox guest network adapters; it seems to make no difference which one is used with regard to the page allocation errors.

This is basically a production machine (currently testing a hosting concept, but it needs high availability). The host has its own WAN static IP as well.

Interestingly enough, my machine defaulted to the tg3 driver, and I noticed that at the link you sent (http://www.virtualbox.org/ticket/5260) someone discusses your suggestion specifically for tg3.

If this fix works, how 'good' of a fix is it, and what are the ramifications? I will research this as well.

It was not my intent to insinuate you were overconfident ;)

Thanks for your help!

Comment 17 George Kadianakis (RETIRED) gentoo-dev 2010-06-14 22:50:02 UTC

(In reply to comment #16)
> If this fix works how 'good' of a fix is this? and what are the ramifications?
> I will research this as well.

If that is indeed your problem, it is simply a matter of a memory leak when GSO is enabled.
There seems to be a patch that plugs the leak in that same vbox bug report [1], so if that is your problem, patching vbox should let you use GSO without page allocation failures.
tl;dr: it's a clean solution, not a dirty hack.

[1]: http://www.virtualbox.org/attachment/ticket/5260/diff_vboxnetflt_linux

Comment 18 Gabriel LePage 2010-06-23 07:57:45 UTC

OK, a new page allocation failure; remember that GSO has been off.

Jun 23 00:42:44 gsrv kernel: swapper: page allocation failure. order:5, mode:0x20
Jun 23 00:42:44 gsrv kernel: Pid: 0, comm: swapper Not tainted 2.6.32-gentoo-r7 #1
Jun 23 00:42:44 gsrv kernel: Call Trace:
Jun 23 00:42:44 gsrv kernel: <IRQ>  [<ffffffff81080a26>] __alloc_pages_nodemask+0x5ad/0x5f7
Jun 23 00:42:44 gsrv kernel: [<ffffffff81034676>] ? select_task_rq_fair+0x6e9/0x748
Jun 23 00:42:44 gsrv kernel: [<ffffffff810a43b0>] kmem_getpages+0x53/0x119
Jun 23 00:42:44 gsrv kernel: [<ffffffff810a45ab>] fallback_alloc+0x135/0x1ab
Jun 23 00:42:44 gsrv kernel: [<ffffffff810a4741>] ____cache_alloc_node+0x120/0x135
Jun 23 00:42:44 gsrv kernel: [<ffffffff810a47ef>] kmem_cache_alloc_node+0x99/0xc1
Jun 23 00:42:44 gsrv kernel: [<ffffffff810a485a>] __kmalloc_node+0x43/0x45
Jun 23 00:42:49 gsrv kernel: [<ffffffff813ef305>] __alloc_skb+0x6b/0x164
Jun 23 00:42:49 gsrv kernel: [<ffffffff813efbbf>] skb_copy+0x30/0x97
Jun 23 00:42:49 gsrv kernel: [<ffffffffa09803d9>] vboxNetFltOsInitInstance+0x35d/0xa00 [vboxnetflt]
Jun 23 00:42:49 gsrv kernel: [<ffffffff813ea3e0>] ? sock_def_readable+0x6a/0x6f
Jun 23 00:42:49 gsrv kernel: [<ffffffff813ec821>] ? __skb_clone+0x29/0xf1
Jun 23 00:42:49 gsrv kernel: [<ffffffff813f63e1>] dev_hard_start_xmit+0x152/0x2fa
Jun 23 00:42:49 gsrv kernel: [<ffffffff81407dbb>] sch_direct_xmit+0x5e/0x160
Jun 23 00:42:49 gsrv kernel: [<ffffffff813f68fb>] dev_queue_xmit+0x25f/0x3ca
Jun 23 00:42:49 gsrv kernel: [<ffffffff81433b21>] ip_finish_output+0x27f/0x2c5
Jun 23 00:42:49 gsrv kernel: [<ffffffff81433c38>] ip_output+0xd1/0xde
Jun 23 00:42:49 gsrv kernel: [<ffffffff81432bd3>] ip_local_out+0x20/0x24
Jun 23 00:42:49 gsrv kernel: [<ffffffff8143345e>] ip_queue_xmit+0x2e5/0x35e
Jun 23 00:42:49 gsrv kernel: [<ffffffff8107f99b>] ? free_hot_cold_page+0x1aa/0x22e
Jun 23 00:42:49 gsrv kernel: [<ffffffff8107fa67>] ? free_hot_page+0xb/0xd
Jun 23 00:42:49 gsrv kernel: [<ffffffff81444e17>] tcp_transmit_skb+0x635/0x674
Jun 23 00:42:49 gsrv kernel: [<ffffffff814473ef>] tcp_write_xmit+0x83f/0x924
Jun 23 00:42:49 gsrv kernel: [<ffffffff814e5bd1>] ? _spin_lock+0x16/0x2e
Jun 23 00:42:49 gsrv kernel: [<ffffffff8144752f>] __tcp_push_pending_frames+0x2a/0x81
Jun 23 00:42:49 gsrv kernel: [<ffffffff814427ff>] tcp_rcv_established+0x10b/0xacd
Jun 23 00:42:49 gsrv kernel: [<ffffffff81469d37>] ? ipv4_confirm+0x161/0x179
Jun 23 00:42:50 gsrv kernel: [<ffffffff81449a0b>] tcp_v4_do_rcv+0x31/0x1d7
Jun 23 00:42:50 gsrv kernel: [<ffffffff81436186>] ? __inet_lookup_established+0x1e1/0x263
Jun 23 00:42:50 gsrv kernel: [<ffffffff8103d555>] ? local_bh_enable+0x82/0x9b
Jun 23 00:42:50 gsrv kernel: [<ffffffff8144a0b4>] tcp_v4_rcv+0x503/0x79a
Jun 23 00:42:50 gsrv kernel: [<ffffffff814140b5>] ? nf_hook_slow+0xcc/0xf4
Jun 23 00:42:50 gsrv kernel: [<ffffffff8142ede0>] ? ip_local_deliver_finish+0x0/0x23b
Jun 23 00:42:50 gsrv kernel: [<ffffffff8142ef1e>] ip_local_deliver_finish+0x13e/0x23b
Jun 23 00:42:50 gsrv kernel: [<ffffffff8142f08d>] ip_local_deliver+0x72/0x7a
Jun 23 00:42:50 gsrv kernel: [<ffffffff8142eadc>] ip_rcv_finish+0x37c/0x396
Jun 23 00:42:50 gsrv kernel: [<ffffffff8142eda9>] ip_rcv+0x2b3/0x2ea
Jun 23 00:42:50 gsrv kernel: [<ffffffff813f57bb>] netif_receive_skb+0x4a1/0x4e7
Jun 23 00:42:50 gsrv kernel: [<ffffffff813f593f>] napi_skb_finish+0x2b/0x42
Jun 23 00:42:50 gsrv kernel: [<ffffffff813f5dca>] napi_gro_receive+0x2a/0x2f
Jun 23 00:42:50 gsrv kernel: [<ffffffff813b2312>] tg3_poll+0x711/0x965
Jun 23 00:42:50 gsrv kernel: [<ffffffff8139c3b1>] ? ata_hsm_qc_complete+0xf1/0x114
Jun 23 00:42:50 gsrv kernel: [<ffffffff813f5ea8>] net_rx_action+0x74/0x145
Jun 23 00:42:50 gsrv kernel: [<ffffffff812feed1>] ? add_timer_randomness+0x129/0x14b
Jun 23 00:42:50 gsrv kernel: [<ffffffff8103d68d>] __do_softirq+0xa9/0x134
Jun 23 00:42:50 gsrv kernel: [<ffffffff812fefa0>] ? add_interrupt_randomness+0x24/0x28
Jun 23 00:42:50 gsrv kernel: [<ffffffff8100cb2c>] call_softirq+0x1c/0x28
Jun 23 00:42:50 gsrv kernel: [<ffffffff8100e583>] do_softirq+0x33/0x6b
Jun 23 00:42:50 gsrv kernel: [<ffffffff8103d388>] irq_exit+0x36/0x7e
Jun 23 00:42:50 gsrv kernel: [<ffffffff8100dc88>] do_IRQ+0xa6/0xbd
Jun 23 00:42:50 gsrv kernel: [<ffffffff8100c393>] ret_from_intr+0x0/0xa
Jun 23 00:42:50 gsrv kernel: <EOI>  [<ffffffff81012840>] ? default_idle+0x22/0x37
Jun 23 00:42:50 gsrv kernel: [<ffffffff81012b81>] ? c1e_idle+0xde/0xe5
Jun 23 00:42:50 gsrv kernel: [<ffffffff8104f13b>] ? atomic_notifier_call_chain+0xf/0x11
Jun 23 00:42:50 gsrv kernel: [<ffffffff8100ae00>] ? cpu_idle+0x52/0x9e
Jun 23 00:42:50 gsrv kernel: [<ffffffff814e01ee>] ? start_secondary+0x19c/0x1a1
Jun 23 00:42:50 gsrv kernel: Mem-Info:
Jun 23 00:42:50 gsrv kernel: Node 0 DMA per-cpu:
Jun 23 00:42:50 gsrv kernel: CPU    0: hi:    0, btch:   1 usd:   0
Jun 23 00:42:50 gsrv kernel: CPU    1: hi:    0, btch:   1 usd:   0
Jun 23 00:42:50 gsrv kernel: CPU    2: hi:    0, btch:   1 usd:   0
Jun 23 00:42:50 gsrv kernel: CPU    3: hi:    0, btch:   1 usd:   0
Jun 23 00:42:50 gsrv kernel: Node 0 DMA32 per-cpu:
Jun 23 00:42:50 gsrv kernel: CPU    0: hi:  186, btch:  31 usd: 157
Jun 23 00:42:50 gsrv kernel: CPU    1: hi:  186, btch:  31 usd: 182
Jun 23 00:42:50 gsrv kernel: CPU    2: hi:  186, btch:  31 usd:  42
Jun 23 00:42:50 gsrv kernel: CPU    3: hi:  186, btch:  31 usd: 183
Jun 23 00:42:50 gsrv kernel: Node 1 DMA32 per-cpu:
Jun 23 00:42:50 gsrv kernel: CPU    0: hi:  186, btch:  31 usd:  42
Jun 23 00:42:50 gsrv kernel: CPU    1: hi:  186, btch:  31 usd: 172
Jun 23 00:42:50 gsrv kernel: CPU    2: hi:  186, btch:  31 usd:  77
Jun 23 00:42:50 gsrv kernel: CPU    3: hi:  186, btch:  31 usd: 143
Jun 23 00:42:50 gsrv kernel: active_anon:9680 inactive_anon:20728 isolated_anon:0
Jun 23 00:42:50 gsrv kernel: active_file:23072 inactive_file:22755 isolated_file:0
Jun 23 00:42:50 gsrv kernel: unevictable:0 dirty:121 writeback:518 unstable:0
Jun 23 00:42:50 gsrv kernel: free:62747 slab_reclaimable:4175 slab_unreclaimable:5758
Jun 23 00:42:50 gsrv kernel: mapped:353410 shmem:12 pagetables:3102 bounce:0
Jun 23 00:42:50 gsrv kernel: Node 0 DMA free:4016kB min:40kB low:48kB high:60kB active_anon:4kB inactive_anon:180kB active_file:7108kB inactive_file:4256kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15368kB mlocked:0kB dirty:0kB writeback:0kB mapped:352kB shmem:0kB slab_reclaimable:68kB slab_unreclaimable:12kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 23 00:42:50 gsrv kernel: lowmem_reserve[]: 0 994 994 994
Jun 23 00:42:50 gsrv kernel: Node 0 DMA32 free:49704kB min:2828kB low:3532kB high:4240kB active_anon:38636kB inactive_anon:82280kB active_file:85028kB inactive_file:85664kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018080kB mlocked:0kB dirty:484kB writeback:2072kB mapped:632228kB shmem:48kB slab_reclaimable:8860kB slab_unreclaimable:12720kB kernel_stack:1816kB pagetables:4888kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 23 00:42:50 gsrv kernel: lowmem_reserve[]: 0 0 0 0
Jun 23 00:42:50 gsrv kernel: Node 1 DMA32 free:197268kB min:2872kB low:3588kB high:4308kB active_anon:80kB inactive_anon:452kB active_file:152kB inactive_file:1100kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1033048kB mlocked:0kB dirty:0kB writeback:0kB mapped:781060kB shmem:0kB slab_reclaimable:7772kB slab_unreclaimable:10300kB kernel_stack:1624kB pagetables:7520kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 23 00:42:50 gsrv kernel: lowmem_reserve[]: 0 0 0 0
Jun 23 00:42:50 gsrv kernel: Node 0 DMA: 10*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4016kB
Jun 23 00:42:50 gsrv kernel: Node 0 DMA32: 7030*4kB 2352*8kB 151*16kB 5*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 49704kB
Jun 23 00:42:50 gsrv kernel: Node 1 DMA32: 33319*4kB 7925*8kB 35*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 197268kB
Jun 23 00:42:50 gsrv kernel: 55491 total pagecache pages
Jun 23 00:42:50 gsrv kernel: 9644 pages in swap cache
Jun 23 00:42:50 gsrv kernel: Swap cache stats: add 1332611, delete 1322967, find 1158489/1284463
Jun 23 00:42:50 gsrv kernel: Free swap  = 4012816kB
Jun 23 00:42:50 gsrv kernel: Total swap = 4200988kB
Jun 23 00:42:50 gsrv kernel: 523986 pages RAM
Jun 23 00:42:50 gsrv kernel: 360536 pages reserved
Jun 23 00:42:50 gsrv kernel: 47358 pages shared
Jun 23 00:42:50 gsrv kernel: 75580 pages non-shared
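
A note on reading the free-list lines in the dump above: an order:4 request asks for 2^4 = 16 physically contiguous pages, which is 64kB assuming the usual 4kB page size on AMD64. The sketch below is only an illustration, not part of the original report. The helper name blocks_at_or_above is hypothetical and the three input strings are copied from the "Node ... :" lines of this log. It counts how many free blocks of at least 64kB each zone lists.

```python
import re

PAGE_KB = 4  # assumption: 4kB pages, as is usual on x86-64


def blocks_at_or_above(line, order):
    """Count free blocks of at least 2**order pages in one buddy free-list line."""
    min_kb = PAGE_KB << order  # order 4 -> 64kB
    pairs = re.findall(r"(\d+)\*(\d+)kB", line)  # e.g. ("7030", "4")
    return sum(int(count) for count, size_kb in pairs if int(size_kb) >= min_kb)


# Free-list lines copied from the log above (syslog prefix removed).
free_lists = {
    "Node 0 DMA": "10*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 1*256kB "
                  "1*512kB 1*1024kB 1*2048kB 0*4096kB = 4016kB",
    "Node 0 DMA32": "7030*4kB 2352*8kB 151*16kB 5*32kB 1*64kB 1*128kB 0*256kB "
                    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 49704kB",
    "Node 1 DMA32": "33319*4kB 7925*8kB 35*16kB 1*32kB 0*64kB 0*128kB 0*256kB "
                    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 197268kB",
}

for zone, line in free_lists.items():
    print(zone, blocks_at_or_above(line, 4))
```

On this dump, Node 1 DMA32 reports 197268kB free but lists no block of 64kB or larger, and Node 0 DMA32 lists only two. That suggests the failures come from memory fragmentation rather than from an overall shortage of free memory, though the counts alone do not show how the allocator's watermark and reserve checks were applied.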

Comment 19 George Kadianakis (RETIRED) gentoo-dev

2010-06-23 08:10:38 UTC

I updated the VirtualBox bug report:
http://www.virtualbox.org/ticket/6997
Apart from that, I'm afraid there is not much more I can do.