I have to apologize up front for the vagueness of this report, but I'll be happy to supply any info that might help clear things up. I'm running Gentoo on a P4 2.8GHz machine with 512M RAM. The server is for a church which offers downloadable sermons, streaming sermons (RealAudio via Helix Server) and e-mail service.

Over the last week or two, I've noticed that my system has started running out of memory every 1-2 days. Since the downloads are rather popular, I had thought Apache might be the culprit. However, even with the web server shut down entirely for a day, the system still runs out of memory.

Here are the symptoms. First, a snapshot of "top" output, sorted by %MEM (showing everything using 0.2% of memory or more):

--------------------------------------------------------------
top - 10:39:39 up 11:25,  2 users,  load average: 0.19, 0.22, 0.08
Tasks:  72 total,   1 running,  71 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% user,  0.2% system,  0.0% nice, 99.8% idle
Mem:    514704k total,   491876k used,    22828k free,    68896k buffers
Swap:   506036k total,    63748k used,   442288k free,   121444k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
14342 amavis     9   0 30072  21m 7428 S  0.0  4.3   0:00.72 amavisd
14339 amavis     9   0 29944  21m 7464 S  0.0  4.3   0:00.62 amavisd
 3902 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.03 hlxserverplus
 3904 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.01 hlxserverplus
 3905 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.03 hlxserverplus
 3906 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.01 hlxserverplus
 3907 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.01 hlxserverplus
 3908 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.01 hlxserverplus
 3909 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.00 hlxserverplus
 3910 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.00 hlxserverplus
 3911 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.01 hlxserverplus
 3912 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.00 hlxserverplus
 3913 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.00 hlxserverplus
 3914 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.04 hlxserverplus
 3915 root       9   0 46672  13m 3360 S  0.0  2.8   0:00.03 hlxserverplus
 4465 xfs        9   0  4864 4836  904 S  0.0  0.9   0:00.12 xfs
 3969 ntp        9   0  4292 4292 3108 S  0.0  0.8   0:00.38 ntpd
 1781 amavis     9   0 27996 2912 2212 S  0.0  0.6   0:00.81 amavisd
 3433 root       9   0  2760 1980 1308 S  0.0  0.4   0:00.10 cupsd
14763 david      9   0  1964 1908 1716 S  0.0  0.4   0:01.63 sshd
14764 david      9   0  1952 1896 1716 S  0.0  0.4   0:00.02 sshd
 4215 root       8   0  2468 1880 1620 S  0.0  0.4   0:00.02 smbd
14759 root       9   0  1856 1800 1644 S  0.0  0.3   0:00.00 sshd
14761 root       9   0  1856 1800 1644 S  0.0  0.3   0:00.00 sshd
 4217 root       9   0  1776 1600 1344 S  0.0  0.3   0:00.13 nmbd
 4162 postfix    9   0  1576 1576 1256 S  0.0  0.3   0:00.22 qmgr
 4111 root       9   0  1500 1500 1200 S  0.0  0.3   0:00.39 master
14979 postfix    9   0  1488 1488 1200 S  0.0  0.3   0:00.01 flush
13441 postfix    9   0  1472 1472 1192 S  0.0  0.3   0:00.01 pickup
14783 root       0   0  1304 1304 1064 S  0.0  0.3   0:00.01 bash
14778 root       9   0  1292 1292 1056 S  0.0  0.3   0:00.00 bash
14765 david      9   0  1280 1280 1056 S  0.0  0.2   0:00.01 bash
14768 david      9   0  1280 1280 1056 S  0.0  0.2   0:00.00 bash
 3903 root       9   0  1296 1236 1204 S  0.0  0.2   0:00.00 hlxserverplus
 3820 root       9   0  1268 1220 1184 S  0.0  0.2   0:00.01 hlxserverplus
 3363 root       9   0  1244 1136 1084 S  0.0  0.2   0:00.01 sshd
 4165 root       9   0  1144 1044  916 S  0.0  0.2   0:00.00 pure-ftpd
14781 root      15   0  1012 1012  800 R  0.3  0.2   0:02.22 top
14782 root       9   0   948  948  752 S  0.0  0.2   0:00.00 su
14777 root       9   0   944  944  752 S  0.0  0.2   0:00.01 su
 1874 root       9   0  1092  940  940 S  0.0  0.2   0:00.01 mysqld_safe
 3693 dhcp       9   0  1356  924  856 S  0.0  0.2   0:00.00 dhcpd
 1982 mysql      9   0  3712  916  904 S  0.0  0.2   0:00.02 mysqld
 2255 mysql      9   0  3712  916  904 S  0.0  0.2   0:00.00 mysqld
 2256 mysql      9   0  3712  916  904 S  0.0  0.2   0:00.00 mysqld
 2277 mysql      9   0  3712  916  904 S  0.0  0.2   0:00.00 mysqld
  154 root       9   0   860  788  628 S  0.0  0.2   0:00.02 devfsd
 1544 root       9   0   792  776  536 S  0.0  0.2   0:00.84 syslog-ng
--------------------------------------------------------------

Now here's "free -kt":

--------------------------------------------------------------
             total       used       free     shared    buffers     cached
Mem:        514704     496336      18368          0      70068     122636
-/+ buffers/cache:     303632     211072
Swap:       506036      63748     442288
Total:     1020740     560084     460656
--------------------------------------------------------------

At this point, my system is nearly out of physical memory, although it looks like it still has swap space. Eventually that will run out too, though, and I'll start getting log entries like this in /var/log/messages:

--------------------------------------------------------------
Apr 26 19:03:02 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:03:02 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 19:03:02 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:03:20 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:07:19 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:07:19 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 19:07:21 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:07:21 www VM: killing process apache2
Apr 26 19:07:59 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:08:54 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:08:55 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 19:08:55 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:01 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:01 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:01 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:01 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:04 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 19:09:05 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 19:09:06 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:06 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:15 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:19 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:19 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:21 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 19:09:21 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 19:09:21 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:09:21 www VM: killing process apache2
Apr 26 19:09:21 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:18:22 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 20:18:24 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 20:18:24 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 20:18:24 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:18:51 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:18:51 www VM: killing process amavisd
Apr 26 20:18:51 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:41:01 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:41:28 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 20:41:28 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:41:30 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:41:30 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:41:30 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:41:30 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:41:30 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:41:34 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:50:16 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 20:50:17 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:50:17 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 20:50:17 www VM: killing process index.cgi
Apr 26 20:50:17 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 20:50:17 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 21:20:28 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 21:20:30 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 21:20:31 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 21:20:35 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 21:20:36 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 21:20:38 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 21:20:38 www VM: killing process apache2
Apr 26 21:22:59 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Apr 26 21:23:03 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 21:23:03 www VM: killing process cron
Apr 26 21:23:03 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 21:23:03 www VM: killing process cron
Apr 26 21:23:03 www __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
--------------------------------------------------------------

And finally, in case it helps, here is more info about my system:

--------------------------------------------------------------
> emerge -s gs-sources
*  sys-kernel/gs-sources
      Latest version available: 2.4.25_pre7-r4
      Latest version installed: 2.4.25_pre7-r4
      Size of downloaded files: 31,556 kB

> emerge info
Portage 2.0.50-r6 (default-x86-1.4, gcc-3.3.3, glibc-2.3.3_pre20040420-r0, 2.4.25_pre7-gss-r3)
=================================================================
System uname: 2.4.25_pre7-gss-r3 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz
Gentoo Base System version 1.4.9
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.3
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -mcpu=pentium4 -march=pentium4 -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.1/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -mcpu=pentium4 -march=pentium4 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://gentoo.mirrors.pair.com/ ftp://gentoo.mirrors.pair.com/ http://www.gtlib.cc.gatech.edu/pub/gentoo ftp://ftp.gtlib.cc.gatech.edu/pub/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X acl alsa apache2 apm arts avi berkdb cdr crypt cups curl dga directfb dvd dvdr encode esd fbcon flash foomaticdb gd gdbm ggi gif gnome gphoto2 gpm gstreamer gtk gtk2 imap imlib java jpeg kde ldap libg++ libwww mad maildir matrox mbox mcal mikmod mmx motif mozilla mpeg mysql nas ncurses nls odbc oggvorbis opengl oss pam pdflib perl png ppds python qt quicktime readline samba sasl scanner sdl slang snmp spell sse ssl svga tcltk tcpd tiff truetype usb x86 xml xml2 xmms xv zlib"
--------------------------------------------------------------

If there's any other information that can help diagnose this problem, I'd be happy to provide it. Until then, I'll just have to keep rebooting my machine every day or so. Thanks to anyone who can help.
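With this many log lines it is easy to lose track of which processes the 2.4 kernel actually killed. A small Python 3 sketch like the one below could summarize a syslog file: it counts "VM: killing process" entries per process name and "0-order allocation failed" entries per gfp mask. It only recognizes the two message formats quoted above; the script and its file name (oomsummary.py) are hypothetical, not something shipped with Gentoo.

```python
import re
import sys
from collections import Counter

# Patterns for the two kinds of 2.4-kernel messages quoted in the log above, e.g.
#   "Apr 26 19:07:21 www VM: killing process apache2"
#   "Apr 26 19:07:21 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)"
KILL_RE = re.compile(r"VM: killing process (\S+)")
ALLOC_RE = re.compile(r"__alloc_pages: \d+-order allocation failed \(gfp=(0x[0-9a-fA-F]+)/")


def summarize(lines):
    """Count OOM kills per process name and allocation failures per gfp mask."""
    kills = Counter()
    failures = Counter()
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            kills[match.group(1)] += 1
            continue
        match = ALLOC_RE.search(line)
        if match:
            failures[match.group(1)] += 1
    return kills, failures


def report(path):
    """Print the summary for one syslog file (e.g. /var/log/messages)."""
    with open(path, errors="replace") as log:
        kills, failures = summarize(log)
    for name, count in kills.most_common():
        print(f"VM killed {name}: {count}")
    for mask, count in failures.most_common():
        print(f"0-order allocation failed, gfp={mask}: {count}")


# Only runs when a log path is given, e.g.: python3 oomsummary.py /var/log/messages
if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

Applied to the excerpt above, it would report apache2 killed three times, cron twice, and amavisd and index.cgi once each.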
Here's another snapshot of what "top" looks like today:

---------------------------------------------------------------------------------
top - 10:14:45 up 1 day, 11:01,  4 users,  load average: 0.00, 0.11, 0.07
Tasks:  64 total,   1 running,  62 sleeping,   1 stopped,   0 zombie
Cpu(s):  0.0% user,  0.3% system,  0.0% nice, 99.7% idle
Mem:    514704k total,   491712k used,    22992k free,    91340k buffers
Swap:   506036k total,    41060k used,   464976k free,   265516k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
11258 amavis     9   0 30212  22m 7392 S  0.0  4.4   0:01.27 amavisd
 3969 ntp        9   0  4292 4292 3108 S  0.0  0.8   0:01.27 ntpd
12193 amavis     9   0 27996 2576 1964 S  0.0  0.5   0:00.00 amavisd
12350 root       9   0  3008 2460 2200 S  0.0  0.5   0:00.00 smbd
 1781 amavis     9   0 27936 2236 1888 S  0.0  0.4   0:00.86 amavisd
 3433 root       9   0  2668 1692 1108 S  0.0  0.3   0:00.14 cupsd
18104 david      9   0  1804 1568 1432 S  0.0  0.3   0:23.23 sshd
12335 postfix   10   0  1480 1480 1192 S  0.0  0.3   0:00.00 pickup
14764 david     10   0  1604 1380 1264 S  0.0  0.3   0:00.95 sshd
12299 root      12   0  1360 1360  784 S  0.0  0.3   0:00.24 jpico
 4162 postfix    9   0  1560 1340 1236 S  0.0  0.3   0:00.72 qmgr
 4217 root       9   0  1568 1340 1144 S  0.0  0.3   0:00.59 nmbd
14763 david      9   0  1548 1312 1208 S  0.0  0.3   0:03.35 sshd
18102 root       9   0  1636 1312 1312 S  0.0  0.3   0:00.00 sshd
 4111 root       7   0  1468 1240 1168 S  0.0  0.2   0:01.25 master
18112 root       8   0  1288 1208 1128 S  0.0  0.2   0:00.01 bash
14783 root       9   0  1268 1176 1096 S  0.0  0.2   0:00.02 bash
14778 root       9   0  1256 1156 1100 S  0.0  0.2   0:00.00 bash
 4215 root       9   0  1896 1136 1056 S  0.0  0.2   0:00.09 smbd
14759 root       9   0  1460 1132 1132 S  0.0  0.2   0:00.00 sshd
11928 root       9   0  1132 1132  932 S  0.0  0.2   0:00.01 pppd
 3363 root       8   0  1180 1032  996 S  0.0  0.2   0:00.01 sshd
14761 root       9   0  1340 1008 1008 S  0.0  0.2   0:00.00 sshd
12278 root      11   0  1008 1008  800 R  0.0  0.2   0:00.12 top
18105 david      9   0  1180  960  960 S  0.0  0.2   0:00.00 bash
14765 david      9   0  1176  956  956 S  0.0  0.2   0:00.01 bash
14768 david      9   0  1176  956  956 S  0.0  0.2   0:00.00 bash
 1874 root       9   0  1088  940  940 S  0.0  0.2   0:00.01 mysqld_safe
 1982 mysql      9   0  3712  920  908 S  0.0  0.2   0:00.02 mysqld
 2255 mysql      9   0  3712  920  908 S  0.0  0.2   0:00.01 mysqld
 2256 mysql      9   0  3712  920  908 S  0.0  0.2   0:00.00 mysqld
 2277 mysql      9   0  3712  920  908 S  0.0  0.2   0:00.00 mysqld
 3693 dhcp       9   0  1356  900  864 S  0.0  0.2   0:00.00 dhcpd
 4165 root       8   0  1072  848  848 S  0.0  0.2   0:00.00 pure-ftpd
  154 root       9   0   856  740  624 S  0.0  0.1   0:00.02 devfsd
 4268 nut        8   0   760  716  668 S  0.0  0.1   0:00.01 upsmon
 1544 root       9   0   764  696  512 S  0.0  0.1   0:03.69 syslog-ng
18111 root       9   0   880  688  688 S  0.0  0.1   0:00.00 su
14777 root       9   0   872  684  684 S  0.0  0.1   0:00.01 su
14782 root       9   0   876  684  684 S  0.0  0.1   0:00.00 su
15969 root       9   0  1396  676  676 T  0.0  0.1   0:00.00 lynx
 4465 xfs        9   0  4560  632  604 S  0.0  0.1   0:00.12 xfs
 4305 root       0   0   636  588  552 S  0.0  0.1   0:00.03 cron
 4266 root       9   0   604  520  520 S  0.0  0.1   0:00.00 upsmon
 4019 root      10   0   564  512  484 S  0.0  0.1   0:00.47 popa3d
 4480 root       9   0   552  492  492 S  0.0  0.1   0:00.00 agetty
 4481 root       9   0   556  492  492 S  0.0  0.1   0:00.00 agetty
 4482 root       9   0   556  492  492 S  0.0  0.1   0:00.00 agetty
 4483 root       9   0   556  492  492 S  0.0  0.1   0:00.00 agetty
 4484 root       9   0   556  492  492 S  0.0  0.1   0:00.02 agetty
 4485 root       9   0   556  492  492 S  0.0  0.1   0:00.00 agetty
 4684 root       9   0   500  448  424 S  0.0  0.1   0:00.00 tail
    1 root       8   0   456  444  412 S  0.0  0.1   0:04.89 init
 1582 root       9   0   488  428  428 S  0.0  0.1   0:00.00 acpid
 3772 root       9   0   424  368  368 S  0.0  0.1   0:00.03 gpm
    2 root       8   0     0    0    0 S  0.0  0.0   0:00.01 keventd
    3 root      18  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd_CPU0
    4 root      19  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd_CPU1
    5 root       9   0     0    0    0 S  0.0  0.0   0:07.94 kswapd
    6 root       9   0     0    0    0 S  0.0  0.0   0:00.26 bdflush
    7 root       9   0     0    0    0 S  0.0  0.0   0:01.97 kupdated
    9 root       9   0     0    0    0 S  0.0  0.0   0:00.00 khubd
   14 root       9   0     0    0    0 S  0.0  0.0   0:00.00 kreiserfsd
  398 root       9   0     0    0    0 S  0.0  0.0   0:00.00 kjournald
---------------------------------------------------------------------------------

That's all the processes that "top" lists. If you add up all the memory in the "RES" column, it only comes out to 79,084KB (only about 77M used). So why does "top" report that the system is using 491,712KB?
What's using the rest of the memory?
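Adding up the RES column by hand is tedious, so here is a minimal Python 3 sketch that sums it from a saved `top -b -n 1` snapshot (batch mode, one iteration). Two caveats, stated as assumptions: pages shared between processes or threads (for example the identical hlxserverplus entries) are counted once per entry, so the sum can overstate real usage; and RES never includes the kernel's buffers and page cache, which is where most of top's "used" figure typically goes. The function names are hypothetical.

```python
def to_kb(field):
    """Convert a top memory field such as '4836', '21m' or '1g' to KiB."""
    multipliers = {"k": 1, "m": 1024, "g": 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in multipliers:
        return float(field[:-1]) * multipliers[suffix]
    return float(field)  # plain numbers are already KiB


def sum_res_kb(top_text):
    """Sum the RES column of a `top -b -n 1` snapshot, in KiB."""
    total = 0.0
    res_idx = None
    for line in top_text.splitlines():
        cols = line.split()
        if not cols:
            continue
        if cols[0] == "PID":
            # Locate RES from the header, so a different column order still works.
            res_idx = cols.index("RES")
            continue
        # Summary lines above the header (Mem:, Swap:, ...) are skipped
        # because res_idx is still None at that point.
        if res_idx is not None and len(cols) > res_idx:
            total += to_kb(cols[res_idx])
    return total
```

Usage: save a snapshot with `top -b -n 1 > snapshot.txt`, then call `sum_res_kb(open("snapshot.txt").read())`.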
More info, in case it's helpful. At this point there's about 45M of physical RAM unused, though "top" still only shows about 70-80M used by actual running processes.

--------------------------------------------------------------
> vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0  41048  45600  86924 232004    2    1    15    31   67    82 11  2 87  0

> vmstat -s
    514704 total memory
    469604 used memory
    198756 active memory
    160076 inactive memory
     45100 free memory
     87092 buffer memory
    232120 swap cache
    506036 total swap
     41048 used swap
    464988 free swap
   2907182 non-nice user cpu ticks
       241 nice user cpu ticks
    464050 system cpu ticks
  21951409 idle cpu ticks
         0 IO-wait cpu ticks
         0 IRQ cpu ticks
         0 softirq cpu ticks
   3716045 pages paged in
   7886284 pages paged out
    115632 pages swapped in
     92925 pages swapped out
  16974899 interrupts
  20750721 CPU context switches
1083035617 boot time
   1869842 forks
--------------------------------------------------------------
I understand a bit better now how Linux uses memory, in particular that the buffers and cache listed by "top" and "free" should really be considered free memory, since they can be flushed and reclaimed if necessary. Still, something is happening every day or two that seems to drain physical and swap memory almost completely. At least that's my assumption, based on all the log entries that look like this:

Apr 26 19:07:21 www __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Apr 26 19:07:21 www VM: killing process apache2
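The "-/+ buffers/cache" line of free is just arithmetic on the other columns: used minus (buffers + cached), and free plus (buffers + cached). A minimal Python sketch reproduces it for the three snapshots posted in this report. It treats all of buffers and cached as reclaimable, which is an approximation (dirty pages must be written out first), and the remaining "other" figure also includes kernel memory such as slab caches and page tables that top does not attribute to any process. For vmstat -s, the figure it labels "swap cache" is used as the cache value, since it matches the cache column of the plain vmstat output.

```python
def split_used(total_kb, free_kb, buffers_kb, cached_kb):
    """Split "used" memory into reclaimable cache and everything else (KiB).

    Returns (used, reclaimable, other) where
      used        = total - free         (what top and free call "used")
      reclaimable = buffers + cached     (kernel caches that can be shrunk)
      other       = used - reclaimable   (processes plus other kernel memory)
    """
    used = total_kb - free_kb
    reclaimable = buffers_kb + cached_kb
    return used, reclaimable, used - reclaimable


# Figures copied from the snapshots posted in this report, all in KiB:
# (total, free, buffers, cached)
SNAPSHOTS = {
    "free -kt, first report": (514704, 18368, 70068, 122636),
    "top, second snapshot": (514704, 22992, 91340, 265516),
    "vmstat -s": (514704, 45100, 87092, 232120),
}

for label, sample in SNAPSHOTS.items():
    used, reclaimable, other = split_used(*sample)
    print(f"{label}: used={used} reclaimable={reclaimable} other={other}")
```

The first row gives other=303632, matching the "-/+ buffers/cache" used column of free. For the second top snapshot, other comes to 134856KB rather than 491712KB; the remaining gap to the hand-summed RES total is presumably kernel-side memory.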
Could you try a newer kernel? Perhaps a 2.6 kernel.
Boy, this is a production system, and I hate to introduce too many unknowns into the equation. I'd really prefer to keep 2.4 for a while until I can do some testing with 2.6 and make sure everything I have (including Helix) works correctly with it. Do you know of some memory-related problem in 2.4 that 2.6 has fixed? Meanwhile, I'll keep an eye on my system and post more "top" and "free" details if/when the system appears to run out of physical and swap memory again.
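Since the failures only show up every day or two, logging memory figures periodically may be easier than catching "top" at the right moment. A minimal Python 3 sketch, assuming /proc/meminfo exposes MemFree, Buffers, Cached and SwapFree lines in kB (2.4 and 2.6 kernels both do); the log file name meminfo.log, the 5-minute interval and the function names are arbitrary, hypothetical choices.

```python
import time

# /proc/meminfo fields worth tracking while waiting for the next failure.
FIELDS = ("MemFree", "Buffers", "Cached", "SwapFree")


def parse_meminfo(text):
    """Return {name: kB} for the "Name:  value kB" lines of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # 2.4 kernels also print a byte-based summary table at the top;
        # keep only lines shaped like "MemFree:   45100 kB".
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            values[parts[0][:-1]] = int(parts[1])
    return values


def format_sample(text, when):
    """One log line: local timestamp followed by the tracked fields."""
    values = parse_meminfo(text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when))
    fields = " ".join(f"{name}={values.get(name, '?')}" for name in FIELDS)
    return f"{stamp} {fields}"


def watch(logfile="meminfo.log", interval=300):
    """Append a sample every `interval` seconds until interrupted (Ctrl-C)."""
    while True:
        with open("/proc/meminfo") as src:
            line = format_sample(src.read(), time.time())
        with open(logfile, "a") as out:
            out.write(line + "\n")
        time.sleep(interval)
```

Left running via watch() in a background session (for example under screen), the last lines of the log before the next "VM: killing process" entry should show whether free memory, buffers, cache and swap really were all exhausted at that point.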
Hm, would you be able to try vanilla-sources or ck-sources? This may be something introduced by a patch in the gs-sources patchset. Thanks!
You might also try the latest gentoo-sources 2.4.25; it's actually the new gs-sources, renamed.
OK, I'm going to mark this as "FIXED". The memory problems have happened occasionally in the last few days, but each time I've been able to finger Apache (with a lot of modules and a lot of child processes) as the likely culprit. I didn't expect it to eat up 512M of RAM plus all the swap space, but apparently that's what it sometimes does. I've lowered MaxClients in Apache's configuration, and we'll see how that plays out. For now, I'm satisfied that the system/kernel seems to be behaving properly. Thanks for the help.
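For anyone sizing the Apache child limit mentioned above (the prefork MPM's MaxClients directive in Apache 2.0), a common rule of thumb is to divide the RAM that can be spared for Apache by the resident size of one child. The sketch below only encodes that division. The 200 MB reserve and 10 MB per child are hypothetical numbers, not measurements from this machine, and because RES counts pages shared between children, the result tends to err on the conservative side.

```python
def max_clients(total_ram_mb, reserved_mb, per_child_mb):
    """Rough ceiling for Apache's MaxClients (prefork MPM).

    total_ram_mb  -- physical RAM in the machine
    reserved_mb   -- RAM to leave for the kernel, other daemons and some cache
    per_child_mb  -- typical resident size of one apache2 child (e.g. RES in top)
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = max(total_ram_mb - reserved_mb, 0)
    return int(available // per_child_mb)


# Hypothetical figures: a 512 MB box, 200 MB kept for amavisd, Helix, MySQL,
# the kernel and cache, and 10 MB per module-heavy apache2 child.
print(max_clients(512, 200, 10))  # -> 31
```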