Bug 123651 - opteron system randomly crashes
Summary: opteron system randomly crashes
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Server
Hardware: AMD64 Linux
Importance: High normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-21 12:55 UTC by Thomas Beutin
Modified: 2007-05-11 12:15 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
kernel config (config.gz, 8.09 KB, application/x-tar), 2006-02-21 12:56 UTC, Thomas Beutin
kernel panic screen shot (panic2.gif, 49.44 KB, image/gif), 2006-02-21 12:57 UTC, Thomas Beutin
"sysctl -a" output (sysctl.log.gz, 2.63 KB, application/x-tar), 2006-02-21 12:58 UTC, Thomas Beutin
/proc/cpuinfo (cpuinfo.log.gz, 466 bytes, application/x-tar), 2006-02-21 12:58 UTC, Thomas Beutin
kernel config (config, 31.40 KB, text/plain), 2006-02-21 13:04 UTC, Thomas Beutin
"sysctl -a" output (sysctl.log, 10.90 KB, text/plain), 2006-02-21 13:06 UTC, Thomas Beutin
/proc/cpuinfo (cpuinfo.log, 2.40 KB, text/plain), 2006-02-21 13:06 UTC, Thomas Beutin
dmesg output (dmesg.log, 15.44 KB, text/plain), 2006-02-22 00:24 UTC, Thomas Beutin
console screen shot (banzai_crash.gif, 31.75 KB, image/gif), 2006-05-22 06:58 UTC, Thomas Beutin
stat output (vm.20060522-142101.log, 1.48 KB, text/plain), 2006-05-22 07:01 UTC, Thomas Beutin

Description Thomas Beutin 2006-02-21 12:55:33 UTC
Our database server randomly crashes when the load rises. I don't have the complete panic log, only the attached screenshot from our KVM switch. We're running postgres 8.0.3 on this server; no X is installed. Unfortunately this is our production server, so we're very unhappy with this situation. Any suggestions or hints for stopping these random crashes will be gratefully accepted!

I'll try to add some info here; if you need more, please ask and I'll provide whatever you need.

Memory:
             total       used       free     shared    buffers     cached
Mem:      16281964    2217164   14064800          0      61196    1846136
-/+ buffers/cache:     309832   15972132
Swap:     31254416          0   31254416
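
For readers checking the numbers: the `-/+ buffers/cache` row is derived from the `Mem:` row by moving buffers and page cache from "used" to "free". A small sketch of that arithmetic, using only the values shown above (the variable names are just for illustration):

```python
# Values (in KiB) copied from the `free` output above.
total, used, free_ = 16281964, 2217164, 14064800
buffers, cached = 61196, 1846136

# Memory used by applications, excluding reclaimable buffers and page cache.
used_by_apps = used - buffers - cached
# Memory available to applications once buffers and cache are reclaimed.
free_for_apps = free_ + buffers + cached

print(used_by_apps)   # 309832
print(free_for_apps)  # 15972132
```

So at the time of this snapshot only about 300 MB of the 16 GB was held by processes, and swap was untouched.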





Portage 2.0.54 (default-linux/amd64/2005.1, gcc-3.4.4, glibc-2.3.5-r2, 2.6.15-gentoo-r5 x86_64)
=================================================================
System uname: 2.6.15-gentoo-r5 x86_64 AMD Opteron(tm) Processor 275
Gentoo Base System version 1.6.14
dev-lang/python:     2.3.5-r2, 2.4.2
sys-apps/sandbox:    1.2.12
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=k8 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="http://gentoo.osuosl.org/ ftp://gentoo.risq.qc.ca/ http://gentoo.ccccom.com ftp://gentoo.ccccom.com http://gentoo.inode.at/ http://gd.tuwien.ac.at/opsys/linux/gentoo/ ftp://gd.tuwien.ac.at/opsys/linux/gentoo/ http://ftp.belnet.be/mirror/rsync.gentoo.org/gentoo/ ftp://ftp.tu-clausthal.de/pub/linux/gentoo/ ftp://sunsite.informatik.rwth-aachen.de/pub/Linux/gentoo http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/ ftp://linux.rz.ruhr-uni-bochum.de/gentoo-mirror/ http://ftp.uni-erlangen.de/pub/mirrors/gentoo ftp://ftp.uni-erlangen.de/pub/mirrors/gentoo http://ftp6.uni-erlangen.de/pub/mirrors/gentoo ftp://ftp6.uni-erlangen.de/pub/mirrors/gentoo ftp://ftp.join.uni-muenster.de/pub/linux/distributions/gentoo ftp://ftp.wh2.tu-dresden.de/pub/mirrors/gentoo ftp://ftp.join.uni-muenster.de/pub/linux/distributions/gentoo ftp://ftp6.uni-muenster.de/pub/linux/distributions/gentoo ftp://ftp.ipv6.uni-muenster.de/pub/linux/distributions/gentoo http://mirrors.sec.informatik.tu-darmstadt.de/gentoo/ http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ ftp://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ ftp://ftp.gentoo.mesh-solutions.com/gentoo/ http://pandemonium.tiscali.de/pub/gentoo/ ftp://pandemonium.tiscali.de/pub/gentoo/ ftp://ftp.rz.tu-bs.de/pub/mirror/ftp.gentoo.org/gentoo-distfiles/"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 apache2 avi bash-completion berkdb bitmap-fonts bzip2 cal caps crypt cups curl dba dbase dbm dbx eds emboss encode exif expat fftw flash flatfile foomaticdb fortran gd gdbm gif gmp gnutls gpm gstreamer iconv imagemagick imap imlib innodb ipv6 jabber java jikes jpeg ldap libwww lm_sensors lzw lzw-tiff m17n-lib mcal mhash ming mmap mng mp3 mpeg ncurses netcdf nls nptl odbc ogg openal opengl pam pcntl pcre pdflib perl php png posix postgres prelude python quicktime readline recode ruby sasl sdl sharedext sharedmem shorten simplexml sndfile snmp soap sockets sox speex spell spl ssl svg sysvipc tcpd theora tidy tiff tokenizer truetype truetype-fonts type1-fonts udev unicode usb userlocales vhosts vorbis wddx wmf xml xml2 xmlrpc xpm xsl xv xvid yaz yeo zlib userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY
Comment 1 Thomas Beutin 2006-02-21 12:56:23 UTC
Created attachment 80375 [details]
kernel config
Comment 2 Thomas Beutin 2006-02-21 12:57:41 UTC
Created attachment 80376 [details]
kernel panic screen shot
Comment 3 Thomas Beutin 2006-02-21 12:58:16 UTC
Created attachment 80377 [details]
"sysctl -a" output
Comment 4 Thomas Beutin 2006-02-21 12:58:58 UTC
Created attachment 80378 [details]
/proc/cpuinfo
Comment 5 Thomas Beutin 2006-02-21 13:04:52 UTC
Created attachment 80379 [details]
kernel config
Comment 6 Thomas Beutin 2006-02-21 13:06:04 UTC
Created attachment 80381 [details]
"sysctl -a" output
Comment 7 Thomas Beutin 2006-02-21 13:06:32 UTC
Created attachment 80382 [details]
/proc/cpuinfo
Comment 8 Daniel Drake (RETIRED) gentoo-dev 2006-02-21 15:38:23 UTC
Please enable CONFIG_KALLSYMS and post a new screenshot. It's almost impossible to diagnose this otherwise (kallsyms will add some useful text into those meaningless numbers). How often does the crash occur?
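
Before rebooting, it may be worth confirming that the option really ended up in the new config. The following is only a rough sketch: the helper names `kernel_option` and `read_config` are made up for this example, and the path `/usr/src/linux/.config` mentioned in the comment is an assumption about where the kernel tree lives on this machine.

```python
import gzip
from pathlib import Path


def kernel_option(config_text: str, name: str) -> str:
    """Return the value of a kernel config option ('y', 'm', ...), or 'n' if unset."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
        if line == f"# {name} is not set":
            return "n"
    return "n"  # option not mentioned at all


def read_config(path: str) -> str:
    """Read a plain or gzip-compressed kernel config file."""
    p = Path(path)
    if p.suffix == ".gz":
        with gzip.open(p, "rt") as fh:
            return fh.read()
    return p.read_text()


# Inline sample so the sketch runs anywhere; on the real system one would call
# read_config("/usr/src/linux/.config") instead (assumed path).
sample = "CONFIG_KALLSYMS=y\n# CONFIG_KALLSYMS_ALL is not set\n"
print(kernel_option(sample, "CONFIG_KALLSYMS"))      # y
print(kernel_option(sample, "CONFIG_KALLSYMS_ALL"))  # n
```

If the running kernel was built with CONFIG_IKCONFIG_PROC, the same check can be pointed at `/proc/config.gz`.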
Comment 9 Thomas Beutin 2006-02-22 00:24:34 UTC
Created attachment 80404 [details]
dmesg output
Comment 10 Thomas Beutin 2006-02-22 00:35:34 UTC
Crashes "usually" occur about every 3 weeks, but this week there was one on Monday morning and another (after rebooting with 2.6.15-gentoo-r5) on Tuesday evening. I have now turned off swap. At the moment I am recompiling the kernel with CONFIG_KALLSYMS enabled, but I cannot reboot before 8pm (GMT 7pm).
Comment 11 Daniel Drake (RETIRED) gentoo-dev 2006-02-22 11:07:41 UTC
Ok. You should also upgrade to the latest development kernel (currently 2.6.16-rc4) as the problem may have been fixed.

I'm going to close this bug for now as it sounds like we might be waiting weeks for a new crash screenshot. Please reopen when you do have one.
Comment 12 Thomas Beutin 2006-02-22 13:48:37 UTC
@Daniel: which sources do you mean? vanilla-sources-2.6.16-r4?
Comment 13 Daniel Drake (RETIRED) gentoo-dev 2006-02-22 14:05:40 UTC
vanilla-sources-2.6.16-rc4
Comment 14 Thomas Beutin 2006-05-22 06:56:24 UTC
It crashed again. I'll attach some info.
Comment 15 Thomas Beutin 2006-05-22 06:58:22 UTC
Created attachment 87249 [details]
console screen shot

New screenshot. As requested, the kernel was compiled with CONFIG_KALLSYMS=y.
Comment 16 Thomas Beutin 2006-05-22 07:01:37 UTC
Created attachment 87250 [details]
stat output

Once a minute, a cron job logs the output of uptime and the contents of /proc/meminfo and /proc/vmstat. This is the last log written before the system crashed.
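
The actual cron script is not attached, so the following is only a guess at what such a per-minute logger could look like. The function names, the `vm.<timestamp>.log` naming (modelled on the attachment's file name), and the use of /proc/uptime as a stand-in for the `uptime` command are all assumptions. The fsync is there so that the last snapshot has a better chance of reaching the disk before a crash.

```python
import os
import time
from pathlib import Path


def snapshot(sources, now=None):
    """Concatenate the contents of the given files under a timestamp header."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    parts = [f"=== {stamp} ==="]
    for src in sources:
        parts.append(f"--- {src} ---")
        try:
            parts.append(Path(src).read_text().rstrip())
        except OSError as exc:
            # Keep going so one unreadable file does not lose the whole snapshot.
            parts.append(f"(unreadable: {exc})")
    return "\n".join(parts) + "\n"


def write_snapshot(log_dir, sources=("/proc/uptime", "/proc/meminfo", "/proc/vmstat")):
    """Write one snapshot file into log_dir and force it to disk; return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = Path(log_dir) / f"vm.{stamp}.log"
    with open(path, "w") as fh:
        fh.write(snapshot(sources))
        fh.flush()
        os.fsync(fh.fileno())
    return path
```

A crontab entry running a small wrapper around `write_snapshot` every minute would then leave one file per minute, with the newest one being the state just before a crash.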
Comment 17 Duane Griffin 2007-05-10 14:57:20 UTC
Would it be possible to set up a serial console or netconsole to capture the full error message? You can find documentation on how to do so in Documentation/serial-console.txt and Documentation/networking/netconsole.txt, under your kernel source tree.

It would also be a good idea to try with the latest vanilla sources (currently 2.6.21.1). If you can reproduce with that, and get the full error message, then you will have a much better chance of getting help from LKML (assuming we can't identify the problem here).
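
As a rough illustration of the netconsole route: the crashing machine sends its kernel messages as UDP packets, and any second machine can record them. The sketch below assumes the parameter format `netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]` and the default ports 6665/6666 as described in netconsole.txt; please check the copy shipped with your kernel version. The function names and the IP addresses in the usage note are placeholders, not anything from this system.

```python
import socket


def netconsole_param(tgt_ip, tgt_port=6666, src_port=6665,
                     src_ip="", dev="", tgt_mac=""):
    """Build the netconsole= string for the crashing machine (format assumed from netconsole.txt)."""
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


def capture(log_path, port=6666, host="0.0.0.0", max_packets=None):
    """Run on the receiving machine: append every UDP datagram on `port` to log_path."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(log_path, "ab") as log:
        sock.bind((host, port))
        received = 0
        while max_packets is None or received < max_packets:
            data, _addr = sock.recvfrom(65535)
            log.write(data)
            log.flush()  # keep the file current in case the listener is interrupted
            received += 1
```

For example, `netconsole_param("10.0.0.2", src_ip="10.0.0.1", dev="eth0")` yields the string to pass as a boot or module parameter on the server, and `capture("panic.log")` on 10.0.0.2 would then collect the full oops text instead of a partial screenshot.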
Comment 18 Thomas Beutin 2007-05-11 08:41:37 UTC
I do not administer this system any longer, so I cannot provide more information, sorry. You may close the bug.

But I had a very similar problem on my x86 notebook using a suspend2 kernel a while ago. I had random crashes every now and then after resuming with some USB devices (an external mouse) not attached as they had been before I suspended, but the system seemed to run OK after a real reboot. Some days later I tried to emerge a new bash, and the system crashed reproducibly, every time at the same point of the compile. The same occurred with other packages as well. The cause was a corrupt reiser3 /tmp file system (I got permission-denied errors on some files and directories even as root). Repairing this (and, by the way, the other) reiser3 file systems solved the problem. But I couldn't check this on the server; when I left the company, the machine had been running smoothly for a couple of months (but without any changes except booting a new kernel version after every crash).
Comment 19 Daniel Drake (RETIRED) gentoo-dev 2007-05-11 12:15:04 UTC
OK. Thanks for the update.

If your notebook still exhibits those problems on the latest version of a supported kernel (e.g. gentoo-sources) then please file a new bug.