Bug 147823

Summary:	run_posix_cpu_timers general protection fault
Product:	Gentoo Linux	Reporter:	Rick <rbunke>
Component:	[OLD] Core system	Assignee:	Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status:	RESOLVED CANTFIX
Severity:	normal
Priority:	High
Version:	unspecified
Hardware:	AMD64
OS:	Linux
Whiteboard:
Package list:		Runtime testing required:	---

Description Rick 2006-09-16 09:47:53 UTC

I wasn't sure whether to send this to you or to the kernel mailing list, but I thought I would try you first since it is gentoo-sources.

My computer keeps locking up hard with the following oops:

general protection fault: 000 [1]
CPU 0
Modules linked in: it87 hwmon_vid eeprom i2c_isa uhci_hcd nvidia quickcam ohci_hcd wlan_scan_ap ath_pci ath_rate_sample wlan ath_hal
Pid: 0, comm: swapper Tainted: P    2.6.17-gentoo-r8 #2
RIP: 0010:[<ffffffff802915b1>] <ffffffff802915b1>{run_posix_cpu_timers+49}
RSP: 0018:ffffffff8076be98  EFLAGS: 00010046
RAX: 0000000000000092 RBX: 00ffffff8066a2e0 RCX: 00000000000f3b52
RDX: 0000024dd823fe04 RSI: ffffffff807c2980 RDI: 00ffffff8066a2e0
RBP: ffffffff8080def8 R08: 0000000000000000 R09: 00000000001b2c76
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff8080def8
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8076beb8
FS:  00002ba96a98bf30(0000) GS:ffffffff80805000(0000) knlGS:00000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000000005003b
CR2: 0000000000550098 CR3: 0000000021af4000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff8080c000, task ffffffff8066a2e0)
Stack: 0000000000000002 ffff810001686130 0000000000000001 ffffffff8027ff62
       ffffffff8076beb8 ffffffff8076beb8 ffffffff8080def8 0000000000000000
Call Trace: <IRQ> <ffffffff8027ff62>{scheduler_tick+34}
       <ffffffff8026b240>{main_timer_handler+496} <ffffffff8026b475>{timer_interrupt+21}
       <ffffffff8020fc1c>{handle_IRQ_event+44} <ffffffff8029956f>{__do_IRQ+143}
       <ffffffff8026a192>{do_IRQ+66} <ffffffff80268be0>{default_idle+61}
       <ffffffff8025eabc>{ret_from_intr+0} <EOI> <ffffffff80260f1f>{thread_return+0}
       <ffffffff80268c0a>{default_idle+42} <ffffffff80248ced>{cpu_idle+61}
       <ffffffff8080f84f>{start_kernel+495} <ffffffff8080f255>{_sinittext+597}

Code: 48 8b 93 00 02 00 00 48 85 d2 74 13 48 8b 83 c8 01 00 00 48
RIP <ffffffff802915b1>{run_posix_cpu_timers+49} RSP <ffffffff8076be98>
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!

I had to write it down by hand. I think it is all correct.

Originally I was getting a bad RIP value, but it always said preempt, so I recompiled my kernel, switching from the low-latency preempt setting to the normal desktop preempt. Now I get the same RIP value every time it happens. The comm: swapper is also the same, but the stack trace seems to vary.

Occasionally (randomly) gcc segfaults too.

I ran a memory test from the install CD, but it never found any errors.

Anyone have any ideas?
 

# emerge --info
Portage 2.1.1 (default-linux/amd64/2006.0, gcc-4.1.1, glibc-2.4-r3, 2.6.17-gentoo-r8 x86_64)
=================================================================
System uname: 2.6.17-gentoo-r8 x86_64 AMD Athlon(tm) 64 Processor 3400+
Gentoo Base System version 1.12.4
Last Sync: Mon, 11 Sep 2006 23:30:08 +0000
ccache version 2.3 [enabled]
app-admin/eselect-compiler: [Not Present]
dev-lang/python:     2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     2.3
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS=" -march=k8 -O2 -pipe "
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS=" -march=k8 -O2 -pipe "
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp.gtlib.gatech.edu/pub/gentoo "
LINGUAS=""
MAKEOPTS=" -j2 "
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 X a52 aac aalib acl acpi aim alsa apache2 arts asf audiofile avi bash-completion berkdb bitmap-fonts bluetooth bonobo browserplugin bzip2 bzlib cairo cddb cdinstall cdparanoia cdr cjk cli crypt cscope cups curl curlwrappers dbus dedicated dga dio directfb divx4linux dlloader doc dri dts dv dvd dvdr dvdread dxr3 ecc editor eds elibc_glibc emboss encode erandom escreen esd ethereal etwin evo exif exscalibar fam fbcon ffmpeg fftw firefox flac flash fltk foomaticdb fortran fpx ftp gb gd gdbm gecko-sdk ggi gif gimpprint glitz glut gnome gnutils gnutls gpm graphviz gstreamer gtk gtk2 gtkhtml guile hal hbci icq ieee1394 imagemagick imlib imlib2 innodb input-devices-joystick input-devices-keyboard input-devices-mouse input_devices_evdev input_devices_keyboard input_devices_mouse insecure-savers ipv6 irmc isdnlog jabber jack java javascript jbig jikes joystick jpeg jpeg2k kde kerberos kernel_linux ladcca lcms ldap libcaca libedit libgda libwww lirc lirc_devices_hauppauge live lm_sensors lzo lzw lzw-tiff mad madwifi matroska mcal mikmod mime ming mjpeg mmap mmx2 mng mono motif mozdevelop mozilla mozsvg mp3 mpeg mpi mplayer msn musepack musicbrainz mysql mythtv nas ncurses network nls nocd nodrm nowin nptl nptlonly nvidia odbc offensive ofx ogg openal openexr opengl opie oscar pam pcre pda pdf pdflib perl plotutils png portaudio povray ppds pppd python qt qt3 qt4 quicktime rar readline reflection rtc ruby samba sasl sblive sdl server session shared sharedmem simplexml slang slp sndfile snmp soap sockets speex spell spl ssl subtitles svg szip tcl tcpd theora threads tiff timidity tk tokenizer toolbar tools truetype truetype-fonts type1-fonts unicode usb userland_GNU v4l v4l2 vcd video_cards_fbdev video_cards_nvidia video_cards_v4l video_cards_vesa video_cards_vga videos vim-pager vim-with-x visualization vorbis wifi wmf wxwindows xanim xcomposite xface xine xinerama xml xml2 xmlrpc xmms xorg xosd xpm xprint xrandr xscreensaver xsl xv xvid xvmc yahoo yaz zlib zvbi"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY

Comment 1 Daniel Drake (RETIRED) gentoo-dev 2006-09-16 20:11:59 UTC

Your kernel is tainted. Please reproduce this on a session where the closed-source nvidia module is not loaded (and has not been loaded during the current boot).
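The taint state can also be checked on a running system without waiting for an oops. The following is a minimal sketch, assuming the kernel exposes its taint mask at /proc/sys/kernel/tainted and that bit 0 is the proprietary-module flag (the "P" in "Tainted: P"). The function and constant names are illustrative only.

```python
from pathlib import Path

# Assumption: bit 0 of the taint mask corresponds to the "P" flag
# (a proprietary module has been loaded since boot).
TAINT_PROPRIETARY_MODULE = 1 << 0


def has_proprietary_taint(mask: int) -> bool:
    """Return True if the proprietary-module bit is set in the taint mask."""
    return bool(mask & TAINT_PROPRIETARY_MODULE)


def read_taint_mask(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the kernel taint mask as an integer (Linux only)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        mask = read_taint_mask()
    except OSError as exc:
        print(f"could not read taint mask: {exc}")
    else:
        print(f"taint mask = {mask}, "
              f"proprietary module taint: {has_proprietary_taint(mask)}")
```

The flag stays set after the module is unloaded, which is why the test needs a boot in which the module was never loaded.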

Comment 2 Rick 2006-09-19 16:06:00 UTC

It changed somewhat this time. I unloaded the nvidia module and got an oops, but the kernel was still tainted because the madwifi drivers have a precompiled portion in order to stop people from fiddling with settings that might violate FCC regulations.
So I unloaded the atheros modules and got it to produce this oops:

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
<ffffffff80279b65>{elf_core_dump+2325}
PGD 25805067 PUD 36416067 PMD 0
Oops: 0002 [1]
CPU 0
Modules linked in: it87 hwmon_vid eeprom i2c_isa uhci_hcd quickcam ohci_hcd
Pid: 0, comm: swapper Not Tainted 2.6.17-gentoo-r8 #2
RIP: 0010:[<ffffffff80279b65>] <ffffffff80279b65>{elf_core_dump+2325}
RSP: 0018:ffffffff8076bfa8  EFLAGS: 0010046
RAX: 0000000000000000 RBX: ffffffff8080def8 RCX: ffff81003ff80d5f
RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffffff80773298
RBP: 0000000008e00000 R08: ffffffff8080c000 R09: 0000000000000004
R10: ffff81003fbd77c0 R11: 0000000000000025 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00002b8c31481d80(0000) GS:ffffffff80805000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000029d71000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff8080c000, task ffffffff8066a2e0)
Stack: ffffffff80268be0 ffffffff8025ecf2 ffffffff8080def8  <EOI> ffff81003e56f540
       ffffffff8066a2e0 000000749e1e5911 000000000000000a ffffffff805ed4fd
       ffffffff80260f1f 0000000000000025
Call Trace: <IRQ> <ffffffff80268be0>{default_idle+0}
       <ffffffff8025ecf2>{apic_timer_interrupt+98} <EOI> <ffffffff80260f1f>{thread_return+0}
       <ffffffff80268c0a>{default_idle+42} <ffffffff80248ced>{cpu_idle+61}
       <ffffffff8080f84f>{start_kernel+495} <ffffffff8080f255>{_sinittext+597}

Code: 00 00 00 65 48 8b 04 25 00 00 00 00 48 39 c7 75 35 48 8b 47
RIP <ffffffff80279b65>{elf_core_dump+2325} RSP <ffffffff8076bfa8>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Attempted to kill the idle task!

The comm is still swapper, but now it gives a NULL pointer error in elf_core_dump.

Comment 3 Daniel Drake (RETIRED) gentoo-dev 2006-09-20 15:11:38 UTC

This really smells like a hardware problem to me. In the last trace you posted the error actually occurred in elf_core_dump - this function is only called when a userspace process crashed. So it looks like not only did some program crash, the kernel then crashed trying to deal with the crash!
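The first oops may point the same way. RBX and RDI there hold 00ffffff8066a2e0, while the task pointer on the "Process swapper" line is ffffffff8066a2e0. The following is a minimal sketch of that comparison, assuming the hand-copied register values are accurate and that the CPU uses 48-bit canonical addressing (bits 63..47 must all be equal). The helper name is illustrative.

```python
def is_canonical(addr: int) -> bool:
    """True if bits 63..47 of a 64-bit address are all 0 or all 1
    (assumes 48-bit virtual addressing)."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> 47  # the upper 17 bits
    return top == 0 or top == 0x1FFFF


rbx = 0x00FFFFFF8066A2E0   # RBX/RDI as transcribed in the first oops
task = 0xFFFFFFFF8066A2E0  # task pointer from the "Process swapper" line

print(is_canonical(rbx))   # False
print(is_canonical(task))  # True
print(hex(rbx ^ task))     # 0xff00000000000000
```

The two values differ only in the top byte. Dereferencing a non-canonical address would raise a general protection fault rather than a page fault, which would fit corrupted data from failing hardware, although a transcription error cannot be ruled out.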

Comment 4 Rick 2006-09-20 17:07:15 UTC

To a certain extent I agree with your assessment.

I had kernel stability problems in the past while running Gentoo on this system, so I ran SUSE for a few weeks on the same machine and it didn't seem to have trouble. With SUSE I couldn't get software configured the way I wanted without recompiling many packages myself, so I switched back to Gentoo. Perhaps they have more conservative settings, or some things are disabled in their kernel that I enable in mine.

I've checked the RAM with memtest on the livecd and all seems OK there.

Since the comm is always swapper, do you think it might be a problem with the drive my swap file is on? Or is it impossible to tell from a kernel oops where the trouble might lie in the hardware?

Any suggestions or ideas?

I'll troubleshoot from a hardware perspective: remove unnecessary drives and cards, and make sure safe defaults are set in the BIOS, etc., to see if I can discover a hardware issue.

Comment 5 Rick 2006-11-29 15:15:51 UTC

I ended up replacing the hardware and my new box is stable, so it must have been a hardware issue.