
Bug 93297

Summary: consolelog crashes kernel (oops inside)
Product: Gentoo Linux
Reporter: Navid Zamani <navid.zamani>
Component: [OLD] Core system
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status: RESOLVED NEEDINFO    
Severity: critical    
Priority: High    
Version: unspecified   
Hardware: x86   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---

Description Navid Zamani 2005-05-19 23:03:12 UTC
[Sorry, I have been getting tons of Firefox crashes on this box since 1.0.4.
This report is about my other box, but I am not going to fill it out in that
much detail a fourth time. :(( ]



Reproducible: Sometimes
Steps to Reproduce:
0. Be able to read the oops and save it as text while it is happening (nearly
impossible if the logger crashes :_( ).
1. Use metalog (<-- the evil guy?; additionally logging different things on
vc/10, vc/11, vc/12), samba, my kernel, everything stable and up to date.
2. Create a very large number of events for samba (e.g. copy 20000 tiny to
medium-sized files).
3. Do this for up to 30 minutes.
4. Wonder how to capture the million oopses flooding your active console,
preventing you from doing anything and slowing the whole system to death (if you
don't have /proc/sys/kernel/panic_on_oops = 1).
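A set of test files for step 2 could be generated with something like the
following. This is only a sketch: the function name make_test_files, the file
naming scheme and the size range are assumptions for illustration and are not
taken from the original setup. It only creates the files locally; copying them
onto the samba share still has to be done from the client.

```python
import os
import random


def make_test_files(target_dir, count=20000, min_size=1, max_size=64 * 1024, seed=0):
    """Create `count` files of pseudo-random size in target_dir.

    Returns the total number of bytes written. All names and sizes here are
    illustrative assumptions, not values from the bug report.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    os.makedirs(target_dir, exist_ok=True)
    total = 0
    for i in range(count):
        size = rng.randint(min_size, max_size)
        path = os.path.join(target_dir, "file_%05d.bin" % i)
        with open(path, "wb") as handle:
            handle.write(os.urandom(size))  # random content, `size` bytes
        total += size
    return total
```

Calling make_test_files("/some/local/dir") with the defaults writes 20000 files
of up to 64 KiB each, which can then be copied to the share in one go.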
Actual Results:  
ksymoops 2.4.9 on i686 2.6.11-gentoo-r8.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.6.11-gentoo-r8/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel NULL pointer dereference at virtual address 00000000
c01f259e
*pde = 00000000
Oops: 0000 [#31]
CPU:    0
EIP:    0060:[<c01f259e>]  Tainted: P       VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286        (2.6.11-gentoo-r8)
eax: 00000000   ebx: 00005401     ecx: 00000000       edx: c8419240
esi: bfffe8fc   edi: c5b19000     ebp: 00005401       esp: c4565e60
ds: 007b   es: 007    ss: 0068
Stack: 00000000 cfd81984 c622e508 cfd81a24 c622e4c0 c9737c04 c0135c40 c9737c04
       c11f5940 c0144290 c11f5940 b7f01000 c4565ea8 c4565eb4 c01354e0 c4565000
       00000000 00000000 00000001 c4565000 c9761b7c b7f0142e c96a8040 c01445e9
Call Trace:
 [<c0135c40>] filemap_nopage+0x0/0x3a0
 [<c0144290>] do_no_page+0x1b0/0x300
 [<c01354e0>] file_read_actor+0x0/0xe0
 [<c01445e9>] handle_mm_fault+0xe9/0x190
 [<c01df052>] copy_to_user+0x42/0x60
 [<c015d20d>] cp_new_stat64+0xfd/0x120
 [<c01ed56a>] tty_ioctl+0x42a/0x580
 [<c01660ef>] do_ioctl+0x6f/0xa0
 [<c0166325>] vfs_ioctl+0x65/0x1e0 
 [<c01664e5>] sys_ioctl+0x45/0xa0
 [<c010271b>] syscall_call+0x7/0xb
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 55 57 56 53 81 ec b4 00 00 00 8b bc
24 c8 00 00 00 8b 9c 24 d0 00 00 00 8b 87 7c 09 00 00 <8b> 30 8b 04 b5 20 d3 44
c0 89 34 24 89 44 24 4c e8 0d 69 00 00


>>EIP; c01f259e <vt_ioctl+1e/1b80>   <=====

>>edx; c8419240 <pg0+7f98240/3fb7d400>
>>edi; c5b19000 <pg0+5698000/3fb7d400>
>>esp; c4565e60 <pg0+40e4e60/3fb7d400>

Trace; c0135c40 <filemap_nopage+0/3a0>
Trace; c0144290 <do_no_page+1b0/300>
Trace; c01354e0 <file_read_actor+0/e0>
Trace; c01445e9 <handle_mm_fault+e9/190>
Trace; c01df052 <copy_to_user+42/60>
Trace; c015d20d <cp_new_stat64+fd/120>
Trace; c01ed56a <tty_ioctl+42a/580>
Trace; c01660ef <do_ioctl+6f/a0>
Trace; c0166325 <vfs_ioctl+65/1e0>
Trace; c01664e5 <sys_ioctl+45/a0>
Trace; c010271b <syscall_call+7/b>

This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.

Code;  c01f2573 <.text.lock.misc+dd/ea>
00000000 <_EIP>:
Code;  c01f2573 <.text.lock.misc+dd/ea>
   0:   90                        nop    
Code;  c01f2574 <.text.lock.misc+de/ea>
   1:   90                        nop    
Code;  c01f2575 <.text.lock.misc+df/ea>
   2:   90                        nop    
Code;  c01f2576 <.text.lock.misc+e0/ea>
   3:   90                        nop    
Code;  c01f2577 <.text.lock.misc+e1/ea>
   4:   90                        nop    
Code;  c01f2578 <.text.lock.misc+e2/ea>
   5:   90                        nop    
Code;  c01f2579 <.text.lock.misc+e3/ea>
   6:   90                        nop    
Code;  c01f257a <.text.lock.misc+e4/ea>
   7:   90                        nop    
Code;  c01f257b <.text.lock.misc+e5/ea>
   8:   90                        nop    
Code;  c01f257c <.text.lock.misc+e6/ea>
   9:   90                        nop    
Code;  c01f257d <.text.lock.misc+e7/ea>
   a:   90                        nop    
Code;  c01f257e <.text.lock.misc+e8/ea>
   b:   90                        nop    
Code;  c01f257f <.text.lock.misc+e9/ea>
   c:   90                        nop    
Code;  c01f2580 <vt_ioctl+0/1b80>
   d:   55                        push   %ebp
Code;  c01f2581 <vt_ioctl+1/1b80>
   e:   57                        push   %edi
Code;  c01f2582 <vt_ioctl+2/1b80>
   f:   56                        push   %esi
Code;  c01f2583 <vt_ioctl+3/1b80>
  10:   53                        push   %ebx
Code;  c01f2584 <vt_ioctl+4/1b80>
  11:   81 ec b4 00 00 00         sub    $0xb4,%esp
Code;  c01f258a <vt_ioctl+a/1b80>
  17:   8b bc 24 c8 00 00 00      mov    0xc8(%esp),%edi
Code;  c01f2591 <vt_ioctl+11/1b80>
  1e:   8b 9c 24 d0 00 00 00      mov    0xd0(%esp),%ebx
Code;  c01f2598 <vt_ioctl+18/1b80>
  25:   8b 87 7c 09 00 00         mov    0x97c(%edi),%eax

This decode from eip onwards should be reliable

Code;  c01f259e <vt_ioctl+1e/1b80>
00000000 <_EIP>:
Code;  c01f259e <vt_ioctl+1e/1b80>   <=====
   0:   8b 30                     mov    (%eax),%esi   <=====
Code;  c01f25a0 <vt_ioctl+20/1b80>
   2:   8b 04 b5 20 d3 44 c0      mov    0xc044d320(,%esi,4),%eax
Code;  c01f25a7 <vt_ioctl+27/1b80>
   9:   89 34 24                  mov    %esi,(%esp)
Code;  c01f25aa <vt_ioctl+2a/1b80>
   c:   89 44 24 4c               mov    %eax,0x4c(%esp)
Code;  c01f25ae <vt_ioctl+2e/1b80>
  10:   e8 0d 69 00 00            call   6922 <_EIP+0x6922>

Kernel panic - not syncing: Fatal exception

1 warning and 1 error issued.  Results may not be reliable.

Expected Results:  
normal operation

(I use JFS. No idea if that's relevant. I lost data the last time, e.g.
/root/.viminfo was corrupted.)

Gentoo Base System version 1.4.16
Portage 2.0.51.19 (default-linux/x86/2005.0, gcc-3.3.5-20050130,
glibc-2.3.4.20041102-r1, 2.6.11-gentoo-r8 i686)
=================================================================
System uname: 2.6.11-gentoo-r8 i686 AMD Athlon(tm) Processor
Python:              dev-lang/python-2.3.5 [2.3.5 (#1, May  3 2005, 02:29:15)]
dev-lang/python:     2.3.5
sys-apps/sandbox:    [Not Present]
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.5
sys-devel/binutils:  2.15.92.0.2-r7
sys-devel/libtool:   1.5.16
virtual/os-headers:  2.6.8.1-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O2 -march=athlon-tbird -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config
/usr/share/config /var/bind /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=athlon-tbird -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs autoconfig ccache distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/
http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/
http://mir.zyrianes.net/gentoo/ http://www.gigaload.org/gentoo.org/"
LANG="de_DE.utf8"
LC_ALL="de_DE.utf8"
LINGUAS="de"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="x86 3dnow aalib acl acpi alsa apache2 arts audiofile avi bash-completion
berkdb bitmap-fonts bluetooth bzlib crypt cups curl curlwrappers dedicated dio
directfb doc emboss encode exif fbcon fftw flac flash flatfile foomaticdb
fortran ftp gd gdbm gif gpm gstreamer gtk2 hardened hardenedphp imagemagick imap
imlib innodb ipv6 jack java jikes jpeg junit ladcca lcms libcaca libg++ libwww
mad mikmod mime ming mmap mmx mng motif mp3 mpeg mysql mysqli nas ncurses nls
nocd offensive ogg oggvorbis openal opengl oss pam pcre pdflib perl php png
portaudio posix ppds prelude python quicktime readline ruby samba sdl session
shared sharedmem slang sndfile snmp soap sockets sox spell spl ssl svg svga tcpd
threads tidy tiff tokenizer truetype truetype-fonts type1-fonts unicode usb
vhosts vorbis xml2 xmms xsl xv zlib fritzcapi_cards_fcpci linguas_de
userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CBUILD, CTARGET, LDFLAGS, PORTDIR_OVERLAY
Comment 1 Navid Zamani 2005-05-19 23:06:29 UTC
I forgot: why is there no /proc/ksyms on my system? Yesterday I asked someone on
#gentoo and he didn't have one either...

(Sorry for my bad English, I'm only one of 600000 people natively speaking
Luxembourgish ;)
Comment 2 Daniel Drake (RETIRED) gentoo-dev 2005-05-21 05:10:42 UTC
Which was the last working good kernel? Have you tested your memory recently?
Comment 3 Navid Zamani 2005-05-21 06:27:39 UTC
There were fewer problems with 2.6.11-r4, and 2.6.10-r8 caused no problems, if I
remember correctly. 2.6.9-* and earlier worked fine, but back then I had
syslog-ng installed instead of metalog.
Comment 4 Navid Zamani 2005-05-30 00:33:45 UTC
I can now add that it only happens when I'm on the console. If it's running
with all consoles closed and only the Windows machine interacting with it,
doing backups or emerging via cron, it always runs fine.

I just installed 2.6.11-gentoo-r9 and got a similar oops only 30 minutes later;
this time the kernel was not even tainted (by the fcpci module from fritzcapi).

Is anyone working on this? If so, is there a way I can help? I'm a Java
programmer, can read C and C++, and write some other languages. I also
understand the basics of assembler and I'm good at debugging, as long as it's
not debugging assembler code. ;)
I just have no idea how to find the place in the kernel sources that the
ksymoops output seems to point to... :(
If it will speed things up, simply tell me what there is to do...
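
For reference, the ksymoops annotation ">>EIP; c01f259e <vt_ioctl+1e/1b80>"
reads as: the faulting address is 0x1e bytes into the function vt_ioctl, which
is 0x1b80 bytes long. In 2.6 kernel trees of that period vt_ioctl() is normally
found in drivers/char/vt_ioctl.c. A minimal sketch for pulling the symbol and
offset out of such lines follows; the names SYMBOL_LINE and parse_symbol_line
are made up for this example and are not part of ksymoops.

```python
import re

# Matches ksymoops annotation lines such as:
#   >>EIP; c01f259e <vt_ioctl+1e/1b80>   <=====
#   Trace; c0135c40 <filemap_nopage+0/3a0>
SYMBOL_LINE = re.compile(
    r"^(?:>>(?P<reg>\w+)|(?P<kind>Trace|Code));\s+"
    r"(?P<addr>[0-9a-f]{8})\s+"
    r"<(?P<sym>[^+>]+)\+(?P<off>[0-9a-f]+)/(?P<size>[0-9a-f]+)>",
    re.IGNORECASE,
)


def parse_symbol_line(line):
    """Return a dict describing one annotation line, or None if it does not match."""
    match = SYMBOL_LINE.match(line.strip())
    if match is None:
        return None
    address = int(match.group("addr"), 16)
    offset = int(match.group("off"), 16)
    return {
        "label": match.group("reg") or match.group("kind"),
        "address": address,
        "symbol": match.group("sym"),
        "offset": offset,                  # bytes into the function
        "size": int(match.group("size"), 16),
        "symbol_start": address - offset,  # address where the function begins
    }
```

For the EIP line above this gives symbol "vt_ioctl" with offset 0x1e, so the
function starts at 0xc01f2580, which agrees with the "vt_ioctl+0" line in the
disassembly.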
Comment 5 Daniel Drake (RETIRED) gentoo-dev 2005-05-31 16:22:46 UTC
Have you tested your memory recently?

It would also be useful if you could try and reproduce the problem on
development-sources-2.6.12_rc5
Comment 6 Daniel Drake (RETIRED) gentoo-dev 2005-06-13 15:42:12 UTC
See comment #5
Comment 7 Navid Zamani 2005-06-16 11:18:11 UTC
Sorry, the replies went to the wrong mail address, so I did not know that you
had added the comments.

Yes. memtest86+ ran all night without an error.

But: is it safe to use this development-sources-2.6.12_rc5?
I don't want to have even more bugs. ;)
But I could try it for a day or so...
Should I really do this?
Comment 8 Daniel Drake (RETIRED) gentoo-dev 2005-06-16 11:48:03 UTC
Yes, it should be fine. By the way, -rc6 is out now.
Comment 9 Navid Zamani 2005-06-29 06:02:55 UTC
Okay. The latest gentoo-sources I can get through emerge is 2.6.12-r1.
I'm running it right now, with the latest fcpci (from fritzcapi) and the latest
splashutils. Yesterday I got a short oops flood again. I think there were about
30 oopses, and then it worked again. With the old kernel the oopses sometimes
continued to flood until the computer died. But I can't tell whether it got
better... Metalog still dies during the oops flood. But it can log the first
ones...

The problem is that it only logs the first line of the oops. Is there a way to
make it log everything?

And: is there any way I could make this thing reproducible? (Like finding out
the oopsing routine and how to trigger it...) Then I could find the source of
the problem...

I really want to track this ugly thing down... I have never had this many
problems with any Linux/Unix flavour or other OS. (Windows 98 does not count as
such. ;) And I can't stand the fact that my Windows XP machine is more stable
than a hardened Gentoo machine.. :(
Comment 10 Daniel Drake (RETIRED) gentoo-dev 2005-07-05 12:21:34 UTC
We can at least file a decent bugreport upstream if you can get a new oops logged.

Yes, you can log the entire debug/error message output, but that depends on your
system logger, and not the kernel. dmesg will always contain all messages,
perhaps you could redirect that to a file if you still have a working console.

I thought you already found how to reproduce it, by copying lots of files using
samba.

You should also try it without any non-standard modules such as fritzcapi.
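
As a rough illustration of the dmesg suggestion, the sketch below reads the
kernel ring buffer by running dmesg and appends every oops it finds to a file.
It is only a sketch under assumptions: the helper names (read_dmesg,
split_oopses, save_oopses) are invented for this example, the oops header is
assumed to contain the text "Unable to handle kernel" as in the report above,
and since the ring buffer has a fixed size, a long oops flood may already have
pushed the first oops out by the time this runs.

```python
import subprocess

# Assumed marker: the first line of the oops shown in this report.
OOPS_START = "Unable to handle kernel"


def read_dmesg():
    """Return the kernel ring buffer as text by running dmesg (may need root)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def split_oopses(log_text):
    """Split log text into chunks, each starting at an oops header line.

    A chunk runs until the next header (or the end of the text), so it can
    also contain unrelated messages logged after the oops.
    """
    chunks, current = [], None
    for line in log_text.splitlines():
        if OOPS_START in line:
            if current is not None:
                chunks.append("\n".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        chunks.append("\n".join(current))
    return chunks


def save_oopses(path, log_text):
    """Append all oops chunks found in log_text to path; return how many."""
    chunks = split_oopses(log_text)
    with open(path, "a") as out:
        for chunk in chunks:
            out.write(chunk + "\n\n")
    return len(chunks)
```

On a console that still responds, save_oopses("/root/oops.txt", read_dmesg())
would write whatever oopses are still in the buffer; the output path is just an
example.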
Comment 11 Navid Zamani 2005-07-05 13:07:55 UTC
(In reply to comment #10)
> We can at least file a decent bugreport upstream if you can get a new oops logged.

Hmm... I got some but could not log them. :(

> Yes, you can log the entire debug/error message output, but that depends on
> your system logger, and not the kernel.

And this is the best part: it's metalog, which stops working on the first
oops, which floods the console and makes it impossible to type anything. (Well,
I'm able to do it blindly now, so for the next oops... ;)

> dmesg will always contain all messages, perhaps you could redirect that to a
> file if you still have a working console.

Good idea. :) I'll try that. Let's set the bug to NEEDINFO until I hit the bug
again. The oopses have become rarer since 2.6.12-r1, so it could take weeks to
get one while I'm on the console.

> I thought you already found how to reproduce it, by copying lots of files using
> samba.

I can't reproduce them anymore. Somehow .12-r1 is harder to get to oops...
(By the way: I noticed that samba takes about 80-90% of my 80mhz-cpu when
syncing a profile with the client. Maybe it's a problem with metalog and a
buffer overflow. I have now disabled output buffering in metalog for that
reason, so I can test whether that's the problem.)

> You should also try it without any non-standard modules such as fritzcapi.

I did that already and got oopses on the same day. So I re-enabled fritzcapi. ;)

The problem is that I need fritzcapi for my business answering machine, so I
can't leave it disabled for days...

I'll check back when I get the next oops. If I don't, leave it at NEEDINFO for
some weeks... If I have not come back by then, you can trash this bug if you
want. ;)
Comment 12 Daniel Drake (RETIRED) gentoo-dev 2005-07-08 09:22:43 UTC
Ok, marking NEEDINFO. Thanks.