[Sorry, I've been getting tons of crashes on this box with Firefox since 1.0.4. This report is about my other box, but I'm not going to fill it out in that much detail a 4th time. :(( ]

Reproducible: Sometimes

Steps to Reproduce:
0. Be able to read the oops and save it as text while it's happening (nearly impossible if the logger crashes :_( ).
1. Use metalog (<-- the evil guy?; additionally logging different stuff on vc/10, vc/11, vc/12), samba, my kernel, everything stable and up to date.
2. Try to create a lot of events for samba (like: copy 20000 tiny-to-medium files).
3. Do this for up to 30 min.
4. Wonder how to capture the million oopses flooding your active console, preventing you from doing anything and slowing the whole system to death (if you don't have /proc/sys/kernel/panic_on_oops = 1).

Actual Results:
ksymoops 2.4.9 on i686 2.6.11-gentoo-r8. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.6.11-gentoo-r8/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options.
Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 c01f259e
*pde = 00000000
Oops: 0000 [#31]
CPU: 0
EIP: 0060:[<c01f259e>] Tainted: P VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286 (2.6.11-gentoo-r8)
eax: 00000000 ebx: 00005401 ecx: 00000000 edx: c8419240
esi: bfffe8fc edi: c5b19000 ebp: 00005401 esp: c4565e60
ds: 007b es: 007 ss: 0068
Stack: 00000000 cfd81984 c622e508 cfd81a24 c622e4c0 c9737c04 c0135c40 c9737c04
       c11f5940 c0144290 c11f5940 b7f01000 c4565ea8 c4565eb4 c01354e0 c4565000
       00000000 00000000 00000001 c4565000 c9761b7c b7f0142e c96a8040 c01445e9
Call Trace:
 [<c0135c40>] filemap_nopage+0x0/0x3a0
 [<c0144290>] do_no_page+0x1b0/0x300
 [<c01354e0>] file_read_actor+0x0/0xe0
 [<c01445e9>] handle_mm_fault+0xe9/0x190
 [<c01df052>] copy_to_user+0x42/0x60
 [<c015d20d>] cp_new_stat64+0xfd/0x120
 [<c01ed56a>] tty_ioctl+0x42a/0x580
 [<c01660ef>] do_ioctl+0x6f/0xa0
 [<c0166325>] vfs_ioctl+0x65/0x1e0
 [<c01664e5>] sys_ioctl+0x45/0xa0
 [<c010271b>] syscall_call+0x7/0xb
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 55 57 56 53 81 ec b4 00 00 00 8b bc 24 c8 00 00 00 8b 9c 24 d0 00 00 00 8b 87 7c 09 00 00 <8b> 30 8b 04 b5 20 d3 44 c0 89 34 24 89 44 24 4c e8 0d 69 00 00

>>EIP; c01f259e <vt_ioctl+1e/1b80> <=====

>>edx; c8419240 <pg0+7f98240/3fb7d400>
>>edi; c5b19000 <pg0+5698000/3fb7d400>
>>esp; c4565e60 <pg0+40e4e60/3fb7d400>

Trace; c0135c40 <filemap_nopage+0/3a0>
Trace; c0144290 <do_no_page+1b0/300>
Trace; c01354e0 <file_read_actor+0/e0>
Trace; c01445e9 <handle_mm_fault+e9/190>
Trace; c01df052 <copy_to_user+42/60>
Trace; c015d20d <cp_new_stat64+fd/120>
Trace; c01ed56a <tty_ioctl+42a/580>
Trace; c01660ef <do_ioctl+6f/a0>
Trace; c0166325 <vfs_ioctl+65/1e0>
Trace; c01664e5 <sys_ioctl+45/a0>
Trace; c010271b <syscall_call+7/b>

This architecture has variable length instructions, decoding before eip is unreliable, take these instructions with a pinch of salt.

Code; c01f2573 <.text.lock.misc+dd/ea>
00000000 <_EIP>:
Code; c01f2573 <.text.lock.misc+dd/ea>   0: 90   nop
Code; c01f2574 <.text.lock.misc+de/ea>   1: 90   nop
Code; c01f2575 <.text.lock.misc+df/ea>   2: 90   nop
Code; c01f2576 <.text.lock.misc+e0/ea>   3: 90   nop
Code; c01f2577 <.text.lock.misc+e1/ea>   4: 90   nop
Code; c01f2578 <.text.lock.misc+e2/ea>   5: 90   nop
Code; c01f2579 <.text.lock.misc+e3/ea>   6: 90   nop
Code; c01f257a <.text.lock.misc+e4/ea>   7: 90   nop
Code; c01f257b <.text.lock.misc+e5/ea>   8: 90   nop
Code; c01f257c <.text.lock.misc+e6/ea>   9: 90   nop
Code; c01f257d <.text.lock.misc+e7/ea>   a: 90   nop
Code; c01f257e <.text.lock.misc+e8/ea>   b: 90   nop
Code; c01f257f <.text.lock.misc+e9/ea>   c: 90   nop
Code; c01f2580 <vt_ioctl+0/1b80>   d: 55   push %ebp
Code; c01f2581 <vt_ioctl+1/1b80>   e: 57   push %edi
Code; c01f2582 <vt_ioctl+2/1b80>   f: 56   push %esi
Code; c01f2583 <vt_ioctl+3/1b80>   10: 53   push %ebx
Code; c01f2584 <vt_ioctl+4/1b80>   11: 81 ec b4 00 00 00   sub $0xb4,%esp
Code; c01f258a <vt_ioctl+a/1b80>   17: 8b bc 24 c8 00 00 00   mov 0xc8(%esp),%edi
Code; c01f2591 <vt_ioctl+11/1b80>   1e: 8b 9c 24 d0 00 00 00   mov 0xd0(%esp),%ebx
Code; c01f2598 <vt_ioctl+18/1b80>   25: 8b 87 7c 09 00 00   mov 0x97c(%edi),%eax

This decode from eip onwards should be reliable

Code; c01f259e <vt_ioctl+1e/1b80>
00000000 <_EIP>:
Code; c01f259e <vt_ioctl+1e/1b80> <=====   0: 8b 30   mov (%eax),%esi <=====
Code; c01f25a0 <vt_ioctl+20/1b80>   2: 8b 04 b5 20 d3 44 c0   mov 0xc044d320(,%esi,4),%eax
Code; c01f25a7 <vt_ioctl+27/1b80>   9: 89 34 24   mov %esi,(%esp)
Code; c01f25aa <vt_ioctl+2a/1b80>   c: 89 44 24 4c   mov %eax,0x4c(%esp)
Code; c01f25ae <vt_ioctl+2e/1b80>   10: e8 0d 69 00 00   call 6922 <_EIP+0x6922>

Kernel panic - not syncing: Fatal exception

1 warning and 1 error issued. Results may not be reliable.

Expected Results:
Normal operation. (I use JFS; no idea if that's relevant. I lost data last time, e.g. /root/.viminfo was corrupted.)

Gentoo Base System version 1.4.16
Portage 2.0.51.19 (default-linux/x86/2005.0, gcc-3.3.5-20050130, glibc-2.3.4.20041102-r1, 2.6.11-gentoo-r8 i686)
=================================================================
System uname: 2.6.11-gentoo-r8 i686 AMD Athlon(tm) Processor
Python: dev-lang/python-2.3.5 [2.3.5 (#1, May 3 2005, 02:29:15)]
dev-lang/python: 2.3.5
sys-apps/sandbox: [Not Present]
sys-devel/autoconf: 2.13, 2.59-r6
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.5
sys-devel/binutils: 2.15.92.0.2-r7
sys-devel/libtool: 1.5.16
virtual/os-headers: 2.6.8.1-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O2 -march=athlon-tbird -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/bind /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=athlon-tbird -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs autoconfig ccache distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ http://mir.zyrianes.net/gentoo/ http://www.gigaload.org/gentoo.org/"
LANG="de_DE.utf8"
LC_ALL="de_DE.utf8"
LINGUAS="de"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="x86 3dnow aalib acl acpi alsa apache2 arts audiofile avi bash-completion berkdb bitmap-fonts bluetooth bzlib crypt cups curl curlwrappers dedicated dio directfb doc emboss encode exif fbcon fftw flac flash flatfile foomaticdb fortran ftp gd gdbm gif gpm gstreamer gtk2 hardened hardenedphp imagemagick imap imlib innodb ipv6 jack java jikes jpeg junit ladcca lcms libcaca libg++ libwww mad mikmod mime ming mmap mmx mng motif mp3 mpeg mysql mysqli nas ncurses nls nocd offensive ogg oggvorbis openal opengl oss pam pcre pdflib perl php png portaudio posix ppds prelude python quicktime readline ruby samba sdl session shared sharedmem slang sndfile snmp soap sockets sox spell spl ssl svg svga tcpd threads tidy tiff tokenizer truetype truetype-fonts type1-fonts unicode usb vhosts vorbis xml2 xmms xsl xv zlib fritzcapi_cards_fcpci linguas_de userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CBUILD, CTARGET, LDFLAGS, PORTDIR_OVERLAY
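A note on reading the dump above. The `Code:` line is a raw byte dump around the faulting EIP, and the byte in angle brackets (`<8b>`) is the one at EIP itself. The bytes `8b 30` at that position decode to `mov (%eax),%esi`, and the register dump shows `eax: 00000000`. That matches the NULL pointer dereference at virtual address 00000000.

ksymoops begins its "unreliable" decode at c01f2573 only because 43 (0x2b) bytes come before the marker. The Python 3 sketch below shows that bookkeeping. It is minimal and assumes the one-line `Code:` format shown above. `split_code_line` is a hypothetical helper, not part of ksymoops.

```python
import re

def split_code_line(code_line, eip):
    """Split an oops 'Code:' byte dump at the byte marked <..>, which sits at EIP.

    Returns (start_address, bytes_before_eip, bytes_from_eip).
    """
    tokens = code_line.split(":", 1)[1].split()
    before, from_eip = [], []
    seen_marker = False
    for tok in tokens:
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if marked:
            seen_marker = True
            from_eip.append(int(marked.group(1), 16))
        elif seen_marker:
            from_eip.append(int(tok, 16))
        else:
            before.append(int(tok, 16))
    if not seen_marker:
        raise ValueError("no <..> marker for the byte at EIP")
    # The dump begins len(before) bytes ahead of EIP.
    return eip - len(before), bytes(before), bytes(from_eip)

# The Code: line from this report.
CODE = ("Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 55 57 56 53 81 ec b4 00 00 00 "
        "8b bc 24 c8 00 00 00 8b 9c 24 d0 00 00 00 8b 87 7c 09 00 00 <8b> 30 8b 04 "
        "b5 20 d3 44 c0 89 34 24 89 44 24 4c e8 0d 69 00 00")

start, before, from_eip = split_code_line(CODE, 0xc01f259e)
print(hex(start), len(before), len(from_eip))  # 0xc01f2573 43 21
```

Only the bytes from the marker onwards are guaranteed to start on an instruction boundary. That is why ksymoops trusts its second decode and not the first.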
I forgot: why is there no /proc/ksyms on my system? Yesterday I asked someone on #gentoo and he didn't have one either... (Sorry for my bad English, I'm only one of 600000 people natively speaking Luxembourgish ;)
Which was the last known good kernel? Have you tested your memory recently?
There were fewer problems with 2.6.11-r4, and 2.6.10-r8 caused no problems, if I remember correctly. 2.6.9-* and earlier worked fine, but back then I had syslog-ng installed instead of metalog.
I can now add that it only happens when I'm on the console. If it's running with all consoles closed, and only the Windows machine is interacting with it (backups) or emerge is running via cron, it always runs fine. I just installed 2.6.11-gentoo-r9 and got a similar oops only 30 minutes later; this time the kernel was not even tainted (by the fcpci module from fritzcapi). Is anyone working on this? If yes, is there a way to help you out? I'm a Java programmer, can read C and C++ and write some other languages. I also understand the basics of assembler, and I'm good at debugging as long as it's not debugging assembler code. ;) I just have no idea how to find the place in the kernel sources that the ksymoops output seems to point to.. :( If it will speed things up, simply tell me what there is to do...
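Finding the place in the sources starts with the symbol. `vt_ioctl+1e/1b80` means EIP is 0x1e bytes into `vt_ioctl`, a function that ksymoops takes to be 0x1b80 bytes long. ksymoops gets this from System.map, whose lines have the form `address type name`. /proc/kallsyms uses the same layout, and 2.6 kernels provide it instead of /proc/ksyms.

Below is a minimal Python 3 sketch of that lookup. It assumes a symbol's size is the distance to the next text symbol. `parse_map`, `resolve` and `SAMPLE_MAP` are hypothetical. The first two map addresses are derived from the offsets in the oops above; the third entry, including its name, is invented only to bound the size.

```python
import bisect

def parse_map(lines):
    """Parse 'address type name [module]' lines (System.map, /proc/kallsyms).

    Keeps only text (code) symbols, type 't' or 'T', sorted by address.
    """
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[1] not in ("t", "T"):
            continue
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def resolve(symbols, address):
    """Format `address` as 'name+offset/size' in hex, the way ksymoops prints it.

    The size is assumed to be the distance to the next text symbol, so the
    last symbol (and anything before the first one) cannot be resolved.
    """
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0 or i + 1 >= len(symbols):
        return None
    start, name = symbols[i]
    size = symbols[i + 1][0] - start
    return f"{name}+{address - start:x}/{size:x}"

# Hypothetical map excerpt: the first two addresses follow from
# '.text.lock.misc+dd/ea' and 'vt_ioctl+1e/1b80' in the oops; the third
# line and its name are made up only to bound vt_ioctl's size.
SAMPLE_MAP = [
    "c01f2496 t .text.lock.misc",
    "c01f2580 T vt_ioctl",
    "c01f4100 T next_symbol",
]

syms = parse_map(SAMPLE_MAP)
print(resolve(syms, 0xc01f259e))  # vt_ioctl+1e/1b80
print(resolve(syms, 0xc01f2573))  # .text.lock.misc+dd/ea
```

In 2.6 trees `vt_ioctl` should live in drivers/char/vt_ioctl.c. On a kernel built with CONFIG_DEBUG_INFO, `addr2line -e vmlinux c01f259e` should map the address to a source line.

The decode can be read the same way. After the four pushes (16 bytes) and `sub $0xb4,%esp` (180 bytes), the return address is at 0xc4(%esp). That makes `0xc8(%esp)` the first argument and `0xd0(%esp)` the third. On i386, ebx = 00005401 is TCGETS. So one guess, to be checked against the source, is that a TCGETS ioctl reached vt_ioctl while the pointer at offset 0x97c of its first argument was NULL.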
Have you tested your memory recently? It would also be useful if you could try to reproduce the problem on development-sources-2.6.12_rc5.
See comment #5
Sorry, the replies went to the wrong mail address, so I did not know that you had added comments. Yes, memtest86+ ran all night without an error. But: is it safe to use this development-sources-2.6.12_rc5? I don't want to have even more bugs. ;) But I could try it for a day or so... Should I really do this?
Yes, it should be fine. By the way, -rc6 is out now.
Okay. The latest gentoo-sources I can get through emerge is 2.6.12-r1. I'm running it right now, with the latest fcpci (from fritzcapi) and the latest splashutils. Yesterday I got a short oops flood again; I think there were about 30 oopses, and then it worked again. With the old kernel the oopses sometimes kept flooding until the computer died. But I can't tell whether it got better... Metalog still dies on the oops flood, but it can log the first ones... The problem is that it only logs the first line of each oops. Is there a way to make it log everything? And: is there any way I could make this thing reproducible? (Like finding out the oopsing routine and how to trigger it...) Then I could find the source of the problem... I really want to track this ugly thing down... I've never had this many problems with any Linux/Unix flavour or other OS. (Windows 98 does not count as such. ;) And I can't stand the fact that my Win XP machine is more stable than a hardened Gentoo machine.. :(
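The closest thing to a trigger so far is step 2 of the original report: copying about 20000 tiny-to-medium files to the Samba share. Being able to regenerate a file set like that on demand would at least make runs comparable.

Below is a minimal Python 3 sketch. `make_test_tree`, the 1 byte to 64 KiB size range and the 500-files-per-directory layout are my assumptions. They are not a record of what was actually copied.

```python
import os
import random

def make_test_tree(root, count=20000, min_size=1, max_size=64 * 1024, per_dir=500, seed=0):
    """Create `count` files of random size under `root`, `per_dir` files per subdirectory.

    Returns the total number of bytes written.
    """
    rng = random.Random(seed)  # seeded, so the same sizes come out on every run
    total = 0
    for i in range(count):
        subdir = os.path.join(root, f"d{i // per_dir:03d}")
        os.makedirs(subdir, exist_ok=True)
        size = rng.randint(min_size, max_size)
        with open(os.path.join(subdir, f"f{i:05d}.bin"), "wb") as fh:
            fh.write(os.urandom(size))  # random content, so files are not all identical
        total += size
    return total
```

For example, `make_test_tree("/tmp/samba-stress")` (a hypothetical path) writes the tree once. You can then copy it from the Windows client repeatedly while someone is logged in on a console, since that seems to be the condition under which it happens.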
We can at least file a decent bugreport upstream if you can get a new oops logged. Yes, you can log the entire debug/error message output, but that depends on your system logger, and not the kernel. dmesg will always contain all messages, perhaps you could redirect that to a file if you still have a working console. I thought you already found how to reproduce it, by copying lots of files using samba. You should also try it without any non-standard modules such as fritzcapi.
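The dmesg suggestion, in its simplest form, is `dmesg > /root/oops.txt` run from a shell that still works. The kernel log buffer is a fixed-size ring (its size is set by CONFIG_LOG_BUF_SHIFT). During a long oops flood, the first and most useful report can therefore be overwritten if the copy is made too late.

Below is a minimal Python 3 sketch that appends a timestamped copy and forces it to disk, given the earlier data loss on JFS. `snapshot_dmesg` and the output path are hypothetical. `cmd` is a parameter only so that the command can be swapped out.

```python
import os
import subprocess
import time

def snapshot_dmesg(path, cmd=("dmesg",)):
    """Run dmesg once and append its output, under a timestamp header, to `path`.

    The file is fsync'ed so the copy should survive if the box dies right after.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
    with open(path, "a") as fh:
        fh.write(time.strftime("===== %Y-%m-%d %H:%M:%S =====\n"))
        fh.write(result.stdout)
        fh.flush()
        os.fsync(fh.fileno())
    return result.stdout
```

Calling `snapshot_dmesg("/root/oops.txt")` every few minutes (from cron, or from a loop on a spare console) should catch the start of a flood before the buffer wraps.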
(In reply to comment #10)
> We can at least file a decent bugreport upstream if you can get a new oops logged.

Hmm... I got some but could not log them. :(

> Yes, you can log the entire debug/error message output, but that depends on your system logger, and not the kernel.

And this is the best part of it: it's metalog, which stops working at the first oops, while the oopses flood the console and make it impossible to type anything. (Well, I can do it blindly now, so for the next oops... ;)

> dmesg will always contain all messages, perhaps you could redirect that to a file if you still have a working console.

Good idea. :) I'll try that. Let's set the bug to NEEDINFO until I get the next one. The oopses have become rarer since 2.6.12-r1, so it could take weeks to get one while I'm on the console.

> I thought you already found how to reproduce it, by copying lots of files using
> samba.

I can't reproduce them anymore. Somehow .12-r1 is harder to push into an oops... (Btw: I noticed that samba takes about 80-90% of my 80 MHz CPU when syncing a profile with the client. Maybe it's a problem with metalog and a buffer overflow. For that reason I have now disabled output buffering in metalog, so I can test whether that's the problem.)

> You should also try it without any non-standard modules such as fritzcapi.

I did that already and got oopses on the same day, so I re-enabled fritzcapi. ;) The problem is that I need fritzcapi for the answering machine for my business, so I can't leave it disabled for days... I'll check back when I get the next oops. Otherwise, leave it in NEEDINFO for a few weeks... If I haven't come back by then, you can trash this bug if you want. ;)
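Once a dmesg dump is saved, the complete reports can be pulled out of it, even if metalog only recorded the first line of each.

Below is a minimal Python 3 sketch. It assumes each report starts with an `Unable to handle kernel` or `Oops:` line and ends with the `Code:` line, as the oops in this report does. `extract_oopses` is a hypothetical helper. Reports that were cut off before their `Code:` line are dropped.

```python
START_MARKERS = ("Unable to handle kernel", "Oops:")

def extract_oopses(log_text):
    """Return each complete oops report in `log_text` as one string.

    A report starts at the first line containing a START_MARKERS entry and
    ends at the next line containing 'Code:'. A report with no 'Code:' line
    (e.g. cut off by the flood) is dropped.
    """
    reports, current = [], None
    for line in log_text.splitlines():
        if current is None:
            if any(marker in line for marker in START_MARKERS):
                current = [line]
        else:
            current.append(line)
            if "Code:" in line:
                reports.append("\n".join(current))
                current = None
    return reports
```

The first element of `extract_oopses(open("/root/oops.txt").read())` (hypothetical path) would be the report to attach here and feed to ksymoops. Later ones in a flood are often follow-on damage.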
Ok, marking NEEDINFO. Thanks.