Bug 198420 - IVTV Kernel Oops on 2.6.23 with Pentium 4 & Hyperthreading
Summary: IVTV Kernel Oops on 2.6.23 with Pentium 4 & Hyperthreading
Status: RESOLVED OBSOLETE
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: http://bugzilla.kernel.org/show_bug.c...
Whiteboard: watch-linux-bugzilla
Reported: 2007-11-08 04:05 UTC by Guy A. Paddock
Modified: 2011-06-28 09:53 UTC

Attachments
Vital system information on this bug, including oops log and system profile. (IVTV Crash Vitals.txt,34.12 KB, text/plain)
2007-11-08 04:12 UTC, Guy A. Paddock
Latest oops as of this evening (Latest_Oops_nv-only.txt,1.56 KB, text/plain)
2007-11-09 02:02 UTC, Guy A. Paddock
Oops while running 2.6.24-rc2-git2 (IVTV Oops - 2.6.24rc2-git2.txt,2.17 KB, text/plain)
2007-11-13 01:32 UTC, Guy A. Paddock
Kernel config I'm using for 2.6.24-rc2-git2 (config,47.36 KB, text/plain)
2007-11-13 02:16 UTC, Guy A. Paddock
A second kernel "Oops" with 2.6.24rc2-git2 (Second IVTV Oops - 2.6.24rc2-git2.txt,1.92 KB, text/plain)
2007-11-13 05:12 UTC, Guy A. Paddock
The GDB output from "list *stream_enc_dma_append+0xcc" (GDB ivtv-irq.o Output.txt,500 bytes, text/plain)
2007-11-13 18:12 UTC, Guy A. Paddock

Description Guy A. Paddock 2007-11-08 04:05:02 UTC
I have a Pentium 4 2.8 GHz system with Hyperthreading; 2 PVR-500 capture cards, an Air2PC BCM3510 capture card, and a pcHDTV HD-3000 capture card; an nVidia GeForce 6600 GT video card; and a SiS chipset.

Since 2.6.21, I have been seeing an intermittent kernel Oops whenever SMP is enabled in the kernel, regardless of whether or not the SMT/Hyperthreading scheduler is enabled. The issue does not appear when SMP is disabled (effectively leaving the HT feature of the CPU disabled). The stack trace and any pertinent system information are attached.

Reproducible: Sometimes

Steps to Reproduce:
The problem is random, with no definite reproduction steps. It usually happens during recordings, but it can happen at the beginning, middle, or end of a recording. The system can sometimes run for days without an issue; at other times it runs for only 10 minutes. Xorg with the "nvidia" video driver is usually running at the time, but because of the random nature of this issue, it has not been possible to test without Xorg running.
Actual Results:  
System has a hard lock-up and is completely unresponsive to all input, including the Magic SysRq. A serial console receives the Oops details.

Expected Results:  
System should be stable with HT on.

Please see attached.
Comment 1 Guy A. Paddock 2007-11-08 04:12:31 UTC
Created attachment 135466 [details]
Vital system information on this bug, including oops log and system profile.
Comment 2 Jakub Moc (RETIRED) gentoo-dev 2007-11-08 09:19:00 UTC
If you get kernel crashes caused by proprietary nVidia drivers, you need to complain to nVidia since we can't fix that at all.
Comment 3 Doug Goldstein (RETIRED) gentoo-dev 2007-11-08 14:15:32 UTC
The crash happened within the ivtv driver, jakub.

However, the user is using 2.6.23 which uses the ivtv driver maintained by Linus and friends... not the ivtv driver maintained by the DVB/V4L/ivtv guys.
Comment 4 Daniel Drake (RETIRED) gentoo-dev 2007-11-08 15:08:54 UTC
Regardless, please do try to reproduce this without the binary nvidia driver, otherwise upstream are likely to ignore this bug. You should be able to use the 'nv' driver instead.

Also, your oops messages are missing the first line of the oops, which is rather important in understanding this crash. If you have it handy, please attach the complete message, or alternatively wait for the next one to happen and then post that one.
Comment 5 Guy A. Paddock 2007-11-08 20:00:27 UTC
Gah; the MythTV QT4 frontend is intolerably slow with the "nv" driver, but playback is acceptable. I'll run it like this for a day or so and see if I can get it to happen, but I somewhat doubt that video has anything to do with the problem. Most often, it occurs when the system is not playing back anything or rendering anything other than the MythTV main menu. I am willing to concede that ivtv2 and nvidia are sharing the same IRQ, though, so it's possible...

Also, what is the first line of the Oops supposed to look like? In all of the "Oopses" I've gotten so far over serial, the first line is exactly as provided in the attachment -- "Oops: 0000 [#1]" with no other information before that, aside from whatever was on the screen before the Oops occurred.

One other observation: I saw that this still occurred in 2.6.22 when APIC was disabled, both with "noapic" on the kernel command line and with APIC support not compiled into the kernel. However, with APIC disabled, almost all the hardware got shifted to IRQ 10 (including all tuner cards, ethernet, sata, usb, and the video card), and I got an error about "irq 10: nobody cared!" I haven't tried disabling APIC in 2.6.23, but I'd assume the result would be the same.
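
The IRQ sharing described in this comment can be checked from /proc/interrupts. The following is a minimal sketch and not part of the original report: the function name shared_irqs is hypothetical, and it assumes the usual layout in which handlers sharing one IRQ line appear as a comma-separated list at the end of that line.

```shell
# Print "<irq> <handler count>" for every numbered IRQ that has more
# than one handler registered. Hypothetical helper for illustration.
shared_irqs() {
    # $1: file to read (defaults to /proc/interrupts)
    awk '
        # Only numbered IRQ rows (e.g. " 16:") that contain a comma;
        # this skips the CPU header and rows such as NMI: or LOC:.
        $1 ~ /^[0-9]+:$/ && /,/ {
            irq = $1
            sub(/:$/, "", irq)
            # Count the commas: N commas means N+1 handlers share the line.
            commas = gsub(/,/, ",")
            print irq, commas + 1
        }
    ' "${1:-/proc/interrupts}"
}

# Usage on a live system:
#   shared_irqs
```

A line of output such as "16 2" would mean two handlers share IRQ 16; the handler names themselves can then be read from the matching row of /proc/interrupts.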
Comment 6 Duane Griffin 2007-11-08 20:51:00 UTC
There have been a whole bunch of ivtv patches merged since 2.6.23. If you don't mind, it would probably be a good idea to confirm whether the latest development kernel still has the same problem. If you are up for it then I'd recommend using dev-util/ketchup to download the latest nightly snapshot.

"Gah; the MythTV QT4 frontend is intolerably slow with the "nv" driver, but
playback is acceptable."

Is your painter OpenGL? Try switching it to something else, if so.
Comment 7 Guy A. Paddock 2007-11-09 02:02:34 UTC
Created attachment 135542 [details]
Latest oops as of this evening

This occurred without the proprietary nvidia module being loaded; "nvidia-drivers" was unmerged, the system was restarted, and Xorg was using the "nv" driver before this happened.
Comment 8 Guy A. Paddock 2007-11-09 02:04:38 UTC
The painter I'm using is the QT4 painter -- I switched it as soon as I started using the "nv" driver, as the screens took about 5-10 seconds to fade in and out when the OpenGL painter was being used (I understand that the nv driver uses Mesa because it lacks HW accel).

The QT4 painter is STILL slow -- it takes about 1-2 seconds to respond to input while in the main menu, and 3-5 seconds to respond to input while in the program listing.
Comment 9 Guy A. Paddock 2007-11-09 14:02:18 UTC
Okay, I'm trying the latest development build kernel, linux-2.6-tip.

Is the menuconfig / uname after boot supposed to show that the version is different? It still seems to indicate that the version is "2.6.23-gentoo", even after I ran "ketchup 2.6-tip" on my source and it completed successfully.
Comment 10 Duane Griffin 2007-11-09 14:44:47 UTC
A couple of things. First, you should be using 2.6-git for the latest development snapshot; 2.6-tip will get you the latest stable kernel release.

Second, uname after boot should show you a different version number, yes. How did you configure and install the kernel? Go into "General Setup" in menuconfig and check whether the "Local version" option is set to "gentoo". However, even if that is set, I would have expected it to show you "2.6.23.1-gentoo", so maybe you've not managed to boot into the new kernel for some reason.

What does your grub config look like?
Comment 11 Guy A. Paddock 2007-11-09 15:38:12 UTC
I'm going to try getting a clean source tree for 2.6.24rc1-git, instead of trying to shift to it from 2.6.23-gentoo. Some of the patches seem to fail otherwise. Thanks!
Comment 12 Daniel Drake (RETIRED) gentoo-dev 2007-11-09 15:42:06 UTC
yes, i'd recommend that - don't use ketchup to bolt onto existing sources.
Comment 13 Guy A. Paddock 2007-11-12 02:50:47 UTC
So far so good; I have not yet seen this with 2.6.24-rc1-git15 or 2.6.24-rc2-git2. Although IO was extremely slow in 2.6.24-rc1-git15, it appears better in 2.6.24-rc1-git15.
Comment 14 Guy A. Paddock 2007-11-12 02:52:09 UTC
(In reply to comment #13)
> So far so good; I have not yet seen this with 2.6.24-rc1-git15 or
> 2.6.24-rc2-git2. Although IO was extremely slow in 2.6.24-rc1-git15, it appears
> better in 2.6.24-rc1-git15.
> 

Ahem; I meant it's better in 2.6.24-rc2-git2.
Comment 15 Guy A. Paddock 2007-11-13 00:17:33 UTC
Sadly, I have to report that the system has frozen up again. Unfortunately, with the kernel set to use both the serial console and the first VT as the system console, it apparently writes Oops output to the VT only, so I was not able to catch the Oops crash dump. I have modified the kernel command line to use ONLY the serial console as the system console, and hope to capture the Oops the next time this issue occurs.

I can only surmise that the reason this did not happen over the weekend was that during the weekend there are many fewer analog recordings and many more HD recordings, so the ivtv recorders were not utilized enough to demonstrate the issue.
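
To confirm which consoles the running kernel was actually told to use, the console= arguments can be read back from /proc/cmdline. This is a minimal sketch rather than anything from the original report: the function name console_args is hypothetical, and the device name and baud rate in the usage comment are assumed values, not the reporter's settings.

```shell
# List the console= arguments on a kernel command line, one per line.
# Hypothetical helper for illustration.
console_args() {
    # $1: command line string (defaults to the running kernel's)
    printf '%s\n' "${1:-$(cat /proc/cmdline)}" | tr ' ' '\n' | grep '^console='
}

# Usage on a live system:
#   console_args
# Usage on an explicit string (ttyS0 and 115200n8 are assumed values):
#   console_args 'root=/dev/sda3 console=ttyS0,115200n8'
```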
Comment 16 Guy A. Paddock 2007-11-13 01:32:17 UTC
Created attachment 135847 [details]
Oops while running 2.6.24-rc2-git2

This Oops indicates that this problem persists in 2.6.24-rc2-git2.
Comment 17 Guy A. Paddock 2007-11-13 02:16:47 UTC
Created attachment 135849 [details]
Kernel config I'm using for 2.6.24-rc2-git2

I'm attaching my kernel config, in case it helps.
Comment 18 Guy A. Paddock 2007-11-13 05:12:29 UTC
Created attachment 135851 [details]
A second kernel "Oops" with 2.6.24rc2-git2

Yet another kernel Oops with 2.6.24rc2-git2, this time with mythfrontend as the active process
Comment 19 Daniel Drake (RETIRED) gentoo-dev 2007-11-13 12:00:34 UTC
Without recompiling your kernel or changing config, please do this:

# emerge -n gdb
# cd /usr/src/linux-2.6.24-rc2-git2
# make CONFIG_DEBUG_INFO=y drivers/media/video/ivtv/ivtv-irq.o
# gdb drivers/media/video/ivtv/ivtv-irq.o
then inside gdb:
> list *stream_enc_dma_append+0xcc

and post the output.
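
The same lookup can also be run non-interactively, which makes the output easier to save and attach. This is a sketch under stated assumptions and not part of the original instructions: the function name oops_lookup and the GDB override variable are hypothetical, while -batch and -ex are standard gdb options. The object file must already have been built with CONFIG_DEBUG_INFO=y as in the steps above.

```shell
# Resolve one or more "symbol+offset" locations from an oops to source
# lines using gdb in batch mode. Hypothetical helper for illustration.
oops_lookup() {
    # $1: object file with debug info; remaining args: locations
    obj=$1
    shift
    for loc in "$@"; do
        echo "== $loc =="
        # GDB can be overridden to point at a different gdb binary
        # (the override is an assumption of this sketch).
        "${GDB:-gdb}" -batch -ex "list *$loc" "$obj"
    done
}

# Usage, from the kernel source directory:
#   oops_lookup drivers/media/video/ivtv/ivtv-irq.o \
#       'stream_enc_dma_append+0xcc' > gdb-output.txt
```

Quoting each location keeps the shell from treating the leading * that gdb expects as a glob, since the function adds it inside the quoted -ex argument.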
Comment 20 Guy A. Paddock 2007-11-13 18:12:15 UTC
Created attachment 135913 [details]
The GDB output from "list *stream_enc_dma_append+0xcc"

The GDB output from "list *stream_enc_dma_append+0xcc".
Comment 21 Guy A. Paddock 2007-11-13 18:15:57 UTC
I've done some research on the IVTV mailing list archives and in the wiki, and it appears that freezes while using Hyperthreading or multiple CPU cores are a known issue, and the recommended workaround is to disable SMP. I'm not particularly fond of this idea -- it indicates that the IVTV drivers aren't SMP safe.

http://ivtvdriver.org/index.php/Troubleshooting#IVTV_may_cause_hangs_when_run_on_an_Intel_processor_with_Hyperthreading_enabled.
http://www.gossamer-threads.com/lists/ivtv/users/36044?do=post_view_flat#36044
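
When testing the workaround, it helps to confirm whether the booted kernel was really built with SMP. The sketch below is an illustration only: the function name is_smp_kernel is hypothetical, and it relies on Linux kernels built with SMP support including the word "SMP" in the version string reported by uname -v.

```shell
# Succeed if the kernel version string marks an SMP build.
# Hypothetical helper for illustration.
is_smp_kernel() {
    # $1: version string (defaults to `uname -v` of the running kernel)
    case " ${1:-$(uname -v)} " in
        *" SMP "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system:
#   if is_smp_kernel; then echo "SMP kernel"; else echo "non-SMP kernel"; fi
#   grep -c '^processor' /proc/cpuinfo   # logical CPUs the kernel is using
```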
Comment 22 Duane Griffin 2007-11-14 23:23:06 UTC
Looks like something is going badly wrong with that driver, but you have a stack trace and can reproduce it, so hopefully the maintainers will be able to get to the bottom of it.

Could you please open a bug on the kernel bugzilla, here:
http://bugzilla.kernel.org/

Describe the problem as you've done here, attach the logs and other information, and provide a link back to this bug. Make sure you include the range of kernel versions affected and the results from gdb. Could you also please go back into gdb as before and also add output from the following:

list *ivtv_irq_handler+0xc13

Once you've created the bug please update this bug's URL to point to it.

Thanks!
Comment 23 Duane Griffin 2007-11-16 22:47:44 UTC
Thanks very much for the detailed report and assistance in debugging, Guy. We'll keep an eye on the upstream report and update this one as and when needed.