I have a Pentium 4 2.8 GHz system with Hyperthreading; 2 PVR-500 capture cards, an Air2PC BCM3510 capture card, and a pcHDTV HD-3000 capture card; an nVidia GeForce 6600 GT video card; and a SiS chipset. Since 2.6.21, I have been seeing an intermittent kernel Oops whenever SMP is enabled in the kernel, regardless of whether or not the SMT/Hyperthreading scheduler is enabled. The issue does not appear when SMP is disabled (effectively leaving the HT feature of the CPU disabled). The stack trace and pertinent system information are attached.

Reproducible: Sometimes

Steps to Reproduce:
The problem is random, with no definite repro steps. It usually happens during recordings, but it can happen at the beginning, middle, or end of a recording. The system can sometimes run for days without an issue; at other times it runs for only 10 minutes. Xorg with the "nvidia" video driver is usually running at the time, but due to the random nature of this issue, it has not been possible to test it without Xorg running.

Actual Results:
The system locks up hard and is completely unresponsive to all input, including the Magic SysRq. A serial console receives the Oops details.

Expected Results:
The system should be stable with HT on.

Please see attached.
Created attachment 135466 [details] Vital system information on this bug, including oops log and system profile.
If you get kernel crashes caused by proprietary nVidia drivers, you need to complain to nVidia since we can't fix that at all.
The crash happened within the ivtv driver, Jakub. However, the user is using 2.6.23, which uses the ivtv driver maintained by Linus and friends... not the ivtv driver maintained by the DVB/V4L/ivtv guys.
Regardless, please do try to reproduce this without the binary nvidia driver, otherwise upstream are likely to ignore this bug. You should be able to use the 'nv' driver instead. Also, your oops messages are missing the first line of the oops, which is rather important in understanding this crash. If you have it handy, please attach the complete message, or alternatively wait for the next one to happen and then post that one.
Gah; the MythTV QT4 frontend is intolerably slow with the "nv" driver, but playback is acceptable. I'll run it like this for a day or so and see if I can get it to happen, but I somewhat doubt that video has anything to do with the problem. Most often, it occurs when the system is not playing back anything or rendering anything other than the MythTV main menu. I am willing to concede that ivtv2 and nvidia are sharing the same IRQ, though, so it's possible...

Also, what is the first line of the Oops supposed to look like? In all of the "Oopses" I've gotten so far over serial, the first line is exactly as provided in the attachment -- "Oops: 0000 [#1]" with no other information before that, aside from whatever was on the screen before the Oops occurred.

One other observation: I saw that this still occurred in 2.6.22 when APIC was disabled, both with "noapic" on the kernel command line and with APIC support not compiled into the kernel. However, with APIC disabled, almost all the hardware got shifted to IRQ 10 (including all tuner cards, ethernet, sata, usb, and the video card), and I got an error about "irq 10: nobody cared!" I haven't tried disabling APIC in 2.6.23, but I'd assume the result would be the same.
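For anyone wanting to check the shared-IRQ point on their own box, here is a minimal sketch. It assumes the usual /proc/interrupts layout, where a line shared by several handlers lists their names separated by ", ". The function name shared_irqs is made up for illustration.

```shell
#!/bin/sh
# Print the rows of /proc/interrupts (or of the file given as $1) whose
# device column names more than one handler, i.e. IRQ lines that are shared.
# Assumption: shared handlers appear as a ", "-separated list at the end of
# the row, as in "ivtv2, nvidia".
shared_irqs() {
    grep -E '^ *[0-9]+:.*, ' "${1:-/proc/interrupts}"
}

# Usage on a live system:
#   shared_irqs
```

An empty result only means no numbered IRQ line currently has more than one registered handler; it says nothing about whether sharing is the cause of the Oops.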
There have been a whole bunch of ivtv patches merged since 2.6.23. If you don't mind, it would probably be a good idea to confirm whether the latest development kernel still has the same problem. If you are up for it then I'd recommend using dev-util/ketchup to download the latest nightly snapshot.

> Gah; the MythTV QT4 frontend is intolerably slow with the "nv" driver, but
> playback is acceptable.

Is your painter OpenGL? Try switching it to something else, if so.
Created attachment 135542 [details] Latest oops as of this evening This occurred without the proprietary nvidia module being loaded; "nvidia-drivers" was unmerged, the system was restarted, and Xorg was using the "nv" driver before this happened.
The painter I'm using is the QT4 painter -- I switched to it as soon as I started using the "nv" driver, as the screens took about 5-10 seconds to fade in and out when the OpenGL painter was being used (I understand that the nv driver uses Mesa because it lacks HW accel). The QT4 painter is STILL slow -- it takes about 1-2 seconds to respond to input while in the main menu, and 3-5 seconds to respond to input while in the program listing.
Okay, I'm trying the latest development build kernel, linux-2.6-tip. Is the menuconfig / uname after boot supposed to show that the version is different? It still seems to indicate that the version is "2.6.23-gentoo", even after I ran "ketchup 2.6-tip" on my source and it completed successfully.
A couple of things. First, you should be using 2.6-git for the latest development snapshot; 2.6-tip will get you the latest stable kernel release. Second, uname after boot should show you a different version number, yes. How did you configure and install the kernel? Go into "General Setup" in menuconfig and check whether the "Local version" option is set to "gentoo". However, even if that is set, I would have expected it to show you "2.6.23.1-gentoo", so maybe you've not managed to boot into the new kernel for some reason. What does your grub config look like?
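A quick way to tell whether you actually booted the kernel you built is to compare the running release string with the one the source tree would produce. This is only a sketch: the helper name check_kernel is made up, and /usr/src/linux is assumed to be the symlink to the tree you built from.

```shell
#!/bin/sh
# Compare the release string of the running kernel ($1) with the release
# string the source tree builds ($2) and report whether they agree.
check_kernel() {
    running=$1
    built=$2
    if [ "$running" = "$built" ]; then
        echo "match: $running"
    else
        echo "mismatch: running $running, source tree builds $built"
    fi
}

# Usage (assumes /usr/src/linux points at the configured tree):
#   check_kernel "$(uname -r)" "$(make -s -C /usr/src/linux kernelrelease)"
```

A mismatch usually means the boot loader is still pointing at the old image, which is why the grub config is worth a look.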
I'm going to try getting a clean source tree for 2.6.24rc1-git, instead of trying to shift to it from 2.6.23-gentoo. Some of the patches seem to fail otherwise. Thanks!
yes, i'd recommend that - don't use ketchup to bolt onto existing sources.
So far so good; I have not yet seen this with 2.6.24-rc1-git15 or 2.6.24-rc2-git2. Although IO was extremely slow in 2.6.24-rc1-git15, it appears better in 2.6.24-rc1-git15.
(In reply to comment #13)
> So far so good; I have not yet seen this with 2.6.24-rc1-git15 or
> 2.6.24-rc2-git2. Although IO was extremely slow in 2.6.24-rc1-git15, it appears
> better in 2.6.24-rc1-git15.

Ahem; I meant it's better in 2.6.24-rc2-git2.
Sadly, I have to report that the system has frozen up again. Unfortunately, with the kernel set to use both the serial console and the first VT as the system console, it apparently writes Oops output to the VT only, so I was not able to capture the Oops. I have modified the kernel command line to use ONLY the serial console as the system console, and I hope to capture the Oops when the issue recurs. I can only surmise that the reason this did not happen over the weekend is that there are many fewer analog recordings and many more HD recordings on weekends, so the ivtv recorders were not utilized enough to demonstrate the issue.
Created attachment 135847 [details] Oops while running 2.6.24-rc2-git2 This Oops indicates that this problem persists in 2.6.24-rc2-git2.
Created attachment 135849 [details] Kernel config I'm using for 2.6.24-rc2-git2 I'm attaching my kernel config, in case it helps.
Created attachment 135851 [details] A second kernel "Oops" with 2.6.24rc2-git2 Yet another kernel Oops with 2.6.24rc2-git2, this time with mythfrontend as the active process
Without recompiling your kernel or changing config, please do this:

# emerge -n gdb
# cd /usr/src/linux-2.6.24-rc2-git2
# make CONFIG_DEBUG_INFO=y drivers/media/video/ivtv/ivtv-irq.o
# gdb drivers/media/video/ivtv/ivtv-irq.o

then inside gdb:

> list *stream_enc_dma_append+0xcc

and post the output.
Created attachment 135913 [details] The GDB output from "list *stream_enc_dma_append+0xcc" The GDB output from "list *stream_enc_dma_append+0xcc".
I've done some research in the IVTV mailing list archives and in the wiki, and it appears that freezes while using Hyperthreading or multiple CPU cores are a known issue, and the recommended workaround is to disable SMP. I'm not particularly fond of this idea -- it indicates that the IVTV drivers aren't SMP safe.

http://ivtvdriver.org/index.php/Troubleshooting#IVTV_may_cause_hangs_when_run_on_an_Intel_processor_with_Hyperthreading_enabled.
http://www.gossamer-threads.com/lists/ivtv/users/36044?do=post_view_flat#36044
Looks like something is going badly wrong with that driver, but you have a stack trace and can reproduce it, so hopefully the maintainers will be able to get to the bottom of it.

Could you please open a bug on the kernel bugzilla, here: http://bugzilla.kernel.org/

Describe the problem as you've done here, attach the logs and other information, and provide a link back to this bug. Make sure you include the range of kernel versions affected and the results from gdb.

Could you also please go back into gdb as before and add the output from the following:

list *ivtv_irq_handler+0xc13

Once you've created the bug, please update this bug's URL to point to it. Thanks!
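If more Oopses come in, the symbol+offset for the faulting instruction can be pulled out of the serial log rather than copied by hand. The sketch below assumes the i386 "EIP is at symbol+0xoff/0xsize [module]" line format. The helper name oops_symbol and the file name oops.txt are hypothetical, and the size and module fields in the sample comment are invented for illustration.

```shell
#!/bin/sh
# Read an Oops log on stdin and print the "symbol+0xoffset" part of each
# "EIP is at ..." line, dropping the "/0xsize [module]" tail.
# Sample input line (size and module name are illustrative only):
#   EIP is at stream_enc_dma_append+0xcc/0x12f [ivtv]
oops_symbol() {
    sed -n 's/^.*EIP is at \([^/ ]*\)\/.*$/\1/p'
}

# Usage, from the kernel source directory, with the object built with
# CONFIG_DEBUG_INFO=y as described earlier in this bug:
#   gdb -batch -ex "list *$(oops_symbol < oops.txt)" \
#       drivers/media/video/ivtv/ivtv-irq.o
```

This only handles the EIP line. Frames from the call trace, such as ivtv_irq_handler+0xc13, still need to be passed to gdb by hand, for example with a second -ex "list *ivtv_irq_handler+0xc13".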
Thanks very much for the detailed report and assistance in debugging, Guy. We'll keep an eye on the upstream report and update this one as and when needed.