Bug 392567 - sys-kernel/gentoo-sources-3.1.3: kernel NULL pointer dereference in sym53c8xx
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: http://git.kernel.org/?p=linux/kernel...
Whiteboard:
Keywords:
Depends on:
Blocks: 395535
Reported: 2011-11-30 07:00 UTC by Martin von Gagern
Modified: 2012-03-04 21:18 UTC



Attachments
emerge --info (sys-kernel:gentoo-sources-3.1.3.emerge--info,6.70 KB, text/plain)
2011-11-30 07:00 UTC, Martin von Gagern
make log (kernel-make.log,193.29 KB, text/plain)
2011-11-30 18:48 UTC, Martin von Gagern
Stack trace screen photo (gentoo392567a.png,72.64 KB, image/png)
2011-11-30 19:29 UTC, Martin von Gagern
Fix NULL ptr dereference in sym53c8xx_2 (sym538cx-fix-null-ptr-deref.patch,536 bytes, patch)
2011-12-04 00:35 UTC, Stratos Psomadakis (RETIRED)

Description Martin von Gagern 2011-11-30 07:00:16 UTC
Created attachment 294281 [details]
emerge --info

After upgrading from gentoo-sources 3.1.1 to 3.1.3, my system fails to boot and prints a kernel stack trace instead. I took a set of photos, but manually copying three screens full of mostly hex data will take some time, so I'd like to postpone that until someone actually has a use for it. The essence appears to be this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000018
IP: [<...>] sym53c8xx_slave_destroy+0x68/0x190 [sym53c8xx]
...
Pid: 791, comm: modprobe Not tainted 3.1.3-gentoo #1 ...
...
Call Trace:
[<...>] ? __scsi_remove_device+0x64/0xd0
[<...>] ? scsi_alloc_sdev+0x21c/0x260
...
[<...>] ? sys_init_module+...
[<...>] ? system_call_fastpath+...
...
note: modprobe[791] exited with preempt_count 1
udevd[661]: '/sbin/modprobe -bv pci:v00001000d0000000Fsv00000000sd00000000bc01sc00i00' [791] terminated by signal 9 (Killed)

As there appears to be no change to the code base of the sym53c8xx module itself, I guess some other, unrelated modification causes this problem.

The above was the second boot that failed for me; I'm not sure if the one before failed for the same reason, though, as scrollback didn't work in that case. So there might be different symptoms to the same underlying problem. I'm back to 3.1.1 for now.
Comment 1 Göktürk Yüksek archtester gentoo-dev 2011-11-30 15:17:39 UTC
(In reply to comment #0)
> BUG: unable to handle kernel NULL pointer dereference at 0000000000018
Can you recompile the kernel and look at the build log to see any compiler warnings pointing to this problem?
Comment 2 Martin von Gagern 2011-11-30 18:48:57 UTC
Created attachment 294377 [details]
make log

(In reply to comment #1)
> Can you recompile the kernel and look at the build log to see any compiler
> warnings pointing to this problem?

Output from "make" in the kernel source tree, after a previous "make clean".
I see nothing suspicious there. There are a few warnings, but in what appears to be unrelated code.
Comment 3 Martin von Gagern 2011-11-30 19:29:20 UTC
Created attachment 294381 [details]
Stack trace screen photo

Won't type this yet, but here is the stack trace as montaged from my photographs.
Comment 4 Martin von Gagern 2011-12-01 09:13:19 UTC
OK, had a look at the code, and have located where this issue occurs.
Quoting sym_glue.c starting at line 835:

static void sym53c8xx_slave_destroy(struct scsi_device *sdev)
{
	struct sym_hcb *np = sym_get_hcb(sdev->host);
	struct sym_tcb *tp = &np->target[sdev->id];
	struct sym_lcb *lp = sym_lp(tp, sdev->lun);
	unsigned long flags;

	spin_lock_irqsave(np->s.host->host_lock, flags);

	if (lp->busy_itlq || lp->busy_itl) {

At this point, lp (stored in R14) is NULL, so accessing its members will fail. The address 0x18 mentioned in the error message is exactly &((struct sym_lcb*)NULL)->busy_itlq. sym_lp appears to be a macro from sym_hipd.h, which on my machine takes the multi-LUN form.

Currently I've got the impression that something in the scsi subsystem tries to destroy a device which hasn't been created before. But that's a wild guess.

Will try to reproduce this with vanilla-sources next.
Comment 5 Martin von Gagern 2011-12-01 21:12:50 UTC
(In reply to comment #4)
> Will try to reproduce this with vanilla-sources next.

Could reproduce with vanilla-sources-3.1.4. Will try to git bisect this, although I can't promise I won't lose patience with all those reboots.
Comment 6 Martin von Gagern 2011-12-01 23:05:47 UTC
(In reply to comment #5)
> Will try to git bisect this,
> although I can't promise I won't lose patience with all those reboots.

Ten reboots later, I got the culprit:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=bf6f111b5e891b4cfbd4f966488fd824543ba2aa

Usually I'd take this to bugzilla.kernel.org, but that seems to be down at the moment. Feel free to forward this upstream once services are restored, or to some mailing list if you think that more useful than bugzilla.
Comment 7 Stratos Psomadakis (RETIRED) gentoo-dev 2011-12-04 00:35:33 UTC
Created attachment 294661 [details, diff]
Fix NULL ptr dereference in sym53c8xx_2

Martin, can you test the attached patch? I have also sent it upstream. If it works, and if you want, a Reported/Tested-by tag could be added to the patch (either this one, if it works and gets accepted, or subsequent patches if needed).

Thanks.
Comment 8 Martin von Gagern 2011-12-04 14:27:12 UTC
(In reply to comment #7)
> Martin, can you test the attached patch?

Applied, system boots, dmesg looks sane:

sym53c8xx 0000:03:07.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[...]
sym0: <875> rev 0x3 at pci 0000:03:07.0 irq 22
sym0: Symbios NVRAM, ID 7, Fast-20, SE, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: SCSI BUS has been reset.
scsi6 : sym-2.2.3
scsi target6:0:0: Scan at boot disabled in NVRAM
scsi: killing requests for dead queue
scsi target6:0:1: Multiple LUNs disabled in NVRAM

> If it works, and if you want, a Reported/Tested-by tag could be added in the
> patch

Yes, please do.

Thanks for managing the upstream side of this, and for coming up with this patch!
Comment 9 Stratos Psomadakis (RETIRED) gentoo-dev 2011-12-21 13:08:54 UTC
I'm going to bump the upstream thread / patch, and if there's no response, I think we can add it to genpatches (until/if it gets merged upstream).
Comment 10 Mike Pagano gentoo-dev 2012-03-04 21:18:07 UTC
This has made it to the 3.0, 3.1 and 3.2 kernels.