Bug 350753 - app-cdr/cdemud-1.3.0 + sys-fs/vhba-20101015 + 2.6.37 = kernel oops and hard lock
Summary: app-cdr/cdemud-1.3.0 + sys-fs/vhba-20101015 + 2.6.37 = kernel oops and hard lock
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High critical
Assignee: Marcelo Goes (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-05 23:40 UTC by Alexandre Rostovtsev (RETIRED)
Modified: 2011-03-02 23:35 UTC
CC list: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
locking patch to 1.2.1 (vhba-1.2.1-kernel-2.6.36.patch,1.37 KB, patch)
2011-01-06 00:46 UTC, Rafał Mużyło
patch for kernel 2.6.37 compatibility (vhba-20101015-scsi-host-lock-push-down.patch,1.93 KB, patch)
2011-01-06 09:07 UTC, Alexandre Rostovtsev (RETIRED)
Ebuild for 2.6.37 patch (vhba-20101015.ebuild,1.45 KB, text/plain)
2011-01-09 23:49 UTC, Thomas Axelsson
vhba-20101015.ebuild.patch (vhba-20101015.ebuild.patch,734 bytes, patch)
2011-01-13 22:08 UTC, Michael Weber (RETIRED)

Description Alexandre Rostovtsev (RETIRED) gentoo-dev 2011-01-05 23:40:06 UTC
Running cdemud-1.3.0 with vhba-20101015 on a system with the 2.6.27 kernel (specifically, an ~amd64 system with gentoo-sources-2.6.27) leads to a kernel oops, immediately followed by a kernel panic and hard lock.

In detail, to reproduce this bug:
1. Boot the machine
2. # modprobe vhba
3. # cdemud -d -c /dev/vhba_ctl -n 1
4. Observe the following delightful message:

[  223.877334] BUG: unable to handle kernel NULL pointer dereference at 0000000000000404
[  223.878143] IP: [<ffffffffa0044689>] 0xffffffffa0044689
[  223.878143] PGD 2230ef067 PUD 22318d067 PMD 0 
[  223.878143] Oops: 0000 [#1] PREEMPT SMP 
[  223.878143] last sysfs file: /sys/devices/platform/coretemp.3/name
[  223.878143] CPU 2 
[  223.878143] Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss vhba coretemp cpufreq_userspace cpufreq_powersave cpufreq_conservative snd_hda_codec_analog snd_usbmidi_lib snd_rawmidi snd_seq_device gspca_zc3xx gspca_main snd_hda_intel firewire_ohci rtc_cmos snd_hda_codec pl2303 videodev firewire_core rtc_core usbserial forcedeth snd_hwdep usblp i2c_nforce2 tpm_tis v4l1_compat v4l2_compat_ioctl32 wacom tpm i2c_core tpm_bios snd_pcm asus_atk0110 snd_timer snd rtc_lib mac_hid snd_page_alloc tg3 libphy e1000 fuse nfs auth_rpcgss nfs_acl lockd sunrpc raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log scsi_wait_scan hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_gyration usb_storage sx8
[  223.878143] 
[  223.878143] Pid: 6703, comm: cdemud Not tainted 2.6.37-gentoo #3 P5N32-E SLI PLUS/System Product Name
[  223.878143] RIP: 0010:[<ffffffffa0044689>]  [<ffffffffa0044689>] 0xffffffffa0044689
[  223.878143] RSP: 0018:ffff88022540be28  EFLAGS: 00010202
[  223.894467] RAX: 0000000000000404 RBX: ffff88022261e660 RCX: ffff880225dc2d60
[  223.894467] RDX: ffff880221ffa808 RSI: ffff88022540be98 RDI: 00007ff0c1300ae0
[  223.894467] RBP: ffff88022540bef8 R08: 0000000000000000 R09: 0000000000000000
[  223.894467] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880221ffa804
[  223.894467] R13: 0000000000000020 R14: ffff88022261e000 R15: ffff88022540be68
[  223.894467] FS:  00007ff0c1321710(0000) GS:ffff8800cfd00000(0000) knlGS:0000000000000000
[  223.894467] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  223.894467] CR2: 0000000000000404 CR3: 0000000223a7b000 CR4: 00000000000006e0
[  223.894467] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  223.894467] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  223.894467] Process cdemud (pid: 6703, threadinfo ffff88022540a000, task ffff880225dc2d60)
[  223.894467] Stack:
[  223.894467]  ffff880225409170 ffff88022540bf58 ffff88022540bf48 ffff88022540bf48
[  223.894467]  00007ff0c1300ae0 0000000000020200 0000000000000282 ffff880221ffa818
[  223.894467]  0000000000000000 ffff880225dc2d60 ffffffff81076f90 ffff88022540be80
[  223.894467] Call Trace:
[  223.894467]  [<ffffffff81076f90>] ? autoremove_wake_function+0x0/0x40
[  223.894467]  [<ffffffff8104b0f1>] ? get_parent_ip+0x11/0x50
[  223.894467]  [<ffffffff816a579d>] ? sub_preempt_count+0x9d/0xd0
[  223.894467]  [<ffffffff81143073>] vfs_read+0xc3/0x180
[  223.894467]  [<ffffffff81143181>] sys_read+0x51/0x90
[  223.894467]  [<ffffffff8100282b>] system_call_fastpath+0x16/0x1b
[  223.894467] Code: 5c 41 5d 41 5e 41 5f c9 c3 49 8b 46 30 48 8d 75 a0 89 45 a0 48 8b bd 50 ff ff ff 49 8b 06 8b 80 84 00 00 00 89 45 a4 49 8b 46 50 <48> 8b 10 48 89 55 a8 ba 20 00 00 00 48 8b 40 08 48 89 45 b0 41 
[  223.894467] RIP  [<ffffffffa0044689>] 0xffffffffa0044689
[  223.894467]  RSP <ffff88022540be28>
[  223.894467] CR2: 0000000000000404
[  223.895566] ---[ end trace 8acbe39c45856c9a ]---

5. One or two seconds later, the kernel panics and the machine hard locks. It does not respond to magic sysrq, and must be put out of its misery via the power button.

Note that these versions of cdemud and vhba worked under 2.6.36 without any problems.
Comment 1 Alexandre Rostovtsev (RETIRED) gentoo-dev 2011-01-05 23:43:16 UTC
On the first two lines of the above comment, replace "2.6.27" with "2.6.37". I apologize for the typo.
Comment 2 Rafał Mużyło 2011-01-06 00:46:14 UTC
Created attachment 258993 [details, diff]
locking patch to 1.2.1

Could you see if 1.2.1 with this patch works ?
Comment 3 Alexandre Rostovtsev (RETIRED) gentoo-dev 2011-01-06 02:31:29 UTC
(In reply to comment #2)
> Created an attachment (id=258993) [details]
> locking patch to 1.2.1
> 
> Could you see if 1.2.1 with this patch works ?

This patch needs #include <linux/smp_lock.h> (otherwise it doesn't compile with 2.6.37), and it does not fix the bug: vhba-1.2.1 with the patch and with the correct #include produces the same oops followed by panic and hard lock.

I also tried porting the patch to vhba-20101015; that doesn't help either.
Comment 4 Alexandre Rostovtsev (RETIRED) gentoo-dev 2011-01-06 09:07:07 UTC
Created attachment 259009 [details, diff]
patch for kernel 2.6.37 compatibility

These oopses/panics are caused by the SCSI host lock push-down changes introduced in 2.6.37 (see http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f281233d3eba15fb225d21ae2e228fd4553d824a for more details).

This patch adds some #ifdefs to enable compatibility both with 2.6.37 and older API. As far as I can tell, with the patch applied, cdemud-1.3.0 and vhba-20101015 work fine on 2.6.37.
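
To illustrate the kind of version gate described above, here is a minimal userspace sketch. It is not the attached patch: the struct definitions are simplified mocks, every demo_* name is hypothetical, and the lock is modelled by a plain integer flag. What it assumes from the kernel side is that, from 2.6.37 on, queuecommand is called as (host, cmd) without the host lock held, while older kernels call it as (cmd, done) with the lock already taken. A driver built against the old signature would therefore presumably receive mismatched arguments on 2.6.37.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for <linux/version.h>; a real module gets both
 * macros from the kernel headers. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 37)
#endif

/* Hypothetical, heavily simplified mocks of the SCSI midlayer types. */
struct Scsi_Host { int lock_held; };
struct scsi_cmnd {
    struct Scsi_Host *host;
    int ran_locked;  /* records whether the handler saw the lock held */
    int completed;   /* set by the completion callback */
};

typedef void (*done_fn)(struct scsi_cmnd *);

static void demo_done(struct scsi_cmnd *cmd) { cmd->completed = 1; }

/* Driver logic in the old style: it expects the host lock to be held. */
static int demo_queuecommand_lck(struct scsi_cmnd *cmd, done_fn done)
{
    assert(cmd != NULL && cmd->host != NULL);
    cmd->ran_locked = cmd->host->lock_held;
    done(cmd);
    return 0;
}

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37)
/* New API: called as (host, cmd) with no lock held, so the wrapper takes
 * the lock itself. The kernel's DEF_SCSI_QCMD macro generates a wrapper
 * of this general shape around a *_lck function. */
static int demo_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
{
    int rc;
    host->lock_held = 1;  /* stands in for spin_lock_irqsave() */
    rc = demo_queuecommand_lck(cmd, demo_done);
    host->lock_held = 0;  /* stands in for spin_unlock_irqrestore() */
    return rc;
}
#else
/* Old API: called as (cmd, done) with the host lock already held. */
static int demo_queuecommand(struct scsi_cmnd *cmd, done_fn done)
{
    return demo_queuecommand_lck(cmd, done);
}
#endif

/* Mock midlayer: calls the driver the way each kernel generation would. */
int demo_submit(struct Scsi_Host *host, struct scsi_cmnd *cmd)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37)
    return demo_queuecommand(host, cmd);
#else
    int rc;
    host->lock_held = 1;
    rc = demo_queuecommand(cmd, demo_done);
    host->lock_held = 0;
    return rc;
#endif
}
```

Building the sketch with -DLINUX_VERSION_CODE=132644 (the value for 2.6.36) selects the old-style branch instead. In both cases the driver logic runs with the mock lock held, which is the point of gating on the kernel version.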

I have also submitted this patch upstream at https://sourceforge.net/tracker/?func=detail&aid=3152330&group_id=93175&atid=603425
Comment 5 Thomas Axelsson 2011-01-09 23:49:14 UTC
Created attachment 259423 [details]
Ebuild for 2.6.37 patch

Added the 2.6.37 patch to the vhba-20101015 ebuild (put the patch in files/). The patch applies cleanly and no kernel oopses arise when loading vhba and using cdemu. Thanks!
Comment 6 Rafał Mużyło 2011-01-10 03:16:54 UTC
Alexandre, your patch indeed seems to work.
Thanks.
Comment 7 Michael Weber (RETIRED) gentoo-dev 2011-01-13 22:08:01 UTC
Created attachment 259748 [details, diff]
vhba-20101015.ebuild.patch

Crash + fix verified on x86_64 + gentoo-sources-2.6.37 + cdemud-1.2.0.
Please commit it to the tree ASAP.
Comment 8 PM 2011-02-13 18:10:46 UTC
This is a bug that causes a kernel panic after simply starting up the god damn thing. How on earth is this patch not in the tree already? I just lost a whole day of CPU time because of this. If the patch works, just put it in the tree, it's not that much work for fixing a critical kernel-crashing bug
Comment 9 Kevin McCarthy (RETIRED) gentoo-dev 2011-03-02 23:35:23 UTC
Sorry for the delay, but Marcelo seems to be away. I hope he doesn't mind that I've committed Alexandre's patch to sys-fs/vhba-20101015-r1 during his absence.

Closing this bug as fixed.