| Summary: | =sys-kernel/gentoo-sources-3.4.9 - 'rfkill block bluetooth' crashes kernel with "general protection fault" + ALSA | ||
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | sphakka <marcoep> |
| Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | ||
| Priority: | Normal | ||
| Version: | unspecified | ||
| Hardware: | AMD64 | ||
| OS: | Linux | ||
| URL: | https://bugzilla.redhat.com/show_bug.cgi?id=839401 | ||
| Whiteboard: | |||
| Package list: | Runtime testing required: | --- | |
| Attachments: | Kernel crash trace; Kernel-2.6.11 (gentoo sources) bug trace | ||
|
Description
sphakka
2012-09-15 17:04:38 UTC
Created attachment 323912 [details]
Kernel crash trace
Does it happen on newer kernels (3.5 / 3.6_rc) too?

I managed to reproduce the bug. First, note that after the crash:
- a Magic SysRq REISUB wasn't enough to reboot, since the keyboard started playing tricks (for the curious, the 'm' key was non-functional), so I had to turn my laptop off and on again;
- my BT device got permanently blocked: I had to call "$ rfkill unblock bluetooth", which looks weird...

Steps to reproduce:
0. make sure the BT device is unblocked (call rfkill unblock if needed);
1. connect to the BT device, e.g. to its 'audio sink' service;
2. call "$ rfkill block bluetooth".

According to Red Hat's bug this also happens on kernel 3.5.0, but I can't confirm with gentoo-sources. I have only one box and prefer to stay as stable as possible; I will try newer kernels as soon as they become stable.

> I managed to reproduce the bug.
Which kernel version did you reproduce this on?
Does this still happen on a stable gentoo-sources-3.6.11 kernel and a development git-sources-3.8_rc3?
Thanks for reminding me. I tried with gentoo-sources-3.6.11 and the bug is still there: the system doesn't crash immediately, but it becomes unstable and SysRq is still needed to reboot :-(

Created attachment 336274 [details]
Kernel-2.6.11 (gentoo sources) bug trace
The other bug you linked is unrelated, since it crashes in a different process with a different stack trace. However, the most recent relevant commit I found is:

> commit 49dfbb9129c4edb318578de35cc45c555df37884
> Author: Jaganath Kanakkassery <jaganath.k@samsung.com>
> Date: Thu Jul 19 12:54:04 2012 +0530
>
> Bluetooth: Fix socket not getting freed if l2cap channel create fails
>
> If l2cap_chan_create() fails then it will return from l2cap_sock_kill
> since zapped flag of sk is reset.
>
> Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

The sad story, however, is that this patch was introduced in 3.6-rc2, which means it is present in 3.6.11, so it's not the solution. Upon closer inspection, this patch touched l2cap_sock_alloc and not l2cap_conn_del (the second function on the stack trace).

This patch is nevertheless a good reference point: we know the kernel was still bad at that time. So, let's see whether anything has since changed in the function it crashed in; we can easily reveal this with `git diff 49dfbb9129c4edb318578de35cc45c555df37884..HEAD -- l2cap_sock.c`

>@@ -823,7 +845,7 @@ static void l2cap_sock_kill(struct sock *sk)
>
>     /* Kill poor orphan */
>
>-    l2cap_chan_destroy(l2cap_pi(sk)->chan);
>+    l2cap_chan_put(l2cap_pi(sk)->chan);
>     sock_set_flag(sk, SOCK_DEAD);
>     sock_put(sk);
> }

Ah, we see in the second trace that l2cap_sock_kill calls l2cap_chan_destroy; however, this has since been changed to call a new function, l2cap_chan_put. We can now use git blame to figure out when this l2cap_chan_put function was added.
> 4af66c69 (Jaganath Kanakkassery 2012-07-13 18:17:55 +0530 848)

After expanding that commit with git log, we get:

> commit 4af66c691f4e5c2db9bb00793669a548e9db1974
> Author: Jaganath Kanakkassery <jaganath.k@samsung.com>
> Date: Fri Jul 13 18:17:55 2012 +0530
>
> Bluetooth: Free the l2cap channel list only when refcount is zero
>
> Move the l2cap channel list chan->global_l under the refcnt
> protection and free it based on the refcnt.
>
> Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
> Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
> Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>

This sounds much more like a fix for your actual problem; notice the "protection" keyword, which comes close to your "general protection fault". There's an odd thing though: this commit was applied in 3.6-rc2 as well. So why does the trace still show l2cap_chan_destroy?

>void l2cap_chan_put(struct l2cap_chan *c)
>{
>    BT_DBG("chan %p orig refcnt %d", c, atomic_read(&c->kref.refcount));
>
>    kref_put(&c->kref, l2cap_chan_destroy);
>}

Since this contains only one function call, there is a high chance the compiler optimizes this function away. So, bummer, this one did not fix it either.

So I can only conclude that you should try a newer version, such as gentoo-sources-3.7.3 or git-sources-3.8_rc4, to see whether something we can't spot right away fixes it. If the crash still appears on those, please report the bug upstream at http://bugzilla.kernel.org and leave a link to that bug here so that we can follow along. Good luck!

Please try the latest kernel, gentoo-sources-3.8.7 or git-sources-3.9_rc6.

Good news! The problem is gone with 3.7.10-gentoo-r1: rfkill works flawlessly even with processes using an active connection. For now I'm marking this WFM; I'll post more when the 3.8 branch becomes stable, unless someone can already confirm the fix.
Sounds good, thanks for testing and keeping us up to date. I'm going to mark this as FIXED, since WORKSFORME means it works for maintainers and should work for users. If this turns out not to be fixed with 3.8 or later, you can always change the bug back to CONFIRMED. :) |
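
For readers following the analysis of commit 4af66c69 in the thread, here is a minimal userspace sketch of the idea behind "free it based on the refcnt": an object shared by several owners is destroyed only by whoever drops the last reference. This is not the kernel code. `struct ref`, `struct chan`, `chan_create`, `chan_hold`, `chan_put` and `demo_teardown` are hypothetical names invented for illustration; the only thing taken from the thread is the shape of the call, where a put function hands a release callback to the refcount helper, as `l2cap_chan_put` does with `kref_put`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct kref. */
struct ref {
    atomic_int count;
};

/* Hypothetical stand-in for struct l2cap_chan; only the refcount matters. */
struct chan {
    struct ref ref;   /* must stay the first member, see chan_destroy() */
    int id;
};

/* Counts how many channels were actually freed (demonstration only). */
static int chans_destroyed;

static void ref_init(struct ref *r)
{
    atomic_init(&r->count, 1);   /* the creator holds the first reference */
}

static void ref_get(struct ref *r)
{
    atomic_fetch_add(&r->count, 1);
}

/* Drop one reference and call release() only if it was the last one.
 * Returns 1 if release() ran, 0 otherwise. */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
    if (atomic_fetch_sub(&r->count, 1) == 1) {
        release(r);
        return 1;
    }
    return 0;
}

static void chan_destroy(struct ref *r)
{
    /* ref is the first member of struct chan, so the pointers coincide. */
    struct chan *c = (struct chan *)r;

    assert(atomic_load(&r->count) == 0);   /* nobody may still hold it */
    chans_destroyed++;
    free(c);
}

static struct chan *chan_create(int id)
{
    struct chan *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    ref_init(&c->ref);
    c->id = id;
    return c;
}

static void chan_hold(struct chan *c)
{
    ref_get(&c->ref);
}

static int chan_put(struct chan *c)
{
    return ref_put(&c->ref, chan_destroy);
}

/* Two owners (say, a socket and a connection) share one channel.
 * Returns 1 if the channel was freed exactly once, by the last put. */
int demo_teardown(void)
{
    struct chan *c = chan_create(1);
    int before = chans_destroyed;
    int first, second;

    if (!c)
        return 0;
    chan_hold(c);            /* second owner takes a reference: 1 -> 2 */
    first = chan_put(c);     /* first owner lets go: 2 -> 1, not freed */
    second = chan_put(c);    /* last owner lets go: 1 -> 0, freed here */
    return first == 0 && second == 1 && chans_destroyed == before + 1;
}
```

The sketch deliberately has no `main`; calling `demo_teardown()` from one exercises the two-owner case. The point it illustrates is that a direct destroy call from one owner frees memory the other owner may still touch, while a put defers the free until the count reaches zero.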