Bug 474432 - >=sys-kernel/gentoo-sources-3.8 - Ripup of bluetooth rfcomm causes oops/machine hang.
Summary: >=sys-kernel/gentoo-sources-3.8 - Ripup of bluetooth rfcomm causes oops/machi...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: http://www.gossamer-threads.com/lists...
Whiteboard: [PATCH?] linux-3.8-regression http://...
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-22 23:55 UTC by Ben
Modified: 2014-05-01 17:40 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
WARN/stack dump/oops in the bluetooth rfcomm code introduced in raw kernel 3.8.x (trimmed.txt,19.63 KB, text/plain)
2013-06-22 23:55 UTC, Ben
part one of 3.12.6 userspace bug of rfcomm (rfc3.patch,1.29 KB, patch)
2014-01-04 04:43 UTC, Ben
part two of 3.12.6 patch to fix userspace differences in rfcomm (modman.patch,1.52 KB, patch)
2014-01-04 04:44 UTC, Ben
Patch for inclusion - Part 1 (2450_RFCOMM-release-on-TTY-close.patch,1.40 KB, patch)
2014-01-17 20:16 UTC, Mike Pagano
Patch for inclusion 2/4 (2451_rfcomm-get-device-function-call-move.patch,1.24 KB, patch)
2014-01-17 20:32 UTC, Mike Pagano
Patch for inclusion 3/4 (2452_connection-wait-on-RFCOMM-open.patch,2.76 KB, patch)
2014-01-17 20:32 UTC, Mike Pagano
Patch for inclusion 4/4 (2453-remove_unused_rfcomm_carrier_raised_def.patch,581 bytes, patch)
2014-01-17 20:32 UTC, Mike Pagano

Description Ben 2013-06-22 23:55:54 UTC
Created attachment 351696 [details]
WARN/stack dump/oops in the bluetooth rfcomm code introduced in raw kernel 3.8.x

This is an *UPSTREAM* bug and also some collation of known information of the nature of this bug.

A bug that was introduced upstream by the bluetooth developers in 3.8.x which remains in 3.9.x will cause the machine to crash with an oops when rfcomm is disconnected while a tty is connected.  This is unexpected behavior.  While in 3.10-rc5 the behavior changed, the bug still exists.

The initial method to trigger this bug was listed in

http://forums.gentoo.org/viewtopic-t-961421-highlight-.html

In brief, set up any bluetooth rfcomm connection and then rip up the bluetooth connection (/etc/init.d/bluetooth stop, rfcomm release, use blueman to disconnect Dial Up Networking/Serial).  (I believe that unplugging the bluetooth USB device will also trigger this issue, but I'd call that unnatural behavior.)  The kernel will then stomp over another kernel structure, corrupting kernel memory and making other subsystems oops.

As Gentoo appears to not have bluetooth setup for networkmanager, it should not be affected unless someone is using rfcomm directly to communicate with a bluetooth serial device, say over minicom for a bluetooth device or using pppd directly to access a bluetooth modem.  I hit the bug because I have a /etc/portage/patches/net-misc/networkmanager patch file to allow bluetooth rfcomm links.

As far as I can tell, and from reports/tests upstream, this is probably due to bluetooth rfcomm not following standard tty procedures: when the bluetooth link is torn down, it rips up connected applications without cleaning up the tty first.  A patch to expose the bad rfcomm behavior was posted on LKML on 2013 May 15; it also prevents the machine from hanging/crashing by stopping the memory corruption.  It does not fix the problem, merely instruments it (and also keeps other subsystems from dying, which could otherwise cause data loss).

The patch that Peter Hurley wrote was:

diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c
index 6d9e0b2..a4f4fa9 100644
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -140,6 +140,10 @@ EXPORT_SYMBOL(tty_port_destroy);
  static void tty_port_destructor(struct kref *kref)
  {
      struct tty_port *port = container_of(kref, struct tty_port, kref);
+
+    /* check if last port ref was dropped before tty release */
+    if (WARN_ON(port->itty))
+        return;
      if (port->xmit_buf)
          free_page((unsigned long)port->xmit_buf);
      tty_port_destroy(port);


Attached are the warnings and errors generated when I disable rfcomm from blueman with the above patch applied, showing the correct trace.  Without the above patch, the corruption tends to make other functions show incorrect information and to completely crash/hang the machine shortly after disconnection.
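
For readers less familiar with the kernel side, the idea behind the guard in the patch above can be illustrated in plain userspace C.  This is only a rough sketch under assumptions: toy_tty, toy_port, toy_port_init, toy_port_put and toy_port_destructor are hypothetical names made up for this illustration, not kernel APIs, and a plain integer stands in for the kernel's kref.  It shows the same decision the patch makes: if the last reference is dropped while a tty is still attached, warn and leak the port instead of freeing memory the tty still points at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's tty_struct and tty_port. */
struct toy_tty {
    int index;
};

struct toy_port {
    int refcount;          /* plain counter standing in for the kref */
    struct toy_tty *itty;  /* back pointer, non-NULL while a tty is attached */
    char *xmit_buf;        /* buffer released by the destructor */
    bool destroyed;        /* lets a caller observe whether teardown ran */
};

static void toy_port_init(struct toy_port *port)
{
    port->refcount = 1;
    port->itty = NULL;
    port->xmit_buf = malloc(64);
    port->destroyed = false;
}

/* Guarded destructor: refuse to tear down while a tty is still attached. */
static void toy_port_destructor(struct toy_port *port)
{
    if (port->itty != NULL) {
        fprintf(stderr, "WARN: last port ref dropped before tty release\n");
        return; /* deliberately leak rather than corrupt memory */
    }
    free(port->xmit_buf);
    port->xmit_buf = NULL;
    port->destroyed = true;
}

/* Drop one reference; run the destructor when the count reaches zero. */
static void toy_port_put(struct toy_port *port)
{
    assert(port->refcount > 0);
    if (--port->refcount == 0)
        toy_port_destructor(port);
}
```

Dropping the last reference on a port whose itty is still set leaves xmit_buf allocated and destroyed false, which is the userspace analogue of "instrumenting" the bug rather than fixing it: the caller that dropped the reference too early is still wrong, but nothing gets freed out from under the tty.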
Comment 1 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-07-01 17:10:04 UTC
Seems the upstream patch has been rejected; and on Launchpad, people wait too.
Comment 2 Ben 2013-07-01 17:52:03 UTC
There seems to have been a flame war on LKML over what to do when the illegal situation arises.  Alex's original patch, which called BUG() when the problem occurs, got rejected, but Peter suggested using WARN() instead.  Either should be fine to leave a notice in the syslog/console trace when the tty is torn down improperly.

Either way, it's still just instrumenting the bug.  There still appears to be no true fix in sight.  On June 25 there was a reply on LKML under the subject "BUG: tty: memory corruption through tty_release/tty_ldisc_release" indicating that the bug can also be triggered by suspending/resuming the machine while the link is open (ouch).

I suppose it's just because not many people use BT else this would be a fairly serious bug...
Comment 3 Ben 2013-07-09 07:19:52 UTC
Saw a message come by on the linux-bluetooth/linux-serial mailing lists, dated Jul 6 2013, from Gianluca Anzolin, who submitted a patch against 3.10 which may fix this issue, but the same doubts exist - people aren't sure how this piece of software really works :(

Though it was suggested that it stopped the crash from happening.
Comment 4 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-08-12 22:02:23 UTC
(In reply to Ben from comment #3)
> Saw a message come by on linux-bluetooth/linux-serial mail lists, dated Jul
> 6 2013 from Gianluca Anzolin who submitted a patch versus 3.10 which may fix
> this issue, but the same doubts exist - people aren't sure how this piece of
> software really works :(
> 
> Though it was suggested that it stopped the crash from happening.

Do you have a link to this particular message?

Thanks in advance for tracking it down.
Comment 5 Ben 2013-08-12 23:43:35 UTC
Here's the latest thread I saw on the mailing list: I hope Peter's mail is indicating this will be committed into a kernel soon:

http://marc.info/?l=linux-bluetooth&m=137511052832458&w=2
Comment 7 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-08-20 15:17:30 UTC
(In reply to Jussi Saarinen from comment #6)
> Gianluca Anzolin's patches have now been merged to bluetooth-next.
> Eventually they should find their way to mainline and stable kernels.
> 
> ...

Cool, will see if these apply on top of genpatches.
Comment 8 Jussi Saarinen 2013-08-28 11:06:02 UTC
The patch series is apparently "too extensive to consider for -stable" [1]. So another solution is required for stable kernels. Gianluca's fix should eventually end up in mainline though (3.12 hopefully).

[1] http://marc.info/?l=linux-bluetooth&m=137762583515880&w=2
Comment 9 Jussi Saarinen 2013-09-02 21:27:31 UTC
Gianluca Anzolin writes on the linux-bluetooth mailing list that although his tty refcount patch series is needed, more work is required to fix the problem. If I understood his message correctly, the system still locks up when the device is released, even after his patches have been applied.

Source:

http://marc.info/?l=linux-bluetooth&m=137788497602145&w=2
Comment 10 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-09-03 17:32:59 UTC
Thank you for reporting the status of the patches, we will await them.
Comment 11 Jussi Saarinen 2013-09-06 04:28:01 UTC
Gianluca Anzolin's patches were merged to net-next the day before yesterday, and yesterday they were merged to Linus' master branch. So the patches will be in 3.12-rc1.
Comment 13 Ben 2013-10-01 18:35:55 UTC
I've tried 3.12-rc2, as far as I can tell just opening and closing the rfcomm in BT seems to no longer crash the box - however, some characteristics changed and NetworkManager no longer accepts /dev/rfcomm* as a valid communications device as before, so I can't fully test it.

A bit tricky - using the same userland,

Linux-3.5.7-gentoo - works flawlessly
Linux-3.8.13-gentoo - crashes when BT rfcomm is closed
Linux-3.12-rc2 (raw from kernel.org) - blueman is able to set up rfcomm but networkmanager does not notice that rfcomm was set up.  Supposedly NM should notify blueman via dbus that it acknowledges the device, but blueman times out waiting and NM does not recognize the device.  In trying to diagnose the problem I tried using busybox microcom on /dev/rfcomm0 directly.  I was able to send my modem AT commands like on previous kernels, indicating the bluetooth link indeed works.  I was able to shut down the rfcomm link in blueman as well after sending those bytes through.  No crash - a positive sign... but it may depend on the number of bytes sent.

I'll need to see why NM does not like the rfcomm device now, but I am no NM or dbus expert...
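
The manual microcom check described above can also be scripted.  The following is a minimal sketch, not what microcom itself does: send_at_command is a hypothetical helper written for this illustration, and it assumes the caller has already opened the device (for example /dev/rfcomm0 via open() with O_RDWR | O_NOCTTY) and passes in the descriptor.  It only writes the command plus a carriage return; reading the modem's reply and any termios setup are left out.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one AT command followed by a carriage return to an already-open
 * descriptor.  Returns 0 on success, -1 on a write error or if the command
 * does not fit in the local buffer.
 */
static int send_at_command(int fd, const char *cmd)
{
    char line[128];
    int len = snprintf(line, sizeof(line), "%s\r", cmd);

    if (len < 0 || (size_t)len >= sizeof(line))
        return -1;

    /* write() may accept fewer bytes than requested, so loop until done. */
    size_t off = 0;
    while (off < (size_t)len) {
        ssize_t n = write(fd, line + off, (size_t)len - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    return 0;
}
```

Because the helper takes a descriptor rather than a path, it can be exercised against a pipe or pty without any bluetooth hardware, and calling it in a loop is one way to vary the number of bytes pushed through the link before tearing it down.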
Comment 14 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-11-16 16:24:03 UTC
Seems that is indeed in all 3.12 related versions.

 # git tag --contains cc998ff8811530be521f6b316f37ab7676a07938
v3.12
v3.12-rc1
v3.12-rc2
v3.12-rc3
v3.12-rc4
v3.12-rc5
v3.12-rc6
v3.12-rc7

Can you try to see if the rest of the odd behavior has since been fixed?
Comment 15 Ben 2013-11-19 17:12:31 UTC
Drat. 3.12-release still reports that the "connection is unusable" in blueman whereas 3.6.11 (last kernel I have built that works)... still works...

hmm...needs more debug now...
Comment 16 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-12-09 13:33:55 UTC
(In reply to Ben from comment #15)
> Drat. 3.12-release still reports that the "connection is unusable" in
> blueman whereas 3.6.11 (last kernel I have built that works)... still
> works...
> 
> hmm...needs more debug now...

Aw, then this isn't the right patch to backport; can you check if this still happens in more recent versions? (gentoo-sources and git-sources)
Comment 17 Ben 2013-12-09 15:42:20 UTC
Yes, 3.12-release has the same behavior as the release candidates - they no longer crash the machine but it was not fixed correctly/completely (meaning that behavior is not quite correct.)  Will have to check future versions to find one that behaves correctly.

Debugging networkmanager will be ugly... Sigh.
Comment 18 Ben 2014-01-04 04:41:45 UTC
I think we finally have a winner patchset here.

Two patches showed up on Linux-Bluetooth that, when applied against 3.12.6, seemingly fix the longstanding problem completely.  I don't know when these will show up in mainline.

The patch names are "rfc3.patch" and "modman.patch" from Gianluca Anzolin.  I'll attach the patches here.

Thanks to all of the Gentoo staff for tolerating bugs like this.  I've been posting in the Ubuntu forums about this and all I get is flak.
Comment 19 Ben 2014-01-04 04:43:27 UTC
Created attachment 366926 [details, diff]
part one of 3.12.6 userspace bug of rfcomm
Comment 20 Ben 2014-01-04 04:44:13 UTC
Created attachment 366928 [details, diff]
part two of 3.12.6 patch to fix userspace differences in rfcomm
Comment 21 Mike Pagano gentoo-dev 2014-01-17 20:16:40 UTC
Created attachment 368038 [details, diff]
Patch for inclusion - Part 1

Ben, can you test the 4 part series of which this is part 1? I backported the four from upstream.
Comment 22 Mike Pagano gentoo-dev 2014-01-17 20:32:01 UTC
Created attachment 368044 [details, diff]
Patch for inclusion 2/4
Comment 23 Mike Pagano gentoo-dev 2014-01-17 20:32:22 UTC
Created attachment 368046 [details, diff]
Patch for inclusion 3/4
Comment 24 Mike Pagano gentoo-dev 2014-01-17 20:32:37 UTC
Created attachment 368048 [details, diff]
Patch for inclusion 4/4
Comment 25 Ben 2014-01-17 20:55:56 UTC
What kernel version will these patches be against?  I'm not sure what patches that went into 3.12.6 fixed the large part of the issue (i.e. the crashing) ...
Comment 26 Mike Pagano gentoo-dev 2014-01-17 21:49:10 UTC
Sorry, please apply against 3.12.8.
Comment 27 Ben 2014-01-19 01:01:09 UTC
The patches apply against 3.12.8 fine and NetworkManager finds the rfcomm interface just fine once more.

(Now I just need to move to systemd/gnome3...ugh...)

Thanks!
Comment 28 Mike Pagano gentoo-dev 2014-05-01 17:40:38 UTC
Ben, it's been a while and I can't imagine the appropriate patches aren't in 3.14.
Please comment if that is not the case and there are still issues.