Bug 480356 - app-emulation/qemu-1.5.2-r1 - segmentation fault in qemu-system-x86 when connecting to monitor device
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Doug Goldstein (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-08-09 12:35 UTC by Another Mortal
Modified: 2013-08-30 14:54 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
the line-swap (pty_chr_state.patch,429 bytes, patch)
2013-08-18 20:29 UTC, Another Mortal

Description Another Mortal 2013-08-09 12:35:27 UTC
I got a mysterious segfault when trying to connect to the monitor device
on a KVM virtual machine that used to work fine with qemu-1.4.2 (just
upgraded, about to downgrade back again).

Possibly interesting command-line options include:
-vga none -nographic -serial pty -monitor pty


The VM comes up and works fine.  I can connect to the serial console fine,
but trying to connect to the qemu monitor on a pty results in a segfault.
(And the VM just dies.)

Relevant line from the kernel logs:

Aug 09 09:58:54 [kernel] [333854.034805] qemu-system-x86[26443]: segfault at 7fff84bf0fe8 ip 00007efffd458718 sp 00007fff84bf0ff0 error 6 in libc-2.15.so[7efffd411000+19e000]
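
The numbers in that log line can be checked with a little arithmetic. Below is a minimal sketch in C; the function names are made up for illustration, and only the x86 page-fault error-code bits are taken as given (bit 0 = protection violation, bit 1 = write, bit 2 = user mode). A user-mode write to a not-present page a few bytes below the stack pointer is consistent with the process running off the end of its stack, though the log line alone does not prove that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits, as printed in the kernel's "error N" field. */
enum {
    PF_PROT  = 1 << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1, /* 0: read access,      1: write access */
    PF_USER  = 1 << 2  /* 0: kernel mode,      1: user mode */
};

/* Distance in bytes from the faulting address up to the stack pointer.
 * A small positive value means the access landed just below sp. */
int64_t bytes_below_sp(uint64_t fault_addr, uint64_t sp)
{
    return (int64_t)(sp - fault_addr);
}

/* Offset of the instruction pointer inside the mapped object
 * (here: inside libc, whose mapping base is the value in brackets). */
uint64_t offset_in_mapping(uint64_t ip, uint64_t base)
{
    assert(ip >= base);
    return ip - base;
}

/* True if the error code describes a user-mode write to a not-present page. */
bool is_user_write_not_present(unsigned err)
{
    return (err & PF_USER) && (err & PF_WRITE) && !(err & PF_PROT);
}
```

With the values from the line above, bytes_below_sp(0x7fff84bf0fe8, 0x7fff84bf0ff0) is 8, offset_in_mapping(0x7efffd458718, 0x7efffd411000) is 0x47718, and is_user_write_not_present(6) is true.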

I'm using socat to connect (socat -,raw,echo=0,escape=0x1d /dev/pts/X,raw,echo=0).

Interestingly, using a unix socket instead of a pty for the monitor works fine.

I've also tried downgrading to qemu-1.4.2, and with that version '-monitor pty'
works as expected.

This is on hardened/SELinux, in case that makes a difference.
Comment 1 Doug Goldstein (RETIRED) gentoo-dev 2013-08-12 01:20:30 UTC
Got the fix queued up in my pending stable branch; I'll spin a 1.5.2-r2 from there shortly.
Comment 2 Another Mortal 2013-08-17 14:07:07 UTC
Just tested 1.5.2-r2 (on another box): same problem...

Aug 17 16:02:59 eleven kernel: [ 6303.023961] qemu-system-x86[25859]: segfault at 7fff3be63f7c ip 00007f36a6879d62 sp 00007fff3be63e70 error 6 in libc-2.17.so[7f36a6830000+1a2000]
Comment 3 Another Mortal 2013-08-17 14:10:26 UTC
This seems to be specific to '-monitor pty',
since '-monitor' works with e.g. 'stdio' or
'unix:', and 'pty' works fine with '-serial'.
Comment 4 Doug Goldstein (RETIRED) gentoo-dev 2013-08-17 18:07:19 UTC
(In reply to Another Mortal from comment #2)
> Just tested 1.5.2-r2 (on another box): same problem...
> 
> Aug 17 16:02:59 eleven kernel: [ 6303.023961] qemu-system-x86[25859]:
> segfault at 7fff3be63f7c ip 00007f36a6879d62 sp 00007fff3be63e70 error 6 in
> libc-2.17.so[7f36a6830000+1a2000]

Can you run qemu via gdb and provide a backtrace? Otherwise it'll be near impossible to track this down.
Comment 5 Another Mortal 2013-08-18 17:10:05 UTC
Well, I doubt this will be very useful, but here it comes:

# gdb /usr/bin/qemu-system-x86_64  
GNU gdb (Gentoo 7.5.1 p2) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/bin/qemu-system-x86_64...(no debugging symbols found)...done.
(gdb) run  -cdrom /tmp/mini.iso -boot d -monitor pty
Starting program: /usr/bin/qemu-system-x86_64 -cdrom /tmp/mini.iso -boot d -monitor pty
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
char device redirected to /dev/pts/5 (label compat_monitor0)
[New Thread 0x7fffeebb3700 (LWP 10718)]
[New Thread 0x7fffee1d0700 (LWP 10719)]

Program received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7fffee1d0700 (LWP 10719)]
0x000055555586bdd8 in ?? ()
(gdb) c
Continuing.
[Thread 0x7fffeebb3700 (LWP 10718) exited]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7fc6900 (LWP 10714)]
0x00007ffff4b1edc8 in vfprintf () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff4b1edc8 in vfprintf () from /lib64/libc.so.6
#1  0x00007ffff4bd88f1 in __vasprintf_chk () from /lib64/libc.so.6
#2  0x00007ffff7932ffb in g_vasprintf () from /usr/lib64/libglib-2.0.so.0
#3  0x00007ffff7910e1d in g_strdup_vprintf () from /usr/lib64/libglib-2.0.so.0
#4  0x00005555557d0b7b in ?? ()
#5  0x00005555557d0d14 in ?? ()
#6  0x00005555557d5e4d in ?? ()
--- cut ---


I suppose a debug build would be needed to get something reasonable here...
(BTW, I got the same info just running gdb on the resulting core file.)
Comment 6 Another Mortal 2013-08-18 20:23:25 UTC
Alright; so, I built qemu with USE=debug FEATURES=nostrip...

Here's what gdb says when trying to run qemu via the debugger:

---------
# gdb /usr/bin/qemu-system-x86_64
GNU gdb (Gentoo 7.5.1 p2) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/bin/qemu-system-x86_64...done.
(gdb) run -cdrom /tmp/mini.iso -boot d -monitor pty
Starting program: /usr/bin/qemu-system-x86_64 -cdrom /tmp/mini.iso -boot d -monitor pty
Warning:
Cannot insert breakpoint -1.
Error accessing memory address 0x319330: Input/output error.

(gdb)
--------

I've never seen this before and don't have time to investigate further.

Instead, I ran qemu directly and looked at the core with gdb.
Something is royally messed up as the backtrace continues forever
(well, I gave up after a few *thousand* lines), but here's the first
few tens of lines:

------------------------
# gdb /usr/bin/qemu-system-x86_64 -c core                                                                                  
GNU gdb (Gentoo 7.5.1 p2) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/bin/qemu-system-x86_64...done.

warning: core file may not match specified executable file.
[New LWP 10010]
[New LWP 10012]
[New LWP 10011]

warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `qemu-system-x86_64 -cdrom /tmp/mini.iso -boot d -monitor pty'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f02387addc8 in vfprintf () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f02387addc8 in vfprintf () from /lib64/libc.so.6
#1  0x00007f02388678f1 in __vasprintf_chk () from /lib64/libc.so.6
#2  0x00007f023b5c1ffb in g_vasprintf () from /usr/lib64/libglib-2.0.so.0
#3  0x00007f023b59fe1d in g_strdup_vprintf () from /usr/lib64/libglib-2.0.so.0
#4  0x00007f023bf0acfb in monitor_vprintf (ap=<optimized out>, fmt=<optimized out>, mon=0x7f023cc72cd0)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:332
#5  monitor_vprintf (mon=0x7f023cc72cd0, fmt=<optimized out>, ap=<optimized out>)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:321
#6  0x00007f023bf0ae94 in monitor_printf (mon=<optimized out>, 
    fmt=fmt@entry=0x7f023c012448 "QEMU %s monitor - type 'help' for more information\n")
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:341
#7  0x00007f023bf0ffcd in monitor_event (opaque=0x7f023cc72cd0, event=<optimized out>)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:4698
#8  0x00007f023be62e7e in pty_chr_state (chr=chr@entry=0x7f023cc681d0, connected=connected@entry=1)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1152
#9  0x00007f023be62f58 in pty_chr_update_read_handler (chr=0x7f023cc681d0)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1128
#10 0x00007f023be63025 in pty_chr_write (chr=<optimized out>, buf=<optimized out>, len=<optimized out>)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1070
#11 0x00007f023bf09bb9 in monitor_flush (mon=0x7f023cc72cd0)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:285
#12 monitor_flush (mon=0x7f023cc72cd0) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:271
#13 0x00007f023bf09d34 in monitor_puts (mon=mon@entry=0x7f023cc72cd0, str=0x7f023d154626 "", 
    str@entry=0x7f023d1545f0 "QEMU 1.5.2 monitor - type 'help' for more information\n")
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:316
#14 0x00007f023bf0ad09 in monitor_vprintf (ap=<optimized out>, fmt=<optimized out>, mon=0x7f023cc72cd0)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:333
#15 monitor_vprintf (mon=0x7f023cc72cd0, fmt=<optimized out>, ap=<optimized out>)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:321
#16 0x00007f023bf0ae94 in monitor_printf (mon=<optimized out>, 
    fmt=fmt@entry=0x7f023c012448 "QEMU %s monitor - type 'help' for more information\n")
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:341
#17 0x00007f023bf0ffcd in monitor_event (opaque=0x7f023cc72cd0, event=<optimized out>)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:4698
#18 0x00007f023be62e7e in pty_chr_state (chr=chr@entry=0x7f023cc681d0, connected=connected@entry=1)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1152
#19 0x00007f023be62f58 in pty_chr_update_read_handler (chr=0x7f023cc681d0)
    at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1128
---Type <return> to continue, or q <return> to quit---q
(gdb)
------------------------


Now, since the loop (hopefully, the notation is self-explanatory):
   qemu-char.c:1070,1128,1152;monitor.c:4698,341,321,333,316,271,285
seems to repeat "forever" in the backtrace (except in the beginning,
where after monitor.c:321 the process/thread doesn't make it to 333,
as it dies on the line before in a call to g_strdup_vprintf), I got
curious about just *how* far back this behavior continues..

Well...  after 'set pagination off' and a few minutes of putting up
with a barely responsive machine, I got to the bottom of the trace:

---------------
#163697 0x00007f023bf0ffcd in monitor_event (opaque=0x7f023cc72cd0, event=<optimized out>) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:4698
#163698 0x00007f023be62e7e in pty_chr_state (chr=chr@entry=0x7f023cc681d0, connected=connected@entry=1) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1152
#163699 0x00007f023be62f58 in pty_chr_update_read_handler (chr=0x7f023cc681d0) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1128
#163700 0x00007f023be63025 in pty_chr_write (chr=<optimized out>, buf=<optimized out>, len=<optimized out>) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1070
#163701 0x00007f023bf09bb9 in monitor_flush (mon=0x7f023cc72cd0) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:285
#163702 monitor_flush (mon=0x7f023cc72cd0) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:271
#163703 0x00007f023bf09d34 in monitor_puts (mon=mon@entry=0x7f023cc72cd0, str=0x7f023cd6afd6 "", str@entry=0x7f023cd6afa0 "QEMU 1.5.2 monitor - type 'help' for more information\n") at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:316
#163704 0x00007f023bf0ad09 in monitor_vprintf (ap=<optimized out>, fmt=<optimized out>, mon=0x7f023cc72cd0) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:333
#163705 monitor_vprintf (mon=0x7f023cc72cd0, fmt=<optimized out>, ap=<optimized out>) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:321
#163706 0x00007f023bf0ae94 in monitor_printf (mon=<optimized out>, fmt=fmt@entry=0x7f023c012448 "QEMU %s monitor - type 'help' for more information\n") at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:341
#163707 0x00007f023bf0ffcd in monitor_event (opaque=0x7f023cc72cd0, event=<optimized out>) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/monitor.c:4698
#163708 0x00007f023be62e7e in pty_chr_state (chr=chr@entry=0x7f023cc681d0, connected=connected@entry=1) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1152
#163709 0x00007f023be62f58 in pty_chr_update_read_handler (chr=0x7f023cc681d0) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1128
#163710 0x00007f023be62fb5 in pty_chr_timer (opaque=<optimized out>) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/qemu-char.c:1041
#163711 0x00007f023b5841fb in ?? () from /usr/lib64/libglib-2.0.so.0
#163712 0x00007f023b5835e5 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#163713 0x00007f023be39b38 in glib_pollfds_poll () at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/main-loop.c:187
#163714 os_host_main_loop_wait (timeout=<optimized out>) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/main-loop.c:232
#163715 main_loop_wait (nonblocking=<optimized out>) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/main-loop.c:464
#163716 0x00007f023bd17f01 in main_loop () at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/vl.c:2029
#163717 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /var/tmp/portage/app-emulation/qemu-1.5.2-r2/work/qemu-1.5.2/vl.c:4419
(gdb)
---------------


So, this "infinite stack-eating loop" is set off by 'pty_chr_timer' and
it pretty much seems to continue until it quite simply runs out of stack
space, all the while thinking that the pty is not connected...


pty_chr_state _reasonably(?)_ only sets s->connected=1 *after* a successful
call to monitor_event (via qemu_chr_be_event).

monitor_event, in turn, wants to display the prompt right away (since it
was just told that it got connected).

As a result, monitor_flush calls pty_chr_write (via qemu_chr_fe_write).

Now pty_chr_write, thinking it's *NOT* connected (s->connected == 0),
calls pty_chr_state again (via pty_chr_update_read_handler).


I'm not quite sure what's "the right thing" to do to break this loop..
Personally, I'd just swap the lines:
            qemu_chr_be_generic_open(chr);
            s->connected = 1;
in pty_chr_state, but I have no idea if that might cause any side effects.
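
To make the effect of the ordering concrete, here is a toy model in C of the call cycle seen in the backtrace. It is only a sketch: every name in it (toy_pty, toy_pty_state, toy_connect, ...) is a hypothetical stand-in rather than QEMU's real code, and a depth limit stands in for the finite stack so that the model stops instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are illustrative stand-ins, not QEMU structures. */
struct toy_pty {
    bool connected;      /* models s->connected */
    bool set_flag_first; /* true: set the flag before raising the open event */
    int depth;           /* current nesting of toy_pty_state() */
    int max_depth;       /* deepest nesting observed */
    int depth_limit;     /* stands in for the finite stack */
    bool overflowed;     /* set when depth_limit is reached */
};

void toy_pty_state(struct toy_pty *s);

/* Models pty_chr_write: a write on a "disconnected" pty re-checks the state. */
void toy_pty_write(struct toy_pty *s)
{
    if (!s->connected) {
        toy_pty_state(s);
    }
}

/* Models monitor_event -> monitor_printf -> monitor_flush:
 * the banner is written as soon as the open event arrives. */
void toy_open_event(struct toy_pty *s)
{
    toy_pty_write(s);
}

/* Models pty_chr_state(chr, 1) with either ordering of the two lines. */
void toy_pty_state(struct toy_pty *s)
{
    s->depth++;
    if (s->depth > s->max_depth) {
        s->max_depth = s->depth;
    }
    if (s->depth >= s->depth_limit) {
        s->overflowed = true;      /* the real process segfaults here */
    } else if (s->set_flag_first) {
        s->connected = true;       /* swapped order: flag first */
        toy_open_event(s);
    } else {
        toy_open_event(s);         /* original order: event first */
        s->connected = true;
    }
    s->depth--;
}

/* Runs one "connect" and returns the deepest nesting that was reached. */
int toy_connect(bool set_flag_first, int depth_limit, bool *overflowed)
{
    struct toy_pty s = {
        .connected = false, .set_flag_first = set_flag_first,
        .depth = 0, .max_depth = 0, .depth_limit = depth_limit,
        .overflowed = false
    };
    assert(depth_limit > 0);
    toy_pty_state(&s);
    if (overflowed != NULL) {
        *overflowed = s.overflowed;
    }
    return s.max_depth;
}
```

In this model, toy_connect(false, 1000, &ov) nests until it hits the limit, while toy_connect(true, 1000, &ov) returns after a single level, because the write issued from inside the event already sees the flag set.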

In 1.4.2, qemu_chr_generic_open sets a timer to fire the backend event 1ms
later (allowing time for s->connected to be set), possibly to avoid this
nightmare.  That may or may not be needed to avoid other race conditions.
I honestly don't know.
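
The deferred approach can be sketched the same way. This is a toy model in C under the assumption that the open event is merely queued and then delivered from a later main-loop iteration; the names (toy_chr, toy_state, toy_main_loop_iteration) are hypothetical and a pending flag stands in for the timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: illustrative stand-ins, not QEMU structures. */
struct toy_chr {
    bool connected;    /* models s->connected */
    bool open_pending; /* models the armed one-shot timer */
    int reentries;     /* times a write found the pty "disconnected" */
    int banners;       /* banners actually written */
};

/* Models pty_chr_state(chr, 1): arm the deferred event, then set the flag.
 * Nothing is delivered synchronously, so there is no nested call here. */
void toy_state(struct toy_chr *s)
{
    if (!s->connected) {
        s->open_pending = true;
    }
    s->connected = true;
}

/* Models pty_chr_write: re-check the state only if the flag is not set. */
void toy_write(struct toy_chr *s)
{
    if (!s->connected) {
        s->reentries++;
        toy_state(s);
    }
    s->banners++;
}

/* Models one main-loop pass: fire the pending open event, if any.
 * By now toy_state() has returned and the flag is already set. */
void toy_main_loop_iteration(struct toy_chr *s)
{
    assert(s != NULL);
    if (s->open_pending) {
        s->open_pending = false;
        toy_write(s); /* the event handler prints the banner */
    }
}
```

Here the banner is written exactly once and reentries stays at 0, since the event handler only runs after the flag has been set. Whether the real timer was there for this reason is not something this model can answer.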

I built qemu with that change and the '-monitor pty' issue has disappeared.
Hopefully, no other issues got introduced.  Time will tell...
(-serial pty does work...)
Comment 7 Another Mortal 2013-08-18 20:29:37 UTC
Created attachment 356398 [details, diff]
the line-swap

A convenience for those who want to test this:
drop the rather trivial patch into
  /etc/portage/patches/app-emulation/qemu-1.5.2-r2
Comment 8 Another Mortal 2013-08-19 09:40:37 UTC
I've just checked 1.6.0, and those 2 lines _are_ swapped there as well!
Comment 9 Doug Goldstein (RETIRED) gentoo-dev 2013-08-27 17:38:42 UTC
Can you give qemu-1.5.3 in tree a shot?
Comment 10 Another Mortal 2013-08-30 14:54:07 UTC
(In reply to Doug Goldstein from comment #9)
> Can you give qemu-1.5.3 in tree a shot?

Works for me.  (And, so does qemu-1.6.0 ...)