Conversation

@andreiw commented Oct 26, 2015

These can only be generated if the thread is not in HV
mode, even with VPM.

5.7.2 of PowerISA_v2.07: "If the thread is not in hypervisor state,
and either address translation is enabled and VPM1=1, or address
translation is disabled and VPM0=1, conditions that would have caused
a Data Storage or an Instruction Storage interrupt if the affected
memory were not virtualized instead cause a Hypervisor Data Storage
or a Hypervisor Instruction Storage interrupt respectively."

6.5.16: "A Hypervisor Data Storage interrupt occurs when no higher
priority exception exists, the thread is not in hypervisor state, and
..."

Signed-off-by: Andrei Warkentin [email protected]

ozbenh pushed a commit that referenced this pull request Nov 6, 2015
qpci_msix_pending() writes on pba region, causing qemu to SEGV:

  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x7ffff7fba8c0 (LWP 25882)]
  0x0000000000000000 in ?? ()
  (gdb) bt
  #0  0x0000000000000000 in  ()
  #1  0x00005555556556c5 in memory_region_oldmmio_write_accessor (mr=0x5555579f3f80, addr=0, value=0x7fffffffbf68, size=4, shift=0, mask=4294967295, attrs=...) at /home/elmarco/src/qemu/memory.c:434
  #2  0x00005555556558e1 in access_with_adjusted_size (addr=0, value=0x7fffffffbf68, size=4, access_size_min=1, access_size_max=4, access=0x55555565563e <memory_region_oldmmio_write_accessor>, mr=0x5555579f3f80, attrs=...) at /home/elmarco/src/qemu/memory.c:506
  qemu#3  0x00005555556581eb in memory_region_dispatch_write (mr=0x5555579f3f80, addr=0, data=0, size=4, attrs=...) at /home/elmarco/src/qemu/memory.c:1176
  qemu#4  0x000055555560b6f9 in address_space_rw (as=0x555555eff4e0 <address_space_memory>, addr=3759147008, attrs=..., buf=0x7fffffffc1b0 "", len=4, is_write=true) at /home/elmarco/src/qemu/exec.c:2439
  qemu#5  0x000055555560baa2 in cpu_physical_memory_rw (addr=3759147008, buf=0x7fffffffc1b0 "", len=4, is_write=1) at /home/elmarco/src/qemu/exec.c:2534
  qemu#6  0x000055555564c005 in cpu_physical_memory_write (addr=3759147008, buf=0x7fffffffc1b0, len=4) at /home/elmarco/src/qemu/include/exec/cpu-common.h:80
  qemu#7  0x000055555564cd9c in qtest_process_command (chr=0x55555642b890, words=0x5555578de4b0) at /home/elmarco/src/qemu/qtest.c:378
  qemu#8  0x000055555564db77 in qtest_process_inbuf (chr=0x55555642b890, inbuf=0x55555641b340) at /home/elmarco/src/qemu/qtest.c:569
  qemu#9  0x000055555564dc07 in qtest_read (opaque=0x55555642b890, buf=0x7fffffffc2e0 "writel 0xe0100800 0x0\n", size=22) at /home/elmarco/src/qemu/qtest.c:581
  qemu#10 0x000055555574ce3e in qemu_chr_be_write (s=0x55555642b890, buf=0x7fffffffc2e0 "writel 0xe0100800 0x0\n", len=22) at qemu-char.c:306
  qemu#11 0x0000555555751263 in tcp_chr_read (chan=0x55555642bcf0, cond=G_IO_IN, opaque=0x55555642b890) at qemu-char.c:2876
  qemu#12 0x00007ffff64c9a8a in g_main_context_dispatch (context=0x55555641c400) at gmain.c:3122

(without this patch, this can be reproduced with the ivshmem qtest)

Implement an empty mmio write to avoid the crash.
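
A minimal sketch of such a no-op write handler, assuming the standard
MemoryRegionOps interface (names here are illustrative, not necessarily
those of the actual patch):

    /* Illustrative read handler; the real one returns the MSI-X pending bits. */
    static uint64_t pba_read(void *opaque, hwaddr addr, unsigned size)
    {
        return 0;
    }

    /* The fix: an explicit no-op write, so dispatch never jumps through a
     * NULL callback when a guest (or a qtest) writes to the PBA region. */
    static void pba_write(void *opaque, hwaddr addr, uint64_t data, unsigned size)
    {
    }

    static const MemoryRegionOps pba_ops = {
        .read = pba_read,
        .write = pba_write,
        .endianness = DEVICE_LITTLE_ENDIAN,
    };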

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 6, 2015
A new gdb command is added:

  qemu handlers

     That dumps an AioContext list (by default qemu_aio_context)
     possibly including a backtrace for cases it knows about
     (with the verbose option).  Intended to help find why something
     is hanging waiting for IO.

  Use 'qemu handlers --verbose iohandler_ctx'  to find out why
your incoming migration is stuck.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Message-id: [email protected]

V2:
  Merge into one command with optional handlers arg, and only do
    backtrace in verbose mode

 (gdb) qemu handlers
 ----
 {pfd = {fd = 6, events = 25, revents = 0}, io_read = 0x55869656ffd0
 <event_notifier_dummy_cb>, io_write = 0x0, deleted = 0, opaque =
 0x558698c4ce08, node = {le_next = 0x0, le_prev = 0x558698c4cdc0}}

 (gdb) qemu handlers iohandler_ctx
 ----
 {pfd = {fd = 9, events = 25, revents = 0}, io_read = 0x558696581380
 <fd_coroutine_enter>, io_write = 0x0, deleted = 0, opaque =
 0x558698dc99d0, node = {le_next = 0x558698c4cca0, le_prev =
 0x558698c4c1d0}}
 ----
 {pfd = {fd = 4, events = 25, revents = 0}, io_read = 0x55869657b330
 <sigfd_handler>, io_write = 0x0, deleted = 0, opaque = 0x4, node =
 {le_next = 0x558698c4c260, le_prev = 0x558699f72508}}
 ----
 {pfd = {fd = 5, events = 25, revents = 0}, io_read = 0x55869656ffd0
 <event_notifier_dummy_cb>, io_write = 0x0, deleted = 0, opaque =
 0x558698c4c218, node = {le_next = 0x0, le_prev = 0x558698c4ccc8}}
 ----
 (gdb) qemu handlers --verbose iohandler_ctx
 ----
 {pfd = {fd = 9, events = 25, revents = 0}, io_read = 0x558696581380
 <fd_coroutine_enter>, io_write = 0x0, deleted = 0, opaque =
 0x558698dc99d0, node = {le_next = 0x558698c4cca0, le_prev =
 0x558698c4c1d0}}
 #0  0x0000558696581820 in qemu_coroutine_switch
 (from_=from_@entry=0x558698cb3cf0, to_=to_@entry=0x7f421c37eac8,
 action=action@entry=COROUTINE_YIELD) at
 /home/dgilbert/git/qemu/coroutine-ucontext.c:177
 #1  0x0000558696580c00 in qemu_coroutine_yield () at
 /home/dgilbert/git/qemu/qemu-coroutine.c:145
 #2  0x00005586965814f5 in yield_until_fd_readable (fd=9) at
 /home/dgilbert/git/qemu/qemu-coroutine-io.c:90
 qemu#3  0x0000558696523937 in socket_get_buffer (opaque=0x55869a3dc620,
 buf=0x558698c505a0 "", pos=<optimized out>, size=32768) at
 /home/dgilbert/git/qemu/migration/qemu-file-unix.c:101
 qemu#4  0x0000558696521fac in qemu_fill_buffer (f=0x558698c50570) at
 /home/dgilbert/git/qemu/migration/qemu-file.c:227
 qemu#5  0x0000558696522989 in qemu_peek_byte (f=0x558698c50570, offset=0)
     at /home/dgilbert/git/qemu/migration/qemu-file.c:507
 qemu#6  0x0000558696522bf4 in qemu_get_be32 (f=0x558698c50570) at
 /home/dgilbert/git/qemu/migration/qemu-file.c:520
 qemu#7  0x0000558696522bf4 in qemu_get_be32 (f=f@entry=0x558698c50570)
     at /home/dgilbert/git/qemu/migration/qemu-file.c:604
 qemu#8  0x0000558696347e5c in qemu_loadvm_state (f=f@entry=0x558698c50570)
     at /home/dgilbert/git/qemu/migration/savevm.c:1821
 qemu#9  0x000055869651de8c in process_incoming_migration_co
 (opaque=0x558698c50570)
     at /home/dgilbert/git/qemu/migration/migration.c:336
 qemu#10 0x000055869658188a in coroutine_trampoline (i0=<optimized out>,
 i1=<optimized out>)
     at /home/dgilbert/git/qemu/coroutine-ucontext.c:80
 qemu#11 0x00007f420f05df10 in __start_context () at /lib64/libc.so.6
 qemu#12 0x00007ffc40815f50 in  ()
 qemu#13 0x0000000000000000 in  ()

  ----
Signed-off-by: Stefan Hajnoczi <[email protected]>
@ozbenh ozbenh force-pushed the powernv branch 3 times, most recently from 554592d to de26bb4 on November 11, 2015 00:24
@ozbenh ozbenh force-pushed the powernv branch 4 times, most recently from 7643b1c to 7ecd38c on November 24, 2015 02:06
ozbenh and others added 19 commits May 12, 2016 11:32
We don't use the resulting accessors and this gets in the way of
the split I/D TLB work.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
We rework the way the MMU indices are calculated, providing separate
indices for I and D side based on MSR:IR and MSR:DR respectively,
and thus no longer need to flush the TLB on context changes. This also
adds correct support for HV as a separate address space.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
On ppc64 especially, we flush the TLB on any slbie or tlbie instruction.

However, those instructions often come in bursts of 3 or more (a context
switch will favor a series of slbie's over an slbia, for example, if the
SLB has fewer than a certain number of entries in it, and tlbie's can
happen in a series; with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time).

Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...

Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.

This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
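
A rough sketch of the batching idea (field and helper names are
illustrative; the actual patch may differ):

    /* slbie/tlbie no longer flush anything: they only mark the TLB stale. */
    static inline void ppc_tlb_mark_dirty(CPUPPCState *env)
    {
        env->tlb_need_flush = true;
    }

    /* Called on interrupts, rfi's, isync's and tlbsync. */
    static inline void check_tlb_flush(CPUPPCState *env)
    {
        if (env->tlb_need_flush) {
            env->tlb_need_flush = false;
            tlb_flush(CPU(ppc_env_get_cpu(env)), 1);
        }
    }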

This reduces the number of tlb_flush() calls on a boot to the Ubuntu
installer's first dialog screen from roughly 360K down to 36K.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[clg: added a 'CPUPPCState *' variable in h_remove() and
      h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <[email protected]>
We don't give them a KVM reg number yet as no current KVM version
supports HV mode.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[clg: SPRs AMOR,DAWR,DARWX were already included in commit f401dd3]
Signed-off-by: Cédric Le Goater <[email protected]>
This helper is only used by the various instructions that can alter
MSR, not by interrupts. Add a comment to that effect to the interrupt
code as well, in case somebody wants to change this.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
This reworks emulation of the various "rfi" variants. I removed
some masking bits that I couldn't make sense of; the only bit that
I am aware we should mask here is POW, and the CPU's MSR mask should
take care of the rest.

This also fixes some problems when running 32-bit userspace under
a 64-bit kernel.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
We use an env. flag which is set to the initial value of MSR_HVB in
the msr_mask. We also adjust the POWER8 mask to set SHV.

Also use this to adjust ctx.hv so that it is *set* when the processor
doesn't have an HV mode (970 with Apple mode for example), thus enabling
hypervisor instructions/SPRs.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Not that anything remotely recent supports tlbia but ...

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Otherwise it will trip on the forms used in recent architecture versions.

Ideally, we should have different handlers for different architecture
levels but our current implementation of TLB flushing is dumb enough
that this will do for now.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Otherwise tight loops at smt_low, which OPAL does for example,
eat so much CPU that we can't boot a kernel anymore. With that,
I can boot 8 CPUs just fine with powernv.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
This will enable decoding of hrfid.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Also use it to clamp the max SMT mode and ensure that the cpu_dt_id
values are offset by that value in order to preserve consistency with the
HW implementations.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
And move the code adjusting the MSR mask and calling kvmppc_set_papr()
to it. This allows us to add a few more things such as disabling setting
of MSR:HV and appropriate LPCR bits which will be used when fixing
the exception model.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Properly implement LPES0/1 handling for HV vs. !HV mode and fix AIL
implementation.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
We were initializing unused ones and missing some.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Under some circumstances, we need to direct ISI and DSI interrupts
at the hypervisor, turning them into HISI/HDSI, and using different
SPRs (HDSISR and HDAR) depending on the combination of MSR_DR and
the corresponding VPM bits in LPCR.

This moves part of the code into helpers that are fixed to select
the right exception type and registers. On pre-P7 processors, LPCR
is 0 which provides the old behaviour of directing the interrupts
at the supervisor.
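
The core of the routing decision could be sketched like this (constants
follow the commit message; the helper name is made up):

    /* Route a data storage fault either to the OS (DSI) or, when the
     * thread is not in hypervisor state and the relevant VPM bit is set
     * for the current translation mode, to the hypervisor (HDSI). */
    static int dsi_vector(CPUPPCState *env, bool relocation_on)
    {
        bool vpm = relocation_on ? !!(env->spr[SPR_LPCR] & LPCR_VPM1)
                                 : !!(env->spr[SPR_LPCR] & LPCR_VPM0);

        if (vpm && !msr_hv) {
            return POWERPC_EXCP_HDSI;   /* reported via HDSISR/HDAR */
        }
        return POWERPC_EXCP_DSI;        /* reported via DSISR/DAR */
    }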

Thanks to Andrei Warkentin for finding a bug when HV=1.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: David Gibson <[email protected]>
Recent server processors use the Hypervisor Emulation Assistance
interrupt for illegal instructions and *some* types of SPR accesses.

Also, the code was always generating invalid-instruction exceptions even
for privilege violations, due to setting the wrong flags.

Finally, the checking for PR/HV was open coded everywhere.

This reworks it all, using little helper macros for checking, and
adding the HV interrupt (which gets converted back to program check
in the slow path of excp_helper.c on CPUs that don't want it).
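
The helper macros could look roughly like this (a sketch; the exact
spelling in the patch may differ):

    /* Centralized privilege checks for the translator, replacing the
     * open-coded PR/HV tests at each privileged/hypervisor instruction. */
    #define CHK_SV                                                    \
        do {                                                          \
            if (unlikely(ctx->pr)) {                                  \
                gen_priv_exception(ctx, POWERPC_EXCP_PRIV_OPC);       \
                return;                                               \
            }                                                         \
        } while (0)

    #define CHK_HV                                                    \
        do {                                                          \
            if (unlikely(ctx->pr || !ctx->hv)) {                      \
                gen_priv_exception(ctx, POWERPC_EXCP_PRIV_OPC);       \
                return;                                               \
            }                                                         \
        } while (0)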

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Those instructions are only available in hypervisor real mode and
allow cache-inhibited guarded access to devices in that mode.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
ozbenh pushed a commit that referenced this pull request Aug 7, 2016
The vnc_server_info_get will allocate the VncServerInfo
struct and then call vnc_init_basic_info_from_server_addr
to populate the basic fields. If this returns an error
though, the qapi_free_VncServerInfo call will then crash
because the VncServerInfo struct instance was not properly
NULL-initialized and thus contains random stack garbage.

 #0  0x00007f1987c8e6f5 in raise () at /lib64/libc.so.6
 #1  0x00007f1987c902fa in abort () at /lib64/libc.so.6
 #2  0x00007f1987ccf600 in __libc_message () at /lib64/libc.so.6
 qemu#3  0x00007f1987cd7d4a in _int_free () at /lib64/libc.so.6
 qemu#4  0x00007f1987cdb2ac in free () at /lib64/libc.so.6
 qemu#5  0x00007f198b654f6e in g_free () at /lib64/libglib-2.0.so.0
 qemu#6  0x0000559193cdcf54 in visit_type_str (v=v@entry=
     0x5591972f14b0, name=name@entry=0x559193de1e29 "host", obj=obj@entry=0x5591961dbfa0, errp=errp@entry=0x7fffd7899d80)
     at qapi/qapi-visit-core.c:255
 qemu#7  0x0000559193cca8f3 in visit_type_VncBasicInfo_members (v=v@entry=
     0x5591972f14b0, obj=obj@entry=0x5591961dbfa0, errp=errp@entry=0x7fffd7899dc0) at qapi-visit.c:12307
 qemu#8  0x0000559193ccb523 in visit_type_VncServerInfo_members (v=v@entry=
     0x5591972f14b0, obj=0x5591961dbfa0, errp=errp@entry=0x7fffd7899e00) at qapi-visit.c:12632
 qemu#9  0x0000559193ccb60b in visit_type_VncServerInfo (v=v@entry=
     0x5591972f14b0, name=name@entry=0x0, obj=obj@entry=0x7fffd7899e48, errp=errp@entry=0x0) at qapi-visit.c:12658
 qemu#10 0x0000559193cb53d8 in qapi_free_VncServerInfo (obj=<optimized out>) at qapi-types.c:3970
 qemu#11 0x0000559193c1e6ba in vnc_server_info_get (vd=0x7f1951498010) at ui/vnc.c:233
 qemu#12 0x0000559193c24275 in vnc_connect (vs=0x559197b2f200, vs=0x559197b2f200, event=QAPI_EVENT_VNC_CONNECTED) at ui/vnc.c:284
 qemu#13 0x0000559193c24275 in vnc_connect (vd=vd@entry=0x7f1951498010, sioc=sioc@entry=0x559196bf9c00, skipauth=skipauth@entry=tru e, websocket=websocket@entry=false) at ui/vnc.c:3039
 qemu#14 0x0000559193c25806 in vnc_display_add_client (id=<optimized out>, csock=<optimized out>, skipauth=<optimized out>)
     at ui/vnc.c:3877
 qemu#15 0x0000559193a90c28 in qmp_marshal_add_client (args=<optimized out>, ret=<optimized out>, errp=0x7fffd7899f90)
     at qmp-marshal.c:105
 qemu#16 0x000055919399c2b7 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>)
     at /home/berrange/src/virt/qemu/monitor.c:3971
 qemu#17 0x0000559193ce3307 in json_message_process_token (lexer=0x559194ab0838, input=0x559194a6d940, type=JSON_RCURLY, x=111, y=1 2) at qobject/json-streamer.c:105
 qemu#18 0x0000559193cfa90d in json_lexer_feed_char (lexer=lexer@entry=0x559194ab0838, ch=125 '}', flush=flush@entry=false)
     at qobject/json-lexer.c:319
 qemu#19 0x0000559193cfaa1e in json_lexer_feed (lexer=0x559194ab0838, buffer=<optimized out>, size=<optimized out>)
     at qobject/json-lexer.c:369
 qemu#20 0x0000559193ce33c9 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>)
     at qobject/json-streamer.c:124
 qemu#21 0x000055919399a85b in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>)
     at /home/berrange/src/virt/qemu/monitor.c:3987
 qemu#22 0x0000559193a87d00 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x559194a7d900)
     at qemu-char.c:2895
 qemu#23 0x00007f198b64f703 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
 qemu#24 0x0000559193c484b3 in main_loop_wait () at main-loop.c:213
 qemu#25 0x0000559193c484b3 in main_loop_wait (timeout=<optimized out>) at main-loop.c:258
 qemu#26 0x0000559193c484b3 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:506
 qemu#27 0x0000559193964c55 in main () at vl.c:1908
 qemu#28 0x0000559193964c55 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4603

This was introduced in

  commit 98481bf
  Author: Eric Blake <[email protected]>
  Date:   Mon Oct 26 16:34:45 2015 -0600

    vnc: Hoist allocation of VncBasicInfo to callers

which added error reporting for vnc_init_basic_info_from_server_addr
but didn't change the g_malloc calls to g_malloc0.
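
A simplified illustration of why the zeroed allocation matters (not the
exact ui/vnc.c code; fill_in_basic_info stands in for the failing init
step):

    VncServerInfo *vnc_server_info_get_sketch(VncDisplay *vd)
    {
        /* g_malloc0 instead of g_malloc: all pointer members start NULL. */
        VncServerInfo *info = g_malloc0(sizeof(*info));

        if (fill_in_basic_info(vd, info) < 0) {     /* hypothetical helper */
            /* Safe: the generated free visitor skips NULL members instead
             * of freeing whatever stack garbage g_malloc left behind. */
            qapi_free_VncServerInfo(info);
            return NULL;
        }
        return info;
    }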

Signed-off-by: Daniel P. Berrange <[email protected]>
Message-id: [email protected]
Signed-off-by: Gerd Hoffmann <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
Spotted by ASAN:

=================================================================
==27907==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4120 byte(s) in 1 object(s) allocated from:
    #0 0x7f913458ce50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f9133fd641d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    #2 0x5561c6643c95 in qdict_crumple_test_recursive /home/elmarco/src/qq/tests/check-block-qdict.c:438
    qemu#3 0x7f9133ff7c49  (/lib64/libglib-2.0.so.0+0x73c49)

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
[Screwed up in commit 2860b2b]
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Markus Armbruster <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
Spotted by ASAN, during make check...

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x7f8e27262c48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7f8e26a5f3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    #2 0x555ab67078a8 in qstring_from_str /home/elmarco/src/qq/qobject/qstring.c:67
    qemu#3 0x555ab67071e4 in qstring_new /home/elmarco/src/qq/qobject/qstring.c:24
    qemu#4 0x555ab6713fbf in qstring_from_escaped_str /home/elmarco/src/qq/qobject/json-parser.c:144
    qemu#5 0x555ab671738c in parse_literal /home/elmarco/src/qq/qobject/json-parser.c:506
    qemu#6 0x555ab67179c3 in parse_value /home/elmarco/src/qq/qobject/json-parser.c:569
    qemu#7 0x555ab6715123 in parse_pair /home/elmarco/src/qq/qobject/json-parser.c:306
    qemu#8 0x555ab6715483 in parse_object /home/elmarco/src/qq/qobject/json-parser.c:357
    qemu#9 0x555ab671798b in parse_value /home/elmarco/src/qq/qobject/json-parser.c:561
    qemu#10 0x555ab6717a6b in json_parser_parse_err /home/elmarco/src/qq/qobject/json-parser.c:592
    qemu#11 0x555ab4fd4dcf in handle_qmp_command /home/elmarco/src/qq/monitor.c:4257
    qemu#12 0x555ab6712c4d in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:105
    qemu#13 0x555ab67e01e2 in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:323
    qemu#14 0x555ab67e0af6 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:373
    qemu#15 0x555ab6713010 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:124
    qemu#16 0x555ab4fd58ec in monitor_qmp_read /home/elmarco/src/qq/monitor.c:4337
    qemu#17 0x555ab6559df2 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:175
    qemu#18 0x555ab6559e95 in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:187
    qemu#19 0x555ab6560127 in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66
    qemu#20 0x555ab65d9c73 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84
    qemu#21 0x7f8e26a598ac in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x4c8ac)

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
[Screwed up in commit b273145]
Cc: [email protected]
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Markus Armbruster <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
If qio_channel_rdma_readv returns QIO_CHANNEL_ERR_BLOCK, the destination
qemu crashes.

The backtrace is:
(gdb) bt
    #0  0x0000000000000000 in ?? ()
    #1  0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080,
        io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c:
    #2  0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438
    qemu#3  0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47
    qemu#4  0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327
        at migration/qemu-file-channel.c:83
    qemu#5  0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299
    qemu#6  0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562
    qemu#7  0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575
    qemu#8  0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655
    qemu#9  0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126
    qemu#10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366
    qemu#11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1
    qemu#12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6
    qemu#13 0x00007f96fe858760 in ?? ()
    qemu#14 0x0000000000000000 in ?? ()

The RDMA QIOChannel does not implement io_set_aio_fd_handler, so
qio_channel_set_aio_fd_handler will access a NULL pointer.
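
A sketch of the missing hook (the fd plumbing and field names are
illustrative); it would be assigned to ioc_klass->io_set_aio_fd_handler
in the channel's class_init:

    static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
                                                    AioContext *ctx,
                                                    IOHandler *io_read,
                                                    IOHandler *io_write,
                                                    void *opaque)
    {
        QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);

        /* Register the RDMA completion-channel fd with the AioContext so
         * a yielded reader can be woken up again. */
        aio_set_fd_handler(ctx, rioc->rdma->comp_channel->fd, false,
                           io_read, io_write, NULL, opaque);
    }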

Signed-off-by: Lidong Chen <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
When qio_channel_read returns QIO_CHANNEL_ERR_BLOCK, the source qemu crashes.

The backtrace is:
    (gdb) bt
    #0  0x00007fb20aba91d7 in raise () from /lib64/libc.so.6
    #1  0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6
    #2  0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6
    qemu#3  0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6
    qemu#4  0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460
    qemu#5  0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768)
        at migration/qemu-file-channel.c:83
    qemu#6  0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299
    qemu#7  0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562
    qemu#8  0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575
    qemu#9  0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647
    qemu#10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794
    qemu#11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504
    qemu#12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0
    qemu#13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6

This patch fixes it by invoking qio_channel_yield only when qemu_in_coroutine() is true.
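
The shape of the guard, in the spirit of channel_get_buffer (a sketch,
not the verbatim patch):

    ssize_t ret = qio_channel_readv(ioc, &iov, 1, NULL);
    if (ret == QIO_CHANNEL_ERR_BLOCK) {
        if (qemu_in_coroutine()) {
            qio_channel_yield(ioc, G_IO_IN);   /* coroutine context: yield */
        } else {
            qio_channel_wait(ioc, G_IO_IN);    /* plain thread: block */
        }
    }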

Signed-off-by: Lidong Chen <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
Because the RDMA QIOChannel does not implement a shutdown function,
if to_dst_file is set to an error, the return path thread will wait
forever, and the migration thread will wait for the return path
thread to exit.

the backtrace of return path thread is:

(gdb) bt
    #0  0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6
    #1  0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000)
        at qemu-timer.c:325
    #2  0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000)
        at migration/rdma.c:1501
    qemu#3  0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000,
        byte_len=0x7ef7091d0640) at migration/rdma.c:1580
    qemu#4  0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000,
        head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726
    qemu#5  0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720,
        expecting=3) at migration/rdma.c:1903
    qemu#6  0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8,
        size=32768) at migration/rdma.c:2714
    qemu#7  0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232
    qemu#8  0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0)
        at migration/qemu-file.c:502
    qemu#9  0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515
    qemu#10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591
    qemu#11 0x00000000006a46d3 in source_return_path_thread (
        opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331
    qemu#12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    qemu#13 0x00007f372a77635d in clone () from /lib64/libc.so.6

the backtrace of migration thread is:

(gdb) bt
    #0  0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0
    #1  0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>)
        at util/qemu-thread-posix.c:504
    #2  0x00000000006a4bc5 in await_return_path_close_on_source (
        ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460
    qemu#3  0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>,
        current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980)
        at migration/migration.c:1695
    qemu#4  0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>)
        at migration/migration.c:1837
    qemu#5  0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    qemu#6  0x00007f372a77635d in clone () from /lib64/libc.so.6

Signed-off-by: Lidong Chen <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
tests/cdrom-test -p /x86_64/cdrom/boot/megasas

Produces the following ASAN leak.

==25700==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7f06f8faac48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7f06f87a73c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    #2 0x55a729f17738 in pci_dma_sglist_init /home/elmarco/src/qq/include/hw/pci/pci.h:818
    qemu#3 0x55a729f2a706 in megasas_map_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:698
    qemu#4 0x55a729f39421 in megasas_handle_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:1574
    qemu#5 0x55a729f3f70d in megasas_handle_frame /home/elmarco/src/qq/hw/scsi/megasas.c:1955
    qemu#6 0x55a729f40939 in megasas_mmio_write /home/elmarco/src/qq/hw/scsi/megasas.c:2119
    qemu#7 0x55a729f41102 in megasas_port_write /home/elmarco/src/qq/hw/scsi/megasas.c:2170
    qemu#8 0x55a729220e60 in memory_region_write_accessor /home/elmarco/src/qq/memory.c:527
    qemu#9 0x55a7292212b3 in access_with_adjusted_size /home/elmarco/src/qq/memory.c:594
    qemu#10 0x55a72922cf70 in memory_region_dispatch_write /home/elmarco/src/qq/memory.c:1473
    qemu#11 0x55a7290f5907 in flatview_write_continue /home/elmarco/src/qq/exec.c:3255
    qemu#12 0x55a7290f5ceb in flatview_write /home/elmarco/src/qq/exec.c:3294
    qemu#13 0x55a7290f6457 in address_space_write /home/elmarco/src/qq/exec.c:3384
    qemu#14 0x55a7290f64a8 in address_space_rw /home/elmarco/src/qq/exec.c:3395
    qemu#15 0x55a72929ecb0 in kvm_handle_io /home/elmarco/src/qq/accel/kvm/kvm-all.c:1729
    qemu#16 0x55a7292a0db5 in kvm_cpu_exec /home/elmarco/src/qq/accel/kvm/kvm-all.c:1969
    qemu#17 0x55a7291c4212 in qemu_kvm_cpu_thread_fn /home/elmarco/src/qq/cpus.c:1215
    qemu#18 0x55a72a966a6c in qemu_thread_start /home/elmarco/src/qq/util/qemu-thread-posix.c:504
    qemu#19 0x7f06ed486593 in start_thread (/lib64/libpthread.so.0+0x7593)

Move the qemu_sglist_destroy() from megasas_complete_command() to
megasas_unmap_frame(), so map/unmap are balanced.

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Marc-André Lureau <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
Spotted by ASAN:

=================================================================
==5378==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x7f788f83bc48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7f788c9923c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    #2 0x5622a1fe37bc in coroutine_trampoline /home/elmarco/src/qq/util/coroutine-ucontext.c:116
    qemu#3 0x7f788a15d75f in __correctly_grouped_prefixwc (/lib64/libc.so.6+0x4c75f)

(Broken in commit 4c8158e.)

Signed-off-by: Marc-André Lureau <[email protected]>
Message-id: [email protected]
Signed-off-by: Max Reitz <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
We need to set cs->halted to 1 before calling ppc_set_compat. The reason
is that ppc_set_compat kicks the new thread created to manage the
hotplugged KVM virtual CPU, and that thread drives directly to the KVM_RUN
ioctl. When cs->halted is 1, the code:

int kvm_cpu_exec(CPUState *cpu)
...
     if (kvm_arch_process_async_events(cpu)) {
         atomic_set(&cpu->exit_request, 0);
         return EXCP_HLT;
     }
...

returns before it reaches KVM_RUN, giving the main thread time to
finish its job. Otherwise we can fall into a deadlock, because the KVM
thread will issue the KVM_RUN ioctl while the main thread is still
setting up KVM registers. Depending on how these jobs are scheduled,
we'll end up freezing QEMU.
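
The ordering fix boils down to something like this in the hotplug path
(a sketch; cs, cpu, compat_pvr and local_err are the path's existing
locals):

    /* Halt the new vCPU before ppc_set_compat() kicks its KVM thread, so
     * kvm_cpu_exec() returns EXCP_HLT instead of racing into KVM_RUN
     * while the main thread is still setting up registers. */
    cs->halted = 1;
    ppc_set_compat(cpu, compat_pvr, &local_err);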

The following output shows kvm_vcpu_ioctl sleeping because it cannot get
the mutex and never will.
PS: kvm_vcpu_ioctl was triggered by kvm_set_one_reg - compat_pvr.

STATE: TASK_UNINTERRUPTIBLE|TASK_WAKEKILL

PID: 61564  TASK: c000003e981e0780  CPU: 48  COMMAND: "qemu-system-ppc"
 #0 [c000003e982679a0] __schedule at c000000000b10a44
 #1 [c000003e98267a60] schedule at c000000000b113a8
 #2 [c000003e98267a90] schedule_preempt_disabled at c000000000b11910
 qemu#3 [c000003e98267ab0] __mutex_lock at c000000000b132ec
 qemu#4 [c000003e98267bc0] kvm_vcpu_ioctl at c00800000ea03140 [kvm]
 qemu#5 [c000003e98267d20] do_vfs_ioctl at c000000000407d30
 qemu#6 [c000003e98267dc0] ksys_ioctl at c000000000408674
 qemu#7 [c000003e98267e10] sys_ioctl at c0000000004086f8
 qemu#8 [c000003e98267e30] system_call at c00000000000b488

crash> struct -x kvm.vcpus 0xc000003da0000000
vcpus = {0xc000003db4880000, 0xc000003d52b80000, 0xc0000039e9c80000, 0xc000003d0e200000, 0xc000003d58280000, 0x0, 0x0, ...}

crash> struct -x kvm_vcpu.mutex.owner 0xc000003d58280000
  mutex.owner = {
    counter = 0xc000003a23a5c881 <- flag 1: waiters
  },

crash> bt 0xc000003a23a5c880
PID: 61579  TASK: c000003a23a5c880  CPU: 9   COMMAND: "CPU 4/KVM"
(active)

crash> struct -x kvm_vcpu.mutex.wait_list 0xc000003d58280000
  mutex.wait_list = {
    next = 0xc000003e98267b10,
    prev = 0xc000003e98267b10
  },

crash> struct -x mutex_waiter.task 0xc000003e98267b10
  task = 0xc000003e981e0780

The following command-line was used to reproduce the problem (note: gdb
and trace can change the results).

 $ qemu-ppc/build/ppc64-softmmu/qemu-system-ppc64 -cpu host \
     -enable-kvm -m 4096 \
     -smp 4,maxcpus=8,sockets=1,cores=2,threads=4 \
     -display none -nographic \
     -drive file=disk1.qcow2,format=qcow2
 ...
 (qemu) device_add host-spapr-cpu-core,core-id=4
[no interaction is possible after it, only SIGKILL to take the terminal
back]

Signed-off-by: Jose Ricardo Ziviani <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Signed-off-by: David Gibson <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
Spotted by ASAN doing some manual testing:

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f5fcdc75e50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f5fcd47241d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    #2 0x55f989be92ce in timer_new /home/elmarco/src/qq/include/qemu/timer.h:561
    qemu#3 0x55f989be92ff in timer_new_ms /home/elmarco/src/qq/include/qemu/timer.h:630
    qemu#4 0x55f989c0219d in hmp_migrate /home/elmarco/src/qq/hmp.c:2038
    qemu#5 0x55f98955927b in handle_hmp_command /home/elmarco/src/qq/monitor.c:3498
    qemu#6 0x55f98955fb8c in monitor_command_cb /home/elmarco/src/qq/monitor.c:4371
    qemu#7 0x55f98ad40f11 in readline_handle_byte /home/elmarco/src/qq/util/readline.c:393
    qemu#8 0x55f98955fa4f in monitor_read /home/elmarco/src/qq/monitor.c:4354
    qemu#9 0x55f98aae30d7 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:175
    qemu#10 0x55f98aae317a in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:187
    qemu#11 0x55f98aae940c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66
    qemu#12 0x55f98ab63018 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84
    qemu#13 0x7f5fcd46c8ac in g_main_dispatch gmain.c:3177

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
In qemu_laio_process_completions_and_submit, the AioContext is acquired
before the ioq_submit iteration and after qemu_laio_process_completions,
but the latter is not thread safe either.

This change avoids a number of random crashes when the Main Thread and
an IO Thread collide processing completions for the same AioContext.
This is an example of such crash:

 - The IO Thread is trying to acquire the AioContext at aio_co_enter,
   which evidences that it didn't lock it before:

Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)):
 #0  0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 #1  0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0
 #2  0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0)
    at ../nptl/pthread_mutex_lock.c:114
 qemu#3  0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66
 qemu#4  0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493
 qemu#5  0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478
 qemu#6  0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 qemu#8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 qemu#9  0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406
 qemu#10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true)
    at util/aio-posix.c:693
 qemu#11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64
 qemu#12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308
 qemu#13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

 - The Main Thread is also processing completions from the same
   AioContext, and crashes due to failed assertion at util/iov.c:78:

Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)):
 #0  0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 #1  0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90
 #2  0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset")
    at assert.c:92
 qemu#3  0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101
 qemu#4  0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78
 qemu#5  0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410
 qemu#6  0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 qemu#8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 qemu#9  0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=<optimized out>) at block/linux-aio.c:272
 qemu#10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497
 qemu#11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574
 qemu#12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false)
    at util/aio-posix.c:604
 qemu#13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273
 qemu#14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390
 qemu#15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590
 qemu#16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774
 qemu#17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401
 qemu#18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449
 qemu#19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080)
    at block/commit.c:92
 qemu#20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973
 qemu#21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90
 qemu#22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118
 qemu#23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436
 qemu#24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
 qemu#25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201
 qemu#26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854
 qemu#27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215
 qemu#28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
 qemu#29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
 qemu#30 0x00005631fbd81412 in main_loop () at vl.c:1866
 qemu#31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4647

 - A closer examination shows that s->io_q.in_flight appears to have
   gone backwards:

(gdb) frame 7
 qemu#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
222	            qemu_laio_process_completion(laiocb);
(gdb) p s
$2 = (LinuxAioState *) 0x7fdfc0297670
(gdb) p *s
$3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0,
    in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0,
      sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241}
(gdb) p/x s->io_q.in_flight
$4 = 0xfffffff0

Signed-off-by: Sergio Lopez <[email protected]>
Signed-off-by: Kevin Wolf <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
Spotted by ASAN while running:

$ tests/migration-test -p /x86_64/migration/postcopy/recovery

=================================================================
==18034==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 33864 byte(s) in 1 object(s) allocated from:
    #0 0x7f3da7f31e50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f3da644441d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    #2 0x55af9db15440 in qemu_fopen_channel_input /home/elmarco/src/qemu/migration/qemu-file-channel.c:183
    qemu#3 0x55af9db15413 in channel_get_output_return_path /home/elmarco/src/qemu/migration/qemu-file-channel.c:159
    qemu#4 0x55af9db0d4ac in qemu_file_get_return_path /home/elmarco/src/qemu/migration/qemu-file.c:78
    qemu#5 0x55af9dad5e4f in open_return_path_on_source /home/elmarco/src/qemu/migration/migration.c:2295
    qemu#6 0x55af9dadb3bf in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:3111
    qemu#7 0x55af9dae1bf3 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:91
    qemu#8 0x55af9daddeca in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:108
    qemu#9 0x55af9e13d3db in qio_task_complete /home/elmarco/src/qemu/io/task.c:158
    qemu#10 0x55af9e13ca03 in qio_task_thread_result /home/elmarco/src/qemu/io/task.c:89
    qemu#11 0x7f3da643b1ca in g_idle_dispatch gmain.c:5535

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
Recently, the test case has started failing because some job related
functions want to drop the AioContext lock even though it hasn't been
taken:

    (gdb) bt
    #0  0x00007f51c067c9fb in raise () from /lib64/libc.so.6
    #1  0x00007f51c067e77d in abort () from /lib64/libc.so.6
    #2  0x0000558c9d5dde7b in error_exit (err=<optimized out>, msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
    qemu#3  0x0000558c9d6b5263 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96
    qemu#4  0x0000558c9d6b0565 in aio_context_release (ctx=ctx@entry=0x558c9f399940) at util/async.c:516
    qemu#5  0x0000558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at job.c:738
    qemu#6  0x0000558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, finish=finish@entry=0x558c9d5eb8d0 <job_cancel_err>, errp=errp@entry=0x0) at job.c:986
    qemu#7  0x0000558c9d5eb8ee in job_cancel_sync (job=<optimized out>) at job.c:941
    qemu#8  0x0000558c9d64d853 in replication_close (bs=<optimized out>) at block/replication.c:148
    qemu#9  0x0000558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420
    qemu#10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629
    qemu#11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685
    qemu#12 0x0000558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at block/block-backend.c:783
    qemu#13 0x0000558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at block/block-backend.c:402
    qemu#14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457
    qemu#15 0x0000558c9d5dfcea in test_secondary_stop () at tests/test-replication.c:478
    qemu#16 0x00007f51c1f13178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    qemu#17 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    qemu#18 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    qemu#19 0x00007f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0
    qemu#20 0x00007f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0
    qemu#21 0x0000558c9d5de31f in main (argc=<optimized out>, argv=<optimized out>) at tests/test-replication.c:581

It is yet unclear whether this should really be considered a bug in the
test case or whether blk_unref() should work for callers that haven't
taken the AioContext lock, but in order to fix the build tests quickly,
just take the AioContext lock around blk_unref().
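
Concretely, the workaround amounts to something like (a sketch):

    AioContext *ctx = blk_get_aio_context(blk);

    aio_context_acquire(ctx);
    blk_unref(blk);
    aio_context_release(ctx);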

Signed-off-by: Kevin Wolf <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
A socket chardev may not have an associated address (when adding a client
fd manually, for example). But on disconnect, updating the socket filename
expects an address and may lead to this crash:

  Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
  0x0000555555d8c70c in SocketAddress_to_str (prefix=0x555556043062 "disconnected:", addr=0x0, is_listen=false, is_telnet=false) at /home/elmarco/src/qq/chardev/char-socket.c:388
  388	    switch (addr->type) {
  (gdb) bt
  #0  0x0000555555d8c70c in SocketAddress_to_str (prefix=0x555556043062 "disconnected:", addr=0x0, is_listen=false, is_telnet=false) at /home/elmarco/src/qq/chardev/char-socket.c:388
  #1  0x0000555555d8c8aa in update_disconnected_filename (s=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:419
  #2  0x0000555555d8c959 in tcp_chr_disconnect (chr=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:438
  qemu#3  0x0000555555d8cba1 in tcp_chr_hup (channel=0x555556b75690, cond=G_IO_HUP, opaque=0x555556b1ed00) at /home/elmarco/src/qq/chardev/char-socket.c:482
  qemu#4  0x0000555555da596e in qio_channel_fd_source_dispatch (source=0x555556bb68b0, callback=0x555555d8cb58 <tcp_chr_hup>, user_data=0x555556b1ed00) at /home/elmarco/src/qq/io/channel-watch.c:84

Replace filename with a generic "disconnected:socket" in this case.
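
A sketch of the fallback in update_disconnected_filename (field names
are illustrative):

    static void update_disconnected_filename(SocketChardev *s)
    {
        Chardev *chr = CHARDEV(s);

        g_free(chr->filename);
        if (s->addr) {
            chr->filename = SocketAddress_to_str("disconnected:", s->addr,
                                                 s->is_listen, s->is_telnet);
        } else {
            /* No address to format, e.g. a manually added client fd. */
            chr->filename = g_strdup("disconnected:socket");
        }
    }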

Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
ozbenh pushed a commit that referenced this pull request Nov 1, 2018
Spotted by ASAN:
=================================================================
==11893==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1120 byte(s) in 28 object(s) allocated from:
    #0 0x7fd0515b0c48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7fd050ffa3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    #2 0x559e708b56a4 in qstring_from_str /home/elmarco/src/qq/qobject/qstring.c:66
    qemu#3 0x559e708b4fe0 in qstring_new /home/elmarco/src/qq/qobject/qstring.c:23
    qemu#4 0x559e708bda7d in parse_string /home/elmarco/src/qq/qobject/json-parser.c:143
    qemu#5 0x559e708c1009 in parse_literal /home/elmarco/src/qq/qobject/json-parser.c:484
    qemu#6 0x559e708c1627 in parse_value /home/elmarco/src/qq/qobject/json-parser.c:547
    qemu#7 0x559e708c1c67 in json_parser_parse /home/elmarco/src/qq/qobject/json-parser.c:573
    qemu#8 0x559e708bc0ff in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:92
    qemu#9 0x559e708d1655 in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:292
    qemu#10 0x559e708d1fe1 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:339
    qemu#11 0x559e708bc856 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:121
    qemu#12 0x559e708b8b4b in qobject_from_jsonv /home/elmarco/src/qq/qobject/qjson.c:69
    qemu#13 0x559e708b8d02 in qobject_from_json /home/elmarco/src/qq/qobject/qjson.c:83
    qemu#14 0x559e708a74ae in from_json_str /home/elmarco/src/qq/tests/check-qjson.c:30
    qemu#15 0x559e708a9f83 in utf8_string /home/elmarco/src/qq/tests/check-qjson.c:781
    qemu#16 0x7fd05101bc49 in test_case_run gtestutils.c:2255
    qemu#17 0x7fd05101bc49 in g_test_run_suite_internal gtestutils.c:2339

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
Signed-off-by: Markus Armbruster <[email protected]>
shenki pushed a commit to open-power/qemu that referenced this pull request Dec 11, 2018
The way we determine if we can start the incoming migration was
changed to use migration_has_all_channels() in:

  commit 428d890
  Author: Juan Quintela <[email protected]>
  Date:   Mon Jul 24 13:06:25 2017 +0200

    migration: Create migration_has_all_channels

This method in turn calls multifd_recv_all_channels_created()
which is hardcoded to always return 'true' when multifd is
not in use. This is a latent bug...

...activated in a following commit where that return result
ends up acting as the flag to indicate whether it is possible
to start processing the migration:

  commit 36c2f8b
  Author: Juan Quintela <[email protected]>
  Date:   Wed Mar 7 08:40:52 2018 +0100

    migration: Delay start of migration main routines

This means that if channel initialization fails with normal
migration, it'll never notice and attempt to start the
incoming migration regardless and crash on a NULL pointer.

This can be seen, for example, if a client connects to a server
requiring TLS, but has an invalid x509 certificate:

qemu-system-x86_64: The certificate hasn't got a known issuer
qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed.

 #0  0x00007fffebd24f2b in raise () at /lib64/libc.so.6
 ozbenh#1  0x00007fffebd0f561 in abort () at /lib64/libc.so.6
 ozbenh#2  0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
 qemu#3  0x00007fffebd1d692 in  () at /lib64/libc.so.6
 qemu#4  0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386
 qemu#5  0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
 qemu#6  0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
 qemu#7  0x0000000000000000 in  ()

To handle the non-multifd case, we check whether mis->from_src_file
is non-NULL. With this in place, the migration server drops the
rejected client and stays around waiting for another, hopefully
valid, client to arrive.
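
The check amounts to something like this (a sketch of
migration_has_all_channels):

    bool migration_has_all_channels(void)
    {
        MigrationIncomingState *mis = migration_incoming_get_current();

        /* Even without multifd, the main channel must have produced a
         * QEMUFile before the incoming migration can start. */
        return multifd_recv_all_channels_created() &&
               mis->from_src_file != NULL;
    }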

Signed-off-by: Daniel P. Berrangé <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Juan Quintela <[email protected]>
Reviewed-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Juan Quintela <[email protected]>
shenki pushed a commit to open-power/qemu that referenced this pull request Dec 11, 2018
With a Spice port chardev, it is possible to reenter
monitor_qapi_event_queue() (when the client disconnects for
example). This will deadlock on monitor_lock.

Instead, use some TLS variables to check for recursion and queue the
events.
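
A simplified sketch of the guard (defer_event and flush_deferred_events
are hypothetical stand-ins for the per-thread queue handling):

    void monitor_qapi_event_queue(QAPIEvent event, QDict *qdict)
    {
        static __thread bool reentered;

        if (reentered) {
            /* Nested call from the emit path (e.g. a Spice disconnect):
             * queue it for later instead of re-taking monitor_lock. */
            defer_event(event, qdict);
            return;
        }
        reentered = true;
        monitor_qapi_event_queue_no_reenter(event, qdict);
        flush_deferred_events();
        reentered = false;
    }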

Fixes:
 (gdb) bt
 #0  0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
 ozbenh#1  0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
 ozbenh#2  0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
 qemu#3  0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
 qemu#4  0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
 qemu#5  0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
 qemu#6  0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
 qemu#7  0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
 qemu#8  0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
 qemu#9  0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
 qemu#10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
 qemu#11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
 qemu#12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
 qemu#13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
 qemu#14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
 qemu#15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
 qemu#16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
 qemu#17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
 qemu#18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
 qemu#19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
 qemu#20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
 qemu#21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
 qemu#22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
 qemu#23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
 qemu#24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
 qemu#25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
 qemu#26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
 qemu#27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
 qemu#28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
 qemu#29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
 qemu#30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644

Note that the error report is now moved to the first caller, which may
receive an error for a recursed event. This is probably fine (95% of
callers use &error_abort; the rest have a NULL error and ignore it).

Signed-off-by: Marc-André Lureau <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Markus Armbruster <[email protected]>
[*_no_recurse renamed to *_no_reenter, local variables reordered]
Signed-off-by: Markus Armbruster <[email protected]>
daxtens pushed a commit to daxtens/qemu that referenced this pull request Jan 7, 2020
Currently, offloads disabled by the guest via the VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET
command are not preserved on VM migration.
Instead, all offloads reported by the guest features (via VIRTIO_PCI_GUEST_FEATURES)
get enabled.
What happens is: first VirtIONet::curr_guest_offloads is restored and the offloads
are set correctly:

 #0  qemu_set_offload (nc=0x555556a11400, csum=1, tso4=0, tso6=0, ecn=0, ufo=0) at net/net.c:474
 ozbenh#1  virtio_net_apply_guest_offloads (n=0x555557701ca0) at hw/net/virtio-net.c:720
 ozbenh#2  virtio_net_post_load_device (opaque=0x555557701ca0, version_id=11) at hw/net/virtio-net.c:2334
 qemu#3  vmstate_load_state (f=0x5555569dc010, vmsd=0x555556577c80 <vmstate_virtio_net_device>, opaque=0x555557701ca0, version_id=11)
     at migration/vmstate.c:168
 qemu#4  virtio_load (vdev=0x555557701ca0, f=0x5555569dc010, version_id=11) at hw/virtio/virtio.c:2197
 qemu#5  virtio_device_get (f=0x5555569dc010, opaque=0x555557701ca0, size=0, field=0x55555668cd00 <__compound_literal.5>) at hw/virtio/virtio.c:2036
 qemu#6  vmstate_load_state (f=0x5555569dc010, vmsd=0x555556577ce0 <vmstate_virtio_net>, opaque=0x555557701ca0, version_id=11) at migration/vmstate.c:143
 qemu#7  vmstate_load (f=0x5555569dc010, se=0x5555578189e0) at migration/savevm.c:829
 qemu#8  qemu_loadvm_section_start_full (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2211
 qemu#9  qemu_loadvm_state_main (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2395
 qemu#10 qemu_loadvm_state (f=0x5555569dc010) at migration/savevm.c:2467
 qemu#11 process_incoming_migration_co (opaque=0x0) at migration/migration.c:449

However, later on the features are restored, and the offloads get reset to
everything supported by those features:

 #0  qemu_set_offload (nc=0x555556a11400, csum=1, tso4=1, tso6=1, ecn=0, ufo=0) at net/net.c:474
 ozbenh#1  virtio_net_apply_guest_offloads (n=0x555557701ca0) at hw/net/virtio-net.c:720
 ozbenh#2  virtio_net_set_features (vdev=0x555557701ca0, features=5104441767) at hw/net/virtio-net.c:773
 qemu#3  virtio_set_features_nocheck (vdev=0x555557701ca0, val=5104441767) at hw/virtio/virtio.c:2052
 qemu#4  virtio_load (vdev=0x555557701ca0, f=0x5555569dc010, version_id=11) at hw/virtio/virtio.c:2220
 qemu#5  virtio_device_get (f=0x5555569dc010, opaque=0x555557701ca0, size=0, field=0x55555668cd00 <__compound_literal.5>) at hw/virtio/virtio.c:2036
 qemu#6  vmstate_load_state (f=0x5555569dc010, vmsd=0x555556577ce0 <vmstate_virtio_net>, opaque=0x555557701ca0, version_id=11) at migration/vmstate.c:143
 qemu#7  vmstate_load (f=0x5555569dc010, se=0x5555578189e0) at migration/savevm.c:829
 qemu#8  qemu_loadvm_section_start_full (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2211
 qemu#9  qemu_loadvm_state_main (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2395
 qemu#10 qemu_loadvm_state (f=0x5555569dc010) at migration/savevm.c:2467
 qemu#11 process_incoming_migration_co (opaque=0x0) at migration/migration.c:449

Fix this by preserving the state in a saved_guest_offloads field and
pushing offload initialization out to the new post-load hook.
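
A rough sketch of that approach (the field and hook names follow the wording
above; the real patch may differ in detail):

  /* VirtIONet (hw/net/virtio-net.h) grows a shadow field next to
   * curr_guest_offloads: */
  uint64_t saved_guest_offloads;

  /* Device post-load runs before the features are restored, so only
   * record the offloads here... */
  static int virtio_net_post_load_device(void *opaque, int version_id)
  {
      VirtIONet *n = opaque;

      n->saved_guest_offloads = n->curr_guest_offloads;
      return 0;
  }

  /* ...and re-apply them from a virtio-level post-load hook, which runs
   * after virtio_set_features_nocheck() has clobbered curr_guest_offloads: */
  static int virtio_net_post_load_virtio(VirtIODevice *vdev)
  {
      VirtIONet *n = VIRTIO_NET(vdev);

      n->curr_guest_offloads = n->saved_guest_offloads;
      virtio_net_apply_guest_offloads(n);
      return 0;
  }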

Cc: [email protected]
Signed-off-by: Mikhail Sennikovsky <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
daxtens pushed a commit to daxtens/qemu that referenced this pull request Jan 7, 2020
Guests can crash QEMU when writing to the PnP registers:

  $ echo 'writeb 0x800ff042 69' | qemu-system-sparc -M leon3_generic -S -bios /etc/magic -qtest stdio
  [I 1571938309.932255] OPENED
  [R +0.063474] writeb 0x800ff042 69
  Segmentation fault (core dumped)

  (gdb) bt
  #0  0x0000000000000000 in  ()
  ozbenh#1  0x0000555f4bcdf0bc in memory_region_write_with_attrs_accessor (mr=0x555f4d7be8c0, addr=66, value=0x7fff07d00f08, size=1, shift=0, mask=255, attrs=...) at memory.c:503
  ozbenh#2  0x0000555f4bcdf185 in access_with_adjusted_size (addr=66, value=0x7fff07d00f08, size=1, access_size_min=1, access_size_max=4, access_fn=0x555f4bcdeff4 <memory_region_write_with_attrs_accessor>, mr=0x555f4d7be8c0, attrs=...) at memory.c:539
  qemu#3  0x0000555f4bce2243 in memory_region_dispatch_write (mr=0x555f4d7be8c0, addr=66, data=69, op=MO_8, attrs=...) at memory.c:1489
  qemu#4  0x0000555f4bc80b20 in flatview_write_continue (fv=0x555f4d92c400, addr=2148528194, attrs=..., buf=0x7fff07d01120 "E", len=1, addr1=66, l=1, mr=0x555f4d7be8c0) at exec.c:3161
  qemu#5  0x0000555f4bc80c65 in flatview_write (fv=0x555f4d92c400, addr=2148528194, attrs=..., buf=0x7fff07d01120 "E", len=1) at exec.c:3201
  qemu#6  0x0000555f4bc80fb0 in address_space_write (as=0x555f4d7aa460, addr=2148528194, attrs=..., buf=0x7fff07d01120 "E", len=1) at exec.c:3291
  qemu#7  0x0000555f4bc8101d in address_space_rw (as=0x555f4d7aa460, addr=2148528194, attrs=..., buf=0x7fff07d01120 "E", len=1, is_write=true) at exec.c:3301
  qemu#8  0x0000555f4bcdb388 in qtest_process_command (chr=0x555f4c2ed7e0 <qtest_chr>, words=0x555f4db0c5d0) at qtest.c:432

Instead of crashing, log the access as unimplemented.
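
A minimal sketch of that pattern (the handler and ops names are assumed to
match the existing read side):

  #include "qemu/osdep.h"
  #include "qemu/log.h"
  #include "exec/memory.h"

  static void grlib_apb_pnp_write(void *opaque, hwaddr addr,
                                  uint64_t value, unsigned size)
  {
      /* The PnP area is read-only: log and ignore the access instead of
       * calling a NULL write handler. */
      qemu_log_mask(LOG_UNIMP, "%s: unimplemented write of 0x%" PRIx64
                    " to offset 0x%" HWADDR_PRIx "\n",
                    __func__, value, addr);
  }

  static const MemoryRegionOps grlib_apb_pnp_ops = {
      .read       = grlib_apb_pnp_read,   /* existing read handler */
      .write      = grlib_apb_pnp_write,  /* previously left NULL -> SEGV */
      .endianness = DEVICE_BIG_ENDIAN,
  };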

Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: KONRAD Frederic <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>
legoater pushed a commit to open-power/qemu that referenced this pull request Apr 13, 2021
We can't know whether the caller read enough data into the memory pointed
to by ext_hdr to cast it as an ip6_ext_hdr_routing.
Declare rt_hdr on the stack and fill it again from the iovec.

Since we already checked there is enough data in the iovec buffer,
simply add an assert() call to consume the bytes_read variable.
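
A rough shape of the resulting code in _eth_get_rss_ex_dst_addr() (variable
names such as pkt, pkt_frags and rthdr_offset are assumed from the
surrounding function):

  struct ip6_ext_hdr_routing rt_hdr;
  size_t input_size = iov_size(pkt, pkt_frags);
  size_t bytes_read;

  if (input_size < rthdr_offset + sizeof(rt_hdr)) {
      return false;
  }

  /* Copy the whole routing header onto the stack instead of casting the
   * caller's smaller ext_hdr buffer. */
  bytes_read = iov_to_buf(pkt, pkt_frags, rthdr_offset,
                          &rt_hdr, sizeof(rt_hdr));
  assert(bytes_read == sizeof(rt_hdr));   /* size was checked just above */

  if (rt_hdr.rtype == 2 && rt_hdr.segleft == 1) {
      /* extract the RSS destination address as before */
  }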

This fixes a 2-byte buffer overrun in eth_parse_ipv6_hdr() reported
by the QEMU fuzzer:

  $ cat << EOF | ./qemu-system-i386 -M pc-q35-5.0 \
    -accel qtest -monitor none \
    -serial none -nographic -qtest stdio
  outl 0xcf8 0x80001010
  outl 0xcfc 0xe1020000
  outl 0xcf8 0x80001004
  outw 0xcfc 0x7
  write 0x25 0x1 0x86
  write 0x26 0x1 0xdd
  write 0x4f 0x1 0x2b
  write 0xe1020030 0x4 0x190002e1
  write 0xe102003a 0x2 0x0807
  write 0xe1020048 0x4 0x12077cdd
  write 0xe1020400 0x4 0xba077cdd
  write 0xe1020420 0x4 0x190002e1
  write 0xe1020428 0x4 0x3509d807
  write 0xe1020438 0x1 0xe2
  EOF
  =================================================================
  ==2859770==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffdef904902 at pc 0x561ceefa78de bp 0x7ffdef904820 sp 0x7ffdef904818
  READ of size 1 at 0x7ffdef904902 thread T0
      #0 0x561ceefa78dd in _eth_get_rss_ex_dst_addr net/eth.c:410:17
      ozbenh#1 0x561ceefa41fb in eth_parse_ipv6_hdr net/eth.c:532:17
      ozbenh#2 0x561cef7de639 in net_tx_pkt_parse_headers hw/net/net_tx_pkt.c:228:14
      qemu#3 0x561cef7dbef4 in net_tx_pkt_parse hw/net/net_tx_pkt.c:273:9
      qemu#4 0x561ceec29f22 in e1000e_process_tx_desc hw/net/e1000e_core.c:730:29
      qemu#5 0x561ceec28eac in e1000e_start_xmit hw/net/e1000e_core.c:927:9
      qemu#6 0x561ceec1baab in e1000e_set_tdt hw/net/e1000e_core.c:2444:9
      qemu#7 0x561ceebf300e in e1000e_core_write hw/net/e1000e_core.c:3256:9
      qemu#8 0x561cef3cd4cd in e1000e_mmio_write hw/net/e1000e.c:110:5

  Address 0x7ffdef904902 is located in stack of thread T0 at offset 34 in frame
      #0 0x561ceefa320f in eth_parse_ipv6_hdr net/eth.c:486

    This frame has 1 object(s):
      [32, 34) 'ext_hdr' (line 487) <== Memory access at offset 34 overflows this variable
  HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
        (longjmp and C++ exceptions *are* supported)
  SUMMARY: AddressSanitizer: stack-buffer-overflow net/eth.c:410:17 in _eth_get_rss_ex_dst_addr
  Shadow bytes around the buggy address:
    0x10003df188d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df188e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df188f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18910: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
  =>0x10003df18920:[02]f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18930: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x10003df18970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Stack left redzone:      f1
    Stack right redzone:     f3
  ==2859770==ABORTING

Add the corresponding qtest case with the fuzzer reproducer.

FWIW GCC 11 similarly reported:

  net/eth.c: In function 'eth_parse_ipv6_hdr':
  net/eth.c:410:15: error: array subscript 'struct ip6_ext_hdr_routing[0]' is partly outside array bounds of 'struct ip6_ext_hdr[1]' [-Werror=array-bounds]
    410 |     if ((rthdr->rtype == 2) && (rthdr->segleft == 1)) {
        |          ~~~~~^~~~~~~
  net/eth.c:485:24: note: while referencing 'ext_hdr'
    485 |     struct ip6_ext_hdr ext_hdr;
        |                        ^~~~~~~
  net/eth.c:410:38: error: array subscript 'struct ip6_ext_hdr_routing[0]' is partly outside array bounds of 'struct ip6_ext_hdr[1]' [-Werror=array-bounds]
    410 |     if ((rthdr->rtype == 2) && (rthdr->segleft == 1)) {
        |                                 ~~~~~^~~~~~~~~
  net/eth.c:485:24: note: while referencing 'ext_hdr'
    485 |     struct ip6_ext_hdr ext_hdr;
        |                        ^~~~~~~

Cc: [email protected]
Buglink: https://bugs.launchpad.net/qemu/+bug/1879531
Reported-by: Alexander Bulekov <[email protected]>
Reported-by: Miroslav Rezanina <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Reviewed-by: Miroslav Rezanina <[email protected]>
Fixes: eb70002 ("net_pkt: Extend packet abstraction as required by e1000e functionality")
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
legoater pushed a commit to open-power/qemu that referenced this pull request Apr 13, 2021
Incoming enabled bitmaps are busy, because we do
bdrv_dirty_bitmap_create_successor() for them. But disabled bitmaps
being migrated are not marked busy, so the user can remove them during the
incoming migration. Then we may crash in cancel_incoming_locked() when we
try to remove a bitmap that was already removed by the user, like this:

 #0  qemu_mutex_lock_impl (mutex=0x5593d88c50d1, file=0x559680554b20
   "../block/dirty-bitmap.c", line=64) at ../util/qemu-thread-posix.c:77
 ozbenh#1  bdrv_dirty_bitmaps_lock (bs=0x5593d88c0ee9)
   at ../block/dirty-bitmap.c:64
 ozbenh#2  bdrv_release_dirty_bitmap (bitmap=0x5596810e9570)
   at ../block/dirty-bitmap.c:362
 qemu#3  cancel_incoming_locked (s=0x559680be8208 <dbm_state+40>)
   at ../migration/block-dirty-bitmap.c:918
 qemu#4  dirty_bitmap_load (f=0x559681d02b10, opaque=0x559680be81e0
   <dbm_state>, version_id=1) at ../migration/block-dirty-bitmap.c:1194
 qemu#5  vmstate_load (f=0x559681d02b10, se=0x559680fb5810)
   at ../migration/savevm.c:908
 qemu#6  qemu_loadvm_section_part_end (f=0x559681d02b10,
   mis=0x559680fb4a30) at ../migration/savevm.c:2473
 qemu#7  qemu_loadvm_state_main (f=0x559681d02b10, mis=0x559680fb4a30)
   at ../migration/savevm.c:2626
 qemu#8  postcopy_ram_listen_thread (opaque=0x0)
   at ../migration/savevm.c:1871
 qemu#9  qemu_thread_start (args=0x5596817ccd10)
   at ../util/qemu-thread-posix.c:521
 qemu#10 start_thread () at /lib64/libpthread.so.0
 qemu#11 clone () at /lib64/libc.so.6

Note the bs pointer taken from the bitmap: it's clearly misaligned. That's
because this is a use-after-free; the bitmap has already been freed.

So, let's make disabled bitmaps (being migrated) busy during incoming
migration.
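
bdrv_dirty_bitmap_set_busy() is the existing helper for that; a rough sketch
of both sides, with the surrounding structure and the flag name assumed:

  /* dirty_bitmap_load_start(): enabled bitmaps are already busy via
   * bdrv_dirty_bitmap_create_successor(); mark disabled ones busy too. */
  if (!(flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED)) {
      bdrv_dirty_bitmap_set_busy(bitmap, true);
  }

  /* ...and drop the busy flag again once the incoming migration for this
   * bitmap completes or is cancelled: */
  bdrv_dirty_bitmap_set_busy(bitmap, false);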

Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>
Signed-off-by: Stefan Hajnoczi <[email protected]>
Message-Id: <[email protected]>
legoater pushed a commit to open-power/qemu that referenced this pull request Apr 13, 2021
When building with --enable-sanitizers we get:

  Direct leak of 32 byte(s) in 2 object(s) allocated from:
      #0 0x5618479ec7cf in malloc (qemu-system-aarch64+0x233b7cf)
      ozbenh#1 0x7f675745f958 in g_malloc (/lib64/libglib-2.0.so.0+0x58958)
      ozbenh#2 0x561847f02ca2 in usb_packet_init hw/usb/core.c:531:5
      qemu#3 0x561848df4df4 in usb_ehci_init hw/usb/hcd-ehci.c:2575:5
      qemu#4 0x561847c119ac in ehci_sysbus_init hw/usb/hcd-ehci-sysbus.c:73:5
      qemu#5 0x56184a5bdab8 in object_init_with_type qom/object.c:375:9
      qemu#6 0x56184a5bd955 in object_init_with_type qom/object.c:371:9
      qemu#7 0x56184a5a2bda in object_initialize_with_type qom/object.c:517:5
      qemu#8 0x56184a5a24d5 in object_initialize qom/object.c:536:5
      qemu#9 0x56184a5a2f6c in object_initialize_child_with_propsv qom/object.c:566:5
      qemu#10 0x56184a5a2e60 in object_initialize_child_with_props qom/object.c:549:10
      qemu#11 0x56184a5a3a1e in object_initialize_child_internal qom/object.c:603:5
      qemu#12 0x561849542d18 in npcm7xx_init hw/arm/npcm7xx.c:427:5

Similarly to commit d710e1e ("usb: ehci: fix memory leak in
ehci"), fix by calling usb_ehci_finalize() to free the USBPacket.

Fixes: 7341ea0
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Gerd Hoffmann <[email protected]>
legoater pushed a commit to open-power/qemu that referenced this pull request Apr 13, 2021
When building with --enable-sanitizers we get:

  Direct leak of 16 byte(s) in 1 object(s) allocated from:
      #0 0x5618479ec7cf in malloc (qemu-system-aarch64+0x233b7cf)
      ozbenh#1 0x7f675745f958 in g_malloc (/lib64/libglib-2.0.so.0+0x58958)
      ozbenh#2 0x561847c2dcc9 in xlnx_dp_init hw/display/xlnx_dp.c:1259:5
      qemu#3 0x56184a5bdab8 in object_init_with_type qom/object.c:375:9
      qemu#4 0x56184a5a2bda in object_initialize_with_type qom/object.c:517:5
      qemu#5 0x56184a5a24d5 in object_initialize qom/object.c:536:5
      qemu#6 0x56184a5a2f6c in object_initialize_child_with_propsv qom/object.c:566:5
      qemu#7 0x56184a5a2e60 in object_initialize_child_with_props qom/object.c:549:10
      qemu#8 0x56184a5a3a1e in object_initialize_child_internal qom/object.c:603:5
      qemu#9 0x5618495aa431 in xlnx_zynqmp_init hw/arm/xlnx-zynqmp.c:273:5

The RX/TX FIFOs are created in xlnx_dp_init(); add xlnx_dp_finalize()
to destroy them.
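
A sketch of that finalize hook (the FIFO field names are assumed from
xlnx_dp_init()):

  static void xlnx_dp_finalize(Object *obj)
  {
      XlnxDPState *s = XLNX_DP(obj);

      /* Free the FIFOs created with fifo8_create() in xlnx_dp_init(). */
      fifo8_destroy(&s->rx_fifo);
      fifo8_destroy(&s->tx_fifo);
  }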

Fixes: 58ac482 ("introduce xlnx-dp")
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Reviewed-by: Alistair Francis <[email protected]>
Message-id: [email protected]
Signed-off-by: Peter Maydell <[email protected]>
legoater pushed a commit to open-power/qemu that referenced this pull request Apr 13, 2021
g_hash_table_add() always retains ownership of the pointer passed in as
the key. Its return status merely indicates whether the added entry was
new or replaced an existing entry. Thus the key must never be freed after
this method returns.
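
Reduced to its essence, the broken pattern and its fix look like this
(names are illustrative):

  gchar *name = g_strdup(module_name);

  /* Wrong: the table keeps 'name' whether the entry was new or replaced
   * an existing one, so freeing it here leaves a dangling key behind. */
  if (!g_hash_table_add(loaded_modules, name)) {
      g_free(name);   /* later lookups read freed memory (see ASAN below) */
  }

  /* Right: hand ownership to the table unconditionally and only use the
   * return value to decide whether the module was already loaded. */
  if (!g_hash_table_add(loaded_modules, g_strdup(module_name))) {
      return;   /* already loaded */
  }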

Spotted by ASAN:

==2407186==ERROR: AddressSanitizer: heap-use-after-free on address 0x6020003ac4f0 at pc 0x7ffff766659c bp 0x7fffffffd1d0 sp 0x7fffffffc980
READ of size 1 at 0x6020003ac4f0 thread T0
    #0 0x7ffff766659b  (/lib64/libasan.so.6+0x8a59b)
    ozbenh#1 0x7ffff6bfa843 in g_str_equal ../glib/ghash.c:2303
    ozbenh#2 0x7ffff6bf8167 in g_hash_table_lookup_node ../glib/ghash.c:493
    qemu#3 0x7ffff6bf9b78 in g_hash_table_insert_internal ../glib/ghash.c:1598
    qemu#4 0x7ffff6bf9c32 in g_hash_table_add ../glib/ghash.c:1689
    qemu#5 0x5555596caad4 in module_load_one ../util/module.c:233
    qemu#6 0x5555596ca949 in module_load_one ../util/module.c:225
    qemu#7 0x5555596ca949 in module_load_one ../util/module.c:225
    qemu#8 0x5555596cbdf4 in module_load_qom_all ../util/module.c:349

Typical C bug...

Fixes: 9062912 ("module: use g_hash_table_add()")
Cc: [email protected]
Signed-off-by: Marc-André Lureau <[email protected]>
Reviewed-by: Daniel P. Berrangé <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
legoater pushed a commit to open-power/qemu that referenced this pull request Jul 28, 2021
modinfo runs the preprocessor and therefore needs all generated input files
to be there.  The "depends" clause does not work in Meson 0.55.3, so for
now use "input".

Part #2: Update the rule for target-specific modules too.

Signed-off-by: Gerd Hoffmann <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
legoater pushed a commit to open-power/qemu that referenced this pull request Jul 28, 2021
…m' into staging

Bugfixes.

# gpg: Signature made Sat 24 Jul 2021 07:11:18 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "[email protected]"
# gpg: Good signature from "Paolo Bonzini <[email protected]>" [full]
# gpg:                 aka "Paolo Bonzini <[email protected]>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini-gitlab/tags/for-upstream:
  qom: use correct field name when getting/setting alias properties
  qapi: introduce forwarding visitor
  gitlab: only let pages be published from default branch
  MAINTAINERS: Add memory_mapping.h and memory_mapping.c to "Memory API"
  MAINTAINERS: Add Peter Xu and myself as co-maintainer of "Memory API"
  MAINTAINERS: Replace Eduardo as "Host Memory Backends" maintainer
  i386: do not call cpudef-only models functions for max, host, base
  target/i386: Added consistency checks for CR3
  meson: fix dependencies for modinfo #2

Signed-off-by: Peter Maydell <[email protected]>
legoater pushed a commit to open-power/qemu that referenced this pull request Dec 17, 2021
When trying to use the pc-dimm device on a non-NUMA machine, we get:

  $ qemu-system-arm -M none -cpu max -S \
      -object memory-backend-file,id=mem1,size=1M,mem-path=/tmp/1m \
      -device pc-dimm,id=dimm1,memdev=mem1
  Segmentation fault (core dumped)

  (gdb) bt
  #0  pc_dimm_realize (dev=0x555556da3e90, errp=0x7fffffffcd10) at hw/mem/pc-dimm.c:184
  ozbenh#1  0x0000555555fe1f8f in device_set_realized (obj=0x555556da3e90, value=true, errp=0x7fffffffce18) at hw/core/qdev.c:531
  ozbenh#2  0x0000555555feb4a9 in property_set_bool (obj=0x555556da3e90, v=0x555556e54420, name=0x5555563c3c41 "realized", opaque=0x555556a704f0, errp=0x7fffffffce18) at qom/object.c:2257

To avoid that crash, restrict the pc-dimm NUMA check to machines
supporting NUMA, and do not allow the use of the 'node' property on
non-NUMA machines.
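
A sketch of the realize-time guard this implies (error wording and the exact
structure are illustrative):

  MachineState *ms = MACHINE(qdev_get_machine());

  if (!ms->numa_state) {
      /* Non-NUMA machine: only reject an explicit 'node', and skip the
       * range check that dereferenced ms->numa_state. */
      if (dimm->node) {
          error_setg(errp, "machine doesn't support NUMA, "
                     "the 'node' property must not be set");
          return;
      }
  } else if (dimm->node >= ms->numa_state->num_nodes) {
      error_setg(errp, "'node' property value %" PRIu32
                 " exceeds the number of NUMA nodes %d",
                 dimm->node, ms->numa_state->num_nodes);
      return;
  }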

Suggested-by: Igor Mammedov <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
legoater pushed a commit to open-power/qemu that referenced this pull request Dec 17, 2021
Without the previous commit, when running 'make check-qtest-i386'
with QEMU configured with '--enable-sanitizers' we get:

  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==287878==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000344
  ==287878==The signal is caused by a WRITE memory access.
  ==287878==Hint: address points to the zero page.
      #0 0x564b2e5bac27 in blk_inc_in_flight block/block-backend.c:1346:5
      ozbenh#1 0x564b2e5bb228 in blk_pwritev_part block/block-backend.c:1317:5
      ozbenh#2 0x564b2e5bcd57 in blk_pwrite block/block-backend.c:1498:11
      qemu#3 0x564b2ca1cdd3 in fdctrl_write_data hw/block/fdc.c:2221:17
      qemu#4 0x564b2ca1b2f7 in fdctrl_write hw/block/fdc.c:829:9
      qemu#5 0x564b2dc49503 in portio_write softmmu/ioport.c:201:9

Add the reproducer for CVE-2021-20196.

Suggested-by: Alexander Bulekov <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Signed-off-by: Philippe Mathieu-Daudé <[email protected]>
Message-id: [email protected]
Signed-off-by: John Snow <[email protected]>