Commit 8c4ef20

David Marchand authored and elmarco committed
docs: update ivshmem device spec
Add some notes on the parts needed to use ivshmem devices: more specifically, explain the purpose of an ivshmem server and the basic concept to use the ivshmem devices in guests. Move some parts of the documentation and re-organise it.

Signed-off-by: David Marchand <[email protected]>
Reviewed-by: Claudio Fontana <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: Marc-André Lureau <[email protected]>
1 parent 1e21feb commit 8c4ef20

1 file changed

+93
-31
lines changed


docs/specs/ivshmem_device_spec.txt

Lines changed: 93 additions & 31 deletions
@@ -2,30 +2,103 @@
22
Device Specification for Inter-VM shared memory device
33
------------------------------------------------------
44

5-
The Inter-VM shared memory device is designed to share a region of memory to
6-
userspace in multiple virtual guests. The memory region does not belong to any
7-
guest, but is a POSIX memory object on the host. Optionally, the device may
8-
support sending interrupts to other guests sharing the same memory region.
5+
The Inter-VM shared memory device is designed to share a memory region (created
6+
on the host via the POSIX shared memory API) between multiple QEMU processes
7+
running different guests. In order for all guests to be able to pick up the
8+
shared memory area, it is modeled by QEMU as a PCI device exposing said memory
9+
to the guest as a PCI BAR.
10+
The memory region does not belong to any guest, but is a POSIX memory object on
11+
the host. The host can access this shared memory if needed.
12+
13+
The device also provides an optional communication mechanism between guests
14+
sharing the same memory object. More details about that in the section 'Guest to
15+
guest communication'.
916

1017

1118
The Inter-VM PCI device
1219
-----------------------
1320

14-
*BARs*
21+
From the VM point of view, the ivshmem PCI device supports three BARs.
22+
23+
- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
24+
not used.
25+
- BAR1 is used for MSI-X when it is enabled in the device.
26+
- BAR2 is used to access the shared memory object.
27+
28+
It is your choice how to use the device but you must choose between two
29+
behaviors:
30+
31+
- basically, if you only need the shared memory part, you will map BAR2.
32+
This way, you have access to the shared memory in guest and can use it as you
33+
see fit (memnic, for example, uses it in userland
34+
http://dpdk.org/browse/memnic).
35+
36+
- BAR0 and BAR1 are used to implement an optional communication mechanism
37+
through interrupts in the guests. If you need an event mechanism between the
38+
guests accessing the shared memory, you will most likely want to write a
39+
kernel driver that will handle interrupts. See details in the section 'Guest
40+
to guest communication'.
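
For the shared-memory-only case in the first bullet above, mapping BAR2 from
guest userspace could look like the minimal sketch below. It assumes a Linux
guest where the BAR is exposed as a sysfs resource file. The helper name
map_shared_region and the example path are hypothetical and not part of the
spec.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a file-backed region read/write and shared.
 *
 * Assumption: in a Linux guest, `path` would be the sysfs resource file
 * for BAR2 of the ivshmem device, for example
 * /sys/bus/pci/devices/<BDF>/resource2, where <BDF> depends on the guest.
 *
 * Returns the mapping and stores its length in *len, or NULL on error.
 */
void *map_shared_region(const char *path, size_t *len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

The caller is expected to release the mapping with munmap(p, len) once it is
done with the shared memory.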
41+
42+
The behavior is chosen when starting your QEMU processes:
43+
- no communication mechanism needed, the first QEMU to start creates the shared
44+
memory on the host, subsequent QEMU processes will use it.
45+
46+
- communication mechanism needed, an ivshmem server must be started before any
47+
QEMU processes, then each QEMU process connects to the server unix socket.
48+
49+
For more details on the QEMU ivshmem parameters, see qemu-doc documentation.
50+
51+
52+
Guest to guest communication
53+
----------------------------
54+
55+
This section details the communication mechanism between the guests accessing
56+
the ivshmem shared memory.
1557

16-
The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support
17-
registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is
18-
used to map the shared memory object from the host. The size of BAR2 is
19-
specified when the guest is started and must be a power of 2 in size.
58+
*ivshmem server*
2059

21-
*Registers*
60+
This server code is available in qemu.git/contrib/ivshmem-server.
2261

23-
The device currently supports 4 registers of 32-bits each. Registers
24-
are used for synchronization between guests sharing the same memory object when
25-
interrupts are supported (this requires using the shared memory server).
62+
The server must be started on the host before any guest.
63+
It creates a shared memory object then waits for clients to connect on a unix
64+
socket.
2665

27-
The server assigns each VM an ID number and sends this ID number to the QEMU
28-
process when the guest starts.
66+
For each client (QEMU process) that connects to the server:
67+
- the server assigns an ID for this client and sends this ID to it as the first
68+
message,
69+
- the server sends an fd for the shared memory object to this client,
70+
- the server creates a new set of host eventfds associated to the new client and
71+
sends this set to all already connected clients,
72+
- finally, the server sends all the eventfds sets for all clients to the new
73+
client.
74+
75+
The server signals all clients when one of them disconnects.
76+
77+
The client IDs are limited to 16 bits because of the current implementation (see
78+
Doorbell register in 'PCI device registers' subsection). Hence only 65536
79+
clients are supported.
80+
81+
All the file descriptors (fd to the shared memory, eventfds for each client)
82+
are passed to clients using SCM_RIGHTS over the server unix socket.
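
As an illustration of SCM_RIGHTS file descriptor passing, here is a minimal
sketch of a sender and a receiver over a unix socket. The helper names
send_fd and recv_fd and the one-byte dummy payload are assumptions made for
this example. They do not describe the actual wire format used by the
ivshmem server.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send `fd` over the unix socket `sock`, with a one-byte dummy payload
 * (hypothetical format). Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(). Returns the new fd, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new fd number in the receiving process that
refers to the same open file description as the one that was sent.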
83+
84+
Apart from the current ivshmem implementation in QEMU, an ivshmem client has
85+
been provided in qemu.git/contrib/ivshmem-client for debug.
86+
87+
*QEMU as an ivshmem client*
88+
89+
At initialisation, when creating the ivshmem device, QEMU gets its ID from the
90+
server then makes it available through BAR0 IVPosition register for the VM to
91+
use (see 'PCI device registers' subsection).
92+
QEMU then uses the fd to the shared memory to map it to BAR2.
93+
eventfds for all other clients received from the server are stored to implement
94+
BAR0 Doorbell register (see 'PCI device registers' subsection).
95+
Finally, eventfds assigned to this QEMU process are used to send interrupts in
96+
this VM.
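
The eventfd primitive behind this mechanism can be sketched as follows,
assuming a Linux host. The helper names notify_peer and wait_notifications
are hypothetical, and the sketch only shows how an eventfd carries a
notification between two holders of the same descriptor, not how QEMU wires
it to an interrupt.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the holder of `efd` by adding 1 to the eventfd counter.
 * Returns 0 on success, -1 on error. */
int notify_peer(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Block until the eventfd counter is non-zero, then return the number of
 * notifications accumulated since the last read (the counter is reset
 * to zero by the read). Returns -1 on error. */
int64_t wait_notifications(int efd)
{
    uint64_t count;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

With an eventfd created by eventfd(0, 0), several notifications sent before
the reader wakes up are coalesced into a single read that returns their
count.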
97+
98+
*PCI device registers*
99+
100+
From the VM point of view, the ivshmem PCI device supports 4 registers of
101+
32 bits each.
29102

30103
enum ivshmem_registers {
31104
IntrMask = 0,
@@ -49,8 +122,8 @@ bit to 0 and unmasked by setting the first bit to 1.
49122
IVPosition Register: The IVPosition register is read-only and reports the
50123
guest's ID number. The guest IDs are non-negative integers. When using the
51124
server, since the server is a separate process, the VM ID will only be set when
52-
the device is ready (shared memory is received from the server and accessible via
53-
the device). If the device is not ready, the IVPosition will return -1.
125+
the device is ready (shared memory is received from the server and accessible
126+
via the device). If the device is not ready, the IVPosition will return -1.
54127
Applications should ensure that they have a valid VM ID before accessing the
55128
shared memory.
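
A guest-side validity check on the IVPosition value could be sketched as
below. The function name ivposition_to_id is hypothetical, and the 16-bit
upper bound is taken from the client ID limit stated for the ivshmem server.
How the raw 32-bit value is read from BAR0 is left out on purpose.

```c
#include <stdint.h>

/*
 * Convert a raw 32-bit IVPosition read into a guest ID.
 * Returns the ID (0..65535), or -1 if the device is not ready (the
 * register reads as -1, i.e. 0xFFFFFFFF) or the value is out of range.
 */
int32_t ivposition_to_id(uint32_t raw)
{
    if (raw > 0xFFFFu)
        return -1;
    return (int32_t)raw;
}
```

An application would typically poll until this returns a non-negative ID
before touching the shared memory.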
56129

@@ -59,8 +132,8 @@ Doorbell register. The doorbell register is 32-bits, logically divided into
59132
two 16-bit fields. The high 16-bits are the guest ID to interrupt and the low
60133
16-bits are the interrupt vector to trigger. The semantics of the value
61134
written to the doorbell depends on whether the device is using MSI or a regular
62-
pin-based interrupt. In short, MSI uses vectors while regular interrupts set the
63-
status register.
135+
pin-based interrupt. In short, MSI uses vectors while regular interrupts set
136+
the status register.
64137

65138
Regular Interrupts
66139

@@ -71,7 +144,7 @@ interrupt in the destination guest.
71144

72145
Message Signalled Interrupts
73146

74-
A ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
147+
An ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
75148
written to the Doorbell register must be between 0 and the maximum number of
76149
vectors the guest supports. The lower 16 bits written to the doorbell is the
77150
MSI vector that will be raised in the destination guest. The number of MSI
@@ -83,14 +156,3 @@ interrupt itself should be communicated via the shared memory region. Devices
83156
supporting multiple MSI vectors can use different vectors to indicate different
84157
events have occurred. The semantics of interrupt vectors are left to the
85158
user's discretion.
86-
87-
88-
Usage in the Guest
89-
------------------
90-
91-
The shared memory device is intended to be used with the provided UIO driver.
92-
Very little configuration is needed. The guest should map BAR0 to access the
93-
registers (an array of 32-bit ints allows simple writing) and map BAR2 to
94-
access the shared memory region itself. The size of the shared memory region
95-
is specified when the guest (or shared memory server) is started. A guest may
96-
map the whole shared memory region or only part of it.
