#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Legacy migration stream information.

Documentation and record structures for legacy migration, for both libxc
and libxl.
"""

"""
Libxc:

SAVE/RESTORE/MIGRATE PROTOCOL
=============================

The general form of a stream of chunks is a header followed by a
body consisting of a variable number of chunks (terminated by a
chunk with type 0) followed by a trailer.

For a rolling/checkpoint (e.g. Remus) migration, the body and
trailer phases can be repeated until an external event
(e.g. failure) causes the process to terminate and commit to the
most recent complete checkpoint.

HEADER
------

unsigned long        : p2m_size

extended-info (PV-only, optional):

  If the first unsigned long == ~0UL then extended info is present,
  otherwise that unsigned long is part of the p2m. Note that p2m_size
  above does not include the length of the extended info.

  extended-info:

    unsigned long    : signature == ~0UL
    uint32_t         : number of bytes remaining in extended-info

    1 or more extended-info blocks of form:
    char[4]          : block identifier
    uint32_t         : block data size
    bytes            : block data

    defined extended-info blocks:
    "vcpu"           : VCPU context info containing vcpu_guest_context_t.
                       The precise variant of the context structure
                       (e.g. 32 vs 64 bit) is distinguished by
                       the block size.
    "extv"           : Presence indicates use of extended VCPU context in
                       tail, data size is 0.

p2m (PV-only):

  consists of p2m_size bytes comprising an array of xen_pfn_t sized entries.

BODY PHASE - Format A (for live migration or Remus without compression)
----------

A series of chunks with a common header:
  int              : chunk type

If the chunk type is +ve then the chunk contains guest memory data, and the
type contains the number of pages in the batch:

    unsigned long[]  : PFN array, length == number of pages in batch
                       Each entry consists of XEN_DOMCTL_PFINFO_*
                       in bits 31-28 and the PFN number in bits 27-0.
    page data        : PAGE_SIZE bytes for each page marked present in PFN
                       array

If the chunk type is -ve then the chunk consists of one of a number of
metadata types.  See definitions of XC_SAVE_ID_* below.

If the chunk type is 0 then the body phase is complete.


BODY PHASE - Format B (for Remus with compression)
----------

A series of chunks with a common header:
  int              : chunk type

If the chunk type is +ve then the chunk contains an array of PFNs
corresponding to guest memory, and the type contains the number of PFNs in
the batch:

    unsigned long[]  : PFN array, length == number of pages in batch
                       Each entry consists of XEN_DOMCTL_PFINFO_*
                       in bits 31-28 and the PFN number in bits 27-0.

If the chunk type is -ve then the chunk consists of one of a number of
metadata types.  See definitions of XC_SAVE_ID_* below.

If the chunk type is -ve and equals XC_SAVE_ID_COMPRESSED_DATA, then the
chunk consists of compressed page data, in the following format:

    unsigned long        : Size of the compressed chunk to follow
    compressed data      : Variable-length data of the size indicated above.
                           This chunk consists of compressed page data.
                           The number of pages in one chunk depends on
                           the amount of space available in the sender's
                           output buffer.

Format of compressed data:
  compressed_data = <delta>*
  delta           = <marker, run*>
  marker          = (RUNFLAG|SKIPFLAG) bitwise-or RUNLEN [1 byte marker]
  RUNFLAG         = 0
  SKIPFLAG        = 1 << 7
  RUNLEN          = 7-bit unsigned value indicating number of WORDS in the run
  run             = string of bytes of length sizeof(WORD) * RUNLEN

  If marker contains RUNFLAG, then the RUNLEN * sizeof(WORD) bytes of data
  following the marker are copied into the target page at the offset
  indicated by the offset_ptr.
  If marker contains SKIPFLAG, then the offset_ptr is advanced
  by RUNLEN * sizeof(WORD).

If the chunk type is 0 then the body phase is complete.

There can be one or more chunks with type XC_SAVE_ID_COMPRESSED_DATA,
containing compressed pages. The compressed chunks are collated to form
one single compressed chunk for the entire iteration. The number of pages
present in this final compressed chunk will be equal to the total number
of valid PFNs specified by the +ve chunks.

At the sender side, compressed pages are inserted into the output stream
in the same order as they would have been if compression logic were absent.

Until the last iteration, the BODY is sent in Format A, to maintain live
migration compatibility with receivers of older Xen versions.
At the last iteration, if Remus compression is enabled, the sender sends
a trigger, XC_SAVE_ID_ENABLE_COMPRESSION, to tell the receiver to parse the
BODY in Format B from the next iteration onwards.

An example sequence of chunks received in Format B:
    +16                              +ve chunk
    unsigned long[16]                PFN array
    +100                             +ve chunk
    unsigned long[100]               PFN array
    +50                              +ve chunk
    unsigned long[50]                PFN array

    XC_SAVE_ID_COMPRESSED_DATA       TAG
      N                              Length of compressed data
      N bytes of DATA                Decompresses to 166 pages

    XC_SAVE_ID_*                     other xc save chunks
    0                                END BODY TAG

Corner case with checkpoint compression:
    At the sender side, after pausing the domain, dirty pages are usually
  copied out to a temporary buffer. After the domain is resumed,
  compression is done and the compressed chunk(s) are sent, followed by
  other XC_SAVE_ID_* chunks.
    If the temporary buffer gets full while scanning for dirty pages,
  the sender stops buffering dirty pages, compresses the temporary
  buffer and sends the compressed data with XC_SAVE_ID_COMPRESSED_DATA.
  The sender then resumes buffering dirty pages and continues
  scanning for dirty pages.
    For example, assume that the temporary buffer can hold 4096 pages and
  there are 4100 dirty pages. The following is the sequence of chunks
  that the receiver will see:

    +1024                       +ve chunk
    unsigned long[1024]         PFN array
    +1024                       +ve chunk
    unsigned long[1024]         PFN array
    +1024                       +ve chunk
    unsigned long[1024]         PFN array
    +1024                       +ve chunk
    unsigned long[1024]         PFN array

    XC_SAVE_ID_COMPRESSED_DATA  TAG
     N                          Length of compressed data
     N bytes of DATA            Decompresses to 4096 pages

    +4                          +ve chunk
    unsigned long[4]            PFN array

    XC_SAVE_ID_COMPRESSED_DATA  TAG
     M                          Length of compressed data
     M bytes of DATA            Decompresses to 4 pages

    XC_SAVE_ID_*                other xc save chunks
    0                           END BODY TAG

    In other words, XC_SAVE_ID_COMPRESSED_DATA can be interleaved with
  +ve chunks arbitrarily. But at the receiver end, the following condition
  always holds true until the end of BODY PHASE:
   num(PFN entries in +ve chunks) >= num(pages received in compressed form)

TAIL PHASE
----------

Content differs for PV and HVM guests.

HVM TAIL:

 "Magic" pages:
    uint64_t         : I/O req PFN
    uint64_t         : Buffered I/O req PFN
    uint64_t         : Store PFN
 Xen HVM Context:
    uint32_t         : Length of context in bytes
    bytes            : Context data
 Qemu context:
    char[21]         : Signature:
      "QemuDeviceModelRecord" : Read Qemu save data until EOF
      "DeviceModelRecord0002" : uint32_t length field followed by that many
                                bytes of Qemu save data
      "RemusDeviceModelState" : Currently the same as "DeviceModelRecord0002".

PV TAIL:

 Unmapped PFN list   : list of all the PFNs that were not in map at the close
    unsigned int     : Number of unmapped pages
    unsigned long[]  : PFNs of unmapped pages

 VCPU context data   : A series of VCPU records, one per present VCPU
                       Maximum and present map supplied in XC_SAVE_ID_VCPUINFO
    bytes            : VCPU context structure. Size is determined by size
                       provided in extended-info header
    bytes[128]       : Extended VCPU context (present IFF "extv" block
                       present in extended-info header)

 Shared Info Page    : 4096 bytes of shared info page
"""
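
# The delta format described under "BODY PHASE - Format B" above can be
# sketched as follows.  This is a minimal illustration, not the libxc
# implementation: apply_page_deltas() and its parameters are hypothetical,
# sizeof(WORD) is assumed to be 4 bytes, the deltas for a page are assumed to
# cover it exactly, and offset_ptr is assumed to advance past each copied run
# as well as past each skip.
_SKIPFLAG = 1 << 7
_RUNLEN_MASK = _SKIPFLAG - 1


def apply_page_deltas(page, data, pos=0, word_size=4):
    """
    Apply <marker, run*> deltas from data[pos:] to one page.

    'page' is a mutable bytearray holding the previous page contents.
    Returns the position in 'data' just past the last delta consumed.
    """
    data = bytearray(data)  # byte indexing yields ints on Python 2 and 3
    offset = 0

    while offset < len(page):
        if pos >= len(data):
            raise ValueError("Truncated compressed data")

        marker = data[pos]
        pos += 1
        nbytes = (marker & _RUNLEN_MASK) * word_size

        if offset + nbytes > len(page):
            raise ValueError("Delta overruns the end of the page")

        if not marker & _SKIPFLAG:
            # RUNFLAG: copy the run into the page at the current offset
            if pos + nbytes > len(data):
                raise ValueError("Truncated run")
            page[offset:offset + nbytes] = data[pos:pos + nbytes]
            pos += nbytes

        # Both runs and skips move offset_ptr forward
        offset += nbytes

    return pos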

CHUNK_end                       = 0
CHUNK_enable_verify_mode        = -1
CHUNK_vcpu_info                 = -2
CHUNK_hvm_ident_pt              = -3
CHUNK_hvm_vm86_tss              = -4
CHUNK_tmem                      = -5
CHUNK_tmem_extra                = -6
CHUNK_tsc_info                  = -7
CHUNK_hvm_console_pfn           = -8
CHUNK_last_checkpoint           = -9
CHUNK_hvm_acpi_ioports_location = -10
CHUNK_hvm_viridian              = -11
CHUNK_compressed_data           = -12
CHUNK_enable_compression        = -13
CHUNK_hvm_generation_id_addr    = -14
CHUNK_hvm_paging_ring_pfn       = -15
CHUNK_hvm_monitor_ring_pfn      = -16
CHUNK_hvm_sharing_ring_pfn      = -17
CHUNK_toolstack                 = -18
CHUNK_hvm_ioreq_server_pfn      = -19
CHUNK_hvm_nr_ioreq_server_pages = -20

chunk_type_to_str = {
    CHUNK_end                       : "end",
    CHUNK_enable_verify_mode        : "enable_verify_mode",
    CHUNK_vcpu_info                 : "vcpu_info",
    CHUNK_hvm_ident_pt              : "hvm_ident_pt",
    CHUNK_hvm_vm86_tss              : "hvm_vm86_tss",
    CHUNK_tmem                      : "tmem",
    CHUNK_tmem_extra                : "tmem_extra",
    CHUNK_tsc_info                  : "tsc_info",
    CHUNK_hvm_console_pfn           : "hvm_console_pfn",
    CHUNK_last_checkpoint           : "last_checkpoint",
    CHUNK_hvm_acpi_ioports_location : "hvm_acpi_ioports_location",
    CHUNK_hvm_viridian              : "hvm_viridian",
    CHUNK_compressed_data           : "compressed_data",
    CHUNK_enable_compression        : "enable_compression",
    CHUNK_hvm_generation_id_addr    : "hvm_generation_id_addr",
    CHUNK_hvm_paging_ring_pfn       : "hvm_paging_ring_pfn",
    CHUNK_hvm_monitor_ring_pfn      : "hvm_monitor_ring_pfn",
    CHUNK_hvm_sharing_ring_pfn      : "hvm_sharing_ring_pfn",
    CHUNK_toolstack                 : "toolstack",
    CHUNK_hvm_ioreq_server_pfn      : "hvm_ioreq_server_pfn",
    CHUNK_hvm_nr_ioreq_server_pages : "hvm_nr_ioreq_server_pages",
}
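
# A sketch of how a reader might interpret the common chunk header described
# in the body phase documentation above.  read_chunk_type() and
# describe_chunk_type() are hypothetical helpers, not part of any Xen API.
# The chunk type is assumed to be a 4-byte int in the reader's own byte
# order, and the name table is passed in explicitly (chunk_type_to_str would
# be the natural argument).
import struct


def read_chunk_type(stream):
    """Read one chunk type from a binary file-like object"""
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("Stream truncated while reading chunk type")
    return struct.unpack("=i", raw)[0]


def describe_chunk_type(chunk_type, names):
    """Return a human-readable description of a body phase chunk type"""
    if chunk_type > 0:
        # +ve: the type itself is the number of pages in the batch
        return "page batch of %d pages" % (chunk_type, )
    if chunk_type == 0:
        return "end of body"
    # -ve: one of the metadata chunk types
    return names.get(chunk_type, "unknown metadata chunk %d" % (chunk_type, ))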

# Up to 1024 pages (4MB) at a time
MAX_BATCH = 1024

# Maximum #VCPUs currently supported for save/restore
MAX_VCPU_ID = 4095
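
# A sketch of splitting the PFN array of a +ve chunk into its type and PFN
# fields, using the bit layout documented above (type in bits 31-28, PFN in
# bits 27-0).  split_pfn_entries() is a hypothetical helper.  'long_size' is
# the sender's sizeof(unsigned long), which the caller is assumed to know,
# and entries are assumed to be in the reader's own byte order.  The default
# 'max_batch' mirrors MAX_BATCH.
import struct

_PFINFO_SHIFT = 28
_PFN_MASK = (1 << _PFINFO_SHIFT) - 1


def split_pfn_entries(raw, count, long_size=8, max_batch=1024):
    """Return a list of (pfinfo, pfn) tuples for a batch of 'count' pages"""
    if not 0 < count <= max_batch:
        raise ValueError("Bad batch size %d" % (count, ))
    if long_size not in (4, 8):
        raise ValueError("Bad unsigned long size %d" % (long_size, ))
    if len(raw) != count * long_size:
        raise ValueError("PFN array length does not match batch size")

    # "=" selects standard sizes: "L" is 4 bytes and "Q" is 8 bytes
    fmt = "=%d%s" % (count, "Q" if long_size == 8 else "L")

    return [((entry >> _PFINFO_SHIFT) & 0xf, entry & _PFN_MASK)
            for entry in struct.unpack(fmt, raw)]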


"""
Libxl:

Legacy "toolstack" record layout:

Version 1:
  uint32_t version
  QEMU physmap data:
    uint32_t count
    libxl__physmap_info * count

The problem is that libxl__physmap_info was declared as:

struct libxl__physmap_info {
    uint64_t phys_offset;
    uint64_t start_addr;
    uint64_t size;
    uint32_t namelen;
    char name[];
};

This structure has 4 bytes of padding at the end in a 64bit build, so its
size is not the same between 32 and 64bit builds.

Because of the pointer arithmetic used to construct the record, the 'name' was
shifted up to start at the padding, leaving the erroneous 4 bytes at the end
of the name string, after the NUL terminator.

Instead, the information described here has been changed to fit in a new
EMULATOR_XENSTORE_DATA record made of NUL terminated strings.
"""
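
# A sketch of walking one version 1 physmap entry as described above.
# parse_physmap_entry() is a hypothetical helper, not the converter's actual
# code.  It assumes the four fixed fields occupy 28 unpadded bytes in the
# reader's own byte order, that the name is 'namelen' bytes long with any
# trailing NULs stripped, and that the caller passes trailing_pad=4 for a
# stream written by a 64bit toolstack and 0 for a 32bit one.
import struct

_PHYSMAP_FIELDS = "=QQQI"  # phys_offset, start_addr, size, namelen


def parse_physmap_entry(buf, pos=0, trailing_pad=4):
    """Return (entry, next_pos) for the physmap entry starting at buf[pos]"""
    name_start = pos + struct.calcsize(_PHYSMAP_FIELDS)
    if name_start > len(buf):
        raise ValueError("Truncated physmap entry header")

    phys_offset, start_addr, size, namelen = struct.unpack(
        _PHYSMAP_FIELDS, buf[pos:name_start])

    name_end = name_start + namelen
    if name_end + trailing_pad > len(buf):
        raise ValueError("Truncated physmap entry name")

    entry = {
        "phys_offset": phys_offset,
        "start_addr": start_addr,
        "size": size,
        "name": buf[name_start:name_end].rstrip(b"\x00"),
    }

    # Step over the erroneous padding bytes that follow the name
    return entry, name_end + trailing_pad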