#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Legacy migration stream information.

Documentation and record structures for legacy migration, for both libxc
and libxl.
"""

"""
Libxc:

SAVE/RESTORE/MIGRATE PROTOCOL
=============================

The general form of a stream of chunks is a header followed by a
body consisting of a variable number of chunks (terminated by a
chunk with type 0) followed by a trailer.

For a rolling/checkpoint (e.g. Remus) migration, the body and
trailer phases can be repeated until an external event
(e.g. failure) causes the process to terminate and commit to the
most recent complete checkpoint.

HEADER
------

unsigned long        : p2m_size

extended-info (PV-only, optional):

  If the first unsigned long == ~0UL then extended info is present,
  otherwise that unsigned long is part of the p2m. Note that p2m_size
  above does not include the length of the extended info.

  extended-info:

    unsigned long    : signature == ~0UL
    uint32_t         : number of bytes remaining in extended-info

    1 or more extended-info blocks of form:
    char[4]          : block identifier
    uint32_t         : block data size
    bytes            : block data

    defined extended-info blocks:
    "vcpu"           : VCPU context info containing vcpu_guest_context_t.
                       The precise variant of the context structure
                       (e.g. 32 vs 64 bit) is distinguished by
                       the block size.
    "extv"           : Presence indicates use of extended VCPU context in
                       tail, data size is 0.

p2m (PV-only):

  consists of p2m_size bytes comprising an array of xen_pfn_t sized entries.

BODY PHASE - Format A (for live migration or Remus without compression)
----------

A series of chunks with a common header:
  int              : chunk type

If the chunk type is +ve then chunk contains guest memory data, and the
type contains the number of pages in the batch:

    unsigned long[]  : PFN array, length == number of pages in batch
                       Each entry consists of XEN_DOMCTL_PFINFO_*
                       in bits 31-28 and the PFN number in bits 27-0.
    page data        : PAGE_SIZE bytes for each page marked present in PFN
                       array

If the chunk type is -ve then chunk consists of one of a number of
metadata types.  See definitions of XC_SAVE_ID_* below.

If chunk type is 0 then body phase is complete.


BODY PHASE - Format B (for Remus with compression)
----------

A series of chunks with a common header:
  int              : chunk type

If the chunk type is +ve then chunk contains array of PFNs corresponding
to guest memory and type contains the number of PFNs in the batch:

    unsigned long[]  : PFN array, length == number of pages in batch
                       Each entry consists of XEN_DOMCTL_PFINFO_*
                       in bits 31-28 and the PFN number in bits 27-0.

If the chunk type is -ve then chunk consists of one of a number of
metadata types.  See definitions of XC_SAVE_ID_* below.

If the chunk type is -ve and equals XC_SAVE_ID_COMPRESSED_DATA, then the
chunk consists of compressed page data, in the following format:

    unsigned long        : Size of the compressed chunk to follow
    compressed data      : variable length data of size indicated above.
                           This chunk consists of compressed page data.
                           The number of pages in one chunk depends on
                           the amount of space available in the sender's
                           output buffer.

Format of compressed data:
  compressed_data = <deltas>*
  delta           = <marker, run*>
  marker          = (RUNFLAG|SKIPFLAG) bitwise-or RUNLEN [1 byte marker]
  RUNFLAG         = 0
  SKIPFLAG        = 1 << 7
  RUNLEN          = 7-bit unsigned value indicating number of WORDS in the run
  run             = string of bytes of length sizeof(WORD) * RUNLEN

   If marker contains RUNFLAG, then the RUNLEN * sizeof(WORD) bytes of data
   following the marker are copied into the target page at the offset
   indicated by the offset_ptr.
   If marker contains SKIPFLAG, then the offset_ptr is advanced
   by RUNLEN * sizeof(WORD).

If chunk type is 0 then body phase is complete.

There can be one or more chunks with type XC_SAVE_ID_COMPRESSED_DATA,
containing compressed pages. The compressed chunks are collated to form
one single compressed chunk for the entire iteration. The number of pages
present in this final compressed chunk will be equal to the total number
of valid PFNs specified by the +ve chunks.

At the sender side, compressed pages are inserted into the output stream
in the same order as they would have been if compression logic were absent.

Until the last iteration, the BODY is sent in Format A, to maintain live
migration compatibility with receivers of older Xen versions.
At the last iteration, if Remus compression was enabled, the sender sends
a trigger, XC_SAVE_ID_ENABLE_COMPRESSION, to tell the receiver to parse the
BODY in Format B from the next iteration onwards.

An example sequence of chunks received in Format B:
    +16                              +ve chunk
    unsigned long[16]                PFN array
    +100                             +ve chunk
    unsigned long[100]               PFN array
    +50                              +ve chunk
    unsigned long[50]                PFN array

    XC_SAVE_ID_COMPRESSED_DATA       TAG
      N                              Length of compressed data
      N bytes of DATA                Decompresses to 166 pages

    XC_SAVE_ID_*                     other xc save chunks
    0                                END BODY TAG

Corner case with checkpoint compression:
    At sender side, after pausing the domain, dirty pages are usually
    copied out to a temporary buffer. After the domain is resumed,
    compression is done and the compressed chunk(s) are sent, followed by
    other XC_SAVE_ID_* chunks.
    If the temporary buffer gets full while scanning for dirty pages,
    the sender stops buffering of dirty pages, compresses the temporary
    buffer and sends the compressed data with XC_SAVE_ID_COMPRESSED_DATA.
    The sender then resumes the buffering of dirty pages and continues
    scanning for the dirty pages.
    For example, assume that the temporary buffer can hold 4096 pages and
    there are 5000 dirty pages. The following is the sequence of chunks
    that the receiver will see:

    +1024                            +ve chunk
    unsigned long[1024]              PFN array
    +1024                            +ve chunk
    unsigned long[1024]              PFN array
    +1024                            +ve chunk
    unsigned long[1024]              PFN array
    +1024                            +ve chunk
    unsigned long[1024]              PFN array

    XC_SAVE_ID_COMPRESSED_DATA       TAG
     N                               Length of compressed data
     N bytes of DATA                 Decompresses to 4096 pages

    +4                               +ve chunk
    unsigned long[4]                 PFN array

    XC_SAVE_ID_COMPRESSED_DATA       TAG
     M                               Length of compressed data
     M bytes of DATA                 Decompresses to 4 pages

    XC_SAVE_ID_*                     other xc save chunks
    0                                END BODY TAG

    In other words, XC_SAVE_ID_COMPRESSED_DATA can be interleaved with
    +ve chunks arbitrarily.
    But at the receiver end, the following condition
    always holds true until the end of BODY PHASE:
     num(PFN entries +ve chunks) >= num(pages received in compressed form)

TAIL PHASE
----------

Content differs for PV and HVM guests.

HVM TAIL:

 "Magic" pages:
    uint64_t         : I/O req PFN
    uint64_t         : Buffered I/O req PFN
    uint64_t         : Store PFN
 Xen HVM Context:
    uint32_t         : Length of context in bytes
    bytes            : Context data
 Qemu context:
    char[21]         : Signature:
      "QemuDeviceModelRecord" : Read Qemu save data until EOF
      "DeviceModelRecord0002" : uint32_t length field followed by that many
                                bytes of Qemu save data
      "RemusDeviceModelState" : Currently the same as "DeviceModelRecord0002".

PV TAIL:

 Unmapped PFN list   : list of all the PFNs that were not in map at the close
    unsigned int     : Number of unmapped pages
    unsigned long[]  : PFNs of unmapped pages

 VCPU context data   : A series of VCPU records, one per present VCPU
                       Maximum and present map supplied in XC_SAVE_ID_VCPUINFO
    bytes:           : VCPU context structure.
                       Size is determined by size
                       provided in extended-info header
    bytes[128]       : Extended VCPU context (present IFF "extv" block
                       present in extended-info header)

 Shared Info Page    : 4096 bytes of shared info page
"""

CHUNK_end                       = 0
CHUNK_enable_verify_mode        = -1
CHUNK_vcpu_info                 = -2
CHUNK_hvm_ident_pt              = -3
CHUNK_hvm_vm86_tss              = -4
CHUNK_tmem                      = -5
CHUNK_tmem_extra                = -6
CHUNK_tsc_info                  = -7
CHUNK_hvm_console_pfn           = -8
CHUNK_last_checkpoint           = -9
CHUNK_hvm_acpi_ioports_location = -10
CHUNK_hvm_viridian              = -11
CHUNK_compressed_data           = -12
CHUNK_enable_compression        = -13
CHUNK_hvm_generation_id_addr    = -14
CHUNK_hvm_paging_ring_pfn       = -15
CHUNK_hvm_monitor_ring_pfn      = -16
CHUNK_hvm_sharing_ring_pfn      = -17
CHUNK_toolstack                 = -18
CHUNK_hvm_ioreq_server_pfn      = -19
CHUNK_hvm_nr_ioreq_server_pages = -20

chunk_type_to_str = {
    CHUNK_end                       : "end",
    CHUNK_enable_verify_mode        : "enable_verify_mode",
    CHUNK_vcpu_info                 : "vcpu_info",
    CHUNK_hvm_ident_pt              : "hvm_ident_pt",
    CHUNK_hvm_vm86_tss              : "hvm_vm86_tss",
    CHUNK_tmem                      : "tmem",
    CHUNK_tmem_extra                : "tmem_extra",
    CHUNK_tsc_info                  : "tsc_info",
    CHUNK_hvm_console_pfn           : "hvm_console_pfn",
    CHUNK_last_checkpoint           : "last_checkpoint",
    CHUNK_hvm_acpi_ioports_location : "hvm_acpi_ioports_location",
    CHUNK_hvm_viridian              : "hvm_viridian",
    CHUNK_compressed_data           : "compressed_data",
    CHUNK_enable_compression        : "enable_compression",
    CHUNK_hvm_generation_id_addr    : "hvm_generation_id_addr",
    CHUNK_hvm_paging_ring_pfn       : "hvm_paging_ring_pfn",
    CHUNK_hvm_monitor_ring_pfn      : "hvm_monitor_ring_pfn",
    CHUNK_hvm_sharing_ring_pfn      : "hvm_sharing_ring_pfn",
    CHUNK_toolstack                 : "toolstack",
    CHUNK_hvm_ioreq_server_pfn      : "hvm_ioreq_server_pfn",
    CHUNK_hvm_nr_ioreq_server_pages : "hvm_nr_ioreq_server_pages",
}

# Up to 1024 pages (4MB) at a time
MAX_BATCH = 1024

# Maximum #VCPUs currently supported for
# save/restore
MAX_VCPU_ID = 4095


"""
Libxl:

Legacy "toolstack" record layout:

Version 1:
  uint32_t version
  QEMU physmap data:
    uint32_t count
    libxl__physmap_info * count

The problem is that libxl__physmap_info was declared as:

struct libxl__physmap_info {
    uint64_t phys_offset;
    uint64_t start_addr;
    uint64_t size;
    uint32_t namelen;
    char name[];
};

This structure has 4 bytes of padding at the end in a 64-bit build, so its
layout is not the same between 32-bit and 64-bit builds.

Because of the pointer arithmetic used to construct the record, the 'name' was
shifted up to start at the padding, leaving the erroneous 4 bytes at the end
of the name string, after the NUL terminator.

Instead, the information described here has been changed to fit in a new
EMULATOR_XENSTORE_DATA record made of NUL terminated strings.
"""
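

# Illustrative sketch only; not part of the original record definitions.
#
# The two helpers below show how a reader of the libxc stream documented
# above might (a) split a PFN array entry into its XEN_DOMCTL_PFINFO_* type
# and PFN number, and (b) apply the <marker, run*> deltas of BODY PHASE
# Format B to one page.  The names split_pfn_entry and apply_page_deltas and
# the word_size parameter are assumptions made for this example.  The real
# word size is whatever sizeof(WORD) was in the sender, and any per-page
# framing used by the real implementation is not modelled here.

def split_pfn_entry(entry):
    # Type lives in bits 31-28, PFN number in bits 27-0 (see BODY PHASE).
    return (entry >> 28) & 0xf, entry & 0x0fffffff


def apply_page_deltas(page, deltas, word_size=4):
    # page:   bytearray holding the previous page contents, updated in place
    # deltas: byte string of <marker, run*> records for this page
    # Assumption: the offset also advances past a run after it is copied.
    skipflag = 1 << 7
    data = bytearray(deltas)
    pos = 0       # read position within the delta stream
    offset = 0    # write position within the page (the "offset_ptr")

    while pos < len(data):
        marker = data[pos]
        pos += 1
        nbytes = (marker & 0x7f) * word_size

        if marker & skipflag:
            # SKIPFLAG: leave these bytes of the page untouched
            offset += nbytes
            continue

        # RUNFLAG (0): copy the run that follows the marker into the page
        run = data[pos:pos + nbytes]
        if len(run) != nbytes or offset + nbytes > len(page):
            raise ValueError("truncated run or run extends past the page")
        page[offset:offset + nbytes] = run
        pos += nbytes
        offset += nbytes

    return page

# Example use, with a 16-byte stand-in for a page:
#   apply_page_deltas(bytearray(16),
#                     bytearray([0x81, 0x01, 1, 2, 3, 4]), word_size=4)
# skips the first word and overwrites the second word with 1, 2, 3, 4.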