% libxenlight Domain Image Format
% Andrew Cooper <<andrew.cooper3@citrix.com>>
  Wen Congyang <<wency@cn.fujitsu.com>>
  Yang Hongyang <<hongyang.yang@easystack.cn>>
% Revision 2

Introduction
============

For the purposes of this document, `xl` is used as a representation of any
implementer of the `libxl` API. `xl` should be considered completely
interchangeable with alternates, such as `libvirt` or `xenopsd-xl`.

Purpose
-------

The _domain image format_ is the context of a running domain used for
snapshots of a domain or for transferring domains between hosts during
migration.

There are a number of problems with the domain image format used in Xen 4.5
and earlier (the _legacy format_):

* There is no `libxl` context information. `xl` is required to send certain
  pieces of `libxl` context itself.

* The contents of the stream are passed directly through `libxl` to `libxc`.
  The legacy `libxc` format contained some information which belonged at the
  `libxl` level, resulting in an awkward layer violation to return the
  information back to `libxl`.

* The legacy `libxc` format was inextensible, causing inextensibility in the
  legacy `libxl` handling.

This design addresses the above points, allowing for a completely
self-contained, extensible stream with each layer responsible for its own
appropriate information.


Not Yet Included
----------------

The following features are not yet fully specified and will be
included in a future draft.

* ARM


Overview
========

The image format consists of a _Header_, followed by 1 or more _Records_.
Each record consists of a type and length field, followed by any type-specific
data.

\clearpage

Header
======

The header identifies the stream as a `libxl` stream, including the version of
this specification that it complies with.


All fields in this header shall be in _big-endian_ byte order, regardless of
the setting of the endianness bit.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+
    | ident                                           |
    +-----------------------+-------------------------+
    | version               | options                 |
    +-----------------------+-------------------------+

--------------------------------------------------------------------
Field       Description
----------- --------------------------------------------------------
ident       0x4c6962786c466d74 ("LibxlFmt" in ASCII).

version     0x00000002. The version of this specification.

options     bit 0: Endianness. 0 = little-endian, 1 = big-endian.

            bit 1: Legacy Format. If set, this stream was created by
            the legacy conversion tool.

            bits 2-31: Reserved.
--------------------------------------------------------------------

The endianness shall be 0 (little-endian) for images generated on an
i386, x86_64, or arm host.

\clearpage


Record Overview
===============

A record has a record header, type-specific data and trailing padding. If
`body_length` is not a multiple of 8, the body is padded with zeroes to align
the end of the record on an 8 octet boundary.

     0     1     2     3     4     5     6     7 octet
    +-----------------------+-------------------------+
    | type                  | body_length             |
    +-----------+-----------+-------------------------+
    | body...                                         |
    ...
    |           | padding (0 to 7 octets)             |
    +-----------+-------------------------------------+

--------------------------------------------------------------------
Field       Description
----------- -------------------------------------------------------
type        0x00000000: END

            0x00000001: LIBXC_CONTEXT

            0x00000002: EMULATOR_XENSTORE_DATA

            0x00000003: EMULATOR_CONTEXT

            0x00000004: CHECKPOINT_END

            0x00000005: CHECKPOINT_STATE

            0x00000006 - 0x7FFFFFFF: Reserved for future _mandatory_
            records.

            0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
            records.

body_length Length in octets of the record body.

body        Content of the record.

padding     0 to 7 octets of zeros to pad the whole record to a multiple
            of 8 octets.
--------------------------------------------------------------------

\clearpage

Emulator Records
----------------

Several records are specifically for emulators, and have a common sub header.

     0     1     2     3     4     5     6     7 octet
    +------------------------+------------------------+
    | emulator_id            | index                  |
    +------------------------+------------------------+
    | record specific data                            |
    ...
    +-------------------------------------------------+

--------------------------------------------------------------------
Field        Description
------------ ---------------------------------------------------
emulator_id  0x00000000: Unknown (In the case of a legacy stream)

             0x00000001: Qemu Traditional

             0x00000002: Qemu Upstream

             0x00000003 - 0xFFFFFFFF: Reserved for future emulators.

index        Index of this emulator for the domain.
--------------------------------------------------------------------

\clearpage

Records
=======

END
----

An end record marks the end of the image, and shall be the final record
in the stream.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+

The end record contains no fields; its body_length is 0.

LIBXC_CONTEXT
-------------

A libxc context record is a marker, indicating that the stream should be
handed to `xc_domain_restore()`. `libxc` shall be responsible for reading its
own image format from the stream.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+

The libxc context record contains no fields; its body_length is 0[^1].

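
As a rough illustration of how a reader might arrive at this marker, the
Python sketch below reads a single record using the framing from the Record
Overview. The function `read_record` and the constant names are hypothetical
and not part of `libxl`. The sketch assumes that record fields use the byte
order selected by the header's endianness bit (little-endian by default here),
which this document implies but does not state outright.

```python
import io
import struct

# Record type values taken from the Record Overview table.
REC_TYPE_END = 0x00000000
REC_TYPE_LIBXC_CONTEXT = 0x00000001


def read_record(stream, little_endian=True):
    """Read one record and return (type, body), consuming any padding.

    Hypothetical helper: assumes `type` and `body_length` are 32-bit
    fields in the byte order chosen by the header's endianness bit.
    """
    fmt = "<II" if little_endian else ">II"
    hdr = stream.read(8)
    if len(hdr) != 8:
        raise EOFError("truncated record header")
    rec_type, body_length = struct.unpack(fmt, hdr)

    body = stream.read(body_length)
    if len(body) != body_length:
        raise EOFError("truncated record body")

    # 0 to 7 octets of padding bring the record to a multiple of 8 octets.
    padding = -body_length % 8
    if len(stream.read(padding)) != padding:
        raise EOFError("truncated record padding")
    return rec_type, body


# Usage: an EMULATOR_CONTEXT-typed record (type 3) with a 5 octet body and
# 3 octets of padding, followed by a LIBXC_CONTEXT marker.
demo = io.BytesIO(
    struct.pack("<II", 3, 5) + b"hello" + b"\0" * 3
    + struct.pack("<II", REC_TYPE_LIBXC_CONTEXT, 0)
)
first = read_record(demo)
second = read_record(demo)
```

Once `read_record` returns a LIBXC_CONTEXT record, whatever follows in the
stream belongs to `libxc` and is not framed as a `libxl` record, so a reader
built this way would hand the stream over at that point rather than call
`read_record` again.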


[^1]: The sending side cannot calculate ahead of time how much data `libxc`
might write into the stream, especially for live migration where the quantity
of data is partially proportional to the elapsed time.

EMULATOR_XENSTORE_DATA
----------------------

A set of xenstore key/value pairs for a specific emulator associated with the
domain.

     0     1     2     3     4     5     6     7 octet
    +------------------------+------------------------+
    | emulator_id            | index                  |
    +------------------------+------------------------+
    | xenstore key/value data                         |
    ...
    +-------------------------------------------------+

Xenstore key/value data are encoded as a packed sequence of (key, value)
tuples. Each (key, value) tuple is a packed pair of NUL-terminated octet
strings, conforming to xenstore protocol character encoding (keys strictly as
alphanumeric ASCII and `-/_@`, values expected to be human-readable ASCII).

Keys shall be relative to the device model's xenstore tree for the new
domain. At the time of writing, keys are relative to the path

> `/local/domain/$dm_domid/device-model/$domid/`

although this path is free to change moving forward, thus should not be
assumed.

EMULATOR_CONTEXT
----------------

A context blob for a specific emulator associated with the domain.

     0     1     2     3     4     5     6     7 octet
    +------------------------+------------------------+
    | emulator_id            | index                  |
    +------------------------+------------------------+
    | emulator_ctx                                    |
    ...
    +-------------------------------------------------+

The *emulator_ctx* is a binary blob interpreted by the emulator identified by
*emulator_id*. Its format is unspecified.

CHECKPOINT_END
--------------

A checkpoint end record marks the end of a checkpoint in the image.


     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+

The checkpoint end record contains no fields; its body_length is 0.


CHECKPOINT_STATE
----------------

A checkpoint state record contains the control information for a checkpoint.
It is only used by COLO; for more detail, see README.colo.

     0     1     2     3     4     5     6     7 octet
    +------------------------+------------------------+
    | control_id             | padding                |
    +------------------------+------------------------+

--------------------------------------------------------------------
Field        Description
------------ ---------------------------------------------------
control_id   0x00000000: Secondary VM is out of sync, start a new checkpoint
             (Primary -> Secondary)

             0x00000001: Secondary VM is suspended (Secondary -> Primary)

             0x00000002: Secondary VM is ready (Secondary -> Primary)

             0x00000003: Secondary VM is resumed (Secondary -> Primary)

--------------------------------------------------------------------

In the loops below, *CHECKPOINT_NEW*, *CHECKPOINT_SVM_SUSPENDED*,
*CHECKPOINT_SVM_READY* and *CHECKPOINT_SVM_RESUMED* denote control_id values
0x00000000 to 0x00000003 respectively.

In COLO, the primary runs the following loop:

1. Suspend primary vm
    a. Suspend primary vm
    b. Read *CHECKPOINT_SVM_SUSPENDED* sent by secondary
2. Checkpoint
3. Resume primary vm
    a. Read *CHECKPOINT_SVM_READY* from secondary
    b. Resume primary vm
    c. Read *CHECKPOINT_SVM_RESUMED* from secondary
4. Wait for a new checkpoint
    a. Send *CHECKPOINT_NEW* to secondary

The secondary runs the following loop:

1. Resume secondary vm
    a. Send *CHECKPOINT_SVM_READY* to primary
    b. Resume secondary vm
    c. Send *CHECKPOINT_SVM_RESUMED* to primary
2. Wait for a new checkpoint
    a. Read *CHECKPOINT_NEW* from primary
3. Suspend secondary vm
    a. Suspend secondary vm
    b. Send *CHECKPOINT_SVM_SUSPENDED* to primary
4. Checkpoint

Future Extensions
=================

All changes to this specification should bump the revision number in
the title block.

All changes to the header require the header version to be increased.

The format may be extended by adding additional record types.

Extending an existing record type must be done by adding a new record
type. This allows old images with the old record to still be
restored.
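
From the reader's side, these extension rules could be handled roughly as in
the Python sketch below, which checks the header version and sorts record
types into the ranges given in the Record Overview. The names `parse_header`
and `classify` are hypothetical. The sketch also assumes the usual reading of
mandatory and optional, namely that an unrecognised optional record may be
skipped while an unrecognised mandatory record must fail the restore; this
document reserves the two ranges but does not spell out that behaviour.

```python
import struct

IDENT = 0x4C6962786C466D74  # "LibxlFmt" in ASCII
VERSION = 0x00000002        # header version described by this revision
KNOWN_TYPES = range(0x00000000, 0x00000006)  # END .. CHECKPOINT_STATE


def parse_header(data):
    """Decode the 16 octet stream header, which is always big-endian."""
    ident, version, options = struct.unpack(">QII", data[:16])
    if ident != IDENT:
        raise ValueError("not a libxl stream")
    if version != VERSION:
        raise ValueError(f"unsupported header version {version}")
    # Bits 2-31 are reserved; this sketch simply ignores them.
    return {
        "big_endian": bool(options & 0x1),  # bit 0: endianness
        "legacy": bool(options & 0x2),      # bit 1: legacy conversion tool
    }


def classify(rec_type):
    """Return 'known', 'optional' or 'mandatory' for a record type."""
    if rec_type in KNOWN_TYPES:
        return "known"
    if rec_type & 0x80000000:
        # 0x80000000 - 0xFFFFFFFF: assumed safe to skip when unrecognised.
        return "optional"
    # 0x00000006 - 0x7FFFFFFF: assumed fatal when unrecognised.
    return "mandatory"


# Usage: a little-endian, non-legacy stream header.
info = parse_header(b"LibxlFmt" + struct.pack(">II", VERSION, 0))
```

Keeping the classification separate from record parsing means a new record
type only requires extending `KNOWN_TYPES` and adding a handler, which matches
the rule that existing record types are never changed in place.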