# Xen Live Patching Design v1

## Rationale

A mechanism is required to binary patch the running hypervisor with new
opcodes that have come about primarily due to security updates.

This document describes the design of the API that would allow us to
upload binary patches to the hypervisor.

The document is split in four sections:

 * Detailed descriptions of the problem statement.
 * Design of the data structures.
 * Design of the hypercalls.
 * Implementation notes that should be taken into consideration.


## Glossary

 * splice - patch in the binary code with new opcodes
 * trampoline - a jump to a new instruction.
 * payload - telemetries of the old code along with binary blob of the new
   function (if needed).
 * reloc - telemetries contained in the payload to construct proper trampoline.
 * hook - an auxiliary function being called before, during or after payload
   application or revert.
 * quiescing zone - period when all CPUs are lock-step with each other.

## History

The document has undergone various reviews and only covers v1 design.

The end of the document has a section titled `Not Yet Done` which
outlines ideas and design for the future version of this work.

## Multiple ways to patch

The mechanism needs to be flexible to patch the hypervisor in multiple ways
and be as simple as possible. The compiled code is contiguous in memory with
no gaps - so we have no luxury of 'moving' existing code and must either
insert a trampoline to the new code to be executed - or only modify in-place
the code if there is sufficient space. The placement of new code has to be done
by hypervisor and the virtual address for the new code is allocated dynamically.

This implies that the hypervisor must compute the new offsets when splicing
in the new trampoline code. Where the trampoline is added (inside
the function we are patching or just the callers?) is also important.

To lessen the amount of code in hypervisor, the consumer of the API
is responsible for identifying which mechanism to employ and how many locations
to patch. Combinations of modifying in-place code, adding trampoline, etc
have to be supported. The API should allow reading and writing any memory
within the hypervisor virtual address space.

We must also have a mechanism to query what has been applied and a mechanism
to revert it if needed.

## Workflow

The expected workflows of higher-level tools that manage multiple patches
on production machines would be:

 * The first obvious task is loading all available / suggested
   hotpatches when they are available.
 * Whenever new hotpatches are installed, they should be loaded too.
 * One wants to query which modules have been loaded at runtime.
 * If unloading is deemed safe (see unloading below), one may want to
   support a workflow where a specific hotpatch is marked as bad and
   unloaded.

## Patching code

The first mechanism to patch that comes to mind is in-place replacement.
That is, replace the affected code with new code. Unfortunately the x86
ISA is variable size which places limits on how much space we have available
to replace the instructions. That is not a problem if the change is smaller
than the original opcode and we can fill it with nops. Problems will
appear if the replacement code is longer.

The second mechanism is to replace the call or jump to the
old function with the address of the new function.

A third mechanism is to add a jump to the new function at the
start of the old function. N.B. The Xen hypervisor implements the third
mechanism. See `Trampoline (e9 opcode)` section for more details.
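
The jump used by the third mechanism can be illustrated with a little
arithmetic. The following is a minimal sketch, assuming a five-byte x86
`jmp rel32` (the `e9` opcode mentioned above) whose displacement is measured
from the end of the jump instruction. The helper name `encode_jmp_rel32` and
its signature are hypothetical and are not taken from the Xen sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_OPCODE 0xe9 /* x86 "jmp rel32" */
#define JMP_REL32_LEN    5    /* one opcode byte + four displacement bytes */

/*
 * Hypothetical helper (not a Xen function): build the trampoline that would
 * be written at old_addr so that execution continues at new_addr.
 * Returns 0 on success, -1 if new_addr is out of reach of a rel32 jump.
 */
static int encode_jmp_rel32(uint64_t old_addr, uint64_t new_addr,
                            uint8_t insn[JMP_REL32_LEN])
{
    /* The displacement is relative to the address following the jump. */
    int64_t disp = (int64_t)(new_addr - (old_addr + JMP_REL32_LEN));
    uint32_t rel32;

    assert(insn != NULL);

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    rel32 = (uint32_t)(int32_t)disp;

    insn[0] = JMP_REL32_OPCODE;
    /* x86 stores the displacement in little-endian byte order. */
    insn[1] = (uint8_t)(rel32 & 0xff);
    insn[2] = (uint8_t)((rel32 >> 8) & 0xff);
    insn[3] = (uint8_t)((rel32 >> 16) & 0xff);
    insn[4] = (uint8_t)((rel32 >> 24) & 0xff);

    return 0;
}
```

Because the displacement is a signed 32-bit value, the new code has to be
placed within roughly 2 GiB of the old function. The sketch reports an error
instead of truncating when that does not hold.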

### Example of trampoline and in-place splicing

As an example we will assume the hypervisor does not have XSA-132 (see
[domctl/sysctl: don't leak hypervisor stack to toolstacks](https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=4ff3449f0e9d175ceb9551d3f2aecb59273f639d))
and we would like to binary patch the hypervisor with it. The original code
looks as so:

       48 89 e0                  mov    %rsp,%rax
       48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax

while the new patched hypervisor would be:

       48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)
       48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)
       48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)
       48 89 e0                  mov    %rsp,%rax
       48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax

This is inside the `arch_do_domctl`. This new change adds 24 extra
bytes of code (three 8-byte `movq` instructions) which alters all the offsets
inside the function. To alter these offsets and add the extra 24 bytes of code
we might not have enough space in .text to squeeze this in.

As such we could simplify this problem by only patching the site
which calls `arch_do_domctl`:

    do_domctl:
       e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>

with a new address for where the new `arch_do_domctl` would be (this
area would be allocated dynamically).

Astute readers will wonder what we need to do if we were to patch `do_domctl` -
which is not called directly by hypervisor but on behalf of the guests via
the `compat_hypercall_table` and `hypercall_table`. Patching the offset in
`hypercall_table` for `do_domctl`:

    ffff82d08024d490:   79 30
    ffff82d08024d492:   10 80 d0 82 ff ff

with the new address where the new `do_domctl` is possible. The other
place where it is used is in `hvm_hypercall64_table` which would need
to be patched in a similar way. This would require an in-place splicing
of the new virtual address of `arch_do_domctl`.

In summary this example patched the callers of the affected function by

 * Allocating memory for the new code to live in,
 * Changing the virtual address in all the functions which called the old
   code (computing the new offset, patching the callq with a new callq).
 * Changing the function pointer tables with the new virtual address of
   the function (splicing in the new virtual address). Since this table
   resides in the .rodata section we would need to temporarily change the
   page table permissions during this part.

However it has drawbacks - the safety checks which have to make sure
the function is not on the stack - must also check every caller. For some
patches this could mean - if there were a sufficiently large number of
callers - that we would never be able to apply the update.

Having the patching done at predetermined instances where the stacks
are not deep mostly solves this problem.

### Example of different trampoline patching.

An alternative mechanism exists where we can insert a trampoline in the
existing function to be patched to jump directly to the new code. This
lessens the locations to be patched to one but it puts pressure on the
CPU branching logic (I-cache, but it is just one unconditional jump).

For this example we will assume that the hypervisor has not been compiled with
XSA-125 (see
[pre-fill structures for certain HYPERVISOR_xen_version sub-ops](https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=fe2e079f642effb3d24a6e1a7096ef26e691d93e))
which mem-sets a structure in `xen_version` hypercall. This function is not
called **anywhere** in the hypervisor (it is called by the guest) but
referenced in the `compat_hypercall_table` and `hypercall_table` (and
indirectly called from that). Patching the offset in `hypercall_table` for the
old `do_xen_version`:

    ffff82d08024b270 <hypercall_table>:
    ...
    ffff82d08024b2f8:  9e 2f 11 80 d0 82 ff ff

with the new address where the new `do_xen_version` is possible. The other
place where it is used is in `hvm_hypercall64_table` which would need
to be patched in a similar way. This would require an in-place splicing
of the new virtual address of `do_xen_version`.

An alternative solution would be to insert a trampoline in the
old `do_xen_version` function to directly jump to the new `do_xen_version`:

    ffff82d080112f9e do_xen_version:
    ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax
    ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi
    ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 ; do_xen_version+0x534

with:

    ffff82d080112f9e do_xen_version:
    ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]

which would lessen the amount of patching to just one location.

In summary this example patched the affected function to jump to the
new replacement function which required:

 * Allocating memory for the new code to live in,
 * Inserting trampoline with new offset in the old function to point to the
   new function.
 * Optionally we can insert in the old function a trampoline jump to a function
   providing a BUG_ON to catch errant code.

The disadvantage of this is that the unconditional jump incurs a small
I-cache penalty. However the simplicity of the patching and higher chance
of passing safety checks make this a worthwhile option.

This patching has a similar drawback as inline patching - the safety
checks have to make sure the function is not on the stack. However
since we are replacing at a higher level (a full function as opposed
to various offsets within functions) the checks are simpler.

Having the patching done at predetermined instances where the stacks
are not deep mostly solves this problem as well.
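
The apply and revert halves of the trampoline approach can be sketched on an
ordinary byte buffer. This is an illustration only, assuming a five-byte `e9`
jump and a private copy of the overwritten bytes kept for the revert.
`struct patch_site`, `splice_trampoline` and `unsplice_trampoline` are
hypothetical names, and the sketch ignores the quiescing and page-permission
handling that a hypervisor would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRAMPOLINE_LEN 5 /* e9 opcode + 32-bit displacement */

/* Hypothetical bookkeeping for one patched function. */
struct patch_site {
    uint8_t *old_code;            /* Start of the old function. */
    uint8_t undo[TRAMPOLINE_LEN]; /* Bytes overwritten by the trampoline. */
    int applied;                  /* Non-zero while the trampoline is in place. */
};

/* Save the original bytes, then overwrite them with "jmp rel32". */
static void splice_trampoline(struct patch_site *site, int32_t rel32)
{
    uint32_t u = (uint32_t)rel32;

    assert(site != NULL && site->old_code != NULL && !site->applied);

    memcpy(site->undo, site->old_code, TRAMPOLINE_LEN);

    site->old_code[0] = 0xe9;
    site->old_code[1] = (uint8_t)(u & 0xff);         /* little-endian */
    site->old_code[2] = (uint8_t)((u >> 8) & 0xff);
    site->old_code[3] = (uint8_t)((u >> 16) & 0xff);
    site->old_code[4] = (uint8_t)((u >> 24) & 0xff);
    site->applied = 1;
}

/* Put the saved bytes back, restoring the old function. */
static void unsplice_trampoline(struct patch_site *site)
{
    assert(site != NULL && site->old_code != NULL && site->applied);

    memcpy(site->old_code, site->undo, TRAMPOLINE_LEN);
    site->applied = 0;
}
```

Note that the trampoline may end in the middle of an old instruction (the
first `mov` in the listing above is seven bytes long). The leftover bytes are
never executed, since the jump is unconditional, and they are left untouched
by both operations.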

### Security

With this method we can re-write the hypervisor - and as such we **MUST** be
diligent in only allowing certain guests to perform this operation.

Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
of the payload to be certain it came from a trusted source and that its
integrity is intact.

As such the hypercall **MUST** support an XSM policy to limit what the guest
is allowed to invoke. If the system is booted with signature checking the
signature checking will be enforced.

## Design of payload format

The payload **MUST** contain enough data to allow us to apply the update
and also safely reverse it. As such we **MUST** know:

 * The locations in memory to be patched. This can be determined dynamically
   via symbols or via virtual addresses.
 * The new code that will be patched in.

This binary format can be constructed using a custom binary format but
there are severe disadvantages to it:

 * The format might need to be changed and we need a mechanism to accommodate
   that.
 * It has to be platform agnostic.
 * It has to be easily constructed using existing tools.

As such having the payload in an ELF file is the sensible way. We would be
carrying the various sets of structures (and data) in the ELF sections under
different names and with definitions.

Note that every structure has padding. This is added so that the hypervisor
can re-use those fields as it sees fit.

Earlier design attempted to ineptly explain the relations of the ELF sections
to each other without using proper ELF mechanism (sh_info, sh_link, data
structures using Elf types, etc). This design will explain the structures
and how they are used together and not dig in the ELF format - except mention
that the section names should match the structure names.

The Xen Live Patch payload is a relocatable ELF binary.
A typical binary would have:

 * One or more .text sections.
 * Zero or more read-only data sections.
 * Zero or more data sections.
 * Relocations for each of these sections.

It may also have some architecture-specific sections. For example:

 * Alternatives instructions.
 * Bug frames.
 * Exception tables.
 * Relocations for each of these sections.

The Xen Live Patch core code loads the payload as a standard ELF binary, relocates it
and handles the architecture-specific sections as needed. This process is much
like what the Linux kernel module loader does.

The payload contains at least three sections:

 * `.livepatch.funcs` - which is an array of livepatch_func structures.
   and/or any of:
 * `.livepatch.hooks.{preapply,postapply,prerevert,postrevert}`
 * `.livepatch.hooks.{apply,revert}`
   - which are a pointer to a hook function pointer.

 * `.livepatch.xen_depends` - which is an ELF Note that describes what Xen
    build-id the payload depends on. **MUST** have one.
 * `.livepatch.depends` - which is an ELF Note that describes what the payload
    depends on. **MUST** have one.
 * `.note.gnu.build-id` - the build-id of this payload. **MUST** have one.

### .livepatch.funcs

The `.livepatch.funcs` contains an array of livepatch_func structures
which describe the functions to be patched:

    struct livepatch_func {
        const char *name;
        void *new_addr;
        void *old_addr;
        uint32_t new_size;
        uint32_t old_size;
        uint8_t version;
        uint8_t opaque[31];
        /* Added to livepatch payload version 2: */
        uint8_t applied;
        uint8_t _pad[7];
        livepatch_expectation_t expect;
    };

The size of the structure is 104 bytes on 64-bit hypervisors. It will be
92 on 32-bit hypervisors.
The version 2 of the payload adds additional 8 bytes to the structure size.

 * `name` is the symbol name of the old function.
   It is only used if `old_addr` is zero, in which case it is resolved during
   dynamic linking (when the hypervisor loads the payload).
 * `old_addr` is the address of the function to be patched and is filled in at
   payload generation time if hypervisor function address is known. If unknown,
   the value *MUST* be zero and the hypervisor will attempt to resolve the
   address.
 * `new_addr` can either have a non-zero value or be zero.
   * If there is a non-zero value, then it is the address of the function that
     is replacing the old function and the address is recomputed during
     relocation. The value **MUST** be the address of the new function in the
     payload file.
   * If the value is zero, then we are NOPing out `new_size` bytes at the
     `old_addr` location.
 * `old_size` contains the size of the respective `old_addr` function in
   bytes. The value of `old_size` **MUST** not be zero.
 * `new_size` depends on what `new_addr` contains:
   * If `new_addr` contains a non-zero value, then `new_size` has the size of
     the new function (which will replace the one at `old_addr`) in bytes.
   * If the value of `new_addr` is zero then `new_size` determines how many
     instruction bytes to NOP (up to opaque size modulo smallest platform
     instruction - 1 byte x86 and 4 bytes on ARM).
 * `version` indicates version of the generated payload.
 * `opaque` **MUST** be zero.

The version 2 of the payload adds the following fields to the structure:

 * `applied` tracks function's applied/reverted state. It has a boolean type,
   either LIVEPATCH_FUNC_NOT_APPLIED or LIVEPATCH_FUNC_APPLIED.
 * `_pad[7]` adds padding to align to 8 bytes.
 * `expect` is an optional structure containing expected to-be-replaced data
   (mostly for inline asm patching).
   The `expect` structure format is:

    struct livepatch_expectation {
        uint8_t enabled : 1;
        uint8_t len : 5;
        uint8_t rsv : 2;
        uint8_t data[LIVEPATCH_OPAQUE_SIZE]; /* Same size as opaque[] buffer of
                                                struct livepatch_func. This is the
                                                max number of bytes to be patched */
    };
    typedef struct livepatch_expectation livepatch_expectation_t;

 * `enabled` enables the expectation check for the given function.
   The default state is disabled.
 * `len` specifies the number of valid bytes in the `data` array. 5 bits is
   enough to specify values up to 31 (bytes), which covers the whole array.
 * `rsv` reserved bitfields. **MUST** be zero.
 * `data` contains expected bytes of content to be replaced. Same size as
   `opaque` buffer of `struct livepatch_func` (max number of bytes to be
   patched).

The size of the `livepatch_func` array is determined from the ELF section
size.

When applying the patch the hypervisor iterates over each `livepatch_func`
structure and the core code inserts a trampoline at `old_addr` to `new_addr`.
The `new_addr` is altered when the ELF payload is loaded.

When reverting a patch, the hypervisor iterates over each `livepatch_func`
and the core code copies the data from the undo buffer (private internal copy)
to `old_addr`.

The payload may optionally contain the address of hooks to be called right
before being applied and after being reverted (while all CPUs are still in
quiescing zone). These hooks do not have access to payload structure.

 * `.livepatch.hooks.load` - an array of function pointers.
 * `.livepatch.hooks.unload` - an array of function pointers.

The payload may optionally also contain the address of pre- and post- vetoing
hooks to be called before (pre) or after (post) apply and revert payload
actions (while all CPUs are already released from quiescing zone). These hooks
do have access to payload structure.
The pre-apply hook can prevent the payload from being
loaded if a condition encoded in it is not met. Accordingly, the pre-revert
hook can prevent the livepatch from being unloaded if a condition encoded in it
is not met.

 * `.livepatch.hooks.{preapply,postapply}`
 * `.livepatch.hooks.{prerevert,postrevert}`
   - which are a pointer to a single hook function pointer.

Finally, the payload may optionally also contain the address of apply or revert
action hooks to be called instead of the default apply and revert payload
actions (while all CPUs are kept in quiescing zone). These hooks do have access
to payload structure.

 * `.livepatch.hooks.{apply,revert}`
   - which are a pointer to a single hook function pointer.

### Example of .livepatch.funcs

A simple example of what a payload file can be:

    /* MUST be in sync with hypervisor. */
    struct livepatch_func {
        const char *name;
        void *new_addr;
        void *old_addr;
        uint32_t new_size;
        uint32_t old_size;
        uint8_t version;
        uint8_t opaque[31];
        /* Added to livepatch payload version 2: */
        uint8_t applied;
        uint8_t _pad[7];
        livepatch_expectation_t expect;
    };

    /* Our replacement function for xen_extra_version. */
    const char *xen_hello_world(void)
    {
        return "Hello World";
    }

    static const char patch_this_fnc[] = "xen_extra_version";

    struct livepatch_func livepatch_hello_world = {
        .version = LIVEPATCH_PAYLOAD_VERSION,
        .name = patch_this_fnc,
        .new_addr = xen_hello_world,
        .old_addr = (void *)0xffff82d08013963c, /* Extracted from xen-syms. */
        .new_size = 13, /* To be computed by scripts. */
        .old_size = 13, /* -----------""---------------  */
        /* Added to livepatch payload version 2: */
        .expect = { /* All fields to be filled manually */
            .enabled = 1,
            .len = 5,
            .rsv = 0,
            .data = { 0x48, 0x8d, 0x05, 0x33, 0x1C }
        },
    } __attribute__((__section__(".livepatch.funcs")));

Code must be compiled with `-fPIC`.

### Hooks

#### .livepatch.hooks.load and .livepatch.hooks.unload

This section contains an array of function pointers to be executed
before the payload is applied (.livepatch.funcs) or after reverting
the payload. This is useful to prepare data structures that need to
be modified when patching.

Each entry in this array is eight bytes.

The type definitions of the functions are as follows:

    typedef void (*livepatch_loadcall_t)(void);
    typedef void (*livepatch_unloadcall_t)(void);

#### .livepatch.hooks.preapply

This section contains a pointer to a single function pointer to be executed
before the apply action is scheduled (and thereby before CPUs are put into
quiescing zone). This is useful to prevent a payload from being applied when
certain expected conditions aren't met or when mutating actions implemented
in the hook fail or cannot be executed.
This type of hook does have access to the payload structure.

Each entry in this array is eight bytes.

The type definition of the function is as follows:

    typedef int livepatch_precall_t(livepatch_payload_t *arg);

#### .livepatch.hooks.postapply

This section contains a pointer to a single function pointer to be executed
after the apply action has finished and after all CPUs left the quiescing zone.
This is useful to provide an ability to follow up on actions performed by
the preapply hook, especially when module application was successful, or to
be able to undo certain preparation steps of the preapply hook in case of a
failure.
The success/failure error code is provided to the postapply hooks
via the `rc` field of the payload structure.
This type of hook does have access to the payload structure.

Each entry in this array is eight bytes.

The type definition of the function is as follows:

    typedef void livepatch_postcall_t(livepatch_payload_t *arg);

#### .livepatch.hooks.prerevert

This section contains a pointer to a single function pointer to be executed
before the revert action is scheduled (and thereby before CPUs are put into
quiescing zone). This is useful to prevent a payload from being reverted when
certain expected conditions aren't met or when mutating actions implemented
in the hook fail or cannot be executed.
This type of hook does have access to the payload structure.

Each entry in this array is eight bytes.

The type definition of the function is as follows:

    typedef int livepatch_precall_t(livepatch_payload_t *arg);

#### .livepatch.hooks.postrevert

This section contains a pointer to a single function pointer to be executed
after the revert action has finished and after all CPUs left the quiescing zone.
This is useful to provide an ability to perform cleanup of all previously
executed mutating actions in order to restore the original system state from
before the current payload application. The success/failure error code is
provided to the postrevert hook via the `rc` field of the payload structure.
This type of hook does have access to the payload structure.

Each entry in this array is eight bytes.

The type definition of the function is as follows:

    typedef void livepatch_postcall_t(livepatch_payload_t *arg);

#### .livepatch.hooks.apply and .livepatch.hooks.revert

This section contains a pointer to a single function pointer to be executed
instead of a default apply (or revert) action function.
This is useful to
replace or augment default behavior of the apply (or revert) action that
requires all CPUs to be in the quiescing zone.
This type of hook does have access to the payload structure.

Each entry in this array is eight bytes.

The type definition of the function is as follows:

    typedef int livepatch_actioncall_t(livepatch_payload_t *arg);

### .livepatch.xen_depends, .livepatch.depends and .note.gnu.build-id

To support dependency checking and safe loading (to load the
appropriate payload against the right hypervisor) there is a need
to embed a build-id dependency.

This is done by the payload containing sections `.livepatch.xen_depends`
and `.livepatch.depends` which follow the format of an ELF Note.
The contents of these (name, and description) are specific to the linker
utilized to build the hypervisor and payload.

If GNU linker is used then the name is `GNU` and the description
is a NT_GNU_BUILD_ID type ID. The description can be an SHA1
checksum, MD5 checksum or any unique value.

The size of these structures varies with the `--build-id` linker option.

There are two kinds of build-id dependencies:

 * Xen build-id dependency (.livepatch.xen_depends section)
 * previous payload build-id dependency (.livepatch.depends section)

See "Live patch interdependencies" for more information.

## Hypercalls

We will employ the sub operations of the system management hypercall (sysctl).
There are to be four sub-operations:

 * upload the payloads.
 * listing of payloads summary uploaded and their state.
 * getting a particular payload summary and its state.
 * command to apply, delete, or revert the payload.

Most of the actions are asynchronous, therefore the caller is responsible
for verifying that the action has been applied properly by retrieving the
summary of the payload and verifying that there are no error codes associated
with it.

We **MUST** make some of them asynchronous due to the nature of patching:
it requires every physical CPU to be lock-step with each other.
The patching mechanism, while an implementation detail, is not a short
operation and as such the design **MUST** assume it will be a long-running
operation.

The sub-operations will spell out how preemption is to be handled (if at all).

Furthermore it is possible to have multiple different payloads for the same
function. As such a unique name per payload has to be visible to allow proper
manipulation.

The hypercall is part of the `xen_sysctl`. The top level structure contains
one uint32_t to determine the sub-operations and one padding field which
*MUST* always be zero.

    struct xen_sysctl_livepatch_op {
        uint32_t cmd;                   /* IN: XEN_SYSCTL_LIVEPATCH_*. */
        uint32_t pad;                   /* IN: Always zero. */
        union {
              ... see below ...
            } u;
    };

while the rest of the hypercall-specific structures are part of this structure.

### Basic type: struct xen_livepatch_name

Most of the hypercalls employ a shared structure called `struct xen_livepatch_name`
which contains:

 * `name` - pointer where the string for the name is located.
 * `size` - the size of the string
 * `pad` - padding - to be zero.

The structure is as follows:

    /*
     * Uniquely identifies the payload.  Should be human readable.
     * Includes the NUL terminator
     */
    #define XEN_LIVEPATCH_NAME_SIZE 128
    struct xen_livepatch_name {
        XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */
        uint16_t size;                          /* IN, size of name. May be up to
                                                   XEN_LIVEPATCH_NAME_SIZE. */
        uint16_t pad[3];                        /* IN: MUST be zero. */
    };

### XEN_SYSCTL_LIVEPATCH_UPLOAD (0)

Upload a payload to the hypervisor. The payload is verified
against basic checks and if there are any issues the proper return code
will be returned. The payload is not applied at this time - that is
controlled by *XEN_SYSCTL_LIVEPATCH_ACTION*.

The caller provides:

 * A `struct xen_livepatch_name` called `name` which has the unique name.
 * `size` the size of the ELF payload (in bytes).
 * `payload` the virtual address of where the ELF payload is.

The `name` could be a UUID that stays fixed forever for a given
payload. It can be embedded into the ELF payload at creation time
and extracted by tools.

The return value is zero if the payload was successfully uploaded.
Otherwise an -XEN_EXX return value is provided. Duplicate `name` values are
not supported.

The `payload` is the ELF payload as mentioned in the `Design of payload format`
section.

The structure is as follows:

    struct xen_sysctl_livepatch_upload {
        xen_livepatch_name_t name;          /* IN, name of the patch. */
        uint64_t size;                      /* IN, size of the ELF file. */
        XEN_GUEST_HANDLE_64(uint8) payload; /* IN: ELF file. */
    };

### XEN_SYSCTL_LIVEPATCH_GET (1)

Retrieve the status of a specific payload. The caller provides:

 * A `struct xen_livepatch_name` called `name` which has the unique name.
 * A `struct xen_livepatch_status` structure. The member values will
   be over-written upon completion.

Upon completion the `struct xen_livepatch_status` is updated.

 * `status` - indicates the current status of the payload:
   * *LIVEPATCH_STATUS_CHECKED* (1) loaded and the ELF payload safety checks passed.
   * *LIVEPATCH_STATUS_APPLIED* (2) loaded, checked, and applied.
   *  No other value is possible.
 * `rc` - -XEN_EXX type errors encountered while performing the last
   LIVEPATCH_ACTION_* operation.
   The normal values can be zero or -XEN_EAGAIN which
   respectively mean: success or operation in progress. Other values
   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
   have changed.

The return value of the hypercall is zero on success and -XEN_EXX on failure.
(Note that the `rc` value can be different from the return value, as in
rc = -XEN_EAGAIN and return value can be 0).

For example, supposing there is a payload:

    status: LIVEPATCH_STATUS_CHECKED
    rc: 0

We apply an action - LIVEPATCH_ACTION_REVERT - to revert it (which won't work
as we have not even applied it). Afterwards we will have:

    status: LIVEPATCH_STATUS_CHECKED
    rc: -XEN_EINVAL

It has failed but it remains loaded.

This operation is synchronous and does not require preemption.

The structure is as follows:

    struct xen_livepatch_status {
    #define LIVEPATCH_STATUS_CHECKED      1
    #define LIVEPATCH_STATUS_APPLIED      2
        uint32_t state;                 /* OUT: LIVEPATCH_STATE_*. */
        int32_t rc;                     /* OUT: 0 if no error, otherwise -XEN_EXX. */
    };

    struct xen_sysctl_livepatch_get {
        xen_livepatch_name_t name;      /* IN, the name of the payload. */
        xen_livepatch_status_t status;  /* IN/OUT: status of the payload. */
    };

### XEN_SYSCTL_LIVEPATCH_LIST (2)

Retrieve an array of abbreviated status, names and metadata of payloads that are
loaded in the hypervisor.

The caller provides:

 * `version`. Version of the payload. Caller should re-use the field provided by
    the hypervisor. If the value differs the data is stale.
 * `idx` Index iterator. The index into the hypervisor's payload count. It is
    recommended that on first invocation zero be used so that `nr` (which the
    hypervisor will update with the remaining payload count) be provided.
    Also the hypervisor will provide `version` with the most current value,
    calculated total size of all payloads' names and calculated total size of
    all payloads' metadata.
 * `nr` The max number of entries to populate. Can be zero which will result
    in the hypercall being a probing one and return the number of payloads
    (and update the `version`).
 * `pad` - *MUST* be zero.
 * `status` Virtual address of where to write `struct xen_livepatch_status`
   structures. Caller *MUST* allocate up to `nr` of them.
 * `name` - Virtual address of where to write the unique name of the payloads.
   Caller *MUST* allocate enough space to be able to store all received data
   (i.e. total allocated space *MUST* match the `name_total_size` value
   provided by the hypervisor). Individual payload name cannot be longer than
   **XEN_LIVEPATCH_NAME_SIZE** bytes. Note that **XEN_LIVEPATCH_NAME_SIZE**
   includes the NUL terminator.
 * `len` - Virtual address of where to write the length of each unique name
   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
   of sizeof(uint32_t) (4 bytes).
 * `metadata` - Virtual address of where to write the metadata of the payloads.
   Caller *MUST* allocate enough space to be able to store all received data
   (i.e. total allocated space *MUST* match the `metadata_total_size` value
   provided by the hypervisor). Individual payload metadata string can be of
   arbitrary length. The metadata string format is: key=value\\0...key=value\\0.
 * `metadata_len` - Virtual address of where to write the length of each metadata
   string of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST*
   be of sizeof(uint32_t) (4 bytes).
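
The metadata layout described above can be walked with a few lines of C. This
is a sketch under stated assumptions: each payload's metadata is a run of
NUL-terminated `key=value` strings, the per-payload blobs are packed back to
back in the `metadata` buffer, and each `metadata_len` entry counts the
terminators. The helper names `payload_metadata` and `metadata_find` are
hypothetical and not part of any Xen library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return the start of the idx-th payload's metadata,
 * assuming the blobs are packed back to back in the order of metadata_len[].
 */
static const char *payload_metadata(const char *metadata,
                                    const uint32_t *metadata_len,
                                    unsigned int idx)
{
    size_t off = 0;
    unsigned int i;

    assert(metadata != NULL && metadata_len != NULL);

    for (i = 0; i < idx; i++)
        off += metadata_len[i];

    return metadata + off;
}

/*
 * Hypothetical helper: look up key in one payload's metadata blob of len
 * bytes ("key=value\0...key=value\0"). Returns the value, or NULL if the
 * key is absent or the blob is not properly terminated.
 */
static const char *metadata_find(const char *blob, uint32_t len,
                                 const char *key)
{
    size_t klen;
    size_t off = 0;

    assert(blob != NULL && key != NULL);
    klen = strlen(key);

    while (off < len) {
        const char *entry = blob + off;
        const char *nul = memchr(entry, '\0', len - off);
        size_t elen;

        if (nul == NULL)
            return NULL; /* Malformed: the last string is not terminated. */

        elen = (size_t)(nul - entry);
        if (elen > klen && entry[klen] == '=' &&
            memcmp(entry, key, klen) == 0)
            return entry + klen + 1;

        off += elen + 1; /* Skip the string and its NUL terminator. */
    }

    return NULL;
}
```

A toolstack would call `payload_metadata` once per returned entry and then
`metadata_find` for each key it is interested in, never reading past the
length reported for that payload.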

If the hypercall returns a positive number, it is the number (up to `nr`
provided to the hypercall) of the payloads returned, along with `nr` updated
with the number of remaining payloads, `version` updated (it may be the same
across hypercalls - if it varies the data is stale and further calls could
fail), `name_total_size` and `metadata_total_size` containing total sizes of
transferred data for both the arrays.
The `status`, `name`, `len`, `metadata` and `metadata_len` are updated at their
designed index value (`idx`) with the returned value of data.

If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
lowered.

If the hypercall returns a zero value there are no more payloads.

Note that due to the asynchronous nature of hypercalls the control domain might
have added or removed a number of payloads making this information stale. It is
the responsibility of the toolstack to use the `version` field to check
between each invocation. If the version differs it should discard the stale
data and start from scratch. It is OK for the toolstack to use the new
`version` field.

The `struct xen_livepatch_status` structure contains a status of payload which includes:

 * `status` - indicates the current status of the payload:
   * *LIVEPATCH_STATUS_CHECKED* (1) loaded and the ELF payload safety checks passed.
   * *LIVEPATCH_STATUS_APPLIED* (2) loaded, checked, and applied.
   * No other value is possible.
 * `rc` - -XEN_EXX type errors encountered while performing the last
   LIVEPATCH_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
   respectively mean: success or operation in progress. Other values
   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
   have changed.

The structure is as follows:

    struct xen_sysctl_livepatch_list {
        uint32_t version;                       /* OUT: Hypervisor stamps value.
                                                   If varies between calls, we are
                                                   getting stale data. */
        uint32_t idx;                           /* IN: Index into hypervisor list. */
        uint32_t nr;                            /* IN: How many status, names, and len
                                                   should be filled out. Can be zero to get
                                                   amount of payloads and version.
                                                   OUT: How many payloads left. */
        uint32_t pad;                           /* IN: Must be zero. */
        uint32_t name_total_size;               /* OUT: Total size of all transfer names */
        uint32_t metadata_total_size;           /* OUT: Total size of all transfer metadata */
        XEN_GUEST_HANDLE_64(xen_livepatch_status_t) status;  /* OUT. Must have enough
                                                   space allocated for nr of them. */
        XEN_GUEST_HANDLE_64(char) name;         /* OUT: Array of names. Each member
                                                   may have an arbitrary length up to
                                                   XEN_LIVEPATCH_NAME_SIZE bytes. Must have
                                                   nr of them. */
        XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of name's.
                                                   Must have nr of them. */
        XEN_GUEST_HANDLE_64(char) metadata;     /* OUT: Array of metadata strings. Each
                                                   member may have an arbitrary length.
                                                   Must have nr of them. */
        XEN_GUEST_HANDLE_64(uint32) metadata_len;  /* OUT: Array of lengths of metadata's.
                                                   Must have nr of them. */
    };

### XEN_SYSCTL_LIVEPATCH_ACTION (3)

Perform an operation on the payload structure referenced by the `name` field.
The operation request is asynchronous and the status should be retrieved
by using either **XEN_SYSCTL_LIVEPATCH_GET** or **XEN_SYSCTL_LIVEPATCH_LIST** hypercall.

The caller provides:

 * A `struct xen_livepatch_name` `name` containing the unique name.
 * `cmd` The command requested:
   * *LIVEPATCH_ACTION_UNLOAD* (1) Unload the payload.
     Any further hypercalls against the `name` will result in failure unless
     **XEN_SYSCTL_LIVEPATCH_UPLOAD** hypercall is performed with same `name`.
   * *LIVEPATCH_ACTION_REVERT* (2) Revert the payload.
     If the operation takes
     more time than the upper bound of time the `rc` in `xen_livepatch_status`
     retrieved via **XEN_SYSCTL_LIVEPATCH_GET** will be -XEN_EBUSY.
   * *LIVEPATCH_ACTION_APPLY* (3) Apply the payload. If the operation takes
     more time than the upper bound of time the `rc` in `xen_livepatch_status`
     retrieved via **XEN_SYSCTL_LIVEPATCH_GET** will be -XEN_EBUSY.
   * *LIVEPATCH_ACTION_REPLACE* (4) Revert all applied payloads and apply this
     payload. If the operation takes more time than the upper bound of time
     the `rc` in `xen_livepatch_status` retrieved via **XEN_SYSCTL_LIVEPATCH_GET**
     will be -XEN_EBUSY.
 * `time` The upper bound of time (ns) the cmd should take. Zero means to use
   the hypervisor default. If the operation does not succeed within that time
   it goes into an error state.
 * `flags` provides additional parameters for an action:
   * *LIVEPATCH_ACTION_APPLY_NODEPS* (1) Apply action ignores inter-module
     buildid dependency. Checks only if module is built for given hypervisor by
     comparing buildid.
 * `pad` - *MUST* be zero.

The return value will be zero unless the provided fields are incorrect.

The structure is as follows:

    #define LIVEPATCH_ACTION_UNLOAD  1
    #define LIVEPATCH_ACTION_REVERT  2
    #define LIVEPATCH_ACTION_APPLY   3
    #define LIVEPATCH_ACTION_REPLACE 4
    struct xen_sysctl_livepatch_action {
        xen_livepatch_name_t name;  /* IN, name of the patch. */
        uint32_t cmd;               /* IN: LIVEPATCH_ACTION_* */
        uint32_t time;              /* IN: If zero then uses */
                                    /* hypervisor default. */
                                    /* Or upper bound of time (ns) */
                                    /* for operation to take. */
        uint32_t flags;             /* IN: action flags. */
                                    /* Provide additional parameters */
                                    /* for an action. */
        uint32_t pad;               /* IN: Always zero. */
    };


## State diagrams of LIVEPATCH_ACTION commands.

There is a strict ordering in which the commands can be issued.
The LIVEPATCH_ACTION prefix has been dropped to ease reading and
the diagram does not include the LIVEPATCH_STATES:

                  /->\
                  \  /
     UNLOAD <--- CHECK ---> REPLACE|APPLY --> REVERT --\
                    \                                  |
                     \-------------------<-------------/

## State transition table of LIVEPATCH_ACTION commands and LIVEPATCH_STATUS.

Note that:

 - The CHECKED state is the starting one achieved with *XEN_SYSCTL_LIVEPATCH_UPLOAD* hypercall.
 - The REVERT operation on success will automatically move to the CHECKED state.
 - There are two STATES: CHECKED and APPLIED.
 - There are four actions (aka commands): APPLY, REPLACE, REVERT, and UNLOAD.

The state transition table of valid states and action states:

    +---------+---------+--------------------------------+-------+--------+
    | ACTION  | Current | Result                         |   Next STATE:  |
    |         | STATE   |                                |CHECKED|APPLIED |
    +---------+---------+--------------------------------+-------+--------+
    | UNLOAD  | CHECKED | Unload payload. Always works.  |       |        |
    |         |         | No next states.                |       |        |
    +---------+---------+--------------------------------+-------+--------+
    | APPLY   | CHECKED | Apply payload (success).       |       |   x    |
    +---------+---------+--------------------------------+-------+--------+
    | APPLY   | CHECKED | Apply payload (error|timeout)  |   x   |        |
    +---------+---------+--------------------------------+-------+--------+
    | REPLACE | CHECKED | Revert payloads and apply new  |       |   x    |
    |         |         | payload with success.          |       |        |
    +---------+---------+--------------------------------+-------+--------+
    | REPLACE | CHECKED | Revert payloads and apply new  |   x   |        |
    |         |         | payload with error.            |       |        |
    +---------+---------+--------------------------------+-------+--------+
    | REVERT  | APPLIED | Revert payload (success).      |   x   |        |
    +---------+---------+--------------------------------+-------+--------+
    | REVERT  | APPLIED | Revert payload (error|timeout) |       |   x    |
    +---------+---------+--------------------------------+-------+--------+

All the other state transitions are invalid.

## Sequence of events.

The normal sequence of events is to:

 1. *XEN_SYSCTL_LIVEPATCH_UPLOAD* to upload the payload. If there are errors *STOP* here.
 2. *XEN_SYSCTL_LIVEPATCH_GET* to check the `->rc`. If *-XEN_EAGAIN* spin. If zero go to next step.
 3. *XEN_SYSCTL_LIVEPATCH_ACTION* with *LIVEPATCH_ACTION_APPLY* to apply the patch.
 4. *XEN_SYSCTL_LIVEPATCH_GET* to check the `->rc`. If *-XEN_EAGAIN* spin. If zero exit with success.


## Addendum

Implementation quirks should not be discussed in a design document.

However these observations can provide aid when developing against this
document.


### Alternative assembler

Alternative assembler is a mechanism to use different instructions depending
on what the CPU supports. This is done by providing multiple streams of code
that can be patched in - or if the CPU does not support it - padded with
`nop` operations. The alternative assembler macros cause the compiler to
expand the code so that the most generic code is placed inline, and to emit
a special ELF .section header to tag this location. During run-time the
hypervisor can leave the areas alone or patch them with better suited opcodes.

Note that patching functions that copy to or from guest memory requires
support for alternatives. For example this can be due to SMAP
(specifically *stac* and *clac* operations) which is enabled on Broadwell
and later architectures. It may be related to other alternative instructions.

### When to patch

During the discussion on the design two candidates bubbled up where
the call stack for each CPU would be deterministic.
This would
minimize the chance of the patch not being applied due to safety
checks failing. One such safety check is not patching code which
is on the stack, as doing so can lead to corruption.

#### Rendezvous code instead of stop_machine for patching

The hypervisor's time rendezvous code runs synchronously across all CPUs
every second. Using the `stop_machine` to patch can stall the time rendezvous
code and result in NMI. As such having the patching be done at the tail
of rendezvous code should avoid this problem.

However the entrance point for that code is
`do_softirq -> timer_softirq_action -> time_calibration` which ends up calling
`on_selected_cpus` on remote CPUs.

The remote CPUs receive CALL_FUNCTION_VECTOR IPI and execute the
desired function.

#### Before entering the guest code.

Before we call VMXResume we check whether any soft IRQs need to be executed.
This is a good spot because all Xen stacks are effectively empty at
that point.

To rendezvous all the CPUs a barrier with a maximum timeout (which
could be adjusted), combined with forcing all other CPUs through the
hypervisor with IPIs, can be utilized to execute lockstep instructions
on all CPUs.

The approach is similar in concept to `stop_machine` and the time rendezvous
but is time-bound. However the local CPU stack is much shorter and
a lot more deterministic.

This is implemented in the Xen hypervisor.

### Compiling the hypervisor code

Hotpatch generation often requires support for compiling the target
with `-ffunction-sections` / `-fdata-sections`. Changes would have to
be done to the linker scripts to support this.

### Generation of Live Patch ELF payloads

The design of that is not discussed in this document.

This is implemented in a separate tool which lives in a separate
GIT repo.

Currently it resides at git://xenbits.xen.org/livepatch-build-tools.git

### Exception tables and symbol tables growth

We may need support for adapting or augmenting exception tables if
patching such code. Hotpatches may need to bring their own small
exception tables (similar to how Linux modules support this).

If supporting hotpatches that introduce additional exception-locations
is not important, one could also change the exception table in-place
and reorder it afterwards.

As found, almost every patch (XSA) to a non-trivial function requires
additional entries in the exception table and/or the bug frames.

This is implemented in the Xen hypervisor.

### .rodata sections

The patching might require strings to be updated as well. As such we must
also be able to patch the strings as needed. This sounds simple - but the compiler
has a habit of coalescing strings that are the same - which means if we in-place
alter the strings - other users will be inadvertently affected as well.

This is also where pointers to functions live - and we may need to patch this
as well. And switch-style jump tables.

To guard against that we must be prepared to do patching similar to
trampoline patching or in-line depending on the flavour. If we can
do in-line patching we would need to:

 * Alter `.rodata` to be writeable.
 * Inline patch.
 * Alter `.rodata` to be read-only.

If we are doing trampoline patching we would need to:

 * Allocate a new memory location for the string.
 * All locations which use this string will have to be updated to use the
   offset to the string.
 * Mark the region RO when we are done.

The trampoline patching is implemented in the Xen hypervisor.

### .bss and .data sections.

In place patching writable data is not suitable as it is unclear what should be done
depending on the current state of data. As such it should not be attempted.

However, functions which are being patched can bring in changes to strings
(.data or .rodata section changes), or even to .bss sections.

As such the ELF payload can introduce new .rodata, .bss, and .data sections.
Patching in the new function will end up also patching in the new .rodata
section and the new function will reference the new string in the new
.rodata section.

This is implemented in the Xen hypervisor.

### Security

Only the privileged domain should be allowed to do this operation.

### Live patch interdependencies

Live patch interdependencies are tricky.

There are three ways this can be addressed:

 * A single large patch that subsumes and replaces all previous ones.
   Over the life-time of patching the hypervisor this large patch
   grows to accumulate all the code changes.
 * Hotpatch stack - where a mechanism exists that loads the hotpatches
   in the same order they were built in. We would need a build-id
   of the hypervisor to make sure the hot-patches are built against the
   correct build.
 * Payload containing the old code to check against that. That allows
   the hotpatches to be loaded independently (if they don't overlap) - or
   if the old code also contains previously patched code - even if they
   overlap.

The disadvantage of the first large patch is that it can grow over
time and not provide a bisection mechanism to identify faulty patches.

The hot-patch stack puts strict requirements on the order of the patches
being loaded and requires a hypervisor build-id to match against.

The old code allows much more flexibility and an additional guard,
but is more complex to implement.
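
As a rough illustration of the hotpatch stack option, the sketch below checks
whether a payload may be applied on top of the current stack: its dependency
build-id has to match the most recently applied payload, or the hypervisor
itself when nothing has been applied yet. All names here (`lp_build_id`,
`lp_payload`, `lp_stack_may_apply`) and the 20-byte build-id length are
assumptions made for illustration and are not taken from the Xen sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LP_BUILD_ID_LEN 20 /* assumption: SHA-1 sized build-id */

/* Hypothetical types, for illustration only. */
struct lp_build_id {
    unsigned char b[LP_BUILD_ID_LEN];
};

struct lp_payload {
    struct lp_build_id id;      /* build-id of the payload itself */
    struct lp_build_id depends; /* build-id of what it was built on top of */
};

static bool lp_id_eq(const struct lp_build_id *a, const struct lp_build_id *b)
{
    return memcmp(a->b, b->b, LP_BUILD_ID_LEN) == 0;
}

/*
 * `top` is the build-id of the most recently applied payload, or NULL
 * when no payload is applied, in which case the payload must depend on
 * the hypervisor build-id.
 */
static bool lp_stack_may_apply(const struct lp_build_id *hypervisor,
                               const struct lp_build_id *top,
                               const struct lp_payload *p)
{
    const struct lp_build_id *expected = top ? top : hypervisor;

    return lp_id_eq(&p->depends, expected);
}
```

The strict ordering requirement shows up directly: a payload built on top of
another one is rejected until that other payload is at the top of the stack.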

The second option, which requires a build-id of the hypervisor,
is implemented in the Xen hypervisor.

Specifically each payload has three build-id ELF notes:

 * The build-id of the payload itself (generated via --build-id).
 * The build-id of the Xen hypervisor it depends on (extracted from the
   hypervisor during build time).
 * The build-id of the payload it depends on (extracted from the
   previous payload or hypervisor during build time).

This means that every payload depends on the hypervisor build-id and on
the build-id of the previous payload in the stack.
The very first payload depends on the hypervisor build-id only.

# Not Yet Done

This is for further development of live patching.

## TODO Goals

The implementation must also have a mechanism for (in no particular order):

 * Be able to look up in the Xen hypervisor the symbol names of functions from the
   ELF payload. (Either as `symbol` or `symbol`+`offset`).
 * Be able to patch .rodata, .bss, and .data sections.
 * Deal with NMI/MCE checks during patching instead of ignoring them.
 * Further safety checks (blacklist of which functions cannot be patched, check
   the stack, make sure the payload is built with same compiler as hypervisor).
   Specifically we want to make sure that live patching codepaths cannot be patched.
 * NOP out the code sequence if `new_size` is zero.
 * Deal with other relocation types: `R_X86_64_[8,16,32,32S]`, `R_X86_64_PC[8,16,64]`
   in payload file.

### Handle inlined \__LINE__

This problem is related to hotpatch construction
and potentially has influence on the design of the hotpatching
infrastructure in Xen.

For example:

We have file1.c with functions f1 and f2 (in that order). f2 contains a
BUG() (or WARN()) macro and at that point embeds the source line number
into the generated code for f2.

Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
lines to f1 and as a consequence shifts out f2 by two lines. The newly
constructed file1.o will now contain differences in both binary
functions f1 (because we actually changed it with the applied patch) and
f2 (because the contained BUG macro embeds the new line number).

Without additional information, an algorithm comparing file1.o before
and after hotpatch application will determine both functions to be
changed and will have to include both into the binary hotpatch.

Options:

1. Transform source code patches for hotpatches to be line-neutral for
   each chunk. This can be done in almost all cases with either
   reformatting of the source code or by introducing artificial
   preprocessor "#line n" directives to adjust for the introduced
   differences.

   This approach is low-tech and simple. Potentially generated
   backtraces and existing debug information refers to the original
   build and does not reflect hotpatching state except for actually
   hotpatched functions but should be mostly correct.

2. Ignoring the problem and living with artificially large hotpatches
   that unnecessarily patch many functions.

   This approach might lead to some very large hotpatches depending on
   content of specific source file. It may also trigger pulling in
   functions into the hotpatch that cannot reasonably be hotpatched due
   to limitations of a hotpatching framework (init-sections, parts of
   the hotpatching framework itself, ...) and may thereby prevent us
   from patching a specific problem.

   The decision between 1. and 2. can be made on a patch-by-patch
   basis.

3. Introducing an indirection table for storing line numbers and
   treating that specially for binary diffing. Linux may follow
   this approach.

   We might either use this indirection table for runtime use and patch
   that with each hotpatch (similarly to exception tables) or we might
   purely use it when building hotpatches to ignore functions that only
   differ at exactly the location where a line-number is embedded.

For BUG(), WARN(), etc., the line number is embedded into the bug frame, not
the function itself.

Similar considerations are true to a lesser extent for \__FILE__, but it
could be argued that file renaming should be done outside of hotpatches.

## Signature checking requirements.

The signature checking requires that the layout of the data in memory
**MUST** be the same for the signature to be verified. This means that the payload
data layout in ELF format **MUST** match what the hypervisor would be
expecting such that it can properly do signature verification.

The signature is based on all of the payloads contiguously laid out
in memory. The signature is to be appended at the end of the ELF payload
prefixed with the string '`~Module signature appended~\n`', followed by
a signature header then followed by the signature, key identifier, and signer's
name.

Specifically the signature header would be:

    #define PKEY_ALGO_DSA       0
    #define PKEY_ALGO_RSA       1

    #define PKEY_ID_PGP         0 /* OpenPGP generated key ID */
    #define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */

    #define HASH_ALGO_MD4          0
    #define HASH_ALGO_MD5          1
    #define HASH_ALGO_SHA1         2
    #define HASH_ALGO_RIPE_MD_160  3
    #define HASH_ALGO_SHA256       4
    #define HASH_ALGO_SHA384       5
    #define HASH_ALGO_SHA512       6
    #define HASH_ALGO_SHA224       7
    #define HASH_ALGO_RIPE_MD_128  8
    #define HASH_ALGO_RIPE_MD_256  9
    #define HASH_ALGO_RIPE_MD_320 10
    #define HASH_ALGO_WP_256      11
    #define HASH_ALGO_WP_384      12
    #define HASH_ALGO_WP_512      13
    #define HASH_ALGO_TGR_128     14
    #define HASH_ALGO_TGR_160     15
    #define HASH_ALGO_TGR_192     16

    struct elf_payload_signature {
        u8 algo;        /* Public-key crypto algorithm PKEY_ALGO_*. */
        u8 hash;        /* Digest algorithm: HASH_ALGO_*. */
        u8 id_type;     /* Key identifier type PKEY_ID*. */
        u8 signer_len;  /* Length of signer's name */
        u8 key_id_len;  /* Length of key identifier */
        u8 __pad[3];
        __be32 sig_len; /* Length of signature data */
    };

(Note that this has been borrowed from Linux module signature code.)


### .bss and .data sections.

In place patching writable data is not suitable as it is unclear what should be done
depending on the current state of data. As such it should not be attempted.

That said we should provide hook functions so that the existing data
can be changed during payload application.

To guarantee safety we disallow re-applying a payload after it has been
reverted. This is because we cannot guarantee that the state of .bss
and .data will be exactly as it was during loading. Hence the administrator
MUST unload the payload and upload it again to apply it.

There is an exception to this: if the payload only has .livepatch.funcs;
and the .data or .bss sections are of zero length.

### Inline patching

The hypervisor should verify that the in-place patching would fit within
the code or data.

### Trampoline (e9 opcode), x86

The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
we are limited to up to 2GB of virtual address to place the new code
from the old code. That should not be a problem since Xen hypervisor has
a very small footprint.

However if we need - we can always add two trampolines. One at the 2GB
limit that calls the next trampoline.

Please note there is a small limitation for trampolines in
function entries: The target function (+ trailing padding) must be able
to accommodate the trampoline. On x86 with +-2 GB relative jumps,
this means 5 bytes are required which means that `old_size` **MUST** be
at least five bytes if patching in trampoline.

Depending on compiler settings, there are several functions in Xen that
are smaller (without inter-function padding).

    readelf -sW xen-syms | grep " FUNC " | \
        awk '{ if ($3 < 5) print $3, $4, $5, $8 }'

    ...
    3 FUNC LOCAL wbinvd_ipi
    3 FUNC LOCAL shadow_l1_index
    ...

A compile-time check for, e.g., a minimum alignment of functions or a
runtime check that verifies symbol size (+ padding to next symbols) for
that in the hypervisor is advised.

The tool for generating payloads currently does perform a compile-time
check to ensure that the function to be replaced is large enough.

#### Trampoline, ARM

The unconditional branch instruction (for the encoding see the
DDI 0406C.c and DDI 0487A.j Architecture Reference Manuals)
with proper offset is used for an unconditional branch to the new code.
This means that `old_size` **MUST** be at least four bytes if patching
in trampoline.

The instruction offset is limited on ARM32 to +/- 32MB displacement
and on ARM64 to +/- 128MB displacement.

The new code is placed in the 8M - 10M virtual address space while the
Xen code is in 2M - 4M. That gives us enough space.

The hypervisor also checks the displacement during loading of the payload.
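
The size and displacement limits from the trampoline sections can be sketched
as a single check. This is a minimal illustration, not the hypervisor's actual
code: the enum and the function `lp_trampoline_ok` are hypothetical names. The
program-counter bias used for each architecture (end of the 5-byte `jmp` on
x86, instruction address plus 8 on ARM32, instruction address on ARM64) is an
assumption taken from the respective architecture conventions rather than from
this document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical architecture tags, for illustration only. */
enum lp_arch { LP_ARCH_X86_64, LP_ARCH_ARM32, LP_ARCH_ARM64 };

/*
 * Return true if a branch placed at old_addr can reach new_addr and the
 * old function is large enough to hold the trampoline.
 */
static bool lp_trampoline_ok(enum lp_arch arch, uint64_t old_addr,
                             uint64_t new_addr, uint32_t old_size)
{
    int64_t reach, bias, disp;
    uint32_t insn_len;

    switch (arch) {
    case LP_ARCH_X86_64:            /* e9 + rel32: +/- 2GB */
        insn_len = 5; bias = 5; reach = INT64_C(1) << 31;
        break;
    case LP_ARCH_ARM32:             /* B: +/- 32MB */
        insn_len = 4; bias = 8; reach = INT64_C(32) << 20;
        break;
    case LP_ARCH_ARM64:             /* B: +/- 128MB */
        insn_len = 4; bias = 0; reach = INT64_C(128) << 20;
        break;
    default:
        return false;
    }

    /* old_size MUST be able to accommodate the trampoline. */
    if (old_size < insn_len)
        return false;

    disp = (int64_t)(new_addr - old_addr) - bias;

    /* ARM branch targets are word aligned. */
    if (arch != LP_ARCH_X86_64 && (disp & 3))
        return false;

    return disp >= -reach && disp < reach;
}
```

With Xen at 2M and new code at 8M, the ARM cases pass comfortably, while a
3-byte function such as `wbinvd_ipi` is rejected on x86 because it cannot hold
the 5-byte jump.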