# Xen Live Patching Design v1
2
3## Rationale
4
A mechanism is required to binary patch the running hypervisor with new
opcodes, primarily to deliver security updates.

This document describes the design of the API that would allow us to
upload binary patches to the hypervisor.
10
11The document is split in four sections:
12
13 * Detailed descriptions of the problem statement.
14 * Design of the data structures.
15 * Design of the hypercalls.
16 * Implementation notes that should be taken into consideration.
17
18
19## Glossary
20
21 * splice - patch in the binary code with new opcodes
22 * trampoline - a jump to a new instruction.
 * payload - metadata describing the old code along with the binary blob of
   the new function (if needed).
 * reloc - metadata contained in the payload used to construct a proper
   trampoline.
 * hook - an auxiliary function called before, during or after payload
   application or revert.
 * quiescing zone - period when all CPUs are in lock-step with each other.
29
30## History
31
The document has undergone various reviews and only covers the v1 design.
33
34The end of the document has a section titled `Not Yet Done` which
35outlines ideas and design for the future version of this work.
36
37## Multiple ways to patch
38
39The mechanism needs to be flexible to patch the hypervisor in multiple ways
40and be as simple as possible. The compiled code is contiguous in memory with
41no gaps - so we have no luxury of 'moving' existing code and must either
42insert a trampoline to the new code to be executed - or only modify in-place
43the code if there is sufficient space. The placement of new code has to be done
44by hypervisor and the virtual address for the new code is allocated dynamically.
45
46This implies that the hypervisor must compute the new offsets when splicing
47in the new trampoline code. Where the trampoline is added (inside
48the function we are patching or just the callers?) is also important.
49
To lessen the amount of code in the hypervisor, the consumer of the API
is responsible for identifying which mechanism to employ and how many locations
to patch. Combinations of modifying code in place, adding trampolines, etc.
have to be supported. The API should allow reading and writing any memory
within the hypervisor virtual address space.
55
56We must also have a mechanism to query what has been applied and a mechanism
57to revert it if needed.
58
59## Workflow
60
61The expected workflows of higher-level tools that manage multiple patches
62on production machines would be:
63
64 * The first obvious task is loading all available / suggested
65   hotpatches when they are available.
66 * Whenever new hotpatches are installed, they should be loaded too.
67 * One wants to query which modules have been loaded at runtime.
68 * If unloading is deemed safe (see unloading below), one may want to
69   support a workflow where a specific hotpatch is marked as bad and
70   unloaded.
71
72## Patching code
73
The first mechanism to patch that comes to mind is in-place replacement.
That is, replace the affected code with new code. Unfortunately x86
instructions are of variable length, which limits how much space we have
available to replace the instructions. That is not a problem if the change is
smaller than the original opcode and we can fill it with nops. Problems will
appear if the replacement code is longer.

The second mechanism is to replace the call or jump to the
old function with the address of the new function.
83
84A third mechanism is to add a jump to the new function at the
85start of the old function. N.B. The Xen hypervisor implements the third
86mechanism. See `Trampoline (e9 opcode)` section for more details.
87
88### Example of trampoline and in-place splicing
89
As an example we will assume the hypervisor does not have XSA-132 (see
[domctl/sysctl: don't leak hypervisor stack to toolstacks](https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=4ff3449f0e9d175ceb9551d3f2aecb59273f639d))
and we would like to binary patch the hypervisor with it. The original code
looks like this:
94
95    48 89 e0                  mov    %rsp,%rax
96    48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax
97
98while the new patched hypervisor would be:
99
100    48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)
101    48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)
102    48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)
103    48 89 e0                  mov    %rsp,%rax
104    48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax
105
This is inside `arch_do_domctl`. The change adds 24 extra bytes of code
(three 8-byte `movq` instructions), which alters all the offsets inside the
function. We might not have enough space in .text to adjust these offsets and
squeeze in the extra 24 bytes of code.
110
111As such we could simplify this problem by only patching the site
112which calls `arch_do_domctl`:
113
114    do_domctl:
115    e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>
116
117with a new address for where the new `arch_do_domctl` would be (this
118area would be allocated dynamically).
119
Astute readers will wonder what we need to do if we were to patch `do_domctl`,
which is not called directly by the hypervisor but on behalf of the guests via
the `compat_hypercall_table` and `hypercall_table`. Patching the offset in
`hypercall_table` for `do_domctl`:
124
125    ffff82d08024d490:   79 30
126    ffff82d08024d492:   10 80 d0 82 ff ff
127
128with the new address where the new `do_domctl` is possible. The other
129place where it is used is in `hvm_hypercall64_table` which would need
130to be patched in a similar way. This would require an in-place splicing
131of the new virtual address of `arch_do_domctl`.
132
In summary this example patched the callers of the affected function by:
134
135 * Allocating memory for the new code to live in,
136 * Changing the virtual address in all the functions which called the old
137   code (computing the new offset, patching the callq with a new callq).
138 * Changing the function pointer tables with the new virtual address of
139   the function (splicing in the new virtual address). Since this table
140   resides in the .rodata section we would need to temporarily change the
141   page table permissions during this part.
142
However it has drawbacks: the safety checks, which have to make sure
the function is not on the stack, must also check every caller. For some
patches this could mean, if there were a sufficiently large number of
callers, that we would never be able to apply the update.
147
148Having the patching done at predetermined instances where the stacks
149are not deep mostly solves this problem.
150
151### Example of different trampoline patching.
152
An alternative mechanism exists where we can insert a trampoline in the
existing function to be patched to jump directly to the new code. This
reduces the locations to be patched to one, at the cost of slightly more
pressure on the CPU branching logic and I-cache (though it is just one
unconditional jump).
157
158For this example we will assume that the hypervisor has not been compiled with
159XSA-125 (see
160[pre-fill structures for certain HYPERVISOR_xen_version sub-ops](https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=fe2e079f642effb3d24a6e1a7096ef26e691d93e))
which mem-sets a structure in the `xen_version` hypercall. This function is not
162called **anywhere** in the hypervisor (it is called by the guest) but
163referenced in the `compat_hypercall_table` and `hypercall_table` (and
164indirectly called from that). Patching the offset in `hypercall_table` for the
165old `do_xen_version`:
166
167    ffff82d08024b270 <hypercall_table>:
168    ...
169    ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff
170
171with the new address where the new `do_xen_version` is possible. The other
172place where it is used is in `hvm_hypercall64_table` which would need
173to be patched in a similar way. This would require an in-place splicing
174of the new virtual address of `do_xen_version`.
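
The table slot shown above is simply the 64-bit virtual address of the handler
stored in little-endian byte order. As a minimal sketch (the helper names
`decode_le64` and `encode_le64` are made up for illustration and are not part
of the hypervisor), decoding the eight bytes of the dump yields
`ffff82d080112f9e`, the address at which the old `do_xen_version` starts. An
in-place splice would overwrite those same eight bytes with the encoding of
the new address:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: assemble a 64-bit value from 8 little-endian bytes. */
static uint64_t decode_le64(const uint8_t bytes[8])
{
    uint64_t value = 0;

    for (int i = 7; i >= 0; i--)
        value = (value << 8) | bytes[i];

    return value;
}

/* Hypothetical helper: the inverse, i.e. the bytes a splice would write. */
static void encode_le64(uint8_t bytes[8], uint64_t value)
{
    for (int i = 0; i < 8; i++)
        bytes[i] = (uint8_t)(value >> (8 * i));
}

/* The hypercall_table slot at ffff82d08024b2f8, copied from the dump. */
static const uint8_t do_xen_version_slot[8] = {
    0x9e, 0x2f, 0x11, 0x80, 0xd0, 0x82, 0xff, 0xff
};

static_assert(sizeof(do_xen_version_slot) == sizeof(uint64_t),
              "each table slot holds one 64-bit address");
```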
175
An alternative solution would be to insert a trampoline in the
177old `do_xen_version` function to directly jump to the new `do_xen_version`:
178
179    ffff82d080112f9e do_xen_version:
180    ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax
181    ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi
182    ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 ; do_xen_version+0x534
183
184with:
185
186    ffff82d080112f9e do_xen_version:
187    ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]
188
189which would lessen the amount of patching to just one location.
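
The `XX YY ZZ QQ` bytes of the trampoline are a signed 32-bit displacement,
measured from the end of the 5-byte `jmpq` and stored in little-endian order.
A minimal sketch of that computation follows; the function name and the error
convention are illustrative assumptions, not the hypervisor's implementation:

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_OPCODE 0xe9
#define JMP_REL32_LEN    5

/*
 * Hypothetical helper: build "jmp rel32" from old_addr to new_addr.
 * Returns 0 on success, -1 if the target is out of rel32 range.
 */
static int build_jmp_rel32(uint8_t insn[JMP_REL32_LEN],
                           uint64_t old_addr, uint64_t new_addr)
{
    /* The displacement is relative to the instruction after the jump. */
    int64_t disp = (int64_t)(new_addr - (old_addr + JMP_REL32_LEN));

    assert(insn);

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    insn[0] = JMP_REL32_OPCODE;
    /* Emit the bytes explicitly so host endianness does not matter. */
    for (int i = 0; i < 4; i++)
        insn[1 + i] = (uint8_t)((uint32_t)(int32_t)disp >> (8 * i));

    return 0;
}
```

Because the displacement is only 32 bits wide, this form of trampoline can
only reach new code placed within roughly 2 GiB of the function being patched.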
190
191In summary this example patched the affected function to jump to the
192new replacement function which required:
193
194 * Allocating memory for the new code to live in,
195 * Inserting trampoline with new offset in the old function to point to the
196   new function.
 * Optionally we can insert in the old function a trampoline jump to a function
   providing a BUG_ON to catch errant code.

The disadvantage of this is that the unconditional jump incurs a small
I-cache penalty. However the simplicity of the patching and the higher chance
of passing safety checks make this a worthwhile option.
203
204This patching has a similar drawback as inline patching - the safety
205checks have to make sure the function is not on the stack. However
206since we are replacing at a higher level (a full function as opposed
207to various offsets within functions) the checks are simpler.
208
209Having the patching done at predetermined instances where the stacks
210are not deep mostly solves this problem as well.
211
212### Security
213
214With this method we can re-write the hypervisor - and as such we **MUST** be
215diligent in only allowing certain guests to perform this operation.
216
Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
of the payload to be certain it came from a trusted source and that its
integrity is intact.
220
221As such the hypercall **MUST** support an XSM policy to limit what the guest
222is allowed to invoke. If the system is booted with signature checking the
223signature checking will be enforced.
224
225## Design of payload format
226
227The payload **MUST** contain enough data to allow us to apply the update
228and also safely reverse it. As such we **MUST** know:
229
230 * The locations in memory to be patched. This can be determined dynamically
231   via symbols or via virtual addresses.
232 * The new code that will be patched in.
233
This binary format could be constructed as a custom binary format, but
that has severe disadvantages:

 * The format might need to be changed and we would need a mechanism to
   accommodate that.
 * It has to be platform agnostic.
 * It has to be easily constructed using existing tools.
241
242As such having the payload in an ELF file is the sensible way. We would be
243carrying the various sets of structures (and data) in the ELF sections under
244different names and with definitions.
245
246Note that every structure has padding. This is added so that the hypervisor
247can re-use those fields as it sees fit.
248
An earlier design attempted, ineptly, to explain the relations of the ELF
sections to each other without using the proper ELF mechanisms (sh_info,
sh_link, data structures using Elf types, etc). This design explains the
structures and how they are used together and does not dig into the ELF
format, except to mention that the section names should match the structure
names.
254
255The Xen Live Patch payload is a relocatable ELF binary. A typical binary would have:
256
257 * One or more .text sections.
258 * Zero or more read-only data sections.
259 * Zero or more data sections.
260 * Relocations for each of these sections.
261
262It may also have some architecture-specific sections. For example:
263
264 * Alternatives instructions.
265 * Bug frames.
266 * Exception tables.
267 * Relocations for each of these sections.
268
The Xen Live Patch core code loads the payload as a standard ELF binary, relocates it
and handles the architecture-specific sections as needed. This process is much
like what the Linux kernel module loader does.
272
273The payload contains at least three sections:
274
275 * `.livepatch.funcs` - which is an array of livepatch_func structures.
276   and/or any of:
 * `.livepatch.hooks.{preapply,postapply,prerevert,postrevert}`
278 * `.livepatch.hooks.{apply,revert}`
279   - which are a pointer to a hook function pointer.
280
281 * `.livepatch.xen_depends` - which is an ELF Note that describes what Xen
282    build-id the payload depends on. **MUST** have one.
283 * `.livepatch.depends` - which is an ELF Note that describes what the payload
284    depends on. **MUST** have one.
285 *  `.note.gnu.build-id` - the build-id of this payload. **MUST** have one.
286
287### .livepatch.funcs
288
289The `.livepatch.funcs` contains an array of livepatch_func structures
290which describe the functions to be patched:
291
292    struct livepatch_func {
293        const char *name;
294        void *new_addr;
295        void *old_addr;
296        uint32_t new_size;
297        uint32_t old_size;
298        uint8_t version;
299        uint8_t opaque[31];
300        /* Added to livepatch payload version 2: */
301        uint8_t applied;
302        uint8_t _pad[7];
303        livepatch_expectation_t expect;
304    };
305
306The size of the structure is 104 bytes on 64-bit hypervisors. It will be
30792 on 32-bit hypervisors.
308The version 2 of the payload adds additional 8 bytes to the structure size.
309
 * `name` is the symbol name of the old function. Only used if `old_addr` is
   zero, in which case it is resolved during dynamic linking (when the
   hypervisor loads the payload).
313 * `old_addr` is the address of the function to be patched and is filled in at
314   payload generation time if hypervisor function address is known. If unknown,
315   the value *MUST* be zero and the hypervisor will attempt to resolve the
316   address.
317 * `new_addr` can either have a non-zero value or be zero.
318   * If there is a non-zero value, then it is the address of the function that
319    is replacing the old function and the address is recomputed during
320    relocation.  The value **MUST** be the address of the new function in the
321    payload file.
   * If the value is zero, then we NOP out `new_size` bytes at the `old_addr`
    location.
 * `old_size` contains the size of the `old_addr` function in
    bytes.  The value of `old_size` **MUST** not be zero.
326 * `new_size` depends on what `new_addr` contains:
327   * If `new_addr` contains an non-zero value, then `new_size` has the size of
328    the new function (which will replace the one at `old_addr`) in bytes.
329   * If the value of `new_addr` is zero then `new_size` determines how many
330    instruction bytes to NOP (up to opaque size modulo smallest platform
331    instruction - 1 byte x86 and 4 bytes on ARM).
332 * `version` indicates version of the generated payload.
333 * `opaque` **MUST** be zero.
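
One reading of the NOP size limit above is that `new_size` may not exceed the
size of the `opaque` buffer rounded down to a whole number of minimum-sized
instructions. The sketch below encodes that reading only; it assumes
`LIVEPATCH_OPAQUE_SIZE` is 31 (the size of `opaque[]`), that the NOP length
must be a multiple of the minimum instruction size, and the helper names are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define LIVEPATCH_OPAQUE_SIZE 31 /* Assumed: the size of opaque[] above. */

/* Largest NOP run, in bytes, for a given minimum instruction size. */
static uint32_t max_nop_bytes(uint32_t min_insn_size)
{
    assert(min_insn_size != 0);
    return LIVEPATCH_OPAQUE_SIZE - (LIVEPATCH_OPAQUE_SIZE % min_insn_size);
}

/* Non-zero if new_size is an acceptable NOP length (new_addr == 0 case). */
static int nop_size_ok(uint32_t new_size, uint32_t min_insn_size)
{
    return new_size <= max_nop_bytes(min_insn_size) &&
           new_size % min_insn_size == 0;
}
```

Under these assumptions the limit works out to 31 bytes on x86 (1-byte
instructions) and 28 bytes on ARM (4-byte instructions).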
334
335The version 2 of the payload adds the following fields to the structure:
336
  * `applied` tracks the function's applied/reverted state. It is a boolean-like
    value: either LIVEPATCH_FUNC_NOT_APPLIED or LIVEPATCH_FUNC_APPLIED.
  * `_pad[7]` adds padding to align to 8 bytes.
340  * `expect` is an optional structure containing expected to-be-replaced data
341    (mostly for inline asm patching). The `expect` structure format is:
342
343    struct livepatch_expectation {
344        uint8_t enabled : 1;
345        uint8_t len : 5;
346        uint8_t rsv: 2;
347        uint8_t data[LIVEPATCH_OPAQUE_SIZE]; /* Same size as opaque[] buffer of
348                                            struct livepatch_func. This is the
349                                            max number of bytes to be patched */
350    };
351    typedef struct livepatch_expectation livepatch_expectation_t;
352
    * `enabled` enables the expectation check for the given function.
      Default state is disabled.
    * `len` specifies the number of valid bytes in the `data` array. 5 bits
      can encode values up to 31, which matches the array size.
358    * `rsv` reserved bitfields. **MUST** be zero.
359    * `data` contains expected bytes of content to be replaced. Same size as
360      `opaque` buffer of `struct livepatch_func` (max number of bytes to be
361      patched).
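
To make the role of `expect` concrete, here is a minimal sketch of how such a
check could be carried out before splicing. The function `expectation_met` is
hypothetical, `LIVEPATCH_OPAQUE_SIZE` is assumed to be 31, and rejecting a
non-zero `rsv` is an assumption drawn from the "MUST be zero" rule. The
`uint8_t` bit-fields follow the structure above and rely on compiler support
(GCC and Clang accept them):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LIVEPATCH_OPAQUE_SIZE 31 /* Assumed: same as opaque[] in livepatch_func. */

struct livepatch_expectation {
    uint8_t enabled : 1;
    uint8_t len : 5;
    uint8_t rsv : 2;
    uint8_t data[LIVEPATCH_OPAQUE_SIZE];
};

/*
 * Hypothetical check: returns 1 if patching may proceed, 0 otherwise.
 * 'old_code' points at the bytes currently found at old_addr.
 */
static int expectation_met(const struct livepatch_expectation *exp,
                           const uint8_t *old_code)
{
    assert(exp && old_code);

    if (!exp->enabled)
        return 1;               /* Check disabled: nothing to verify. */
    if (exp->rsv != 0)
        return 0;               /* Reserved bits MUST be zero. */

    /* Compare only the first 'len' bytes against the expected content. */
    return memcmp(old_code, exp->data, exp->len) == 0;
}
```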
362
363The size of the `livepatch_func` array is determined from the ELF section
364size.
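
As a sketch of that computation, the number of entries is the section size
divided by `sizeof(struct livepatch_func)`. The helper `livepatch_func_count`
is hypothetical, as is its choice to reject an empty or oddly sized section.
The compile-time check confirms the 104-byte figure for 64-bit builds,
assuming `LIVEPATCH_OPAQUE_SIZE` is 31:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIVEPATCH_OPAQUE_SIZE 31 /* Assumed: same as opaque[] below. */

struct livepatch_expectation {
    uint8_t enabled : 1;
    uint8_t len : 5;
    uint8_t rsv : 2;
    uint8_t data[LIVEPATCH_OPAQUE_SIZE];
};
typedef struct livepatch_expectation livepatch_expectation_t;

struct livepatch_func {
    const char *name;
    void *new_addr;
    void *old_addr;
    uint32_t new_size;
    uint32_t old_size;
    uint8_t version;
    uint8_t opaque[31];
    uint8_t applied;
    uint8_t _pad[7];
    livepatch_expectation_t expect;
};

#if UINTPTR_MAX == UINT64_MAX
static_assert(sizeof(struct livepatch_func) == 104,
              "expected 104 bytes with 64-bit pointers");
#endif

/*
 * Hypothetical helper: derive the entry count from the section size.
 * Returns 0 and stores the count, or -1 if the size is zero or is not a
 * whole multiple of the structure size.
 */
static int livepatch_func_count(size_t section_size, size_t *count)
{
    assert(count != NULL);

    if (section_size == 0 ||
        section_size % sizeof(struct livepatch_func) != 0)
        return -1;

    *count = section_size / sizeof(struct livepatch_func);
    return 0;
}
```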
365
366When applying the patch the hypervisor iterates over each `livepatch_func`
367structure and the core code inserts a trampoline at `old_addr` to `new_addr`.
368The `new_addr` is altered when the ELF payload is loaded.
369
370When reverting a patch, the hypervisor iterates over each `livepatch_func`
371and the core code copies the data from the undo buffer (private internal copy)
372to `old_addr`.
373
The payload may optionally contain the addresses of hooks to be called right
before it is applied and after it is reverted (while all CPUs are still in the
quiescing zone). These hooks do not have access to the payload structure.
377
378 * `.livepatch.hooks.load` - an array of function pointers.
379 * `.livepatch.hooks.unload` - an array of function pointers.
380
It may optionally also contain the addresses of pre- and post- vetoing hooks to
be called before (pre) or after (post) the apply and revert payload actions
(while all CPUs are outside the quiescing zone). These hooks do have
access to the payload structure. The pre-apply hook can prevent the payload
from being applied if the condition encoded in it is not met. Likewise, the
pre-revert hook can prevent the livepatch from being reverted if the condition
encoded in it is not met.
388
389 * `.livepatch.hooks.{preapply,postapply}`
390 * `.livepatch.hooks.{prerevert,postrevert}`
391   - which are a pointer to a single hook function pointer.
392
393Finally, it optionally may also contain the address of apply or revert action
394hooks to be called instead of the default apply and revert payload actions
395(while all CPUs are kept in quiescing zone). These hooks do have access to
396payload structure.
397
398 * `.livepatch.hooks.{apply,revert}`
399   - which are a pointer to a single hook function pointer.
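
The ordering of the hooks around an apply action can be pictured with the
following sketch. It is a simplified model, not the hypervisor's code: the
payload structure is a hypothetical stand-in (only the `rc` field is taken
from the text), a non-zero return from the pre-apply hook is assumed to mean
"veto", a veto is assumed to skip all later steps, and the load/unload hooks
are left out:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the payload structure. */
typedef struct livepatch_payload {
    int rc;
    int (*preapply)(struct livepatch_payload *);  /* May veto the action. */
    int (*apply)(struct livepatch_payload *);     /* Replaces the default. */
    void (*postapply)(struct livepatch_payload *);
} livepatch_payload_t;

/* Placeholder for the default action (inserting the trampolines). */
static int default_apply(livepatch_payload_t *p)
{
    (void)p;
    return 0;
}

static int run_apply(livepatch_payload_t *p)
{
    assert(p != NULL);

    /* Outside the quiescing zone: the pre-apply hook can veto. */
    if (p->preapply) {
        p->rc = p->preapply(p);
        if (p->rc)
            return p->rc;   /* Assumption: a veto skips everything else. */
    }

    /* CPUs enter the quiescing zone here. */
    p->rc = p->apply ? p->apply(p) : default_apply(p);
    /* CPUs leave the quiescing zone here. */

    /* The post-apply hook sees the outcome through p->rc. */
    if (p->postapply)
        p->postapply(p);

    return p->rc;
}

/* Example hooks used to exercise the flow. */
static int seen_rc = 1;
static int always_veto(livepatch_payload_t *p) { (void)p; return -1; }
static void remember_rc(livepatch_payload_t *p) { seen_rc = p->rc; }
```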
400
401### Example of .livepatch.funcs
402
403A simple example of what a payload file can be:
404
405    /* MUST be in sync with hypervisor. */
406    struct livepatch_func {
407        const char *name;
408        void *new_addr;
409        void *old_addr;
410        uint32_t new_size;
411        uint32_t old_size;
412        uint8_t version;
        uint8_t opaque[31];
414        /* Added to livepatch payload version 2: */
415        uint8_t applied;
416        uint8_t _pad[7];
417        livepatch_expectation_t expect;
418    };
419
420    /* Our replacement function for xen_extra_version. */
421    const char *xen_hello_world(void)
422    {
423        return "Hello World";
424    }
425
    static const char patch_this_fnc[] = "xen_extra_version";
427
428    struct livepatch_func livepatch_hello_world = {
429        .version = LIVEPATCH_PAYLOAD_VERSION,
430        .name = patch_this_fnc,
431        .new_addr = xen_hello_world,
432        .old_addr = (void *)0xffff82d08013963c, /* Extracted from xen-syms. */
        .new_size = 13, /* To be computed by scripts. */
434        .old_size = 13, /* -----------""---------------  */
435        /* Added to livepatch payload version 2: */
436        .expect = { /* All fields to be filled manually */
437            .enabled = 1,
438            .len = 5,
439            .rsv = 0,
440            .data = { 0x48, 0x8d, 0x05, 0x33, 0x1C }
441        },
442    } __attribute__((__section__(".livepatch.funcs")));
443
444Code must be compiled with `-fPIC`.
445
446### Hooks
447
448#### .livepatch.hooks.load and .livepatch.hooks.unload
449
This section contains an array of function pointers to be executed
before the payload is applied (.livepatch.funcs) or after the payload is
reverted. This is useful to prepare data structures that need to
be modified when patching.
454
455Each entry in this array is eight bytes.
456
The type definitions of the functions are as follows:
458
459    typedef void (*livepatch_loadcall_t)(void);
460    typedef void (*livepatch_unloadcall_t)(void);
461
462#### .livepatch.hooks.preapply
463
This section contains a pointer to a single function pointer to be executed
before the apply action is scheduled (and thereby before CPUs are put into the
quiescing zone). This is useful to prevent a payload from being applied when
certain expected conditions aren't met or when mutating actions implemented
in the hook fail or cannot be executed.
This type of hook has access to the payload structure.

Each entry in this array is eight bytes.

The type definition of the function is as follows:
474
475    typedef int livepatch_precall_t(livepatch_payload_t *arg);
476
477#### .livepatch.hooks.postapply
478
This section contains a pointer to a single function pointer to be executed
after the apply action has finished and after all CPUs have left the quiescing
zone. This is useful to follow up on actions performed by
the preapply hook, especially when module application was successful, or to
undo certain preparation steps of the preapply hook in case of a
failure. The success/failure error code is provided to the postapply hook
via the `rc` field of the payload structure.
This type of hook has access to the payload structure.

Each entry in this array is eight bytes.

The type definition of the function is as follows:
491
492    typedef void livepatch_postcall_t(livepatch_payload_t *arg);
493
494#### .livepatch.hooks.prerevert
495
This section contains a pointer to a single function pointer to be executed
before the revert action is scheduled (and thereby before CPUs are put into the
quiescing zone). This is useful to prevent a payload from being reverted when
certain expected conditions aren't met or when mutating actions implemented
in the hook fail or cannot be executed.
This type of hook has access to the payload structure.

Each entry in this array is eight bytes.

The type definition of the function is as follows:
506
507    typedef int livepatch_precall_t(livepatch_payload_t *arg);
508
509#### .livepatch.hooks.postrevert
510
This section contains a pointer to a single function pointer to be executed
after the revert action has finished and after all CPUs have left the quiescing
zone. This is useful to clean up all previously
executed mutating actions in order to restore the original system state from
before the current payload application. The success/failure error code is
provided to the postrevert hook via the `rc` field of the payload structure.
This type of hook has access to the payload structure.

Each entry in this array is eight bytes.

The type definition of the function is as follows:
522
523    typedef void livepatch_postcall_t(livepatch_payload_t *arg);
524
525#### .livepatch.hooks.apply and .livepatch.hooks.revert
526
This section contains a pointer to a single function pointer to be executed
instead of the default apply (or revert) action function. This is useful to
replace or augment the default behavior of the apply (or revert) action that
requires all CPUs to be in the quiescing zone.
This type of hook has access to the payload structure.

Each entry in this array is eight bytes.

The type definition of the function is as follows:
536
537    typedef int livepatch_actioncall_t(livepatch_payload_t *arg);
538
539### .livepatch.xen_depends, .livepatch.depends and .note.gnu.build-id
540
To support dependency checking and safe loading (to load the
appropriate payload against the right hypervisor) there is a need
to embed a build-id dependency.

This is done by the payload containing sections `.livepatch.xen_depends`
and `.livepatch.depends` which follow the format of an ELF Note.
The contents of these (name, and description) are specific to the linker
utilized to build the hypervisor and payload.
549
550If GNU linker is used then the name is `GNU` and the description
551is a NT_GNU_BUILD_ID type ID. The description can be an SHA1
552checksum, MD5 checksum or any unique value.
553
554The size of these structures varies with the `--build-id` linker option.
555
556There are two kinds of build-id dependencies:
557
558 * Xen build-id dependency (.livepatch.xen_depends section)
559 * previous payload build-id dependency (.livepatch.depends section)
560
561See "Live patch interdependencies" for more information.
562
563## Hypercalls
564
565We will employ the sub operations of the system management hypercall (sysctl).
566There are to be four sub-operations:
567
568 * upload the payloads.
 * listing a summary of the uploaded payloads and their state.
 * getting a particular payload's summary and its state.
571 * command to apply, delete, or revert the payload.
572
Most of the actions are asynchronous, therefore the caller is responsible
for verifying that the action completed properly by retrieving the payload's
summary and checking that there are no error codes associated with the payload.

We **MUST** make some of them asynchronous due to the nature of patching:
it requires every physical CPU to be in lock-step with each other.
The patching mechanism, while an implementation detail, is not a short
operation and as such the design **MUST** assume it will be a long-running
operation.
582
583The sub-operations will spell out how preemption is to be handled (if at all).
584
Furthermore it is possible to have multiple different payloads for the same
function. As such a unique name per payload has to be visible to allow proper manipulation.
587
588The hypercall is part of the `xen_sysctl`. The top level structure contains
589one uint32_t to determine the sub-operations and one padding field which
590*MUST* always be zero.
591
592    struct xen_sysctl_livepatch_op {
593        uint32_t cmd;                   /* IN: XEN_SYSCTL_LIVEPATCH_*. */
594        uint32_t pad;                   /* IN: Always zero. */
        union {
            ... see below ...
        } u;
598    };
599
while the rest of the hypercall-specific structures are part of this structure.
601
602### Basic type: struct xen_livepatch_name
603
Most of the hypercalls employ a shared structure called `struct xen_livepatch_name`
which contains:

 * `name` - pointer where the string for the name is located.
 * `size` - the size of the string.
 * `pad` - padding - to be zero.

The structure is as follows:
612
    /*
     * Uniquely identifies the payload. Should be human readable.
     * Includes the NUL terminator.
     */
    #define XEN_LIVEPATCH_NAME_SIZE 128
    struct xen_livepatch_name {
        XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */
        uint16_t size;                          /* IN, size of name. May be up to
                                                   XEN_LIVEPATCH_NAME_SIZE. */
622        uint16_t pad[3];                        /* IN: MUST be zero. */
623    };
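
The constraints on this structure can be summarised by a small validation
sketch. It is illustrative only: a plain pointer stands in for
`XEN_GUEST_HANDLE_64(char)`, the structure and function names are
hypothetical, and requiring the last counted byte to be the NUL terminator is
an assumption based on the comment that the size includes it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XEN_LIVEPATCH_NAME_SIZE 128

/* Simplified stand-in: a plain pointer replaces the guest handle. */
struct livepatch_name_sketch {
    const char *name;
    uint16_t size;      /* Size of the name, including the NUL terminator. */
    uint16_t pad[3];    /* MUST be zero. */
};

/* Returns 1 if the name descriptor looks well formed, 0 otherwise. */
static int livepatch_name_ok(const struct livepatch_name_sketch *n)
{
    assert(n != NULL);

    if (n->pad[0] || n->pad[1] || n->pad[2])
        return 0;
    if (n->name == NULL || n->size == 0 ||
        n->size > XEN_LIVEPATCH_NAME_SIZE)
        return 0;

    /* Assumption: the final counted byte is the NUL terminator. */
    return n->name[n->size - 1] == '\0';
}
```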
624
625### XEN_SYSCTL_LIVEPATCH_UPLOAD (0)
626
627Upload a payload to the hypervisor. The payload is verified
628against basic checks and if there are any issues the proper return code
629will be returned. The payload is not applied at this time - that is
630controlled by *XEN_SYSCTL_LIVEPATCH_ACTION*.
631
632The caller provides:
633
634 * A `struct xen_livepatch_name` called `name` which has the unique name.
635 * `size` the size of the ELF payload (in bytes).
636 * `payload` the virtual address of where the ELF payload is.
637
The `name` could be a UUID that stays fixed forever for a given
payload. It can be embedded into the ELF payload at creation time
and extracted by tools.

The return value is zero if the payload was successfully uploaded.
Otherwise an -XEN_EXX return value is provided. Duplicate names are not supported.

The `payload` is the ELF payload as mentioned in the `Payload format` section.

The structure is as follows:
648
649    struct xen_sysctl_livepatch_upload {
650        xen_livepatch_name_t name;          /* IN, name of the patch. */
651        uint64_t size;                      /* IN, size of the ELF file. */
652        XEN_GUEST_HANDLE_64(uint8) payload; /* IN: ELF file. */
653    };
654
655### XEN_SYSCTL_LIVEPATCH_GET (1)
656
Retrieve the status of a specific payload. The caller provides:
658
659 * A `struct xen_livepatch_name` called `name` which has the unique name.
660 * A `struct xen_livepatch_status` structure. The member values will
661   be over-written upon completion.
662
663Upon completion the `struct xen_livepatch_status` is updated.
664
665 * `status` - indicates the current status of the payload:
666   * *LIVEPATCH_STATUS_CHECKED* (1) loaded and the ELF payload safety checks passed.
667   * *LIVEPATCH_STATUS_APPLIED* (2) loaded, checked, and applied.
668   *  No other value is possible.
669 * `rc` - -XEN_EXX type errors encountered while performing the last
670   LIVEPATCH_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
671   respectively mean: success or operation in progress. Other values
672   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
673   have changed.
674
675The return value of the hypercall is zero on success and -XEN_EXX on failure.
676(Note that the `rc` value can be different from the return value, as in
677rc = -XEN_EAGAIN and return value can be 0).
678
For example, supposing there is a payload:
680
681    status: LIVEPATCH_STATUS_CHECKED
682    rc: 0
683
We apply an action - LIVEPATCH_ACTION_REVERT - to revert it (which won't work
as we have not even applied it). Afterwards we will have:
686
687    status: LIVEPATCH_STATUS_CHECKED
688    rc: -XEN_EINVAL
689
690It has failed but it remains loaded.
691
692This operation is synchronous and does not require preemption.
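
A caller polling for the result of an asynchronous action has to tell three
cases apart from the `rc` field. The sketch below shows one way to do so; the
enum and function names are hypothetical, and the numeric values of
`XEN_EAGAIN` and `XEN_EINVAL` are assumed to follow the conventional errno
numbers:

```c
#include <assert.h>
#include <stdint.h>

#define XEN_EAGAIN 11   /* Assumed value, mirroring the usual errno number. */
#define XEN_EINVAL 22   /* Assumed value, mirroring the usual errno number. */

#define LIVEPATCH_STATUS_CHECKED 1
#define LIVEPATCH_STATUS_APPLIED 2

struct xen_livepatch_status {
    uint32_t state;
    int32_t rc;
};

enum action_outcome {
    ACTION_DONE,         /* rc == 0: the last action succeeded. */
    ACTION_IN_PROGRESS,  /* rc == -XEN_EAGAIN: keep polling. */
    ACTION_FAILED        /* Anything else: error, state is unchanged. */
};

/* Hypothetical helper: interpret the rc of the last LIVEPATCH_ACTION_*. */
static enum action_outcome classify_status(const struct xen_livepatch_status *s)
{
    assert(s);

    if (s->rc == 0)
        return ACTION_DONE;
    if (s->rc == -XEN_EAGAIN)
        return ACTION_IN_PROGRESS;
    return ACTION_FAILED;
}
```

In the failed revert example above, the helper would report `ACTION_FAILED`
while `state` still reads `LIVEPATCH_STATUS_CHECKED`.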
693
The structure is as follows:
695
696    struct xen_livepatch_status {
697    #define LIVEPATCH_STATUS_CHECKED      1
698    #define LIVEPATCH_STATUS_APPLIED      2
699        uint32_t state;                 /* OUT: LIVEPATCH_STATE_*. */
700        int32_t rc;                     /* OUT: 0 if no error, otherwise -XEN_EXX. */
701    };
702
703    struct xen_sysctl_livepatch_get {
704        xen_livepatch_name_t name;      /* IN, the name of the payload. */
705        xen_livepatch_status_t status;  /* IN/OUT: status of the payload. */
706    };
707
708### XEN_SYSCTL_LIVEPATCH_LIST (2)
709
710Retrieve an array of abbreviated status, names and metadata of payloads that are
711loaded in the hypervisor.
712
713The caller provides:
714
715 * `version`. Version of the payload. Caller should re-use the field provided by
716    the hypervisor. If the value differs the data is stale.
717 * `idx` Index iterator. The index into the hypervisor's payload count. It is
718    recommended that on first invocation zero be used so that `nr` (which the
719    hypervisor will update with the remaining payload count) be provided.
720    Also the hypervisor will provide `version` with the most current value,
721    calculated total size of all payloads' names and calculated total size of
722    all payload's metadata.
723 * `nr` The max number of entries to populate. Can be zero which will result
724    in the hypercall being a probing one and return the number of payloads
725    (and update the `version`).
726 * `pad` - *MUST* be zero.
727 * `status` Virtual address of where to write `struct xen_livepatch_status`
728   structures. Caller *MUST* allocate up to `nr` of them.
729 * `name` - Virtual address of where to write the unique name of the payloads.
730   Caller *MUST* allocate enough space to be able to store all received data
731   (i.e. total allocated space *MUST* match the `name_total_size` value
732   provided by the hypervisor). Individual payload name cannot be longer than
733   **XEN_LIVEPATCH_NAME_SIZE** bytes. Note that **XEN_LIVEPATCH_NAME_SIZE**
734   includes the NUL terminator.
735 * `len` - Virtual address of where to write the length of each unique name
736   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
737   of sizeof(uint32_t) (4 bytes).
738 * `metadata` - Virtual address of where to write the metadata of the payloads.
739   Caller *MUST* allocate enough space to be able to store all received data
740   (i.e. total allocated space *MUST* match the `metadata_total_size` value
741   provided by the hypervisor). Individual payload metadata string can be of
742   arbitrary length. The metadata string format is: key=value\\0...key=value\\0.
743 * `metadata_len` - Virtual address of where to write the length of each metadata
744   string of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST*
745   be of sizeof(uint32_t) (4 bytes).
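
As an illustration of the metadata string format described above, the following is a minimal sketch of how a caller might look up one key in a buffer of `key=value` entries, each terminated by NUL. The function name `metadata_lookup` and its parameters are hypothetical and not part of the hypercall interface; `len` stands for the corresponding `metadata_len` entry of one payload.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find `key` in a buffer laid out as
 * key=value\0...key=value\0 and return a pointer to its value,
 * or NULL if the key is absent. `len` is the size of the buffer.
 */
static const char *metadata_lookup(const char *buf, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t off = 0;

    while (off < len) {
        const char *entry = buf + off;
        const char *nul = memchr(entry, '\0', len - off);
        size_t elen;

        if (nul == NULL)
            break;              /* unterminated trailing entry: ignore it */
        elen = (size_t)(nul - entry);

        /* Match only when the whole key is followed directly by '='. */
        if (elen > klen && entry[klen] == '=' && memcmp(entry, key, klen) == 0)
            return entry + klen + 1;

        off += elen + 1;        /* skip the entry and its NUL terminator */
    }
    return NULL;
}
```

The returned pointer aliases the caller's buffer, so it stays valid only as long as that buffer does.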
746
If the hypercall returns a positive number, it is the number (up to `nr`
748provided to the hypercall) of the payloads returned, along with `nr` updated
749with the number of remaining payloads, `version` updated (it may be the same
750across hypercalls - if it varies the data is stale and further calls could
751fail), `name_total_size` and `metadata_total_size` containing total sizes of
752transferred data for both the arrays.
The `status`, `name`, `len`, `metadata` and `metadata_len` are updated at their
designated index value (`idx`) with the returned data.
755
756If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
757lowered.
758
If the hypercall returns zero there are no more payloads.
760
761Note that due to the asynchronous nature of hypercalls the control domain might
762have added or removed a number of payloads making this information stale. It is
763the responsibility of the toolstack to use the `version` field to check
between each invocation. If the version differs it should discard the stale
765data and start from scratch. It is OK for the toolstack to use the new
766`version` field.
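
The return-value handling and the `version` check described above can be sketched as a small decision function for a toolstack iterating over the list. This is only an illustration: the enum, the function name `next_list_step` and the numeric value used for `XEN_E2BIG` are assumptions made to keep the example self-contained, and the hypercall itself is not shown.

```c
#include <stdint.h>

#define XEN_E2BIG 7   /* stand-in; the real value comes from Xen's public errno header */

/* Hypothetical outcomes a toolstack could act on after one LIST call. */
enum list_step {
    LIST_DONE,        /* zero returned: no more payloads */
    LIST_CONTINUE,    /* entries returned: advance idx and call again */
    LIST_RESTART,     /* version changed: discard data, start from idx 0 */
    LIST_SHRINK_NR,   /* -XEN_E2BIG: lower nr and retry */
    LIST_ERROR        /* any other negative return value */
};

static enum list_step next_list_step(int ret, int first_call,
                                     uint32_t seen_version, uint32_t new_version)
{
    if (ret == -XEN_E2BIG)
        return LIST_SHRINK_NR;
    if (ret < 0)
        return LIST_ERROR;
    /* On the first call there is no earlier version to compare against. */
    if (!first_call && new_version != seen_version)
        return LIST_RESTART;
    if (ret == 0)
        return LIST_DONE;
    return LIST_CONTINUE;
}
```

The version comparison is done before the "no more payloads" check so that data gathered across a version change is never reported as complete.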
767
The `struct xen_livepatch_status` structure contains the status of a payload, which includes:
769
770 * `status` - indicates the current status of the payload:
771   * *LIVEPATCH_STATUS_CHECKED* (1) loaded and the ELF payload safety checks passed.
772   * *LIVEPATCH_STATUS_APPLIED* (2) loaded, checked, and applied.
773   *  No other value is possible.
774 * `rc` - -XEN_EXX type errors encountered while performing the last
775   LIVEPATCH_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
776   respectively mean: success or operation in progress. Other values
777   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
778   have changed.
779
The structure is as follows:
781
782    struct xen_sysctl_livepatch_list {
783        uint32_t version;                       /* OUT: Hypervisor stamps value.
784                                                   If varies between calls, we are
785                                                   getting stale data. */
786        uint32_t idx;                           /* IN: Index into hypervisor list. */
787        uint32_t nr;                            /* IN: How many status, names, and len
788                                                   should be filled out. Can be zero to get
789                                                   amount of payloads and version.
790                                                   OUT: How many payloads left. */
791        uint32_t pad;                           /* IN: Must be zero. */
792        uint32_t name_total_size;               /* OUT: Total size of all transfer names */
793        uint32_t metadata_total_size;           /* OUT: Total size of all transfer metadata */
794        XEN_GUEST_HANDLE_64(xen_livepatch_status_t) status;  /* OUT. Must have enough
                                                   space allocated for nr of them. */
796        XEN_GUEST_HANDLE_64(char) name;         /* OUT: Array of names. Each member
797                                                   may have an arbitrary length up to
798                                                   XEN_LIVEPATCH_NAME_SIZE bytes. Must have
799                                                   nr of them. */
800        XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of name's.
801                                                   Must have nr of them. */
802        XEN_GUEST_HANDLE_64(char) metadata;     /* OUT: Array of metadata strings. Each
803                                                   member may have an arbitrary length.
804                                                   Must have nr of them. */
805        XEN_GUEST_HANDLE_64(uint32) metadata_len;  /* OUT: Array of lengths of metadata's.
806                                                      Must have nr of them. */
807
808    };
809
810### XEN_SYSCTL_LIVEPATCH_ACTION (3)
811
812Perform an operation on the payload structure referenced by the `name` field.
813The operation request is asynchronous and the status should be retrieved
814by using either **XEN_SYSCTL_LIVEPATCH_GET** or **XEN_SYSCTL_LIVEPATCH_LIST** hypercall.
815
816The caller provides:
817
818 * A `struct xen_livepatch_name` `name` containing the unique name.
819 * `cmd` The command requested:
820  * *LIVEPATCH_ACTION_UNLOAD* (1) Unload the payload.
821   Any further hypercalls against the `name` will result in failure unless
   **XEN_SYSCTL_LIVEPATCH_UPLOAD** hypercall is performed with same `name`.
823  * *LIVEPATCH_ACTION_REVERT* (2) Revert the payload. If the operation takes
824  more time than the upper bound of time the `rc` in `xen_livepatch_status`
825  retrieved via **XEN_SYSCTL_LIVEPATCH_GET** will be -XEN_EBUSY.
826  * *LIVEPATCH_ACTION_APPLY* (3) Apply the payload. If the operation takes
827  more time than the upper bound of time the `rc` in `xen_livepatch_status`
828  retrieved via **XEN_SYSCTL_LIVEPATCH_GET** will be -XEN_EBUSY.
829  * *LIVEPATCH_ACTION_REPLACE* (4) Revert all applied payloads and apply this
830  payload. If the operation takes more time than the upper bound of time
831  the `rc` in `xen_livepatch_status` retrieved via **XEN_SYSCTL_LIVEPATCH_GET**
832  will be -XEN_EBUSY.
833 * `time` The upper bound of time (ns) the cmd should take. Zero means to use
834   the hypervisor default. If within the time the operation does not succeed
835   the operation would go in error state.
836 * `flags` provides additional parameters for an action:
837  * *LIVEPATCH_ACTION_APPLY_NODEPS* (1) Apply action ignores inter-module
838  buildid dependency. Checks only if module is built for given hypervisor by
839  comparing buildid.
840 * `pad` - *MUST* be zero.
841
842The return value will be zero unless the provided fields are incorrect.
843
The structure is as follows:
845
846    #define LIVEPATCH_ACTION_UNLOAD  1
847    #define LIVEPATCH_ACTION_REVERT  2
848    #define LIVEPATCH_ACTION_APPLY   3
849    #define LIVEPATCH_ACTION_REPLACE 4
850    struct xen_sysctl_livepatch_action {
851        xen_livepatch_name_t name;              /* IN, name of the patch. */
852        uint32_t cmd;                           /* IN: LIVEPATCH_ACTION_* */
853        uint32_t time;                          /* IN: If zero then uses */
854                                                /* hypervisor default. */
855                                                /* Or upper bound of time (ns) */
856                                                /* for operation to take. */
857        uint32_t flags;                         /* IN: action flags. */
858                                                /* Provide additional parameters */
859                                                /* for an action. */
860        uint32_t pad;                           /* IN: Always zero. */
861    };
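
The field rules listed above (a known `cmd`, only known `flags`, and a zero `pad`) can be sketched as a validation routine. This is a hypothetical illustration, not the hypervisor's code: the struct `action_args` omits the `name` field, the function name is invented, and returning `-XEN_EINVAL` (with a stand-in numeric value) for every rejected field is an assumption.

```c
#include <stdint.h>

#define LIVEPATCH_ACTION_UNLOAD        1
#define LIVEPATCH_ACTION_REVERT        2
#define LIVEPATCH_ACTION_APPLY         3
#define LIVEPATCH_ACTION_REPLACE       4
#define LIVEPATCH_ACTION_APPLY_NODEPS  1
#define XEN_EINVAL 22   /* stand-in; the real value comes from Xen's public errno header */

/* Simplified stand-in for struct xen_sysctl_livepatch_action (no name field). */
struct action_args {
    uint32_t cmd;
    uint32_t time;   /* zero selects the hypervisor default */
    uint32_t flags;
    uint32_t pad;
};

/* Returns 0 if the fields look well-formed, -XEN_EINVAL otherwise. */
static int validate_action(const struct action_args *a)
{
    if (a->pad != 0)
        return -XEN_EINVAL;
    if (a->cmd < LIVEPATCH_ACTION_UNLOAD || a->cmd > LIVEPATCH_ACTION_REPLACE)
        return -XEN_EINVAL;
    /* Reject any flag bit other than the one defined in the text. */
    if (a->flags & ~(uint32_t)LIVEPATCH_ACTION_APPLY_NODEPS)
        return -XEN_EINVAL;
    return 0;
}
```

A zero return here only means the request was accepted; the outcome of the operation itself is still reported through `rc` as described above.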
862
863
864## State diagrams of LIVEPATCH_ACTION commands.
865
There is a strict ordering of the commands.
The LIVEPATCH_ACTION prefix has been dropped to ease reading and
the diagram does not include the LIVEPATCH_STATES:
869
870                 /->\
871                 \  /
872    UNLOAD <--- CHECK ---> REPLACE|APPLY --> REVERT --\
873                   \                                  |
874                    \-------------------<-------------/
875
876## State transition table of LIVEPATCH_ACTION commands and LIVEPATCH_STATUS.
877
878Note that:
879
880 - The CHECKED state is the starting one achieved with *XEN_SYSCTL_LIVEPATCH_UPLOAD* hypercall.
881 - The REVERT operation on success will automatically move to the CHECKED state.
882 - There are two STATES: CHECKED and APPLIED.
883 - There are four actions (aka commands): APPLY, REPLACE, REVERT, and UNLOAD.
884
885The state transition table of valid states and action states:
886
    +---------+---------+--------------------------------+-------+--------+
    | ACTION  | Current | Result                         |   Next STATE:  |
    |         | STATE   |                                |CHECKED|APPLIED |
    +---------+---------+--------------------------------+-------+--------+
891    | UNLOAD  | CHECKED | Unload payload. Always works.  |       |        |
892    |         |         | No next states.                |       |        |
893    +---------+---------+--------------------------------+-------+--------+
894    | APPLY   | CHECKED | Apply payload (success).       |       |   x    |
895    +---------+---------+--------------------------------+-------+--------+
896    | APPLY   | CHECKED | Apply payload (error|timeout)  |   x   |        |
897    +---------+---------+--------------------------------+-------+--------+
898    | REPLACE | CHECKED | Revert payloads and apply new  |       |   x    |
899    |         |         | payload with success.          |       |        |
900    +---------+---------+--------------------------------+-------+--------+
901    | REPLACE | CHECKED | Revert payloads and apply new  |   x   |        |
902    |         |         | payload with error.            |       |        |
903    +---------+---------+--------------------------------+-------+--------+
904    | REVERT  | APPLIED | Revert payload (success).      |   x   |        |
905    +---------+---------+--------------------------------+-------+--------+
906    | REVERT  | APPLIED | Revert payload (error|timeout) |       |   x    |
907    +---------+---------+--------------------------------+-------+--------+
908
909All the other state transitions are invalid.
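
The transition table above can be expressed as a small function. This is a sketch for illustration only: the enum names, the `LP_UNLOADED` marker (standing for "no next states") and the use of `-1` for an invalid transition are assumptions, not part of the interface.

```c
/* States from the table; LP_UNLOADED is a hypothetical "payload gone" marker. */
enum lp_state  { LP_UNLOADED = 0, LP_CHECKED = 1, LP_APPLIED = 2 };
/* Actions, numbered like the LIVEPATCH_ACTION_* commands. */
enum lp_action { LP_UNLOAD = 1, LP_REVERT = 2, LP_APPLY = 3, LP_REPLACE = 4 };

/*
 * Returns the next state for (current state, action, outcome),
 * or -1 if the table lists no such transition.
 */
static int next_state(enum lp_state cur, enum lp_action act, int success)
{
    switch (act) {
    case LP_UNLOAD:
        /* Only valid from CHECKED; always works. */
        return cur == LP_CHECKED ? LP_UNLOADED : -1;
    case LP_APPLY:
    case LP_REPLACE:
        if (cur != LP_CHECKED)
            return -1;
        return success ? LP_APPLIED : LP_CHECKED;
    case LP_REVERT:
        if (cur != LP_APPLIED)
            return -1;
        return success ? LP_CHECKED : LP_APPLIED;
    }
    return -1;
}
```

Note how a failed or timed-out action leaves the state unchanged, which matches the statement that `status` does not change when `rc` reports an error.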
910
911## Sequence of events.
912
913The normal sequence of events is to:
914
915 1. *XEN_SYSCTL_LIVEPATCH_UPLOAD* to upload the payload. If there are errors *STOP* here.
916 2. *XEN_SYSCTL_LIVEPATCH_GET* to check the `->rc`. If *-XEN_EAGAIN* spin. If zero go to next step.
917 3. *XEN_SYSCTL_LIVEPATCH_ACTION* with *LIVEPATCH_ACTION_APPLY* to apply the patch.
918 4. *XEN_SYSCTL_LIVEPATCH_GET* to check the `->rc`. If in *-XEN_EAGAIN* spin. If zero exit with success.
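
Steps 2 and 4 above both spin on `rc` while it is -XEN_EAGAIN. A minimal sketch of such a bounded polling loop follows. The callback type, `wait_for_rc`, the poll limit and the `fake_*` helpers are all hypothetical; the fake stands in for **XEN_SYSCTL_LIVEPATCH_GET** so the example is self-contained, and the numeric value of `XEN_EAGAIN` is a stand-in.

```c
#include <stdint.h>

#define XEN_EAGAIN 11   /* stand-in; the real value comes from Xen's public errno header */

/* Hypothetical callback that performs a GET and returns the payload's rc. */
typedef int32_t (*get_rc_fn)(void *ctx);

/*
 * Poll until rc is no longer -XEN_EAGAIN or max_polls is exhausted.
 * Returns the final rc (0 on success, other negative values on error),
 * or -XEN_EAGAIN if the operation was still in progress at the limit.
 */
static int32_t wait_for_rc(get_rc_fn get_rc, void *ctx, unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        int32_t rc = get_rc(ctx);

        if (rc != -XEN_EAGAIN)
            return rc;
    }
    return -XEN_EAGAIN;
}

/* Fake payload used only to exercise the loop without a hypervisor. */
struct fake_payload {
    unsigned int polls_until_done;
    int32_t final_rc;
};

static int32_t fake_get_rc(void *ctx)
{
    struct fake_payload *p = ctx;

    if (p->polls_until_done > 0) {
        p->polls_until_done--;
        return -XEN_EAGAIN;
    }
    return p->final_rc;
}
```

A real toolstack would likely sleep between polls rather than spin tightly; that is left out here to keep the sketch short.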
919
920
921## Addendum
922
923Implementation quirks should not be discussed in a design document.
924
925However these observations can provide aid when developing against this
926document.
927
928
929### Alternative assembler
930
931Alternative assembler is a mechanism to use different instructions depending
932on what the CPU supports. This is done by providing multiple streams of code
933that can be patched in - or if the CPU does not support it - padded with
934`nop` operations. The alternative assembler macros cause the compiler to
935expand the code to place a most generic code in place - emit a special
936ELF .section header to tag this location. During run-time the hypervisor
can leave the areas alone or patch them with better suited opcodes.
938
Note that patching functions that copy to or from guest memory requires
support for alternatives. For example this can be due to SMAP
(specifically the *stac* and *clac* instructions), which is enabled on Broadwell
and later architectures. Other alternative instructions may be involved as well.
943
944### When to patch
945
During the discussion on the design two candidates bubbled up where
the call stack for each CPU would be deterministic. This would
minimize the chance of the patch not being applied due to safety
checks failing, such as the check against patching code which
is on the stack - which can lead to corruption.
951
952#### Rendezvous code instead of stop_machine for patching
953
954The hypervisor's time rendezvous code runs synchronously across all CPUs
955every second. Using the `stop_machine` to patch can stall the time rendezvous
956code and result in NMI. As such having the patching be done at the tail
957of rendezvous code should avoid this problem.
958
959However the entrance point for that code is `do_softirq ->
960timer_softirq_action -> time_calibration` which ends up calling
961`on_selected_cpus` on remote CPUs.
962
963The remote CPUs receive CALL_FUNCTION_VECTOR IPI and execute the
964desired function.
965
966#### Before entering the guest code.
967
968Before we call VMXResume we check whether any soft IRQs need to be executed.
969This is a good spot because all Xen stacks are effectively empty at
970that point.
971
To rendezvous all the CPUs a barrier with a maximum timeout (which
973could be adjusted), combined with forcing all other CPUs through the
974hypervisor with IPIs, can be utilized to execute lockstep instructions
975on all CPUs.
976
977The approach is similar in concept to `stop_machine` and the time rendezvous
978but is time-bound. However the local CPU stack is much shorter and
979a lot more deterministic.
980
981This is implemented in the Xen hypervisor.
982
983### Compiling the hypervisor code
984
985Hotpatch generation often requires support for compiling the target
986with `-ffunction-sections` / `-fdata-sections`.  Changes would have to
987be done to the linker scripts to support this.
988
989### Generation of Live Patch ELF payloads
990
The design of that is not discussed in this document.

This is implemented in a separate tool which lives in a separate
GIT repo.
995
996Currently it resides at git://xenbits.xen.org/livepatch-build-tools.git
997
998### Exception tables and symbol tables growth
999
1000We may need support for adapting or augmenting exception tables if
1001patching such code.  Hotpatches may need to bring their own small
1002exception tables (similar to how Linux modules support this).
1003
1004If supporting hotpatches that introduce additional exception-locations
1005is not important, one could also change the exception table in-place
1006and reorder it afterwards.
1007
As found, almost every patch (XSA) to a non-trivial function requires
1009additional entries in the exception table and/or the bug frames.
1010
1011This is implemented in the Xen hypervisor.
1012
1013### .rodata sections
1014
1015The patching might require strings to be updated as well. As such we must be
1016also able to patch the strings as needed. This sounds simple - but the compiler
1017has a habit of coalescing strings that are the same - which means if we in-place
1018alter the strings - other users will be inadvertently affected as well.
1019
1020This is also where pointers to functions live - and we may need to patch this
1021as well. And switch-style jump tables.
1022
1023To guard against that we must be prepared to do patching similar to
1024trampoline patching or in-line depending on the flavour. If we can
1025do in-line patching we would need to:
1026
1027 * Alter `.rodata` to be writeable.
1028 * Inline patch.
1029 * Alter `.rodata` to be read-only.
1030
If we are doing trampoline patching we would need to:
1032
1033 * Allocate a new memory location for the string.
1034 * All locations which use this string will have to be updated to use the
1035   offset to the string.
1036 * Mark the region RO when we are done.
1037
1038The trampoline patching is implemented in the Xen hypervisor.
1039
1040### .bss and .data sections.
1041
1042In place patching writable data is not suitable as it is unclear what should be done
1043depending on the current state of data. As such it should not be attempted.
1044
1045However, functions which are being patched can bring in changes to strings
1046(.data or .rodata section changes), or even to .bss sections.
1047
1048As such the ELF payload can introduce new .rodata, .bss, and .data sections.
1049Patching in the new function will end up also patching in the new .rodata
1050section and the new function will reference the new string in the new
1051.rodata section.
1052
1053This is implemented in the Xen hypervisor.
1054
1055### Security
1056
1057Only the privileged domain should be allowed to do this operation.
1058
1059### Live patch interdependencies
1060
Interdependencies between live patches are tricky.

There are three ways this can be addressed:
1064 * A single large patch that subsumes and replaces all previous ones.
1065   Over the life-time of patching the hypervisor this large patch
1066   grows to accumulate all the code changes.
 * Hotpatch stack - where a mechanism exists that loads the hotpatches
   in the same order they were built in. We would need a build-id
   of the hypervisor to make sure the hotpatches are built against the
   correct build.
 * Payload containing the old code to check against that. That allows
   the hotpatches to be loaded independently (if they don't overlap) - or
   if the old code also contains previously patched code - even if they
   overlap.
1075
The disadvantage of the first option, the single large patch, is that it can
grow over time and does not provide a bisection mechanism to identify faulty
patches.

The hotpatch stack puts strict requirements on the order of the patches
being loaded and requires a hypervisor build-id to match against.
1081
1082The old code allows much more flexibility and an additional guard,
1083but is more complex to implement.
1084
The second option, which requires a build-id of the hypervisor,
is implemented in the Xen hypervisor.
1087
1088Specifically each payload has three build-id ELF notes:
1089 * The build-id of the payload itself (generated via --build-id).
1090 * The build-id of the Xen hypervisor it depends on (extracted from the
1091   hypervisor during build time).
 * The build-id of the payload it depends on (extracted from
   the previous payload or hypervisor during build time).
1094
1095This means that every payload depends on the hypervisor build-id and on
1096the build-id of the previous payload in the stack.
1097The very first payload depends on the hypervisor build-id only.
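
The dependency rule described above can be sketched as a check performed before applying a payload. Everything named here (`struct build_id`, `build_id_eq`, `deps_ok` and the parameters) is hypothetical and only illustrates the rule; the `nodeps` argument models the *LIVEPATCH_ACTION_APPLY_NODEPS* flag, which skips the inter-payload check but still compares against the hypervisor build-id.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a build-id note: raw bytes plus length. */
struct build_id {
    const unsigned char *bytes;
    size_t len;
};

static int build_id_eq(const struct build_id *a, const struct build_id *b)
{
    return a->len == b->len && memcmp(a->bytes, b->bytes, a->len) == 0;
}

/*
 * Returns 1 if the payload may be applied, 0 otherwise.
 *  xen_id          - build-id of the running hypervisor
 *  top_of_stack    - build-id of the last applied payload, NULL if none
 *  payload_xen_dep - hypervisor build-id recorded in the payload
 *  payload_dep     - build-id of the payload (or hypervisor) it depends on
 *  nodeps          - non-zero to ignore the inter-payload dependency
 */
static int deps_ok(const struct build_id *xen_id,
                   const struct build_id *top_of_stack,
                   const struct build_id *payload_xen_dep,
                   const struct build_id *payload_dep,
                   int nodeps)
{
    const struct build_id *expected;

    /* The payload must always be built for this hypervisor. */
    if (!build_id_eq(payload_xen_dep, xen_id))
        return 0;
    if (nodeps)
        return 1;

    /* The first payload in the stack depends on the hypervisor itself. */
    expected = top_of_stack != NULL ? top_of_stack : xen_id;
    return build_id_eq(payload_dep, expected);
}
```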
1098
1099# Not Yet Done
1100
1101This is for further development of live patching.
1102
1103## TODO Goals
1104
1105The implementation must also have a mechanism for (in no particular order):
1106
1107 * Be able to lookup in the Xen hypervisor the symbol names of functions from the
1108   ELF payload. (Either as `symbol` or `symbol`+`offset`).
1109 * Be able to patch .rodata, .bss, and .data sections.
1110 * Deal with NMI/MCE checks during patching instead of ignoring them.
1111 * Further safety checks (blacklist of which functions cannot be patched, check
1112   the stack, make sure the payload is built with same compiler as hypervisor).
1113   Specifically we want to make sure that live patching codepaths cannot be patched.
1114 * NOP out the code sequence if `new_size` is zero.
1115 * Deal with other relocation types:  `R_X86_64_[8,16,32,32S]`, `R_X86_64_PC[8,16,64]`
1116   in payload file.
1117
1118### Handle inlined \__LINE__
1119
1120This problem is related to hotpatch construction
1121and potentially has influence on the design of the hotpatching
1122infrastructure in Xen.
1123
1124For example:
1125
1126We have file1.c with functions f1 and f2 (in that order).  f2 contains a
1127BUG() (or WARN()) macro and at that point embeds the source line number
1128into the generated code for f2.
1129
1130Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
1131lines to f1 and as a consequence shifts out f2 by two lines.  The newly
1132constructed file1.o will now contain differences in both binary
1133functions f1 (because we actually changed it with the applied patch) and
1134f2 (because the contained BUG macro embeds the new line number).
1135
1136Without additional information, an algorithm comparing file1.o before
1137and after hotpatch application will determine both functions to be
1138changed and will have to include both into the binary hotpatch.
1139
1140Options:
1141
11421. Transform source code patches for hotpatches to be line-neutral for
1143   each chunk.  This can be done in almost all cases with either
1144   reformatting of the source code or by introducing artificial
1145   preprocessor "#line n" directives to adjust for the introduced
1146   differences.
1147
1148   This approach is low-tech and simple.  Potentially generated
1149   backtraces and existing debug information refers to the original
1150   build and does not reflect hotpatching state except for actually
1151   hotpatched functions but should be mostly correct.
1152
11532. Ignoring the problem and living with artificially large hotpatches
1154   that unnecessarily patch many functions.
1155
1156   This approach might lead to some very large hotpatches depending on
1157   content of specific source file.  It may also trigger pulling in
   functions into the hotpatch that cannot reasonably be hotpatched due
1159   to limitations of a hotpatching framework (init-sections, parts of
1160   the hotpatching framework itself, ...) and may thereby prevent us
1161   from patching a specific problem.
1162
   The decision between 1. and 2. can be made on a patch-by-patch
1164   basis.
1165
11663. Introducing an indirection table for storing line numbers and
1167   treating that specially for binary diffing. Linux may follow
1168   this approach.
1169
1170   We might either use this indirection table for runtime use and patch
1171   that with each hotpatch (similarly to exception tables) or we might
1172   purely use it when building hotpatches to ignore functions that only
1173   differ at exactly the location where a line-number is embedded.
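
As an illustration of option 1, the following sketch shows how a `#line` directive can keep line numbers stable for code that follows a patched function. The functions `f1` and `f2_line` and the line number 120 are made up for the example; `f2_line` simply returns `__LINE__` to stand in for a macro such as BUG() that embeds the line number.

```c
/* f1 after the source patch: two lines were added compared to the original. */
static int f1(int x)
{
    x += 1; /* added by the hotpatch */
    x += 1; /* added by the hotpatch */
    return x;
}

/*
 * Assume f2 started at line 120 in the unpatched file. The directive makes
 * the compiler treat the line that follows it as line 120 again, so every
 * use of __LINE__ inside f2 expands to the same value as in the original
 * build and the generated code for f2 does not change.
 */
#line 120
static int f2_line(void) { return __LINE__; }
```

With the directive in place a binary comparison of the object files before and after the patch would see a difference in `f1` only.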
1174
1175For BUG(), WARN(), etc., the line number is embedded into the bug frame, not
1176the function itself.
1177
1178Similar considerations are true to a lesser extent for \__FILE__, but it
1179could be argued that file renaming should be done outside of hotpatches.
1180
1181## Signature checking requirements.
1182
1183The signature checking requires that the layout of the data in memory
1184**MUST** be same for signature to be verified. This means that the payload
1185data layout in ELF format **MUST** match what the hypervisor would be
1186expecting such that it can properly do signature verification.
1187
The signature is based on all of the payload data laid out contiguously
in memory. The signature is to be appended at the end of the ELF payload
prefixed with the string '`~Module signature appended~\n`', followed by
a signature header then followed by the signature, key identifier, and signer's
name.
1193
1194Specifically the signature header would be:
1195
1196    #define PKEY_ALGO_DSA       0
1197    #define PKEY_ALGO_RSA       1
1198
1199    #define PKEY_ID_PGP         0 /* OpenPGP generated key ID */
1200    #define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */
1201
1202    #define HASH_ALGO_MD4          0
1203    #define HASH_ALGO_MD5          1
1204    #define HASH_ALGO_SHA1         2
1205    #define HASH_ALGO_RIPE_MD_160  3
1206    #define HASH_ALGO_SHA256       4
1207    #define HASH_ALGO_SHA384       5
1208    #define HASH_ALGO_SHA512       6
1209    #define HASH_ALGO_SHA224       7
1210    #define HASH_ALGO_RIPE_MD_128  8
1211    #define HASH_ALGO_RIPE_MD_256  9
1212    #define HASH_ALGO_RIPE_MD_320 10
1213    #define HASH_ALGO_WP_256      11
1214    #define HASH_ALGO_WP_384      12
1215    #define HASH_ALGO_WP_512      13
1216    #define HASH_ALGO_TGR_128     14
1217    #define HASH_ALGO_TGR_160     15
1218    #define HASH_ALGO_TGR_192     16
1219
1220    struct elf_payload_signature {
1221	    u8	algo;		/* Public-key crypto algorithm PKEY_ALGO_*. */
1222	    u8	hash;		/* Digest algorithm: HASH_ALGO_*. */
1223	    u8	id_type;	/* Key identifier type PKEY_ID*. */
1224	    u8	signer_len;	/* Length of signer's name */
1225	    u8	key_id_len;	/* Length of key identifier */
1226	    u8	__pad[3];
1227	    __be32	sig_len;	/* Length of signature data */
1228    };
1229
(Note that this has been borrowed from Linux module signature code.)
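
To make the layout described above concrete, here is a minimal sketch that computes the size of the appended trailer (marker string, header, signature, key identifier and signer's name) from a header. The helper names are hypothetical, fixed-width types replace `u8` and `__be32`, the padding field is renamed `pad`, and since signature checking is not yet implemented this should be read as an illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_MAGIC "~Module signature appended~\n"

/* Same fields as struct elf_payload_signature, using <stdint.h> types. */
struct elf_payload_signature {
    uint8_t  algo;        /* PKEY_ALGO_* */
    uint8_t  hash;        /* HASH_ALGO_* */
    uint8_t  id_type;     /* PKEY_ID_* */
    uint8_t  signer_len;  /* Length of signer's name */
    uint8_t  key_id_len;  /* Length of key identifier */
    uint8_t  pad[3];
    uint32_t sig_len;     /* Length of signature data, stored big-endian */
};

/* Convert a big-endian 32-bit value to host order, independent of host endianness. */
static uint32_t be32_to_host(uint32_t be)
{
    uint8_t b[4];

    memcpy(b, &be, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Total number of bytes appended after the ELF payload. */
static size_t signature_trailer_size(const struct elf_payload_signature *h)
{
    return (sizeof(SIG_MAGIC) - 1) + sizeof(*h) +
           be32_to_host(h->sig_len) + h->key_id_len + h->signer_len;
}
```

Subtracting this size from the length of the uploaded blob would give the length of the ELF payload that the signature covers.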
1231
1232
1233### .bss and .data sections.
1234
1235In place patching writable data is not suitable as it is unclear what should be done
1236depending on the current state of data. As such it should not be attempted.
1237
1238That said we should provide hook functions so that the existing data
1239can be changed during payload application.
1240
To guarantee safety we disallow re-applying a payload after it has been
reverted. This is because we cannot guarantee that the state of .bss
and .data is exactly as it was during loading. Hence the administrator
MUST unload the payload and upload it again to apply it.
1245
1246There is an exception to this: if the payload only has .livepatch.funcs;
1247and the .data or .bss sections are of zero length.
1248
1249### Inline patching
1250
1251The hypervisor should verify that the in-place patching would fit within
1252the code or data.
1253
1254### Trampoline (e9 opcode), x86
1255
1256The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
1257we are limited to up to 2GB of virtual address to place the new code
1258from the old code. That should not be a problem since Xen hypervisor has
1259a very small footprint.
1260
1261However if we need - we can always add two trampolines. One at the 2GB
1262limit that calls the next trampoline.
1263
1264Please note there is a small limitation for trampolines in
1265function entries: The target function (+ trailing padding) must be able
to accommodate the trampoline. On x86 with +-2 GB relative jumps,
1267this means 5 bytes are required which means that `old_size` **MUST** be
1268at least five bytes if patching in trampoline.
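
The five-byte requirement and the 2GB limit both follow from the instruction encoding: one opcode byte (e9) plus a 32-bit signed displacement relative to the end of the instruction. A minimal sketch of building such a trampoline is shown below; the function name and parameters are hypothetical and this is not the hypervisor's implementation.

```c
#include <stdint.h>

#define JMP_REL32_SIZE 5   /* 1 opcode byte + 4 displacement bytes */

/*
 * Encode "jmp rel32" at old_addr so that it lands on new_addr.
 * Returns 0 on success, -1 if new_addr is out of the +-2GB range.
 */
static int encode_jmp_rel32(uint8_t out[JMP_REL32_SIZE],
                            uint64_t old_addr, uint64_t new_addr)
{
    /* The displacement is relative to the instruction following the jump. */
    int64_t disp = (int64_t)(new_addr - (old_addr + JMP_REL32_SIZE));
    uint32_t d;

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    d = (uint32_t)(int32_t)disp;
    out[0] = 0xe9;
    out[1] = (uint8_t)(d & 0xff);          /* little-endian displacement */
    out[2] = (uint8_t)((d >> 8) & 0xff);
    out[3] = (uint8_t)((d >> 16) & 0xff);
    out[4] = (uint8_t)((d >> 24) & 0xff);
    return 0;
}
```

For instance, a jump placed at 0x1000 that targets 0x2000 gets the displacement 0x2000 - 0x1005 = 0xffb.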
1269
1270Depending on compiler settings, there are several functions in Xen that
1271are smaller (without inter-function padding).
1272
1273    readelf -sW xen-syms | grep " FUNC " | \
1274        awk '{ if ($3 < 5) print $3, $4, $5, $8 }'
1275
1276    ...
1277    3 FUNC LOCAL wbinvd_ipi
1278    3 FUNC LOCAL shadow_l1_index
1279    ...
1280
1281A compile-time check for, e.g., a minimum alignment of functions or a
1282runtime check that verifies symbol size (+ padding to next symbols) for
1283that in the hypervisor is advised.
1284
1285The tool for generating payloads currently does perform a compile-time
1286check to ensure that the function to be replaced is large enough.
1287
1288#### Trampoline, ARM
1289
The unconditional branch instruction (for the encoding see the
DDI 0406C.c and DDI 0487A.j Architecture Reference Manuals)
with proper offset is used for an unconditional branch to the new code.
This means that `old_size` **MUST** be at least four bytes if patching
in trampoline.
1295
The instruction offset is limited on ARM32 to +/- 32MB displacement
1297and on ARM64 to +/- 128MB displacement.
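
For ARM64 the displacement limit can be illustrated with a sketch that encodes the unconditional branch (`B`) instruction: a fixed opcode in the top six bits and a 26-bit signed word offset relative to the branch itself, which yields the +/- 128MB range. The function name and parameters are hypothetical; ARM32, where the offset is relative to a different base, is not covered here.

```c
#include <stdint.h>

#define ARM64_B_RANGE (128LL * 1024 * 1024)   /* +/- 128MB */

/*
 * Encode an ARM64 "B" instruction at old_addr that branches to new_addr.
 * Returns 0 on success, -1 if the target is misaligned or out of range.
 */
static int encode_arm64_b(uint32_t *insn, uint64_t old_addr, uint64_t new_addr)
{
    int64_t off = (int64_t)(new_addr - old_addr);

    if (off & 3)
        return -1;   /* instructions are 4-byte aligned */
    if (off < -ARM64_B_RANGE || off >= ARM64_B_RANGE)
        return -1;   /* does not fit in the 26-bit signed word offset */

    /* Opcode 0b000101 in bits [31:26], word offset in bits [25:0]. */
    *insn = 0x14000000u | ((uint32_t)(off / 4) & 0x03ffffffu);
    return 0;
}
```

A check of this kind at payload load time corresponds to the displacement check mentioned in this section.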
1298
1299The new code is placed in the 8M - 10M virtual address space while the
1300Xen code is in 2M - 4M. That gives us enough space.
1301
1302The hypervisor also checks the displacement during loading of the payload.
1303