% QEMU Deprivileging / dm_restrict
% Revision 1

\clearpage

# Basics

---------------- ----------------------------------------------------
         Status: **Tech Preview**

Architecture(s): x86

   Component(s): toolstack

---------------- ----------------------------------------------------

# Overview

By default, the QEMU device model is run in domain 0.  If an attacker
gains control of a QEMU process, they could easily take control of the
whole system.

dm_restrict is a set of operations to restrict QEMU running in domain
0.  It consists of two halves:

 1. Mechanisms to restrict QEMU to only being able to affect its own
domain
 2. Mechanisms to restrict QEMU's ability to interact with domain 0.

# User details

## Getting the right versions of software

Linux: 4.11+

QEMU: 3.0+ (or the version that comes with Xen 4.12+)
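
The minimum versions listed above can be checked with a short script.  The following is a minimal sketch, not part of Xen: `version_ge` and `check_min` are hypothetical helper names, and the comparison relies on `sort -V`, which is a GNU coreutils extension rather than POSIX.

```shell
#!/bin/sh
# version_ge A B: succeed if version A is greater than or equal to B.
# Uses `sort -V` (version sort, a GNU extension): if B sorts first
# (or ties), then A >= B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME HAVE WANT: report whether HAVE satisfies the minimum WANT.
check_min() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: ok (>= $3)"
    else
        echo "$1 $2: too old (need >= $3)"
        return 1
    fi
}

# Check the running kernel against the 4.11 minimum.  Any distribution
# suffix after the first "-" is stripped before comparing.
check_min Linux "$(uname -r | cut -d- -f1)" 4.11 || true
```

The same helper can be applied to QEMU, for example `check_min QEMU 3.0.0 3.0`, once the version number has been read from the QEMU binary that your Xen installation uses; where that binary lives depends on the distribution.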

## Setting up a group and userid range

For maximum security, libxl needs to run the device model for each
domain under a user id (UID) corresponding to its domain id.  There
are 32752 possible domain IDs, and so libxl needs 32752 user ids set
aside for it.  Setting up a group for all device models to run as is
also recommended.

The simplest and most effective way to do this is to allocate a
contiguous block of UIDs, and create a single user named
`xen-qemuuser-range-base` with the first UID.  For example, under
Debian:

    adduser --system --uid 131072 --group --no-create-home xen-qemuuser-range-base
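
To make the mapping from domain id to UID concrete, here is a minimal sketch.  It assumes that the UID for a domain is simply the range base plus the domain id; that is an illustration of the scheme described above, not a statement of libxl's exact implementation.  `dm_uid` is a hypothetical helper name.

```shell
#!/bin/sh
# dm_uid BASE DOMID: print the UID assumed for the device model of DOMID.
# Assumption: UID = range base + domain id.
dm_uid() {
    base=$1
    domid=$2
    # 32752 possible domain IDs means valid IDs are 0..32751.
    if [ "$domid" -lt 0 ] || [ "$domid" -ge 32752 ]; then
        echo "domain ID $domid out of range" >&2
        return 1
    fi
    echo $((base + domid))
}

# With the range base created above, domain 5 would map to:
dm_uid 131072 5      # prints 131077
```

On a live system the two inputs could be obtained with `id -u xen-qemuuser-range-base` and `xl domid <name>`.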

Two comments on this method:

  1. Most modern systems have 32-bit UIDs, and so can in theory go up
to 2^31 (or 2^32 if UIDs are unsigned).  POSIX only guarantees 16-bit
UIDs, however; UID 65535 is reserved for an invalid value, and 65534 is
normally allocated to "nobody".
  2. Additionally, some container systems have proposed using the
upper 16 bits of the UID for a container ID.  Using a multiple of 2^16
for the range base (as is done above) will result in all UIDs in the
range being interpreted by such systems as belonging to a single
container ID.
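
The second point can be checked arithmetically.  The sketch below assumes the "upper 16 bits of a 32-bit UID" interpretation mentioned above; `container_id` and `range_in_one_container` are hypothetical helper names.

```shell
#!/bin/sh
# container_id UID: print the upper 16 bits of a 32-bit UID.
container_id() {
    echo $(( $1 >> 16 ))
}

# range_in_one_container BASE: succeed if the 32752 UIDs starting at
# BASE all share the same upper 16 bits.
range_in_one_container() {
    first=$1
    last=$(( $1 + 32752 - 1 ))
    [ "$(container_id "$first")" -eq "$(container_id "$last")" ]
}

# 131072 is 2 * 2^16, so the whole range stays within "container" 2.
range_in_one_container 131072 && echo "131072: single container ID"

# 100000 is not aligned, and its range crosses a 2^16 boundary.
range_in_one_container 100000 || echo "100000: spans two container IDs"
```

A multiple of 2^16 is sufficient but not strictly necessary: any base whose whole range fits below the next 2^16 boundary passes the same check.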

Another, less-secure way is to run all QEMUs as the same UID.  To do
this, create a user named `xen-qemuuser-shared`; for example:

    adduser --no-create-home --system xen-qemuuser-shared

A final way to give each QEMU a separate UID is to allocate one UID
per VM, and set the user in the domain config file with the
`device_model_user` argument.  For example, suppose you have a VM
named `c6-01`.  You might do the following:

    adduser --system --no-create-home --group xen-qemuuser-c6-01

And then add the following line to your config file:

    device_model_user="xen-qemuuser-c6-01"

If you use this method, you should also allocate one "reaper" user to
be used for killing device models:

    adduser --system --no-create-home --group xen-qemuuser-reaper

NOTE: It is important when using `device_model_user` that EACH VM HAVE
A SEPARATE UID, and that none of these UIDs map to root.  xl will
throw an error if a UID maps to zero, but not if multiple VMs have the
same UID.  Multiple VMs with the same device model UID will cause
problems.

It is also important that `xen-qemuuser-reaper` not have any processes
associated with it, as they will be destroyed when deprivileged QEMU
processes are destroyed.
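
Since xl does not detect two VMs sharing a device model user, a check along the following lines may help.  This is a rough sketch: `dm_users` and `dup_dm_users` are hypothetical helper names, and the parsing assumes each config file uses a simple one-line, double-quoted assignment such as `device_model_user="name"`.

```shell
#!/bin/sh
# dm_users FILE...: print the device_model_user value found in each
# xl config file.  Assumes a one-line, double-quoted assignment.
dm_users() {
    sed -n 's/^[[:space:]]*device_model_user[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$@"
}

# dup_dm_users FILE...: print every user name used by more than one file.
dup_dm_users() {
    dm_users "$@" | sort | uniq -d
}
```

Running `dup_dm_users` over all of your domain config files should print nothing.  Note that this compares user names only; two different names that resolve to the same numeric UID would not be caught, and comparing the output of `id -u` for each name would be needed for that.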

## Domain config changes

The core domain config change is to add the following line to the
domain configuration:

    dm_restrict=1

This applies a number of restrictions, outlined in the
'Technical details' section below.

# Technical details

See docs/design/qemu-deprivilege.md for technical details.

# Limitations

The following features still need to be implemented:

* Inserting a new cdrom while the guest is running (xl cdrom-insert)
* Support for qdisk backends

A number of restrictions still need to be implemented.  A compromised
device model may be able to do the following:

* Delay or exploit weaknesses in the toolstack
* Launch "fork bombs" or other resource exhaustion attacks
* Make network connections on the management network
* Break out of the restrictions after migration

Additionally, getting PCI passthrough to work securely would require a
significant rework of how passthrough works at the moment.  It may be
implemented at some point but is not a near-term priority.

See SUPPORT.md for security support status.

# History

------------------------------------------------------------------------
Date       Revision Version  Notes
---------- -------- -------- -------------------------------------------
2018-09-14 1        Xen 4.12 Imported from docs/misc
---------- -------- -------- -------------------------------------------