% QEMU Deprivileging / dm_restrict
% Revision 1

\clearpage

# Basics

---------------- ----------------------------------------------------
         Status: **Tech Preview**

Architecture(s): x86

   Component(s): toolstack

---------------- ----------------------------------------------------

# Overview

By default, the QEMU device model is run in domain 0.  If an attacker
can gain control of a QEMU process, the attacker could easily take
control of the whole system.

dm_restrict is a set of operations to restrict QEMU running in domain
0.  It consists of two halves:

 1. Mechanisms to restrict QEMU to only being able to affect its own
domain
 2. Mechanisms to restrict QEMU's ability to interact with domain 0.

# User details

## Getting the right versions of software

Linux: 4.11+

QEMU: 3.0+ (Or the version that comes with Xen 4.12+)

## Setting up a group and userid range

For maximum security, libxl needs to run the devicemodel for each
domain under a user id (UID) corresponding to its domain id.  There
are 32752 possible domain IDs, and so libxl needs 32752 user ids set
aside for it.  Setting up a group for all devicemodels to run under is
also recommended.

The simplest and most effective way to do this is to allocate a
contiguous block of UIDs, and create a single user named
`xen-qemuuser-range-base` with the first UID.  For example, under
Debian:

    adduser --system --uid 131072 --group --no-create-home xen-qemuuser-range-base

Two comments on this method:

 1. Most modern systems have 32-bit UIDs, and so can in theory go up
to 2^31 (or 2^32 if uids are unsigned).  POSIX only guarantees 16-bit
UIDs however; UID 65535 is reserved for an invalid value, and 65534 is
normally allocated to "nobody".
 2. Additionally, some container systems have proposed using the
upper 16 bits of the uid for a container ID.
Using a multiple of 2^16
for the range base (as is done above) will result in all UIDs being
interpreted by such systems as a single container ID.

Another, less-secure way is to run all QEMUs as the same UID.  To do
this, create a user named `xen-qemuuser-shared`; for example:

    adduser --no-create-home --system xen-qemuuser-shared

A final way to run each QEMU as a separate user is to allocate one
UID per VM, and set the UID in the domain config file with the
`device_model_user` argument.  For example, suppose you have a VM
named `c6-01`.  You might do the following:

    adduser --system --no-create-home --group xen-qemuuser-c6-01

And then in your config file, the following line:

    device_model_user="xen-qemuuser-c6-01"

If you use this method, you should also allocate one "reaper" user to
be used for killing device models:

    adduser --system --no-create-home --group xen-qemuuser-reaper

NOTE: It is important when using `device_model_user` that EACH VM HAVE
A SEPARATE UID, and that none of these UIDs map to root.  xl will
throw an error if a uid maps to zero, but not if multiple VMs have the
same uid.  Multiple VMs with the same device model uid will cause
problems.

It is also important that `xen-qemuuser-reaper` not have any processes
associated with it, as they will be destroyed when deprivileged qemu
processes are destroyed.

## Domain config changes

The core domain config change is to add the following line to the
domain configuration:

    dm_restrict=1

This will perform a number of restrictions, outlined below in the
'Technical details' section.

# Technical details

See docs/design/qemu-deprivilege.md for technical details.
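
As a rough illustration of the UID range scheme described under
'Setting up a group and userid range', the following shell sketch
computes which UID a device model would get for a given domain ID.  It
assumes the UID is simply the UID of `xen-qemuuser-range-base` plus
the domain ID; that mapping, and the helper names `dm_uid_for_domid`
and `uid_upper16`, are assumptions made for this example and are not
taken from libxl.  It also checks that the whole range shares the same
upper 16 bits when the base is a multiple of 2^16.

```shell
#!/bin/sh
# Illustrative sketch only: assumes UID = range base + domain ID.
# Helper names are hypothetical and not part of any Xen tool.

MAX_DOMIDS=32752   # number of possible domain IDs, as stated above

# Print the UID for domain ID $2, given range base UID $1.
dm_uid_for_domid() {
    if [ "$2" -lt 0 ] || [ "$2" -ge "$MAX_DOMIDS" ]; then
        echo "domid out of range: $2" >&2
        return 1
    fi
    echo $(( $1 + $2 ))
}

# Print the upper 16 bits of UID $1 (what some container systems
# might interpret as a container ID).
uid_upper16() {
    echo $(( $1 >> 16 ))
}

# Example using the range base from the adduser command (131072 = 2 * 2^16).
base=131072
first=$(dm_uid_for_domid "$base" 0)
last=$(dm_uid_for_domid "$base" $((MAX_DOMIDS - 1)))
echo "UID range: $first-$last"
echo "upper 16 bits: $(uid_upper16 "$first") and $(uid_upper16 "$last")"
```

With the base above this prints a range of 131072-163823, and both
ends have the same upper 16 bits (2).  On a live system the base could
be read with `id -u xen-qemuuser-range-base` instead of being
hard-coded.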

# Limitations

The following features still need to be implemented:

* Inserting a new cdrom while the guest is running (xl cdrom-insert)
* Support for qdisk backends

A number of restrictions still need to be implemented.  A compromised
device model may be able to do the following:

* Delay or exploit weaknesses in the toolstack
* Launch "fork bombs" or other resource exhaustion attacks
* Make network connections on the management network
* Break out of the restrictions after migration

Additionally, getting PCI passthrough to work securely would require a
significant rework of how passthrough works at the moment.  It may be
implemented at some point but is not a near-term priority.

See SUPPORT.md for security support status.

# History

------------------------------------------------------------------------
Date       Revision Version  Notes
---------- -------- -------- -------------------------------------------
2018-09-14 1        Xen 4.12 Imported from docs/misc
---------- -------- -------- -------------------------------------------