-----------------------
XSM/FLASK Configuration
-----------------------

Xen provides a security framework called XSM, and FLASK is an implementation of
a security model using this framework (at the time of writing, it is the only
one). FLASK defines a mandatory access control policy providing fine-grained
controls over Xen domains, allowing the policy writer to define what
interactions between domains, devices, and the hypervisor are permitted.

Some examples of what FLASK can do:
 - Prevent two domains from communicating via event channels or grants
 - Control which domains can use device passthrough (and which devices)
 - Restrict or audit operations performed by privileged domains
 - Prevent a privileged domain from arbitrarily mapping pages from other domains

Some of these examples require dom0 disaggregation to be useful, since the
domain build process requires the ability to write to the new domain's memory.

Security Status of dom0 disaggregation
--------------------------------------

Xen supports disaggregation of various support and management
functions into their own domains, via the XSM mechanisms described in
this document.

However the implementations of these support and management interfaces
were originally written to be used only by the totally-privileged
dom0, and have not been reviewed for security when exposed to
supposedly-only-semi-privileged disaggregated management domains. But
such management domains are (in such a design) to be seen as
potentially hostile, e.g. due to privilege escalation following
exploitation of a bug in the management domain.

Until the interfaces have been properly reviewed for security against
hostile callers, the Xen.org security team intends (subject of course
to the permission of anyone disclosing to us) to handle these and
future vulnerabilities in these interfaces in public, as if they were
normal non-security-related bugs.

This applies only to bugs which do no more than reduce the security of
a radically disaggregated system to the security of a
non-disaggregated one. Here a "radically disaggregated system" is one
which uses the XSM mechanism to delegate the affected interfaces to
other-than-fully-trusted domains.

This policy does not apply to bugs which affect stub device models,
driver domains, or stub xenstored - even if those bugs do no worse
than reduce the security of such a system to one whose device models,
backend drivers, or xenstore, run in dom0.

For more information see https://xenbits.xen.org/xsa/advisory-77.html.

The following interfaces are covered by this statement. Interfaces
not listed here are considered safe for disaggregation; security
issues found in interfaces not listed here will be handled according
to the normal security problem response policy
https://www.xenproject.org/security-policy.html.

__HYPERVISOR_domctl (xen/include/public/domctl.h)

 All subops except the following are covered by this statement. (That
 is, only the subops below are considered safe for disaggregation.)

 * XEN_DOMCTL_ioport_mapping
 * XEN_DOMCTL_memory_mapping
 * XEN_DOMCTL_bind_pt_irq
 * XEN_DOMCTL_unbind_pt_irq

__HYPERVISOR_sysctl (xen/include/public/sysctl.h)

 All subops are covered by this statement. (That is, no subops are
 considered safe for disaggregation.)

__HYPERVISOR_memory_op (xen/include/public/memory.h)

 The following subops are covered by this statement. Subops not listed
 here are considered safe for disaggregation.

 * XENMEM_set_pod_target
 * XENMEM_get_pod_target
 * XENMEM_claim_pages


Setting up FLASK
----------------

Xen must be compiled with XSM and FLASK enabled; by default, the security
framework is disabled. To enable it, run 'make -C xen menuconfig' and enable
XSM and FLASK inside 'Common Features'; this change requires a make clean and
rebuild.

FLASK uses only one domain configuration parameter (seclabel) defining the
full security label of the newly created domain. If using the example policy,
"seclabel='system_u:system_r:domU_t'" is an example of a normal domain. The
labels are in the same format as SELinux labels; see http://selinuxproject.org
for more details on the use of the user, role, and optional MLS/MCS labels.

FLASK policy overview
---------------------

Most of FLASK policy consists of defining the interactions allowed between
different types (domU_t would be the type in this example). For simple policies,
only type enforcement is used and the user and role are set to system_u and
system_r for all domains.

The FLASK security framework is mostly configured using a security policy file.
It relies on the SELinux compiler "checkpolicy"; if this is available, the
policy will be compiled as part of the tools build. If hypervisor support for a
built-in policy is enabled ("Compile Xen with a built-in security policy"), the
policy will be built during the hypervisor build.

The policy is generated from definition files in tools/flask/policy. Most
changes to security policy will involve creating or modifying modules found in
tools/flask/policy/modules/. The modules.conf file there defines what modules
are enabled and has short descriptions of each module.

If not using the built-in policy, the XSM policy file needs to be copied to
/boot and loaded as a module by grub. The exact position and filename of the
module does not matter as long as it is after the Xen kernel; it is normally
placed either just above the dom0 kernel or at the end. Once dom0 is running,
the policy can be reloaded using "xl loadpolicy".

The example policy included with Xen demonstrates most of the features of FLASK
that can be used without dom0 disaggregation.
The main types for domUs are:

 - domU_t is a domain that can communicate with any other domU_t
 - isolated_domU_t can only communicate with dom0
 - prot_domU_t is a domain type whose creation can be disabled with a boolean
 - nomigrate_t is a domain that must be created via the nomigrate_t_building
   type, and whose memory cannot be read by dom0 once created

HVM domains with stubdomain device models also need a type for the stub domain.
The example policy defines dm_dom_t for the device model of a domU_t domain;
there are no device model types defined for the other domU types.

One disadvantage of using type enforcement to enforce isolation is that a new
type is needed for each group of domains. The user field can be used to address
this for the most common case of groups that can communicate internally but not
externally; see "Users and roles" below.

Type transitions
----------------

Xen defines a number of operations such as memory mapping that are necessary for
a domain to perform on itself, but are also undesirable to allow a domain to
perform on every other domain of the same label. While it is possible to address
this by only creating one domain per type, this solution significantly limits
the flexibility of the type system. Another method to address this issue is to
duplicate the permission names for every operation that can be performed on the
current domain or on other domains; however, this significantly increases the
necessary number of permissions and complicates the XSM hooks. Instead, this is
addressed by allowing a distinct type to be used for a domain's access to
itself. The same applies for a device model domain's access to its designated
target, allowing the IS_PRIV_FOR checks used in Xen's DAC model to be
implemented in FLASK.

Upon domain creation (or relabel), a type transition is computed using the
domain's label as the source and target.
The result of this computation is used
as the target when the domain accesses itself. In the example policy, this
computed type is the result of appending _self to a domain's type: domU_t_self
for domU_t. If no type transition rule exists, the domain will continue to use
its own label for both the source and target. An AVC message will look like:

    scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t_self

A similar type transition is done when a device model domain is associated with
its target using the set_target operation. The transition is computed with the
target domain as the source and the device model domain as the target: this
ordering was chosen in order to preserve the original label for the target when
no type transition rule exists. In the example policy, these computed types are
the result of appending _target to the domain.

Type transitions are also used to compute the labels of event channels.

Users and roles
---------------

The default user and role used for domains is system_u and system_r. Users are
visible in the labels of domains and associated objects (event channels); when
the vm_role module is enabled, "user_1:vm_r:domU_t" is a valid label for a
domain created by the user_1 user.

Access control rules involving users and roles are defined in a module's
constraints file (for example, vm_role.cons). The vm_role module defines one
role (vm_r) and three users (user_1 .. user_3), along with constraints that
prevent different users from communicating using grants or event channels, while
still allowing communication with the system_u user where dom0 resides.

Resource Policy
---------------

The example policy also includes a resource type (nic_dev_t) for device
passthrough, configured to allow use by domU_t.
To label the PCI device 3:2.0
for passthrough, run:

    tools/flask/utils/flask-label-pci 0000:03:02.0 system_u:object_r:nic_dev_t

This command must be rerun on each boot or after any policy reload.

When first loading or writing a policy, you should run FLASK in permissive mode
(flask=permissive on the command line) and check the Xen logs (xl dmesg) for AVC
denials before using it in enforcing mode (the default value of the boot
parameter, which can also be changed using xl setenforce). When using the
default types for domains (domU_t), the example policy shipped with Xen should
allow the same operations on or between domains as when not using FLASK.


MLS/MCS policy
--------------

If you want to use the MLS policy, then set TYPE=xen-mls in the policy Makefile
before building the policy. Note that the MLS constraints in policy/mls
are incomplete and are only a sample.


AVC denials
-----------

XSM:Flask will emit avc: denied messages when a permission is denied by the
policy, just like SELinux.
For example, if the HVM rules are removed from the
declare_domain and create_domain interfaces:

# xl dmesg | grep avc
(XEN) avc: denied { setparam } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { getparam } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { irqlevel } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { pciroute } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { setparam } for domid=4 scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { cacheattr } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { pcilevel } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm

Existing SELinux tools such as audit2allow can be applied to these denials, e.g.
xl dmesg | audit2allow

The generated allow rules can then be fed back into the policy by adding them to
a module, although manual review is advised and will often lead to adding
parameterized rules to the interfaces in xen.if to address the general case.


Device Labeling in Policy
-------------------------

FLASK is capable of labeling devices and enforcing policies associated with
them. There are two methods to label devices: dynamic labeling using
flask-label-pci or similar tools run in dom0, or static labeling defined in
policy. Static labeling will make security policy machine-specific and may
prevent the system from booting after any hardware changes (adding PCI cards,
memory, or even changing certain BIOS settings).
Dynamic labeling requires that
the domain performing the labeling be trusted to label all the devices in the
system properly.

IRQs, PCI devices, I/O memory and x86 IO ports can all have labels defined.
There are examples commented out in tools/flask/policy/policy/device_contexts.

Device Labeling
---------------

The "lspci -vvn" command can be used to output all the devices and identifiers
associated with them. For example, to label an Intel e1000e ethernet card the
lspci output is:

00:19.0 0200: 8086:10de (rev 02)
        Subsystem: 1028:0276
        Interrupt: pin A routed to IRQ 33
        Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at febd9000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at ecc0 [size=32]
        Kernel modules: e1000e

The labeling can be done with these lines in device_contexts:

pirqcon 33 system_u:object_r:nicP_t
iomemcon 0xfebe0-0xfebff system_u:object_r:nicP_t
iomemcon 0xfebd9 system_u:object_r:nicP_t
ioportcon 0xecc0-0xecdf system_u:object_r:nicP_t
pcidevicecon 0xc8 system_u:object_r:nicP_t

The PCI device label must be computed as the 32-bit SBDF number for the PCI
device. If the PCI device is aaaa:bb:cc.d or bb:cc.d, then the SBDF can be
calculated using:
    SBDF = (a << 16) | (b << 8) | (c << 3) | d

The AVC denials for IRQs, memory, ports, and PCI devices will normally contain
the ranges being denied to more easily determine what resources are required.
When running in permissive mode, only the first denial of a given
source/destination is printed to the log, so labeling devices using this method
may require multiple passes to find all required ranges.
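
The arithmetic behind the device_contexts example can be checked with a short
script. The following Python is only a sketch and is not part of the Xen tree:
every function name in it is hypothetical, and it assumes 4 KiB pages, which
is inferred from the iomemcon values in the example (address shifted right by
12 bits). Under the SBDF formula, 00:19.0 evaluates to 0xc8.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, inferred from the iomemcon example


def parse_pci_address(text):
    """Parse 'aaaa:bb:cc.d' or 'bb:cc.d' (hex fields) into four integers."""
    location, function = text.rsplit(".", 1)
    fields = location.split(":")
    if len(fields) == 2:
        # Short form without a segment: treat the segment as 0.
        fields.insert(0, "0")
    segment, bus, device = (int(field, 16) for field in fields)
    return segment, bus, device, int(function, 16)


def sbdf(segment, bus, device, function):
    """Apply SBDF = (a << 16) | (b << 8) | (c << 3) | d."""
    return (segment << 16) | (bus << 8) | (device << 3) | function


def frame_range(base, size):
    """First and last page frame numbers covering a memory region."""
    return base >> PAGE_SHIFT, (base + size - 1) >> PAGE_SHIFT


def port_range(base, size):
    """First and last I/O port of a region."""
    return base, base + size - 1


def context_line(keyword, low, high, label):
    """Format one device_contexts line, collapsing single-value ranges."""
    span = f"{low:#x}" if low == high else f"{low:#x}-{high:#x}"
    return f"{keyword} {span} {label}"


if __name__ == "__main__":
    label = "system_u:object_r:nicP_t"
    device_id = sbdf(*parse_pci_address("00:19.0"))
    print(context_line("iomemcon", *frame_range(0xFEBE0000, 128 * 1024), label))
    print(context_line("iomemcon", *frame_range(0xFEBD9000, 4 * 1024), label))
    print(context_line("ioportcon", *port_range(0xECC0, 32), label))
    print(context_line("pcidevicecon", device_id, device_id, label))
```

The region bases and sizes are taken from the lspci output in the example;
for another device they would have to be read from its own lspci output and
the resulting lines reviewed before being added to device_contexts.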