Title   : How to do PCI Passthrough with VT-d
Authors : Allen Kay    <allen.m.kay@intel.com>
          Weidong Han  <weidong.han@intel.com>
          Yuji Shimada <shimada-yxb@necst.nec.co.jp>
Created : October-24-2007
Updated : July-07-2009

How to turn on VT-d in Xen
--------------------------

Xen with 2.6.18 dom0:
1 ) cd xen-unstable.hg
2 ) make install
3 ) make linux-2.6-xen-config CONFIGMODE=menuconfig
4 ) change XEN->"PCI-device backend driver" from "M" to "*".
5 ) make linux-2.6-xen-build
6 ) make linux-2.6-xen-install
7 ) depmod 2.6.18.8-xen
8 ) mkinitrd -v -f --with=ahci --with=aacraid --with=sd_mod --with=scsi_mod initrd-2.6.18-xen.img 2.6.18.8-xen
9 ) cp initrd-2.6.18-xen.img /boot
10) lspci - select the PCI BDF you want to assign to the guest OS
11) "hide" the PCI device from dom0, as in the following sample grub entry:

title Xen-Fedora Core (2.6.18-xen)
        root (hd0,0)
        kernel /boot/xen.gz com1=115200,8n1 console=com1 iommu=1
        module /boot/vmlinuz-2.6.18.8-xen root=LABEL=/ ro xencons=ttyS console=tty0 console=ttyS0 pciback.hide=(01:00.0)(03:00.0)
        module /boot/initrd-2.6.18-xen.img

    or use dynamic hiding via the PCI backend sysfs interface:
    a) check whether a driver is bound to the device
        ls -l /sys/bus/pci/devices/0000:01:00.0/driver
        ... /sys/bus/pci/devices/0000:01:00.0/driver -> ../../../../bus/pci/drivers/igb
    b) if so, unbind the driver first
        echo -n 0000:01:00.0 >/sys/bus/pci/drivers/igb/unbind
    c) add the device to the PCI backend
        echo -n 0000:01:00.0 >/sys/bus/pci/drivers/pciback/new_slot
    d) let the PCI backend bind to the device
        echo -n 0000:01:00.0 >/sys/bus/pci/drivers/pciback/bind

12) reboot the system (not required if you use the dynamic hiding method)
13) add a "pci" line to /etc/xen/hvm.conf listing the devices to assign
        pci = [ '01:00.0', '03:00.0' ]
14) start the hvm guest and use "lspci" to see the passthru device and
    "ifconfig" to see if an IP address has been assigned to NIC devices.
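
The dynamic hiding steps a) to d) above can be collected into a small
shell function. This is only a minimal sketch and not part of the Xen
tools: the function name hide_with_pciback and the SYSFS_PCI variable
are hypothetical. SYSFS_PCI exists only so that the function can be
tried against a scratch directory; it defaults to the real /sys/bus/pci.

```shell
#!/bin/sh
# Sketch: hide one PCI device from dom0 via the pciback sysfs interface.
# hide_with_pciback and SYSFS_PCI are illustrative names, not Xen tooling.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

hide_with_pciback() {
    bdf="$1"                              # full BDF, e.g. 0000:01:00.0
    dev="$SYSFS_PCI/devices/$bdf"
    pciback="$SYSFS_PCI/drivers/pciback"

    [ -d "$dev" ] || { echo "no such device: $bdf" >&2; return 1; }
    [ -d "$pciback" ] || { echo "pciback driver not available" >&2; return 1; }

    # a) + b) if a driver is bound to the device, unbind it first
    if [ -e "$dev/driver/unbind" ]; then
        printf '%s' "$bdf" > "$dev/driver/unbind" || return 1
    fi
    # c) add the device to the PCI backend
    printf '%s' "$bdf" > "$pciback/new_slot" || return 1
    # d) let the PCI backend bind to the device
    printf '%s' "$bdf" > "$pciback/bind"
}

# Usage (as root in dom0):
#   hide_with_pciback 0000:01:00.0
```

The function goes through the device's own "driver" link instead of
naming the driver (igb in the example above), so the same call works
whichever driver currently owns the device.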


Xen with pv-ops dom0:
1 ) cd xen-unstable.hg
2 ) make install
3 ) make linux-2.6-pvops-config CONFIGMODE=menuconfig
4 ) change Bus options (PCI etc.)->"PCI Stub driver" to "*".
5 ) make linux-2.6-pvops-build
6 ) make linux-2.6-pvops-install
7 ) mkinitrd -v -f --with=ahci --with=aacraid --with=sd_mod --with=scsi_mod initrd-2.6.30-rc3-tip.img 2.6.30-rc3-tip
     (replace 2.6.30-rc3-tip with the current pv-ops dom0 version once it is updated)
8 ) cp initrd-2.6.30-rc3-tip.img /boot
9 ) edit grub:

title Xen-Fedora Core (pv-ops)
        root (hd0,0)
        kernel /boot/xen.gz console=com1,vga console=com1 com1=115200,8n1 iommu=1
        module /boot/vmlinuz-2.6.30-rc3-tip root=LABEL=/ ro console=hvc0 earlyprintk=xen
        module /boot/initrd-2.6.30-rc3-tip.img

10) reboot the system
11) hide the device using pci-stub (example PCI device 01:00.0):

    - lspci -n
    - locate the entry for device 01:00.0 and note down the vendor &
      device ID (here "8086:10b9")
      ...
      01:00.0 0200: 8086:10b9 (rev 06)
      ...
    - then use the following commands to hide it:
      echo "8086 10b9" > /sys/bus/pci/drivers/pci-stub/new_id
      echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
      echo "0000:01:00.0" > /sys/bus/pci/drivers/pci-stub/bind

12) add a "pci" line to /etc/xen/hvm.conf listing the devices to assign
        pci = [ '01:00.0' ]
13) start the hvm guest and use "lspci" to see the passthru device and
    "ifconfig" to see if an IP address has been assigned to NIC devices.


Enable MSI/MSI-x for assigned devices
-------------------------------------
Add the "msi=1" option to the kernel line of the host grub entry.


MSI-INTx translation for passthrough devices in HVM
---------------------------------------------------

If the assigned device uses a physical IRQ that is shared by more than
one device among multiple domains, there may be significant impact on
device performance. Unfortunately, this is quite a common case if the
IO-APIC (INTx) IRQ is used.
MSI avoids this issue, but is only available if the guest enables it.

With MSI-INTx translation turned on, Xen enables device MSI if it's
available, regardless of whether the guest uses INTx or MSI. If the
guest uses INTx IRQ, Xen will inject a translated INTx IRQ into the
guest's virtual ioapic whenever an MSI message is received. This reduces
the interrupt sharing of the system. If the guest OS enables MSI or
MSI-X, the translation is automatically turned off.

To enable or disable MSI-INTx translation globally, add "pci_msitranslate"
in the config file:
        pci_msitranslate = 1         (default is 1)

To override for a specific device:
        pci = [ '01:00.0,msitranslate=0', '03:00.0' ]

RDM, 'reserved device memory', for PCI Device Passthrough
---------------------------------------------------------

The BIOS controls some devices itself, e.g. USB devices used for PS/2
emulation. The regions of memory used for these devices are marked
reserved in the e820 map. When we turn on DMA translation, DMA to those
regions will fail. Hence the BIOS uses RMRR to specify these regions,
along with the devices that need to access them. The OS is expected to
set up identity mappings so that these devices can access these regions.

While creating a VM we should reserve these regions in advance and avoid
any conflicts. So we introduce user-configurable parameters to specify
the RDM resource and the corresponding policies.

To enable this globally, add "rdm" in the config file:

        rdm = "strategy=host, policy=relaxed"   (default policy is "relaxed")

Or just for a specific device:

        pci = [ '01:00.0,rdm_policy=relaxed', '03:00.0,rdm_policy=strict' ]

For all the options available to RDM, see xl.cfg(5).
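
When several devices need different per-device options (msitranslate
from the previous section, rdm_policy from this one), the "pci" line can
be generated instead of typed by hand. The following is only a sketch:
pci_line is a hypothetical helper, not an xm/xl command, and it does not
validate BDFs or option names.

```shell
#!/bin/sh
# Sketch: build a "pci = [ ... ]" guest config line from device specs.
# Each argument is a BDF, optionally followed by comma-separated options,
# e.g. "01:00.0,msitranslate=0". pci_line is an illustrative name only.
pci_line() {
    list=""
    for spec in "$@"; do
        if [ -z "$list" ]; then
            list="'$spec'"
        else
            list="$list, '$spec'"
        fi
    done
    printf 'pci = [ %s ]\n' "$list"
}

# Usage:
#   pci_line 01:00.0,rdm_policy=relaxed 03:00.0,rdm_policy=strict
```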


Caveat on Conventional PCI Device Passthrough
---------------------------------------------

The VT-d spec specifies that all conventional PCI devices behind a
PCIe-to-PCI bridge have to be assigned to the same domain.

PCIe devices do not have this restriction.


VT-d Works on OS:
-----------------

1) Host OS: PAE, 64-bit
2) Guest OS: 32-bit, PAE, 64-bit


Combinations Tested:
--------------------

1) 64-bit host: 32/PAE/64 Linux/XP/Win2003/Vista guests
2) PAE host: 32/PAE Linux/XP/Win2003/Vista guests


VT-d device hotplug:
--------------------

2 virtual PCI slots (6~7) are reserved in an HVM guest to support VT-d
hotplug. If you have more VT-d devices, only 2 of them can support
hotplug. Usage is simple:

 1. List the VT-d devices by domain. Here, VT-d device 0:2:0.0 is
    inserted in the HVM domain's PCI slot 6. "lspci" inside the guest
    should show the same.

    [root@vt-vtd ~]# xm pci-list HVMDomainVtd
    VSlt domain   bus   slot   func
    0x6    0x0  0x02   0x00    0x0

 2. Detach the device from the guest by its physical BDF. The HVM guest
    will then receive a virtual PCI hot removal event to detach the
    physical device.

    [root@vt-vtd ~]# xm pci-detach HVMDomainVtd 0:2:0.0

 3. Attach a PCI device to the guest by its physical BDF and, optionally,
    the desired virtual slot. The following command inserts the physical
    device into the guest's virtual slot 7.

    [root@vt-vtd ~]# xm pci-attach HVMDomainVtd 0:2:0.0 7

    To specify options for the device, use -o or --options=. The
    following command disables MSI-INTx translation for the device.

    [root@vt-vtd ~]# xm pci-attach -o msitranslate=0 HVMDomainVtd 0:2:0.0 7


VT-d hotplug usage model:
-------------------------

 * For live migration: An assigned VT-d device breaks live migration,
   because a physical device can't be saved and restored like a virtual
   device. With hotplug, live migration is possible again.
   Just hot-remove all the VT-d devices before live migration and
   hot-add new VT-d devices on the target machine after live migration.

 * VT-d hotplug for device switch: VT-d hotplug can be used to
   dynamically switch a physical device between different HVM guests
   without shutting them down.


VT-d Enabled Systems
--------------------

1) For VT-d enabling work on Xen, we have been using development
systems with the following Intel motherboards:
    - DQ35MP
    - DQ35JO

2) As far as we know, the following OEM systems also have VT-d enabled.
Feel free to add others as they become available.

- Dell: Optiplex 755
http://www.dell.com/content/products/category.aspx/optix?c=us&cs=555&l=en&s=biz

- HP Compaq: DC7800
http://h10010.www1.hp.com/wwpc/us/en/en/WF04a/12454-12454-64287-321860-3328898.html

For more information, please refer to https://wiki.xen.org/wiki/VTdHowTo.


Assigning devices to HVM domains
--------------------------------

Most device types such as NIC, HBA, EHCI and UHCI can be assigned to
an HVM domain.

But some devices have design features which make them unsuitable for
assignment to an HVM domain. Examples include:

 * Device has an internal resource, such as private memory, which is
   mapped to memory address space with BAR (Base Address Register).
 * Driver submits a command with a pointer to a buffer within the
   internal resource. Device decodes the pointer (address), and
   accesses the buffer.

In an HVM domain, the BAR is virtualized, and the host-BAR value and
guest-BAR value are different. The addresses of the internal resource
from the device's view and the driver's view are different. Similarly,
the addresses of a buffer within the internal resource from the device's
view and the driver's view are different. As a result, the device can't
access the buffer specified by the driver.

Such devices currently do not work when assigned to an HVM domain.
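
To observe the host-BAR/guest-BAR mismatch described above, the
host-side BAR addresses can be read in dom0 and compared with what
"lspci -v" reports for the same device inside the guest. A minimal
sketch, assuming a Linux dom0 whose sysfs exposes the per-device
"resource" file (one "start end flags" line per region, unused regions
all zero). list_bars and SYSFS_PCI are hypothetical names; SYSFS_PCI is
there only so that the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: print the host-side address range of each populated BAR.
# list_bars and SYSFS_PCI are illustrative names, not Xen tooling.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

list_bars() {
    res="$SYSFS_PCI/devices/$1/resource"   # $1 is a full BDF, e.g. 0000:01:00.0
    [ -r "$res" ] || { echo "cannot read $res" >&2; return 1; }
    i=0
    while read -r start end flags; do
        # The first six lines are BAR0..BAR5; later lines (expansion ROM
        # and others) are skipped. An all-zero end marks an unused region.
        if [ "$i" -lt 6 ] && [ "$end" != "0x0000000000000000" ]; then
            echo "BAR$i $start-$end"
        fi
        i=$((i + 1))
    done < "$res"
}

# Usage (in dom0):
#   list_bars 0000:01:00.0
```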


Using SR-IOV with VT-d
----------------------

Single Root I/O Virtualization (SR-IOV) is a PCI Express feature
supported by some devices, such as the Intel 82576, which allows you to
create virtual PCI devices (Virtual Functions) and assign them to HVM
guests.

You can use a recent lspci (v3.1 and above) to check whether your PCIe
device supports the SR-IOV capability.

    $ lspci -s 01:00.0 -vvv

    01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
            Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter

            ...

            Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
                    IOVCap: Migration-, Interrupt Message Number: 000
                    IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
                    IOVSta: Migration-
                    Initial VFs: 8, Total VFs: 8, Number of VFs: 7, Function Dependency Link: 00
                    VF offset: 128, stride: 2, Device ID: 10ca
                    Supported Page Size: 00000553, System Page Size: 00000001
                    VF Migration: offset: 00000000, BIR: 0
            Kernel driver in use: igb


A function that has the SR-IOV capability is also known as a Physical
Function. You need the Physical Function driver (which runs in Dom0 and
controls the allocation of physical resources) to enable the Virtual
Functions. The following are the Virtual Functions associated with the
above Physical Function.

    $ lspci | grep -e '01:1[01]\.[0246]'

    01:10.0 Ethernet controller: Intel Corporation Device 10ca (rev 01)
    01:10.2 Ethernet controller: Intel Corporation Device 10ca (rev 01)
    01:10.4 Ethernet controller: Intel Corporation Device 10ca (rev 01)
    01:10.6 Ethernet controller: Intel Corporation Device 10ca (rev 01)
    01:11.0 Ethernet controller: Intel Corporation Device 10ca (rev 01)
    01:11.2 Ethernet controller: Intel Corporation Device 10ca (rev 01)
    01:11.4 Ethernet controller: Intel Corporation Device 10ca (rev 01)

We can tell that Physical Function 01:00.0 has 7 Virtual Functions
(01:10.0, 01:10.2, 01:10.4, 01:10.6, 01:11.0, 01:11.2, 01:11.4). The
PCI Configuration Space of a Virtual Function looks just like that of a
normal PCI device.

    $ lspci -s 01:10.0 -vvv

    01:10.0 Ethernet controller: Intel Corporation 82576 Gigabit Virtual Function
            Subsystem: Intel Corporation Gigabit Virtual Function
            Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
            Region 0: [virtual] Memory at d2840000 (64-bit, non-prefetchable) [size=16K]
            Region 3: [virtual] Memory at d2860000 (64-bit, non-prefetchable) [size=16K]
            Capabilities: [70] MSI-X: Enable+ Mask- TabSize=3
                    Vector table: BAR=3 offset=00000000
                    PBA: BAR=3 offset=00002000
            Capabilities: [a0] Express (v2) Endpoint, MSI 00

            ...


The Virtual Functions only appear after the Physical Function driver
is loaded. Once the Physical Function driver is unloaded, all Virtual
Functions associated with this Physical Function disappear.

A Virtual Function is essentially the same as a normal PCI device when
used in a VT-d environment.
You need to hide the Virtual Function, use the Virtual Function's bus,
device and function number in the HVM guest configuration file, and
then boot the HVM guest. You also need the Virtual Function driver,
which is a normal PCI device driver, in the HVM guest to drive the
Virtual Function. The PCIe SR-IOV specification requires that a Virtual
Function that uses interrupts supports only MSI/MSI-X. This means you
also need to enable Xen MSI support. Since the Virtual Functions are
dynamically allocated by the Physical Function driver, you might want
to use the dynamic hiding method mentioned above.
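
Because the Virtual Functions come and go with the Physical Function
driver, it can help to enumerate them at run time rather than hard-code
their BDFs. The sketch below assumes a dom0 kernel whose SR-IOV support
creates "virtfnN" symlinks under the Physical Function's sysfs
directory; list_vfs and SYSFS_PCI are hypothetical names, not part of
Xen or of any driver.

```shell
#!/bin/sh
# Sketch: print the full BDF of every Virtual Function of one Physical
# Function. list_vfs and SYSFS_PCI are illustrative names only.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

list_vfs() {
    pf="$SYSFS_PCI/devices/$1"            # $1 is the PF BDF, e.g. 0000:01:00.0
    [ -d "$pf" ] || { echo "no such device: $1" >&2; return 1; }
    for link in "$pf"/virtfn*; do
        # With no VFs the glob stays unexpanded and is skipped here.
        [ -L "$link" ] || continue
        # Each link points at ../<VF BDF>; keep only the BDF part.
        basename "$(readlink "$link")"
    done
}

# Usage (as root in dom0), hiding every VF with the PCI backend.
# A VF that already has a driver bound must be unbound first.
#   for vf in $(list_vfs 0000:01:00.0); do
#       echo -n "$vf" > /sys/bus/pci/drivers/pciback/new_slot
#       echo -n "$vf" > /sys/bus/pci/drivers/pciback/bind
#   done
```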