Title   : How to do PCI Passthrough with VT-d
Authors : Allen Kay    <allen.m.kay@intel.com>
          Weidong Han  <weidong.han@intel.com>
          Yuji Shimada <shimada-yxb@necst.nec.co.jp>
Created : October-24-2007
Updated : July-07-2009

How to turn on VT-d in Xen
--------------------------

Xen with 2.6.18 dom0:
1 ) cd xen-unstable.hg
2 ) make install
3 ) make linux-2.6-xen-config CONFIGMODE=menuconfig
4 ) change XEN->"PCI-device backend driver" from "M" to "*".
5 ) make linux-2.6-xen-build
6 ) make linux-2.6-xen-install
7 ) depmod 2.6.18.8-xen
8 ) mkinitrd -v -f --with=ahci --with=aacraid --with=sd_mod --with=scsi_mod initrd-2.6.18-xen.img 2.6.18.8-xen
9 ) cp initrd-2.6.18-xen.img /boot
10) lspci - select the PCI BDF you want to assign to the guest OS
11) "hide" the PCI device from dom0, as in the following sample grub entry:

title Xen-Fedora Core (2.6.18-xen)
        root (hd0,0)
        kernel /boot/xen.gz com1=115200,8n1 console=com1 iommu=1
        module /boot/vmlinuz-2.6.18.8-xen root=LABEL=/ ro xencons=ttyS console=tty0 console=ttyS0 pciback.hide=(01:00.0)(03:00.0)
        module /boot/initrd-2.6.18-xen.img

    or use dynamic hiding via the PCI backend sysfs interface:
        a) check whether a driver is bound to the device
            ls -l /sys/bus/pci/devices/0000:01:00.0/driver
            ... /sys/bus/pci/devices/0000:01:00.0/driver -> ../../../../bus/pci/drivers/igb
        b) if yes, unbind the driver first
            echo -n 0000:01:00.0 >/sys/bus/pci/drivers/igb/unbind
        c) add the device to the PCI backend
            echo -n 0000:01:00.0 >/sys/bus/pci/drivers/pciback/new_slot
        d) let the PCI backend bind to the device
            echo -n 0000:01:00.0 >/sys/bus/pci/drivers/pciback/bind
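
    To confirm the device is now owned by the PCI backend, re-check the
    driver symlink (a minimal check, reusing the example BDF above):
        ls -l /sys/bus/pci/devices/0000:01:00.0/driver
        ... /sys/bus/pci/devices/0000:01:00.0/driver -> ../../../../bus/pci/drivers/pciback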

12) reboot system (not required if you use the dynamic hiding method)
13) add a "pci" line in /etc/xen/hvm.conf for the devices to be assigned:
        pci = [ '01:00.0', '03:00.0' ]
14) start the hvm guest; use "lspci" inside the guest to see the
    passthrough device and "ifconfig" to check that an IP address has
    been assigned to the NIC.
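
    For example, assuming the guest config file is /etc/xen/hvm.conf
    (the in-guest commands are illustrative):

        xm create /etc/xen/hvm.conf
        # then, inside the guest:
        lspci        # the assigned devices should appear
        ifconfig -a  # check that the NIC received an IP address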


Xen with pv-ops dom0:
1 ) cd xen-unstable.hg
2 ) make install
3 ) make linux-2.6-pvops-config CONFIGMODE=menuconfig
4 ) change Bus options (PCI etc.)->"PCI Stub driver" to "*".
5 ) make linux-2.6-pvops-build
6 ) make linux-2.6-pvops-install
7 ) mkinitrd -v -f --with=ahci --with=aacraid --with=sd_mod --with=scsi_mod initrd-2.6.30-rc3-tip.img 2.6.30-rc3-tip
    (change 2.6.30-rc3-tip to the pv-ops dom0 version when it is updated in future)
8 ) cp initrd-2.6.30-rc3-tip.img /boot
9 ) edit grub:

title Xen-Fedora Core (pv-ops)
        root (hd0,0)
        kernel /boot/xen.gz console=com1,vga com1=115200,8n1 iommu=1
        module /boot/vmlinuz-2.6.30-rc3-tip root=LABEL=/ ro console=hvc0 earlyprintk=xen
        module /boot/initrd-2.6.30-rc3-tip.img

10) reboot system
11) hide the device using pci-stub (example PCI device 01:00.0):

    - run "lspci -n" and locate the entry for device 01:00.0, noting
      down the vendor & device ID (8086:10b9 in this example):
        ...
        01:00.0 0200: 8086:10b9 (rev 06)
        ...
    - then use the following commands to hide it:
        echo "8086 10b9" > /sys/bus/pci/drivers/pci-stub/new_id
        echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
        echo "0000:01:00.0" > /sys/bus/pci/drivers/pci-stub/bind
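
    As a quick check (same example BDF), the device's driver symlink
    should now point at pci-stub:
        ls -l /sys/bus/pci/devices/0000:01:00.0/driver
        ... /sys/bus/pci/devices/0000:01:00.0/driver -> ../../../../bus/pci/drivers/pci-stub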

12) add a "pci" line in /etc/xen/hvm.conf for the devices to be assigned:
        pci = [ '01:00.0' ]
13) start the hvm guest; use "lspci" inside the guest to see the
    passthrough device and "ifconfig" to check that an IP address has
    been assigned to the NIC.


Enable MSI/MSI-x for assigned devices
-------------------------------------
Add the "msi=1" option to the kernel line of the host grub entry.
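
For example (a sketch assuming the option belongs on the dom0 kernel's
command line, i.e. the vmlinuz module line of the 2.6.18 entry shown
earlier):

        module /boot/vmlinuz-2.6.18.8-xen root=LABEL=/ ro xencons=ttyS console=tty0 console=ttyS0 msi=1 pciback.hide=(01:00.0)(03:00.0)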


MSI-INTx translation for passthrough devices in HVM
---------------------------------------------------

If the assigned device uses a physical IRQ that is shared by more than
one device among multiple domains, there may be a significant impact on
device performance. Unfortunately, this is quite a common case when the
IO-APIC (INTx) IRQ is used. MSI can avoid this issue, but is only
available if the guest enables it.

With MSI-INTx translation turned on, Xen enables device MSI if it is
available, regardless of whether the guest uses INTx or MSI. If the
guest uses an INTx IRQ, Xen will inject a translated INTx IRQ into the
guest's virtual ioapic whenever an MSI message is received. This reduces
the interrupt sharing of the system. If the guest OS enables MSI or
MSI-X, the translation is automatically turned off.

To enable or disable MSI-INTx translation globally, add "pci_msitranslate"
in the config file:
	pci_msitranslate = 1         (default is 1)

To override for a specific device:
	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]

RDM, 'reserved device memory', for PCI Device Passthrough
---------------------------------------------------------

There are some devices the BIOS controls, e.g. USB devices used to
perform PS/2 emulation. The regions of memory used by these devices are
marked reserved in the e820 map. When we turn on DMA translation, DMA
to those regions will fail. Hence the BIOS uses RMRRs to specify these
regions, along with the devices that need to access them. The OS is
expected to set up identity mappings for these regions so that those
devices can access them.

When creating a VM we should reserve these regions in advance to avoid
any conflicts, so user-configurable parameters are provided to specify
the RDM resources and the corresponding policies.

To enable this globally, add "rdm" in the config file:

    rdm = "strategy=host, policy=relaxed"   (default policy is "relaxed")

Or just for a specific device:

    pci = [ '01:00.0,rdm_policy=relaxed', '03:00.0,rdm_policy=strict' ]

For all the options available to RDM, see xl.cfg(5).


Caveat on Conventional PCI Device Passthrough
---------------------------------------------

The VT-d spec specifies that all conventional PCI devices behind a
PCIe-to-PCI bridge have to be assigned to the same domain.

PCIe devices do not have this restriction.
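
For example, if two conventional PCI devices sit behind the same
PCIe-to-PCI bridge (hypothetical BDFs 04:00.0 and 04:01.0), they must
be assigned together to the same guest:

    pci = [ '04:00.0', '04:01.0' ]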


VT-d Works on OS:
-----------------

1) Host OS: PAE, 64-bit
2) Guest OS: 32-bit, PAE, 64-bit


Combinations Tested:
--------------------

1) 64-bit host: 32/PAE/64 Linux/XP/Win2003/Vista guests
2) PAE host: 32/PAE Linux/XP/Win2003/Vista guests


VTd device hotplug:
-------------------

2 virtual PCI slots (6~7) are reserved in the HVM guest to support VT-d
device hotplug. If you have more VT-d devices, only 2 of them can
support hotplug. Usage is simple:

 1. List the VT-d devices assigned to a domain. Here you can see that a
    VT-d device 0:2:0.0 is inserted in the HVM domain's PCI slot 6.
    "lspci" inside the guest should see the same.

	[root@vt-vtd ~]# xm pci-list HVMDomainVtd
	VSlt domain   bus   slot   func
	0x6    0x0  0x02   0x00    0x0

 2. Detach the device from the guest by the physical BDF. The HVM guest
    will then receive a virtual PCI hot removal event to detach the
    physical device.

	[root@vt-vtd ~]# xm pci-detach HVMDomainVtd 0:2:0.0

 3. Attach a PCI device to the guest by the physical BDF and the desired
    virtual slot (optional). The following command would insert the
    physical device into the guest's virtual slot 7:

	[root@vt-vtd ~]# xm pci-attach HVMDomainVtd 0:2:0.0 7

    To specify options for the device, use -o or --options=. The
    following command would disable MSI-INTx translation for the device:

	[root@vt-vtd ~]# xm pci-attach -o msitranslate=0 HVMDomainVtd 0:2:0.0 7


VTd hotplug usage model:
------------------------

 * For live migration: a VT-d device breaks live migration, since a
   physical device cannot be saved/restored like a virtual device. With
   hotplug, live migration works again: just hot-remove all VT-d devices
   before live migration and hot-add new VT-d devices on the target
   machine afterwards (see the sketch after this list).

 * VTd hotplug for device switch: VT-d hotplug can be used to
   dynamically switch a physical device between different HVM guests
   without shutting them down (also shown below).
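
   A minimal sketch of both flows ("xm migrate --live" is standard xm
   syntax; the domain names, target host and BDF are hypothetical):

	# switch a device from one running guest to another
	[root@vt-vtd ~]# xm pci-detach HVMDomainA 0:2:0.0
	[root@vt-vtd ~]# xm pci-attach HVMDomainB 0:2:0.0 6

	# live migration: hot-remove first, migrate, re-attach on the target
	[root@vt-vtd ~]# xm pci-detach HVMDomainA 0:2:0.0
	[root@vt-vtd ~]# xm migrate --live HVMDomainA target-host
	[target-host ~]# xm pci-attach HVMDomainA 0:2:0.0 6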


VT-d Enabled Systems
--------------------

1) For VT-d enabling work on Xen, we have been using development
systems built around the following Intel motherboards:
    - DQ35MP
    - DQ35JO

2) As far as we know, the following OEM systems also have VT-d enabled.
Feel free to add others as they become available.

- Dell: Optiplex 755
http://www.dell.com/content/products/category.aspx/optix?c=us&cs=555&l=en&s=biz

- HP Compaq:  DC7800
http://h10010.www1.hp.com/wwpc/us/en/en/WF04a/12454-12454-64287-321860-3328898.html

For more information, please refer to https://wiki.xen.org/wiki/VTdHowTo.


Assigning devices to HVM domains
--------------------------------

Most device types, such as NIC, HBA, EHCI and UHCI, can be assigned to
an HVM domain.

But some devices have design features which make them unsuitable for
assignment to an HVM domain. Examples include:

 * The device has an internal resource, such as private memory, which
   is mapped to the memory address space via a BAR (Base Address
   Register).
 * The driver submits a command containing a pointer to a buffer within
   the internal resource. The device decodes the pointer (address) and
   accesses the buffer.

In an HVM domain, the BAR is virtualized, so the host-BAR value and the
guest-BAR value differ. The address of the internal resource from the
device's view and from the driver's view are different; similarly, the
address of a buffer within the internal resource differs between the
device's view and the driver's view. As a result, the device cannot
access the buffer specified by the driver.

Such devices currently do not work when assigned to an HVM domain.


Using SR-IOV with VT-d
--------------------------------

Single Root I/O Virtualization (SR-IOV) is a PCI Express feature,
supported by some devices such as the Intel 82576, which allows you to
create virtual PCI devices (Virtual Functions) and assign them to HVM
guests.

You can use a recent lspci (v3.1 and above) to check whether your PCIe
device supports the SR-IOV capability:

  $ lspci -s 01:00.0 -vvv

  01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
        Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter

        ...

        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 8, Total VFs: 8, Number of VFs: 7, Function Dependency Link: 00
                VF offset: 128, stride: 2, Device ID: 10ca
                Supported Page Size: 00000553, System Page Size: 00000001
                VF Migration: offset: 00000000, BIR: 0
        Kernel driver in use: igb


The function that has the SR-IOV capability is also known as the
Physical Function. You need the Physical Function driver (which runs in
Dom0 and controls the physical resource allocation) to enable the
Virtual Functions. The following are the Virtual Functions associated
with the above Physical Function:

  $ lspci | grep -e 01:1[01].[0246]

  01:10.0 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:10.2 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:10.4 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:10.6 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:11.0 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:11.2 Ethernet controller: Intel Corporation Device 10ca (rev 01)
  01:11.4 Ethernet controller: Intel Corporation Device 10ca (rev 01)

We can tell that Physical Function 01:00.0 has 7 Virtual Functions
(01:10.0, 01:10.2, 01:10.4, 01:10.6, 01:11.0, 01:11.2, 01:11.4), and a
Virtual Function's PCI configuration space looks just like that of a
normal PCI device:

  $ lspci -s 01:10.0 -vvv

  01:10.0 Ethernet controller: Intel Corporation 82576 Gigabit Virtual Function
        Subsystem: Intel Corporation Gigabit Virtual Function
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Region 0: [virtual] Memory at d2840000 (64-bit, non-prefetchable) [size=16K]
        Region 3: [virtual] Memory at d2860000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [70] MSI-X: Enable+ Mask- TabSize=3
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [a0] Express (v2) Endpoint, MSI 00

        ...


The Virtual Functions only appear after the Physical Function driver
is loaded. Once the Physical Function driver is unloaded, all Virtual
Functions associated with this Physical Function disappear.

A Virtual Function is essentially the same as a normal PCI device when
used in a VT-d environment. You need to hide the Virtual Function, put
the Virtual Function's bus, device and function number in the HVM
guest's configuration file, and then boot the HVM guest. You also need
the Virtual Function driver, which is a normal PCI device driver, in
the HVM guest to drive the Virtual Function. The PCIe SR-IOV
specification requires that a Virtual Function support only MSI/MSI-x
if it uses interrupts, so you also need to enable Xen/MSI support.
Since the Virtual Functions are dynamically allocated by the Physical
Function driver, you might want to use the dynamic hiding method
mentioned above.
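
For example, a minimal end-to-end sketch for one igb Virtual Function
(the max_vfs module parameter, the VF vendor/device ID 8086:10ca and
the BDFs follow the example output above; adjust them for your device):

    # load the Physical Function driver and let it create 7 VFs
    modprobe igb max_vfs=7

    # hide one VF (01:10.0) with pci-stub, as described earlier
    echo "8086 10ca" > /sys/bus/pci/drivers/pci-stub/new_id
    # skip the unbind if no driver is currently bound to the VF
    echo "0000:01:10.0" > /sys/bus/pci/devices/0000:01:10.0/driver/unbind
    echo "0000:01:10.0" > /sys/bus/pci/drivers/pci-stub/bind

    # then list the VF in the guest config file and boot the guest:
    #     pci = [ '01:10.0' ]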
316