IOEMU stubdom
=============

This boosts HVM performance by putting ioemu in its own lightweight domain.

General Configuration
=====================

Due to a race between the creation of the IOEMU stubdomain itself and the
allocation of video memory for the HVM domain, you need to avoid the need for
ballooning, for instance by using the hypervisor dom0_mem= option.

Using with XL
-------------

To enable IOEMU stub domains, set the following in your domain config:

    device_model_stubdomain_override = 1

See xl.cfg(5) for more details of the xl domain configuration syntax
and https://wiki.xen.org/wiki/Device_Model_Stub_Domains for more
information on device model stub domains.

Toolstack to MiniOS ioemu stubdomain protocol
---------------------------------------------

This section describes the communication protocol between the toolstack and
qemu-traditional running in a MiniOS stubdomain. The protocol includes
expectations of both qemu and the stubdomain itself.

Setup (done by toolstack, expected by stubdomain):
 - Block devices for the target domain are connected as PV disks to the
   stubdomain, in configuration order, starting with xvda
 - Network devices for the target domain are connected as PV nics to the
   stubdomain, in configuration order, starting with 0
 - if graphics output is expected, VFB and VKB devices are set up for the
   stubdomain (their backend is responsible for exposing them using an
   appropriate protocol like VNC or Spice)
 - other devices of the target domain are not connected to the stubdomain at
   this point (they may be hot-plugged later)
 - the QEMU command line (space-separated arguments) is stored in the
   /vm/<target-uuid>/image/dmargs xenstore path
 - the target domain id is stored in the /local/domain/<stubdom-id>/target
   xenstore path
 ??
 - the bios type is stored in /local/domain/<target-id>/hvmloader/bios
 - stubdomain's console 0 is connected to the qemu log file
 - stubdomain's console 1 is connected to the qemu save file (for saving state)
 - stubdomain's console 2 is connected to the qemu save file (for restoring
   state)
 - further consoles are connected according to the target guest's serial
   console configuration

Startup:
1. the PV stubdomain is started with the ioemu-stubdom.gz kernel and no initrd
2. the stubdomain initializes the relevant devices
3. the stubdomain signals readiness by writing "running" to the
   /local/domain/<stubdom-id>/device-model/<target-id>/state xenstore path
4. the stubdomain is now considered running

Runtime control (hotplug etc.):
The toolstack can issue commands through xenstore; a sketch of this sequence
follows the command list below. The sequence is (from the toolstack's point of
view):
1. Write the parameter to
   /local/domain/<stubdom-id>/device-model/<target-id>/parameter.
2. Write the command to
   /local/domain/<stubdom-id>/device-model/<target-id>/command.
3. Wait for the command result in
   /local/domain/<stubdom-id>/device-model/<target-id>/state
   (a command-specific value).
4. Write "running" back to
   /local/domain/<stubdom-id>/device-model/<target-id>/state.

Defined commands:
 - "pci-ins" - PCI hot plug, results:
     - "pci-inserted" - success
     - "pci-insert-failed" - failure
 - "pci-rem" - PCI hot remove, results:
     - "pci-removed" - success
     - ??
 - "save" - save domain state to console 1, results:
     - "paused" - success
 - "continue" - resume domain execution, after loading state from console 2
   (requires the -loadvm command argument), results:
     - "running" - success
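As an illustration of the sequence above, here is a minimal sketch (not part
of libxl) of a toolstack-side client driving the "pci-ins" command with
libxenstore. The stubdomain id (5), target domain id (6) and the PCI address
are illustrative placeholders, and a real toolstack would use a xenstore watch
rather than polling:

    /*
     * Sketch of the runtime-control sequence above, using libxenstore.  The
     * stubdomain id (5), target domain id (6) and PCI address are
     * illustrative; a real toolstack would watch the state node instead of
     * polling it.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <xenstore.h>

    int main(void)
    {
        const char *dm = "/local/domain/5/device-model/6";
        struct xs_handle *xsh = xs_open(0);
        char path[256], *state;
        unsigned int len;

        if (!xsh)
            return 1;

        /* 1. write the parameter, 2. then the command */
        snprintf(path, sizeof(path), "%s/parameter", dm);
        xs_write(xsh, XBT_NULL, path, "0000:03:00.0", strlen("0000:03:00.0"));
        snprintf(path, sizeof(path), "%s/command", dm);
        xs_write(xsh, XBT_NULL, path, "pci-ins", strlen("pci-ins"));

        /* 3. wait for the command-specific result in the state node */
        snprintf(path, sizeof(path), "%s/state", dm);
        for (;;) {
            state = xs_read(xsh, XBT_NULL, path, &len);
            if (state && strcmp(state, "running"))
                break;           /* "pci-inserted" or "pci-insert-failed" */
            free(state);
            usleep(100000);
        }
        printf("pci-ins result: %s\n", state);
        free(state);

        /* 4. hand control back by writing "running" */
        xs_write(xsh, XBT_NULL, path, "running", strlen("running"));
        xs_close(xsh);
        return 0;
    }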
Toolstack to Linux ioemu stubdomain protocol
--------------------------------------------

This section describes the communication protocol between the toolstack and
qemu-upstream running in a Linux stubdomain. The protocol includes
expectations of both the stubdomain and qemu.

Setup (done by toolstack, expected by stubdomain):
 - Block devices for the target domain are connected as PV disks to the
   stubdomain, in configuration order, starting with xvda
 - Network devices for the target domain are connected as PV nics to the
   stubdomain, in configuration order, starting with 0
 - [not implemented] if graphics output is expected, VFB and VKB devices are
   set up for the stubdomain (their backend is responsible for exposing them
   using an appropriate protocol like VNC or Spice)
 - other devices of the target domain are not connected to the stubdomain at
   this point (they may be hot-plugged later)
 - the QEMU command line is stored in the /vm/<target-uuid>/image/dm-argv
   xenstore dir, each argument as a separate key of the form
   /vm/<target-uuid>/image/dm-argv/NNN, where NNN is the 0-padded argument
   number (see the sketch at the end of this section)
 - the target domain id is stored in the /local/domain/<stubdom-id>/target
   xenstore path
 ??
 - the bios type is stored in /local/domain/<target-id>/hvmloader/bios
 - stubdomain's console 0 is connected to the qemu log file
 - stubdomain's console 1 is connected to the qemu save file (for saving state)
 - stubdomain's console 2 is connected to the qemu save file (for restoring
   state)
 - further consoles are connected according to the target guest's serial
   console configuration

Environment exposed by the stubdomain to qemu (needed to construct an
appropriate qemu command line and later interact with QMP):
 - the target domain's disks are available as /dev/xvd[a-z]
 - console 2 (incoming domain state) must be connected to an FD and the
   command line argument $STUBDOM_RESTORE_INCOMING_ARG must be replaced with
   fd:$FD to form "-incoming fd:$FD"
 - console 1 (saving domain state) is added over QMP to qemu as "fdset-id 1"
   (done by the stubdomain, the toolstack doesn't need to care about it)
 - nics are connected to the relevant stubdomain PV vifs when available
   (qemu -netdev should specify ifname= explicitly)

Startup:
1. the toolstack starts a PV stubdomain with the stubdom-linux-kernel kernel
   and the stubdom-linux-initrd initrd
2. the stubdomain initializes the relevant devices
3. the stubdomain starts qemu with the requested command line, plus a few
   stubdomain-specific arguments, including local QMP access options
4. the stubdomain starts a vchan server on
   /local/domain/<stubdom-id>/device-model/<target-id>/qmp-vchan, exposing
   the QMP socket to the toolstack
5. qemu signals readiness by writing "running" to the
   /local/domain/<stubdom-id>/device-model/<target-id>/state xenstore path
6. the device model is now considered running

QEMU can be controlled using QMP over vchan at
/local/domain/<stubdom-id>/device-model/<target-id>/qmp-vchan. Only one
simultaneous connection is supported and the toolstack needs to ensure that.

Limitations:
 - PCI passthrough requires permissive mode
 - only one nic is supported
 - at most 26 emulated disks are supported (more are still available as PV
   disks)
 - graphics output (VNC/SDL/Spice) is not supported
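As an illustration of the dm-argv layout described in the setup list above,
here is a sketch of enumerating the stored arguments with libxenstore, e.g.
from dom0 for debugging. The UUID in the path is an illustrative placeholder;
the 0-padded key names are sorted to recover the argument order:

    /*
     * Sketch of walking the dm-argv directory described above.  The UUID is
     * an illustrative placeholder; error handling is minimal.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <xenstore.h>

    static int cmp_key(const void *a, const void *b)
    {
        return strcmp(*(char *const *)a, *(char *const *)b);
    }

    int main(void)
    {
        const char *dir =
            "/vm/00000000-0000-0000-0000-000000000000/image/dm-argv";
        struct xs_handle *xsh = xs_open(0);
        unsigned int num = 0, len, i;
        char path[256], **keys;

        if (!xsh)
            return 1;

        keys = xs_directory(xsh, XBT_NULL, dir, &num);
        if (keys)
            /* the 0-padded names sort into argument order */
            qsort(keys, num, sizeof(*keys), cmp_key);

        for (i = 0; keys && i < num; i++) {
            char *arg;
            snprintf(path, sizeof(path), "%s/%s", dir, keys[i]);
            arg = xs_read(xsh, XBT_NULL, path, &len);
            printf("argv[%u] = %s\n", i, arg ? arg : "(missing)");
            free(arg);
        }

        free(keys);
        xs_close(xsh);
        return 0;
    }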
PV-GRUB
=======

This replaces pygrub to boot domU images safely: it runs the regular grub
inside the created domain itself and uses regular domU facilities to read the
disk / fetch files from the network etc.; it eventually loads the PV kernel
and chain-boots it.

Configuration
=============

In your PV config,

- use pv-grub.gz as the kernel:

    kernel = "pv-grub.gz"

- set the path to menu.lst, as seen from the domU, in extra:

    extra = "(hd0,0)/boot/grub/menu.lst"

or you can provide the content of a menu.lst stored in dom0 by passing it as a
ramdisk:

    ramdisk = "/boot/domU-1-menu.lst"

or you can also use a tftp path (dhcp will be performed automatically):

    extra = "(nd)/somepath/menu.lst"

or you can set it in option 150 of your dhcp server and leave extra and
ramdisk empty (dhcp will be performed automatically).

Limitations
===========

- You can not boot a 64bit kernel with a 32bit-compiled PV-GRUB and
  vice-versa. To cross-compile a 32bit PV-GRUB,

    export XEN_TARGET_ARCH=x86_32

- bootsplash is supported, but the ioemu backend does not yet support restart
  for use by the booted kernel.

- PV-GRUB doesn't support virtualized partitions. For instance:

    disk = [ 'phy:hda7,hda7,w' ]

  will be seen by PV-GRUB as (hd0), not (hd0,6), since GRUB will not see any
  partition table.

Your own stubdom
================

By running

    cd stubdom/
    make c-stubdom

or

    cd stubdom/
    make caml-stubdom

you can compile examples of C or Caml stub domain kernels. You can use these
and the relevant Makefile rules as a basis to build your own stub domain
kernel. Available libraries are libc, libxc, libxs, zlib and libpci.
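For illustration, here is a minimal sketch of the kind of C payload such a
stub domain kernel can be built from, similar in spirit to the in-tree
c-stubdom example. It only relies on libc from the list above; a real stub
domain would call into libxc or libxs here to do useful work:

    /*
     * Minimal sketch of a custom C stub domain payload.  It only uses libc;
     * a real stub domain would use libxc/libxs here.
     */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        sleep(2);       /* give the console backend time to attach */
        printf("Hello from a custom stub domain!\n");

        /* returning from main() ends the stub domain */
        return 0;
    }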