Xen Performance Monitor
-----------------------

The xenmon tools use the existing Xen tracing feature to provide fine-grained
reporting of various domain-related metrics. Note that the xenmon.py script
included here is only an example of how the data may be displayed. The xenbaked
daemon keeps a large amount of history in a shared memory area that may be
accessed by tools such as xenmon.

For each domain, xenmon reports various metrics. One part of the display shows
metrics accumulated over the last second, while another part shows data
measured over 10 seconds. Other measurement intervals are possible; 1s and 10s
were simply chosen as examples.


Execution Count
---------------
 o The number of times that a domain was scheduled to run (i.e., dispatched)
   over the measurement interval


CPU usage
---------
 o Total time used over the measurement interval
 o Usage expressed as a percentage of the measurement interval
 o Average CPU time used during each execution of the domain


Waiting time
------------
This is how much time the domain spent waiting to run or, put another way, the
amount of time the domain spent in the "runnable" state (on the run queue)
but not actually running. Xenmon displays:

 o Total time waiting over the measurement interval
 o Wait time expressed as a percentage of the measurement interval
 o Average waiting time for each execution of the domain

Blocked time
------------
This is how much time the domain spent blocked (or sleeping); put another way,
the amount of time the domain spent not needing or wanting the CPU because it
was waiting for some event (e.g., I/O).
Xenmon reports:

 o Total time blocked over the measurement interval
 o Blocked time expressed as a percentage of the measurement interval
 o Blocked time per I/O (see I/O Count below)

Allocation time
---------------
This is how much CPU time was allocated to the domain by the scheduler. It is
distinct from CPU usage because the "time slice" given to a domain is
frequently cut short for one reason or another, e.g., the domain requests I/O
and blocks. Xenmon reports:

 o Average allocation time per execution (i.e., time slice)
 o Min and max allocation times

I/O Count
---------
This is a rough measure of I/O requested by the domain. The number of page
exchanges (or page "flips") between the domain and dom0 is counted. The
number of pages exchanged may not accurately reflect the number of bytes
transferred to or from a domain, because network protocols and other users
may fill pages only partially. It does, however, give a good sense of the
magnitude of I/O being requested by a domain. Xenmon reports:

 o Total number of page exchanges during the measurement interval
 o Average number of page exchanges per execution of the domain


Usage Notes and issues
----------------------
 - Start xenmon by simply running xenmon.py. The xenbaked daemon is started
   and stopped automatically by xenmon.
 - To see the various options for xenmon, run xenmon -h. xenbaked accepts -h
   in the same way.
 - xenmon also has an option (-n) to output log data to a file instead of the
   curses interface.
 - NDOMAINS is defined to be 32, but can be changed by recompiling xenbaked.
 - xenmon.py appears to create 1-2% CPU overhead. Part of this is just the
   overhead of the Python interpreter. Part of it may be the number of trace
   records being generated. The number of trace records generated can be
   limited by setting the trace mask (with a dom0 op), which controls which
   events cause a trace record to be emitted.
 - To exit xenmon, type 'q'.
 - To cycle the display to other physical CPUs, type 'c'.
 - The first time xenmon is run, it attempts to allocate Xen trace buffers
   using a default size. If you wish to use a non-default value for the
   trace buffer size, run the 'setsize' program (located in tools/xentrace)
   and specify the number of memory pages as a parameter. The default is 20.
 - xenmon is not well tested with domains using more than one virtual CPU.
 - If you create a lot of domains, or repeatedly kill a domain and restart it,
   and the domain IDs become larger than NDOMAINS, then xenmon behaves badly.
   This is a bug caused by xenbaked's treatment of domain IDs vs. domain
   indices in a data array. It will be fixed in a future release. Workaround:
   increase NDOMAINS in xenbaked and rebuild.

Future Work
-----------
 o RPC interface to allow external entities to programmatically access
   processed data
 o I/O Count batching to reduce the number of trace records generated

Case Study
----------
We have written a case study which demonstrates some of the usefulness of
this tool and the metrics reported. It is available at:
http://www.hpl.hp.com/techreports/2005/HPL-2005-187.html

Authors
-------
Diwaker Gupta <diwaker.gupta@hp.com>
Rob Gardner <rob.gardner@hp.com>
Lucy Cherkasova <lucy.cherkasova@hp.com>
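
Example: Derived Metrics
------------------------
The percentages and per-execution averages described in the metric sections
above all follow from a handful of raw totals collected over one measurement
interval. The sketch below illustrates that arithmetic only. It is not taken
from xenmon.py or xenbaked: the IntervalSample class, its field names, the
derive_metrics function and the use of nanoseconds are assumptions made for
the example.

```python
from dataclasses import dataclass


@dataclass
class IntervalSample:
    """Raw per-domain totals for one interval (hypothetical layout)."""
    interval_ns: int   # length of the measurement interval
    exec_count: int    # number of times the domain was dispatched
    cpu_ns: int        # total time spent running
    wait_ns: int       # total time runnable but not running
    blocked_ns: int    # total time blocked
    io_count: int      # page exchanges with dom0


def ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def percent(part, whole):
    """Express part as a percentage of whole (0.0 if whole is zero)."""
    return ratio(100.0 * part, whole)


def derive_metrics(s):
    """Compute the derived values described in the metric sections."""
    return {
        "cpu_pct": percent(s.cpu_ns, s.interval_ns),
        "cpu_per_exec_ns": ratio(s.cpu_ns, s.exec_count),
        "wait_pct": percent(s.wait_ns, s.interval_ns),
        "wait_per_exec_ns": ratio(s.wait_ns, s.exec_count),
        "blocked_pct": percent(s.blocked_ns, s.interval_ns),
        "blocked_per_io_ns": ratio(s.blocked_ns, s.io_count),
        "io_per_exec": ratio(s.io_count, s.exec_count),
    }


if __name__ == "__main__":
    # Made-up numbers for a 1 s interval, chosen only to show the arithmetic.
    sample = IntervalSample(
        interval_ns=1_000_000_000,
        exec_count=200,
        cpu_ns=250_000_000,
        wait_ns=50_000_000,
        blocked_ns=700_000_000,
        io_count=100,
    )
    for name, value in derive_metrics(sample).items():
        print(f"{name}: {value}")
```

A domain that was never dispatched, or that performed no I/O, would otherwise
cause a division by zero, so the sketch reports 0.0 in those cases. How
xenmon itself handles them is not stated in this document.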