In the five years since the libvirt project started, alot has changed. The size of the libvirt API has increased dramatically; the number of languages you can access the API from has likewise grown to cover most important targets; libvirt has been translated to fit into several other object models; plugins have been developed to bind libvirt to other tools. At the same time many other libraries have grown up alongside libvirt, not least libguestfs, gtk-vnc and more recently spice-gtk. Together all these pieces provide a rich software development platform for people building virtualization management applications. A picture is worth 1000 words, so to keep this blog post short, here is the way I visualize the pieces in the virtualization tools platform, and a selection of the applications built on it (click to enlarge the image)

The base layer
- libvirt: the core hypervisor agnostic management API, coring virtual machines, host devices, networking, storage, security and more
- libvirt-qemu: a small set of QEMU specific APIs, such as the ability to talk to the QEMU monitor, or attach to externally launched QEMU guests. This library builds on top libvirt.
- libguestfs: the library for manipulating and accessing the contents of guest filesystem images. This uses libvirt for some actions internally. libguestfs has its own huge set of language bindings which are not shown in the diagram, for the sake of clarity. It will also soon be gaining a mapping into the GObject type system, which will help it play nicely with other GObject based APIs here.
Language bindings
The language bindings for libvirt aim to be a 1-for-1 export of the libvirt C API into the corresponding language. They generally don’t attempt to change the way the libvirt API looks or is structured. There is generally completely interoperability between all language bindings, so you can trivially have part of your application written in Perl and another part written in Java and play nicely together.
- libvirt-ocaml: a binding into the OCaml functional language
- libvirt-php: a binding into the PHP scripting language
- libvirt-perl: a binding into the Perl scripting language
- libvirt-python: a binding into the Python scripting language, which comes as a standard part of the libvirt package
- libvirt-java: a binding into the Java object language
- libvirt-ruby: a binding into the Ruby scripting language
- libvirt-csharp: a binding into the C# object language
Object mappings
The object mappings are distinct from language bindings, because they will often significantly change the structure of the libvirt API to fit in the requirement of the object system being targeted. Depending on the object systems involved, this translation might be lossless, thus an application generally has to pick one object system & stick with it. It is not a good idea to do a mixture of SNMP and QMF calls from the same application.
- libvirt-snmp: an agent for SNMP that translates from an SNMP MIB to libvirt API calls.
- libvirt-cim: an agent for CIM the translates from the DMTF virtualization schema to the libvirt API
- libvirt-qmf: an agent for Matahari that translates from a QMF schema to the libvirt API
Infrastructure plugins
Many common infrastructure applications can be extended by adding plugins for new functionality. This particularly common with network monitoring or performance collection applications. libvirt can of course be used to create plugins for such applications
- libvirt-collectd: a plugin for collectd that reports statistics on virtual machines
- libvirt-munin: a plugin for collectd that reports statistics on virtual machines
- libvirt-nagios: a plugin for nagious that reports where virtual machines are running
- fence-virt: a plugin for clustering software to allow virtual machines to be “fenced”
GObject layer
The development of a set of GObject based libraries came about after noticing that many users of the basic libvirt API were having to solve similar problems over & over. For example, every application wanted some programmatic way to extract info from XML documents. Many applications wanted libvirt translated into GObjects. Many applications needed a way to determine optimal hardware configuration for operating systems. The primary reasons for choosing to use GObject as the basis for these APIs was first to facilitate development of graphical desktop applications. With the advent of GObject Introspection, the even more compelling reason is that you get language bindings to all GObject libraries for free. Contrary to popular understanding, GObject is not solely for GTK based desktop applications. It is entirely independent of GTK and can be easily used from any conceivable application. If libvirt were to be started from scratch again today, it would probably go straight for GObject as the basis for the primary C library. It is that compelling.
- libosinfo: an API for managing metadata related to operating systems. It includes a database of operating systems with details such as common download URLs, magic byte sequences to identify ISO images, lists of supported hardware. In addition there is a database of hypervisors and their supported hardware. The API allows applications to determine the optimal virtual hardware configuration for deployment of an operating system on a particular hypervisor.
- gvnc: an API providing a client for the RFB protocol, used for VNC servers. The API facilitates the creation of new VNC client applications.
- spice-glib: an API providing a client for the SPICE protocol, used for SPICE servers. The API facilitates the creation of new SPICE client applications.
- libvirt-glib: an API binding the libvirt event loop into the GLib main loop, and translating libvirt errors into GLib errors.
- libvirt-gconfig: an API for generating and manipulating libvirt XML documents. It removes the need for application programmers to directly deal with raw XML themselves.
- libvirt-gobject: an API which translates the libvirt object model, also integrating them with the lbivirt-gconfig APIs.
- libvirt-sandbox: an API for building application sandboxes using virtualization technology.
GTK layer
- gtk-vnc: an API building on gvnc providing a GTK widget which acts as a VNC client. This is used in both virt-manager & virt-viewer
- spice-gtk: an API building on spice-glib providing a GTK widget which acts as a SPICE client. This is used in both virt-manager & virt-viewer
Applications
- python-virtinst: provides the original python virt-install command line tool, as well as a python API which is leveraged by virt-manager. The python-virtinst internal API was the motivation behind the libosinfo library and libvirt-gconfig library
- virt-manager: provides a general purpose desktop application for interacting with libvirt managed virtualization hosts. The virt-manager internal API was the motivation behind the libvirt-gobject library
- oVirt: the umbrella project for building an open source virtualized data center management application. Its VDSM component uses the libvirt python language bindings for managing KVM hosts
- OpenStack: the umbrella project for building an open source cloud management application. Its Nova component uses the libvirt python language bindings for managing KVM, Xen and LXC hosts.
- GNOME Boxes: the new GNOME desktop application for running virtual machines and accessing remote desktops. It uses libirt-gobject, libosinfo, gtk-vnc & spice-gtk via automatically generated vala bindings.
The Future
- Get oVirt, OpenStack, python-virtinst and virt-manager using the libosinfo library to centralize definitions of what hardware config to use for deploying operating systems
- Get oVirt & OpenStack using the libvirt-gconfig library to generate configuration, instead of building XML documents up through string concatenation
- Convert python-virtinst & virt-manager to use the libvirt-gconfig, libvirt-gobject libraries instead of their private internal equivalents
- Create a remote-viewer library which pulls in both gtk-vnc and spice-gtk in a higher level framework. This is essentially pulling the commonality out of virt-viewer, virt-manager and GNOME boxes use of gtk-vnc and spice-gtk.
- Create a libvirt-install library which provides APIs for provisioning operating systems. This would be pulling out commonality between the way python-virtinst, GNOME boxes and other applications deploy new operating systems. This would be a bridge layer between libosinfo and libvirt-gobject
There is undoubtably plenty of stuff I left out of this diagram & description. For example there are many other data center & cloud management projects that are based on libvirt, which I left out for clarity. There are plenty more libvirt plugins for other applications too, many I will never have heard about. No doubt our future plans will change too, as we adapt to new information. This should have given a good overview of how broad the open source virtualization tools software development ecosystem has become.
The general goal of the libvirt project is to provide an API definition and XML schema that is independent of any one hypervisor technology. This means that for every feature in KVM, we have to take a little time to carefully design a suitable API or XML schema to expose in libvirt, if none already exists. Sometimes we are lucky and features in KVM can be expressed in almost the same way in libvirt, but more often than not, this does not hold true – the design for libvirt may look somewhat different. For example, several KVM monitor commands at the KVM level may be exposed as a single API call in libvirt. Or conversely, several different libvirt APIs may all end up invoking the same underlying KVM monitor commands. The need to put careful thought into libvirt API & XML design means that there may be a short delay between a feature appearing in KVM, and it appearing in libvirt, though it can also be the other way around, where libvirt has a feature but KVM doesn’t provide a way to support it yet!
For many applications / developers this delay is a non-issue, since they don’t need to be on the absolute bleeding edge of development of KVM or libvirt, but there are always exceptions to the rule. For a long time, we did not want to enable direct use of hypervisor specific features at all via libvirt, because of the support implications of doing so. A little over a year ago though, we decide to change our position on this matter. The key to this change in policy was deciding how to clearly demarcate functionality which is long term supportable, vs that which is not.
KVM custom command line arguments
To support custom command line argument passthrough for KVM, we decided to introduce a new set of XML elements in a different XML namespace. This namespace is defined:
xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'
Our policy is that any guest configuration that uses this QEMU XML namespace is not guaranteed to continue working if either libvirt or KVM are upgraded. A change to the way libvirt generates command line arguments may break the arguments the app has passed through. Alternatively KVM itself may drop or change the semantics of an existing command line argument. In other words applications should not rely on this capability long term, rather they should raise an RFE against libvirt to support it via the primary XML namespace, and just use the QEMU namespace until the RFE is complete.
The QEMU namespace defines syntax for passing arbitrary command line arguments, along with arbitrary environment variables:
<domain type='qemu' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<name>QEMUGuest1</name>
<uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
...
<devices>
<emulator>/usr/bin/qemu</emulator>
<disk type='block' device='disk'>
<source dev='/dev/HostVG/QEMUGuest1'/>
<target dev='hda' bus='ide'/>
</disk>
...
</devices>
<qemu:commandline>
<qemu:arg value='-unknown'/>
<qemu:arg value='parameter'/>
<qemu:env name='NS' value='ns'/>
<qemu:env name='BAR'/>
</qemu:commandline>
</domain>
Remember that when adding the <qemu:commandline> element, you need to always declare the XML namespace ‘xmlns:qemu’ on the top level <domain> element.
KVM monitor command passthrough
To support custom monitor command passthrough for KVM, we decided to introduce a second ELF library, libvirt-qemu.so, and separate header file /usr/include/libvirt/libvirt-qemu.h. An application that uses APIs in libvirt-qemu.so is not guaranteed to continue working if either libvirt or KVM are upgraded. A change to the way libvirt manages guests may conflict with the monitor commands the app is trying to issue. Alternatively KVM itself may drop or change the semantics of an existing monitor commands. In other words applications should not rely on this capability long term, rather they should raise an RFE against libvirt to support it via the primary library API, and just use libvirt-qemu.so until the RFE is complete.
Currently the libvirt-qemu.h defines two custom APIs
typedef enum {
VIR_DOMAIN_QEMU_MONITOR_COMMAND_DEFAULT = 0,
VIR_DOMAIN_QEMU_MONITOR_COMMAND_HMP = (1 << 0), /* cmd is in HMP */
} virDomainQemuMonitorCommandFlags;
int virDomainQemuMonitorCommand(virDomainPtr domain, const char *cmd,
char **result, unsigned int flags);
virDomainPtr virDomainQemuAttach(virConnectPtr domain,
unsigned int pid,
unsigned int flags);
The first allows passthrough of arbitrary monitor commands, while the latter allows attachment to an existing QEMU instance as discussed previously. The monitor command API is quite straighforward, it accepts a string command, and returns a string reply. The data for the command/reply can be either in HMP or QMP syntax, depending on how QEMU was launched by libvirt. The VIR_DOMAIN_QEMU_MONITOR_COMMAND_HMP flag allows an application to force use of the HMP syntax at all times.
Using monitor command passthrough from virsh
Not all users will be writing directly to the libvirt API, so the monitor command passthrough is also wired up into virsh via the “qemu-monitor-command” API. First is an example using QMP (JSON syntax):
$ virsh qemu-monitor-command vm-vnc '{ "execute": "query-block"}'
{"return":[{"device":"drive-virtio-disk0","locked":false,"removable":false,"inserted":{"ro":false,"drv":"qcow2","encrypted":false,"file":"/home/berrange/VirtualMachines/plain.qcow"},"type":"unknown"}],"id":"libvirt-9"}
And second is an example demonstrating use of HMP with a guest that runs QMP (libvirt automagically redirects via the ‘human-monitor-command’ command)
$ virsh qemu-monitor-command --hmp vm-vnc 'info block'
drive-virtio-disk0: removable=0 file=/home/berrange/VirtualMachines/plain.qcow ro=0 drv=qcow2 encrypted=0
Tainting of guests
Anyone familiar with the kernel will know that it marks itself as tainted whenever the user does something that is outside the boundaries of normal support. We have borrowed this idea from the kernel and apply it to guests run by libvirt too. Any attempt to use either the command line argument passthrough via XML, or QEMU monitor command passthrough via libvirt-qemu.so will result in the guest domain being marked as tainted. This shows up in the libvirt log files. For example after that last example, $HOME/.libvirt/qemu/log/vm-vnc.log shows the following
Domain id=2 is tainted: custom-monitor
This allows OS distro support staff to determine if something unusal has been done to a guest when they see support tickets raised. Depending on the OS distro’s support policy they may decline to support problem arising from tainted guests. In RHEL for example, any usage of QEMU monitor command passthrough, or command line argument passthrough is outside the bounds of libvirt support, and users would normally be asked to try to reproduce any problem without a tainted guest.
A couple of releases back I completely re-structured all the RPC handling code inside libvirt to make sure it could be properly shared between the client and server, as well as decoupling the RPC handling code from the implementation of the RPC functions. As part of this work I introduced a fairly comprehensive set of DTrace static probe points into the libvirt RPC code. While one could write a WireShark plugin that is able to decode the libvirt RPC protocol (oh look Michal already has written one), that would not be able to examine encrypted libvirt connections – which is pretty much all of them. By using static probes in the libvirt RPC code we can see the RPC messages being sent and received before/after encryption has been applied.
The observant will notice that I said I inserted DTrace static probes, while this blog subject line says SystemTAP. Well the SystemTAP developers had the good sense to make their userspace probing infrastructure support the DTrace static probe marker syntax. So inserting DTrace static probes into userspace code, trivially enables support for both DTrace and SystemTAP. I previously added DTrace probe support to QEMU/KVM and was very happy when Bryan Cantrill told me (at the recent KVM forum) that the DTrace probe support I added to KVM only needed minor build system tweaks to work on Solaris, despite my only ever having tested with Linux + SystemTAP.
Along with adding the DTrace markers to the libvirt RPC code, I also created two SystemTAP tapset files to make it simpler to use the probes from SystemTAP scripts. The first, /usr/share/systemtap/tapset/libvirt_probe.stp, contains the actual probe points, grouped by functional area, while the second, /usr/share/systemtap/tapset/libvirt_functions.stp, contains a bunch of helper functions for converting enum values into human friendly strings. The idea is that instead of seeing “Procedure 53″, a sysadmin would much rather see “Procedure domain_dump_core”. I won’t go into detail about what is in those two files here, instead I’ll just illustrate their use
Tracing the RPC client
Lets says we first want to see what messages the client is sending and receiving. There are two interesting probes here, “libvirt.rpc.server_client_msg_rx” and “libvirt.rpc.server_client_msg_tx_queue“. The former is triggered when a complete RPC message has been read off the wire, while the latter is triggered when an RPC message is queued for transmission. Ideally we would also have another probe triggered when an RPC message has been completely transmitted – that’s a future todo item. Simple usage of these two probes would be
# cat > demo.stp <<EOF
probe libvirt.rpc.client_msg_rx {
printf("client=%p len=%d program=%d version=%d procedure=%d type=%d status=%d serial=%d\n",
client, len, prog, vers, proc, type, status, serial);
}
probe libvirt.rpc.client_msg_tx_queue {
printf("client=%p len=%d program=%s version=%d procedure=%s type=%s status=%d serial=%d\n",
client, len, prog, vers, proc, type, status, serial);
}
EOF
# stap demo.stp
client=0x7f827c3b1010 len=28 program=536903814 version=1 procedure=66 type=0 status=0 serial=0
client=0x7f827c3b1010 len=36 program=536903814 version=1 procedure=66 type=1 status=0 serial=0
client=0x7f827c3b1010 len=40 program=536903814 version=1 procedure=1 type=0 status=0 serial=1
client=0x7f827c3b1010 len=28 program=536903814 version=1 procedure=1 type=1 status=0 serial=1
client=0x7f827c3b1010 len=28 program=536903814 version=1 procedure=110 type=0 status=0 serial=2
client=0x7f827c3b1010 len=48 program=536903814 version=1 procedure=110 type=1 status=0 serial=2
client=0x7f827c3b1010 len=28 program=536903814 version=1 procedure=2 type=0 status=0 serial=3
client=0x7f827c3b1010 len=28 program=536903814 version=1 procedure=2 type=1 status=0 serial=3
The example shows the results of running “virsh domname vm1″. There are 4 RPC calls made here, 66 (authenticate), 1 (open), 110 (get uri), 2 (close).
Tracing the client with friendly output
Unless you have memorized libvirt RPC enums, this isn’t a very friendly way to trace the code. This is where the aforementioned libvirt_functions.stp tapset comes into play.
# cat > demo.stp <<EOF
probe libvirt.rpc.client_msg_rx {
printf("R client=%p len=%d program=%s version=%d procedure=%s type=%s status=%s serial=%d\n",
client, len,
libvirt_rpc_program_name(prog, 0),
vers,
libvirt_rpc_procedure_name(prog, vers, proc, 0),
libvirt_rpc_type_name(type, 0),
libvirt_rpc_status_name(status, 0),
serial);
}
probe libvirt.rpc.client_msg_tx_queue {
printf("T client=%p len=%d program=%s version=%d procedure=%s type=%s status=%s serial=%d\n",
client, len,
libvirt_rpc_program_name(prog, 0),
vers,
libvirt_rpc_procedure_name(prog, vers, proc, 0),
libvirt_rpc_type_name(type, 0),
libvirt_rpc_status_name(status, 0),
serial);
}
EOF
# stap demo.stp
T client=0x7f3e3dec0010 len=28 program=remote version=1 procedure=auth_list type=call status=ok serial=0
R client=0x7f3e3dec0010 len=36 program=remote version=1 procedure=auth_list type=reply status=ok serial=0
T client=0x7f3e3dec0010 len=40 program=remote version=1 procedure=open type=call status=ok serial=1
R client=0x7f3e3dec0010 len=28 program=remote version=1 procedure=open type=reply status=ok serial=1
T client=0x7f3e3dec0010 len=28 program=remote version=1 procedure=get_uri type=call status=ok serial=2
R client=0x7f3e3dec0010 len=48 program=remote version=1 procedure=get_uri type=reply status=ok serial=2
T client=0x7f3e3dec0010 len=28 program=remote version=1 procedure=close type=call status=ok serial=3
R client=0x7f3e3dec0010 len=28 program=remote version=1 procedure=close type=reply status=ok serial=3
Much more friendly !
Tracing the server at the same time
It might desirable to see when the server itself receives the message, independently of when the client transmitted it. There are an identical set of probes available in the server, just replace ‘client’ with ‘server_client’ in the above examples. Thus the demo script can trivially be extended to show server messages at the same time:
# cat >> demo.stp << EOF
probe libvirt.rpc.server_client_msg_rx {
printf("R server=%p len=%d program=%s version=%d procedure=%s type=%s status=%s serial=%d\n",
client, len,
libvirt_rpc_program_name(prog, 0),
vers,
libvirt_rpc_procedure_name(prog, vers, proc, 0),
libvirt_rpc_type_name(type, 0),
libvirt_rpc_status_name(status, 0),
serial);
}
probe libvirt.rpc.server_client_msg_tx_queue {
printf("T server=%p len=%d program=%s version=%d procedure=%s type=%s status=%s serial=%d\n",
client, len,
libvirt_rpc_program_name(prog, 0),
vers,
libvirt_rpc_procedure_name(prog, vers, proc, 0),
libvirt_rpc_type_name(type, 0),
libvirt_rpc_status_name(status, 0),
serial);
}
# stap demo.stp
T client=0x7ff3c4855010 len=28 program=remote version=1 procedure=auth_list type=call status=ok serial=0
R server=0x17a2070 len=28 program=remote version=1 procedure=auth_list type=call status=ok serial=0
T server=0x17a2070 len=36 program=remote version=1 procedure=auth_list type=reply status=ok serial=0
R client=0x7ff3c4855010 len=36 program=remote version=1 procedure=auth_list type=reply status=ok serial=0
T client=0x7ff3c4855010 len=40 program=remote version=1 procedure=open type=call status=ok serial=1
R server=0x17a2070 len=40 program=remote version=1 procedure=open type=call status=ok serial=1
T server=0x17a2070 len=28 program=remote version=1 procedure=open type=reply status=ok serial=1
R client=0x7ff3c4855010 len=28 program=remote version=1 procedure=open type=reply status=ok serial=1
T client=0x7ff3c4855010 len=28 program=remote version=1 procedure=get_uri type=call status=ok serial=2
R server=0x17a2070 len=28 program=remote version=1 procedure=get_uri type=call status=ok serial=2
T server=0x17a2070 len=48 program=remote version=1 procedure=get_uri type=reply status=ok serial=2
R client=0x7ff3c4855010 len=48 program=remote version=1 procedure=get_uri type=reply status=ok serial=2
T client=0x7ff3c4855010 len=28 program=remote version=1 procedure=close type=call status=ok serial=3
R server=0x17a2070 len=28 program=remote version=1 procedure=close type=call status=ok serial=3
T server=0x17a2070 len=28 program=remote version=1 procedure=close type=reply status=ok serial=3
R client=0x7ff3c4855010 len=28 program=remote version=1 procedure=close type=reply status=ok serial=3
If the server is running on a different host than the client, just copy the demo.stp script to the other host and run a second copy there.
Further extensions
There are many further improvements that can be made to this script
- Display a timestamp on each message
- Associate each server side message with an individual socket
- Display payload length
- Display a message when the script is actually ready to run
To simplify life, we are maintaining a nice feature demonstration of the RPC SystemTAP probes in the libvirt GIT repository in the
examples/systemtap/rpc-monitor.stp file.
Here is what it can print out
0.000 begin
2.632 C + 0x7f1ea57dc010 local=127.0.0.1;0 remote=127.0.0.1;0
2.632 C > 0x7f1ea57dc010 msg=remote.1.auth_list(call, ok, 0) len=28
2.632 + S 0x1c1f710 local=127.0.0.1;0 remote=127.0.0.1;0
2.632 > S 0x1c1f710 msg=remote.1.auth_list(call, ok, 0) len=28
2.633 < S 0x1c1f710 msg=remote.1.auth_list(reply, ok, 0) len=36
2.633 C < 0x7f1ea57dc010 msg=remote.1.auth_list(reply, ok, 0) len=36 2.633 C > 0x7f1ea57dc010 msg=remote.1.open(call, ok, 1) len=40
2.633 > S 0x1c1f710 msg=remote.1.open(call, ok, 1) len=40
2.639 < S 0x1c1f710 msg=remote.1.open(reply, ok, 1) len=28
2.639 C < 0x7f1ea57dc010 msg=remote.1.open(reply, ok, 1) len=28 2.639 C > 0x7f1ea57dc010 msg=remote.1.get_uri(call, ok, 2) len=28
2.639 > S 0x1c1f710 msg=remote.1.get_uri(call, ok, 2) len=28
2.639 < S 0x1c1f710 msg=remote.1.get_uri(reply, ok, 2) len=48
2.640 C < 0x7f1ea57dc010 msg=remote.1.get_uri(reply, ok, 2) len=48 2.640 C > 0x7f1ea57dc010 msg=remote.1.domain_lookup_by_id(call, ok, 3) len=32
2.640 > S 0x1c1f710 msg=remote.1.domain_lookup_by_id(call, ok, 3) len=32
2.640 < S 0x1c1f710 msg=remote.1.domain_lookup_by_id(reply, error, 3) len=180
2.641 C < 0x7f1ea57dc010 msg=remote.1.domain_lookup_by_id(reply, error, 3) len=180 2.641 C > 0x7f1ea57dc010 msg=remote.1.close(call, ok, 4) len=28
2.641 > S 0x1c1f710 msg=remote.1.close(call, ok, 4) len=28
2.641 < S 0x1c1f710 msg=remote.1.close(reply, ok, 4) len=28
2.641 C < 0x7f1ea57dc010 msg=remote.1.close(reply, ok, 4) len=28
2.641 C - 0x7f1ea57dc010 local= remote=
2.641 - S 0x1c1f710 local=127.0.0.1;0 remote=127.0.0.1;0
Tracing other areas of libvirt code
The RPC code is not the only place with SystemTAP/DTrace probe markers in libvirt. We have also instrumented our main event loop and provide an examples/systemtap/events.stp demo that prints out info like this
0.000 begin
2.359 18185 + handle 1 4 1
2.360 18185 + handle 2 6 1
2.360 18185 * handle 2 0
2.360 14370 > handle 3 1
2.360 14370 + handle 33 16 1
2.361 14370 ~ 7 -1
2.361 14370 > handle 33 1
2.361 14370 * handle 33 1
2.361 14370 * handle 33 1
2.361 14370 * handle 33 3
2.361 14370 ~ 7 -1
2.361 14370 > handle 1 1
2.361 14370 ~ 7 -1
2.361 14370 > handle 33 2
2.361 14370 * handle 33 1
2.361 14370 ~ 7 -1
2.361 18185 * handle 2 1
2.362 18185 * handle 2 0
And finally we have instrumented our code which talks to the QEMU monitor, again providing a demo examples/systemtap/qemu-monitor.stp which prints out info like this
0.000 begin
3.848 ! 0x7f2dc00017b0 {"timestamp": {"seconds": 1319466931, "microseconds": 187755}, "event": "SHUTDOWN"}
5.773 > 0x7f2dc0007960 {"execute":"qmp_capabilities","id":"libvirt-1"}
5.774 < 0x7f2dc0007960 {"return": {}, "id": "libvirt-1"} 5.774 > 0x7f2dc0007960 {"execute":"query-commands","id":"libvirt-2"}
5.777 < 0x7f2dc0007960 {"return": [{"name": "quit"}, {"name": ....snip.... 5.777 > 0x7f2dc0007960 {"execute":"query-chardev","id":"libvirt-3"}
5.778 < 0x7f2dc0007960 {"return": [{"filename": ....snip.... 5.779 > 0x7f2dc0007960 {"execute":"query-cpus","id":"libvirt-4"}
5.780 < 0x7f2dc0007960 {"return": [{"current": true, "CPU": 0, "pc": 1048560, "halted": false, "thread_id": 13299}], "id": "libvirt-4"} 5.780 > 0x7f2dc0007960 {"execute":"set_password","arguments":{"protocol":"vnc","password":"123456","connected":"keep"},"id":"libvirt-5"}
5.782 < 0x7f2dc0007960 {"return": {}, "id": "libvirt-5"} 5.782 > 0x7f2dc0007960 {"execute":"expire_password","arguments":{"protocol":"vnc","time":"never"},"id":"libvirt-6"}
5.783 < 0x7f2dc0007960 {"return": {}, "id": "libvirt-6"} 5.783 > 0x7f2dc0007960 {"execute":"balloon","arguments":{"value":224395264},"id":"libvirt-7"}
5.785 < 0x7f2dc0007960 {"return": {}, "id": "libvirt-7"} 5.785 > 0x7f2dc0007960 {"execute":"cont","id":"libvirt-8"}
5.789 ! 0x7f2dc0007960 {"timestamp": {"seconds": 1319466933, "microseconds": 129980}, "event": "RESUME"}
5.789 < 0x7f2dc0007960 {"return": {}, "id": "libvirt-8"}
7.537 ! 0x7f2dc0007960 {"timestamp": {"seconds": 1319466934, "microseconds": 881214}, "event": "SHUTDOWN"}
Conclusion
The introduction of static probes into the libvirt code has been enormously helpful in understanding the operation of libvirt. While we have comprehensive debug logging present in libvirt is it hard to tailor the output to show the precise data desired. Traditional debuggers like GDB are not very practical when trying to understand the live operation of a heavily multi-threaded system crossing multiple processes, and while strace is useful in some scenarios it is too low level to be useful in most scenarios. SystemTAP userspace probing provides the kind of debugging experience / tool that really suits understanding the complex interactions in a system like libvirt. It is no co-incidence that the first set of probes we have written have focused on the libvirt event loop, RPC code and QEMU monitor – three of the areas in libvirt which are both very critical operationally, and exceptionally hard to debug with traditional approaches. We will certainly be expanding our use of static probe markers in systemtap in the future. My real immediate wishlist is for systemtap to get better at providing userspace stack traces, since it fails to provide a useful trace far too often, as compared to GDB.
Update: Mark Wielaard showed me what I had todo to get nice stack traces from SystemTAP. Apparently it is not getting enough memory space to deal with stack traces with its default settings. Telling it to use a little more memory makes it work nicely:
# cat > demo.stp <<EOF
probe libvirt.rpc.client_msg_rx {
printf("client=%p len=%d program=%d version=%d procedure=%d type=%d status=%d serial=%d\n",
client, len, prog, vers, proc, type, status, serial);
print_ustack(ubacktrace())
}
# stap -DTASK_FINDER_VMA_ENTRY_ITEMS=7680 demo.stp
client=0x7f775cf62010 len=36 program=536903814 version=1 procedure=66 type=1 status=0 serial=0
0x3c57f0b3dd : virNetClientIOHandleInput+0x87d/0x890 [/usr/lib64/libvirt.so.0.9.7]
0x3c57f0b9a0 : virNetClientIOEventLoop+0x5b0/0x630 [/usr/lib64/libvirt.so.0.9.7]
0x3c57f0cb23 : virNetClientSend+0x2b3/0x590 [/usr/lib64/libvirt.so.0.9.7]
0x3c57f0d47c : virNetClientProgramCall+0x26c/0x8a0 [/usr/lib64/libvirt.so.0.9.7]
0x3c57ef091e : callWithFD+0xce/0x120 [/usr/lib64/libvirt.so.0.9.7]
0x3c57ef099c : call+0x2c/0x40 [/usr/lib64/libvirt.so.0.9.7]
0x3c57efee80 : doRemoteOpen+0x890/0x20f0 [/usr/lib64/libvirt.so.0.9.7]
0x3c57f0341b : remoteOpen+0x9b/0x290 [/usr/lib64/libvirt.so.0.9.7]
0x3c57ec2133 : do_open+0x1f3/0x1100 [/usr/lib64/libvirt.so.0.9.7]
0x3c57ec4616 : virConnectOpenAuth+0x76/0xb0 [/usr/lib64/libvirt.so.0.9.7]
0x40ceb1 [/usr/bin/virsh+0xceb1/0x40000]
client=0x7f775cf62010 len=28 program=536903814 version=1 procedure=1 type=1 status=0 serial=1
0x3c57f0b3dd : virNetClientIOHandleInput+0x87d/0x890 [/usr/lib64/libvirt.so.0.9.7]
0x3c57f0b9a0 : virNetClientIOEventLoop+0x5b0/0x630 [/usr/lib64/libvirt.so.0.9.7]
0x3c57f0cb23 : virNetClientSend+0x2b3/0x590 [/usr/lib64/libvirt.so.0.9.7]
0x3c57f0d47c : virNetClientProgramCall+0x26c/0x8a0 [/usr/lib64/libvirt.so.0.9.7]
0x3c57ef091e : callWithFD+0xce/0x120 [/usr/lib64/libvirt.so.0.9.7]
0x3c57ef099c : call+0x2c/0x40 [/usr/lib64/libvirt.so.0.9.7]
0x3c57eff57a : doRemoteOpen+0xf8a/0x20f0 [/usr/lib64/libvirt.so.0.9.7]
0x3c57f0341b : remoteOpen+0x9b/0x290 [/usr/lib64/libvirt.so.0.9.7]
0x3c57ec2133 : do_open+0x1f3/0x1100 [/usr/lib64/libvirt.so.0.9.7]
0x3c57ec4616 : virConnectOpenAuth+0x76/0xb0 [/usr/lib64/libvirt.so.0.9.7]
0x40ceb1 [/usr/bin/virsh+0xceb1/0x40000]
....
This makes me very happy :-)
The historical philosophy of libvirt is for all our core libraries to be written in C and then create bindings to other programming languages or mappings to alternative object models. Thus far we have bindings to Python, Perl, Ruby, OCaml, Php, C#, Java and mappings to the QMF (Matahari), CIM and SNMP object models. The virt-install and virt-manager applications use the python binding to libvirt, but even very early in development of virt-manager it was clear that the libvirt python API is not a natural fit for an application using GTK, since it does not integrate with GObject and in particular GObject signals. Thus virt-manager wraps the libvirt python objects to create real GObjects it then works with. This has been quite successful, but because all the virt-manager code is in python other applications have not been able to take advantage of the higher level libvirt API virt-manager has evolved. In addition the virt-install code (which is called internally by virt-manager) contains a set of Python objects which represent the various libvirt XML schemas as plain old objects with properties and setters/getters. If you’ve developed applications against libvirt, you’ll likely appreciate just how useful such an API would be. Again though, because the API is in Python and (technically) internal to the virt-install codebase, it is not accessible to many other applications
There was clearly space for an independent library mapping the libvirt API and XML schemas to GObject, which could then be used by any application. The task of creating a libvirt GObject library API is large enough, without considering the task of also ensuring it is accessible from all the non-C programming languages. Fortunately, with the release of GNOME-3, GObject introspection has now matured to the point where it can really be used in anger for real application development. The upshot is that it is now feasible to attempt development of a proper libvirt GObject API.
The libvirt-glib package is the result, and it actually contains three related libraries
- libvirt-gib – non-object based glue code between GLib and libvirt. In particular this has APIs to convert libvirt virErrorPtr instances into GError instances, and provides an implementation of the libvirt event loop contract, using the GLib GMain APIs.
- libvirt-gconfig – object based APIs which map libvirt XML documents/schemas into GObject classes. This library explicitly has no direct link to the libvirt API, solely concerning itself with XML management. This is to allow use of libvirt-gconfig from applications which are using one of the object mappings like QMF/CIM/SNMP, instead of the direct libvirt API. This where the current virt-install XML handling objects will be replicated
- libvirt-gobject – object based APIs which map libvirt types and APIs into GObject classes. This library depends on libirt-glib and libvirt-gconfig, and is where the current virt-manager object mapping APIs will be replicated. This library is also adopting the GIO paradigm for allowing asynchronous API invocation & completion, for long running applications. This eliminates much of the need for applications to explicitly use threads (thread usage is hidden behind the async API impl).
From day 1, all the APIs are being developed with GObject introspection in mind., so all methods are fully annotated, and we are generating the glue layer for Vala bindings as standard in order to support the GNOME Boxes application. It is still very early days for development and very little of the libvirt API has been mapped into GObject thus far and work is only just starting on the XML object mappings. The overall target, however, is to develop the library to the state where it can support the aforementioned GNOME Boxes application in Fedora 17, as well as an application sandbox framework I am developing for Fedora 17 (more on that in a later blog post).
For more information