Porting NetCF to Debian/Ubuntu, Suse and Windows

Posted: September 28th, 2011 | Filed under: Fedora, libvirt, Virt Tools

The NetCF library provides a simple API for managing network interface configuration files. Libvirt has used NetCF for several releases now to provide the internal implementation of the virInterface APIs. This allows libvirt based applications to configure plain ethernet devices, bridging, bonding and VLANs, which covers all the network configurations required for a typical virtualization host. The problem is that nearly every OS distro has a different format for its configuration files, and until now NetCF has only had an implementation that works with Fedora and RHEL. This has led to a perception that NetCF is a “Red Hat specific” project, which was not at all the intention when NetCF was created.

To try to address this problem, I have spent the last couple of weeks hacking on a driver for NetCF that knows how to deal with the Debian /etc/network/interfaces file format. Last night I pushed the code into the upstream NetCF git repository, so Debian & Ubuntu users have something nice to try out in the next release. Indeed, it would be good if any interested persons were to try out the latest NetCF GIT code before the next release, to make sure it works for someone other than myself :-)

In the course of porting to Debian, we also became aware that there was a port of NetCF to Suse distributions, found as a patch in the netcf RPM for OpenSuse 11. We have not had a chance to test it ourselves yet, but on the assumption that it must have been at least partially functional when added to the OpenSuse 11 RPMs, we have merged that patch into the latest NetCF GIT. If anyone is using Suse and wants to try it out and report what works & what doesn’t, we’d appreciate the feedback. If someone wants to do further development work on the Suse driver to finish it off, that would be even better!

Finally, a few months ago, there was work on creating a Windows driver for NetCF. This was posted a few times to the NetCF mailing lists, but was never completed because the original submitter ran out of time to work on it. In the hope that it will be a useful starting point for other interested developers, we have also merged the most recent Windows patch into the NetCF GIT repository. It is by no means complete yet: it can only list interfaces and bring them up/down, and it can’t report their config or create new interfaces.

Supported Debian driver configurations

Back to the Debian driver now. The Debian /etc/network/interfaces configuration file is quite nicely structured and reasonably well documented, but one of the problems faced is that there is often more than one way to get to the same end result. To make the development of a Debian driver a tractable problem, I decided to pick one specific configuration approach for each desired network interface arrangement. So while NetCF should be able to read back any configuration that it wrote itself, it may not be able to correctly read arbitrary configurations that a user created manually. I expect that over time the driver will iteratively improve its configuration parsing to cope with other styles.

AFAICT, the Debian best practice for setting up VLANs, bonding & bridging is to use the extra configuration syntax offered by certain add-on packages, rather than custom post scripts. So for the NetCF Debian driver to work, it is necessary to have the following DPKGs installed:

  - ifenslave    (required for any bonding config)
  - bridge-utils (required for any bridging config)
  - vlan         (required for any vlan config)
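As a rough illustration of that package list, the following Python sketch scans an interfaces file for the option prefixes used in the examples in this post and reports which packages they imply. The helper name and the prefix table are hypothetical, not part of NetCF; it simply assumes that bond_, bridge_ and vlan_ options come from ifenslave, bridge-utils and vlan respectively.

```python
# Hypothetical helper (not part of NetCF): map option prefixes seen in
# an interfaces file to the Debian package assumed to provide them.
PACKAGE_FOR_PREFIX = {
    "bond_": "ifenslave",
    "bridge_": "bridge-utils",
    "vlan_": "vlan",
}


def required_packages(interfaces_text):
    """Return the sorted package names implied by the given config text."""
    needed = set()
    for line in interfaces_text.splitlines():
        words = line.split()
        # Skip blank lines and comments.
        if not words or words[0].startswith("#"):
            continue
        for prefix, package in PACKAGE_FOR_PREFIX.items():
            if words[0].startswith(prefix):
                needed.add(package)
    return sorted(needed)
```

Feeding it a bridged bond configuration, for example, would report both bridge-utils and ifenslave, while a plain DHCP ethernet stanza needs none of the three.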

Ethernet Devices

Loopback:

  auto lo
  iface lo inet loopback

DHCP:

  auto eth0
  iface eth0 inet dhcp

Static config:

  auto eth0
  iface eth0 inet static

No IP config

  auto eth0
  iface eth0 inet manual

Config with MTU / MAC address

  auto eth0
  iface eth0 inet dhcp
     hwaddr ether 00:11:22:33:44:55
     mtu 150
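To make the shape of these stanzas concrete, here is a minimal Python sketch that renders one from a few parameters. The function name and its arguments are made up for illustration; this is not how the NetCF driver itself generates files.

```python
def render_stanza(name, family="inet", method="dhcp", options=None, auto=True):
    """Render one interfaces(5)-style stanza as text.

    'options' is an ordinary dict of option name -> value; entries are
    written in insertion order, indented below the iface line.
    """
    lines = []
    if auto:
        lines.append("auto " + name)
    lines.append("iface %s %s %s" % (name, family, method))
    for key, value in (options or {}).items():
        lines.append("   %s %s" % (key, value))
    return "\n".join(lines) + "\n"


# Example: an interface with no IP configuration but an explicit MTU.
print(render_stanza("eth0", method="manual", options={"mtu": "1500"}))
```

Keeping the options as plain key/value pairs mirrors the file format, where each add-on package contributes its own option names.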


Bonding

With miimon

  iface bond0 inet dhcp
     bond_slaves eth1 eth2
     bond_primary eth1
     bond_mode active-backup
     bond_miimon 100
     bond_updelay 10
     bond_use_carrier 0

With arpmon

  iface bond2 inet dhcp
     bond_slaves eth6
     bond_primary eth6
     bond_mode active-backup
     bond_arp_interval 100


VLANs

  auto eth0.42
  iface eth0.42 inet static
     vlan_raw_device eth0


Bridging

With single interface and IP addr

  auto br0
  iface br0 inet static
     mtu 1500
     bridge_ports eth3
     bridge_stp off
     bridge_fd 0.01

With no IP addr

  auto br0
  iface br0 inet manual
     bridge_ports eth3
     bridge_stp off
     bridge_fd 0.01

With multiple interfaces

  auto br0
  iface br0 inet static
     mtu 1500
     bridge_ports eth3 eth4
     bridge_stp off
     bridge_fd 0.01

With no interface or addr

  auto br0
  iface br0 inet manual
     mtu 1500
     bridge_ports none
     bridge_stp off
     bridge_fd 0.01

Complex Bridging

Bridging a bond:

  auto br1
  iface br1 inet manual
     mtu 1500
     bridge_ports bond1
     bridge_stp off
     pre-up ifup bond1
     post-down ifdown bond1
  iface bond1 inet manual
     bond_slaves eth4
     bond_primary eth4
     bond_mode active-backup
     bond_miimon 100
     bond_updelay 10
     bond_use_carrier 0

Bridging a VLAN:

  auto br2
  iface br2 inet manual
     mtu 1500
     bridge_ports eth0.42
     bridge_stp off
  iface eth0.42 inet manual
     vlan_raw_device eth0


IPv6

Static addressing, with multiple addresses:

  auto eth5
  iface eth5 inet6 static
     address 3ffe:ffff:0:5::1
     netmask 128
     pre-up echo 0 > /proc/sys/net/ipv6/conf/eth5/autoconf
     post-down echo 1 > /proc/sys/net/ipv6/conf/eth5/autoconf
     up /sbin/ifconfig eth5 inet6 add 3ffe:ffff:0:5::5/128
     down /sbin/ifconfig eth5 inet6 del 3ffe:ffff:0:5::5/128

Stateless autoconf

  auto eth5
  iface eth5 inet6 manual

DHCPv6 with autoconf

  auto eth5
  iface eth5 inet6 dhcp

DHCPv6 without autoconf

  auto eth5
  iface eth5 inet6 dhcp
     pre-up echo 0 > /proc/sys/net/ipv6/conf/eth5/autoconf
     post-down echo 1 > /proc/sys/net/ipv6/conf/eth5/autoconf

The most recent set of example configurations can be found in the documentation in GIT.
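Reading these files back is the harder half of the problem. The Python sketch below shows the general idea for the restricted style used in the examples above: it splits the text into per-interface stanzas and notes which ones are marked auto. It is only an illustration with made-up names; it is not the NetCF parser, and it deliberately keeps just the last value when an option such as pre-up is repeated.

```python
def parse_interfaces(text):
    """Split interfaces(5)-style text into {name: stanza} dictionaries."""
    stanzas = {}
    auto = set()
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0] == "auto":
            auto.update(words[1:])
            current = None
        elif words[0] == "iface" and len(words) >= 4:
            current = {"family": words[2], "method": words[3], "options": {}}
            stanzas[words[1]] = current
        elif current is not None:
            # Simplification: a repeated option overwrites the earlier one.
            current["options"][words[0]] = " ".join(words[1:])
    for name, stanza in stanzas.items():
        stanza["auto"] = name in auto
    return stanzas
```

Applied to the “Bridging a VLAN” example, this yields two stanzas, br2 (auto) and eth0.42 (not auto), with bridge_ports on the former pointing at the latter.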

Getting started with LXC using libvirt

Posted: September 27th, 2011 | Filed under: Fedora, libvirt, Virt Tools

For quite a while now, libvirt has had an LXC driver that uses Linux’s namespace + cgroups features to provide container based virtualization. Before continuing I should point out that the libvirt LXC driver does not have any direct need for the userspace tools from the LXC sf.net project, since it directly leverages APIs the Linux kernel exposes to userspace. There are in fact many other potential users of the kernel’s namespace APIs which have their own userspace, such as OpenVZ, Linux-VServer and Parallels. This blog post will concern itself solely with the native libvirt LXC support.

Connecting to the LXC driver

At this point in time, there is only one URI available for connecting to the libvirt LXC driver, lxc:///, which gets you a privileged connection. There is not yet any support for unprivileged libvirtd instances using containers, due to restrictions of the kernel’s DAC security models. I’m hoping this may be refined in the future.

If you’re familiar with using libvirt in combination with KVM, then it is likely you are just relying on libvirt picking the right URI by default. However, each host can only have one default URI for libvirt, and KVM will usually take precedence over LXC. You can discover which default URI libvirt has chosen:

# virsh uri

So when using tools like virsh you’ll need to specify the LXC URI somehow. The first way is to use the ‘-c URI’ or ‘--connect URI’ arguments that most libvirt based applications have:

# virsh -c lxc:/// uri

The second option is to explicitly override the default libvirt URI for your session using the LIBVIRT_DEFAULT_URI environment variable.

# export LIBVIRT_DEFAULT_URI=lxc:///
# virsh uri

For the sake of brevity, all the examples that follow will presume that export LIBVIRT_DEFAULT_URI=lxc:/// has been set.
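The two options above amount to a simple precedence rule, sketched here in Python. The function is hypothetical and models only the two mechanisms described in this post (an explicit URI argument and the LIBVIRT_DEFAULT_URI environment variable); libvirt may consult further sources that are not modelled.

```python
import os


def effective_uri(cli_uri=None, environ=None):
    """Return the URI a tool would use, or None to let libvirt choose.

    An explicit URI (as given with -c / --connect) wins; otherwise the
    LIBVIRT_DEFAULT_URI environment variable is used if it is set.
    """
    environ = os.environ if environ is None else environ
    if cli_uri:
        return cli_uri
    return environ.get("LIBVIRT_DEFAULT_URI") or None
```

A return value of None stands for the case where neither is given and libvirt falls back to its own default.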

A simple “Hello World” LXC container

The Hello World equivalent for LXC is probably a container which just runs /bin/sh with the main root filesystem / network interfaces all still being visible. What you’re gaining here is not security, but rather a way to manage resource utilization of everything spawned from that initial process. The libvirt LXC driver currently does most of its resource controls using cgroups, but will also leverage the network traffic shaper directly for network controls which you want to be done per virtual network interface, not per cgroup.

Anyone familiar with libvirt will know that creating a new guest requires an XML document specifying its configuration. Machine based virtualization requires either a kernel/initrd or a virtual BIOS to boot, and can create a fully virtualized (hvm) or paravirtualized (xen) machine. Container virtualization, by contrast, just wants to know the path to the binary to spawn as the container’s “init” (aka process with PID 1). The virtualization type for containers is thus referred to in libvirt as “exe”. Aside from the virtualization type & path of the initial process, the only other required XML parameters are the guest name, initial memory limit and a text console device. Putting this together, creating the “Hello World” container will require an XML configuration that looks like this:

# cat > helloworld.xml <<EOF
<domain type='lxc'>
    <console type='pty'/>
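For reference, a complete document containing the elements just described (guest name, memory limit, the “exe” type, the init path and a console) can be sketched with Python’s standard ElementTree module as below. The element layout follows the usual libvirt domain XML structure; the memory value of 102400 KiB is purely an assumption chosen for illustration.

```python
import xml.etree.ElementTree as ET


def hello_world_domain(name="helloworld", init="/bin/sh", memory_kib=102400):
    """Build a minimal LXC domain document.

    memory_kib is an illustrative value, not one taken from the post.
    """
    domain = ET.Element("domain", type="lxc")
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "memory").text = str(memory_kib)
    os_el = ET.SubElement(domain, "os")
    ET.SubElement(os_el, "type").text = "exe"
    ET.SubElement(os_el, "init").text = init
    devices = ET.SubElement(domain, "devices")
    ET.SubElement(devices, "console", type="pty")
    return domain


# Print the document; the text could be saved as helloworld.xml.
print(ET.tostring(hello_world_domain(), encoding="unicode"))
```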

This configuration can be imported into libvirt in the normal manner

# virsh define helloworld.xml
Domain helloworld defined from helloworld.xml

then started

# virsh start helloworld
Domain helloworld started
# virsh list
Id Name                 State
31417 helloworld           running

The ID values assigned by the libvirt LXC driver are in fact the process ID of the libvirt_lxc helper process that libvirt launches. This helper is what actually creates the container, spawning the initial process, after which it just sits around handling console I/O. Speaking of the console, this can now be accessed with virsh

# virsh console helloworld
Connected to domain helloworld
Escape character is ^]

That ‘sh’ prompt is the shell process inside the container. All the container processes are visible outside the container as regular processes

# ps -axuwf


root     31417  0.0  0.0  42868  1252 ?        Ss   16:17   0:00 /usr/libexec/libvirt_lxc --name helloworld --console 27 --handshake 30 --background
root     31418  0.0  0.0  13716  1692 pts/39   Ss+  16:17   0:00  \_ /bin/sh

Inside the container, PID numbers are distinct, starting again from ‘1’.

# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
sh-4.2# ps -axuwf
root         1  0.0  0.0  13716  1692 pts/39   Ss   16:17   0:00 /bin/sh

The container will shut down when the ‘init’ process exits, so in this example when ‘exit’ is run in the container’s bash shell. Alternatively issue the usual ‘virsh destroy’ to kill it off.

# virsh destroy helloworld
Domain helloworld destroyed

Finally remove its configuration

# virsh undefine helloworld
Domain helloworld has been undefined

Adding custom mounts to the “Hello World” container

The “Hello World” container shared the same root filesystem as the primary (host) OS. What if the application inside the container requires custom data in certain locations? For example, using containers to sandbox apache servers might require a custom /etc/httpd and /var/www. This can easily be achieved by specifying one or more filesystem devices in the initial XML configuration. Let’s create some custom locations to pass to the “Hello World” container.

# mkdir -p /export/helloworld/config
# touch /export/helloworld/config/hello.txt
# mkdir -p /export/helloworld/data
# touch /export/helloworld/data/world.txt

Now edit the helloworld.xml file and add in

<filesystem type='mount'>
  <source dir='/export/helloworld/config'/>
  <target dir='/etc/httpd'/>
</filesystem>
<filesystem type='mount'>
  <source dir='/export/helloworld/data'/>
  <target dir='/var/www'/>
</filesystem>
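If the XML is being generated rather than edited by hand, the same filesystem devices can be appended programmatically. This is a small Python sketch using the standard ElementTree module; the add_mount helper is hypothetical, and it assumes the devices belong under the domain’s devices element.

```python
import xml.etree.ElementTree as ET


def add_mount(devices, source_dir, target_dir):
    """Append a <filesystem type='mount'> device mapping source to target."""
    fs = ET.SubElement(devices, "filesystem", type="mount")
    ET.SubElement(fs, "source", dir=source_dir)
    ET.SubElement(fs, "target", dir=target_dir)
    return fs


# Stand-in for the <devices> element of the container's XML.
devices = ET.Element("devices")
add_mount(devices, "/export/helloworld/config", "/etc/httpd")
add_mount(devices, "/export/helloworld/data", "/var/www")
print(ET.tostring(devices, encoding="unicode"))
```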

Now after defining and starting the container again, it should see the custom mounts

# virsh define helloworld.xml
Domain helloworld defined from helloworld.xml
# virsh start helloworld
Domain helloworld started
# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
sh-4.2# ls /etc/httpd/
sh-4.2# ls /var/www/
sh-4.2# exit

# virsh undefine helloworld
Domain helloworld has been undefined

A private root filesystem with busybox

So far the container has shared the root filesystem with the host OS. This may be OK if the application running in the container is going to run as an unprivileged user ID and you are careful not to mess up your host OS. If you want to do things like running DHCP inside the container, or have things running as root, then you almost certainly want a private root filesystem in the container. In this example, we’ll use the busybox tools to set up the simplest possible private root for “Hello World”. First create a new directory and copy the busybox binary into position

mkdir -p /export/helloworld/root
cd /export/helloworld/root
mkdir -p bin var/www etc/httpd
cd bin
cp /sbin/busybox busybox
cd /root

The next step is to set up symlinks for all the busybox commands you intend to use. For example

for i in ls cat rm find ps echo date kill sleep \
         true false test pwd sh which grep head wget
do
  ln -s busybox /export/helloworld/root/bin/$i
done

Now all that is required is to add another filesystem device to the XML configuration

<filesystem type='mount'>
  <source dir='/export/helloworld/root'/>
  <target dir='/'/>
</filesystem>

With that added to the XML, follow the same steps to define and start the guest again

# virsh define helloworld.xml
Domain helloworld defined from helloworld.xml
# virsh start helloworld
Domain helloworld started

Now when accessing the guest console a completely new filesystem should be visible

# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
# ls
bin      dev      etc      proc     selinux  sys      var
# ls bin/
busybox  echo     grep     ls       rm       test     which
cat      false    head     ps       sh       true
date     find     kill     pwd      sleep    wget
# cat /proc/mounts
rootfs / rootfs rw 0 0
devpts /dev/pts devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=666 0 0
/dev/mapper/vg_t500wlan-lv_root / ext4 rw,seclabel,relatime,user_xattr,barrier=1,data=ordered 0 0
devpts /dev/pts devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=666 0 0
devfs /dev tmpfs rw,seclabel,nosuid,relatime,mode=755 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
proc /proc/sys proc ro,relatime 0 0
/sys /sys sysfs ro,seclabel,relatime 0 0
selinuxfs /selinux selinuxfs ro,relatime 0 0
devpts /dev/ptmx devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=666 0 0
/dev/mapper/vg_t500wlan-lv_root /etc/httpd ext4 rw,seclabel,relatime,user_xattr,barrier=1,data=ordered 0 0
/dev/mapper/vg_t500wlan-lv_root /var/www ext4 rw,seclabel,relatime,user_xattr,barrier=1,data=ordered 0 0

Custom networking in the container

The examples thus far have all just inherited access to the host network interfaces. This may or may not be desirable. It is of course possible to configure private networking for the container. Conceptually this works in much the same way as with KVM. Currently it is possible to choose between libvirt’s bridge, network or direct networking modes, giving ethernet bridging, NAT/routing, or VEPA respectively. When configuring private networking, the host OS will get a ‘vethNNN’ device for each container NIC, and the container will see its own ‘ethNNN’ and ‘lo’ devices. The XML configuration additions are just the same as what’s required for KVM, for example

<interface type='network'>
  <mac address='52:54:00:4d:2b:cd'/>
  <source network='default'/>
</interface>

Define and start the container as before, then compare the network interfaces in the container to what is in the host

# virsh console helloworld
Connected to domain helloworld
Escape character is ^]

# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:16:61:DA
inet6 addr: fe80::5054:ff:fe16:61da/64 Scope:Link
RX packets:93 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:5076 (4.9 KiB)  TX bytes:468 (468.0 B)

lo        Link encap:Local Loopback
inet addr:  Mask:
inet6 addr: ::1/128 Scope:Host
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface

We have a choice of configuring the guest eth0 manually, or just launching a DHCP client. To do manual configuration try

# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
# ifconfig eth0
# route add gw eth0
# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default UGH   0      0        0 eth0
*        U     0      0        0 eth0
# ping
PING ( 56 data bytes
64 bytes from seq=0 ttl=64 time=0.786 ms
64 bytes from seq=1 ttl=64 time=0.157 ms
--- ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.157/0.471/0.786 ms

Am I running in an LXC container?

Some programs may wish to know if they have been launched inside a libvirt container. To assist them, the initial process is given two environment variables, LIBVIRT_LXC_NAME and LIBVIRT_LXC_UUID
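A program could test for these variables along the following lines. This is a minimal Python sketch; the function name is made up, and it relies only on the two variable names given above. Note that only the initial process is handed the variables, so a child process sees them only if they are passed on in its environment.

```python
import os


def libvirt_lxc_identity(environ=None):
    """Return the container's name and UUID, or None if neither is set."""
    environ = os.environ if environ is None else environ
    name = environ.get("LIBVIRT_LXC_NAME")
    uuid = environ.get("LIBVIRT_LXC_UUID")
    if name is None and uuid is None:
        return None
    return {"name": name, "uuid": uuid}


if libvirt_lxc_identity() is None:
    print("not launched as the init process of a libvirt LXC container")
```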


An aside about CGroups and LXC

Every libvirt LXC container gets placed inside a dedicated cgroup, $CGROUPROOT/libvirt/lxc/$CONTAINER-NAME. Libvirt expects the memory, devices, freezer, cpu and cpuacct cgroup controllers to be mounted on the host OS. Work on leveraging cgroups inside LXC with libvirt is still ongoing, but there are already APIs to set/get memory and CPU limits, with networking to follow soon. This could be a topic for a blog post on its own, so won’t be discussed further here.
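To show what that layout means in practice, here is a short Python sketch that builds the expected directory for each controller. It assumes, purely for illustration, that every controller is mounted in its own directory beneath a common root such as /sys/fs/cgroup; the actual mount points vary between hosts, and the function name is hypothetical.

```python
import posixpath

# The controllers the post says libvirt expects to be mounted.
CONTROLLERS = ("memory", "devices", "freezer", "cpu", "cpuacct")


def container_cgroup_paths(container_name, cgroup_root="/sys/fs/cgroup"):
    """Map each controller to <root>/<controller>/libvirt/lxc/<name>.

    cgroup_root and the per-controller mount layout are assumptions.
    """
    return {
        controller: posixpath.join(
            cgroup_root, controller, "libvirt", "lxc", container_name
        )
        for controller in CONTROLLERS
    }
```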

An aside about LXC security, or lack thereof

You might think that since we can create a private root filesystem, it’d be cool to run an entire Fedora/RHEL OS in the container. I strongly caution against doing this. The DAC (discretionary access control) system on which LXC currently relies for all security is known to be incomplete, and so it is entirely possible to accidentally/intentionally break out of the container and/or inflict a DoS attack on the host OS. Repeat after me “LXC is not yet secure. If I want real security I will use KVM”. There is a plan to make LXC DAC more secure, but that is nowhere near finished. We also plan to integrate sVirt with LXC so that MAC will mitigate holes in the DAC security model.

An aside about Fedora >= 15, SystemD and autofs

If you are attempting to try any of this on Fedora 16 or later, there is currently an unresolved problem with autofs that breaks much use of containers. The root problem is that we are unable to unmount autofs mount points after switching into the private filesystem namespace. Unfortunately SystemD uses autofs in its default configuration, for several types of mount. So if you find containers fail to start, then as a temporary hack you can try disabling all SystemD’s autofs mount points

for i in `systemctl --full | grep automount | awk '{print $1}'`
do
  systemctl stop $i
done

We hope to resolve this in a more satisfactory way in the near future.

The complete final example XML configuration

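# cat helloworld.xml
<domain type='lxc'>
  <name>helloworld</name>
  <!-- memory limit in KiB; the value shown is illustrative -->
  <memory>102400</memory>
  <os>
    <type>exe</type>
    <init>/bin/sh</init>
  </os>
  <devices>
    <console type='pty'/>
    <filesystem type='mount'>
      <source dir='/export/helloworld/root'/>
      <target dir='/'/>
    </filesystem>
    <filesystem type='mount'>
      <source dir='/export/helloworld/config'/>
      <target dir='/etc/httpd'/>
    </filesystem>
    <filesystem type='mount'>
      <source dir='/export/helloworld/data'/>
      <target dir='/var/www'/>
    </filesystem>
    <interface type='network'>
      <source network='default'/>
    </interface>
  </devices>
</domain>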
Testing answers to questions about sVirt SELinux policy

Posted: September 27th, 2011 | Filed under: Fedora, libvirt, Virt Tools

Yesterday on the #virt@irc.oftc.net IRC channel there was a question asked about whether sVirt+SELinux would prevent two virtual machines running under the same user ID from ptrace()ing each other. If no SELinux is involved, there is no DAC restriction on ptrace() between two PIDs with the same UID. So this is clearly the kind of thing you would expect/want sVirt to block, and indeed it does. But how can you easily prove the policy blocks ptrace? Enter the ‘runcon’ command, which lets you impersonate VMs.

NB, when trying out the following, you want SELinux to be in “permissive” mode, not “enforcing”, since the way we do some parts of the tests will trigger other AVCs which get in the way.

Under sVirt each QEMU process is given a dedicated security label, formed by combining the base label “system_u:system_r:svirt_t:s0” with a unique MCS level. So to test our belief about ptrace(), we need to have two security labels. Let’s use these two

system_u:system_r:svirt_t:s0:c12,c34
system_u:system_r:svirt_t:s0:c56,c78


Now we want a process to act as the target VM to be ptrace()d. With the first SELinux label above, and “runcon” we can launch a confined QEMU process in the same way libvirtd would have done:

$ runcon system_u:system_r:svirt_t:s0:c12,c34 /usr/bin/qemu -vnc :1

‘ps’ can be used to verify that ‘qemu’ really is under the confined domain

$ ps -axuwZ | grep qemu
system_u:system_r:svirt_t:s0:c12,c34 berrange 29542 0.0  0.0 106680 460 pts/12 S+   14:32   0:00 qemu
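The labels used in these tests all follow the same pattern: the base label followed by a pair of MCS categories. A minimal Python sketch of that composition is shown below. The function is hypothetical, and it assumes a level is written as two distinct categories in ascending order, as in the labels used here.

```python
BASE_LABEL = "system_u:system_r:svirt_t:s0"


def svirt_label(cat_a, cat_b, base=BASE_LABEL):
    """Combine the base label with an MCS level built from two categories."""
    if cat_a == cat_b:
        raise ValueError("the two MCS categories must differ")
    low, high = sorted((cat_a, cat_b))
    return "%s:c%d,c%d" % (base, low, high)


print(svirt_label(12, 34))
print(svirt_label(56, 78))
```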

Now we have the victim running, we can try launching an attacker. Since we’re looking to see if ptrace() is blocked, ‘strace’ is a natural command to try out. For testing other attack vectors you might want to create a tiny dedicated program. Using the second security label, and ‘runcon’ again, we can do

$ runcon system_u:system_r:svirt_t:s0:c56,c78 strace -p $PID-OF-VICTIM

Finally, we can look at the audit log for any AVC messages about the ‘ptrace’ access vector:

# grep AVC /var/log/audit/audit.log | grep ptrace
type=AVC msg=audit(1317130603.887:33048): avc: denied { ptrace } for pid=29644 comm="strace" scontext=system_u:system_r:svirt_t:s0:c56,c78 tcontext=system_u:system_r:svirt_t:s0:c12,c34 tclass=process

What this AVC is saying is that a process under the label “system_u:system_r:svirt_t:s0:c56,c78” tried to execute ptrace() against a process under the label “system_u:system_r:svirt_t:s0:c12,c34” and it was blocked.
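When checking many such messages it can help to pull the interesting fields out automatically. The Python sketch below does this for lines shaped like the one shown above, using only regular expressions from the standard library. It is an illustration for this message format, not a general audit log parser, and the function name is made up.

```python
import re


def parse_avc(line):
    """Extract permissions and contexts from an 'avc: denied' log line."""
    match = re.search(r"avc:\s+denied\s+\{\s*([^}]*?)\s*\}", line)
    if match is None:
        return None
    # Collect key=value pairs; values are either quoted or run to a space.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    return {
        "permissions": match.group(1).split(),
        "scontext": fields.get("scontext"),
        "tcontext": fields.get("tcontext"),
        "tclass": fields.get("tclass"),
        "comm": (fields.get("comm") or "").strip('"'),
    }
```

For the ptrace denial above, the source context is the attacker’s label (c56,c78) and the target context is the victim’s (c12,c34).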

This is exactly what we wanted to see happen. We have now proved that two VMs running as the same user ID cannot ptrace() each other.

When I outlined this on IRC, there was a follow up question. Can the attacking VM just use ‘runcon’ to change its security label? The answer is again no. Transitions between security labels must be explicitly allowed by the policy, and sVirt policy does not allow any such transitions for the svirt_t type. Again this can be demonstrated with ‘runcon’, by chaining together two invocations:

runcon system_u:system_r:svirt_t:s0:c56,c78 runcon system_u:system_r:svirt_t:s0:c12,c34 strace -p $PID-OF-VICTIM

And then looking for any AVC log message about the ‘transition’ access vector

# grep AVC /var/log/audit/audit.log | grep transition
type=AVC msg=audit(1317131267.839:33153): avc:  denied  { transition } for  pid=29811 comm="runcon" path="/usr/bin/strace" dev=dm-1 ino=662333 scontext=system_u:system_r:svirt_t:s0:c56,c78 tcontext=system_u:system_r:svirt_t:s0:c12,c34 tclass=process

Updated Test-AutoBuild with new release 1.2.4 and automated builds for virtualization tools

Posted: September 23rd, 2011 | Filed under: Fedora, libvirt, Test-AutoBuild, Virt Tools

Test-AutoBuild is the oldest open source project of mine that I still actually work on. The original code dates all the way back to approximately the year 2000, when Richard Jones and I were working at a now defunct dot-com called BiblioTech (random archive.org historical link). Rich wrote a script called “Rolling Build” which would continuously check out & build all our software from CVS, publishing the results as RPMs. Thankfully the company allowed the script to be open sourced under the GPLv2+ and I used it as the basis for creating a project called Test-AutoBuild in ~2004. We expanded the code to cover many different SCM tools, maintain historical archives of builds to avoid rebuilding modules if no code had changed, and many other things besides. I did a couple of releases a year for a while, but it has been on the backburner for the last couple of years. With the increasing number of inter-related virtualization projects relating to libvirt & KVM, I decided it was time to put Test-AutoBuild back into use as a build server. Yes, there are many automated build systems in existence these days I could have chosen, but I was looking for an excuse to hack on mine again :-)

There was a quiet, mostly unannounced release 1.2.3 back at the start of August, primarily to fix the utterly broken GIT support I originally wrote. A couple of weeks ago, immediately before going on holiday, I uploaded release 1.2.4 to the CPAN distribution page. Aside from fixing a number of horrible bugs, the 1.2.4 release brought in the ability to rsync the build results pages to a remote server, so it is now possible to run the automated builds inside one (or more) private virtual machines and publish the results to a separate public webserver. The second major change was the incorporation of a new theme for the Test-AutoBuild project website and the build status pages. Previously we had used a pretty lame icon of a gear wheel as the logo and a fairly plain web site style. Looking around for some better ideas I happened to come across a proposal for a Fedora 10 Theme that was never taken up. Nicu Buculei and Máirín Duffy, who produced that artwork, were generous enough to grant me permission to use the graphics for Test-AutoBuild under the CC-BY-SA 3.0 and GPLv2+ licenses. Thus for the 1.2.4 release the status pages for the automated builds have been completely restyled and the project’s main website has been similarly updated.

With the 1.2.4 release out and updated RPMs pushed into Fedora, I’m now able to publish the results of the automated builds I run for nearly all the libvirt related virtualization projects. The builder currently runs in a Fedora 14 virtual machine. The plan is to install further virtual machines running other important target OSes, at the very least Debian and one of the BSDs, so we can be sure we aren’t causing regressions in our codebases. If I’m feeling adventurous I might even set up a QEMU PPC instance to run some builds on a non-x86 architecture, though that will probably be painfully slow :-)

Injecting fake keyboard events to KVM guests via libvirt

Posted: September 23rd, 2011 | Filed under: Fedora, Gtk-Vnc, libvirt, Virt Tools

I’ve written before about how virtualization causes pain wrt keyboard handling and about the huge number of scancode/keycode sets you have to worry about. Following on from that investigative work I completely rewrote GTK-VNC’s keycode handling, so it is able to correctly translate the keycodes it receives from GTK on Linux, Win32 and OS-X, even when running against a remote X11 server on a different platform. In doing so I made sure that the tables used for doing conversions between keycode sets were not just big arrays of magic numbers in the code, as is common practice across the kernel or QEMU codebase. Instead GTK-VNC now has a CSV file containing the unadulterated mapping data along with a simple script to split out mapping tables. This data file and script have already been reused to solve the same keycode mapping problem in SPICE-GTK.

Fast-forward a year and a libvirt developer from Fujitsu is working on a patch to wire up QEMU’s “sendkey” monitor command to a formal libvirt API. The first design question is how should the API accept the list of keys to be injected to the guest. The QEMU monitor command accepts a list of keycode names as strings, or as keycode values as hex-encoded strings. The QEMU keycode values come from what I term the “RFB” codeset, which is just the XT codeset with a slightly unusual encoding of extended keycodes. VirtualBox meanwhile has an API which wants integer keycode values, from the regular XT codeset.

One of the problems with the XT codeset is that no one can ever quite agree on what is the official way to encode extended keycodes, or whether it is even possible to encode certain types of key. There is also a usability problem with having the API require a low-level hardware oriented keycode set as input, in that as an application developer you might know what Win32 virtual keycode you want to generate, but have no idea what the corresponding XT keycode is. It would be preferable if you could simply inject a Win32 keycode directly to a Windows guest, or a Linux keycode directly to a Linux guest, etc.

After a little bit of discussion we came to the conclusion that the libvirt API should accept an array of integer keycodes, along with an enum parameter specifying what keycode set they belong to. Internally libvirt would then translate from whatever keycode set the application used, to the keycode set required by the hypervisor’s own API. Thus we got an API that looks like:

typedef enum {
   VIR_KEYCODE_SET_LINUX          = 0,
   VIR_KEYCODE_SET_XT             = 1,
   VIR_KEYCODE_SET_ATSET1         = 2,
   VIR_KEYCODE_SET_ATSET2         = 3,
   VIR_KEYCODE_SET_ATSET3         = 4,
   VIR_KEYCODE_SET_OSX            = 5,
   VIR_KEYCODE_SET_XT_KBD         = 6,
   VIR_KEYCODE_SET_USB            = 7,
   VIR_KEYCODE_SET_WIN32          = 8,
   VIR_KEYCODE_SET_RFB            = 9,
} virKeycodeSet;

int virDomainSendKey(virDomainPtr domain,
                     unsigned int codeset,
                     unsigned int holdtime,
                     unsigned int *keycodes,
                     int nkeycodes,
                     unsigned int flags);
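The internal translation step can be pictured as a lookup in a table with one row per key and one column per keycode set. The Python sketch below shows the idea with a tiny five-key table; the table layout and function name are invented for illustration and are not libvirt’s actual data structures. The codes themselves are the standard Linux input, XT and Win32 virtual key values for these keys.

```python
# Illustrative rows: (name, linux, xt, win32). The real mapping data
# covers far more keys and keycode sets than this.
KEYMAP_ROWS = [
    ("KEY_TAB", 15, 15, 0x09),
    ("KEY_E", 18, 18, 0x45),
    ("KEY_Y", 21, 21, 0x59),
    ("KEY_K", 37, 37, 0x4B),
    ("KEY_C", 46, 46, 0x43),
]
COLUMNS = {"linux": 1, "xt": 2, "win32": 3}


def translate(keycodes, from_set, to_set):
    """Translate keycodes between two sets; raises KeyError if unmapped."""
    src, dst = COLUMNS[from_set], COLUMNS[to_set]
    table = {row[src]: row[dst] for row in KEYMAP_ROWS}
    return [table[code] for code in keycodes]


# 'k', 'e', 'y' given as XT codes, converted to Win32 virtual keycodes.
print(translate([37, 18, 21], "xt", "win32"))
```

Keeping the data in one table and deriving each direction from it is the same design choice as the CSV file approach: adding a keycode set means adding a column, not writing a new pairwise mapping.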

As with all libvirt APIs, this is also exposed in the virsh command line tool, via a new “send-key” command. As you might expect, this accepts a list of integer keycodes as parameters, along with a keycode set name. If the keycode set is omitted, the Linux keycode set is assumed by default. To be slightly more user friendly though, for the Linux, Win32 & OS-X keycode sets, we also support symbolic keycode names as an alternative to the integer values. These names are simply the names of the #define constants from the corresponding header file.

Some examples of how to use the new virsh command are

# send three strokes 'k', 'e', 'y', using xt codeset
virsh send-key dom --codeset xt 37 18 21

# send one stroke 'right-ctrl+C'
virsh send-key dom KEY_RIGHTCTRL KEY_C

# send a tab, held for 1 second
virsh send-key dom --holdtime 1000 0xf
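For scripts that drive virsh, the command lines above can be assembled with a small helper like this Python sketch. The helper is hypothetical; it only builds the argument list, using the --codeset and --holdtime options shown in the examples, and leaves running the command to the caller.

```python
def send_key_argv(domain, keys, codeset=None, holdtime_ms=None):
    """Build the argument list for a 'virsh send-key' invocation.

    'keys' may hold integers or symbolic names such as "KEY_C".
    """
    argv = ["virsh", "send-key", domain]
    if codeset is not None:
        argv += ["--codeset", codeset]
    if holdtime_ms is not None:
        argv += ["--holdtime", str(holdtime_ms)]
    argv += [str(key) for key in keys]
    return argv


print(" ".join(send_key_argv("dom", [37, 18, 21], codeset="xt")))
```

The resulting list can be handed to subprocess.run() as is, which avoids any shell quoting concerns.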

So when interacting with virtual guests you now have a choice of how to send fake keycodes. If you have a VNC or SPICE connection directly to the guest in question, you can inject keycodes over that channel, while if you have a libvirt connection to the hypervisor you can inject keycodes over that channel.