Last week, after many months of development & testing, we finally made a new release of libvirt which includes secure remote management. Previously use of libvirt was restricted to apps running on the machine being managed. When you are managing a large data center of machines, requiring that an admin ssh into each host just to manage its virtual machines is clearly sub-optimal. XenD has had the ability to talk to its HTTP service remotely for quite a while, but this used cleartext HTTP and had zero authentication until Xen 3.1.0. We could have worked on improving XenD, but it was more compelling to work on a solution that would apply to all virtualization platforms. Thus we designed & implemented a small daemon for libvirt to expose the API to remote machines. The communications can be run over a native SSL/TLS encrypted transport, or indirectly over an SSH tunnel. The former will offer a number of benefits in the long term – not least of which is the ability to delegate management permissions per-VM and thus avoid the need to provide root access to virtual machine administrators.
So how do you make use of this new remote management? Well, installing libvirt 0.3.0 is the first step. Out of the box, the SSL/TLS transport is not enabled since it requires x509 certificates to be created. There are docs about certificate creation/setup so I won’t repeat them here – don’t be put off by any past experience setting up SSL with Apache – it’s really not as complicated as it seems. The GNU TLS certtool is also a much more user-friendly tool than the horrific openssl command line.
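To give a flavour, the CA setup boils down to something like this (a minimal sketch – the organisation name and file names are just placeholders, so follow the libvirt docs for the real procedure):

$ certtool --generate-privkey > cakey.pem
$ cat > ca.info <<EOF
cn = Example Org
ca
cert_signing_key
EOF
$ certtool --generate-self-signed --load-privkey cakey.pem --template ca.info --outfile cacert.pem

Server and client certificates are then generated & signed against that CA in much the same way.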
Once the daemon is running, the only thing that changes is the URI you use to connect to libvirt. This is best illustrated by a few examples:
- Connecting to Xen locally

$ ssh root@pumpkin.virt.boston.redhat.com
# virsh --connect xen:/// list --all
 Id Name                 State
----------------------------------
  0 Domain-0             running
  6 rhel5fv              blocked
  - hello                shut off
  - rhel4x86_64          shut off
  - rhel5pv              shut off
- Connecting to Xen remotely using SSL/TLS

$ virsh --connect xen://pumpkin.virt.boston.redhat.com/ list --all
 Id Name                 State
----------------------------------
  0 Domain-0             running
  6 rhel5fv              blocked
  - hello                shut off
  - rhel4x86_64          shut off
  - rhel5pv              shut off
- Connecting to Xen remotely using SSH

$ virsh --connect xen+ssh://root@pumpkin.virt.boston.redhat.com/ list --all
 Id Name                 State
----------------------------------
  0 Domain-0             running
  6 rhel5fv              blocked
  - hello                shut off
  - rhel4x86_64          shut off
  - rhel5pv              shut off
- Connecting to QEMU/KVM locally

$ ssh root@celery.virt.boston.redhat.com
# virsh --connect qemu:///system list --all
 Id Name                 State
----------------------------------
  1 kvm                  running
  - demo                 shut off
  - eek                  shut off
  - fc6qemu              shut off
  - rhel4                shut off
  - wizz                 shut off
- Connecting to QEMU/KVM remotely using SSL/TLS

$ virsh --connect qemu://celery.virt.boston.redhat.com/system list --all
 Id Name                 State
----------------------------------
  1 kvm                  running
  - demo                 shut off
  - eek                  shut off
  - fc6qemu              shut off
  - rhel4                shut off
  - wizz                 shut off
- Connecting to QEMU/KVM remotely using SSH

$ virsh --connect qemu+ssh://root@celery.virt.boston.redhat.com/system list --all
 Id Name                 State
----------------------------------
  1 kvm                  running
  - demo                 shut off
  - eek                  shut off
  - fc6qemu              shut off
  - rhel4                shut off
  - wizz                 shut off
Notice how the only thing that changes is the URI – the information returned is identical no matter how you connect to libvirt. So if you have an application using libvirt, all you need to do is adapt your connect URIs to support remote access. BTW, a quick tip – if you get tired of typing the --connect arg you can set the VIRSH_DEFAULT_CONNECT_URI environment variable instead.
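For example, using one of the hosts from above:

$ export VIRSH_DEFAULT_CONNECT_URI=qemu+ssh://root@celery.virt.boston.redhat.com/system
$ virsh list --all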
What about virt-install and virt-manager, you might ask. Well, they are slightly more complicated. During the creation of new virtual machines, both of them need to create files on the local disk (to act as virtual disks for the guest), possibly download kernel+initrd images for booting the installer, and enumerate network devices to set up networking. So while virt-manager can run remotely now, it will be restricted to monitoring existing VMs and basic lifecycle management – it won’t be possible to provision entirely new VMs remotely. Yet. Now that basic remote management is working, we’re looking at APIs to satisfy the storage management needs of virt-manager. For device enumeration we can add APIs which ask HAL questions and pass the info back to the client over our secure channel. Finally, kernel+initrd downloading can be avoided by PXE booting the guests.
There’s lots more to talk about, such as securing the VNC console with SSL/TLS, but I’ve not got time for that in this blog posting. Suffice to say, we’re well on our way to our Fedora 8 goals for secure remote management. Fedora 8 will be the best platform for virtualization management by miles.
In the latter half of last year I was mulling over the idea of writing SELinux policy for Test-AutoBuild. I played around a little bit, but never got the time to really make a serious attempt at it. Before I go on, a brief re-cap on the motivation…
Test-AutoBuild is a framework for providing continuous, unattended, automated software builds. Each run of the build engine checks the latest source out from CVS, calculates a build order based on module dependencies, builds each module, and then publishes the results. The build engine typically runs under a dedicated system user account – builder – to avoid any risk of the module build process compromising a host (either accidentally, or deliberately). This works reasonably well if you are only protecting against accidental damage from a module build – eg building apps maintained inside your organization. If building code from source repositories out on the internet, though, there is a real risk of deliberately hostile module build processes. A module may be trojaned so that its build process attempts to scan your internal network, or it may trash the state files of the build engine itself – both the engine & the module being built run under the same user account. There is also the risk that the remote source control server has been trojaned to try and exploit flaws in the client.
And so enter SELinux… The build engine is highly modular in structure, with different tasks in the build workflow being pretty well isolated. So the theory was that it ought to be possible to write SELinux policy to guarantee separation of the build engine from the SCM tools doing source code checkout, from the module build processes, and from the other commands being run. As an example, within a build root there are a handful of core directories:
root
|
+- source-root - dir in which module source is checked out
+- package-root - dir in which RPMs/Debs & other packages are generated
+- install-root - virtual root dir for installing files in 'make install'
+- build-archive - archive of previous successful module builds
+- log-root - dir for creating log files of build process
+- public_html - dir in which results are published
All these dirs are owned by the builder user account. The build engine itself provides all the administrative tasks for the build workflow, so generally requires full access to all of these directories. The SCM tools, however, merely need to be able to check out files into the source-root and create logs in the log-root. The module build process needs to be able to read/write in the source-root, package-root and install-root, as well as creating logs in the log-root. So, given suitable SELinux policy it ought to be possible to lock down the access of the SCM tools and build process quite significantly.
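As a rough illustration of the direction, the per-directory labelling could look something like the following – all the type names except absource_t (which the policy really does use for checked-out source) are hypothetical, as is the path:

# give each build-root dir a dedicated file context (sketch only)
chcon -R -t absource_t   /home/builder/build-root/source-root
chcon -R -t abpackage_t  /home/builder/build-root/package-root
chcon -R -t abinstall_t  /home/builder/build-root/install-root
chcon -R -t ablog_t      /home/builder/build-root/log-root

The policy can then grant the SCM and build domains access to only the types they genuinely need.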
Now aside from writing the policy there are a couple of other small issues. The primary one is that the build engine has to run in a confined SELinux context, and has to be able to run SCM tools and build processes in a different context. For the former, I chose to create an ‘auto-build-secure’ command to augment the ‘auto-build’ command. This allows users to easily run the build process in SELinux confined, or traditional unconfined, modes. For the latter, most SELinux policy relies on automated process context transitions based on the binary file labels. This isn’t so useful for autobuild though, because the script we’re running is being checked out directly from an SCM repo & thus isn’t labelled. The solution for this is easy though – after fork()ing, but before exec()ing the SCM tools / build script, we simply write the desired target context into /proc/self/attr/exec.
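The effect is much the same as what runcon gives you from a shell – something along these lines, with a purely hypothetical type name and repository:

# run the checkout in a dedicated domain instead of the engine's own context
runcon -t abscm_t -- cvs -d :pserver:anonymous@cvs.example.org:/cvs checkout mymodule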
So with a couple of tiny modifications to the build engine, and many hours of writing suitable policy for Test-AutoBuild, it’s now possible to run the build engine under a strictly confined policy. There is one horrible trouble spot though. Every application has its own build process & set of operations it wishes to perform. Writing a policy which confines the build process as tightly as possible, while still allowing arbitrary builds to work, is very hard indeed. In fact it is effectively unsolvable in the general case.
So what to do? SELinux booleans provide a way to toggle various capabilities on/off system-wide. If building multiple applications though, it may be desirable to run some under a more confined policy than others – and booleans are system-wide. The solution, I think, is to define a set of perhaps 4 or 5 different execution contexts with differing levels of privileges. As an example, some contexts may allow outgoing network access, while others may deny all network activity. The build admin can then use the most restrictive policy by default, and a less restrictive policy for applications which are more trusted.
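For comparison, this is how a stock policy boolean gets flipped today – it is an all-or-nothing switch for every process the boolean governs, which is precisely the limitation described above (the boolean shown is just an unrelated example from the stock policy):

# list current boolean settings, then persistently toggle one system-wide
getsebool -a | grep network
setsebool -P httpd_can_network_connect off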
This weekend was just the start of experimentation with SELinux policy in regards to Test-AutoBuild, but it was far, far more successful than I ever expected it to be. The level of control afforded by SELinux is awesome, and with the flexibility of modifying the application itself too, the possibilities for fine-grained access control are enormous. One idea I’d like to investigate is whether it is possible to define new SELinux execution contexts on-the-fly. eg, instead of all application sources being checked out under a single ‘absource_t’ file context, it would be desirable to create a new source file context per-application. I’m not sure whether SELinux supports this idea, but it is interesting to push the boundaries here nonetheless…
For the last couple of years all the hype wrt open source virtualization has been about Xen. Unfortunately, after several years Xen is still not merged upstream, the main codebase being a huge out-of-tree patch persistently stuck on obsolete kernel versions. The Xen paravirt_ops implementation is showing promise, but it’s a long way off being a full solution since it doesn’t provide Dom0 or ia64/ppc/x86_64 yet. Then out of nowhere, 6 months ago, a new contender arrived in the form of KVM, almost immediately finding itself merged upstream. Now you can’t claim to offer state-of-the-art virtualization without including both KVM and Xen. We had to have KVM in Fedora 7. With excellent foresight, when working to integrate Xen in Fedora Core 5, Daniel Veillard had the idea to create a technology-independent management library for virtualization platforms. This is libvirt. The core idea was to provide a stable library API for application developers to target, insulating them from implementation-specific APIs in the changing base virtualization platform. This API would be LGPL to allow both the community and 3rd party vendors to build applications.
On top of libvirt a suite of applications is evolving – virt-manager, virt-install and cobbler/koan – to name the three most popular thus far. Fast forward to Fedora 7, when we’re looking to introduce support for KVM into the distribution. We could simply have taken the approach of throwing in the kernel modules & QEMU code for KVM, sending out a fancy press release & letting users figure out the gory details. With libvirt though, we were in a position to do more…much, much more. So the precious few months between the end of our RHEL-5 GA work and the Fedora 7 release were spent hacking on libvirt to add in support for managing QEMU and KVM virtual machines.
The results speak for themselves. In Fedora 7 you can fire up tools like virt-manager & virt-install and manage KVM virtual machines in exactly the same way you managed Xen virtual machines. You are free from being locked into Xen. You avoid having to learn new command sets. You are able to pick the best open source virtualization technology to accomplish the task at hand. This is Fedora innovation & vision at its best! No other distribution even comes close…yet… None of this technology has any serious distribution-specific logic & we want everyone to reap the rewards. Thus we actively work with developers from other distributions, in particular OpenSolaris and Debian, to help them integrate (& thus take advantage of) libvirt and the applications built upon it.
So if you’re using Fedora 7, here are some of the things to look out for…
- Choosing which hypervisor to use
Both KVM and Xen require root privileges since they need to talk to various kernel devices & drivers. Not everyone has root on their machine. With KVM you can allow unprivileged users the ability to create VMs by changing the ownership on /dev/kvm. The plain unaccelerated QEMU can be used as a general purpose emulator too, without any elevated privileges. libvirt uses ‘hypervisor URIs’ to choose between backend implementations:
xen (or no explicit URI): connects to Xen. root has full access, non-root has read-only access.
qemu:///system : connects to the main QEMU/KVM backend driver. root has full access, non-root has read-only access.
qemu:///session : connects to the per-user private QEMU backend driver. Full access irrespective of user, since there is one instance per user account.
All libvirt tools accept a --connect argument to choose the hypervisor URI to use. virt-manager & virt-install will try to pick either xen or qemu:///system depending on which kernel you’re running. You can always give an explicit argument. So to use virsh as a non-root user with the per-user QEMU instance, run virsh --connect qemu:///session, as in the example below.
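For instance – the kvm group here is just one plausible way of opening up /dev/kvm, not something the distro sets up for you:

# chgrp kvm /dev/kvm                            (as root: let the 'kvm' group use the accelerator)
# chmod 0660 /dev/kvm
$ virsh --connect qemu:///session list --all    (as an ordinary user: manage your private instance)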
- Using virsh instead of xm
You may be familiar with Xen’s xm command line tool. The capabilities of this valuable administrative tool are also in virsh, the benefit being that it works with Xen and KVM/QEMU, as well as any future platforms libvirt is ported to. To use it with Xen, just run virsh with no arguments. To use it with KVM/QEMU add in --connect qemu:///system (see the examples below). The virsh help command shows the (huge!) list of commands at your disposal. Oh, and the VIRSH_DEFAULT_CONNECT_URI environment variable can be used if you don’t want to specify --connect every time.
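A couple of side-by-side examples:

# xm list                               (Xen only)
# virsh list                            (the same info, via libvirt)
# virsh --connect qemu:///system list   (and again for QEMU/KVM guests)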
- Changing the config of an existing guest
Every virtualization technology has a different configuration file format (or even none at all – relying on command line args in QEMU’s case). As expected, libvirt provides a consistent way to change the configuration of a guest virtual machine. Every guest has an XML document describing its setup. You can see this configuration by using the virsh dumpxml [guest] command. Save this XML to a file, edit it in VI and then use it to update the master config by running virsh define [xmlfile], as shown below. There’s a page describing the various XML elements. Afraid of looking at XML? Well virsh provides commands for changing simple things like memory & CPU count, while virt-manager also allows disk and network devices to be added/removed.
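For example, using one of the guests from the earlier listings:

# virsh dumpxml rhel5pv > rhel5pv.xml
# vi rhel5pv.xml
# virsh define rhel5pv.xml

Simple tweaks have dedicated commands too – virsh setmem and virsh setvcpus cover memory & CPU count.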
- Connecting virtual machines to the network
libvirt allows for many ways to connect a virtual machine to the real network, but two of them stand out as the common cases. First, the traditional Xen approach is to set up a bridge device in the host OS with the physical ethernet device enslaved, along with zero or more guest interfaces. Xen ships with a shell script which munges networking on startup to put eth0 into a bridge, though it is preferable to just use the standard initscripts to define the bridge & device. In libvirt we call this a “shared physical device” setup – your physical device is shared between the guest and the host in effect. Bridging works well if you are on a permanently wired LAN. It fails miserably if you have a dynamic network setup as typically found on laptops. So the second approach is to have a bridge device not connected to any physical device, attach guests to this bridge, and then NAT their traffic out to the real physical device. In libvirt we call this a “virtual network” since it’s not seen outside of a single host – all LAN traffic appears to originate from the single host. David Lutterkort describes the setup in a little more detail.
We try to make the choice easy in virt-install – if there is a bridge device with a physical interface enslaved we set up a ‘shared physical device’ style config, otherwise we fall back to a ‘virtual network’ config. You can be explicit when choosing between the two using the --network bridge:eth0 (connect to the bridge called eth0) or --network network:default (connect to the virtual network called ‘default’) arguments, as sketched below. Virt-manager has an extra screen in the VM creation wizard to let you choose the networking setup.
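A rough virt-install sketch – the guest name, sizes and install tree URL are placeholders; only the --network argument matters here. Swap in network:default to use the ‘virtual network’ style instead of the bridge:

# virt-install --name demo --ram 512 \
      --file /var/lib/xen/images/demo.img --file-size 6 \
      --location http://example.com/fedora/7/os/ \
      --network bridge:eth0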
I can only just scratch the surface of this in a blog posting, and of course the work on libvirt is far from complete. A major focus for Fedora 8 will be providing secure remote management. In parallel there’s ongoing work to expand coverage of virtual hardware configuration to audio, USB, and more advanced serial consoles. We’d also welcome contributions from anyone interested in adding support for OpenVZ, User Mode Linux, VMWare and Virtual Box to libvirt…
If you’ve spent any time manipulating / scanning / printing photos you’ll know that getting the colours on screen to match those of the original slide / negative and / or those of the final print is rather non-trivial. Welcome to the special 10th circle of hell that is device colour management. Fortunately there are ways out of this purgatory – in particular, for monitors, I’m talking about colourimeters. Unfortunately nearly all of these hardware devices require use of Windows or Mac OS-X.
After an extensive Googling session I eventually discovered that a gadget called the Monaco Optix XR is the branded name for the DTP-94 device. This device is in turn the only low cost (ie < $1000) hardware currently supported by Argyll CMS (a set of open source programs for doing colour management, including display calibration & profiling).
The bad news is that the company making the Monaco Optix was acquired by X-Rite, who (oblivious to this device’s unique selling point for Linux users) have apparently discontinued its production :-( There are still various photo stores which have them in stock, so I picked one up from Midwest Photo Exchange. On a more promising note though, I’ve also just read on the Argyll CMS lists that the current development snapshot releases have support for a couple of the Eye-One devices. So perhaps all is not lost for Linux users in the future after all!
Now that I’ve got this device it’s time to figure out how on earth the Argyll CMS software is actually supposed to be used. There is a lot of documentation, but it is far from clear what the best process to go through is. In particular, while it can be used with LCDs, the docs are not too good at describing which things need to be done differently for LCDs vs CRTs. Oh well, I’ll figure it out in the end. Getting the monitor set up correctly is also only one of the problems. I still use film for all my photography, scanning it in with a Nikon CoolScan. This needs profiling/calibrating too, so I’ve a set of IT 8.7 scanner calibration targets on order from a cheap source in Germany.
To enable me to actually test some ideas for IPv6 in libvirt’s virtual networking APIs, I recently purchased a LinkSys WRT54GL wireless router which was promptly flashed to run OpenWRT. I won’t go into all the details of the setup in this post; it suffices to say that thanks to the folks at SIXXS my home network has a globally routable IPv6 /48 subnet (no stinking NAT required!). That gives me 80 bits of addressing to use on my LAN – enough to run a good-sized handful of virtual machines :-) With a little scripting completed on the LinkSys router, full IPv6 connectivity is provided to any machine on the LAN which wants it. Which is more or less where the “fun” begins.
Initially I was just running radvd to advertise a /64 prefix for the LAN & letting devices do autoconfiguration (they basically combine the /64 prefix with their own MAC address to generate a globally unique 128-bit IPv6 address). As part of my exploration of IPv6 support for libvirt though, I wanted to give DHCPv6 a try too.
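For reference, the radvd side of this is only a few lines of config – the interface name and prefix below are placeholders, so substitute your own /64:

# /etc/radvd.conf
interface eth0
{
    AdvSendAdvert on;
    prefix 2001:db8:1234:1::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
};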
It was fairly straightforward – much like DHCP on IPv4, you tell the DHCPv6 server what address range it can use (make sure that matches the range that radvd is advertising) and then configure the networking scripts on the client to do DHCP instead of autoconfiguration (basically add DHCPV6C=yes to each interface config, as below). In debugging this though I came across a fun bug in the DHCPv6 client & server in Fedora – they consistently pass sizeof(sockaddr_in6.sin6_addr) as the salen parameter to getnameinfo(), when it should be sizeof(sockaddr_in6). So all getnameinfo() calls were failing – fortunately this didn’t appear to have any operational ill-effects on the DHCPv6 client/server – it just means that your logs don’t include details of the addresses being handed out / received. So that was IPv6 bug #1.
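The client side change really is just that one line per interface (Fedora ifcfg syntax, interface name just an example):

# /etc/sysconfig/network-scripts/ifcfg-eth0
DHCPV6C=yes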
With globally routable IPv6 addresses now being configured on my laptop, it was time to try browsing some IPv6 enabled websites. If you have a globally routable IPv6 address configured on your interface, then there’s no magic config needed in the web browsers – the getaddrinfo() calls will automatically return an IPv6 address for a site if one is available. BTW, if you’re still using the legacy gethostbyname() calls when writing network code, you should really read Uli’s doc on modern address resolution APIs. Suffice to say, if you use getaddrinfo() and getnameinfo() correctly in your apps, IPv6 will pretty much ‘just work’. Well, while the hostname lookups & web browsers were working correctly, all outgoing connections would just hang. After much debugging I discovered that while the SYN packet was going out, the default ip6tables firewall rules were not letting the reply back through, so the connection never got established. In the IPv4 world there is a rule using conntrack to match on ‘state = RELATED,ESTABLISHED’, but there was no equivalent added in the IPv6 firewall rules. That gives us IPv6 bug #2.
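For the record, the missing rule is just the IPv6 twin of the usual IPv4 one – something along these lines, assuming the kernel in use has IPv6 connection tracking available:

# ip6tables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT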
With that problem temporarily hacked/worked around by allowing all port 80 traffic through the firewall, web browsing was working nicely. For a while at least. I noticed that inexplicably, every now & then, my network device would lose all its IPv6 addresses – even the link-local one! This was very odd indeed & I couldn’t figure out what on earth would be causing it. I was about to resort to using SystemTap when I suddenly realized the loss of addresses coincided with disconnecting from the office VPN. This gave two obvious targets for investigation – NetworkManager and/or VPNC. After yet more debugging it transpired that when a VPN connection is torn down, NetworkManager flushes all addresses & routes from the underlying physical device, but then only re-adds the IPv4 configuration. The fix for this was trivial – during the initial physical device configuration NetworkManager already has code to automatically add an IPv6 link-local address – that code just needed to be invoked from the VPN teardown script to re-add the link-local address after the device was flushed. Finally we have IPv6 bug #3. Only 3 minor, edge-case bugs is pretty good considering how few people actively use this stuff.
Overall it has been a very worthwhile learning exercise. Trying to get your head around IPv6 is non-trivial if you’re doing it merely based on reading HOWTOs & RFCs. As with many things, actually trying it out & making use of IPv6 for real is a far better way to learn just what it is all about. The second tip is to get yourself a globally routable IPv6 address & subnet right from the start – site-local addresses are deprecated & there’s not nearly as much you can do if you can’t route to the internet as a whole – remember there’s no NAT in the IPv6 world. I would have been much less likely to have encountered the firewall / NetworkManager bugs if I had only been using site-local addresses, since I would not have been browsing public sites over IPv6. While there are almost certainly more IPv6 bugs lurking in various Fedora applications, on the whole Fedora Core 6 IPv6 support is working really rather well – the biggest problems are the lack of documentation & the small userbase – the more people who try it, the more quickly we’ll be able to shake out & resolve the bugs.
BTW, there’s nothing stopping anyone trying out IPv6 themselves. Even if your internet connection is a crappy Verizon DSL service with a single dynamic IPv4 address that changes every time your DSL bounces, the folks at SIXXS have a way to get you a tunnel into the IPv6 world with a fully routable global IPv6 address & subnet.