Writing the Nova file injection code to use libguestfs APIs instead of FUSE

Posted: November 15th, 2012 | Filed under: Fedora, libvirt, OpenStack, Security, Virt Tools

When launching a virtual machine, Nova has the ability to inject various files into the disk image immediately before boot. This is used to perform the following setup operations:

  • Add an authorized SSH key for the root account
  • Configure init to reset SELinux labelling on /root/.ssh
  • Set the login password for the root account
  • Copy data into a number of user specified files
  • Create the meta.js file
  • Configure network interfaces in the guest

This file injection is handled by the code in the nova.virt.disk.api module. The code which does the actual injection is designed around the assumption that the filesystem in the guest image can be mapped into a location in the host filesystem. There are a number of ways this can be done, so Nova has a pluggable API for mounting guest images in the host, defined by the nova.virt.disk.mount module, with the following implementations:

  • Loop – Use losetup to create a loop device. Then use kpartx to map the partitions within the device, and finally mount the designated partition. Alternatively, on new enough kernels, the loop device’s built-in partition support is used instead of kpartx.
  • NBD – Use qemu-nbd to run an NBD server and attach with the kernel NBD client to expose a device. Then partitions are mapped as for the Loop module.
  • GuestFS – Use libguestfs to inspect the image and set up a FUSE mount for all partitions or logical volumes inside the image.
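To make the pluggable design more concrete, here is a minimal Python sketch of the Loop and NBD strategies. The class names, the `commands()` method and the `HANDLERS` registry are hypothetical, not Nova's actual nova.virt.disk.mount code. The sketch only builds the command lines rather than running them (running them would need root), it assumes the caller has already picked a free device such as /dev/loop0 or /dev/nbd0, and it leaves out the GuestFS module, which produces a FUSE mount rather than a block device.

```python
"""Hypothetical sketch of a pluggable "map a guest image into the host" API.

Names are illustrative only. Commands are returned as argument lists
instead of being executed, so the sketch runs without root or real images.
"""
import os


class MountPlan:
    """Base class: subclasses say how to expose the image as a block device."""

    def __init__(self, image, mount_dir, partition=None):
        self.image = image
        self.mount_dir = mount_dir
        self.partition = partition  # None: mount the whole device directly

    def attach_commands(self, device):
        raise NotImplementedError

    def commands(self, device):
        cmds = list(self.attach_commands(device))
        if self.partition is None:
            target = device
        else:
            # kpartx creates mappings named /dev/mapper/<device>p<N>
            cmds.append(["kpartx", "-a", device])
            target = "/dev/mapper/%sp%d" % (os.path.basename(device), self.partition)
        cmds.append(["mount", target, self.mount_dir])
        return cmds


class LoopMount(MountPlan):
    # Raw images only: losetup maps the file directly onto a loop device.
    def attach_commands(self, device):
        return [["losetup", device, self.image]]


class NbdMount(MountPlan):
    # Any QEMU-supported format: qemu-nbd serves it, the kernel NBD client attaches.
    def attach_commands(self, device):
        return [["qemu-nbd", "--connect=" + device, self.image]]


HANDLERS = {"loop": LoopMount, "nbd": NbdMount}
```

For example, `NbdMount("disk.qcow2", "/tmp/openstack-abc", partition=1).commands("/dev/nbd0")` yields the qemu-nbd, kpartx and mount steps in that order. Both subclasses share the same partition and mount logic, which is why partition handling behaves identically for Loop and NBD.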

The Loop module can only handle Raw format files, while the NBD module can handle any format that QEMU supports. Although both can access partitions, the code handling this is very dumb. It requires the Nova global ‘libvirt_inject_partition’ config parameter to specify which partition number to inject into. The result is that every image you upload to Glance must be partitioned in exactly the same way. It would be much better to use a metadata parameter associated with each image. The GuestFS module is much more advanced: it inspects the guest OS to handle arbitrarily partitioned images and even LVM-based images.
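A per-image setting could look something like the following sketch. The property name `inject_partition` is purely hypothetical (it is not an existing Glance or Nova key); the function prefers the image's own value and otherwise falls back to the global `libvirt_inject_partition` value, passed in here as `global_default`.

```python
def choose_partition(image_properties, global_default):
    """Return the partition number to inject into for one image.

    `image_properties` is the image's metadata dict. "inject_partition"
    is a hypothetical key used only for illustration.
    """
    raw = image_properties.get("inject_partition")
    if raw is None:
        # No per-image hint: fall back to the single global setting.
        return global_default
    partition = int(raw)  # image properties may arrive as strings
    if partition < 1:
        raise ValueError("partition number must be >= 1, got %r" % raw)
    return partition
```

With this, an image partitioned differently from the rest only needs its own property set, instead of forcing every image in Glance into one layout.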

Nova has an “img_handlers” configuration parameter which defines the order in which the 3 mount modules above are tried. It tries to mount the image with each one in turn, until one succeeds. This is quite crude code really – it has already been hacked to avoid trying the Loop module if Nova knows it is using QCow2. It has to be changed by the Nova admin if they’re using LXC, otherwise you can end up using KVM with LXC guests, which is probably not what you want. The try-and-fallback paradigm also has the undesirable behaviour of masking errors that you would really rather treat as fatal to the boot process.
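The hypothetical sketch below shows the shape of the try-and-fallback logic, including the QCow2 special case, and why it hides errors: a genuine failure from one handler is swallowed as soon as a later handler succeeds. The handler functions are stand-ins rather than Nova's code; only the names loop, nbd and guestfs come from `img_handlers`.

```python
"""Hypothetical sketch of try-each-handler mounting versus explicit choice."""


class MountError(Exception):
    pass


def mount_with_fallback(handlers, image, image_format, order):
    """Try each handler in `order`; return (name, result) for the first success."""
    errors = []
    for name in order:
        if name == "loop" and image_format == "qcow2":
            continue  # the special case: never try Loop on a QCow2 image
        try:
            return name, handlers[name](image)
        except MountError as exc:
            # Swallowed: only reported if every remaining handler also fails.
            errors.append("%s: %s" % (name, exc))
    raise MountError("all handlers failed: " + "; ".join(errors))


def mount_explicit(handlers, image, name):
    """Use exactly one handler chosen up front; any failure propagates."""
    return name, handlers[name](image)
```

If nbd fails on, say, a corrupt partition table but guestfs then succeeds, `mount_with_fallback` silently returns the guestfs result, whereas `mount_explicit(handlers, image, "nbd")` surfaces the error immediately.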

As mentioned earlier, the file injection code uses the mount modules to map the guest image filesystem into a temporary directory in the host (such as /tmp/openstack-XXXXXX). It then runs various commands like chmod, chown, mkdir, tee, etc. to manipulate files in the guest image. Of course Nova runs as an unprivileged user, and the guest files to be changed are typically owned by root. This means all the file injection commands need to run via Nova’s rootwrap utility to gain root privileges. Needless to say, this has the undesirable consequence that the code injecting files into a guest image in fact has privileges that allow it to write to arbitrary areas of the host filesystem. One mistake in handling symlinks and you have the potential for a carefully crafted guest image to result in compromise of the host OS. It should come as little surprise that this has already resulted in a security vulnerability / CVE against Nova.
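The symlink risk is easy to reproduce without root. In this sketch two temporary directories stand in for a sensitive host directory (think /etc) and for the mounted guest image; `resolves_inside` and `demo` are hypothetical helpers, not part of Nova. Because the guest filesystem is mounted on the host, an absolute symlink shipped in the image resolves against the host's root, so naively joining the mount directory with /root/.ssh/authorized_keys lands outside the guest.

```python
"""Sketch: a guest-controlled symlink redirecting a host-side write."""
import os
import tempfile


def resolves_inside(guest_root, guest_path):
    """True if guest_path, placed under guest_root, still resolves inside it."""
    root = os.path.realpath(guest_root)
    # Strip the leading "/" so os.path.join does not discard `root`.
    target = os.path.realpath(os.path.join(root, guest_path.lstrip("/")))
    # commonpath avoids the prefix bug where /tmp/abX would match /tmp/ab.
    return os.path.commonpath([root, target]) == root


def demo():
    host_dir = tempfile.mkdtemp()    # stands in for a sensitive host directory
    guest_root = tempfile.mkdtemp()  # stands in for the mounted guest image
    os.makedirs(os.path.join(guest_root, "root"))
    # A crafted image ships /root/.ssh as an absolute symlink. Mounted on the
    # host, the absolute target resolves against the host's root directory.
    os.symlink(host_dir, os.path.join(guest_root, "root", ".ssh"))
    naive = os.path.join(guest_root, "root/.ssh/authorized_keys")
    return guest_root, host_dir, naive
```

After `demo()`, the directory containing the naive path resolves to `host_dir`, and `resolves_inside(guest_root, "/root/.ssh/authorized_keys")` is False. Even so, checks like this must cover every command and every path component, and are open to check-then-use races, which is the underlying argument for removing host filesystem access from the injection code entirely.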

The solution to this class of security problems is to decouple the file injection code from the host filesystem. This can be done by introducing a “VFS” (Virtual File System) interface which defines a formal API for the various logical operations that need to be performed on a guest filesystem. With that it is possible to provide an implementation that uses the libguestfs native python API, rather than FUSE mounts. As well as being inherently more secure, avoiding the FUSE layer will improve performance, and allow Nova to utilize libguestfs APIs that don’t map into FUSE, such as its Augeas support for parsing config files. Nova still needs to work in scenarios where libguestfs is not available though, so a second implementation of the VFS APIs will be required based on the existing Loop/NBD device mount approach. The security of the non-libguestfs support has not changed with this refactoring work, but decoupling the file injection code from the host filesystem does make it easier to write unit tests for this code. The file injection code can be tested by mocking out the VFS layer, while the VFS implementations can be tested by mocking out the libguestfs or command execution APIs.
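A minimal Python sketch of such a VFS layer follows. The `VFS` and `GuestFSVFS` classes, their method names and `inject_ssh_key` are hypothetical illustrations, not Nova's actual code; the calls made on the handle (`add_drive_opts`, `launch`, `inspect_os`, `inspect_get_mountpoints`, `mount`, `mkdir_p`, `exists`, `read_file`, `write`, `chmod`, `chown`, `umount_all`, `close`) are methods of the libguestfs Python bindings. The handle is passed in rather than created internally, so a test can substitute a mock for libguestfs, and a Loop/NBD-based class could implement the same interface on top of a host mount.

```python
"""Hypothetical sketch of a VFS layer for guest file injection."""
import abc


class VFS(abc.ABC):
    """Logical operations injection needs; no host paths are exposed."""

    @abc.abstractmethod
    def make_path(self, path): ...

    @abc.abstractmethod
    def append_file(self, path, content): ...

    @abc.abstractmethod
    def set_permissions(self, path, mode): ...

    @abc.abstractmethod
    def set_ownership(self, path, uid, gid): ...


class GuestFSVFS(VFS):
    """VFS backed by a libguestfs handle, e.g. guestfs.GuestFS()."""

    def __init__(self, handle, image, image_format):
        self.g = handle
        self.image = image
        self.image_format = image_format

    def setup(self):
        self.g.add_drive_opts(self.image, format=self.image_format)
        self.g.launch()  # boots the libguestfs appliance
        roots = self.g.inspect_os()
        if len(roots) != 1:
            raise RuntimeError("expected one OS in image, found %d" % len(roots))
        mps = self.g.inspect_get_mountpoints(roots[0])
        if isinstance(mps, dict):  # dict or list of pairs, depending on bindings
            mps = list(mps.items())
        # Mount "/" before "/boot" etc. by sorting on mountpoint length.
        for mountpoint, device in sorted(mps, key=lambda mp: len(mp[0])):
            self.g.mount(device, mountpoint)

    def teardown(self):
        self.g.umount_all()
        self.g.close()

    def make_path(self, path):
        self.g.mkdir_p(path)

    def append_file(self, path, content):
        existing = self.g.read_file(path) if self.g.exists(path) else b""
        self.g.write(path, existing + content)

    def set_permissions(self, path, mode):
        self.g.chmod(mode, path)

    def set_ownership(self, path, uid, gid):
        self.g.chown(uid, gid, path)


def inject_ssh_key(vfs, key):
    """Injection logic written only against the VFS interface."""
    vfs.make_path("/root/.ssh")
    vfs.set_ownership("/root/.ssh", 0, 0)
    vfs.set_permissions("/root/.ssh", 0o700)
    vfs.append_file("/root/.ssh/authorized_keys", key)
    vfs.set_permissions("/root/.ssh/authorized_keys", 0o600)
```

In production the handle would be `guestfs.GuestFS()`; in a unit test it can be a `unittest.mock.MagicMock`, whose recorded calls can then be asserted on without launching an appliance.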

Incidentally, if you’re wondering why libguestfs does all its work inside a KVM appliance, its man page describes the security issues this approach protects against, compared with directly mounting guest images on the host.

KVM Forum: building application sandboxes on top of KVM or LXC using libvirt

Posted: November 8th, 2012 | Filed under: Fedora, libvirt, Virt Tools

This week I have spent my time at LinuxCon Europe and KVM Forum 2012. I gave a talk titled “Building application sandboxes on top of KVM or LXC using libvirt”. For those who enquired afterwards, the slides are now available.

Announce: libvirt 1.0.0 release and 7th birthday

Posted: November 2nd, 2012 | Filed under: Fedora, libvirt, OpenStack, Virt Tools

Today libvirt reached the symbolic milestone of a 1.0.0 release. This is not because of any particular major new feature compared to the previous 0.10.2 release, but rather we picked 1.0.0 as a way of celebrating our 7th birthday. From the git history we see the first commit 7 years ago today:

  commit d77e1a9642fe1efe9aa5f737a640354c27d04e02
  Author: Daniel Veillard <veillard@redhat.com>
  Date:   Wed Nov 2 12:50:21 2005 +0000

  Initial revision

And today the 1.0.0 release commit:

  commit 2b435c153e53e78092025c01ddc43265761b72fa
  Author: Daniel Veillard <veillard@redhat.com>
  Date:   Fri Nov 2 12:08:11 2012 +0800

  Release of libvirt-1.0.0

To commemorate this occasion I have prepared a new animation of libvirt development history using Gource.

While I was doing that, I figured I would do one for QEMU too, which is coming up to its 10-year anniversary on Feb 18th, 2013.

In both videos it should be pretty easy to spot where the projects switched from using CVS/SVN (respectively) over to Git, since there is a dramatic increase in the number of people committing changes. A large part of this is due to the fact that Git correctly attributes authorship, but at the same time both projects also saw a significant increase in community size as barriers to contribution were lowered.

Announce: Entangle “Gluon” release 0.4.1 – An app for tethered camera control & capture

I am pleased to announce a new release 0.4.1 of Entangle is available for download from the usual location:

   http://entangle-photo.org/download/

This release focuses on bug fixing, but throws in a couple of small feature improvements too:

  • Fix leak of image pixbufs when changing image in session
  • Keep toolbar directory in sync with session dir
  • Fix leak when displaying image popups
  • Fix leak when closing image popups
  • Fix key bindings in session browser
  • Add image histogram display
  • Load libpeas introspection data for plugins
  • Main plugin list in preferences
  • Add object type checking in all APIs
  • Fix image mask aspect ratio conversion to avoid locale problems
  • Fix build on GTK < 3.4
  • Remove obsolete conditionals from GTK 2.x days
  • Populate list of supported cameras in help menu dialog
  • Add a simple man page
  • Add accelerators for many menu options
  • Fix unref of cairo surface objects
  • Avoid GTK assertion when the range (max - min) is zero
  • Avoid crash in control panel when updating after camera disconnect

As before we still need help getting the UI translated into as many languages as possible, so if you are able to help out, please join the Fedora translation team:

     https://fedora.transifex.net/projects/p/entangle/

Thanks to everyone who helped contribute to this release & troubleshooting of the previous releases.

Podcast from the London OpenStack Meetup talk “Libvirt & KVM with OpenStack Nova”

Posted: July 27th, 2012 | Filed under: Fedora, libvirt, OpenStack, Virt Tools

I mentioned in my post yesterday about the 1st London OpenStack Meetup that Richard Morrell had made an audio recording of the talk I gave. After a little post-processing of the audio capture files (with open source tools like Audacity on Linux of course – no Mac OS-X here thanks), and the recording of a short introduction, Richard has now published my talk as a podcast on his Cloud Evangelist blog. With the introduction he added, the podcast comes out at a little over 30 minutes – I hadn’t realized how far I went over the allocated 20-minute timeslot until I got the recording back! Next time we need a meeting room with a more clearly visible clock :-)