I am pleased to announce that a new public release of libvirt-sandbox, version 0.6.0, is now available from:
http://sandbox.libvirt.org/download/
The packages are GPG signed with
Key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)
The libvirt-sandbox package provides an API layer on top of libvirt-gobject which facilitates the creation of application sandboxes using virtualization technology. An application sandbox is a virtual machine or container that runs a single application binary, directly from the host OS filesystem. In other words, there is no separate guest operating system install to build or manage.
At this point in time, libvirt-sandbox can create sandboxes using either LXC or KVM, and should in theory be extensible to any libvirt driver.
This release contains a mixture of new features and bugfixes.
The first major feature is the ability to provide block devices to sandboxes. Most of the time sandboxes only want/need filesystems, but there are some use cases where block devices are useful. For example, some applications (like databases) can directly use raw block devices for storage. Another is a tool that wishes to format filesystems itself, inside the container. The difficulty with exposing block devices is giving the sandbox tools a predictable path for accessing the device, one which does not change across hypervisors. To solve this, instead of specifying a block device name, users of virt-sandbox provide an opaque tag name. The block device is then made available at the path /dev/disk/by-tag/TAGNAME, which is a symlink back to whatever hypervisor-specific disk name was used.
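The by-tag scheme can be illustrated with a small Python sketch. It is not libvirt-sandbox's own code: it assumes the sandbox init already knows a mapping from each tag to the device name the hypervisor assigned (the vdb/vdc names below are hypothetical, KVM-style examples), and it works under a temporary directory instead of the real /dev so it can run unprivileged. The helper name link_tagged_disks is likewise made up for illustration.

```python
import os
import tempfile


def link_tagged_disks(dev_root, tag_to_device):
    """Create DEV_ROOT/disk/by-tag/TAG symlinks pointing at hypervisor disks.

    tag_to_device maps each opaque, user-chosen tag to the device name the
    hypervisor happened to assign (hypothetical values in this sketch).
    Returns a dict of tag -> path of the symlink created.
    """
    by_tag = os.path.join(dev_root, "disk", "by-tag")
    os.makedirs(by_tag, exist_ok=True)
    created = {}
    for tag, device in tag_to_device.items():
        # A tag must be a single path component, so it cannot escape by-tag/.
        if not tag or "/" in tag or tag in (".", ".."):
            raise ValueError(f"invalid tag: {tag!r}")
        link = os.path.join(by_tag, tag)
        # Relative target: by-tag/ -> disk/ -> DEV_ROOT/DEVICE.
        os.symlink(os.path.join("..", "..", device), link)
        created[tag] = link
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dev:
        # Pretend the hypervisor exposed two disks as vdb and vdc.
        for name in ("vdb", "vdc"):
            open(os.path.join(dev, name), "w").close()
        links = link_tagged_disks(dev, {"dbdata": "vdb", "scratch": "vdc"})
        for tag, link in sorted(links.items()):
            print(tag, "->", os.readlink(link))
```

Because the links are relative, in the same style as the links udev creates under /dev/disk/by-id, they resolve correctly regardless of which directory tree they are created in; the application only ever needs to know the tag.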
The second major feature is the ability to provide a custom root filesystem for the sandbox. The original intent of the sandbox tool was that it provide an easy way to confine and execute applications that are installed on the host filesystem, so by default the host / filesystem is mapped to the sandbox / filesystem read-only. There are some use cases, however, where the user may wish to have a completely different root filesystem. For example, they may wish to execute applications from some separate disk image. So virt-sandbox now allows the user to map in a different root filesystem for the sandbox.
Both of these features were developed as part of a Google Summer of Code 2015 project which aims to enhance libvirt-sandbox so that it is capable of executing images distributed by the Docker container image repository service. The motivation for this goes back to the original reason for creating the libvirt-sandbox project in the first place, which was to provide a hypervisor agnostic framework for sandboxing applications, as a higher level layer above the libvirt API. Once this work is complete it’ll be possible to launch Docker images via libvirt QEMU, KVM or LXC, with no need for the Docker toolchain itself.
The detailed list of changes in this release is:
- API/ABI incompatible change, soname increased
- Prevent use of virt-sandbox-service as non-root upfront
- Fix misc memory leaks
- Block SIGHUP from the dhclient binary to prevent accidental death if the controlling terminal is closed & reopened
- Add support for re-creating libvirt XML from sandbox config to facilitate upgrades
- Switch to standard gobject introspection autoconf macros
- Add ability to set filters on network interfaces
- Search /usr/lib instead of /lib for systemd unit files, as the former is the canonical location even when / and /usr are merged
- Only set SELinux labels on hosts that support SELinux
- Explicitly link to selinux, instead of relying on indirect linkage
- Update compiler warning flags
- Fix misc docs comments
- Don’t assume use of SELinux in virt-sandbox-service
- Fix path checks for SUSE in virt-sandbox-service
- Add support for AppArmor profiles
- Mount /var after other FS to ensure host image is available
- Ensure state/config dirs can be accessed when QEMU is running non-root for qemu:///system
- Fix mounting of host images in QEMU sandboxes
- Mount images as ext4 instead of ext3
- Allow use of non-raw disk images as filesystem mounts
- Check if required static libs are available at configure time to prevent silent fallback to shared linking
- Require libvirt-glib >= 0.2.1
- Add support for loading lzma and gzip compressed kmods
- Check for supported libvirt URIs when starting guests to ensure a clear error message upfront
- Add LIBVIRT_SANDBOX_INIT_DEBUG env variable to allow debugging of kernel boot messages and sandbox init process setup
- Add support for exposing block devices to sandboxes with a predictable name under /dev/disk/by-tag/TAGNAME
- Use devtmpfs instead of tmpfs for auto-populating /dev in QEMU sandboxes
- Allow setup of sandbox with custom root filesystem instead of inheriting from host’s root.
- Allow execution of apps from non-matched ld-linux.so / libc.so, e.g. executing F19 binaries on an F22 host
- Use passthrough mode for all QEMU filesystems
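The compressed kmod item in the list above boils down to picking a decompressor from the file suffix before the raw module image is handed to the kernel. Here is a minimal Python sketch of that dispatch, assuming the conventional .ko, .ko.xz and .ko.gz suffixes; the function name read_kmod is hypothetical, and a real loader would go on to pass the bytes to the kernel (for example via the init_module(2) system call), which this sketch deliberately does not do.

```python
import gzip
import lzma
import os
import tempfile


def read_kmod(path):
    """Return the uncompressed image of a kernel module file.

    Chooses a decoder from the file suffix: .ko.xz, .ko.gz or plain .ko.
    """
    if path.endswith(".ko.xz"):
        decode = lzma.decompress        # auto-detects .xz (and legacy .lzma) data
    elif path.endswith(".ko.gz"):
        decode = gzip.decompress
    elif path.endswith(".ko"):
        decode = lambda data: data      # already uncompressed
    else:
        raise ValueError(f"unrecognised kernel module suffix: {path}")
    with open(path, "rb") as f:
        return decode(f.read())


if __name__ == "__main__":
    payload = b"\x7fELF placeholder, not a real module"
    samples = {
        "demo.ko": payload,
        "demo.ko.xz": lzma.compress(payload),   # xz container by default
        "demo.ko.gz": gzip.compress(payload),
    }
    with tempfile.TemporaryDirectory() as d:
        for name, blob in samples.items():
            path = os.path.join(d, name)
            with open(path, "wb") as f:
                f.write(blob)
            print(name, read_kmod(path) == payload)
```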
I am pleased to announce that a new public release of libvirt-sandbox, version 0.5.1, is now available from:
http://sandbox.libvirt.org/download/
The packages are GPG signed with
Key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)
The libvirt-sandbox package provides an API layer on top of libvirt-gobject which facilitates the creation of application sandboxes using virtualization technology. An application sandbox is a virtual machine or container that runs a single application binary, directly from the host OS filesystem. In other words, there is no separate guest operating system install to build or manage.
At this point in time, libvirt-sandbox can create sandboxes using either LXC or KVM, and should in theory be extensible to any libvirt driver.
This release focused exclusively on bugfixing.
Changed in this release:
- Fix path to systemd binary (prefers dir /lib/systemd not /bin)
- Remove obsolete commands from virt-sandbox-service man page
- Fix delete of running service container
- Allow use of custom root dirs with ‘virt-sandbox --root DIR’
- Fix ‘upgrade’ command for virt-sandbox-service generic services
- Fix logrotate script to use virsh for listing sandboxed services
- Add ‘inherit’ option for virt-sandbox ‘-s’ security context option, to auto-copy calling process’ context
- Remove non-existent ‘-S’ option from virt-sandbox-service man page
- Fix line break formatting of man page
- Mention LIBVIRT_DEFAULT_URI in virt-sandbox-service man page
- Check some return values in libvirt-sandbox-init-qemu
- Remove unused variables
- Fix crash with partially specified mount option string
- Add man page docs for ‘ram’ mount type
- Avoid close of un-opened file descriptor
- Fix leak of file handles in init helpers
- Log a message if sandbox cleanup fails
- Cope with domain being missing when deleting container
- Improve stack trace diagnostics in virt-sandbox-service
- Fix virt-sandbox-service content copying code when faced with non-regular files.
- Improve error reporting if kernel does not exist
- Allow kernel version/path/kmod to be set with virt-sandbox
- Don’t overmount ‘/root’ in QEMU sandboxes by default
- Fix nosuid / nodev mount options for tmpfs
- Force 9p2000.u protocol version to avoid QEMU bugs
- Fix cleanup when failing to start interactive sandbox
- Create copy of kernel from /boot to allow relabelling
- Bulk re-indent of code
- Avoid crash when gateway is missing in network options
- Fix symlink target created in multi-user.target.wants
- Add ‘-p PATH’ option for virt-sandbox-service clone/delete to match ‘create’ command option.
- Only allow ‘lxc:///’ URIs with virt-sandbox-service until further notice
- Rollback state if cloning a service sandbox fails
- Add more kernel modules instead of assuming they are all builtins
- Don’t complain if some kmods are missing, as they may be builtins
- Allow --mount to be repeated with virt-sandbox-service
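Two of the kmod items above pull in opposite directions: load more modules explicitly, but tolerate ones that are absent because they are compiled into the kernel. One way to tell those cases apart is the modules.builtin file that kernel builds install under /lib/modules/&lt;version&gt;/. The Python sketch below classifies requested modules using that file's contents. It is an illustrative assumption about how such a check could work, not a description of what libvirt-sandbox actually does (the release note only says missing kmods no longer trigger complaints), and the function names are hypothetical.

```python
import os


def _normalise(name):
    # The kernel treats '-' and '_' in module names as equivalent.
    return name.replace("-", "_")


def parse_modules_builtin(text):
    """Extract module names from modules.builtin contents.

    Each non-empty line is a path such as 'kernel/fs/9p/9p.ko'.
    """
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        base = os.path.basename(line)
        if base.endswith(".ko"):
            base = base[: -len(".ko")]
        names.add(_normalise(base))
    return names


def classify_kmods(wanted, builtin, loadable):
    """Split wanted module names into (to_load, builtin, missing) lists.

    Modules found in neither set land in 'missing', which a caller can
    log quietly and skip instead of treating as a fatal error.
    """
    loadable_norm = {_normalise(n) for n in loadable}
    to_load, built, missing = [], [], []
    for name in wanted:
        n = _normalise(name)
        if n in builtin:
            built.append(name)
        elif n in loadable_norm:
            to_load.append(name)
        else:
            missing.append(name)
    return to_load, built, missing


if __name__ == "__main__":
    builtin = parse_modules_builtin(
        "kernel/drivers/virtio/virtio_pci.ko\nkernel/fs/9p/9p.ko\n")
    print(classify_kmods(["virtio_pci", "9p", "9pnet_virtio", "virtio-rng"],
                         builtin, {"9pnet_virtio"}))
    # (['9pnet_virtio'], ['virtio_pci', '9p'], ['virtio-rng'])
```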
Thanks to everyone who contributed to this release.
As many readers are no doubt aware, the FOSDEM 2012 conference is taking place this weekend in Brussels. This year I was organized enough to submit a proposal for a talk and was very happy to have it accepted. My talk is titled “Building app sandboxes on top of LXC and KVM with libvirt” and is part of the Virtualization & Cloud Dev Room. As you can guess from the title, I will be talking in some detail about the libvirt-sandbox project I recently announced. Richard Jones is also attending, to give a talk on libguestfs and how it is used in cloud projects like OpenStack. There will be three talks covering different aspects of the oVirt project: a general project overview, a technical look at the management engine and a technical look at the node agent, VDSM. Finally, the GNOME Boxes project I mentioned a few weeks ago will also be represented in the CrossDesktop devroom.
Besides these virtualization related speakers, there are a great many other Red Hat people attending FOSDEM this year, so we put together a small flyer highlighting all their talks. In keeping with the spirit of FOSDEM, these talks will of course be community / technically focused, not corporate marketing ware :-) I look forward to meeting many people at FOSDEM this year and, if all goes well, to making it a regular conference to attend.
I have mentioned in passing every now & then over the past few months that I have been working on a tool for creating application sandboxes using libvirt, LXC and KVM. Last Thursday, I finally got around to creating a first public release of a package that is now called libvirt-sandbox. Before continuing, it is probably worth defining what I consider the term “application sandbox” to mean. My working definition is that an “application sandbox” is simply a way to confine the execution environment of an application, limiting the access it has to OS resources. To me, one notable point is that there is no need for a separate / special installation of the application to be confined. An application sandbox ought to be able to run any existing application installed in the OS.
Background motivation & prototype
For a few Fedora releases, users have had the SELinux sandbox command, which executes a command with a strictly confined SELinux context applied. It is also able to make limited use of the kernel filesystem namespace feature, to allow changes to the mount table inside the sandbox. For example, the common case is to put in place a different $HOME. The SELinux sandbox has been quite effective, but there is a limit to what can be done with SELinux policy alone, as evidenced by the need to create a setuid helper to enable use of the kernel namespace feature. Architecturally this becomes even more problematic as new feature requests need to be dealt with.
As most readers are no doubt aware, libvirt provides a virtualization management API, with support for a wide variety of virtualization technologies. The KVM driver is easily the most advanced and actively developed driver for libvirt, with a very wide array of features for machine based virtualization. In terms of container based virtualization, the LXC driver is the most advanced driver in libvirt, often getting new features “for free” since it shares a lot of code with the KVM driver, in particular anything cgroup based. The LXC driver has always had the ability to pass arbitrary host filesystems through to the container, and the KVM driver gained similar capabilities last year with the inclusion of support for virtio 9p filesystems. One of the well known security features in libvirt is sVirt, which leverages MAC technology like SELinux to strictly confine the execution environment of QEMU. This has also now been adapted to work for the LXC driver.
Looking at the architecture of the SELinux sandbox command last year, it occurred to me that the core concepts mapped very well to the host filesystem passthrough & sVirt features in libvirt’s KVM & LXC drivers. In other words, it ought to be possible to create application sandboxes using the libvirt API and suitably advanced drivers like KVM or LXC. A few weeks of hacking resulted in a proof-of-concept tool, virt-sandbox, which can run simple commands in sandboxes built on LXC or KVM.
The libvirt-sandbox API
A command line tool for running applications inside a sandbox is great, but even more useful would be an API for creating application sandboxes that programmers can use directly. While libvirt provides an API that is portable across different virtualization technologies, it cannot magically hide the differences in feature set or architecture between the technologies. Thus the decision was taken to create a new library called libvirt-sandbox that provides a higher level API for managing application sandboxes, built on top of libvirt. The virt-sandbox command from the proof of concept would then be re-implemented using this library API.
The libvirt-sandbox library is built using GObject to make it accessible to any programming language via GObject Introspection. The basic idea is that the programmer simply defines the desired characteristics of the sandbox, such as the command to be executed, any arguments, filesystems to be exposed from the host, any bind mounts, private networking configuration, etc. From this configuration description, libvirt-sandbox will decide upon & construct a libvirt guest XML configuration that can actually provide the requested characteristics. In other words, the libvirt-sandbox API provides a layer of policy above libvirt, isolating the application developer from the implementation details of the underlying hypervisor.
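To make the "policy layer" idea concrete, here is a Python sketch that turns one hypervisor-neutral description into two different libvirt domain XML skeletons. It is not the libvirt-sandbox API: SandboxConfig and to_domain_xml are hypothetical, the kernel, initrd and init paths are placeholders, and the 9p mount tag sandbox:root is an assumption. The XML element names (filesystem with accessmode='passthrough', readonly, os/type exe versus hvm) are standard libvirt domain XML, but the output is a skeleton, not a complete guest definition.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SandboxConfig:
    """Hypothetical, heavily simplified sandbox description."""
    name: str
    root_dir: str = "/"     # host directory exposed read-only as the sandbox root
    memory_mib: int = 512


def to_domain_xml(cfg, driver):
    """Render cfg as a skeleton libvirt domain XML for 'kvm' or 'lxc'."""
    dom = ET.Element("domain", type=driver)
    ET.SubElement(dom, "name").text = cfg.name
    ET.SubElement(dom, "memory", unit="MiB").text = str(cfg.memory_mib)
    os_el = ET.SubElement(dom, "os")
    devices = ET.SubElement(dom, "devices")
    # Host filesystem passthrough is common to both drivers.
    fs = ET.SubElement(devices, "filesystem", type="mount", accessmode="passthrough")
    ET.SubElement(fs, "source", dir=cfg.root_dir)
    if driver == "lxc":
        # Container: start the sandbox init directly; the host dir becomes '/'.
        ET.SubElement(os_el, "type").text = "exe"
        ET.SubElement(os_el, "init").text = "/path/to/libvirt-sandbox-init-lxc"
        ET.SubElement(fs, "target", dir="/")
    elif driver == "kvm":
        # Virtual machine: boot a kernel plus custom initrd; the root is shared
        # over virtio-9p under a mount tag the initrd's init looks for.
        ET.SubElement(os_el, "type").text = "hvm"
        ET.SubElement(os_el, "kernel").text = "/path/to/vmlinuz"
        ET.SubElement(os_el, "initrd").text = "/path/to/sandbox-initrd.img"
        ET.SubElement(fs, "target", dir="sandbox:root")
    else:
        raise ValueError(f"unsupported driver: {driver}")
    ET.SubElement(fs, "readonly")
    return ET.tostring(dom, encoding="unicode")


if __name__ == "__main__":
    cfg = SandboxConfig(name="demo")
    for driver in ("lxc", "kvm"):
        print(to_domain_xml(cfg, driver))
```

The caller states what it wants once; only to_domain_xml knows that a container takes an init binary and a '/' target while a VM needs a kernel, an initrd and a 9p mount tag.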
Building sandboxes using LXC is quite straightforward, since application confinement is a core competency of LXC. Thus I will move straight to the KVM implementation, which is where the real fun is. Booting up an entire virtual machine probably sounds like quite a slow process, but it need not be, particularly if you have a well constrained hardware definition which avoids any need for probing. People also generally assume that running a KVM guest means having a guest operating system install. That is absolutely not acceptable for application sandboxing, and indeed not actually necessary. In a nutshell, libvirt-sandbox creates a new initrd image containing a custom init binary. This init binary simply loads the virtio-9p kernel module and then mounts the host OS’ root filesystem as the guest’s root filesystem, read-only of course. It then hands off to a second bootstrap process which runs the desired application binary and forwards I/O back to the host OS, until the sandboxed application exits. Finally the init process powers off the virtual machine. To get an idea of the overhead, the /bin/false binary can be executed inside a KVM sandbox with an overall execution time of 4 seconds. That is the total time for libvirt to start QEMU, QEMU to run its BIOS, the BIOS to load the kernel + initrd, the kernel to boot up, /bin/false to run, and the kernel to shut down & QEMU to exit. I think that is pretty impressive for all that work. This is a constant overhead, so for a long running command like an MP3 encoder, it disappears into the background noise. With sufficient optimization, I’m fairly sure we could get the overhead down to approx 2 seconds.
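The KVM boot sequence described above can be summarised as a dry-run plan. The Python sketch below only builds a list of shell-equivalent commands and runs none of them; the real init is a small compiled binary, and the module names (9pnet_virtio, 9p), the mount tag sandbox:root, the mount point /sysroot and the use of chroot are assumptions made for illustration. The version=9p2000.u option mirrors the protocol version that the 0.5.1 release notes say is forced, and the second-stage helper name is the one visible in the ps listing of the LXC example.

```python
def kvm_init_plan(root_tag="sandbox:root", new_root="/sysroot",
                  helper="libvirt-sandbox-init-common"):
    """Return the KVM sandbox boot steps as shell-equivalent argv lists.

    Dry run only: nothing is executed here. The module names, mount tag,
    mount point and helper invocation are illustrative assumptions.
    """
    return [
        # 1. Load the virtio transport for 9p and the 9p filesystem itself.
        ["modprobe", "9pnet_virtio"],
        ["modprobe", "9p"],
        # 2. Mount the host OS root filesystem, read-only, as the guest root.
        ["mount", "-t", "9p", "-o", "trans=virtio,version=9p2000.u,ro",
         root_tag, new_root],
        # 3. Hand off to the second-stage bootstrap, which runs the app and
        #    forwards its I/O to the host until it exits.
        ["chroot", new_root, helper],
        # 4. Once the application has exited, power off the virtual machine.
        ["poweroff", "-f"],
    ]


if __name__ == "__main__":
    for step in kvm_init_plan():
        print(" ".join(step))
```

Nothing in this plan touches a guest disk image, which is the point: the only guest-specific artifact is the initrd, and everything the application needs is read from the host through the 9p share.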
Using the virt-sandbox command
The Fedora review of the libvirt-sandbox package was nice & straightforward, so the package is already available in Rawhide, ready for testing the VirtSandbox F17 feature. The virt-sandbox command is provided by the libvirt-sandbox RPM package:
# yum install libvirt-sandbox
Assuming libvirt is already installed & able to run either LXC or KVM guests, everything is ready to use immediately.
A first example is to run the ‘/bin/date’ command inside a KVM sandbox:
$ virt-sandbox -c qemu:///session /bin/date
Thu Jan 12 22:30:03 GMT 2012
You want proof that this really is running an entire KVM guest? How about looking at the /proc/cpuinfo contents:
$ virt-sandbox -c qemu:///session /bin/cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 2
model name : QEMU Virtual CPU version 1.0
stepping : 3
cpu MHz : 2793.084
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good nopl pni cx16 hypervisor lahf_lm
bogomips : 5586.16
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
How about using LXC instead of KVM, and providing an interactive console instead of just a one-shot command? Yes, we can do that too:
$ virt-sandbox -c lxc:/// /bin/sh
sh-4.2$ ps -axuwf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 165436 3756 pts/0 Ss+ 22:31 0:00 libvirt-sandbox-init-lxc
berrange 24 0.0 0.1 167680 4688 pts/0 S+ 22:31 0:00 libvirt-sandbox-init-common
berrange 47 0.0 0.0 13852 1608 pts/1 Ss 22:31 0:00 \_ /bin/sh
berrange 48 0.0 0.0 13124 996 pts/1 R+ 22:31 0:00 \_ ps -axuwf
Notice how we only see the processes from our sandbox, none from the host OS. There are many more examples I’d like to illustrate, but this post is already far too long.
Future development
This blog post might give the impression that everything is complete & operational, but that is far from the truth. This is only the bare minimum functionality to enable some real world usage. Things that are yet to be dealt with include:
- Write suitable SELinux policy extensions to allow KVM to access host OS filesystems in read-only mode. Currently you need to run in permissive mode, which obviously needs solving before F17
- Turn the virt-viewer command code for SPICE/VNC into a formal API and use that to provide a graphical sandbox running Xorg.
- Integrate a tool that is able to automatically create sandbox instances for system services like Apache, to facilitate confined vhosting deployments
- Correctly propagate exit status from the sandboxed command to the host OS
- Disentangle stderr and stdout from the sandboxed command
- Figure out how to make dhclient work nicely when / is readonly and resolv.conf must be updated in-place
- Expose all the libvirt performance tuning controls to allow disk / net I/O controls, CPU scheduling, NUMA affinity, etc
- Wire up libvirt’s firewall capability to allow detailed filtering of network traffic to/from sandboxes
- Much more…
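On the host side, propagating exit status is mostly a matter of following the usual shell convention. The Python sketch below shows that mapping for a child process run locally: a normal exit passes its status straight through, and death by signal N becomes 128 + N. Carrying the status out of a KVM guest or LXC container, which is the hard part of the to-do item above, is not shown, and run_and_propagate is a hypothetical name.

```python
import subprocess
import sys


def run_and_propagate(argv):
    """Run argv and return its fate as a shell-style exit status.

    A normal exit passes the child's status straight through; death by
    signal N (reported by subprocess as returncode -N) becomes 128 + N.
    """
    rc = subprocess.run(argv).returncode
    return 128 - rc if rc < 0 else rc


if __name__ == "__main__":
    status = run_and_propagate([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("sandboxed command exited with", status)
```

A launcher would then pass the returned value to sys.exit(), so whoever invoked the sandbox tool sees the same status a shell would have reported for the command run directly.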
For those attending FOSDEM this year, I will be giving a presentation about libvirt-sandbox in the virt/cloud track.
Oh, and as well as the released tar.gz mentioned in the first paragraph, or the Fedora RPM, the code is all available in Git.