Prototype for a Fedora virtual machine appliance builder

Posted: February 17th, 2008 | Author: Daniel Berrange | Filed under: Uncategorized | 2 Comments »

For the oVirt project the end product distributed to users consists of a LiveCD image to serve as the ‘managed node’ for hosting guests, and a virtual machine appliance to serve as the ‘admin node’ for the web UI. The excellent Fedora LiveCD creator tools obviously already deal with the first use case. For the second though we don’t currently have a solution. The way we build the admin node appliance is to boot a virtual machine and run anaconda with a kickstart, and then grab the resulting installed disk image. While this works it involves a number of error-prone steps. Appliance images are not inherently different from LiveCDs – instead of an ext3 filesystem inside an ISO using syslinux, we want a number of filesystems inside a partitioned disk using grub. The overall OS installation method is the same in both use cases.

After a day’s hacking I’ve managed to re-factor the internals of the LiveCD creator, and add a new installation class able to create virtual machine appliances. As its input it takes a kickstart file, and the names and sizes for one or more output files (which will act as the disks). It reads the ‘part’ entries from the kickstart file and uses parted to create suitable partitions across the disks. It then uses kpartx to map the partitions and mounts them all in the chroot. The regular LiveCD installation process then takes place. Once complete, it writes a grub config and installs the bootloader into the MBR. The result is one or more files representing the appliance’s virtual disks which can be directly booted in KVM / Xen / VMware.
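
The first step described above, reading the ‘part’ entries and working out which partitions land on which disk, can be sketched in Python. This is only an illustration and not the actual appliance creator code: the function name is hypothetical, the default disk name is an assumption, and it only models the standard kickstart `--size` (in MB), `--ondisk` and `--fstype` options.

```python
import argparse
import shlex
from collections import defaultdict


def parse_part_entries(kickstart_text):
    """Group kickstart 'part' entries by target disk (hypothetical helper).

    Returns {disk_name: [(mountpoint, size_mb, fstype), ...]}.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("mountpoint")
    parser.add_argument("--size", type=int, required=True)  # size in MB
    parser.add_argument("--ondisk", default="sda")  # assumed default disk
    parser.add_argument("--fstype", default="ext3")

    layout = defaultdict(list)
    for line in kickstart_text.splitlines():
        # comments=True drops anything after a '#'
        words = shlex.split(line, comments=True)
        if not words or words[0] not in ("part", "partition"):
            continue
        # parse_known_args skips options this sketch does not model
        args, _unknown = parser.parse_known_args(words[1:])
        layout[args.ondisk].append((args.mountpoint, args.size, args.fstype))
    return dict(layout)


if __name__ == "__main__":
    # Made-up kickstart fragment, for illustration only
    example = """
    part /     --size 5000 --ondisk sda --fstype ext3
    part /data --size 1000 --ondisk sdb --fstype ext3
    """
    for disk, parts in parse_part_entries(example).items():
        print(disk, parts)
```

A real implementation would then hand each disk's list to parted to create the partitions, in the order given.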

The virt-image tool defines a simple XML format which can be used to describe a virtual appliance. It specifies things like minimum recommended RAM and VCPUs, the disks associated with the appliance, and the hypervisor requirements for booting it (eg Xen paravirt vs bare metal / fullvirt). Given one of these XML files, the virt-image tool can use libvirt to directly deploy a virtual machine without requiring any further user input. So an obvious extra feature for the virtual appliance creator is to output a virt-image XML description. With a demo kickstart file for the oVirt admin node, I end up with 2 disks:

-rwxr-xr-x 1 root     root     5242880001 2008-02-17 14:48 ovirt-wui-os.raw
-rwxr-xr-x 1 root     root     1048576001 2008-02-17 14:48 ovirt-wui-data.raw

And an associated XML file

<image>
  <name>ovirt-wui</name>
  <domain>
    <boot type='hvm'>
      <guest>
        <arch>x86_64</arch>
      </guest>
      <os>
        <loader dev='hd'/>
      </os>
      <drive disk='ovirt-wui-os.raw' target='hda'/>
      <drive disk='ovirt-wui-data.raw' target='hdb'/>
    </boot>
    <devices>
      <vcpu>1</vcpu>
      <memory>262144</memory>
      <interface/>
      <graphics/>
    </devices>
  </domain>
  <storage>
    <disk file='ovirt-wui-os.raw' use='system' format='qcow2'/>
    <disk file='ovirt-wui-data.raw' use='system' format='qcow2'/>
  </storage>
</image>
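
Since the descriptor is plain XML, it is easy to inspect from a script before deploying it. The following Python sketch uses only the standard library and an abridged copy of the XML above; the `summarise` function is a hypothetical helper and not part of virt-image.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the descriptor shown above
DESCRIPTOR = """<image>
  <name>ovirt-wui</name>
  <domain>
    <boot type='hvm'>
      <drive disk='ovirt-wui-os.raw' target='hda'/>
      <drive disk='ovirt-wui-data.raw' target='hdb'/>
    </boot>
    <devices>
      <vcpu>1</vcpu>
      <memory>262144</memory>
    </devices>
  </domain>
</image>"""


def summarise(xml_text):
    """Pull the name, resources and drive mappings out of a descriptor."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "vcpus": int(root.findtext("domain/devices/vcpu")),
        "memory": int(root.findtext("domain/devices/memory")),
        # (target device, backing file) pairs in document order
        "drives": [(d.get("target"), d.get("disk")) for d in root.iter("drive")],
    }


if __name__ == "__main__":
    print(summarise(DESCRIPTOR))
```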

To deploy the appliance under KVM I run

# virt-image --connect qemu:///system ovirt-wui.xml
# virsh --connect qemu:///system list
 Id Name                 State
----------------------------------
  1 ovirt-wui            running

Now raw disk images are really quite large – in this example I have a 5 GB and a 1 GB image. The LiveCD creator saves space by using resize2fs to shrink the ext3 filesystem, but this won’t help disk images since the partitions are a fixed size regardless of what the filesystem size is. So to allow smaller images, the appliance creator is able to call out to qemu-img to convert the raw file into a qcow2 (QEMU/KVM) or vmdk (VMware) disk image, both of which are grow-on-demand formats. The qcow2 image can even be compressed. With the qcow2 format the disks for the oVirt WUI reduce to 600 KB and 1.9 GB.
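
As a rough sketch of the conversion step, the Python below builds the `qemu-img convert` command line for a raw image. The helper name is hypothetical and this is not the appliance creator's own code; `-f`, `-O` and `-c` are the standard qemu-img options for source format, output format and compression. The sketch only allows `-c` with qcow2, since that is the case mentioned above.

```python
import subprocess


def qemu_img_convert_cmd(src, dst, fmt="qcow2", compress=False):
    """Build the argv for converting a raw image (hypothetical helper)."""
    if fmt not in ("qcow2", "vmdk"):
        raise ValueError("unsupported output format: %s" % fmt)
    cmd = ["qemu-img", "convert", "-f", "raw", "-O", fmt]
    if compress:
        if fmt != "qcow2":
            raise ValueError("this sketch only compresses qcow2 images")
        cmd.append("-c")
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    cmd = qemu_img_convert_cmd("ovirt-wui-os.raw", "ovirt-wui-os.qcow2",
                               compress=True)
    print(" ".join(cmd))
    # To actually run the conversion (requires qemu-img to be installed):
    # subprocess.run(cmd, check=True)
```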

The LiveCD tools have already seen immense popularity in the Fedora community. Once I polish off this new code to be production quality, it is my hope that we’ll see similar uptake by people interested in creating and distributing appliances. The great thing about basing the appliance creator on the LiveCD codebase and using kickstart files for both is that you can easily switch between doing regular anaconda installs, creating LiveCDs and creating appliances at will, with a single kickstart file.

Progress in Fedora 9 Xen pv_ops kernel development

Posted: February 1st, 2008 | Author: Daniel Berrange | Filed under: Uncategorized | 1 Comment »

You may recall my announcement of our plans for Fedora 9 Xen kernels. The 5 second summary is that we’re throwing away the current Xen kernels, and writing Xen support on top of paravirt_ops, for both DomU and Dom0, and both i386 & x86_64. Hugely ambitious given the limited time scales involved and the fact that only i386 DomU was working when we started this project.

Just minutes ago, after many, many weeks’ work, Stephen Tweedie reached a very important milestone – the first kernel build capable of fully booting on Dom0, including the IOAPIC & DMA support necessary to run real hardware drivers – ie the ability to access your real disks once the initrd is done :-)

(10:22:13 AM) sct: It boots
(10:22:14 AM) sct: It runs
(10:22:16 AM) sct: I can ssh into it
(10:22:50 AM) sct: [root@ghost ~]# dmesg|grep para
(10:22:50 AM) sct: Booting paravirtualized kernel on Xen
(10:22:50 AM) sct: [root@ghost ~]#

It is beginning to look like we might actually succeed in our goals in time for Fedora 9 – congrats due to Stephen & the rest of the team working on this !

New release of Test-AutoBuild 1.2.1

Posted: December 10th, 2007 | Author: Daniel Berrange | Filed under: Uncategorized | No Comments »

I finally got around to doing some more work on Test-AutoBuild – a build and test automation framework for upstream developers. It checks sources out of SCM repos (CVS, Subversion, SVK, GNU Arch, Mercurial, Perforce), and runs any build and test processes. It detects any RPMs generated during the build and publishes them in a YUM repo. It also publishes HTML status pages showing build logs, list of generated packages, any artifacts generated (eg, code test coverage reports, API documentation) and changelogs from the SCM repo. It is a similar system to CruiseControl, but is more powerful since it directly understands the idea of module dependencies, and so can intelligently manage chained builds of multiple dependent modules. We use this in the ET group for testing our virtualization stack. Our nightly builder builds libvirt and gtk-vnc first, then builds virt-viewer and virt-install against these builds, and finally builds virt-manager against all of them. So any change in libvirt gets validated to make sure it doesn’t break apps using libvirt. Since autobuild understands the dependencies, it can do intelligent build caching. eg if there were new changes in the libvirt SCM repo, but none in the virt-manager repos, it will still do a rebuild of virt-manager as a regression test.
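
The dependency-aware rebuild logic can be illustrated with a short Python sketch. This is not Test-AutoBuild’s actual code (Test-AutoBuild is not written in Python): the function name is hypothetical and the exact dependency sets below are an assumption loosely based on the nightly build described above. A module is rebuilt if its own sources changed or if anything it depends on is being rebuilt; everything else can come from the build cache.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# module -> modules it builds against (assumed for illustration)
DEPENDS = {
    "libvirt": set(),
    "gtk-vnc": set(),
    "virt-viewer": {"libvirt", "gtk-vnc"},
    "virt-install": {"libvirt"},
    "virt-manager": {"libvirt", "gtk-vnc", "virt-viewer", "virt-install"},
}


def plan_builds(depends, changed):
    """Return the modules needing a rebuild, dependencies first."""
    # static_order() yields every module after all of its dependencies
    order = list(TopologicalSorter(depends).static_order())
    rebuild = set()
    for module in order:
        # Dependencies were already visited, so checking the direct
        # ones is enough to propagate changes transitively.
        if module in changed or depends[module] & rebuild:
            rebuild.add(module)
    return [m for m in order if m in rebuild]


if __name__ == "__main__":
    # Only libvirt has new commits, yet its dependents are rebuilt too
    print(plan_builds(DEPENDS, changed={"libvirt"}))
```

With only gtk-vnc changed, virt-install would be served from the cache, since nothing it builds against was rebuilt.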

This new release, version 1.2.1, was all about making the SCM checkout process more reliable. Previously if a module could not be checked out (eg due to a server being down, or a config file typo) the entire build cycle would be aborted. With the new release, the troublesome module is simply skipped and the SCM logs published for the admin to diagnose – other modules in the build cycle continue to be built.

How I learned to stop worrying and love IPv6

Posted: August 16th, 2007 | Author: Daniel Berrange | Filed under: Uncategorized | No Comments »

Any machine running Fedora Core 6 or later has IPv6 networking support enabled out of the box. Most people will never notice and/or care since they’re only ever connected to IPv4 networks. A few months back now though I decided it was time to give IPv6 a try for real….

I’ve got two servers on the Internet running in UserModeLinux guests, one running Debian, the other Fedora Core 6, and then a home network provided by a LinkSys router running OpenWRT White Russian. My goal was to provide full IPv6 connectivity to all of them.

Home Router

I tackled the problem of the home router first. The OpenWRT wiki has an IPv6 Howto, describing various setups. I decided to get a tunnel from the fine folks at SixXS. My Verizon DSL only provides a dynamic IPv4 address and regular IPv6 over IPv4 tunnels require the server end to know the IPv4 address of your local endpoint. Obviously this is a bit of a problem with a dynamic IPv4 endpoint. SixXS though have a funky way around this in the form of their AICCU daemon which sets up a heartbeat from your local endpoint to their server. Thus should your IPv4 address ever change it can (securely with SSL) inform the server of your changed configuration. So I registered with SixXS, requested an IPv6 tunnel and a short while later they approved me. The service is open to anyone who wants IPv6 connectivity – the approval process is mainly to help avoid abuse & frivolous requests. I was fortunate in that OCCAID are providing an IPv6 tunnel server just a few miles away in Boston – there are other tunnel servers dotted around but mostly concentrated in America or Europe at this time.

With my IPv6 address allocated and the OpenWRT guide handy, my router was soon up & running with IPv6 connectivity – I could ping sites over IPv6, eg

# ping6 www.kame.net
PING www.kame.net (2001:200:0:8002:203:47ff:fea5:3085): 56 data bytes
64 bytes from 2001:200:0:8002:203:47ff:fea5:3085: icmp6_seq=0 ttl=50 time=513.2 ms
64 bytes from 2001:200:0:8002:203:47ff:fea5:3085: icmp6_seq=1 ttl=50 time=512.5 ms
64 bytes from 2001:200:0:8002:203:47ff:fea5:3085: icmp6_seq=2 ttl=50 time=519.5 ms

OpenWRT only ships with an IPv4 firewall as standard, so I quickly added ip6tables rules to deny all incoming traffic to the router. Even though port-scanning the entire IPv6 address space is not practical, only a tiny portion is active, and nearly all tunnels end up using addresses ending in :1 and :2, so a firewall is a must no matter what.

Home Network

To ensure you are serious about making use of their services, SixXS operate a credit system for admin requests. You start off with enough credits to request an IPv6 tunnel, but not enough to request an IPv6 subnet. To gain credits you have to prove you can keep the tunnel operational 24 hours a day for 7 days in a row – you then start gaining credits for each day’s uptime. So I had a slight pause before I could move onto setting up the home network.

Fortunately the LinkSys router is very reliable and so after a week I had enough uptime and thus enough credits to request an IPv6 subnet. In the brave new world of 128 bit addressing there’s no shortage of addresses, so to simplify routing, whenever someone needs a block of addresses they’ll typically be allocated an entire /48. That’s right /48 – you’ll be given more global IPv6 addresses for your personal use, than there are total IPv4 addresses in existence. Another interesting difference is that IPv6 subnets are not technically ‘sold’ – they are merely ‘loaned’ to end users. The upshot is that there’s no issue of having to pay your stinkin’ DSL/Cable ISP $$$ per month for one or two extra addresses.
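
To put the size of a /48 in numbers, here is a small Python check using the standard ipaddress module. The prefix is taken from the reserved documentation range 2001:db8::/32, not my real allocation.

```python
import ipaddress

# Example /48 from the documentation range (an assumption, not a real allocation)
subnet = ipaddress.ip_network("2001:db8:1234::/48")
ipv4_internet = ipaddress.ip_network("0.0.0.0/0")

addresses = subnet.num_addresses            # 2**80 addresses in one /48
ratio = addresses // ipv4_internet.num_addresses  # how many IPv4 internets fit
lans = 2 ** (64 - subnet.prefixlen)         # number of /64 networks in the /48

print("addresses in the /48:", addresses)
print("IPv4 address spaces that fit inside it:", ratio)
print("/64 LANs available:", lans)
```

So a single /48 holds 2^48 times the whole IPv4 address space, and can be split into 65536 separate /64 networks.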

Having got the subnet allocated, the first step is to configure an IP address on the LAN interface of the LinkSys box. With OpenWRT this just required editing /etc/init.d/S40network to add “ip -6 addr add 2001:XXXX:XXXX:XXXX::1/64 dev br0” (where 2001:XXXX:XXXX:XXXX is my subnet’s prefix). When the various IPv6 protocols were specced out a big deal was made of the fact that there would be no NAT anywhere, and that client configuration would be completely automatic & be able to dynamically reconfigure itself on the fly. The key to this is what they call a ‘router advertisement daemon’. On Linux this is the ‘radvd’ program. If you only have a single outgoing net connection, and a single local network, then configuring it is incredibly easy. Simply edit the /etc/radvd.conf file and fill in the IPv6 address prefix for your subnet as allocated by SixXS. Then start the daemon.
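
For reference, a minimal radvd.conf for this single-network case looks roughly like the text generated by the Python sketch below. The helper function is hypothetical and the prefix is a documentation placeholder; `AdvSendAdvert`, `AdvOnLink` and `AdvAutonomous` are standard radvd options, but treat this as a starting point and not a copy of my actual config.

```python
import ipaddress


def render_radvd_conf(interface, prefix):
    """Render a minimal radvd.conf for one LAN (hypothetical helper)."""
    net = ipaddress.ip_network(prefix)
    # Stateless autoconfiguration expects a /64 on the LAN
    if net.version != 6 or net.prefixlen != 64:
        raise ValueError("expected an IPv6 /64 prefix")
    return (
        f"interface {interface}\n"
        "{\n"
        "    AdvSendAdvert on;\n"
        f"    prefix {net}\n"
        "    {\n"
        "        AdvOnLink on;\n"
        "        AdvAutonomous on;\n"
        "    };\n"
        "};\n"
    )


if __name__ == "__main__":
    # br0 is the LAN bridge on the router; the prefix is a placeholder
    print(render_radvd_conf("br0", "2001:db8:1:1::/64"))
```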

Remember I just mentioned network configuration would be automatic – well look at any Fedora box plugged into your local network at this point. You’ll see they all just got globally routable IPv6 addresses assigned to their active network interfaces. Pop up a web browser and visit Kame and you’ll see an animated dancing turtle logo! IPv4 users only see a static image…

Bytemark Server

One of my web servers is running Debian in a User Mode Linux instance at Bytemark in the UK. The good news is that Bytemark have already taken care of getting IPv6 connectivity into their network, so there’s no need to use a tunnel on any server hosted by them. Simply ask their helpdesk to allocate you an IPv6 address from their pool, and add it to your primary ethernet interface. Again don’t forget to set up ip6tables firewall rules before doing this.
For Debian, configuring eth0 was a mere matter of editing /etc/network/interfaces and adding

iface eth0 inet6 static
        address 2001:XXXX:XXXX:XXXX::2
        netmask 64
        up ip route add 2000::/3 via 2001:XXXX:XXXX:XXXX::1

Again, with ‘2001:XXXX:XXXX:XXXX’ being the address they allocated to your server.
Since SSH listens for IPv6 connections by default, with the interface address configured I could now SSH from my laptop at home to my server using IPv6. Type ‘who’ and you’ll see a big long IPv6 address against your username if it’s working correctly.

Linode Server

My other web server is hosted by Linode. Unfortunately they don’t provide direct IPv6 connectivity so I had to use a tunnel. Since I do have a permanent static IPv4 address though I could use a regular IPv6-over-IPv4 tunnel rather than the dynamic heartbeat one I used at home with SixXS. For the sake of redundancy I decided to get my tunnel from a different provider, this time choosing Hurricane. When registering with them you provide a little contact info and the IPv4 address of your server. A short while later they’ll typically approve the request & activate their end of the tunnel. It is then a matter of configuring your end. This machine was running Fedora Core 6, so creating a tunnel requires adding a file /etc/sysconfig/network-scripts/ifcfg-sit1 containing something like

DEVICE=sit1
BOOTPROTO=none
ONBOOT=yes
IPV6INIT=yes
IPV6TUNNELIPV4=YY.YY.YY.YY
IPV6ADDR=2001:XXXX:XXXX:XXXX::2/64

Where YY.YY.YY.YY was the IPv4 address of Hurricane’s tunnel server, and 2001:XXXX:XXXX:XXXX was the IPv6 address prefix they allocated for my server. A quick ifup later and this server too had IPv6 connectivity.

The summary

This was all spread out over a couple of weeks, but by the end of it I had got both servers and my entire home network all operational with fully routable, global IPv6 connectivity. I have three different types of IPv6 connectivity – direct (from Bytemark), static tunnel (from Hurricane), and a dynamic tunnel (from SixXS – they offer static tunnels too). If you have a static IPv4 address there’s a fourth way to get connected called 6-to-4 which maps your IPv4 address into the IPv6 space and uses anycast routing. With so many ways to get IPv6 connectivity it doesn’t matter if your crappy DSL/Cable ISP doesn’t offer IPv6 – simply take them out of the equation.
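
The 6-to-4 mapping is simple enough to show in a few lines of Python: the 32 bits of the public IPv4 address are placed directly after the reserved 2002::/16 prefix, giving each IPv4 address its own /48. The function name is hypothetical and the IPv4 address is a documentation example.

```python
import ipaddress


def six_to_four_prefix(ipv4):
    """Return the 6to4 /48 prefix derived from a public IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    base = int(ipaddress.IPv6Address("2002::"))
    # The IPv4 address fills bits 80..111, right after the 16 bit 2002 prefix
    return ipaddress.IPv6Network((base | (v4 << 80), 48))


if __name__ == "__main__":
    print(six_to_four_prefix("192.0.2.1"))
```

The standard library can also go the other way: the `sixtofour` property of an `IPv6Address` returns the embedded IPv4 address for any address inside 2002::/16.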

One of the great things about being rid of NAT is that I can directly SSH into any machine at home from outside my network – no need for VPNs, or special reverse proxy rules through the NAT gateway. IPv6 addresses are crazily long, so the one final thing I did was to set up DNS entries for all my boxes, including a DNS zone for my home network. Remember how all clients on the home network auto-configure themselves? Well, this is done based on their network prefix and their MAC address, so they’ll always auto-configure themselves to the same IPv6 address. This makes it easy to give them permanent DNS mappings, without needing to manually administer a DHCP server.
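
For the curious, the address a client picks can be predicted from its MAC address. The Python sketch below assumes the host uses the classic modified EUI-64 scheme (and not randomised privacy addresses): the MAC is split in half, ff:fe is inserted in the middle, and the universal/local bit of the first byte is flipped. The function name, prefix and MAC address are made up for illustration.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Predict the autoconfigured address for a MAC on a /64 (EUI-64)."""
    net = ipaddress.ip_network(prefix)
    if net.version != 6 or net.prefixlen != 64:
        raise ValueError("expected an IPv6 /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48 bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # Indexing a network by an integer offset returns that address
    return net[int.from_bytes(interface_id, "big")]


if __name__ == "__main__":
    print(slaac_address("2001:db8:1:1::/64", "00:16:3e:12:34:56"))
```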

Security Enhanced Test-AutoBuild

Posted: July 2nd, 2007 | Author: Daniel Berrange | Filed under: Uncategorized | No Comments »

In the latter half of last year I was mulling over the idea of writing SELinux policy for Test-AutoBuild. I played around a little bit, but never got the time to really make a serious attempt at it. Before I go on, a brief re-cap on the motivation…

Test-AutoBuild is a framework for providing continuous, unattended, automated software builds. Each run of the build engine checks the latest source out from CVS, calculates a build order based on module dependencies, builds each module, and then publishes the results. The build engine typically runs under a dedicated system user account – builder – to avoid any risk of the module build process compromising a host (either accidentally, or deliberately). This works reasonably well if you are only protecting against accidental damage from a module build – eg building apps maintained inside your organization. If building code from source repositories out on the internet though there is a real risk of deliberately hostile module build processes. A module may be trojaned so that its build process attempts to scan your internal network, or it may trash the state files of the build engine itself – both the engine & the module being built are under the same user account. There is also the risk that the remote source control server has been trojaned to try and exploit flaws in the client.

And so enter SELinux… The build engine is highly modular in structure, with different tasks in the build workflow being pretty well isolated. So the theory was that it ought to be possible to write SELinux policy to guarantee separation of the build engine, from the SCM tools doing source code checkout, from the module build processes, and other commands being run. As an example, within a build root there are a handful of core directories

root
 |
 +- source-root   - dir in which module source is checked out
 +- package-root  - dir in which RPMs/Debs & other packages are generated
 +- install-root  - virtual root dir for installing files in 'make install'
 +- build-archive - archive of previous successful module builds
 +- log-root      - dir for creating log files of build process
 +- public_html   - dir in which results are published

All these dirs are owned by the builder user account. The build engine itself provides all the administrative tasks for the build workflow, so generally requires full access to all of these directories. The SCM tools, however, merely need to be able to check out files into the source-root and create logs in the log-root. The module build process needs to be able to read/write in the source-root, package-root and install-root, as well as creating logs in the log-root. So, given suitable SELinux policy it ought to be possible to lock down the access of the SCM tools and build process quite significantly.
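
The access rules just described can be modelled as a small table. The Python below is purely a model for reasoning about who may write where: real SELinux policy is written in its own policy language using types and domains, and the role names here are hypothetical labels, not the names used in the actual policy.

```python
DIRS = ("source-root", "package-root", "install-root",
        "build-archive", "log-root", "public_html")

# role -> directories it may write to (a model, not real SELinux policy)
WRITABLE = {
    "engine": set(DIRS),  # the build engine needs full access
    "scm": {"source-root", "log-root"},
    "build": {"source-root", "package-root", "install-root", "log-root"},
}


def may_write(role, directory):
    """Return True if the given role may write into the directory."""
    if directory not in DIRS:
        raise ValueError("unknown directory: %s" % directory)
    return directory in WRITABLE.get(role, set())


if __name__ == "__main__":
    for role in WRITABLE:
        denied = [d for d in DIRS if not may_write(role, d)]
        print(role, "is denied write access to:", denied or "nothing")
```

Seen this way, a hostile module build cannot touch the build-archive or public_html, and a compromised SCM client cannot touch anything beyond the source checkout and its logs.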

Now aside from writing the policy there are a couple of other small issues. The primary one is that the build engine has to run in a confined SELinux context, and has to be able to run SCM tools and build processes in a different context. For the former, I chose to create an ‘auto-build-secure’ command to augment the ‘auto-build’ command. This allows the user to easily run the build process in SELinux enforced, or traditional unconfined modes. For the latter, most SELinux policy has automated process context transitions based on the binary file labels. This isn’t so useful for autobuild though, because the script we’re running is being checked out direct from a SCM repo & thus not labelled. The solution for this is easy though – after fork()ing, but before exec()ing the SCM tools / build script we simply write the desired target context into /proc/self/attr/exec.

So with a couple of tiny modifications to the build engine, and many hours of writing suitable policy for Test-AutoBuild, it’s now possible to run the build engine under a strictly confined policy. There is one horrible troublespot though. Every application has its own build process & set of operations it wishes to perform. Writing a policy which confines the build process as much as possible, while still keeping it secure is very hard indeed. In fact it is effectively unsolvable in the general case.
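
The fork / write / exec sequence looks roughly like this in Python. It is a sketch and not the build engine’s actual code: the function name is hypothetical, any context string passed in must be one the loaded policy really defines, and the write to /proc/self/attr/exec will simply fail on a machine without SELinux. The `attr_path` parameter exists only so the sketch can be tried against an ordinary file.

```python
import os
import sys


def run_in_context(argv, context, attr_path="/proc/self/attr/exec"):
    """Run argv in a child process under the given SELinux exec context."""
    pid = os.fork()
    if pid == 0:
        # Child: request the context for the next exec(), then exec
        try:
            with open(attr_path, "w") as attr:
                attr.write(context)
            os.execvp(argv[0], argv)
        except OSError:
            pass
        # Only reached if the write or the exec failed
        os._exit(127)
    # Parent: wait for the child and report its exit code
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)  # Python 3.9+


if __name__ == "__main__":
    # The context comes from the command line, since valid values
    # depend entirely on the policy that is loaded.
    if len(sys.argv) >= 3:
        sys.exit(run_in_context(sys.argv[2:], sys.argv[1]))
    print("usage: run_in_context.py CONTEXT COMMAND [ARGS...]")
```

Because the context is written in the child after the fork, the build engine’s own context is left untouched. C programs would normally use libselinux’s setexeccon() for the same purpose.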

So what to do ? SELinux booleans provide a way to toggle on/off various capabilities system wide. If building multiple applications though, it may be desirable to run some under a more confined policy than others – booleans are system wide. The solution I think is to define a set of perhaps 4 or 5 different execution contexts with differing levels of privileges. As an example, some contexts may allow outgoing network access, while others may deny all network activity. So the build admin can use the most restrictive policy by default, and a less restrictive policy for applications which are more trusted.

This weekend was just the start of experimentation with SELinux policy in regards to Test-AutoBuild, but it was far, far more successful than I ever expected it to be. The level of control afforded by SELinux is awesome, and with the flexibility of modifying the application itself too, the possibilities for fine grained access control are enormous. One idea I’d like to investigate is whether it is possible to define new SELinux execution contexts on-the-fly. eg, instead of all application sources being checked out under a single ‘absource_t’ file context, it would be desirable to create a new source file context per-application. I’m not sure whether SELinux supports this idea, but it is interesting to push the boundaries here nonetheless…