Running a full Fedora OS inside a libvirt LXC guest

Posted: August 12th, 2013 | Filed under: libvirt, Virt Tools

Historically, running a Linux OS inside an LXC guest has required executing a set of hacky scripts which apply a bunch of customizations to the default OS install to make it work in the constrained container environment. One of the many benefits of Fedora's switch over to systemd has been that a default Fedora install behaves much more sensibly when run inside containers. For example, systemd will skip running udev inside a container, since containers are not given permission to mknod – the /dev is pre-populated with the whitelist of devices the container is allowed to use. As such, running Fedora inside a container is really not much more complicated than invoking yum to install the desired packages into a chroot, then invoking virt-install to configure the LXC guest.

As a proof of concept, on Fedora 19 I only needed to do the following to set up a Fedora 19 environment suitable for execution inside LXC:

 # yum -y --releasever=19 --nogpg --installroot=/var/lib/libvirt/filesystems/mycontainer \
          --disablerepo='*' --enablerepo=fedora install \
          systemd passwd yum fedora-release vim-minimal openssh-server procps-ng
 # echo "pts/0" >> /var/lib/libvirt/filesystems/mycontainer/etc/securetty
 # chroot /var/lib/libvirt/filesystems/mycontainer /bin/passwd root

It would be desirable to avoid the manual editing of /etc/securetty. LXC guests get their default virtual consoles backed by a /dev/pts/0 device, which isn't listed in the securetty file by default. Perhaps it is as simple as just adding that device node unconditionally; we just have to think about whether there's a reason not to do that which would impact bare metal. With the virtual root environment ready, virt-install can now be used to configure the container with libvirt:

# virt-install --connect lxc:/// --name mycontainer --ram 800 \
              --filesystem /var/lib/libvirt/filesystems/mycontainer,/

virt-install will create the XML config libvirt wants and boot the guest, opening a connection to the primary text console. This should display boot-up messages from the instance of systemd running as the container's init process, and present a normal text login prompt.
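
For reference, the domain XML that virt-install generates is roughly of the following shape (an abridged sketch – the exact XML emitted will vary with the virt-install version):

<!-- abridged sketch; exact output varies by virt-install version -->
<domain type='lxc'>
  <name>mycontainer</name>
  <memory unit='KiB'>819200</memory>
  <os>
    <type>exe</type>
    <init>/sbin/init</init>
  </os>
  <devices>
    <filesystem type='mount'>
      <source dir='/var/lib/libvirt/filesystems/mycontainer'/>
      <target dir='/'/>
    </filesystem>
    <console type='pty'/>
    <interface type='network'>
      <source network='default'/>
    </interface>
  </devices>
</domain>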

If attempting this with the systemd-nspawn command, login would fail because the PAM modules' audit code will reject all login attempts. This is really unhelpful behaviour by the PAM modules which can't be disabled by any config, except for booting the entire host with audit=0, which is not very desirable. Fortunately, however, virt-install will configure a separate network namespace for the container by default, which will prevent the PAM module from talking to the kernel audit service entirely, giving it an ECONNREFUSED error. By a stroke of good luck, the PAM modules treat ECONNREFUSED as being equivalent to booting with audit=0, so everything “just works”. This is a nice case of two bugs cancelling out to leave no bug :-)

While the above commands are fairly straightforward, it is a goal of ours to simplify life even further, into a single command. We would like to provide a command that looks something like this:

# virt-bootstrap --connect lxc:/// --name mycontainer --ram 800 \
                 --root /var/lib/libvirt/filesystems/mycontainer \
                 --osid fedora19

The idea is that the '--osid' value will be looked up in the libosinfo database. This will have details of the software repository for that OS, and whether it uses yum/apt/ebuild/something else. virt-bootstrap will then invoke the appropriate packaging tool to populate the root filesystem, and then boot the container, all in a single step.
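
The libosinfo side of this lookup can already be exercised today. Assuming the libosinfo tools are installed, a command along these lines shows the database record such a tool would consult:

$ osinfo-query os short-id=fedora19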

One final point is that LXC in Fedora still can't really be considered secure without the use of SELinux. The commands I describe above don't do anything to enable SELinux protection of the container at this time. This is obviously something that ought to be fixed. Separate from this, upstream libvirt now has support for the kernel user namespace feature. This enables the plain old DAC framework to provide a secure container environment. Unfortunately this kernel feature is still not available in Fedora kernel builds; it is blocked on upstream completion of patches for XFS. Fortunately this work seems to be moving forward again, so if we're lucky it might just be possible to enable user namespaces in Fedora 20, finally making LXC reasonably secure by default even without SELinux.

Fine grained access control in libvirt using polkit

Posted: August 12th, 2013 | Filed under: Fedora, libvirt, OpenStack, Security, Virt Tools

Historically access control to libvirt has been very coarse, with only three privilege levels “anonymous” (only authentication APIs are allowed), “read-only” (only querying information is allowed) and “read-write” (anything is allowed). Over the past few months I have been working on infrastructure inside libvirt to support fine grained access control policies. The initial code drop arrived in libvirt 1.1.0, and the wiring up of authorization checks in drivers was essentially completed in libvirt 1.1.1 (with the exception of a handful of APIs in the legacy Xen driver code). We did not wish to tie libvirt to any single access control system, so the framework inside libvirt is modular, to allow for multiple plugins to be developed. The only plugin provided at this time makes use of polkit for its access control checks. There was a second proof of concept plugin that used SELinux to provide MAC, but there are a number of design issues still to be resolved with that, so it is not merged at this time.

The basic framework integration

The libvirt library exposes a number of objects (virConnectPtr, virDomainPtr, virNetworkPtr, virNWFilterPtr, virNodeDevicePtr, virInterfacePtr, virSecretPtr, virStoragePoolPtr, virStorageVolPtr), with a wide variety of operations defined in the public API. Right away it was clear that we did not wish to describe access controls based on the names of the APIs themselves. For each object there are a great many APIs which all imply the same level of privilege, so it made sense to collapse those APIs onto single permission bits. At the same time, some individual APIs could have multiple levels of privilege depending on the flags set in parameters, so would expand to multiple permission bits. Thus the first task was to come up with a list of permission bits able to cover all APIs. This was encoded in the internal viraccessperm.h header file. With the permissions defined, the next big task was to define a mapping between permissions and APIs. This mapping was encoded as magic comments in the RPC protocol definition file. This in turn allows the code for performing access control checks to be automatically generated, thus minimizing the scope for coding errors, such as forgetting to perform checks in a method, or performing the wrong checks.
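
To give a flavour of those magic comments, the per-procedure annotations look roughly like this (a sketch from memory – the remote_protocol.x file in the libvirt source tree is the authoritative reference for the exact syntax):

    /**
     * @generate: both
     * @acl: domain:getattr
     */
    REMOTE_PROC_DOMAIN_GET_INFO = ...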

The final coding step was for the automatically generated ACL check methods to be inserted into each of the libvirt driver APIs. Most of the ACL checks validate the input parameters to ensure the caller is authorized to operate on the object in question. In a number of methods, the ACL checks are used to restrict / filter the data returned. For example, if asking for a list of domains, the returned list must be filtered to only those the client is authorized to see. While the code for checking permissions was auto-generated, it is not practical to automatically insert the checks into each libvirt driver. It was, however, possible to write scripts to perform static analysis on the code to validate that each driver had the full set of access control checks present. Of course it helps to tell developers / administrators which permissions apply to each API, so the code which generates the API reference documentation was also enhanced so that the API reference lists the permissions required in each circumstance.

The polkit access control driver

Libvirt has long made use of polkit for authenticating connections over its UNIX domain sockets. It was thus natural to expand on this work to make use of polkit as a driver for the access control framework. Historically this would not have been practical, because the polkit access control rule format did not provide a way for the admin to configure access control checks on individual object instances – only object classes. In polkit 0.106, however, a new engine was added which allows admins to use javascript to write access control policies. The libvirt polkit driver combines object class names and permission names to form polkit action names. For example, the “getattr” permission on the virDomainPtr class maps to the polkit action org.libvirt.api.domain.getattr. When performing an access control check, libvirt then populates the polkit authorization “details” map with one or more attributes which uniquely identify the object instance. For example, the virDomainPtr object gets “connect_driver” (libvirt driver name), “domain_uuid” (globally unique UUID), and “domain_name” (host local unique name) details set. These details can be referenced in the javascript policy to scope rules to individual object instances.

Consider a local user berrange who has been granted permission to connect to libvirt in full read-write mode. The goal is to only allow them to use the QEMU driver and not the Xen or LXC drivers which are also available in libvirtd. To achieve this we need to write a rule which checks whether the connect_driver attribute is QEMU, and match on an action name of org.libvirt.api.connect.getattr. Using the javascript rules format, this ends up written as

polkit.addRule(function(action, subject) {
    if (action.id == "org.libvirt.api.connect.getattr" &&
        subject.user == "berrange") {
          if (action.lookup("connect_driver") == 'QEMU') {
            return polkit.Result.YES;
          } else {
            return polkit.Result.NO;
          }
    }
});

As another example, consider a local user berrange who has been granted permission to connect to libvirt in full read-write mode. The goal is to only allow them to see the domain called demo on the LXC driver. To achieve this we need to write a rule which checks whether the connect_driver attribute is LXC and the domain_name attribute is demo, and match on an action name of org.libvirt.api.domain.getattr. Using the javascript rules format, this ends up written as

polkit.addRule(function(action, subject) {
    if (action.id == "org.libvirt.api.domain.getattr" &&
        subject.user == "berrange") {
          if (action.lookup("connect_driver") == 'LXC' &&
              action.lookup("domain_name") == 'demo') {
            return polkit.Result.YES;
          } else {
            return polkit.Result.NO;
          }
    }
});

Further work

While the access control support in libvirt 1.1.1 provides a useful level of functionality, there is still more that can be done in the future. First of all, the polkit driver needs to have some performance optimization work done. It currently relies on invoking the ‘pkcheck’ binary to validate permissions. While this is fine for hosts with small numbers of objects, it will quickly become too costly. The solution here is to directly use the DBus API from inside libvirt.
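
To illustrate the cost, each permission check currently amounts to forking something along these lines (a sketch – the exact arguments libvirt passes are an internal detail, and the PID and object details here are illustrative):

  # PID and detail values illustrative only
  pkcheck --action-id org.libvirt.api.domain.getattr \
          --process 12345 \
          --detail connect_driver QEMU \
          --detail domain_name demo

One fork/exec per permission soon adds up when, say, filtering a list of hundreds of domains.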

The latest polkit framework is fairly flexible in terms of letting us identify object instances via the details map it associates with every access control check. It is far less flexible in terms of identifying the client user. It is fairly locked into the idea of identifying users via remote PID or DBus service name, and then exposing the username/groupnames to the javascript rules files. While this works fine for local libvirt connections over UNIX sockets, it is pretty much useless for connections arriving on libvirt's TCP sockets. In the latter case the libvirt user is identified by a SASL username (typically a Kerberos principal name), or by an x509 certificate distinguished name (when using client certs with TLS). There's no official way to feed the SASL username or x509 dname down to the polkit javascript authorization rules files. Requests upstream to allow extra identifying attributes to be provided for the authorization subject have not been productive, so I'm considering (ab-)using the “details” map to provide identifying info for the user, alongside the identifying info for the object.

As mentioned earlier, there was a proof of concept SELinux driver written, that is yet to be finished. The work there is around figuring out / defining what the SELinux context is for each object to be checked and doing some work on SELinux policy. I think of this work as providing a capability similar to that done in PostgreSQL to enable SELinux MAC checks. It would be very nice to have a system which provides end-to-end MAC. I refer to this as sVirt 2.0 – the first (current) version of sVirt protected the host from guests – the second (future) version would also protect the host from management clients.

The legacy XenD based Xen driver has a couple of methods which lack access control, due to the inability to get access to the identifying attributes for the objects being operated upon. While we encourage people to use the new libxl based Xen driver, it is desirable to have the legacy Xen driver fully controlled for those people using legacy virtualization hosts. Some code refactoring will be required to fix the legacy Xen driver, likely at the cost of making some methods less efficient.

If there is user demand, work may be done to write an access control driver which is implemented natively within libvirt. While the polkit javascript engine is fairly flexible, I'm not much of a fan of having administrators write code to define their access control policy. It would be preferable to have a way to describe the policy that is entirely declarative. With a libvirt native access control driver, it would be possible to create a simple declarative policy file format tailored to our precise needs. This would let us solve the problem of providing identifying info about the subject being checked. It would also have the potential to be more scalable by avoiding the need to interact with any remote authorization daemons over DBus. The latter could be a big deal when an individual API call needs to check thousands of permissions at once. The flipside, of course, is that a libvirt specific access control driver is not good for interoperability across the broader system – the standardized use of polkit is good in that respect. There's no technical reason why we can't support multiple access control drivers to give the administrator choice / flexibility.
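
Purely as a hypothetical illustration of the idea (no such file format exists in libvirt – this is just a sketch of what a declarative policy might look like), the earlier LXC example could be expressed as:

<!-- hypothetical sketch only, not a real libvirt file format -->
<access>
  <rule user='berrange'>
    <allow permission='domain:getattr'>
      <match detail='connect_driver' value='LXC'/>
      <match detail='domain_name' value='demo'/>
    </allow>
    <deny permission='domain:getattr'/>
  </rule>
</access>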

Finally, this work is all scheduled to arrive in Fedora 20, so anyone interested in testing it should look at current rawhide, or keep an eye out for the Fedora 20 virtualization test day.

EDITED: Aug 15th: Change example use of ‘action._detail_connect_driver’ to ‘action.lookup(“connect_driver”)’

A new (configurable) cgroups layout for libvirt with QEMU, KVM & LXC

Posted: May 13th, 2013 | Filed under: Fedora, libvirt, OpenStack, Virt Tools

Several years ago I wrote a bit about libvirt and cgroups in Fedora 12. Since that time, much has changed, and we've learnt a lot about the use of cgroups, not all of it good.

Perhaps the biggest change has been the arrival of systemd, which has brought cgroups to the attention of a much wider audience. One of the biggest positive impacts of systemd on cgroups has been a formalization of how to integrate with cgroups as an application developer. Libvirt of course follows these cgroups guidelines, had input into their definition & continues to work with the systemd community to improve them.

One of the things we’ve learnt the hard way is that the kernel implementation of control groups is not without cost, and the way applications use cgroups can have a direct impact on the performance of the system. The kernel developers have done a great deal of work to improve the performance and scalability of cgroups but there will always be a cost to their usage which application developers need to be aware of. In broad terms, the performance impact is related to the number of cgroups directories created and particularly to their depth.

To cut a long story short, it became clear that the directory hierarchy layout libvirt used with cgroups was seriously sub-optimal, or even outright harmful. Thus in libvirt 1.0.5, we introduced some radical changes to the layout created.

Historically libvirt would create a cgroup directory for each virtual machine or container, at a path $LOCATION-OF-LIBVIRTD/libvirt/$DRIVER-NAME/$VMNAME. For example, if libvirtd was placed in /system/libvirtd.service, then a QEMU guest named “web1” would live at /system/libvirtd.service/libvirt/qemu/web1. That’s 5 levels deep already, which is not good.

As of libvirt 1.0.5, libvirt will create a cgroup directory for each virtual machine or container, at a path /machine/$VMNAME.libvirt-$DRIVER-NAME. First, notice how this is now completely disassociated from the location of libvirtd itself. This allows the administrator greater flexibility in controlling resources for virtual machines independently of system services. Second, notice that the directory hierarchy is only 2 levels deep by default, so a QEMU guest named “web1” would live at /machine/web1.libvirt-qemu.
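
Assuming the cpu controller is mounted in the usual systemd location, the new layout is easy to see on a running host (paths illustrative):

$ ls -d /sys/fs/cgroup/cpu,cpuacct/machine/*.libvirt-qemu
/sys/fs/cgroup/cpu,cpuacct/machine/web1.libvirt-qemu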

The final important change is that the location of a virtual machine / container can now be configured on a per-guest basis in the XML configuration, to override the default of /machine. So if the guest config says

  <resource>
    <partition>/virtualmachines/production</partition>
  </resource>

then libvirt will create the guest cgroup directory /virtualmachines.partition/production.partition/web1.libvirt-qemu. Notice that there will always be a .partition suffix on these user defined directories. Only the default top level directories /machine, /system and /user will be without a suffix. The suffix ensures that user defined directories can never clash with anything the kernel will create. The systemd PaxControlGroups document will be updated with this & a few escaping rules soon.

There is still more we intend to do with cgroups in libvirt, in particular adding APIs for creating & managing these partitions for grouping VMs, so you don't need to go to a tool outside libvirt to create the directories.

One final thing, libvirt now has a bit of documentation about its cgroups usage which will serve as the base for future documentation in this area.

Automated install of Fedora 18 ARM on a Samsung Google Chromebook

Posted: March 31st, 2013 | Filed under: Fedora

Back in November last year, I wrote about running Fedora 17 ARM on a Samsung Google Chromebook, via an external SD card. With Fedora 18 now out, I thought it time to try again, this time replacing ChromeOS entirely, installing Fedora 18 ARM to the 16GB internal flash device. Igor Mammedov of the Red Hat KVM team has previously written a script for automating the install of Fedora 17 onto the internal flash device, including the setup of a chained bootloader with nv-uboot. I decided to take his script as a starting point, update it to Fedora 18 and then extend its capabilities.

If you don’t want to read about what the script does, skip to the end

ChromeOS bootloader

The Samsung ARM Chromebook bootloader is a fork of u-boot. The bootloader is set up to do “SecureBoot” of Google ChromeOS images only by default. There is no provision for providing your own verification keys to the bootloader, so the only way to run non-ChromeOS images is to switch to “Developer Mode” and sign kernels using the developer keys. The result is that while you can run non-ChromeOS operating systems, they'll always be a second class citizen – since the developer keys are publicly available, in developer mode it'll happily boot anyone's (potentially backdoored) kernels. You're also stuck with an annoying 30 second sleep in the bootloader splash screen, which you can only get around by pressing 'Ctrl-D' on every startup. The bootloader is also locked down, so you can't get access to the normal u-boot console – if you want to change the kernel args you need to re-generate the kernel image, which is not much fun when troubleshooting boot problems with new kernels.

The Chromebook bootloader can’t be (easily) replaced since the flash it is stored in is set read-only. I’ve seen hints in Google+ that you can get around this by opening up the case and working some magic with a soldering iron to set the flash writable again, but I don’t fancy going down that route.

It is, however, possible to set up a chained bootloader, so that the built-in u-boot will first boot nv-uboot, which is a variant of the bootloader that has the console enabled and boots any kernel without requiring it to be signed. We still have the annoying 30 second sleep at boot time, and we still can't do secure boot of our Fedora install, but we at least get an interactive boot console for troubleshooting, which is important for me.

ChromeOS Partition Layout

Before continuing it is helpful to understand how ChromeOS partitions the internal flash. It uses GPT rather than MBR, and sets up 12 partitions, though 4 of these (ROOT-C, KERB-C, reserved, reserved) are completely unused and 2 are effectively empty (OEM, RWFW) on my system.

# Device            Label      Offset    Length     Size
# /dev/mmcblk0p1  - STATE        282624  11036672   10 GB
# /dev/mmcblk0p2  - KERN-A        20480     16384   16 MB
# /dev/mmcblk0p3  - ROOT-A     26550272   2097154    2 GB
# /dev/mmcblk0p4  - KERN-B        53248     16384   16 MB
# /dev/mmcblk0p5  - ROOT-B     22355968   2097154    2 GB
# /dev/mmcblk0p6  - KERN-C        16448         0    0 MB
# /dev/mmcblk0p7  - ROOT-C        16449         0    0 MB
# /dev/mmcblk0p8  - OEM           86016     16384   16 MB
# /dev/mmcblk0p9  - reserved      16450         0    0 MB
# /dev/mmcblk0p10 - reserved      16451         0    0 MB
# /dev/mmcblk0p11 - RWFW             64      8192    8 MB
# /dev/mmcblk0p12 - EFI-SYSTEM   249856     16384   16 MB

The important partitions are

  • KERN-A – holds the 1st (primary) kernel image
  • KERN-B – holds the 2nd (backup) kernel image
  • ROOT-A – ChromeOS root filesystem to go with primary kernel
  • ROOT-B – ChromeOS root filesystem to go with backup kernel
  • EFI-SYSTEM – EFI firmware files – empty by default
  • STATE – ChromeOS user data partition

Notice from the offsets that the order of the partitions on flash does not match the partition numbers. The important thing is that STATE, ROOT-A and ROOT-B are all at the end of the partition table.
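
For the curious, this partition table can be dumped on a live system using ChromeOS's cgpt tool:

# cgpt show /dev/mmcblk0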

Desired Fedora partition layout

The goal for the Fedora installation is to delete the ROOT-A, ROOT-B and STATE partitions from ChromeOS, and replace them with 3 new partitions:

  • ROOT – hold the Fedora root filesystem (ext4, unencrypted, 4 GB)
  • BOOT – hold the /boot filesystem (ext2, unencrypted, 200 MB)
  • HOME – hold the /home filesystem (ext4, LUKS encrypted, ~11 GB)

The /boot partition must sadly be ext2, since the nv-uboot images Google provide don't have ext4 support enabled, and I don't fancy building new images myself. It would be possible to have a single partition for both the root and home directories, but keeping them separate should make it easier to upgrade by re-flashing the entire ROOT partition, and also avoids the need to build an initrd to handle unlocking of the LUKS partition.
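
Carving out this layout boils down to a few cgpt invocations, something like the following (a sketch only – the indices, offsets and sizes shown here are illustrative, the install script computes the real values from the flash geometry):

  # indices, offsets & sizes illustrative – the script computes the real values
  cgpt add -i 3 -b 26550272 -s 8388608  -t data -l ROOT /dev/mmcblk0
  cgpt add -i 5 -b 34938880 -s 409600   -t data -l BOOT /dev/mmcblk0
  cgpt add -i 1 -b 35348480 -s 23068672 -t data -l HOME /dev/mmcblk0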

Chained bootloader process

The KERN-A and KERN-B partitions will be used to hold the chained nv-uboot bootloader image, so the built-in bootloader will first load the nv-uboot loader image. nv-uboot will then look for a /u-boot/boot.scr.img file in the EFI-SYSTEM partition. This file is a uboot script telling nv-uboot which partitions the kernel and root filesystem are stored in, as well as setting the kernel boot parameters. The nv-uboot image has an annoying assumption that the kernel image is stored on the root filesystem, which isn't the case since we want a separate /boot, so we must override some of the nv-uboot environment variables to force the name of the root partition for the kernel command line. The upshot is that the boot.scr.img file is generated from the following configuration

setenv kernelpart 2
setenv rootpart 1
setenv cros_bootfile /vmlinux.uimg
setenv regen_all ${regen_all} root=/dev/mmcblk0p3
setenv common_bootargs
setenv dev_extras console=tty1 lsm.module_locking=0 quiet

The actual kernel to be booted is thus ‘/vmlinux.uimg’ in the /boot partition of the Fedora install. There is no Fedora kernel yet that boots on the ARM ChromeBook, so this is a copy of the kernel from the ChromeOS install. Hopefully there will be official Fedora kernels in Fedora 19, or at least a re-mix with them available. The lsm.module_locking=0 argument here is needed to tell the ChromeOS kernel LSM to allow kernel module loading.
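
For reference, the boot.scr.img file that nv-uboot consumes is just the plain-text script above wrapped with the u-boot-tools mkimage utility, along these lines (file names illustrative – the install script handles this automatically):

  # file names illustrative; mkimage comes from the uboot-tools package
  mkimage -A arm -O linux -T script -C none -n "boot script" \
          -d boot.scr boot.scr.img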

Installation process

With all this in mind, the script does its work in several stages, requiring a reboot after each stage

  1. Running from a root shell in ChromeOS (which must be in developer mode), the filesystem in the ROOT-B partition is deleted and replaced with a temporary Fedora ARM filesystem. The KERN-A and KERN-B partitions have their contents replaced with the nv-uboot image. The kernel image from ChromeOS is copied into the Fedora root filesystem, and the keyboard/timezone/locale settings are also copied over. The installation script is copied to /etc/rc.d/rc.local, so that stage 2 will run after reboot. The system is now rebooted, so that nv-uboot will launch the Fedora root filesystem
  2. Running from rc.local in the temporary Fedora root filesystem, the ROOT-A and STATE partitions are now deleted to remove the last traces of ChromeOS. The ROOT and BOOT partitions are then created and formatted. The contents of the temporary Fedora root filesystem are now copied into the new ROOT partition. The system is now rebooted, to get out of the temporary Fedora root filesystem and into the new root.
  3. Running from rc.local in the final Fedora root filesystem, the ROOT-B partition is now deleted to remove the temporary Fedora root filesystem. In the free space that is now available, a HOME partition is created. At this point the user is prompted to provide the LUKS encryption passphrase they wish to use for /home (conceptually the cryptsetup steps sketched after this list). The ALSA UCM profiles for the ChromeBook are now loaded and the ALSA config saved. This will help avoid users accidentally melting their speakers later. An Xorg config file is created to configure the touchpad sensitivity, firstboot is enabled and the root account is locked. Installation is now complete and the system will reboot for the final time.
  4. The final system will now boot normally. There will be a prompt for the LUKS passphrase during boot up. Unfortunately the prompt text gets mixed up with systemd boot messages, which I’m not sure how to fix. Just keep an eye out for it. Once the key is entered boot up will complete and firstboot should launch allowing the creation of a user account. Since the root account is locked, this user will be added to the wheel group, giving it sudo privileges.
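
The LUKS handling in stage 3 is conceptually just the standard cryptsetup dance (a sketch – the device name is illustrative, the script picks the real partition):

  # device name illustrative – the script picks the real partition
  cryptsetup luksFormat /dev/mmcblk0p1
  cryptsetup luksOpen /dev/mmcblk0p1 home
  mkfs -t ext4 /dev/mapper/home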

If everything went to plan, the ChromeBook should now have a fully functional Fedora 18 install on its internal flash, with the XFCE desktop environment. Compared to running off an external SD card, the boot up speed is quite a lot faster. The time to get to the desktop login screen is not all that much longer than with ChromeOS (obviously I'm ignoring the pause to enter the LUKS passphrase here).

Some things I’m not happy with

  • Only /home is encrypted. I’d like to figure out how to build an initrd for the ChromeOS kernel capable of unlocking a LUKS encrypted root filesystem
  • The boot up is in text mode. I’d like to figure out how to do graphical boot with plymouth, mostly to get a better prompt for the LUKS passphrase
  • The image is not using GNOME 3. I much prefer the GNOME Shell experience over the “traditional” desktop model seen with XFCE / GNOME 2 / etc

Running the script

You run this script AT YOUR OWN RISK. It completely erases all personal data on your ChromeBook and erases ChromeOS itself. If something goes wrong with the script, you’ll likely end up with an unbootable machine. To fix this you’ll need an SD card / USB stick to follow the ChromeOS recovery procedures. I’ve been through the recovery process perhaps 20 times now and it doesn’t always go 100% smoothly. Sometimes it complains that it has hit an unrecoverable error. Despite the message, ChromeOS still appears to have been recovered & will boot, but there’s something fishy going on. Again you run this script AT YOUR OWN RISK.

  1. Download http://berrange.fedorapeople.org/install-f18-arm-chromebook-luks.sh to any random machine
  2. Optionally edit the script to change the FEDORA_ROOT_IMAGE_URL and UBOOT_URL env variables to point to a local mirror of the files.
  3. Optionally edit the script to set the ssid and psk parameters with the wifi connection details. If not set, the script will prompt for them
  4. Boot the ChromeBook in Developer Mode and login as a guest
  5. Use Ctrl+Alt+F2 to switch to the ChromeOS root shell (F2 is the key with the forward arrow on it, in the usual location you’d expect F2 to be)
  6. Copy the script downloaded earlier to /tmp in the ChromeOS root and give it executable permission
  7. Run bash /tmp/install-f18-arm-chromebook-luks.sh
  8. Watch as it reboots 3 times (keep an eye out for the LUKS key prompts on boots 3 and 4).
  9. Then either rejoice when firstboot appears and you subsequently get a graphical login prompt, or weep as you need to run the ChromeOS recovery procedure.

The script will save logs from stages 1 / 2 / 3 into /root of the final filesystem. It also copies over a couple of interesting log files from ChromeOS for reference.

Announce: NoZone 1.0 – a Bind DNS zone generator

Posted: March 17th, 2013 | Filed under: Coding Tips, Fedora, Virt Tools

My web servers host a number of domains for both personal sites and open source projects, which of course means managing a number of DNS zone files. I use Gandi as my registrar, and they throw in free DNS hosting when you purchase a domain. When you have more than 2-3 domains to manage and want to keep the DNS records consistent across all of them, dealing with the pointy-clicky web form interfaces is incredibly tedious. Thus I have traditionally hosted my own DNS servers, creating the Bind DNS zone files in emacs. Anyone who has ever used Bind, though, will know that its DNS zone file syntax is one of the most horrific formats you can imagine. It is really easy to make silly typos which will screw up your zone in all sorts of fun ways. Keeping the DNS records in sync across domains is also still somewhat tedious.

What I wanted is a simpler, safer configuration file format for defining DNS zones, which can minimise the duplication of data across different domains. There may be tools which do this already, but I fancied writing something myself tailored to my precise use case, so didn't search for any existing solutions. The result of a couple of evenings' hacking is a tool I'm calling NoZone, which now has its first public release, version 1.0. The upstream source is available in a GIT repository.

The /etc/nozone.cfg configuration file

The best way to illustrate what NoZone can do is to simply show a sample configuration file. For reasons of space, I'm cutting out all the comments – the distributed copy contains copious comments. In this example, 3 (hypothetical) domain names are being configured: nozone.com and nozone.org, which are the public facing domains, and an internal domain for testing purposes, qa.nozone.org. All three domains are intended to be configured with the same DNS records; the only difference is that the internal zone (qa.nozone.org) needs different IP addresses for its records. For each domain, there will be three physical machines involved: gold, platinum and silver.

The first step is to define a zone with all the common parameters specified. Note that this zone isn't specifying any machine IP addresses or domain names; it just refers to the machine names, defining an abstract base for the child zones:

zones = {
  common = {
    hostmaster = dan-hostmaster

    lifetimes = {
      refresh = 1H
      retry = 15M
      expire = 1W
      negative = 1H
      ttl = 1H
    }

    default = platinum

    mail = {
      mx0 = {
        priority = 10
        machine = gold
      }
      mx1 = {
        priority = 20
        machine = silver
      }
    }

    dns = {
      ns0 = gold
      ns1 = silver
    }

    names = {
      www = platinum
    }

    aliases = {
      db = gold
      backup = silver
    }

    wildcard = platinum
  }

With the common parameters defined, a second zone called “production” is defined, which lists the domain names nozone.org and nozone.com and the IP details for the physical machines hosting the domains.

  production = {
    inherits = common

    domains = (
        nozone.org
        nozone.com
    )

    machines = {
      platinum = {
        ipv4 = 12.32.56.1
        ipv6 = 2001:1234:6789::1
      }
      gold = {
        ipv4 = 12.32.56.2
        ipv6 = 2001:1234:6789::2
      }
      silver = {
        ipv4 = 12.32.56.3
        ipv6 = 2001:1234:6789::3
      }
    }
  }

The third zone is used to define the internal qa.nozone.org domain.

  testing = {
    inherits = common

    domains = (
      qa.nozone.org
    )

    machines = {
      platinum = {
        ipv4 = 192.168.1.1
        ipv6 = fc00::1:1
      }
      gold = {
        ipv4 = 192.168.1.2
        ipv6 = fc00::1:2
      }
      silver = {
        ipv4 = 192.168.1.3
        ipv6 = fc00::1:3
      }
    }
  }
}

Generating the Bind DNS zone files

With the /etc/nozone.cfg configuration file created, the Bind9 DNS zone files can now be generated by invoking the nozone command.

$ nozone

This generates a number of files

# ls /etc/named
nozone.com.conf  nozone.conf  nozone.org.conf  qa.nozone.org.conf
$ ls /var/named/data/
named.run           named.run-20130317  nozone.org.data
named.run-20130315  nozone.com.data     qa.nozone.org.data

The final step is to add one line to /etc/named.conf and then restart bind.

# echo 'include "/etc/named/nozone.conf";' >> /etc/named.conf
# systemctl restart named.service

The generated files

The /etc/named/nozone.conf file is always generated and contains references to the conf files for each named domain

include "/etc/named/nozone.com.conf";
include "/etc/named/nozone.org.conf";
include "/etc/named/qa.nozone.org.conf";

Each of these files defines a domain name and links to the zone file definition. For example, nozone.com.conf contains

zone "nozone.com" in {
    type master;
    file "/var/named/data/nozone.com.data";
};

Finally, the interesting data is in the actual zone files, in this case /var/named/data/nozone.com.data

$ORIGIN nozone.com.
$TTL     1H ; queries are cached for this long
@        IN    SOA    ns1    hostmaster (
                           1363531990 ; Date 2013/03/17 14:53:10
                           1H  ; slave queries for refresh this often
                           15M ; slave retries refresh this often after failure
                           1W ; slave expires after this long if not refreshed
                           1H ; errors are cached for this long
         )

; Primary name records for unqualified domain
@                    IN    A               12.32.56.1 ; Machine platinum
@                    IN    AAAA            2001:1234:6789::1 ; Machine platinum

; DNS server records
@                    IN    NS              ns0
@                    IN    NS              ns1
ns0                  IN    A               12.32.56.2 ; Machine gold
ns0                  IN    AAAA            2001:1234:6789::2 ; Machine gold
ns1                  IN    A               12.32.56.3 ; Machine silver
ns1                  IN    AAAA            2001:1234:6789::3 ; Machine silver

; E-Mail server records
@                    IN    MX       10     mx0
@                    IN    MX       20     mx1
mx0                  IN    A               12.32.56.2 ; Machine gold
mx0                  IN    AAAA            2001:1234:6789::2 ; Machine gold
mx1                  IN    A               12.32.56.3 ; Machine silver
mx1                  IN    AAAA            2001:1234:6789::3 ; Machine silver

; Primary names
gold                 IN    A               12.32.56.2
gold                 IN    AAAA            2001:1234:6789::2
platinum             IN    A               12.32.56.1
platinum             IN    AAAA            2001:1234:6789::1
silver               IN    A               12.32.56.3
silver               IN    AAAA            2001:1234:6789::3

; Extra names
www                  IN    A               12.32.56.1 ; Machine platinum
www                  IN    AAAA            2001:1234:6789::1 ; Machine platinum

; Aliased names
backup               IN    CNAME           silver
db                   IN    CNAME           gold

; Wildcard
*                    IN    A               12.32.56.1 ; Machine platinum
*                    IN    AAAA            2001:1234:6789::1 ; Machine platinum

As of 2 days ago, I’m using nozone to manage the DNS zones for all the domains I own. If it is useful to anyone else, it can be downloaded from CPAN. I’ll likely be submitting it for a Fedora review at some point too.