The Folsom release of OpenStack has been out for a few weeks now, and I had intended to write this post much earlier, but other things (moving house, getting married & travelling to LinuxCon Europe / KVM Forum all in the space of a month) got in the way. There are highlighted release notes, but I wanted to give a little more detail on some of the changes I was involved with making to the libvirt driver and what motivated them.
XML configuration
First off was a change in the way Nova generates libvirt XML configurations. Previously the libvirt driver in Nova used the Cheetah templating system to generate its XML configurations. The problem with this was that a lot of information needed to be passed into the template as parameters, so Nova had invented an ad-hoc internal configuration format for libvirt guest config, which was then further translated into the proper guest config in the template. The resulting code was hard to maintain and understand, because the logic for constructing the XML was effectively spread across both the template file and the libvirt driver code with no consistent structure. Thus the first big change that went into the libvirt driver during Folsom was to introduce a formal set of Python configuration objects to represent the libvirt XML config. The libvirt driver code now directly populates these config objects with the required data, and then simply serializes the objects to XML. The use of Cheetah has been completely eliminated, and the code structure is significantly clearer as a result. There is a wiki page describing this in a little more detail.
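To give a flavour of the config-object approach, here is a minimal sketch. The class names are illustrative only; Nova's real objects (in nova.virt.libvirt.config) are considerably more elaborate.

```python
# A minimal sketch of the config-object approach: populate plain Python
# objects, then serialize them to libvirt guest XML in one place.
# Class names here are illustrative, not Nova's actual classes.
from xml.etree import ElementTree


class LibvirtConfigGuestDisk:
    def __init__(self, source_path, target_dev):
        self.source_path = source_path
        self.target_dev = target_dev

    def format_dom(self):
        disk = ElementTree.Element("disk", type="file", device="disk")
        ElementTree.SubElement(disk, "source", file=self.source_path)
        ElementTree.SubElement(disk, "target", dev=self.target_dev)
        return disk


class LibvirtConfigGuest:
    def __init__(self, name, memory_kb):
        self.name = name
        self.memory_kb = memory_kb
        self.devices = []

    def to_xml(self):
        root = ElementTree.Element("domain", type="kvm")
        ElementTree.SubElement(root, "name").text = self.name
        ElementTree.SubElement(root, "memory").text = str(self.memory_kb)
        devices = ElementTree.SubElement(root, "devices")
        for dev in self.devices:
            devices.append(dev.format_dom())
        return ElementTree.tostring(root, encoding="unicode")


guest = LibvirtConfigGuest("instance-0001", 524288)
guest.devices.append(LibvirtConfigGuestDisk("/var/lib/nova/disk", "vda"))
print(guest.to_xml())
```

The key benefit is that all XML serialization lives in the config classes, so the driver logic never touches raw XML strings.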
CPU model configuration
The primary downside of the removal of the Cheetah templating is that it is no longer possible for admins deploying Nova to make ad-hoc changes to the libvirt guest XML that is used. Personally I’d actually argue that this is a good thing, because the ability to make ad-hoc changes meant that there was less motivation for directly addressing the missing features in Nova, but I know plenty of people would disagree with this view :-) It quickly became apparent that the one change a great many people were making to the libvirt XML config was to specify a guest CPU model. If no explicit CPU model is requested in the guest config, KVM will start with a generic, lowest common denominator model that will typically work everywhere. As can be expected, this generic CPU model is not going to offer optimal performance for the guests. For example, if your host has shiny new CPUs with builtin AES encryption instructions, the guest is not going to be able to take advantage of them. Thus the second big change in the Nova libvirt driver was to introduce explicit support for configuring the CPU model. This involves two new Nova config parameters: libvirt_cpu_mode, which chooses between “host-model”, “host-passthrough” and “custom”; and, if the mode is set to “custom”, libvirt_cpu_model, which specifies the name of the custom CPU model that is required. Again there is a wiki page describing this in a little more detail.
Once the ability to choose CPU models was merged, it was decided that the default behaviour should also be changed. Thus if Nova is configured to use KVM as its hypervisor, then it will use the “host-model” CPU mode by default. This causes the guest CPU model to be an (almost) exact copy of the host CPU model, offering maximum out of the box performance. There turned out to be one small wrinkle in this choice when using nested KVM though. Due to a combination of problems in libvirt and KVM, use of “host-model” fails for nested KVM. Thus anyone using nested KVM needs to set libvirt_cpu_mode=“none” as a workaround for now. If you’re using KVM on bare metal everything should be fine, which is of course the normal scenario for production deployments.
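As a concrete example, the nova.conf settings to request a specific named CPU model would look like the following (the model name “Nehalem” is just an illustrative choice from libvirt’s CPU model list):

```
libvirt_cpu_mode=custom
libvirt_cpu_model=Nehalem
```

With the default of libvirt_cpu_mode=host-model under KVM, neither setting is needed to get a near-native CPU in the guest.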
Time keeping policy
Again on the performance theme, the libvirt Nova driver was updated to set time keeping policies for KVM guests. Virtual machines on x86 have a number of timers available, including the PIT, RTC, PM-Timer and HPET. Reliable timers are one of the hardest problems to solve in full machine virtualization platforms, and KVM is no exception. It all comes down to the question of what to do when the hypervisor cannot inject a timer interrupt at the correct time, because a different guest is running. There are a number of policies available: inject the missed tick as soon as possible; merge all missed ticks into one and deliver that as soon as possible; temporarily inject missed ticks at a higher rate than normal to “catch up”; or simply discard the missed tick entirely. It turns out that Windows 7 is particularly sensitive to timers, and the default KVM policies for missed ticks were causing frequent crashes, while older Linux guests would often experience severe time drift. Research validated by the oVirt project team had previously identified an optimal set of policies that should keep the majority of guests happy. Thus the libvirt Nova driver was updated to set explicit policies for time keeping with the PIT and RTC timers when using KVM, which should make everything time related much more reliable.
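In libvirt guest XML terms, the resulting configuration looks along these lines (the exact tick policies here reflect my understanding of the choices made — delay missed PIT ticks, catch up missed RTC ticks — so treat this as a sketch rather than the literal XML Nova emits):

```
<clock offset="utc">
  <timer name="pit" tickpolicy="delay"/>
  <timer name="rtc" tickpolicy="catchup"/>
</clock>
```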
Libvirt authentication
The libvirtd daemon can be configured with a number of different authentication schemes. Out of the box it will use PolicyKit to authenticate clients, and thus Nova packages on Fedora / RHEL / EPEL include a PolicyKit configuration file which grants Nova the ability to connect to libvirt. Administrators may, however, decide to use a different authentication scheme, for example, SASL. If the scheme chosen requires a username+password, there was previously no way for Nova’s libvirt driver to provide these authentication credentials. Fortunately the libvirt client has the ability to look up credentials in a local file. Unfortunately the way Nova connected to libvirt prevented this from working. Thus the way the Nova libvirt driver used openAuth() was fixed to allow the default credential lookup logic to work. It is now possible to require authentication between Nova and libvirt thus:
# augtool -s set /files/etc/libvirt/libvirtd.conf/auth_unix_rw sasl
Saved 1 file(s)
# saslpasswd -a libvirt nova
Password: XYZ
Again (for verification): XYZ
# su - nova -s /bin/sh
$ mkdir -p $HOME/.config/libvirt
$ cat > $HOME/.config/libvirt/auth.conf <<EOF
[credentials-nova]
authname=nova
password=XYZ
[auth-libvirt-localhost]
credentials=nova
EOF
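The auth.conf format is a simple INI file: an [auth-&lt;service&gt;-&lt;hostname&gt;] section points at a [credentials-&lt;name&gt;] section holding the actual username and password. As a toy illustration of the lookup the libvirt client performs (the real lookup is done in C inside the libvirt client library, not like this), it can be modelled with Python’s configparser:

```python
# Toy model of the credential lookup libvirt does against
# $HOME/.config/libvirt/auth.conf -- illustrative only.
import configparser
import io

AUTH_CONF = """
[credentials-nova]
authname=nova
password=XYZ

[auth-libvirt-localhost]
credentials=nova
"""


def lookup(conf_text, service, hostname):
    cfg = configparser.ConfigParser()
    cfg.read_file(io.StringIO(conf_text))
    # The auth-<service>-<hostname> section names a credentials set...
    credname = cfg["auth-%s-%s" % (service, hostname)]["credentials"]
    # ...and the credentials-<name> section holds the actual values.
    creds = cfg["credentials-%s" % credname]
    return creds["authname"], creds["password"]


print(lookup(AUTH_CONF, "libvirt", "localhost"))  # ('nova', 'XYZ')
```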
Other changes
Obviously I was not the only person working on the libvirt driver in Folsom; many others contributed work too. Leander Beernaert provided an implementation of the ‘nova diagnostics’ command that works with the libvirt driver, showing the virtual machine CPU, memory, disk and network interface utilization statistics. Pádraig Brady improved the performance of migration by sending the qcow2 image between hosts directly, instead of converting it to a raw file, sending that, and then converting it back to qcow2. Instead of transferring 10 G of raw data, it can now send just the data actually used, which may be as little as a few hundred MB. In his test case, this reduced the time to migrate from 7 minutes to 30 seconds, which I’m sure everyone will like to hear :-) Pádraig also optimized the file injection code so that it only mounts the guest image once to inject all data, instead of mounting it separately for each injected item. Boris Filippov contributed support for storing VM disk images in LVM volumes, instead of qcow2 files, while Ben Swartzlander contributed support for using NFS files as the backing for virtual block volumes. Vish updated the way libvirt generates XML configuration for disks, to include the “serial” property against each disk, based on the nova volume ID. This allows the guest OS admin to reliably identify the disk in the guest, using the /dev/disk/by-id/virtio-<volume id> paths, since the /dev/vdXXX device numbers are pretty much randomly assigned by the kernel.
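The disk XML produced by Vish’s change looks roughly like this (the volume ID and device paths here are invented for illustration):

```
<disk type="block" device="disk">
  <source dev="/dev/disk/by-path/ip-192.168.1.1:3260-iscsi-...-lun-1"/>
  <target dev="vdb" bus="virtio"/>
  <serial>58a84f6d-3f2c-4acb-9453-3bc7d9e00b9e</serial>
</disk>
```

Inside the guest, that serial then surfaces as /dev/disk/by-id/virtio-58a84f6d-3f2c-4acb-9453, regardless of which /dev/vdXXX name the kernel happened to assign.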
Not directly part of the libvirt driver, but Jim Fehlig enhanced the Nova VM scheduler so that it can take account of the hypervisor, architecture and VM mode (paravirt vs HVM) when choosing what host to boot an image on. This makes it much more practical to run mixed environments of, say, Xen and KVM, or Xen fullvirt vs Xen paravirt, or ARM vs x86, etc. When uploading an image to glance, the admin can tag it with properties specifying the desired hypervisor/architecture/vm_mode. The compute drivers then report what combinations they can support, and the scheduler computes the intersection to figure out which hosts are valid candidates for running the image.
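The matching logic amounts to a simple set intersection. A sketch of the idea (the field names below mirror the hypervisor/architecture/vm_mode triple described above but are illustrative, not Nova’s exact property keys):

```python
# Sketch of scheduler capability matching: an image requests one
# (hypervisor, architecture, vm_mode) triple; each host advertises a
# set of supported triples; candidates are hosts whose set contains it.

def supported_hosts(image_props, hosts):
    wanted = (image_props["hypervisor"],
              image_props["architecture"],
              image_props["vm_mode"])
    return [name for name, combos in hosts.items() if wanted in combos]


hosts = {
    "node1": {("kvm", "x86_64", "hvm")},
    "node2": {("xen", "x86_64", "xen"), ("xen", "x86_64", "hvm")},
}

image = {"hypervisor": "xen", "architecture": "x86_64", "vm_mode": "xen"}
print(supported_hosts(image, hosts))  # only node2 supports Xen paravirt
```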
Last year I migrated my website off Blogger to a WordPress installation hosted on my Debian server. Historically my website has only been exposed over plain old HTTP, which was fine since the Blogger publishing UI was running HTTPS. With the migration to WordPress though, the publishing UI now runs on my own webserver, and thus the lack of HTTPS on my server became a reasonably serious problem. Most people’s first approach to fixing this would be to just generate a self-signed certificate and deploy that for their server, but I rather wanted an x509 certificate that would be immediately trusted by any visiting browser.
Getting free x509 certificates from StartSSL
The problem is that the x509 certificate authority system is a bit of a protection racket, with recurring fees that just cannot be justified by the level of integrity the CAs provide. There is one exception to the norm though: StartSSL offer some basic x509 certificates at zero cost. In particular you can get Class1 web server certificates and personal client identity certificates. The web server certificates are restricted in that you can only include 2 domain names in them: your basic domain name & the same domain name with ‘www.’ prefixed. If you want wildcard domains, or multiple different domain names in a single certificate, you’ll have to go for their pay-for offerings. For many people, including myself, this limitation will not be a problem.
StartSSL have a nice self-service web UI for generating the various certificates. The first step is to generate a personal client identity certificate, which the rest of their administrative control panel relies on for authentication. After generation is complete, it automatically gets installed into Firefox’s certificate database. You are wisely reminded to export the certificate to a pkcs12 file and back it up somewhere securely. If you lose this personal client certificate, you will be unable to access their control panel for managing your web server certificates. The validation they do prior to issuing the client certificate is pretty minimal, but fully automated, in that they send a message to the email address you provide with a secret URL you need to click on. This “proves” that the email address is yours, so you can’t request certificates for someone else’s email address, unless you can hack their email account…
Generating certificates for web servers is not all that much more complicated. There are two ways to go about it though: either you can fill in their interactive web form & let their site generate the private key, or you can generate a private key offline and just provide them with a CSR (Certificate Signing Request). I tried to do the former first of all, but for some reason it didn’t work – it got stuck generating the private key – so I switched to generating a CSR instead. The validation they do prior to issuing a certificate for a web server is also automated. This time they do a whois lookup on the domain name you provide, and send a message with a secret URL to the admin, technical & owner email addresses in the whois record. This “proves” that the domain is yours, so you can’t request certificates for someone else’s domain name, unless you can hack their whois data or admin/tech/owner email accounts…
Setting up Apache to enable SSL
The next step is to configure Apache to enable SSL for the website as a whole. There are four files that need to be installed to provide the certificates to mod_ssl:
- ssl-cert-berrange.com.pem – this is the actual certificate StartSSL issued for my website, against StartSSL’s Class1 root certificate
- ssl-cert-berrange.com.key – this is the private key I generated and used with my CSR
- ssl-ca-start.com.pem – this is the master StartSSL CA certificate
- ssl-ca-chain-start.com-class1-server.pem – this is the chain of trust between your website’s certificate and StartSSL’s master CA certificate, via their Class1 root certificate
On my Debian Lenny host, they were installed to the following locations:
- /etc/ssl/certs/ssl-cert-berrange.com.pem
- /etc/ssl/private/ssl-cert-berrange.com.key
- /etc/ssl/certs/ssl-ca-chain-start.com-class1-server.pem
- /etc/ssl/certs/ssl-ca-start.com.pem
The only other bit I needed to do was to set up a new virtual host in the Apache config file, listening on port 443:
<VirtualHost *:443>
ServerName www.berrange.com
ServerAlias berrange.com
DocumentRoot /var/www/berrange.com
ErrorLog /var/log/apache2/berrange.com/error_log
CustomLog /var/log/apache2/berrange.com/access_log combined
SSLEngine on
SSLCertificateFile /etc/ssl/certs/ssl-cert-berrange.com.pem
SSLCertificateKeyFile /etc/ssl/private/ssl-cert-berrange.com.key
SSLCertificateChainFile /etc/ssl/certs/ssl-ca-chain-start.com-class1-server.pem
SSLCACertificateFile /etc/ssl/certs/ssl-ca-start.com.pem
</VirtualHost>
After restarting Apache, I am now able to connect to https://berrange.com/ and find that my browser trusts the site with no exceptions required.
Setting up Apache to require SSL client cert for WordPress admin pages
The next phase is to mandate use of a client certificate when accessing any of the WordPress administration pages. Should there be any future security flaws in the WordPress admin UI, this will block any would-be attackers, since they will not have the requisite client SSL certificate. Mandating use of client certificates is done with the “SSLVerifyClient require” directive in Apache. This allows the client to present any client certificate that is signed by the CA configured earlier – that is, potentially any user of StartSSL. My intention is to restrict access exclusively to the certificate that I was issued. This requires specifying some match rules against various fields in the certificate. First let’s see the Apache virtual host configuration additions:
<Location /wp-admin>
SSLVerifyClient require
SSLVerifyDepth 3
SSLRequire %{SSL_CLIENT_I_DN_C} eq "IL" and \
%{SSL_CLIENT_I_DN_O} eq "StartCom Ltd." and \
%{SSL_CLIENT_I_DN_OU} eq "Secure Digital Certificate Signing" and \
%{SSL_CLIENT_I_DN_CN} eq "StartCom Class 1 Primary Intermediate Client CA" and \
%{SSL_CLIENT_S_DN_CN} eq "dan@berrange.com" and \
%{SSL_CLIENT_S_DN_Email} eq "dan@berrange.com"
</Location>
The first 4 match rules here say that the client certificate must have been issued by the StartSSL Class1 client CA, while the last 2 say that the client certificate must contain my email address. The security thus relies on StartSSL not issuing anyone else a certificate using my email address. The whole lot appears inside a location match against ‘/wp-admin’, which is the URL prefix all the WordPress administration pages have. The entire block must also be duplicated using a location match against ‘/wp-login.php’ to protect the user login page too.
<Location /wp-login.php>
SSLVerifyClient require
SSLVerifyDepth 3
SSLRequire %{SSL_CLIENT_I_DN_C} eq "IL" and \
%{SSL_CLIENT_I_DN_O} eq "StartCom Ltd." and \
%{SSL_CLIENT_I_DN_OU} eq "Secure Digital Certificate Signing" and \
%{SSL_CLIENT_I_DN_CN} eq "StartCom Class 1 Primary Intermediate Client CA" and \
%{SSL_CLIENT_S_DN_CN} eq "dan@berrange.com" and \
%{SSL_CLIENT_S_DN_Email} eq "dan@berrange.com"
</Location>
Preventing access to the WordPress admin pages via non-HTTPS connections
Finally, to ensure the login & admin pages cannot be accessed over plain HTTP, it is necessary to alter the virtual host config for port 80, to include
RewriteEngine On
RewriteRule ^(/wp-admin/.*) https://www.berrange.com$1 [L,R=permanent]
RewriteRule ^(/wp-login.php.*) https://www.berrange.com$1 [L,R=permanent]
To be honest, I should just put a redirect on ‘/’ to prevent any use of the plain HTTP site at all, but I want to test how well my tiny virtual server copes with the load before enabling HTTPS for everything.
Hopefully this blog post has demonstrated that setting up your personal webserver with certificates that any browser will trust is both easy and cheap (free), so there is no reason to use self-signed certificates unless you need multiple domain names / wildcard addresses in your certificates and you’re unwilling to pay for them.
UPDATE: the setup described here is flawed because it only secures the primary SSH channel, i.e. if you use port redirection like ‘ssh -L 80:localhost:80 example.com’ then the shell session will require you to enter the YubiKey code, but the port redirect will be activated and usable before you enter the YubiKey code. I’d thus strongly recommend NOT FOLLOWING the instructions in this blog post; instead upgrade to OpenSSH >= 6.2, which has proper built-in support for multi-factor authentication, avoiding the need for this hack.
A month or two ago I purchased a couple of YubiKey USB tokens, one for authentication with Fedora infrastructure and the other for authentication of my personal servers. The reason I need two separate tokens is that Fedora uses its own YubiKey authentication server, thus requiring that you burn a new secret key into the token. For my personal servers I decided that I would simply authenticate against the central YubiKey authentication server hosted by YubiCo themselves. While some people might not be happy trusting a 3rd party service for authentication of their servers, I decided this was not a big problem since I intend to combine the YubiKey authentication with the existing strong SSH RSA public key authentication which is entirely under my control.
YubiKey authentication via PAM
To start off with I decided to follow a well documented configuration path, enabling YubiKey authentication for SSH via PAM. This was pretty straightforward and worked first time. The configuration steps were:
- Build and install the yubico-pam module. You might be lucky and find your distro already ships packages for this, but I was doing this on my Debian Lenny server which did not appear to have any pre-built PAM module.
- Create a file /etc/yubikey_mappings which contains a list of usernames and their associated YubiKey token IDs. The token ID is the first 12 characters of an OTP generated from a keypress of the token. Multiple token IDs can be listed for each user.
$ cat > /etc/yubikey_mappings <<EOF
fred:cccccatsdogs:ccccdogscats
EOF
- Get a unique API key and secret for personal use from https://upgrade.yubico.com/getapikey/
- Add the yubico-pam module to the SSHD PAM configuration, using the previously obtained API key ID in place of XXXX
$ cat /etc/pam.d/sshd
# PAM configuration for the Secure Shell service
# Read environment variables from /etc/environment and
# /etc/security/pam_env.conf.
auth required pam_env.so # [1]
# In Debian 4.0 (etch), locale-related environment variables were moved to
# /etc/default/locale, so read that as well.
auth required pam_env.so envfile=/etc/default/locale
auth sufficient pam_yubico.so id=XXXX authfile=/etc/yubikey_mappings
# Standard Un*x authentication.
@include common-auth
...snip...
This all worked fine, with one exception: if I had an authorized SSH public key then SSH would skip straight over the PAM “auth” phase. This is not what I wanted, since my intention was to require both a YubiKey and an SSH public key for login. The yubico-pam website has instructions for setting up two-factor authentication, but this only works if both your factors are configured via PAM. SSH public key authentication is completely outside the realm of PAM. AFAICT from a bit of googling, it is not possible to configure OpenSSH to require PAM and public key authentication together; it considers either one of them to be sufficient on its own.
After a little more googling though, I came across an interesting hack utilizing the ForceCommand configuration parameter of SSHD. The gist of the idea is that instead of configuring YubiKey authentication via PAM, you use the ForceCommand parameter to get SSHD to invoke a helper script which performs a YubiKey authentication check and only then executes the real command (i.e. the login shell).
I made a few modifications to Alexandre’s script mentioned in the blog post just linked:
- Use the same configuration file, /etc/yubikey_mappings, as used for a centralized yubico-pam setup
- Allow the verbose debugging information to be turned off
- Load the API key ID from /etc/yubikey_shell instead of requiring editing of the helper script itself
Usage of the script is quite simple:
- Create /etc/yubikey_shell containing
$ cat /etc/yubikey_shell
# Configuration for /sbin/yubikey_shell
# Replace XXXX with your 4 digit API key ID as obtained
# from https://upgrade.yubico.com/getapikey/
YUBICO_API_ID="XXXX"
# Change to 1 to enable debug logs for troubleshooting login
#DEBUG=1
# To override the standard key mapping location. This file
# should contain 1 or more lines like
#
# USERNAME:YUBI_KEY_ID:YUBI_KEY_ID:...
#
# This is the same syntax used for yubico-pam
#TRUSTED_KEYS_FILE=/etc/yubikey_mappings
- Create the /etc/yubikey_mappings file, if not already present from a previous yubico-pam setup
$ cat /etc/yubikey_mappings
fred:cccccatsdogs:ccccdogscats
- Append to the /etc/ssh/sshd_config file a directive to enable YubiKey auth for selected users
Match User fred
ForceCommand /sbin/yubikey_shell
- Save the wrapper script itself to /sbin/yubikey_shell
#!/bin/bash
DEBUG=0
TRUSTED_KEYS_FILE=/etc/yubikey_mappings
# This default works, but you really want to use your
# own ID for greater security
YUBICO_API_ID=16
test -f /etc/yubikey_shell && source /etc/yubikey_shell
STD="\\033[0;39m"
OK="\\033[1;32m[i]$STD"
ERR="\\033[1;31m[e]$STD"
##################################################
## Disconnect clients trying to exit the script ##
##################################################
trap disconnect INT
disconnect() {
sleep 1
kill -9 $PPID
exit 1
}
debug() {
if test "$DEBUG" = 1 ; then
echo -e "$@"
fi
}
if test -z "$USER"
then
debug "$ERR USER environment variable is not set" > /dev/stderr
disconnect
fi
####################################
## Get user-trusted yubikeys list ##
####################################
if [ ! -f $TRUSTED_KEYS_FILE ]
then
debug "$ERR Unable to find trusted keys list" > /dev/stderr
disconnect
fi
TRUSTED_KEYS=`grep "${USER}:" $TRUSTED_KEYS_FILE | sed -e "s/${USER}://" | sed -e 's/:/\n/g'`
for k in $TRUSTED_KEYS
do
debug "$OK Possible key '$k'"
done
#######################################
## Get the actual OTP ##
#######################################
echo -n "Please provide Yubi OTP: "
read -s OTP
echo
KEY_ID=${OTP:0:12}
#######################################
## Iterate through trusted keys list ##
#######################################
for trusted in ${TRUSTED_KEYS[@]}
do
if test "$KEY_ID" = "$trusted"
then
debug "$OK Found key in $TRUSTED_KEYS_FILE - validating OTP now ..."
if wget "https://api.yubico.com/wsapi/verify?id=$YUBICO_API_ID&otp=$OTP" -O - 2> /dev/null | grep "status=OK" > /dev/null
then
debug "$OK OTP validated"
if test -z "$SSH_ORIGINAL_COMMAND"
then
exec `grep "^$(whoami):" /etc/passwd | cut -d ":" -f 7`
else
exec "$SSH_ORIGINAL_COMMAND"
fi
debug "$ERR failed to execute shell / command" > /dev/stderr
disconnect
else
debug "$ERR Unable to validate generated OTP" > /dev/stderr
disconnect
fi
fi
done
debug "$ERR Key not trusted" > /dev/stderr
disconnect
To avoid the need to cut+paste, here are links to the full script and the configuration file.
After restarting the SSHD service, all was working nicely. Authentication now requires a combination of a valid SSH public key and a valid YubiKey token. Alternatively, if SSH public keys are not in use for a user, authentication will require the login password and a valid YubiKey token.
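The essence of the checks the wrapper script performs can be expressed in a few lines of Python: take the key ID from the first 12 characters of the OTP, match it against the trusted mappings file, then look for a status=OK line in the validation server’s reply. The HTTP round trip is omitted here and the server reply is mocked, so this is a model of the logic, not a drop-in replacement.

```python
# Model of the wrapper script's core logic; the HTTP request to
# api.yubico.com is mocked with a canned reply string below.

def otp_key_id(otp):
    """The YubiKey token ID is the first 12 characters of any OTP."""
    return otp[:12]


def parse_mappings(text, user):
    """Parse /etc/yubikey_mappings lines of the form user:id1:id2..."""
    for line in text.splitlines():
        fields = line.strip().split(":")
        if fields and fields[0] == user:
            return set(fields[1:])
    return set()


def verify_reply_ok(body):
    """The validation server replies with key=value lines; success is
    signalled by a status=OK line (mirroring the script's grep)."""
    return any(line.strip() == "status=OK" for line in body.splitlines())


mappings = "fred:cccccatsdogs:ccccdogscats\n"
otp = "cccccatsdogs" + "x" * 32        # an invented 44-character OTP
trusted = parse_mappings(mappings, "fred")
print(otp_key_id(otp) in trusted)      # True
print(verify_reply_ok("t=2012-11-19\nstatus=OK\n"))  # True
```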
I still feel a little dirty about having to use the ForceCommand hack though, because it means YubiKey auth failures don’t appear in your audit logs – as far as SSHD is concerned everything was successful. It would be nice to figure out how to make OpenSSH properly combine SSH public key and PAM authentication…