Securing the WordPress admin interface using (Free!) SSL certificates

Posted: December 19th, 2011 | Filed under: Fedora | Tags: apache, authentication, debian, login, ssl, startssl, wordpress | 1 Comment »

Last year I migrated my website off Blogger to a WordPress installation hosted on my Debian server. Historically my website has only been exposed over plain old HTTP, which was fine since the Blogger publishing UI was running HTTPS. With the migration to WordPress install though, the publishing UI is now running on my own webserver and thus the lack of HTTPS on my server becomes a reasonably serious problem. Most people’s first approach to fixing this would be to just generate a self-signed certificate and deploy that for their server, but I rather wanted to have a x509 certificate that would be immediately trusted by any visiting browser.

Getting free x509 certificates from StartSSL

The problem is that the x509 certificate authority system is a bit of a protection racket with recurring fees that just cannot justify the level of integrity they provide. There is one exception to the norm though, StartSSL offer some basic x509 certificates at zero cost. In particular you can get Class1 web server certificates and personal client identity certificates. The web server certificates are restricted in that you can only include 2 domain names in them, your basic domain name & the same domain name with ‘www.’ prefixed. If you want wildcard domains, or multiple different domain names in a single certificate you’ll have to go for their pay-for offerings. For many people, including myself, this limitation will not be a problem.

StartSSL have a nice self-service web UI for generating the various certificates. The first step is to generate a personal client identity certificate, which the rest of their administrative control panel relies on for authentication. After generation is complete, it automatically gets installed into firefox’s certificate database. You are wisely reminded to export the database to a pkcs12 file and back it up somewhere securely. If you loose this personal client certificate, you will be unable to access their control panel for managing your web server certificates. The validation they do prior to issuing the client certificate is pretty minimal, but fully automated, in so much as they send a message to the email address you provide with a secret URL you need to click on. This “proves” that the email address is yours, so you can’t request certificates for someone else’s email address, unless you can hack their email accounts…

Generating certificates for web servers is not all that much more complicated. There are two ways to go about it though, either you can fill in their interactive web form & let their site generate the private key, or you can generate a private key offline and just provide them with a CSR (Certificate Signing Request). I tried todo the former first of all, but for some reason it didn’t work – it got stuck generating the private key, so I switched to generating a CSR instead. The validation they do prior to issuing a certificate for a web server is also automated. This time they do a whois lookup on the domain name you provide, and send a message with a secret URL to the admin, technical & owner email addresses in the whois record. This “proves” that the domain is yours, so you can’t requests certificates for someone else’s domain name, unless you can hack their whois data or admin/tech/owner email accounts…

Setting up Apache to enable SSL

The next step is to configure apache to enable SSL for the website as a whole. There are four files that need to be installed to provide the certificates to mod_ssl

ssl-cert-berrange.com.pem – this is the actual certificate StartSSL issued for my website, against StartSSL’s Class1 root certificate
ssl-cert-berrange.com.key – this is the private key I generated and used with my CSR
ssl-ca-start.com.pem – this is the master StartSSL CA certificate
ssl-ca-chain-start.com-class1-server.pem – this is the chain of trust between your website’s certificate and StartSSL’s master CA certificate, via their Class1 root certificate

On my Debian Lenny host, they were installed to the following locations

/etc/ssl/certs/ssl-cert-berrange.com.pem
/etc/ssl/private/ssl-cert-berrange.com.key
/etc/ssl/certs/ssl-ca-chain-start.com-class1-server.pem
/etc/ssl/certs/ssl-ca-start.com.pem

The only other bit I needed todo was to setup a new virtual host in the apache config file, listening on port 443

<VirtualHost *:443>
  ServerName www.berrange.com
  ServerAlias berrange.com

  DocumentRoot /var/www/berrange.com
  ErrorLog /var/log/apache2/berrange.com/error_log
  CustomLog /var/log/apache2/berrange.com/access_log combined

  SSLEngine on

  SSLCertificateFile    /etc/ssl/certs/ssl-cert-berrange.com.pem
  SSLCertificateKeyFile /etc/ssl/private/ssl-cert-berrange.com.key
  SSLCertificateChainFile /etc/ssl/certs/ssl-ca-chain-start.com-class1-server.pem
  SSLCACertificateFile /etc/ssl/certs/ssl-ca-start.com.pem
</VirtualHost>

After restarting Apache, I am now able to connect to https://berrange.com/ and that my browser trusts the site with no exceptions required.

Setting up Apache to require SSL client cert for WordPress admin pages

The next phase is to mandate use of a client certificate when accessing any of the WordPress administration pages. Should there be any future security flaws in the WordPress admin UI, this will block any would be attackers since they will not have the requisite client SSL certificate. Mnadating use of client certificates is done with the “SSLVerifyClient require” directive in Apache. This allows the client to present any client certificate that is signed by the CA configured earlier – that is potentially any user of StartSSL. My intention is to restrict access exclusively to the certificate that I was issued. This requires specification of some match rules against various fields in the certificate. First lets see the Apache virtual host configuration additions:

<Location /wp-admin>
  SSLVerifyClient require
  SSLVerifyDepth  3
  SSLRequire %{SSL_CLIENT_I_DN_C} eq "IL" and \
             %{SSL_CLIENT_I_DN_O} eq "StartCom Ltd." and \
             %{SSL_CLIENT_I_DN_OU} eq "Secure Digital Certificate Signing" and \
             %{SSL_CLIENT_I_DN_CN} eq "StartCom Class 1 Primary Intermediate Client CA" and \
             %{SSL_CLIENT_S_DN_CN} eq "dan@berrange.com" and \
             %{SSL_CLIENT_S_DN_Email} eq "dan@berrange.com"
</Location>

The first 4 match rules here are saying that the client certificate must have been issued by the StartSSL Class1 client CA, while the last 2 matches are saying that the client certificate must contain my email address. The security thus relies on StartSSL not issuing anyone else a certificate using my email address. The whole lot appears inside a location match against ‘/wp-admin’ which is the URL prefix all the WordPress administration pages have. The entire block must also be duplicated using a location match against ‘/wp-login.php’ to protect the user login page too.

<Location /wp-login.php>
  SSLVerifyClient require
  SSLVerifyDepth  3
  SSLRequire %{SSL_CLIENT_I_DN_C} eq "IL" and \
             %{SSL_CLIENT_I_DN_O} eq "StartCom Ltd." and \
             %{SSL_CLIENT_I_DN_OU} eq "Secure Digital Certificate Signing" and \
             %{SSL_CLIENT_I_DN_CN} eq "StartCom Class 1 Primary Intermediate Client CA" and \
             %{SSL_CLIENT_S_DN_CN} eq "dan@berrange.com" and \
             %{SSL_CLIENT_S_DN_Email} eq "dan@berrange.com"
</Location>

Preventing access to the WordPress admin pages via non-HTTPS connections.

Finally, to ensure the login & admin pages cannot be accessed over plain HTTP, it is necessary to alter the virtual host config for port 80, to include

RewriteEngine On
RewriteRule ^(/wp-admin/.*) https://www.berrange.com$1 [L,R=permanent]
RewriteRule ^(/wp-login.php.*) https://www.berrange.com$1 [L,R=permanent]

To be honest, I should just put a redirect on ‘/’ to prevent any use of the plain HTTP site at all, but I want to test how well my tiny virtual server copes with the load before enabling HTTPs for everything.

Hopefully this blog post has demonstrated that setting up your personal webserver with certificates that any browser will trust, is both easy and cheap (free), so there is no reason to use self-signed certificates unless you need multiple domain names / wildcard addresses in your certificates and you’re unwilling to pay money for them.

Multi-factor SSH authentication using YubiKey and SSH public keys together

Posted: December 18th, 2011 | Filed under: Fedora | Tags: authentication, public key, ssh, two-factor, yubikey | 11 Comments »

UPDATE: the setup described here is flawed because it only correctly secures the primary SSH channel. ie if you use port redirection like ‘ssh -L 80:localhost:80 example.com’ then the shell session will require you to enter the yubikey code, but the port redirect will be activated and usable prior to you entering the yubikey. I’d thus strongly recommend NOT FOLLOWING the instructions in this blog post, and instead upgrade to OpenSSH >= 6.2 which has proper built-in support for multi-factor authentication, avoiding the need for this hack.

A month or two ago I purchased a couple of YubiKey USB tokens, one for authentication with Fedora infrastructure and the other for authentication of my personal servers. The reason I need two separate tokens is that Fedora uses its own YubiKey authentication server, thus requiring that you burn a new secret key into the token. For my personal servers I decided that I would simply authenticate against the central YubiKey authentication server hosted by YubiCo themselves. While some people might not be happy trusting a 3rd party service for authentication of their servers, I decided this was not a big problem since I intend to combine the YubiKey authentication with the existing strong SSH RSA public key authentication which is entirely under my control.

YubiKey authentication via PAM

To start off with I decided to follow a well documented configuration path, enabling YubiKey authentication for SSH via PAM. This was pretty straightforward and worked first time. The configuration steps were

Build and install the yubico-pam module. You might be lucky and find your distro already ships packages for this, but I was doing this on my Debian Lenny server which did not appear to have any pre-built PAM module.
Create a file /etc/yubikey_mappings which contains a list of usernames and their associated yubikey token IDs. The Token ID is the first 12 characters of a OTP generated from a keypress of the token. Multiple token IDs can be listed for each user.
```
$ cat > /etc/yubikey_mappings <<EOF
fred:cccccatsdogs:ccccdogscats
EOF
```
Get a unique API key and secret for personal use from https://upgrade.yubico.com/getapikey/

Add the yubico-pam module to the SSHD PAM configuration module using the previously obtained API key ID in place of XXXX

$ cat /etc/pam.d/sshd
# PAM configuration for the Secure Shell service

# Read environment variables from /etc/environment and
# /etc/security/pam_env.conf.
auth       required     pam_env.so # [1]
# In Debian 4.0 (etch), locale-related environment variables were moved to
# /etc/default/locale, so read that as well.
auth       required     pam_env.so envfile=/etc/default/locale

auth sufficient pam_yubico.so id=XXXX authfile=/etc/yubikey_mappings

# Standard Un*x authentication.
@include common-auth

...snip...

This all worked fine, with one exception, if I had an authorized SSH public key then SSH would skip straight over the PAM “auth” phase. This is not what I wanted, since my intention was to use YubiKey and SSH public keys for login. The yubico-pam website has instructions for setting up two-factor authentication but this only works if both your factors are configured via PAM. SSH public key authentication is completely outside the realm of PAM. AFAICT from a bit of googling, it is not possible to configure OpenSSH to require PAM and public key authentication together; it considers either one of them to be sufficient on their own.

After a little more googling though, I came across an interesting hack utilizing the ForceCommand configuration parameter of SSHD. The gist of the idea is that instead of configuring YubiKey authentication via PAM, you use the ForceCommand parameter to get SSHD to invoke a helper script which performs a YubiKey authentication check and only then executes the real command (ie login shell).

I made a few modifications to Alexandre’s script mentioned in the blog post just linked

Use the same configuration file, /etc/yubimap_mappings, as used for centralized yubico-pam setup
Allow the verbose debugging information to be turned off
Load the API key ID from /etc/yubikey_shell instead of requiring editing of the helper script itself

Usage of the script is quite simple

Create /etc/yubikey_shell containing

$ cat /etc/yubikey_shell
# Configuration for /sbin/yubikey_shell

# Replace XXXX with your 4 digit API key ID as obtained
# from https://upgrade.yubico.com/getapikey/
YUBICO_API_ID="XXXX"

# Change to 1 to enable debug logs for troubleshooting login
#DEBUG=1

# To override stanard key mapping location. This file
# should contain 1 or more lines like
#
#    USERNAME:YUBI_KEY_ID:YUBI_KEY_ID:...
#
# This is the same syntax used for yubico-pam
#TRUSTED_KEYS_FILE=/etc/yubikey_mappings

Create the /etc/yubikey_mappings file, if not already present from a previous yubico-pam setup
```
$ cat /etc/yubikey_mappings
fred:cccccatsdogs:ccccdogscats
```
Append to the /etc/ssh/sshd_config file a directive to enable YubiKey auth for selected users
```
Match User fred
  ForceCommand /sbin/yubikey_shell
```

Save the wrapper script itself to /sbin/yubikey_shell

x
DEBUG=0
TRUSTED_KEYS_FILE=/etc/yubikey_mappings
# This default works, but you really want to use your
# own ID for greater security
YUBICO_API_ID=16

test -f /etc/yubikey_shell && source /etc/yubikey_shell

STD="\\033[0;39m"
OK="\\033[1;32m[i]$STD"
ERR="\\033[1;31m[e]$STD"

##################################################
## Disconnect clients trying to exit the script ##
##################################################
trap disconnect INT

disconnect() {
  sleep 1
  kill -9 $PPID
  exit 1
}

debug() {
  if test "$DEBUG" = 1 ; then
    echo -e "$@"
  fi
}

if test -z "$USER"
then
  debug "$ERR USER environment variable is not set" > /dev/stderr
  disconnect
fi  
####################################
## Get user-trusted yubikeys list ##
####################################
if [ ! -f $TRUSTED_KEYS_FILE ]
then
  debug "$ERR Unable to find trusted keys list" > /dev/stderr
  disconnect
fi

TRUSTED_KEYS=`grep "${USER}:" $TRUSTED_KEYS_FILE | sed -e "s/${USER}://" | sed -e 's/:/\n/g'`
for k in $TRUSTED_KEYS
do
  debug "$OK Possible key '$k'"
done

#######################################
## Get the actual OTP                ##
#######################################

echo -n "Please provide Yubi OTP: "
read -s OTP
echo
KEY_ID=${OTP:0:12}
#######################################
## Iterate through trusted keys list ##
#######################################
for trusted in ${TRUSTED_KEYS[@]}
do
  if test "$KEY_ID" = "$trusted"
  then
    debug "$OK Found key in $TRUSTED_KEYS_FILE - validating OTP now ..."
    if wget "https://api.yubico.com/wsapi/verify?id=$YUBICO_API_ID&otp=$OTP" -O - 2> /dev/null | grep "status=OK" > /dev/null
    then
      debug "$OK OTP validated"
      if test -z "$SSH_ORIGINAL_COMMAND"
      then
        exec `grep "^$(whoami)" /etc/passwd | cut -d ":" -f 7`
      else
        exec "$SSH_ORIGINAL_COMMAND"
      fi
      debug "$ERR failed to execute shell / command" > /dev/stderr
      disconnect
    else
      debug "$ERR Unable to validate generated OTP" > /dev/stderr
      disconnect
    fi
  fi
done
debug "$ERR Key not trusted" > /dev/stderr
disconnect

The avoid the need to cut+paste, here are links to the full script and the configuration file.

After restarting the SSHD service, all was working nicely. Authentication now requires a combination of a valid SSH public key and a valid YubiKey token. Alternatively, if SSH public keys are not in use for a user, authentication will require the login password and a valid YubiKey token.

I still feel a little dirty about having to use the ForceCommand hack though, because it means yubikey auth failures don’t appear in your audit logs – as far as SSHD is concerned everything was successful. It would nice to be able to figure out how to make OpenSSH properly combine SSH public key and PAM for authentication…

Watching the libvirt RPC protocol using SystemTAP

Posted: November 30th, 2011 | Filed under: Fedora, libvirt, Virt Tools | Tags: debugging, dtrace, libvirt, markers, probes, rpc, systemtap | 1 Comment »

A couple of releases back I completely re-structured all the RPC handling code inside libvirt to make sure it could be properly shared between the client and server, as well as decoupling the RPC handling code from the implementation of the RPC functions. As part of this work I introduced a fairly comprehensive set of DTrace static probe points into the libvirt RPC code. While one could write a WireShark plugin that is able to decode the libvirt RPC protocol (oh look Michal already has written one), that would not be able to examine encrypted libvirt connections – which is pretty much all of them. By using static probes in the libvirt RPC code we can see the RPC messages being sent and received before/after encryption has been applied.

The observant will notice that I said I inserted DTrace static probes, while this blog subject line says SystemTAP. Well the SystemTAP developers had the good sense to make their userspace probing infrastructure support the DTrace static probe marker syntax. So inserting DTrace static probes into userspace code, trivially enables support for both DTrace and SystemTAP. I previously added DTrace probe support to QEMU/KVM and was very happy when Bryan Cantrill told me (at the recent KVM forum) that the DTrace probe support I added to KVM only needed minor build system tweaks to work on Solaris, despite my only ever having tested with Linux + SystemTAP.

Along with adding the DTrace markers to the libvirt RPC code, I also created two SystemTAP tapset files to make it simpler to use the probes from SystemTAP scripts. The first, /usr/share/systemtap/tapset/libvirt_probe.stp, contains the actual probe points, grouped by functional area, while the second, /usr/share/systemtap/tapset/libvirt_functions.stp, contains a bunch of helper functions for converting enum values into human friendly strings. The idea is that instead of seeing “Procedure 53”, a sysadmin would much rather see “Procedure domain_dump_core”. I won’t go into detail about what is in those two files here, instead I’ll just illustrate their use

Tracing the RPC client

Lets says we first want to see what messages the client is sending and receiving. There are two interesting probes here, “libvirt.rpc.server_client_msg_rx” and “libvirt.rpc.server_client_msg_tx_queue“. The former is triggered when a complete RPC message has been read off the wire, while the latter is triggered when an RPC message is queued for transmission. Ideally we would also have another probe triggered when an RPC message has been completely transmitted – that’s a future todo item. Simple usage of these two probes would be

# cat > demo.stp <<EOF
probe libvirt.rpc.client_msg_rx {
  printf("client=%p len=%d program=%d version=%d procedure=%d type=%d status=%d serial=%d\n",
         client, len, prog, vers, proc, type, status, serial);
}
probe libvirt.rpc.client_msg_tx_queue {
  printf("client=%p len=%d program=%s version=%d procedure=%s type=%s status=%d serial=%d\n",
         client, len, prog, vers, proc, type, status, serial);
}
EOF
# stap demo.stp
client=0x7f827c3b1010 len=28 program=536903814 version=1 procedure=66 type=0 status=0 serial=0
client=0x7f827c3b1010 len=36 program=536903814 version=1 procedure=66 type=1 status=0 serial=0
client=0x7f827c3b1010 len=40 program=536903814 version=1 procedure=1 type=0 status=0 serial=1
client=0x7f827c3b1010 len=28 program=536903814 version=1 procedure=1 type=1 status=0 serial=1
client=0x7f827c3b1010 len=28 program=536903814 version=1 procedure=110 type=0 status=0 serial=2
client=0x7f827c3b1010 len=48 program=536903814 version=1 procedure=110 type=1 status=0 serial=2
client=0x7f827c3b1010 len=28 program=536903814 version=1 procedure=2 type=0 status=0 serial=3
client=0x7f827c3b1010 len=28 program=536903814 version=1 procedure=2 type=1 status=0 serial=3

The example shows the results of running “virsh domname vm1”. There are 4 RPC calls made here, 66 (authenticate), 1 (open), 110 (get uri), 2 (close).

Tracing the client with friendly output

Unless you have memorized libvirt RPC enums, this isn’t a very friendly way to trace the code. This is where the aforementioned libvirt_functions.stp tapset comes into play.

# cat > demo.stp <<EOF
probe libvirt.rpc.client_msg_rx {
  printf("R client=%p len=%d program=%s version=%d procedure=%s type=%s status=%s serial=%d\n",
         client, len,
         libvirt_rpc_program_name(prog, 0),
         vers,
         libvirt_rpc_procedure_name(prog, vers, proc, 0),
         libvirt_rpc_type_name(type, 0),
         libvirt_rpc_status_name(status, 0),
         serial);
}
probe libvirt.rpc.client_msg_tx_queue {
  printf("T client=%p len=%d program=%s version=%d procedure=%s type=%s status=%s serial=%d\n",
         client, len,
         libvirt_rpc_program_name(prog, 0),
         vers,
         libvirt_rpc_procedure_name(prog, vers, proc, 0),
         libvirt_rpc_type_name(type, 0),
         libvirt_rpc_status_name(status, 0),
         serial);
}
EOF
# stap demo.stp
T client=0x7f3e3dec0010 len=28 program=remote version=1 procedure=auth_list type=call status=ok serial=0
R client=0x7f3e3dec0010 len=36 program=remote version=1 procedure=auth_list type=reply status=ok serial=0
T client=0x7f3e3dec0010 len=40 program=remote version=1 procedure=open type=call status=ok serial=1
R client=0x7f3e3dec0010 len=28 program=remote version=1 procedure=open type=reply status=ok serial=1
T client=0x7f3e3dec0010 len=28 program=remote version=1 procedure=get_uri type=call status=ok serial=2
R client=0x7f3e3dec0010 len=48 program=remote version=1 procedure=get_uri type=reply status=ok serial=2
T client=0x7f3e3dec0010 len=28 program=remote version=1 procedure=close type=call status=ok serial=3
R client=0x7f3e3dec0010 len=28 program=remote version=1 procedure=close type=reply status=ok serial=3

Much more friendly !

Tracing the server at the same time

It might desirable to see when the server itself receives the message, independently of when the client transmitted it. There are an identical set of probes available in the server, just replace ‘client’ with ‘server_client’ in the above examples. Thus the demo script can trivially be extended to show server messages at the same time:

# cat >> demo.stp << EOF
probe libvirt.rpc.server_client_msg_rx {
  printf("R server=%p len=%d program=%s version=%d procedure=%s type=%s status=%s serial=%d\n",
         client, len,
         libvirt_rpc_program_name(prog, 0),
         vers,
         libvirt_rpc_procedure_name(prog, vers, proc, 0),
         libvirt_rpc_type_name(type, 0),
         libvirt_rpc_status_name(status, 0),
         serial);
}
probe libvirt.rpc.server_client_msg_tx_queue {
  printf("T server=%p len=%d program=%s version=%d procedure=%s type=%s status=%s serial=%d\n",
         client, len,
         libvirt_rpc_program_name(prog, 0),
         vers,
         libvirt_rpc_procedure_name(prog, vers, proc, 0),
         libvirt_rpc_type_name(type, 0),
         libvirt_rpc_status_name(status, 0),
         serial);
}
# stap demo.stp
T client=0x7ff3c4855010 len=28 program=remote version=1 procedure=auth_list type=call status=ok serial=0
R server=0x17a2070 len=28 program=remote version=1 procedure=auth_list type=call status=ok serial=0
T server=0x17a2070 len=36 program=remote version=1 procedure=auth_list type=reply status=ok serial=0
R client=0x7ff3c4855010 len=36 program=remote version=1 procedure=auth_list type=reply status=ok serial=0
T client=0x7ff3c4855010 len=40 program=remote version=1 procedure=open type=call status=ok serial=1
R server=0x17a2070 len=40 program=remote version=1 procedure=open type=call status=ok serial=1
T server=0x17a2070 len=28 program=remote version=1 procedure=open type=reply status=ok serial=1
R client=0x7ff3c4855010 len=28 program=remote version=1 procedure=open type=reply status=ok serial=1
T client=0x7ff3c4855010 len=28 program=remote version=1 procedure=get_uri type=call status=ok serial=2
R server=0x17a2070 len=28 program=remote version=1 procedure=get_uri type=call status=ok serial=2
T server=0x17a2070 len=48 program=remote version=1 procedure=get_uri type=reply status=ok serial=2
R client=0x7ff3c4855010 len=48 program=remote version=1 procedure=get_uri type=reply status=ok serial=2
T client=0x7ff3c4855010 len=28 program=remote version=1 procedure=close type=call status=ok serial=3
R server=0x17a2070 len=28 program=remote version=1 procedure=close type=call status=ok serial=3
T server=0x17a2070 len=28 program=remote version=1 procedure=close type=reply status=ok serial=3
R client=0x7ff3c4855010 len=28 program=remote version=1 procedure=close type=reply status=ok serial=3

If the server is running on a different host than the client, just copy the demo.stp script to the other host and run a second copy there.

Further extensions

There are many further improvements that can be made to this script

Display a timestamp on each message
Associate each server side message with an individual socket
Display payload length
Display a message when the script is actually ready to run

To simplify life, we are maintaining a nice feature demonstration of the RPC SystemTAP probes in the libvirt GIT repository in the
examples/systemtap/rpc-monitor.stp file.

Here is what it can print out

  0.000 begin
  2.632 C + 0x7f1ea57dc010   local=127.0.0.1;0 remote=127.0.0.1;0
  2.632 C > 0x7f1ea57dc010   msg=remote.1.auth_list(call, ok, 0) len=28
  2.632 + S 0x1c1f710        local=127.0.0.1;0 remote=127.0.0.1;0
  2.632 > S 0x1c1f710        msg=remote.1.auth_list(call, ok, 0) len=28
  2.633 < S 0x1c1f710        msg=remote.1.auth_list(reply, ok, 0) len=36
  2.633 C < 0x7f1ea57dc010   msg=remote.1.auth_list(reply, ok, 0) len=36   2.633 C > 0x7f1ea57dc010   msg=remote.1.open(call, ok, 1) len=40
  2.633 > S 0x1c1f710        msg=remote.1.open(call, ok, 1) len=40
  2.639 < S 0x1c1f710        msg=remote.1.open(reply, ok, 1) len=28
  2.639 C < 0x7f1ea57dc010   msg=remote.1.open(reply, ok, 1) len=28   2.639 C > 0x7f1ea57dc010   msg=remote.1.get_uri(call, ok, 2) len=28
  2.639 > S 0x1c1f710        msg=remote.1.get_uri(call, ok, 2) len=28
  2.639 < S 0x1c1f710        msg=remote.1.get_uri(reply, ok, 2) len=48
  2.640 C < 0x7f1ea57dc010   msg=remote.1.get_uri(reply, ok, 2) len=48   2.640 C > 0x7f1ea57dc010   msg=remote.1.domain_lookup_by_id(call, ok, 3) len=32
  2.640 > S 0x1c1f710        msg=remote.1.domain_lookup_by_id(call, ok, 3) len=32
  2.640 < S 0x1c1f710        msg=remote.1.domain_lookup_by_id(reply, error, 3) len=180
  2.641 C < 0x7f1ea57dc010   msg=remote.1.domain_lookup_by_id(reply, error, 3) len=180   2.641 C > 0x7f1ea57dc010   msg=remote.1.close(call, ok, 4) len=28
  2.641 > S 0x1c1f710        msg=remote.1.close(call, ok, 4) len=28
  2.641 < S 0x1c1f710        msg=remote.1.close(reply, ok, 4) len=28
  2.641 C < 0x7f1ea57dc010   msg=remote.1.close(reply, ok, 4) len=28
  2.641 C - 0x7f1ea57dc010   local= remote=
  2.641 - S 0x1c1f710        local=127.0.0.1;0 remote=127.0.0.1;0

Tracing other areas of libvirt code

The RPC code is not the only place with SystemTAP/DTrace probe markers in libvirt. We have also instrumented our main event loop and provide an examples/systemtap/events.stp demo that prints out info like this

  0.000 begin
  2.359 18185 + handle 1 4 1
  2.360 18185 + handle 2 6 1
  2.360 18185 * handle 2 0
  2.360 14370 > handle 3 1
  2.360 14370 + handle 33 16 1
  2.361 14370 ~ 7 -1
  2.361 14370 > handle 33 1
  2.361 14370 * handle 33 1
  2.361 14370 * handle 33 1
  2.361 14370 * handle 33 3
  2.361 14370 ~ 7 -1
  2.361 14370 > handle 1 1
  2.361 14370 ~ 7 -1
  2.361 14370 > handle 33 2
  2.361 14370 * handle 33 1
  2.361 14370 ~ 7 -1
  2.361 18185 * handle 2 1
  2.362 18185 * handle 2 0

And finally we have instrumented our code which talks to the QEMU monitor, again providing a demo examples/systemtap/qemu-monitor.stp which prints out info like this

  0.000 begin
  3.848 ! 0x7f2dc00017b0 {"timestamp": {"seconds": 1319466931, "microseconds": 187755}, "event": "SHUTDOWN"}
  5.773 > 0x7f2dc0007960 {"execute":"qmp_capabilities","id":"libvirt-1"}
  5.774 < 0x7f2dc0007960 {"return": {}, "id": "libvirt-1"}   5.774 > 0x7f2dc0007960 {"execute":"query-commands","id":"libvirt-2"}
  5.777 < 0x7f2dc0007960 {"return": [{"name": "quit"}, {"name": ....snip....   5.777 > 0x7f2dc0007960 {"execute":"query-chardev","id":"libvirt-3"}
  5.778 < 0x7f2dc0007960 {"return": [{"filename": ....snip....   5.779 > 0x7f2dc0007960 {"execute":"query-cpus","id":"libvirt-4"}
  5.780 < 0x7f2dc0007960 {"return": [{"current": true, "CPU": 0, "pc": 1048560, "halted": false, "thread_id": 13299}], "id": "libvirt-4"}   5.780 > 0x7f2dc0007960 {"execute":"set_password","arguments":{"protocol":"vnc","password":"123456","connected":"keep"},"id":"libvirt-5"}
  5.782 < 0x7f2dc0007960 {"return": {}, "id": "libvirt-5"}   5.782 > 0x7f2dc0007960 {"execute":"expire_password","arguments":{"protocol":"vnc","time":"never"},"id":"libvirt-6"}
  5.783 < 0x7f2dc0007960 {"return": {}, "id": "libvirt-6"}   5.783 > 0x7f2dc0007960 {"execute":"balloon","arguments":{"value":224395264},"id":"libvirt-7"}
  5.785 < 0x7f2dc0007960 {"return": {}, "id": "libvirt-7"}   5.785 > 0x7f2dc0007960 {"execute":"cont","id":"libvirt-8"}
  5.789 ! 0x7f2dc0007960 {"timestamp": {"seconds": 1319466933, "microseconds": 129980}, "event": "RESUME"}
  5.789 < 0x7f2dc0007960 {"return": {}, "id": "libvirt-8"}
  7.537 ! 0x7f2dc0007960 {"timestamp": {"seconds": 1319466934, "microseconds": 881214}, "event": "SHUTDOWN"}

Conclusion

The introduction of static probes into the libvirt code has been enormously helpful in understanding the operation of libvirt. While we have comprehensive debug logging present in libvirt is it hard to tailor the output to show the precise data desired. Traditional debuggers like GDB are not very practical when trying to understand the live operation of a heavily multi-threaded system crossing multiple processes, and while strace is useful in some scenarios it is too low level to be useful in most scenarios. SystemTAP userspace probing provides the kind of debugging experience / tool that really suits understanding the complex interactions in a system like libvirt. It is no co-incidence that the first set of probes we have written have focused on the libvirt event loop, RPC code and QEMU monitor – three of the areas in libvirt which are both very critical operationally, and exceptionally hard to debug with traditional approaches. We will certainly be expanding our use of static probe markers in systemtap in the future. My real immediate wishlist is for systemtap to get better at providing userspace stack traces, since it fails to provide a useful trace far too often, as compared to GDB.

Update: Mark Wielaard showed me what I had todo to get nice stack traces from SystemTAP. Apparently it is not getting enough memory space to deal with stack traces with its default settings. Telling it to use a little more memory makes it work nicely:

# cat > demo.stp <<EOF
probe libvirt.rpc.client_msg_rx {
  printf("client=%p len=%d program=%d version=%d procedure=%d type=%d status=%d serial=%d\n",
         client, len, prog, vers, proc, type, status, serial);
  print_ustack(ubacktrace())
}
# stap -DTASK_FINDER_VMA_ENTRY_ITEMS=7680 demo.stp
client=0x7f775cf62010 len=36 program=536903814 version=1 procedure=66 type=1 status=0 serial=0
 0x3c57f0b3dd : virNetClientIOHandleInput+0x87d/0x890 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57f0b9a0 : virNetClientIOEventLoop+0x5b0/0x630 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57f0cb23 : virNetClientSend+0x2b3/0x590 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57f0d47c : virNetClientProgramCall+0x26c/0x8a0 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57ef091e : callWithFD+0xce/0x120 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57ef099c : call+0x2c/0x40 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57efee80 : doRemoteOpen+0x890/0x20f0 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57f0341b : remoteOpen+0x9b/0x290 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57ec2133 : do_open+0x1f3/0x1100 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57ec4616 : virConnectOpenAuth+0x76/0xb0 [/usr/lib64/libvirt.so.0.9.7]
 0x40ceb1 [/usr/bin/virsh+0xceb1/0x40000]
client=0x7f775cf62010 len=28 program=536903814 version=1 procedure=1 type=1 status=0 serial=1
 0x3c57f0b3dd : virNetClientIOHandleInput+0x87d/0x890 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57f0b9a0 : virNetClientIOEventLoop+0x5b0/0x630 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57f0cb23 : virNetClientSend+0x2b3/0x590 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57f0d47c : virNetClientProgramCall+0x26c/0x8a0 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57ef091e : callWithFD+0xce/0x120 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57ef099c : call+0x2c/0x40 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57eff57a : doRemoteOpen+0xf8a/0x20f0 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57f0341b : remoteOpen+0x9b/0x290 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57ec2133 : do_open+0x1f3/0x1100 [/usr/lib64/libvirt.so.0.9.7]
 0x3c57ec4616 : virConnectOpenAuth+0x76/0xb0 [/usr/lib64/libvirt.so.0.9.7]
 0x40ceb1 [/usr/bin/virsh+0xceb1/0x40000]
....

This makes me very happy :-)

Announce: Release of Entangle v0.3.0 – An app for tethered camera control & capture

Posted: November 29th, 2011 | Filed under: Entangle, Fedora, Photography | 2 Comments »

Over a year has gone by since I released Entangle 0.2.0, so I am very pleased to be able to announce that I have now just released Entangle 0.3.0, available from the usual download place which also contains the concise list of changes.

There have in fact been a great many changes in this release, but many of them will not be immediately obvious from looking at the updated screenshots.

First of all, has been a big effort to port to the latest best practice desktop application libraries

Port Gtk2 to GTK3. To enable use of many of the new desktop features, in particular, libpeas, Entangle now targets Gtk3 instead of Gtk2
Port LibGlade to GtkBuilder. With the use of Gtk3 there is no longer any point in using the external Glade library for UI building. Instead Entangle now uses the GtkBuilder infrastructure that is part of regular Gtk
Port GConf to GSettings. With the use of Gtk3, a newer Glib2 is required, which in turn brings in the GSettings APis. With these present, there is no longer any point in using the external GConf library
Port Unique/StartupNotifications to GApplication. Again, since a newer Glib2 is required, it is possible to take advantage of the GApplication APIs, to avoid using the external Unique/StartupNotifcation libraries.

Next up was a major internal rewrite of the way the UI handles camera operations. In previous releases, most operations were handed off to a camera scheduler thread. The design of this was overly complicated and not friendly to future extension. Having recently gained experience with the way asynchronous operations are done by the GIO library, I decided that this would be an effective approach for Entangle. So all the internal thread scheduling code was ripped out, and the GIO style asynchronous APIs were added in its place. Doing this work was a major blocking item in why it has taken so long to release 0.3.0. Now it is all complete, I am very pleased with the way it has turned out. The code for dealing with the camera is so much simpler & more flexible at the same time.

With that out of the way, there are the general user visible feature improvements

Config refresh. For Nikon cameras, Entangle automatically updates the UI whenever any camera configuration setting changes
Continuous monitoring. Instead of having to explicitly start/stop monitoring, Entangle now monitors the camera for new images at all times, and auto-downloads them as they are captured
Continuous preview mode. Previously preview would be stopped after an image was captured. Now it is possible to capture many images in sequence, while remaining in preview mode
Folder preserved. Previously when connecting to a camera, the session folder would be reset to a default location based on camera model name. Now Entangle simply always remembers the user’s last folder
Avoid delete after download. The default behaviour is to delete files from the camera after downloading. This can now be disabled, to allow images to remain on the memory card.
Config UI improvements. The UI for displaying camera settings has been improved & simplified. The “Other PTP properties” panel has been removed to improve UI performance. The ‘Camera Status’ panel now just uses labels, instead of readonly text fields for a more compact display.
Image metadata summary. When moving the mouse over the Entangle window, a summary of the image metadata (extracted with GExiv2) will be displayed, showing the aperture, focal length, shutter speed, ISO and resolution.

With such large changes in the basic infrastructure, there are bound to be new wierd bugs introduced, but overall this release should be a good foundation for ongoing incremental development of Entangle.

GNOME-3 desktop virtualization support from GNOME Boxes (and the future for virt-manager)

Posted: November 22nd, 2011 | Filed under: Fedora, libvirt, Virt Tools | Tags: boxes, gnome, gnome boxes, libvirt, openstack, ovirt, virt-manager | 4 Comments »

For many years now, the virt-manager application has been the primary open source tool for managing virtual machines under libvirt for Fedora/Linux hosts, attempting to satisfy both server and desktop users alike, with the result that often neither userbase were really too happy. The decision to use libvirt as the foundation of OpenStack, OpenNebula and various other cloud projects has been a great validation of libvirt’s capabilities. More recently, the open sourcing of the RHEV-M product to create the oVirt community project, has seen another step forward for open source data center virtualization management based upon libvirt. Finally, with today’s very first release of GNOME Boxes, the same step forward is also happening for Linux desktop virtualization. No longer will desktop virtualization (or remote desktop access) feel like an afterthought, but rather it will be a seamless part of the GNOME-3 desktop experience. This is coming to a Fedora release near you soon….the target is Fedora 17.

What does this mean for virt-manager you might wonder ? Well first of all let me reassure people that virt-manager isn’t going away anytime in the forseeable future. There will always be people who prefer straightforward, directly controllable applications which do not try to impose clever policies on their usage. virt-manager, virsh, virt-install, etc all fill this gap and we don’t want to take that control away from people. With the growth in usage of OpenStack for cloud, oVirt for data center management, and GNOME Boxes for desktop virtualization, I think it is clear though, that virt-manager will have a diminished role / userbase in the future. I don’t consider this to be bad thing, on the contrary, it shows just how strong & diverse the open source virtualization community has become. Where once there was only virt-manager, today we have a wide choice of applications providing highly effective virtualization solutions targeted towards the needs of their respective userbases.

Daniel P. Berrangé

Writing about open source software, virtualization & more