This blog is part 5 of a series I am writing about work I’ve completed over the past few releases to improve QEMU security related features.
For many years now QEMU has had code to support the NBD protocol, either as a client or as a server. The qemu-nbd command line tool can be used to export a disk image over NBD to a remote machine, or connect it directly to the local kernel’s NBD block device driver. The QEMU system emulators also have a block driver that acts as an NBD client, allowing VMs to be run from NBD volumes. More recently the QEMU system emulators gained the ability to export the disks from a running VM as named NBD volumes. The latter is particularly interesting because it is the foundation of live migration with block device replication, allowing VMs to be migrated even if you don’t have shared storage between the two hosts. In common with most network block device protocols, NBD has never offered any kind of data security capability. Administrators are recommended to run NBD over a private LAN/vLAN, use network layer security like IPSec, or tunnel it over some other kind of secure channel. While all these options are capable of working, none are very convenient to use because they require extra setup steps outside of the basic operation of the NBD server/clients. Libvirt has long had the ability to tunnel the QEMU migration channel over its own secure connection to the target host, but this has not been extended to cover the NBD channel(s) opened when doing block migration. While it could theoretically be extended to cover NBD, it would not be ideal from a performance POV because the libvirtd architecture means that the TLS encryption/decryption for multiple separate network connections would be handled by a single thread. For fast networks (10-GigE), libvirt will quickly become the bottleneck on performance even if the CPU has native support for AES.
Thus it was decided that the QEMU NBD client & server would need to be extended to support TLS encryption of the data channel natively. Initially the thought was to just add a flag to the client/server code to indicate that TLS was desired and run the TLS handshake before even starting the NBD protocol. After some discussion with the NBD maintainers though, it was decided to explicitly define a way to support TLS in the NBD protocol negotiation phase. The primary benefit of doing this is to allow clearer error reporting to the user if the client connects to a server requiring use of TLS and the client itself does not support TLS, or vica-verca – ie instead of just seeing what appears to be a mangled NBD handshake and not knowing what it means, the client can clearly report “This NBD server requires use of TLS encryption”.
The extension to the NBD protocol was fairly straightforward. After the initial NBD greeting (where the client & server agree the NBD protocol variant to be used) the client is able to request a number of protocol options. A new option was defined to allow the client to request TLS support. If the server agrees to use TLS, then they perform a standard TLS handshake and the rest of the NBD protocol carries on as normal. To prevent downgrade attacks, if the NBD server requires TLS and the client does not request the TLS option, then it will respond with an error and drop the client. In addition if the server requires TLS, then TLS must be the first option that the client requests – other options are only permitted once the TLS session is active & the server will again drop the client if it tries to request non-TLS options first.
The QEMU NBD implementation was originally using plain POSIX sockets APIs for all its I/O. So the first step in enabling TLS was to update the NBD code so that it used the new general purpose QEMU I/O channel APIs instead. With that done it was simply a matter of instantiating a new QIOChannelTLS object at the correct part of the protocol handshake and adding various command line options to the QEMU system emulator and qemu-nbd program to allow the user to turn on TLS and configure x509 certificates.
Running a NBD server using TLS can be done as follows:
$ qemu-nbd --object tls-creds-x509,id=tls0,endpoint=server,dir=/home/berrange/qemutls \
--tls-creds tls0 /path/to/disk/image.qcow2
On the client host, a QEMU guest can then be launched, connecting to this NBD server:
$ qemu-system-x86_64 -object tls-creds-x509,id=tls0,endpoint=client,dir=/home/berrange/qemutls \
-drive driver=nbd,host=theotherhost,port=10809,tls-creds=tls0 \
...other QEMU options...
Finally to enable support for live migration with block device replication, the QEMU system monitor APIs gained support for a new parameter when starting the internal NBD server. All of this code was merged in time for the forthcoming QEMU 2.6 release. Work has not yet started to enable TLS with NBD in libvirt, as there is little point securing the NBD protocol streams, until the primary live migration stream is using TLS. More on live migration in a future blog post, as that’s going to be QEMU 2.7 material now.
In this blog series:
This blog is part 4 of a series I am writing about work I’ve completed over the past few releases to improve QEMU security related features.
Part 2 of this series described the creation of a general purpose API for simplifying TLS session handling inside QEMU, particularly with a view to hiding the complexity of the handshake and x509 certificate validation. The VNC server was converted to use this API, which was a big benefit, but there was still a need to add extra code to support TLS in the I/O paths. Specifically, anywhere that the VNC server would read/write on the network socket, had to be made TLS aware so that it would use plain POSIX send/recv functions vs the TLS wrapped send/recv functions as appropriate. For the VNC server it is actually even more complex, because it also supports websockets, so each I/O point had to choose between plain, TLS, websockets and websockets plus TLS. As TLS support extends to other areas of QEMU this pattern would continue to complicate I/O paths in each backend.
Clearly there was a need for some form of I/O channel abstraction that would allow TLS to be enabled in each QEMU network backend without having to add conditional logic at every I/O send/recv call. Looking around at the QEMU subsystems that would ultimately need TLS support, showed a variety of approaches currently in use
- Character devices use combination of POSIX sockets APIs to establish connections and GIOChannel for performing I/O on them
- Migration has a QEMUFile abstraction which provides read/write facilities for a number of underlying transports, TCP sockets, UNIX sockets, STDIO, external command, in memory buffer and RDMA. The various QEMUFile impls all uses the plain POSIX sockets APIs and for TCP/UNIX sockets the sendmsg/recvmsg functions for I/O
- NBD client & server use plain POSIX sockets APIs and sendmsg/recvmsg for I/O
- VNC server uses plain POSIX sockets APIs and sendmsg/recvmsg for I/O
The GIOChannel APIs used by the character device backend theoretically provide an extensible framework for I/O and there is even a TLS implementation of the GIOChannel API. The two limitations of GIOChannel for QEMU though are that it does not support scatter / gather / vectored I/O APIs and that it does not support file descriptor passing over UNIX sockets. The latter is not a show stopper, since you can still access the socket handle directly to send/recv file descriptors. The lack of vectored I/O though would be a significant issue for migration and NBD servers where performance is very important. While we could potentially extend GIOChannel to add support for new callbacks to do vectored I/O, by the time you’ve done that most of the original GIOChannel code isn’t going to be used, limiting the benefit of starting from GIOChannel as a base. It is also clear that GIOChannel is really not something that is going to get any further development from the GLib maintainers, since their focus is on the new and much better GIO library. This supports file descriptor passing and TLS encryption, but again lacks support for vectored I/O. The bigger show stopper though is that to get access to the TLS support requires depending on a version on GLib that is much newer than what QEMU is willing to use. The existing QEMUFile APIs could form the basis of a general purpose I/O channel system if they were untangled & extracted from migration codebase. One limitation is that QEMUFile only concerns itself with I/O, not the initial channel establishment which is left to the migration core code to deal with, so did not actually provide very much of a foundation on which to build.
After looking through the various approaches in use in QEMU, and potentially available from GLib, it was decided that QEMU would be best served by creating a new general purpose I/O channel API. Thus a new QEMU subsystem was added in the io/ and include/io/ directories to provide a set of classes for I/O over a variety of different data channels. The core design aims were to use the QEMU object model (QOM) framework to provide a standard pattern for extending / subclassing, use the QEMU Error object for all error reporting, file descriptor passing, main loop watch integration and coroutine integration. Overall the new design took many elements of its design from GIOChannel and the GIO library, and blended them with QEMU’s own codebase design. The initial goal was to provide enough functionality to convert the VNC server as a proof of concept. To this end the following classes were created
- QIOChannel – the abstract base defining the overall interface for the I/O framework
- QIOChannelSocket – implementation targeting TCP, UDP and UNIX sockets
- QIOChannelTLS – layer that can provide a TLS session over any other channel
- QIOChannelWebsock – layer that can run the websockets protocol over any other channel
To avoid making this blog posting even larger, I won’t go into details of these (the code is available in QEMU git for anyone who’s really interesting), but instead illustrate it with a comparison of the VNC code before & after. First consider the original code in the VNC server for dealing with writing a buffer of data over a plain socket or websocket either with TLS enabled. The following functions existed in the VNC server code to handle all the combinations:
ssize_t vnc_tls_push(const char *buf, size_t len, void *opaque)
{
VncState *vs = opaque;
ssize_t ret;
retry:
ret = send(vs->csock, buf, len, 0);
if (ret < 0) {
if (errno == EINTR) {
goto retry;
}
return -1;
}
return ret;
}
ssize_t vnc_client_write_buf(VncState *vs, const uint8_t *data, size_t datalen)
{
ssize_t ret;
int err = 0;
if (vs->tls) {
ret = qcrypto_tls_session_write(vs->tls, (const char *)data, datalen);
if (ret < 0) {
err = errno;
}
} else {
ret = send(vs->csock, (const void *)data, datalen, 0);
if (ret < 0) {
err = socket_error();
}
}
return vnc_client_io_error(vs, ret, err);
}
long vnc_client_write_ws(VncState *vs)
{
long ret;
vncws_encode_frame(&vs->ws_output, vs->output.buffer, vs->output.offset);
buffer_reset(&vs->output);
return vnc_client_write_buf(vs, vs->ws_output.buffer, vs->ws_output.offset);
}
static void vnc_client_write_locked(void *opaque)
{
VncState *vs = opaque;
if (vs->encode_ws) {
vnc_client_write_ws(vs);
} else {
vnc_client_write_plain(vs);
}
}
After conversion to use the new QIOChannel classes for sockets, websockets and TLS, all of the VNC server code above turned into
ssize_t vnc_client_write_buf(VncState *vs, const uint8_t *data, size_t datalen)
{
Error *err = NULL;
ssize_t ret;
ret = qio_channel_write(vs->ioc, (const char *)data, datalen, &err);
return vnc_client_io_error(vs, ret, &err);
}
It is clearly a major win for maintainability of the VNC server code to have all the TLS and websockets I/O support handled by the QIOChannel APIs. There is no impact to supporting TLS and websockets anywhere in the VNC server I/O paths now. The only place where there is new code is the point where the TLS or websockets session is initiated and this now only requires instantiation of a suitable QIOChannel subclass and registering a callback to be run when the session handshake completes (or fails).
tls = qio_channel_tls_new_server(vs->ioc, vs->vd->tlscreds, vs->vd->tlsaclname, &err);
if (!tls) {
vnc_client_error(vs);
return 0;
}
object_unref(OBJECT(vs->ioc));
vs->ioc = QIO_CHANNEL(tls);
qio_channel_tls_handshake(tls, vnc_tls_handshake_done, vs, NULL);
Notice that the code is simply replacing the current QIOChannel handle ‘vs->ioc’ with an instance of the QIOChannelTLS class. The vnc_tls_handshake_done method is invoked when the TLS handshake is complete or failed and lets the VNC server continue with the next part of its authentication protocol, or drop the client connection as appropriate. So adding TLS session support to the VNC server comes in at about 10 lines of code now.
In this blog series:
This blog is part 3 of a series I am writing about work I’ve completed over the past few releases to improve QEMU security related features.
When configuring a virtual machine, there are a number of places where QEMU requires some form of security sensitive credentials, typically passwords or encryption keys. Historically QEMU has had no standard approach for getting these credentials from the user, so things have grown in an adhoc manner with predictably awful results. If the VNC server is configured to use basic VNC authentication, then it requires a password to be set. When I first wrote patches to add password auth to QEMU’s VNC server it was clearly not desirable to expose the password on the command line, so when configuring VNC you just request password authentication be enabled using -vnc 0.0.0.0:0,password
and then have to use the monitor interface to set the actual password value “change vnc password
“. Until a password has been set via the monitor, the VNC server should reject all clients, except that we’ve accidentally broken this in the past, allowing clients when no server password is set :-( The qcow & qcow2 disk image formats support use of AES for encryption (remember this is horribly broken) and so there needs to be a way to provide the decryption password for this. Originally you had to wait for QEMU to prompt for the disk password on the interactive console. This clearly doesn’t work very nicely when QEMU is being managed by libvirt, so we added another monitor command which allows apps to provide the disk password upfront, avoiding the need to prompt. Fast forward a few years and QEMU’s block device layer gained support for various network protocols including iSCSI, RBD, FTP and HTTP(s). All of these potentially require authentication and thus a password needs to be provided to QEMU. The CURL driver for ftp, http(s) simply skipped support for authentication since there was no easy way to provide the passwords securely. Sadly, the iSCSI and RBD drivers simply decided to allow the password to be provided in the command line. Hence the passwords for RBD and iSCSI are visible in plain text in the process listing and in libvirt’s QEMU log files, which often get attached to bug reports, which has resulted in a CVE being filed against libvirt. I had an intention to add support for the LUKS format in the QEMU block layer which will also require passwords to be provided securely to QEMU, and it would be desirable if the x509 keys provided to QEMU could be encrypted too.
Looking at this mess and the likely future requirements, it was clear that QEMU was in desperate need of a standard mechanism for securely receiving credentials from the user / management app (libvirt). There are a variety of channels via which credentials can be theoretically passed to QEMU:
- Command line argument
- Environment variable
- Plain file
- Anonymous pipe
- Monitor command
As mentioned previously, using command line arguments or environment variables is not secure if the credential is passed in plain text, because they are visible in the processing list and log files. It would be possible to create a plain file on disk and write each password to it and use file permissions to ensure only QEMU can read it. Using files is not too bad as long as your host filesystem is on encrypted storage. It has a minor complexity of having to dynamically create files on the fly each time you want to hotplug a new device using a password. Most of these problems can be avoided by using an anonymous pipe, but this is more complicated for end users because for hotplugging devices it would require passing file descriptors over a UNIX socket. Finally the monitor provides a decent secure channel which users / mgmt apps will typically already have open via a UNIX socket. There is a chicken & egg problem with it though, because the credentials are often required at initial QEMU startup when parsing the command line arguments, and the monitor is not available that early.
After considering all the options, it was decided that using plain files and/or anonymous pipes to pass credentials would be the most desirable approach. The qemu_open() method has a convenient feature whereby there is a special path prefix that allows mgmt apps to pass a file descriptor across instead of a regular filename. To enable reuse of existing -object
command line argument and object_add
monitor commands for definin credentials, the QEMU object model framework (QOM) was used to define a ‘secret’ object class. The ‘secret
‘ class has a ‘path
‘ property which is the filename containing the credential. For example it could be used
# echo "letmein" > mydisk.pw
# $QEMU -object secret,id=sec0,file=mydisk.pw
Having written this, I realized that it would be possible to allow passwords to be provided directly via the command line if we allowed secret data to be encrypted with a master key. The idea would be that when a QEMU process is first started, it gets given a new unique AES key via a file. The credentials for individual disks / servers would be encrypted with the master key and then passed directly on the command line. The benefit of this is that the mgmt app only needs to deal with a single file on disk with a well defined lifetime.
First a master key is generated and saved to a file in base64 format
# openssl rand -base64 32 > master-key.b64
Lets say we have two passwords we need to give to QEMU. We will thus need two initialization vectors
# openssl rand -base64 16 > sec0-iv.b64
# openssl rand -base64 16 > sec1-iv.b64
Each password is now encrypted using the master key and its respective initialization vector
# SEC0=$(printf "letmein" |
openssl enc -aes-256-cbc -a \
-K $(base64 -d master-key.b64 | hexdump -v -e '/1 "%02X"') \
-iv $(base64 -d sec0-iv.b64 | hexdump -v -e '/1 "%02X"'))
# SEC1=$(printf "1234567" |
openssl enc -aes-256-cbc -a \
-K $(base64 -d master-key.b64 | hexdump -v -e '/1 "%02X"') \
-iv $(base64 -d sec1-iv.b64 | hexdump -v -e '/1 "%02X"'))
Finally when QEMU is launched, three secrets are defined, the first gives the master key via a file, and the others provide the two encrypted user passwords
# $QEMU \
-object secret,id=secmaster,format=base64,file=key.b64 \
-object secret,id=sec0,keyid=secmaster,format=base64,\
data=$SECRET,iv=$(<sec0-iv.b64) \
-object secret,id=sec1,keyid=secmaster,format=base64,\
data=$SECRET,iv=$(<sec1-iv.b64) \
...other args using the secrets...
Now we have a way to securely get credentials into QEMU, there just remains the task of associating the secrets with the things in QEMU that need to use them. The TLS credentials object previously added originally required the x509 server key to be provided in an unencrypted PEM file. The tls-creds-x509
object can now gain a new property “passwordid
” which provides the ID of a secret object that defines the password to use for decrypting the x509 key.
# $QEMU \
-object secret,id=secmaster,format=base64,file=key.b64 \
-object secret,id=sec0,keyid=secmaster,format=base64,\
data=$SECRET,iv=$(<sec0-iv.b64) \
-object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,endpoint=server,passwordid=sec0 \
-vnc 0.0.0.0:0,tls-creds=tls0
Aside from adding support for encrypted x509 certificates, the RBD, iSCSI and CURL block drivers in QEMU have all been updated to allow authentication passwords to be provided using the ‘secret
‘ object type. Libvirt will shortly be gaining support to use this facility which will address the long standing problem of RBD/ISCSI passwords being visible in clear text in the QEMU process command line arguments. All the enhancements described in this posting have been merged for the forthcoming QEMU 2.6.0 release so will soon be available to users. The corresponding enhancements to libvirt to make use of these features are under active development.
In this blog series:
This blog is part 2 of a series I am writing about work I’ve completed over the past few releases to improve QEMU security related features.
After the initial consolidation of cryptographic APIs into one area of the QEMU codebase, it was time to move onto step two, which is where things start to directly benefit the users. With the original patches to support TLS in the VNC server, configuration of the TLS credentials was done as part of the -vnc
command line argument. The downside of such an approach is that as we add support for TLS to other arguments like -chardev, the user would have to supply the same TLS information in multiple places. So it was necessary to isolate the TLS credential configuration from the TLS session handling code, enabling a single set of TLS credentials to be associated with multiple network servers (they can of course each have a unique set of credentials if desired). To achieve this, the new code made use of QEMU’s object model framework (QOM) to define a general TLS credentials interface and then provide implementations for anonymous credentials (totally insecure but needed for back compat with existing QEMU features) and for x509 certificates (the preferred & secure option). There are now two QOM types tls-creds-anon
and tls-creds-x509
that can be created on the command line via QEMU’s -object argument, or in the monitor using the ‘object_add’ command. The VNC server was converted to use the new TLS credential objects for its configuration, so whereas in QEMU 2.4 VNC with TLS would be configured using
-vnc 0.0.0.0:0,tls,x509verify=/path/to/certificates/directory
As of QEMU 2.5 the preferred approach is to use the new credential objects
-object tls-creds-x509,id=tls0.endpoint=server,dir=/path/to/certificates/directory
-vnc 0.0.0.0:0,tls-creds=tls0
The old CLI syntax is still supported, but gets translated internally to create the right TLS credential objects. By default the x509 credentials will require that the client provide a certificate, which is equivalent to the traditional ‘x509verify
‘ option for VNC. To remove the requirement for client certs, the ‘verify-peer=no
‘ option can be given when creating the x509 credentials object.
Generating correct x509 certificates is something that users often struggle with and when getting it wrong the failures are pretty hard to debug – usually just resulting in an unhelpful “handshake failed” type error message. To help troubleshoot problems, the new x509 credentials code in QEMU will sanity check all certificates it loads prior to using them. For example, it will check that the CA certificate has basic constraints set to indicate usage as a CA, catching problems where people give a server/client cert instead of a CA cert. Likewise it will check that the server certificate has basic constraints set to indicate usage in a server. It’ll check that the server certificate is actually signed by the CA that is provided and that none of the certs have expired already. These are all things that the client would check when it connects, so we’re not adding / removing security here, just helping administrators to detect misconfiguration of their TLS certificates as early as possible. These same checks have been done in libvirt for several years now and have been very beneficial in reducing the bugs reports we get related to misconfiguration of TLS.
With the generic TLS credential objects created, the second step was to create a general purpose API for handling the TLS protocol inside QEMU, especially the simplifying the handshake which requires a non-negligible amount of code. The TLS session APIs were designed such that they are independent of the underling data transport since while the VNC server always runs over TCP/UNIX sockets, other QEMU backends may wish to run TLS over non-socket based transports. Overall the API for dealing with TLS session establishment in QEMU can be used as follows
static ssize_t mysock_send(const char *buf, size_t len,
void *opaque)
{
int fd = GPOINTER_TO_INT(opaque);
return write(*fd, buf, len);
}
static ssize_t mysock_recv(const char *buf, size_t len,
void *opaque)
{
int fd = GPOINTER_TO_INT(opaque);
return read(*fd, buf, len);
}
static int mysock_run_tls(int sockfd,
QCryptoTLSCreds *creds,
Error *erp)
{
QCryptoTLSSession *sess;
sess = qcrypto_tls_session_new(creds,
"vnc.example.com",
NULL,
QCRYPTO_TLS_CREDS_ENDPOINT_CLIENT,
errp);
if (sess == NULL) {
return -1;
}
qcrypto_tls_session_set_callbacks(sess,
mysock_send,
mysock_recv,
GINT_TO_POINTER(fd));
while (1) {
if (qcrypto_tls_session_handshake(sess, errp) < 0) {
qcrypto_tls_session_free(sess);
return -1;
}
switch(qcrypto_tls_session_get_handshake_status(sess)) {
case QCRYPTO_TLS_HANDSHAKE_COMPLETE:
if (qcrypto_tls_session_check_credentials(sess, errp) < )) {
qcrypto_tls_session_free(sess);
return -1;
}
goto done;
case QCRYPTO_TLS_HANDSHAKE_RECVING:
...wait for GIO_IN event on fd...
break;
case QCRYPTO_TLS_HANDSHAKE_SENDING:
...wait for GIO_OUT event on fd...
break;
}
}
done:
....send/recv payload data on sess...
qcrypto_tls_session_free(sess):
}
The particularly important thing to note with this example is how the network service (eg VNC, NBD, chardev) that is enabling TLS no longer has to have any knowledge of x509 certificates. They are loaded automatically when the user provides the ‘-object tls-creds-x509
‘ argument to QEMU, and they are validated automatically by the call to qcrypto_tls_session_handshake()
. This makes it easy to add TLS support to other network backends in QEMU, with minimal overhead significantly lowering the risk of screwing up the security. Since it already had TLS support, the VNC server was converted to use this new TLS session API instead of using the gnutls APIs directly. Once again this had a very positive impact on maintainability of the VNC code, since it allowed countless #ifdef CONFIG_GNUTLS conditionals to be removed which clarified the code flow significantly. This work on TLS and the VNC server all merged for the 2.5 release of QEMU, so is already available to benefit users. There is corresponding libvirt work to be done still to convert over to use the new command line syntax for configuring TLS with QEMU.
In this blog series: