Over the past month I’ve been working on a few different areas of libvirt to address some performance bottlenecks in the code. This article is about improvements to the logging subsystem in libvirt, primarily to reduce the CPU overhead that the libvirtd daemon imposes on virtualization hosts. This might not seem like a big deal, but when you have many hundreds or thousands of containers packed into a machine, libvirtd CPU usage can become quite noticeable at times.
Over the years the libvirt codebase has gained more and more debugging statements, which have been worth their weight in gold when it comes to troubleshooting problems in deployments. Through the configuration files or environment variables it is possible to direct logging output to stderr, plain files, syslog or, most recently, the systemd journal, and to set filters restricting which source files will emit log messages. One less well known feature though is that libvirt has an internal logging ring buffer which records every single log message, no matter what logging output / filter settings are configured. It was possible to send libvirt a signal to dump this ring buffer to stderr, and it would also be dumped automatically upon crash.
For some reason, we had never previously thought to measure the impact of our logging framework on the performance of libvirt. Investigating an unrelated performance problem one day I noticed that oprofile was showing unexpectedly high counts against asprintf/memcpy/malloc functions in an area of code that I wasn’t expecting to be using those functions. I recently had time to investigate this in more detail and discovered that the cause was the internal logging ring buffer. We should have realized long ago that the task of asprintf-formatting the log messages and copying them into the ring buffer would have a non-negligible CPU overhead as the number of log statements in our codebase increased.
To enable the overhead to be more precisely quantified, I wrote a short benchmark program for libvirt which simulated the behaviour of the libvirtd main thread running the event loop with 500 virtual machines / containers all doing simultaneous virtual console I/O. With the existing code, running ~50,000 iterations of this simulated event loop took 1 minute 40 seconds. After a quick hack to turn our logging code into a complete no-op, the demo program completed in only 3 seconds. IOW, the logging code was responsible for 97% of the CPU time consumed by the benchmark. This is somewhat embarrassing, to say the least :-)
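To give a flavour of the kind of measurement involved, here is a minimal standalone sketch, not the actual libvirt benchmark and with made-up message contents, that compares asprintf-formatting every message into a ring buffer against doing nothing at all:

/* logbench.c: hypothetical sketch, not libvirt code.
 * Build with: gcc -O2 -o logbench logbench.c   (add -lrt on older glibc)
 * Run "./logbench" vs "./logbench --noop" to compare. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define RING_SIZE (64 * 1024)
static char ring[RING_SIZE];
static size_t ring_pos;

/* Format a message and copy it into the ring buffer, wrapping when full */
static void log_to_ring(int iter, int vm)
{
    char *msg = NULL;
    int len = asprintf(&msg, "iteration %d: console I/O for guest %d", iter, vm);
    if (len < 0)
        return;
    if (ring_pos + len > RING_SIZE)
        ring_pos = 0;
    memcpy(ring + ring_pos, msg, len);
    ring_pos += len;
    free(msg);
}

int main(int argc, char **argv)
{
    int noop = argc > 1 && strcmp(argv[1], "--noop") == 0;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < 50000; i++)          /* simulated event loop iterations */
        for (int vm = 0; vm < 500; vm++)     /* simulated guests doing console I/O */
            if (!noop)
                log_to_ring(i, vm);
    clock_gettime(CLOCK_MONOTONIC, &end);

    printf("%.2f seconds\n",
           (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9);
    return 0;
}

The real benchmark exercised libvirt’s own event loop and logging macros, but the shape of the overhead is the same: one heap allocation, one format pass and one copy per message, whether or not anyone is listening.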
Obviously we cannot leave the logging code as a complete no-op permanently, since it is incredibly valuable to have. What we need is for it to be as close to a no-op as possible, unless a specific logging output / filter is actually enabled. With this goal, unfortunately, it became clear that the idea of an always-on internal ring buffer for collecting all logs just had to go. There is simply no way to reduce the overhead of this ring buffer, since it inherently has to asprintf the log messages, which is where the majority of the performance hit comes from. At first I just tweaked the way the ring buffer operated so that it would only collect messages that were already generated for an explicitly configured log output. This reduced the runtime of the benchmark program to 4.6 seconds, which is pretty decent. After doing this though, we came to the conclusion that the ring buffer as a concept was now pretty much worthless, since the messages it contained were already being output to another log target. So in the end I just deleted the ring buffer code entirely.
The remaining delta of 1.6 seconds over the baseline 3 second execution time was potentially a result of two things. First, for each log message generated, there would be multiple string comparisons against the filename of the code emitting it, in order to determine if log messages for that file were requested for output to a log target. Second, the logging code is inside a mutex protected block to ensure serialization of outputs. The latter did not affect the benchmark program since it was single threaded, and uncontended mutex locks on Linux have negligible overhead. It would, however, affect real world usage of libvirtd since the daemon is heavily threaded. To address both of these potential performance problems, we introduced the idea of a log “category” object which has to be statically declared in each source file that intends to emit log messages. This category object has internal state which determines whether log messages from that source file will be sent on to a log output or just dropped. Without going into too much detail, the key benefit this brought is that we can determine whether any single log message needs to be output with only 2 lockless, integer comparisons in the common case. IOW this removed the repeated string comparisons (they only need to be done once now) and removed the mutex locking. With this improvement the logging benchmark takes only 3 seconds. IOW there is no appreciable overhead from logging, except for those files where it is explicitly requested.
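To make that a little more concrete, here is a rough sketch of the idea; this is illustrative pseudo-C, not libvirt’s actual implementation, and all the names and fields are invented:

/* Hypothetical sketch of a per-source-file log category. The real libvirt
 * code differs, but the principle is the same: the hot path only does two
 * integer comparisons, and the expensive string matching against the
 * configured filters happens once per filter-configuration change. */
typedef struct {
    const char *name;   /* source file name, matched against filter strings */
    int serial;         /* filter-configuration generation last resolved against */
    int priority;       /* minimum priority to emit, filled in by the slow path */
} LogCategory;

extern int log_filter_serial;         /* bumped whenever filters/outputs change */
void log_resolve(LogCategory *cat);   /* slow path: string-match the filters once */
void log_emit(LogCategory *cat, int prio, const char *fmt, ...);  /* format + output */

#define LOG_MESSAGE(cat, prio, ...)                                         \
    do {                                                                    \
        if ((cat)->serial != log_filter_serial)   /* comparison #1 */       \
            log_resolve(cat);        /* rare: filters changed, re-match */  \
        if ((prio) >= (cat)->priority)            /* comparison #2 */       \
            log_emit(cat, prio, __VA_ARGS__);     /* only now asprintf */   \
    } while (0)

If the category’s priority threshold is higher than anything configured to match, the message is dropped before any formatting or locking happens, which is what makes the disabled case essentially free.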
This performance improvement was released last month in the 1.2.3 release of libvirt, and thus in Fedora rawhide repositories.
EDIT: BTW, if you have an older libvirt you can edit /etc/libvirt/libvirtd.conf, set “log_buffer_size=0” and then restart libvirtd, which will disable the internal ring buffer and thus avoid the worst of the CPU wastage.
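In other words, in /etc/libvirt/libvirtd.conf:
log_buffer_size = 0
followed by a restart of the daemon (e.g. “service libvirtd restart”).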
In “fantasy island” the libvirt and KVM/LXC code is absolutely perfect and always does exactly what you want it to do. Back in the real world, however, there may be annoying bugs in libvirt, KVM/LXC, the kernel and countless other parts of the OS that conspire to cause you great pain and suffering. This blog post contains a very quick introduction to debugging/troubleshooting libvirt problems, particularly focusing on the KVM and LXC drivers.
libvirt logging capabilities
The libvirt code is full of logging statements which can be instrumental in understanding where a problem might lie.
Configuring libvirtd logging
Current releases of libvirt will log problems occurring in libvirtd at level WARNING/ERROR to a dedicated log file /var/log/libvirt/libvirtd.log, while older releases would send them to syslog, typically ending up in /var/log/messages. The libvirtd configuration file has two parameters that can be used to increase the amount of logging information printed.
log_filters="...filter string..."
log_outputs="...destination config..."
The logging documentation describes these in some detail. If you just want to quickly get started though, it suffices to understand that filter strings are simply doing substring matches against libvirt source filenames. So to enable all debug information from ‘src/util/event.c’ (the libvirt event loop) you would set
log_filters="1:event"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
If you wanted to enable logging for everything in ‘src/util’, except for ‘src/util/event.c’ you would set
log_filters="3:event 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
Configuring libvirt client logging
On the client side of libvirt there is no configuration file to put log settings in, so instead there are a couple of environment variables. These take exactly the same type of strings as the libvirtd configuration file:
LIBVIRT_LOG_FILTERS="...filter string..."
LIBVIRT_LOG_OUTPUTS="...destination config..."
export LIBVIRT_LOG_FILTERS LIBVIRT_LOG_OUTPUTS
One thing to be aware of is that with the KVM and LXC drivers in libvirt, very little code is ever run on the libvirt client. The only interesting pieces are the RPC code, event loop and main API entrypoints. To enable debugging of the RPC code you might use
LIBVIRT_LOG_FILTERS="1:rpc" LIBVIRT_LOG_OUTPUTS="1:stderr" virsh list
Useful log filter settings for KVM and LXC
The following are some useful values for logging with respect to the KVM and LXC drivers (a combined example follows the list):
- All libvirt public APIs invoked: 1:libvirt
- All external commands run by libvirt: 1:command
- Cgroups management: 1:cgroup
- All QEMU driver code: 1:qemu
- QEMU text monitor commands: 1:qemu_monitor_text
- QEMU JSON/QMP monitor commands: 1:qemu_monitor_json
- All LXC driver code: 1:lxc
- All lock management code: 1:locking
- All security manager code: 1:security
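These filters can be combined as required. For example, to debug a guest startup problem in the QEMU driver you might use something along the lines of:
log_filters="1:libvirt 1:qemu 1:qemu_monitor_json 1:security"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"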
QEMU driver logfiles
Every QEMU process run by libvirt has a dedicated log file /var/log/libvirt/qemu/$VMNAME.log which captures any data that QEMU writes to stderr/stdout. It also contains timestamps written by libvirtd whenever the QEMU process starts and exits. Finally, prior to starting a guest, libvirt will write out the full set of environment variables and command line arguments it intends to launch QEMU with.
If you are running libvirtd with elevated log settings, there is also the possibility that some of the logging output will end up in the per-VM logfile, instead of the location set by the log_outputs configuration parameter. This is because a little bit of libvirt code will run in the child process between the time it is forked and QEMU is exec()d.
LXC driver logfiles
Every LXC container run by libvirt has a dedicated log file /var/log/libvirt/lxc/$VMNAME.log which captures any data that the container (and the libvirt_lxc controller process) writes to stderr/stdout. As with QEMU it will also contain the command line args libvirt uses, though these are much less interesting in the LXC case. The LXC logfile is mostly useful for debugging the initial container bootstrap process.
Troubleshooting SELinux / sVirt
On a RHEL or Fedora host, the out of the box configuration will run all guests under confined SELinux contexts. One common problem that may affect developers running libvirtd straight from the source tree is that libvirtd itself will run under the wrong context, which in turn prevents guests from running correctly. This can be addressed in two ways, first by manually labelling the libvirtd binary after each rebuild
chcon system_u:object_r:virtd_exec_t:s0 $SRCTREE/daemon/.libs/lt-libvirtd
Or by specifying a label when executing libvirtd
runcon system_u:system_r:virtd_t:s0 $SRCTREE/daemon/libvirtd
Another problem might be with libvirt not correctly labelling some device needed by the QEMU process. The best way to see what’s going on here is to enable libvirtd logging with a filter of “1:security_selinux”, which will print out a message for every single file path that libvirtd labels. Then look at the log to see that everything expected is present:
14:36:57.223: 14351: debug : SELinuxGenSecurityLabel:284 : model=selinux label=system_u:system_r:svirt_t:s0:c669,c903 imagelabel=system_u:object_r:svirt_image_t:s0:c669,c903 baselabel=(null)
14:36:57.350: 14351: info : SELinuxSetFilecon:402 : Setting SELinux context on '/var/lib/libvirt/images/f16x86_64.img' to 'system_u:object_r:svirt_image_t:s0:c669,c903'
14:36:57.350: 14351: info : SELinuxSetFilecon:402 : Setting SELinux context on '/home/berrange/boot.iso' to 'system_u:object_r:virt_content_t:s0'
14:36:57.551: 14351: debug : SELinuxSetSecurityDaemonSocketLabel:1129 : Setting VM f16x86_64 socket context unconfined_u:unconfined_r:unconfined_t:s0:c669,c903
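For reference, the libvirtd.conf settings used to capture output like the above would look something like:
log_filters="1:security_selinux"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"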
If a guest is failing to start, then there are two ways to double check if it really is SELinux related. SELinux can be put into permissive mode on the virtualization host
setenforce 0
Or the sVirt driver can be disabled in libvirt entirely
# vi /etc/libvirt/qemu.conf
...set security_driver = "none" ...
# service libvirtd restart
Troubleshooting cgroups
When libvirt runs guests on modern Linux systems, cgroups will be used to control aspects of the guests’ execution. If any cgroups are mounted on the host when libvirtd starts up, it will create a basic hierarchy
$MOUNT_POINT
 |
 +- libvirt
     |
     +- qemu
     +- lxc
When starting a KVM or LXC guest, further directories will be created, one per guest, so that after a while the tree will look like
$MOUNT_POINT
 |
 +- libvirt
     |
     +- qemu
     |   |
     |   +- VMNAME1
     |   +- VMNAME2
     |   +- VMNAME3
     |   +- ...
     |
     +- lxc
         |
         +- VMNAME1
         +- VMNAME2
         +- VMNAME3
         +- ...
Assuming the host administrator has not changed the policy in the top level cgroups, there should be no functional change to the operation of the guests with this default setup. There are possible exceptions though if you are trying something unusual. For example, the ‘devices’ cgroups controller will be used to set up a whitelist of block / character devices that QEMU is allowed to access. So if you have modified QEMU to access some funky new device, libvirt will likely block this via the cgroups device ACL. Due to various kernel bugs, some of the cgroups controllers have also had a detrimental performance impact on both QEMU guests and the host OS as a whole.
libvirt will never try to mount any cgroups itself, so the quickest way to stop libvirt using cgroups is to stop the host OS from mounting them. This is not always desirable though, so there is a configuration parameter in /etc/libvirt/qemu.conf which can be used to restrict what cgroups libvirt will use.
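For example, something like the following in /etc/libvirt/qemu.conf limits libvirt to a subset of controllers and tweaks the device whitelist, using the cgroup_controllers and cgroup_device_acl parameters. The exact default values vary between libvirt versions, so treat this as a sketch rather than a recommended configuration:
cgroup_controllers = [ "cpu", "devices", "memory", "blkio" ]
cgroup_device_acl = [
    "/dev/null", "/dev/full", "/dev/zero",
    "/dev/random", "/dev/urandom",
    "/dev/ptmx", "/dev/kvm"
]
Dropping "devices" from cgroup_controllers, for instance, should stop libvirt applying the device ACL via cgroups at all.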
Running from the GIT source tree
Sometimes when troubleshooting a particularly hard problem it might be desirable to build libvirt from the latest GIT source and run that. When doing this it is a good idea not to overwrite your distro provided installation with a GIT build, but instead run libvirt directly from the source tree. The first thing to be careful of is that the custom build uses the right installation prefix (i.e. /etc, /usr, /var and not /usr/local). To simplify this libvirt provides an ‘autogen.sh’ script to run all the right autotools commands and set the correct prefixes. So to build libvirt from GIT, in a way that is compatible with a typical distro build, use:
./autogen.sh --system --enable-compile-warnings=error
make
Hint: use make -j 4 (or larger) to significantly speed up the build on multi-core systems
To run libvirtd from the source tree, as root, stop the existing daemon and invoke the libtool wrapper script
# service libvirtd stop
# ./daemon/libvirtd
Or to run with SELinux contexts
# service libvirtd stop
# runcon system_u:system_r:virtd_t:s0-s0:c0.c1023 ./daemon/libvirtd
virsh can easily be run from the source tree in the same way
# ./tools/virsh ....normal args...
Running python programs against a non-installed libvirt gets a little harder, but that can be overcome too
$ export PYTHONPATH=$SOURCETREE/python:$SOURCETREE/python/.libs
$ export LD_LIBRARY_PATH=$SOURCETREE/src/.libs
$ python virt-manager --no-fork
When running the LXC driver, it is necessary to make a change to the guest XML to point it to a different emulator. Using ‘virsh edit $GUEST’, change
/usr/libexec/libvirt_lxc
to
$SOURCETREE/src/libvirt_lxc
(expand $SOURCETREE to be the actual path of the GIT checkout – libvirt won’t interpret env vars in the XML)
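In other words, the emulator element in the guest XML ends up looking something like this (the path shown is just a placeholder for your actual checkout location):
<domain type='lxc'>
  ...
  <devices>
    <emulator>/home/user/src/libvirt/src/libvirt_lxc</emulator>
    ...
  </devices>
</domain>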