ANNOUNCE: gtk-vnc 0.7.0 release including 2 security fixes

Posted: February 10th, 2017 | Filed under: Fedora, Gtk-Vnc, libvirt, Security, Virt Tools | No Comments »

I’m pleased to announce a new release of GTK-VNC, vesion 0.7.0. The release focus is on bug fixing and includes fixes for two publically reported security bugs which allow a malicious server to exploit the client. Similar bugs were recently reported & fixed in other common VNC clients too.

  • CVE-2017-5884 – fix bounds checking for RRE, hextile and copyrect encodings
  • CVE-2017-5885 – fix color map index bounds checking
  • Add API to allow smooth scaling to be disabled
  • Workaround to help SPICE servers quickly drop VNC clients which mistakenly connect, by sending “RFB ” signature bytes early
  • Don’t accept color map entries for true-color pixel formats
  • Add missing vala .deps files for gvnc & gvncpulse
  • Avoid crash if host/port is NULL
  • Add precondition checks to some public APIs
  • Fix link to home page in README file
  • Fix misc memory leaks
  • Clamp cursor hot-pixel to within cursor region

Thanks to all those who reported bugs and provides patches that went into this new release.

The surprisingly complicated world of disk image sizes

Posted: February 10th, 2017 | Filed under: Coding Tips, Fedora, libvirt, OpenStack, Virt Tools | Tags: , , | No Comments »

When managing virtual machines one of the key tasks is to understand the utilization of resources being consumed, whether RAM, CPU, network or storage. This post will examine different aspects of managing storage when using file based disk images, as opposed to block storage. When provisioning a virtual machine the tenant user will have an idea of the amount of storage they wish the guest operating system to see for their virtual disks. This is the easy part. It is simply a matter of telling ‘qemu-img’ (or a similar tool) ’40GB’ and it will create a virtual disk image that is visible to the guest OS as a 40GB volume. The virtualization host administrator, however, doesn’t particularly care about what size the guest OS sees. They are instead interested in how much space is (or will be) consumed in the host filesystem storing the image. With this in mind, there are four key figures to consider when managing storage:

  • Capacity – the size that is visible to the guest OS
  • Length – the current highest byte offset in the file.
  • Allocation – the amount of storage that is currently consumed.
  • Commitment – the amount of storage that could be consumed in the future.

The relationship between these figures will vary according to the format of the disk image file being used. For the sake of illustration, raw and qcow2 files will be compared since they provide an examples of the simplest file format and the most complicated file format used for virtual machines.

Raw files

In a raw file, the sectors visible to the guest are mapped 1-2-1 onto sectors in the host file. Thus the capacity and length values will always be identical for raw files – the length dictates the capacity and vica-verca. The allocation value is slightly more complicated. Most filesystems do lazy allocation on blocks, so even if a file is 10 GB in length it is entirely possible for it to consume 0 bytes of physical storage, if nothing has been written to the file yet. Such a file is known as “sparse” or is said to have “holes” in its allocation. To maximize guest performance, it is common to tell the operating system to fully allocate a file at time of creation, either by writing zeros to every block (very slow) or via a special system call to instruct it to immediately allocate all blocks (very fast). So immediately after creating a new raw file, the allocation would typically either match the length, or be zero. In the latter case, as the guest writes to various disk sectors, the allocation of the raw file will grow. The commitment value refers the upper bound for the allocation value, and for raw files, this will match the length of the file.

While raw files look reasonably straightforward, some filesystems can create surprises. XFS has a concept of “speculative preallocation” where it may allocate more blocks than are actually needed to satisfy the current I/O operation. This is useful for files which are progressively growing, since it is faster to allocate 10 blocks all at once, than to allocate 10 blocks individually. So while a raw file’s allocation will usually never exceed the length, if XFS has speculatively preallocated extra blocks, it is possible for the allocation to exceed the length. The excess is usually pretty small though – bytes or KBs, not MBs. Btrfs meanwhile has a concept of “copy on write” whereby multiple files can initially share allocated blocks and when one file is written, it will take a private copy of the blocks written. IOW, to determine the usage of a set of files it is not sufficient sum the allocation for each file as that would over-count the true allocation due to block sharing.

QCow2 files

In a qcow2 file, the sectors visible to the guest are indirectly mapped to sectors in the host file via a number of lookup tables. A sector at offset 4096 in the guest, may be stored at offset 65536 in the host. In order to perform this mapping, there are various auxiliary data structures stored in the qcow2 file. Describing all of these structures is beyond the scope of this, read the specification instead. The key point is that, unlike raw files, the length of the file in the host has no relation to the capacity seen in the guest. The capacity is determined by a value stored in the file header metadata. By default, the qcow2 file will grow on demand, so the length of the file will gradually grow as more data is stored. It is possible to request preallocation, either just of file metadata, or of the full file payload too. Since the file grows on demand as data is written, traditionally it would never have any holes in it, so the allocation would always match the length (the previous caveat wrt to XFS speculative preallocation still applies though). Since the introduction of SSDs, however, the notion of explicitly cutting holes in files has become commonplace. When this is plumbed through from the guest, a guest initiated TRIM request, will in turn create a hole in the qcow2 file, which will also issue a TRIM to the underlying host storage. Thus even though qcow2 files are grow on demand, they may also become sparse over time, thus allocation may be less than the length. The maximum commitment for a qcow2 file is surprisingly hard to get an accurate answer to. To calculate it requires intimate knowledge of the qcow2 file format and even the type of data stored in it. There is allocation overhead from the data structures used to map guest sectors to host file offsets, which is directly proportional to the capacity and the qcow2 cluster size (a cluster is the qcow2 equivalent “sector” concept, except much bigger – 65536 bytes by default). Over time qcow2 has grown other data structures though, such as various bitmap tables tracking cluster allocation and recent writes. With the addition of LUKS support, there will be key data tables. Most significantly though is that qcow2 can internally store entire VM snapshots containing the virtual device state, guest RAM and copy-on-write disk sectors. If snapshots are ignored, it is possible to calculate a value for the commitment, and it will be proportional to the capacity. If snapshots are used, however, all bets are off – the amount of storage that can be consumed is unbounded, so there is no commitment value that can be accurately calculated.

Summary

Considering the above information, for a newly created file the four size values would look like

Format Capacity Length Allocation Commitment
raw (sparse) 40GB 40GB 0 40GB [1]
raw (prealloc) 40GB 40GB 40GB [1] 40GB [1]
qcow2 (grow on demand) 40GB 193KB 196KB 41GB [2]
qcow2 (prealloc metadata) 40GB 41GB 6.5MB 41GB [2]
qcow2 (prealloc all) 40GB 41GB 41GB 41GB [2]
[1] XFS speculative preallocation may cause allocation/commitment to be very slightly higher than 40GB
[2] use of internal snapshots may massively increase allocation/commitment

For an application attempting to manage filesystem storage to ensure any future guest OS write will always succeed without triggering ENOSPC (out of space) in the host, the commitment value is critical to understand. If the length/allocation values are initially less than the commitment, they will grow towards it as the guest writes data. For raw files it is easy to determine commitment (XFS preallocation aside), but for qcow2 files it is unreasonably hard. Even ignoring internal snapshots, there is no API provided by libvirt that reports this value, nor is it exposed by QEMU or its tools. Determining the commitment for a qcow2 file requires the application to not only understand the qcow2 file format, but also directly query the header metadata to read internal parameters such as “cluster size” to be able to then calculate the required value. Without this, the best an application can do is to guess – e.g. add 2% to the capacity of the qcow2 file to determine likely commitment. Snapshots may life even harder, but to be fair, qcow2 internal snapshots are best avoided regardless in favour of external snapshots. The lack of information around file commitment is a clear gap that needs addressing in both libvirt and QEMU.

That all said, ensuring the sum of commitment values across disk images is within the filesystem free space is only one approach to managing storage. These days QEMU has the ability to live migrate virtual machines even when their disks are on host-local storage – it simply copies across the disk image contents too. So a valid approach is to mostly ignore future commitment implied by disk images, and instead just focus on the near term usage. For example, regularly monitor filesystem usage and if free space drops below some threshold, then migrate one or more VMs (and their disk images) off to another host to free up space for remaining VMs.

Commenting out XML snippets in libvirt guest config by stashing it as metadata

Posted: February 8th, 2017 | Filed under: Coding Tips, Fedora, libvirt, OpenStack, Virt Tools | Tags: , , | 3 Comments »

Libvirt uses XML as the format for configuring objects it manages, including virtual machines. Sometimes when debugging / developing it is desirable to comment out sections of the virtual machine configuration to test some idea. For example, one might want to temporarily remove a secondary disk. It is not always desirable to just delete the configuration entirely, as it may need to be re-added immediately after. XML has support for comments <!-- .... some text --> which one might try to use to achieve this. Using comments in XML fed into libvirt, however, will result in an unwelcome suprise – the commented out text is thrown into /dev/null by libvirt.

This is an unfortunate consequence of the way libvirt handles XML documents. It does not consider the XML document to be the master representation of an object’s configuration – a series of C structs are the actual internal representation. XML is simply a data interchange format for serializing structs into a text format that can be interchanged with the management application, or persisted on disk. So when receiving an XML document libvirt will parse it, extracting the pieces of information it cares about which are they stored in memory in some structs, while the XML document is discarded (along with the comments it contained). Given this way of working, to preserve comments would require libvirt to add 100’s of extra fields to its internal structs and extract comments from every part of the XML document that might conceivably contain them. This is totally impractical to do in realityg. The alternative would be to consider the parsed XML DOM as the canonical internal representation of the config. This is what the libvirt-gconfig library in fact does, but it means you can no longer just do simple field accesses to access information – getter/setter methods would have to be used, which quickly becomes tedious in C. It would also involve re-refactoring almost the entire libvirt codebase so such a change in approach would realistically never be done.

Given that it is not possible to use XML comments in libvirt, what other options might be available ?

Many years ago libvirt added the ability to store arbitrary user defined metadata in domain XML documents. The caveat is that they have to be located in a specific place in the XML document as a child of the <metadata> tag, in a private XML namespace. This metadata facility to be used as a hack to temporarily stash some XML out of the way. Consider a guest which contains a disk to be “commented out”:

<domain type="kvm">
  ...
  <devices>
    ...
    <disk type='file' device='disk'>
    <driver name='qemu' type='raw'/>
    <source file='/home/berrange/VirtualMachines/demo.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    ...
  </devices>
</domain>

To stash the disk config as a piece of metadata requires changing the XML to

<domain type="kvm">
  ...
  <metadata>
    <s:disk xmlns:s="http://stashed.org/1" type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/home/berrange/VirtualMachines/demo.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </s:disk>
  </metadata>
  ...
  <devices>
    ...
  </devices>
</domain>

What we have done here is

– Added a <metadata> element at the top level
– Moved the <disk> element to be a child of <metadata> instead of a child of <devices>
– Added an XML namespace to <disk> by giving it an ‘s:’ prefix and associating a URI with this prefix

Libvirt only allows a single top level metadata element per namespace, so if there are multiple tihngs to be stashed, just give them each a custom namespace, or introduce an arbitrary wrapper. Aside from mandating the use of a unique namespace, libvirt treats the metadata as entirely opaque and will not try to intepret or parse it in any way. Any valid XML construct can be stashed in the metadata, even invalid XML constructs, provided they are hidden inside a CDATA block. For example, if you’re using virsh edit to make some changes interactively and want to get out before finishing them, just stash the changed in a CDATA section, avoiding the need to worry about correctly closing the elements.

<domain type="kvm">
  ...
  <metadata>
    <s:stash xmlns:s="http://stashed.org/1">
    <![CDATA[
      <disk type='file' device='disk'>
        <driver name='qemu' type='raw'/>
        <source file='/home/berrange/VirtualMachines/demo.qcow2'/>
        <target dev='vda' bus='virtio'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
      </disk>
      <disk>
        <driver name='qemu' type='raw'/>
        ...i'll finish writing this later...
    ]]>
    </s:stash>
  </metadata>
  ...
  <devices>
    ...
  </devices>
</domain>

Admittedly this is a somewhat cumbersome solution. In most cases it is probably simpler to just save the snippet of XML in a plain text file outside libvirt. This metadata trick, however, might just come in handy some times.

As an aside the real, intended, usage of the <metdata> facility is to allow applications which interact with libvirt to store custom data they may wish to associated with the guest. As an example, the recently announced libvirt websockets console proxy uses it to record which consoles are to be exported. I know of few other real world applications using this metadata feature, however, so it is worth remembering it exists :-) System administrators are free to use it for local book keeping purposes too.

ANNOUNCE: New libvirt project Go XML parser model

Posted: January 5th, 2017 | Filed under: Coding Tips, Fedora, libvirt, OpenStack, Virt Tools | Tags: , , , | No Comments »

Shortly before christmas, I announced the availability of new Go bindings for the libvirt API. This post announces a companion package for dealing with XML parsing/formatting in Go. The master repository is available on the libvirt GIT server, but it is expected that Go projects will consume it via an import of the github mirror, since the Go ecosystem is heavilty github focused (e.g. godoc.org can’t produce docs for stuff hosted on libvirt.org git)

import (
  libvirtxml "github.com/libvirt/libvirt-go-xml"
  "encoding/xml"
)

domcfg := &libvirtxml.Domain{Type: "kvm", Name: "demo",
                             UUID: "8f99e332-06c4-463a-9099-330fb244e1b3",
                             ....}
xmldoc, err := xml.Marshal(domcfg)

API documentation is available on the godoc website.

When dealing with the libvirt API, most applications will find themselves needing to either parse or format XML documents describing configuration of various libvirt objects. Traditionally this task has been left upto the application to deal with and as a result most applications end up creating some kind of structure / object model to represent the XML document in a more easily accessible manner. To try to reduce this duplicate effort, libvirt has already created the libvirt-glib package, which contains a libvirt-gconfig library mapping libvirt XML documents into the GObject world. This library is accessible to many programming languages via the magic of GObject Introspection, and while there is some work to support this in Go, it is not particularly mature at this time.

In the Go world, there is a package “encoding/xml” which is able to transform between XML documents and Go structs, given suitable annotations on the struct fields. It is very easy to deal with, simply requiring someone to define a bit set of structs with annotated fields to map to the XML document. There’s no real “code” to write as it is really a data definition task.  Looking at applications using libvirt in Go, we see quite a few have already go down this route for dealing with libvirt XML. It should be readily apparent that every application using libvirt in Go is essentially going to end up writing an identical set of structs to deal with the XML handling. This duplication of effort makes no sense at all, and as such, we have started this new libvirt-go-xml package to provide a standard set of Go structs to deal with libvirt XML. The current level of schema support is pretty minimal supporting the capabilities XML, secrets XML and a small amount of the domain XML, so we’d encourage anyone interested in this to contribute patches to expand the XML schema coverage.

The following illustrates a further example of its usage in combination with the libvirt-go library (with error checking omitted for brevity):

import (
  libvirt "github.com/libvirt/libvirt-go"
  libvirtxml "github.com/libvirt/libvirt-go-xml"
  "encoding/xml"
  "fmt"
)

conn, err := libvirt.NewConnect("qemu:///system")
dom := conn.LookupDomainByName("demo")
xmldoc, err := dom.GetXMLDesc(0)

domcfg := &libvirtxml.Domain{}
err := xml.Unmarshal([]byte(xmldocC), domcfg)

fmt.Printf("Virt type %s", domcfg.Type)

 

ANNOUNCE: New libvirt project Go language bindings

Posted: December 15th, 2016 | Filed under: Fedora, libvirt, OpenStack, Virt Tools | Tags: , , , | No Comments »

I’m happy to announce that the libvirt project is now supporting Go language bindings as a primary deliverable, joining Python and Perl, as language bindings with 100% API coverage of libvirt C library. The master repository is available on the libvirt GIT server, but it is expected that Go projects will consume it via an import of the github mirror, since the Go ecosystem is heavilty github focused (e.g. godoc.org can’t produce docs for stuff hosted on libvirt.org git)

import (
    libvirt "github.com/libvirt/libvirt-go"
)

conn, err := libvirt.NewConnect("qemu:///system")
...

API documentation is available on the godoc website.

For a while now libvirt has relied on 3rd parties to provide Go language bindings. The one most people use was first created by Alex Zorin and then taken over by Kyle Kelly. There’s been a lot of excellent work put into these bindings, however, the API coverage largely stops at what was available in libvirt 1.2.2, with the exception of a few APIs from libvirt 1.2.14 which have to be enabled via Go build tags. Libvirt is now working on version 3.0.0 and there have been many APIs added in that time, not to mention enums and other constants. Comparing the current libvirt-go API coverage against what the libvirt C library exposes reveals 163 missing functions (out of 476 total), 367 missing enum constants (out of 847 total) and 165 missing macro constants (out of 200 total). IOW while there is alot already implemented, there was still a very long way to go.

Initially I intended to contribute patches to address the missing API coverage to the existing libvirt-go bindings. In looking at the code though I had some concerns about the way some of the APIs had been exposed to Go. In the libvirt C library there are a set of APIs which accept or return a “virTypedParameterPtr” array, for cases where we need APIs to be easily extensible to handle additions of an arbitrary extra data fields in the future. The way these APIs work is one of the most ugly and unpleasant parts of the C API and thus in language bindings we never expose the virTypedParameter concept directly, but instead map it into a more suitable language specific data structure. In Perl and Python this meant mapping them to hash tables, which gives application developers a language friendly way to interact with the APIs. Unfortunately the current Go API bindings have exposed the virTypedParameter concept directly to Go and since Go does not support unions, the result is arguably even more unpleasant in Go than it already is in C. The second concern is with the way events are exposed to Go – in the C layer we have different callbacks that are needed for each event type, but have one method for registering callbacks, requiring an ugly type cast. This was again exposed directly in Go, meaning that the Go compiler can’t do strong type checking on the callback registration, instead only doing a runtime check at time of event dispatch. There were some other minor concerns about the Go API mapping, such as fact that it needlessly exposed the “vir” prefix on all methods & constants despite already being in a “libvirt” package namespace, returning of a struct instead of pointer to a struct for objects. Understandably the current maintainer had a desire to keep API compatibility going forward, so the decision was made to fork the existing libvirt-go codebase. This allowed us to take advantage of all the work put in so far, while fixing the design problems, and also extending them to have 100% API coverage. The idea is that applications can then decide to opt-in to the new Go binding at a point in time where they’re ready to adapt their code to the API changes.

For users of the existing libvirt Go binding, converting to the new official libvirt Go binding requires a little bit of work, but nothing too serious and will simplify the code if using any of the typed parameter methods. The changes are roughly as follows:

  • The “VIR_” prefix is dropped from all constants. eg libvirt.VIR_DOMAIN_METADATA_DESCRIPTION because libvirt.DOMAIN_METADATA_DESCRIPTION
  • The “vir” prefix is dropped from all types. eg libvirt.virDomain becomes libvirt.Domain
  • Methods returning objects now return a pointer eg “* Domain” instead of “Domain”, allowing us to return the usual “nil” constant on error, instead of a struct with no underlying libvirt connection
  • The domain events DomainEventRegister method has been replaced by a separate method for each event type. eg DomainEventLifecycleRegister, DomainEventRebootRegister, etc giving compile time type checking of callbacks
  • The domain events API now accepts a single callback, instead of taking a pair of callbacks – the caller can create an anonymous function to invoke multiple things if required.
  • Methods accepting or returning typed parameters now have a formal struct defined to expose all the parameters in a manner that allows direct access without type casts and enables normal Go compile time type checking. eg the Domain.GetBlockIOTune method returns a DomainBlockIoTuneParameters struct
  • It is no longer necessary to use Go compiler build tags to access functionality in different libvirt versions. Through the magic of conditional compilation, the binding will transparently build against every libvirt version from 1.2.0 through 3.0.0
  • The binding can find libvirt via pkg-config making it easy to compile against a libvirt installed to a non-standard location by simply setting “PKG_CONFIG_PATH”
  • There is 100% coverage of all APIs [1], constants and macros, verified by the libvirt CI system, so that it always keeps up with GIT master of the Libvirt C library.
  • The error callback concept is removed from the binding as this is deprecated by libvirt due to not being thread safe. It was also redundant since every method already directly returns an error object.
  • There are now explicit types defined for all enums and methods which take flags or enums now use these types instead of “uint32”, again allowing stronger compiler type checking

With the exception of the typed parameter changes adapting existing apps should be a largely boring mechanical conversion to just adapt to the renames.

Again, without the effort put in by Alex Zorin and Kyle Kelly & other community contributors, creation of these new libvirt-go bindings would have taken at least 4-5 weeks instead of the 2 weeks effort put into this. So there’s a huge debt owed to all the people who previously contributed to libvirt Go bindings. I hope that having these new bindings with guaranteed 100% API coverage will be of benefit to the Go community going forward.

[1] At time of writing this is a slight lie, as i’ve not quite finished the virStream and virEvent callback method bindings, but this will be done shortly.