Improved translation po file handling by ditching gettext autotools integration

Posted: November 29th, 2018 | Filed under: Coding Tips, Fedora, libvirt, Virt Tools | Tags: , , , | No Comments »

The libvirt library has long provided translations of its end user facing strings, which largely means error messages and console output from command line tools / daemons. Since libvirt uses autotools for its build system, it naturally used the standard automake integration provided by gettext for handling .po files. The libvirt.pot file with master strings is exported to Zanata, where the actual translation work is outsourced to the Fedora translation team who support up to ~100 languages. At time of writing libvirt has some level of translation in ~45 languages.

With use of Zanata, libvirt must periodically create an updated libvirt.pot file and push it to Zanata, and then just before release it must pull the latest translated .po files back into GIT for release.

There have been a number of problems with this approach which have been annoying us pretty much since the start, and earlier this year it finally became too much to bear any longer.

  • The per-language translation files stored in git contain source file name and line number annotations to indicate where each translatable string originates. Since the translation files are not re-generated on every single source file changes, the file locations annotations becomes increasingly out of date after every commit. When the translation files are updated 98% of the diff is simply changing source file locations leading to a very poor signal/noise ratio.
  • The strings in the per-language translation files are sorted according to source filename. Thus when code is moved between files, or when files are renamed, the strings in the updated translation files all get needlessly reordered, again leading to a poor signal/noise ratio in diffs.
  • Each language translation file contains every translatable string even those which do not have any translation yet. This makes sense if translators are working directly against the .po files, but in libvirt everything is done via the Zanata UI which already knows the list of untranslated strings.
  • The per-language translation files grow in size over time with previously used message strings appended to the end of the file, never discarded by the gettext tools. This again makes sense if translators are working directly against .po files, but Zanata already provides a full translation memory containing historically used strings.
  • Whenever ‘make dist’ is run the gettext autotools integration will regenerate the per-language translation files. As a result of the three previous points, every time a release is made there’s a giant commit more than 100MB in size that contains diffs for translated files which are entirely noise and no signal.

One suggested approach to deal with this is to stop storing translations in GIT at all and simply export them from Zanata only at time of ‘make dist’. The concern with this approach is that the GIT repository no longer contains the full source for the project in a self-contained manner. ‘make dist‘ now needs a live network connection to the Zanata servers. If we were to replace Zanata with a new tool in the future (Zanata is already a replacement for the previously used Transifex), we would potentially loose access to translations for old releases.

With this in mind we decided to optimize the way translations are managed in GIT.

The first easy win was to simply remove the master libvirt.pot file from GIT entirely. This file is auto-generated from the source files and is out of date the moment any source file changes, so no one would ever want to use the stored copy.

The second more complex step was to minimize and canonicalize the per-language translation files. msgmerge is used to take the full .po file and strip out the source file locations and sort the string alphabetically. A perl script is then used to further process the content dropping any translations marked as “fuzzy” and drop any strings for which there is no translated text available. The resulting output is still using the normal .po file format but we call these ‘.mini.po‘ files to indicate that they are stripped down compared to what you’d normally expect to see.

The final step was to remove the gettext / autotools integration and write a custom Makefile.am to handle the key tasks.

  • A target ‘update-mini-po‘ to automate the process of converting full .po files into .mini.po files. This is used when pulling down new translations from Zanata to be stored in git before release.
  • A target ‘update-po’ to automate the inverse process of converting .mini.po files back into full .po files. This is to be used by anyone who might need to look at full language translations outside of Zanata.
  • An install hook to generate the binary .gmo files from the .mini.po files and install them into /usr/share/locale for use at runtime. This avoids the need to ship the full .po files in release tarballs.
  • A target ‘zanata-push‘ to automate the process of re-generating the libvirt.pot file and uploading it to Zanata.
  • A target ‘zanata-pull‘ to automate the process of pulling new translations down from zanata and then triggering ‘update-mini-po

After all this work was completed the key benefits are

  • The size of content stored in GIT was reduced from ~100MB to ~18MB.
  • Updates to the translations in GIT now produce small diffstats with a high signal/noise ratio
  • Files stored in GIT are never changed as a side effect of build system commands like ‘make dist’
  • The autotools integration is easier to understand

while not having any visible change on the translators using Zanata. In the event anyone does need to see full translation languages outside of Zanata there is an extra step to generate the full .po files from the .mini.po files but this is countered by the fact that the result will be fully up to date with respect to translatable strings and source file locations.

I’d encourage any project which is using gettext autotools integration, while also outsourcing to a system like Zanata, to consider whether they’d benefit from taking similar steps to libvirt. Not all projects will get the same degree of space saving but diffstats with good signal/noise ratios and removing side effects from ‘make dist’ are wins that are likely desirable for any project.

 

ANNOUNCE: gtk-vnc 0.9.0 release

Posted: August 17th, 2018 | Filed under: Fedora, Gtk-Vnc, libvirt, Virt Tools | No Comments »

I’m pleased to announce a new release of GTK-VNC, version 0.9.0. This is a cleanup/modernization release. Note that the next release (1.0.0) will drop support for GTK-2

  • Requires gnutls >= 3.1.18
  • Requires libgcrypt >= 1.5.0
  • Requires glib2 >= 2.42.0
  • Use libgcrypt for DES routines
  • Add missing cipher close calls in ARD auth
  • Check for errors after reading mslogon params
  • Support newer UltraVNC mslogon auth type code
  • Avoid divide by zero in mslogin auth from bogus params
  • Re-allow python2 accidentally blocked when removing python binding

Thanks to all those who reported bugs and provides patches that went into this new release.

ANNOUNCE: gtk-vnc 0.8.0 release

Posted: August 1st, 2018 | Filed under: Fedora, Gtk-Vnc, libvirt, Virt Tools | No Comments »

I’m pleased to announce a new release of GTK-VNC, version 0.8.0. This is a small maintenance release tidying up some loose ends

  • Deleted the python2 binding in favour of GObject introspection
  • Pull in latest keycodemapdb content
  • Disable/fix -Wcast-function-type warnings

Thanks to all those who reported bugs and provides patches that went into this new release.

ANNOUNCE: virt-viewer 7.0 release

Posted: July 27th, 2018 | Filed under: Fedora, libvirt, Virt Tools | Tags: , | No Comments »

I am happy to announce a new bugfix release of virt-viewer 7.0 (gpg), including experimental Windows installers for Win x86 MSI (gpg) and Win x64 MSI (gpg). The virsh and virt-viewer binaries in the Windows builds should now successfully connect to libvirtd, following fixes to libvirt’s mingw port.

Signatures are created with key DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)

All historical releases are available from:

http://virt-manager.org/download/

Changes in this release include:

  • Require spice-gtk >= 0.35
  • Clarify error message when no running VM is found
  • Improve check for libgovirt requirements
  • Support “-” as a URI for input connection file
  • Remove support for spice controller interface
  • Misc man page fixes
  • Lookup win32 translations relative to install dir
  • Position connect window in center not top-left
  • Misc fixes for ovirt foreign menu support

Thanks to everyone who contributed towards this release.

CPU model configuration for QEMU/KVM on x86 hosts

Posted: June 29th, 2018 | Filed under: Fedora, libvirt, OpenStack, Security, Virt Tools | Tags: , , , , , , , | 15 Comments »

With the various CPU hardware vulnerabilities reported this year, guest CPU configuration is now a security critical task. This blog post contains content I’ve written that is on its way to become part of the QEMU documentation.

QEMU / KVM virtualization supports two ways to configure CPU models

Host passthrough
This passes the host CPU model features, model, stepping, exactly to the guest. Note that KVM may filter out some host CPU model features if they cannot be supported with virtualization. Live migration is unsafe when this mode is used as libvirt / QEMU cannot guarantee a stable CPU is exposed to the guest across hosts. This is the recommended CPU to use, provided live migration is not required.
Named model
QEMU comes with a number of predefined named CPU models, that typically refer to specific generations of hardware released by Intel and AMD. These allow the guest VMs to have a degree of isolation from the host CPU, allowing greater flexibility in live migrating between hosts with differing hardware.

In both cases, it is possible to optionally add or remove individual CPU features, to alter what is presented to the guest by default.

Libvirt supports a third way to configure CPU models known as “Host model”. This uses the QEMU “Named model” feature, automatically picking a CPU model that is similar the host CPU, and then adding extra features to approximate the host model as closely as possible. This does not guarantee the CPU family, stepping, etc will precisely match the host CPU, as they would with “Host passthrough”, but gives much of the benefit of passthrough, while making live migration safe.

Recommendations for KVM CPU model configuration on x86 hosts

The information that follows provides recommendations for configuring CPU models on x86 hosts. The goals are to maximise performance, while protecting guest OS against various CPU hardware flaws, and optionally enabling live migration between hosts with hetergeneous CPU models.

Preferred CPU models for Intel x86 hosts

The following CPU models are preferred for use on Intel hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

Skylake-Server
Skylake-Server-IBRS
Intel Xeon Processor (Skylake, 2016)
Skylake-Client
Skylake-Client-IBRS
Intel Core Processor (Skylake, 2015)
Broadwell
Broadwell-IBRS
Broadwell-noTSX
Broadwell-noTSX-IBRS
Intel Core Processor (Broadwell, 2014)
Haswell
Haswell-IBRS
Haswell-noTSX
Haswell-noTSX-IBRS
Intel Core Processor (Haswell, 2013)
IvyBridge
IvyBridge-IBRS
Intel Xeon E3-12xx v2 (Ivy Bridge, 2012)
SandyBridge
SandyBridge-IBRS
Intel Xeon E312xx (Sandy Bridge, 2011)
Westmere
Westmere-IBRS
Westmere E56xx/L56xx/X56xx (Nehalem-C, 2010)
Nehalem
Nehalem-IBRS
Intel Core i7 9xx (Nehalem Class Core i7, 2008)
Penryn
Intel Core 2 Duo P9xxx (Penryn Class Core 2, 2007)
Conroe
Intel Celeron_4x0 (Conroe/Merom Class Core 2, 2006)

Important CPU features for Intel x86 hosts

The following are important CPU features that should be used on Intel x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

pcid
Recommended to mitigate the cost of the Meltdown (CVE-2017-5754) fix. Included by default in Haswell, Broadwell & Skylake Intel CPU models. Should be explicitly turned on for Westmere, SandyBridge, and IvyBridge Intel CPU models. Note that some desktop/mobile Westmere CPUs cannot support this feature.
spec-ctrl
Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in Intel CPU models with -IBRS suffix. Must be explicitly turned on for Intel CPU models without -IBRS suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any Intel CPU model. Must be explicitly turned on for all Intel CPU models. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
pdpe1gb
Recommended to allow guest OS to use 1GB size pages.Not included by default in any Intel CPU model. Should be explicitly turned on for all Intel CPU models. Note that not all CPU hardware will support this feature.

Preferred CPU models for AMD x86 hosts

The following CPU models are preferred for use on Intel hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

EPYC
EPYC-IBPB
AMD EPYC Processor (2017)
Opteron_G5
AMD Opteron 63xx class CPU (2012)
Opteron_G4
AMD Opteron 62xx class CPU (2011)
Opteron_G3
AMD Opteron 23xx (Gen 3 Class Opteron, 2009)
Opteron_G2
AMD Opteron 22xx (Gen 2 Class Opteron, 2006)
Opteron_G1
AMD Opteron 240 (Gen 1 Class Opteron, 2004)

Important CPU features for AMD x86 hosts

The following are important CPU features that should be used on AMD x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

ibpb
Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in AMD CPU models with -IBPB suffix. Must be explicitly turned on for AMD CPU models without -IBPB suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
virt-ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This should be provided to guests, even if amd-ssbd is also provided, for maximum guest compatibility. Note for some QEMU / libvirt versions, this must be force enabled when when using “Host model”, because this is a virtual feature that doesn’t exist in the physical host CPUs.
amd-ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This provides higher performance than virt-ssbd so should be exposed to guests whenever available in the host. virt-ssbd should none the less also be exposed for maximum guest compatability as some kernels only know about virt-ssbd.
amd-no-ssb
Recommended to indicate the host is not vulnerable CVE-2018-3639. Not included by default in any AMD CPU model. Future hardware genarations of CPU will not be vulnerable to CVE-2018-3639, and thus the guest should be told not to enable its mitigations, by exposing amd-no-ssb. This is mutually exclusive with virt-ssbd and amd-ssbd.
pdpe1gb
Recommended to allow guest OS to use 1GB size pages. Not included by default in any AMD CPU model. Should be explicitly turned on for all AMD CPU models. Note that not all CPU hardware will support this feature.

Default x86 CPU models

The default QEMU CPU models are designed such that they can run on all hosts. If an application does not wish to do perform any host compatibility checks before launching guests, the default is guaranteed to work.

The default CPU models will, however, leave the guest OS vulnerable to various CPU hardware flaws, so their use is strongly discouraged. Applications should follow the earlier guidance to setup a better CPU configuration, with host passthrough recommended if live migration is not needed.

qemu32
qemu64
QEMU Virtual CPU version 2.5+ (32 & 64 bit variants). qemu64 is used for x86_64 guests and qemu32 is used for i686 guests, when no -cpu argument is given to QEMU, or no <cpu> is provided in libvirt XML.

Other non-recommended x86 CPUs

The following CPUs models are compatible with most AMD and Intel x86 hosts, but their usage is discouraged, as they expose a very limited featureset, which prevents guests having optimal performance.

kvm32
kvm64
Common KVM processor (32 & 64 bit variants). Legacy models just for historical compatibility with ancient QEMU versions.
486
athlon
phenom
coreduo
core2duo
n270
pentium
pentium2
pentium3
Various very old x86 CPU models, mostly predating the introduction of hardware assisted virtualization, that should thus not be required for running virtual machines.

Syntax for configuring CPU models

The example below illustrate the approach to configuring the various CPU models / features in QEMU and libvirt

QEMU command line

Host passthrough
   $ qemu-system-x86_64 -cpu host

With feature customization:

   $ qemu-system-x86_64 -cpu host,-vmx,...
Named CPU models
   $ qemu-system-x86_64 -cpu Westmere

With feature customization:

   $ qemu-system-x86_64 -cpu Westmere,+pcid,...

Libvirt guest XML

Host passthrough
   <cpu mode='host-passthrough'/>

With feature customization:

   <cpu mode='host-passthrough'>
       <feature name="vmx" policy="disable"/>
       ...
   </cpu>
Host model
   <cpu mode='host-model'/>

With feature customization:

   <cpu mode='host-model'>
       <feature name="vmx" policy="disable"/>
       ...
   </cpu>
Named model
   <cpu mode='custom'>
       <model>Westmere</model>
   </cpu>

With feature customization:

   <cpu mode='custom'>
       <model>Westmere</model>
       <feature name="pcid" policy="require"/>
       ...
   </cpu>