Libvirt: programming language and document format consolidation
Since the project’s creation about 14 years ago, libvirt has grown enormously. In that time there has been a lot of code refactoring, but these were always fairly evolutionary changes; there has been little revolutionary change of the overall system architecture or some core technical decisions made early on. This blog post is one of a series examining recent technical decisions that can be considered more revolutionary to libvirt. This was the topic of a talk given at KVM Forum 2019 in Lyon.
Historical usage
In common with many projects which have been around for a similar time frame, libvirt has accumulated a variety of different programming languages and document formats, for a variety of tasks.
The main library is written in C, but around that there is the autotools build system which adds shell, make, autoconf, automake, libtool, m4, and other utilities like sed, awk, etc. Then there are many helper scripts used for code generation or testing which are variously written in shell, perl or python. For documentation, there are man pages written in POD, web docs written in HTML5 with an XSL templating system, and then some docs written in XML which generate HTML, and some docs generated from source code comments. Finally there are domain specific languages such as XDR for the RPC system.
There are a couple of issues with the situation libvirt finds itself in. The large number of different languages and formats places a broad knowledge burden on new contributors to the project. Some of the current choices are now fairly obscure & declining in popularity, thus not well known by potential project contributors. For example, Markdown and reStructuredText (RST) are more commonly known than Perl’s POD format. Developers are more likely to be competent in Python than in Perl. Some of the languages libvirt uses are simply too hard to deal with, for example it is a struggle to find anyone who can explain m4 or enjoys using it when writing configure scripts for autoconf.
Ultimately the issues all combine to have a couple of negative effects on the project. They drive away potential new contributors due to their relative obscurity. They reduce the efficiency of existing contributors due to their poor suitability for the tasks they are applied to.
Intended future usage
With the above problems in mind, there is a desire to consolidate and update the programming languages and document formats that libvirt uses. The goals are to reduce the knowledge burden on contributors, and simplify the development experience to make it easier to get the important work done. Approximately the intention is to get to a place where libvirt uses C for the core library code, Python for all helper scripts, RST for all documentation, and Meson for the build system. This should eliminate the use of shell, shell utilities like sed/awk, perl, POD, XSL, HTML5, m4, make, autoconf, automake. A decision about which static website builder to use to replace XSL hasn’t been made yet, but Sphinx and Pelican are examples of the kind of tools being considered. Getting to this desired end point will take a prolonged effort, but setting a clear direction now will assist contributors in understanding where to best spend time.
To kickstart this consolidation at the end of last year almost all of the Perl build scripts were converted to Python. This was a manual line-by-line conversion, so the structure of the scripts didn’t really change much, just the minimal language syntax changes in most cases. There are still a handful of Perl scripts remaining, which are some of the most complicated ones. These really need a rewrite, rather than a plain syntax conversion, so will take more time to bring over to Python. The few shell scripts have also been converted to Python, with more to follow. After that all the POD man pages were fed through a automated conversion pipeline to create RST formatted man pages. In doing this, the man pages are now also published online as part of the libvirt website. Some of the existing HTML content has also been converted into RST.
The next really big step is converting the build system from autotools into Meson. This will eliminate some of the most complex, buggy and least understood parts of libvirt. It will be refreshing to have a build system which only requires knowledge of a single high level domain specific language, with Python for extensions, instead of autotools which requires knowledge of at least 6 different languages and the interactions between them.