Gerrymander tips & tricks – viewing comments on a gerrit review without bots

Posted: May 21st, 2014 | Filed under: Coding Tips, Fedora, OpenStack, Virt Tools | Tags: , , , | 1 Comment »

Those who saw my original announcement of gerrymander will know that OpenStack uses the Gerrit tool for code review. One of the efforts made by the Nova (Compute) team during the last release cycle was to increase the automated testing coverage for hypervisors supported by Nova. Previously we only tested Libvirt on KVM, but now there is testing for VMWare, HyperV and XenAPI too. This is great for our code quality, but far less great for the usability of the gerrit web interface. Consider a fairly typical change I needed to look at today, the web UI for gerrit shows many comments. When doing code review of new patch versions though, I mostly don’t care about comments from the “robot” accounts doing automated testing – just the pass / fail status. I’m far more interested in comments made by humans, so I can see whether they’ve been addressed by new patch versions, or whether I agree/disagree with the comments. The gerrit web UI though doesn’t provide any way to separate this info – it just shows a huge list with human & robot comments combined. The ever growing number of robots used by OpenStack for testing mean the human comments are drowned out in the noise

Gerrit comments

This problem is one of the reasons I created the gerrymander tool. The ‘comments’ command has the ability to display all comments on a change, while filtering out those made by known bot accounts. Now there’s a little setup required, by which I mean you must edit $HOME/.gerrymander to list all the bot accounts. For those using OpenStack though, I’ve uploaded a sample config file, which lists all bot accounts, to the wiki. With this config file in effect I can now look at this same change number 87329

# gerrymander comments --color 87329
Change https://review.openstack.org/87329 (I06c2d9930e0f36a0d7057b6a0f5c9c591caac43f)

  libvirt: Use os_command_line when kernel_id is set


Patch 1 (c8d1cf4559174777be4b42b68379cf78ccd8f382)

  Jay Pipes: (jaypipes) nova/virt/libvirt/driver.py:3203

  You can condense lines 3202-3203 to:   guest.os_cmdline =
  img_props.get('os_command_line')


  Jay Pipes: (jaypipes)

  Patch Set 1: I would prefer that you didn't merge this  (1 inline comment)
  Suggestion for consolidation inline, otherwise, looks good.


  Vladan Popovic: (vladan)

  Patch Set 1:  Thanks for the review Jay.


Patch 2 (58323a7271e0f09689b94bc8632204c179934add)

  Daniel Berrange: (berrange) /COMMIT_MSG:18

  "AMI images" is a pretty obscure/non-obvious term.  This is about images with
  an explicit boot kernel set, so please say that explicitly and void the term
  "AMI"


  Daniel Berrange: (berrange)

  Patch Set 2: I would prefer that you didn't merge this  (1 inline comment)


  Jay Pipes: (jaypipes)

  Patch Set 2:  recheck bug 1307344

Patch 3 (8f3ced280d80eaee69029f4bf2a193bcba749284)

  garyk: (garyk) nova/tests/virt/libvirt/test_libvirt.py:1961

  please use self.assertIsNone


  Mohammed Naser: (mnaser)

  Patch Set 3: Looks good to me, but someone else must approve


  garyk: (garyk)

  Patch Set 3: I would prefer that you didn't merge this  (1 inline comment)
  code looks good. one minor nit


Patch 4 (5698a6c7c5195bd1d3ecf01af6fb65c19ae8a990)

  No  comments


Patch 5 (a87773776a0d39aa5591fde6caa65b92f5b17a6d)

  Jay Pipes: (jaypipes)

  Patch Set 5: Looks good to me, but someone else must approve  ++, thx Vladlan!


  Mohammed Naser: (mnaser)

  Patch Set 5: Looks good to me, but someone else must approve  recheck bug
  1307344


  Daniel Berrange: (berrange)

  Patch Set 5: I would prefer that you didn't merge this  Jenkins failures are
  genuine bugs

  Sreeram Yerrapragada: (syerrapragada)

  Patch Set 5:  recheck-vmware


Patch 6 (a1eb12f0c8281c0b01dde00b8225d742b8e832e3)

  garyk: (garyk)

  Patch Set 6: Code-Review+1

The obvious limitation here is that the file comments are not shown inline with the code, but overall I think this is still much more useful than the gerrit web UI display of comments. I can still go look at the web UI if I do need to see the context of certain inline comments. So gerrymander comments report complements the web UI nicely.

Announce: gerrymander 1.1 “Compulsory serving of asparagus at breakfast” – a client API and command line tool for gerrit

Posted: May 15th, 2014 | Filed under: Coding Tips, Fedora, OpenStack | No Comments »

Following quickly on from my announcement of release 1.0, I’m pleased to announce release 1.1 of gerrymander. Gerrymander provides a python 2 and python 3 compatible API for extracting interesting information from gerrit about changes, along with a command line tool exposing several reports. If you use gerrit for any non-trivially sized project, this tool has the potential to significantly improve your review workflow efficiency. For those working on OpenStack, I’ve created a sample configuration file that includes all the core openstack teams and bot accounts.

Thanks to contributions from a number of interested parties this release has a bunch of small fixes included:

  • Fix syntax for passing username to ssh
  • Fix syntax error in patch review stats reprot
  • Fix misc typos in help / example configs
  • Ignore unexpected change type
  • Expand ~ in config file paths
  • Avoid duplicate names in AUTHORS file

You can get it from pypi

 # pip install gerrymander

Or straight from GitHub

 # git clone git://github.com/berrange/gerrymander.git

Announce: gerrymander 1.0 “A dachshund named Colin” – a client API and command line tool for gerrit

Posted: May 9th, 2014 | Filed under: Coding Tips, Fedora, OpenStack, Virt Tools | Tags: , , , | 3 Comments »

This blog post is to announce the first release of a new project, gerrymander, which I’ve mentioned on IRC in passing a few times. The 32,000 ft summary is that it provides a set of python (2 & 3 compatible) APIs and command line tool for extracting and presenting information from gerrit. You can get it from pypi

 # pip install gerrymander

Or straight from GitHub

 # git clone git://github.com/berrange/gerrymander.git

If you’re the impatient type, then go to the README file which provides a quick start guide to using the tool.

For details on why/how I built gerrymander, read onwards….

Background motivation

The OpenStack project uses Gerrit for review of pretty much all contributions to the project. Not wishing to get into a discussion of the merits of using a Gerrit based workflow, I’ll just say that with large volume of changes going through some sub-projects, such as Nova, the Gerrit web interface really starts to show its limitations. As a result a number of OpenStack contributors have developed cli tools for extracting information from Gerrit and presenting it in more practical formats.

  • gerrit_view – created by Josh Harlow, it provides a general purpose query tool and a interactive TUI for live monitoring of changes.
  • reviewtodo – created by Russell Bryant, it generates reports which attempt to prioritize changes such that the most “important” ones are presented at the top of the todo list.
  • reviewstats – created by Russell Bryant, it generates reports which summarize the reviewing activity of all contributors across the project, and reports which attempt to identify how efficiently reviews are being handled.

I’ve previously contributed to the gerrit_view project and in the second half of the IceHouse dev cycle, I turned off gerrit email alerts and stopped using the gerrit web UI index pages for identifying changes needing review. Instead I exclusively use the qgerrit command line tool to identify changes that affect the libvirt driver which need attention from myself. This had a significant positive impact on my productivity when using gerrit, so I started looking at other gerrit client tools and thinking about what further reports or information I might wish to get from gerrit. It became apparent that the tools people are writing have significant overlap / duplication of code for dealing with basic interaction with gerrit. There are features in some tools (eg caching of gerrit queries in reviewstats) which would be useful to the other tools, but since these are all designed as singe-purpose standalone tools there’s not really much scope for sharing functionality.

Introducing the “gerrymander” project

Clearly what was needed was a new “standard” for building gerrit command line tools, so enter “gerrymander“. The gerrymander project is not simply another command line tool, rather it is intended to provide a collection of python modules / APIs to facilitate the creation of arbitrary gerrit command line reports/tools. With that goal in mind the gerrymander package provides a set of modules, for both Python 2 and Python 3:

  • gerrymander.client – module providing a class for connecting to the gerrit server over SSH, running the ‘gerrit’ tool and passing the results to a callback for processing. As well as the standard “live” client, there is a caching client which stores the results from ‘gerrit’ in local files. This means that expensive queries (eg querying the entire history of all changes ever) won’t inflict repeated denial of service attacks on the server.
  • gerrymander.model – module providing a set of classes that represent the JSON schemas returned by the ‘gerrit’ tool as Python objects. This means you’re not simply blindly accessing untyped dictionary fields. Many of the classes have helper APIs against them to allow their information to be accessed in interesting ways.
  • gerrymander.format – module providing a few helper APIs for formatting data to present to the user. For example, a way to produce coloured text for ANSI capable terminals, or to format time deltas / dates in more user friendly ways (ie “4 days ago” instead of “May 5, 2014”).
  • gerrymander.operation – module providing a class for each operation supported by the ‘gerrit’ tool. This provides a slightly higher level way to utilize to the gerrymander.client module classes. This takes care of obscure oddities such as the need to re-execute ‘gerrit query’ multiple times, since it refuses to return more than 500 results at a time.
  • gerrymander.reports – module providing a set of classes for extracting interesting information from gerrit. Each report class will execute one of more operations against gerrit, post-process the data from the query, and then return an object with the structured results. This is where all the really interesting functionality lives.
  • gerrymander.commands – module providing the command line interface to the reports. It takes the report output and formats it as text, xml or json. A configuration file is used to customize default behaviour, such as which fields are visible, defining command lines, project names, usernames of bots, etc.

The actual “gerrymander” command line tool is designed as a multi-call binary – ie it has a number to sub-commands you can execute, each with their own set of options. I won’t repeat what’s already covered in the README file in this blog post, rather just see the help message for the list of reports/commands I’ve written. So far I’ve targeted the functionality provided by the 3 pre-existing projects I mentioned above:

$ gerrymander --help
usage: gerrymander [-h] [-c CONFIG] [-d] [-q]

{watch,todo-noones,todo-anyones,todo-mine,todo-others,patchreviewstats,openreviewstats,changes,comments,changes-nova-specs}
...

Gerrymander client

positional arguments:
{watch,todo-noones,todo-anyones,todo-mine,todo-others,patchreviewstats,openreviewstats,changes,comments,changes-nova-specs}
watch Watch incoming changes
todo-noones List of changes no one has looked at yet
todo-anyones List of changes anyone has looked at
todo-mine List of changes I've looked at before
todo-others List of changes I've not looked at before
patchreviewstats Statistics on patch review approvals
openreviewstats Statistics on open patch reviews
changes Query project changes
comments Display comments on a change
changes-nova-specs Changes in Nova SPECS

optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
Override config file (default
/home/berrange/.gerrymander)
-d, --debug Display debugging information
-q, --quiet Supress display of warnings

As one example, taking the simplest / most generic ‘changes’ command, lets see all changes that I’ve written but abandoned

$ gerrymander changes --owner berrange --status abandoned
Changes
-------

+-----------+------------------------------------+----------+-----------------------------------+----------+----------+------------------+
| Status    | URL                                | Owner    | Subject                           | Created  | Updated  | Approvals        |
+-----------+------------------------------------+----------+-----------------------------------+----------+----------+------------------+
| ABANDONED | https://review.openstack.org/8528  | berrange | Allow CPU model to be specifie... | 694 days | 688 days | v=-1,1 c=-1      |
| ABANDONED | https://review.openstack.org/9356  | berrange | Fix use of uninitialized varia... | 673 days | 672 days | v=-1 c=1         |
| ABANDONED | https://review.openstack.org/10871 | berrange | Revert "Handle InstanceNotFoun... | 641 days | 641 days | v=-1             |
| ABANDONED | https://review.openstack.org/18869 | berrange | Merge LibvirtOpenVswitchVirtua... | 490 days | 486 days |                  |
| ABANDONED | https://review.openstack.org/19127 | berrange | Merge all VIF classes into one... | 486 days | 463 days | v=1,1 c=1        |
| ABANDONED | https://review.openstack.org/19214 | berrange | Make it possible to set nova o... | 485 days | 483 days | v=1 c=1,-1       |
| ABANDONED | https://review.openstack.org/29784 | berrange | Make devstack work on Fedora 1... | 353 days | 336 days | v=1 c=-1,1,-1,-1 |
| ABANDONED | https://review.openstack.org/58494 | berrange | Increase min required libvirt ... | 164 days | 79 days  | v=-1 c=-1        |
| ABANDONED | https://review.openstack.org/76902 | berrange | Fix quoting of username in pol... | 70 days  | 63 days  | v=1 c=-1         |
+-----------+------------------------------------+----------+-----------------------------------+----------+----------+------------------+

Some things to note about that are unique in comparison to other gerrit client tools I’ve mentioned above that can do the same kind of query

  • The gerrit query is cached for 5 minutes, so if you re-run to change the display options (eg which fields are shown) it won’t hit the gerrit server again, unless you change the actual args to the query.
  • The default output mode is formatted text, but you can ask for the data in XML or JSON documents, allowing easier parsing by further downstream tools
  • The configuration file lets you set defaults for all of the command line parameters. So you can hide fields you don’t care about, or make fields wider, and more
  • The configuration file lets you define command aliases. So if you have a number of different queries you run, you can define new commands (eg ‘my-abandoned-changes’) which record all the query parameters for the ‘changes’ command. This avoids the need to create shell wrapper scripts around the gerrymander command for common queries
  • The command and/or report are accessible via the Python API, so if you want direct access to the raw data you can use the API instead of parsing the text/xml/json outputs.
  • Optional colourization of fields (eg +1’s / +2’s in green, -1’s / -2’s in red)

If any of this sounds interesting to you, pip install the package and try it out. If you want to contribute patches for more interesting reports, then the code is all up on github at the URL mentioned earlier.

One final important point is that this tool is written such that it has zero knowledge about OpenStack. It is intended to be useful to any project which is using Gerrit for their code review. As such all the projects specific knowledge, such as list of project names, team members, bot accounts, is isolated in the configuration file. So one thing I need to do to ease first time users is to upload a sample configuration file for OpenStack that includes all the different projects / teams / bots OpenStack has.

EDIT: Use this config file with OpenStack

Improving libvirt firewall performance

Previously I talked about improving the performance of logging in libvirt. This article will consider the second, even bigger, performance problem I’ve tackled over the past month or so, the management of firewall rules.

There are three areas of libvirt which have a need to interact with the host firewall

  • Virtual networks – add rules to iptables/ip6tables to control IPv4/IPv6 forwarding and masquerading of traffic from virtual networks (eg the virbr0 device)
  • MAC filtering – adds rules to ebtables to reject spoofed MAC addresses from guest NICs (pretty much obsoleted by general filtering)
  • General network filtering – adds almost arbitrary rules to ebtables/iptables/ip6tables to filter guest NIC traffic

When first written all of these areas of libvirt code would directly invoke the iptables/ip6tables/ebtables command line tools to add/remove the rules needed. Then along came firewalld. While we could pretend firewalld didn’t exist and continue invoking the CLI tools directly, this would end in tears when firewalld flushes chains / tables. When we first realized that libvirt needed to add firewalld support, time was short wrt the forthcoming Fedora release schedule which would include firewalld by default. Thus an expedient decision was made, simply replacing any calls to iptables/ip6tables/ebtables with a call to ‘firewall-cmd –passthrough’ instead. This allowed libvirt to integrate with firewalld, so its rules would not get lost, with the minimum possible number of code changes to libvirt.

Unfortunately, it quickly became apparent that using ‘firewall-cmd’ imposes quite a severe performance penalty. Testing with libvirt’s virtual network feature showed that to start 20 virtual networks took 3 seconds with direct iptables usage and 42 seconds with “firewall-cmd –passthrough”. IOW using firewall-cmd is over x10 slower than direct invocation. The network filtering code is similarly badly affected, running our integration test suite went from 28 seconds to 479 seconds, almost x18 slower with firewall-cmd.

We’ve lived with this performance degradation in libvirt for a little while now, but it was clear that this was never going to be acceptable in the long term. This kind of performance bottleneck in firewall manipulation really hurts applications using libvirt, slowing down guest creation in OpenStack noticeably, and even making libvirtd much slower to startup. With a little slack time in my schedule I did some microbenchmarks comparing repeated invocation of firewall-cmd to a script that instead directly invoked the underlying firewalld dbus API repeatedly. While I don’t have the results anymore, it suffices to say that, the tests strongly showed the overhead lay in the firewall-cmd tool, as opposed to the DBus API.

Based on the test results it was clear that libvirt needed to switch to talking to firewalld directly via DBus, instead of spawning an external program to do this indirectly. Even without the performance overhead of firewall-cmd this would be just good engineering practice, particularly wrt error handling, since parsing errors from stderr is truly horrible. The virtual network code had a lot of complexity in it to cope with the fact that applying any single firewall rule might fail and thus need a set of steps to rollback to the previous state of the firewall. This was nothing compared to the complexity of the network filtering code, which would dynamically generate hairy/scary shell scripts with conditionals to either report or ignore errors that occurred and to dynamically query existing rules to decide what to create next.

What libvirt needed was an internal API for interacting with the firewall which would magically talk to either the DBus API, or the iptables/ip6tables/ebtables tools directly, as appropriate for the host being managed. To further reduce the burden on users of the API, it would also to need to have the concept of “transactions”. That is a user of the API can define a set of rules that it wants applied, and if applying any of those rules fails, then a set of “rollback” steps would be applied to cleanup. Finally, it would need the ability to register hooks which query the rule state and dynamically changed later rules.

It took a while to figure out the right design for this libvirt internal firewall API, but once done the task of converting all libvirt code remained ahead. The virtual network and MAC filtering code was fairly easy to convert over the space of a day or so. The same could not be said of the network filtering code. The task of changing it from dynamically writing shell scripts, to using the new firewall API lasted days, which turned into weeks, eventually turning into the best part of a month’s work. What saved me from insanity was that the original creator of this firewall code had the good sense to create a comprehensive set of integration tests for it. This made it much easier to identify places where I’d screwed up and has dramatically reduced the number of regressions I am likely to have created. Hopefully there are no regressions at all.

At the end of this months work, as well as the original integration tests, libvirt now also has a large number of new unit tests which can validate the operation of our firewall code. These will demonstrate that future changes in this area, do not alter the commands used to build the firewall, complementing the integration tests which validate the final firewall ruleset. Now back to the performance. Creating 20 virtual networks originally took 29 seconds with firewall-cmd, now takes only 3 seconds with firewalld DBus APIs. So there is no measurable difference from direct invocation of iptables there. Running the network filter integration test suite, which took 479 seconds with firewall-cmd, now only takes 37 seconds with firewalld DBus APIs. This is slightly slower than direct ebtables invocation which took 29 seconds, but I think that delta is now at an acceptably low level.

This code is all merged into current libvirt GIT and will be included in the 1.2.4 release due out next week and heading to Fedora rawhide immediately thereafter. So come Fedora 21, anything firewall related in libvirt should be noticeably faster. If any application or script you work on happens to make use of firewall-cmd, then I’d strongly recommend changing it to use the DBus API directly. I don’t know why firewall-cmd is so terribly slow, but as it is today, I can’t recommend anyone using it if they have a non-trivial number of rules to define.

Improving libvirt logging performance

Posted: May 2nd, 2014 | Filed under: Coding Tips, Fedora, libvirt, OpenStack | Tags: , , , , | No Comments »

Over the past month I’ve been working on a few different areas of libvirt to address some performance bottlenecks the code has. This article is about improvements to the logging subsystem in libvirt, primarily to reduce the CPU overhead that the libvirtd daemon imposes on virtualization hosts. This might not seem like a big deal, but when you have many 100’s or 1000’s of containers packed into a machine, libvirtd CPU usage could start to become quite noticable at times.

Over the years the libvirt codebase has gained more and more debugging statements which have been worth their weight in gold when it comes to troubleshooting problems in deployments. Through the configuration files or environment variables it is possible to tailor logging to output to stderr, plain files, syslog or, most recently, the systemd journal, as well as filters to restrict which files will emit log messages. One less well known feature though is that libvirt has an internal logging ring buffer which records every single log message, no matter what logging output / filter settings are configured. It was possible to send libvirt a signal to dump this ring buffer to stderr, and it would also be dumped automatically upon crash.

For some reason, we had never previously thought to measure the impact of our logging framework on the performance of libvirt. Investigating an unrelated performance problem one day I noticed that oprofile was showing unexpectedly high counts against asprintf/memcpy/malloc functions in an area of code that I wasn’t expecting to be utilizing those functions. I recently had time to investigate this in more detail and discovered that the cause was the internal logging ring buffer. We should have realized long ago that the task of asprintf formatting the log messages and copying them into the ring buffer, would have a non-negligible CPU overhead as the number of log statements in our codebase increased.

To enable the overhead to be more precisely quantified, I wrote a short benchmark program for libvirt which simulated the behaviour of the libvirtd main thread running the event loop with 500 virtual machines / containers all doing simultaneous virtual console I/O. With the existing code, running ~50,000 iterations of this simulated event loop took 1 minute 40 seconds. After a quick hack to turn our logging code into a complete no-op, the demo program completed in only 3 seconds. IOW, the logging code was responsible for 97% of the CPU time consumed by the benchmark. This is somewhat embarrassing, to say the least :-)

Obviously we can not leave the logging code as a complete no-op permanently, since it is incredibly valuable to have it. What we need is for it to be as close to a no-op as possible, unless a specific logging output / filter is actually enabled. With this goal, unfortunately, it became clear that the idea of an always-on internal ring buffer for collecting all logs just had to go. There is simply no way to reduce the overhead of this ring buffer, since it inherently has to asprintf the log messages, which is where the majority of the performance hit comes from. At first I just tweaked the way the ring buffer operated so that it would only collect messages that were already generated for an explicitly configured log output. This reduced the runtime of the benchmark program to 4.6 seconds which is pretty decent. After doing this though, we came to the conclusion that the ring buffer as a concept was now pretty much worthless, since the messages it contains were already being output to another log target. So in the end I just deleted the ring buffer code entirely.

The remaining delta of 1.6 seconds over the base line 3 second execution time was potentially a result of two things. First, for each log message generated, there would be multiple string comparisons against the filename of the code emitting it, in order to determine if log messages for that file were requested for output to a log target. Second, the logging code is inside a mutex protected block to ensure serialization of outputs. The latter did not affect the benchmark program since it was single threaded, and uncontended mutex locks on Linux have negligible overhead. It would, however, affect real world usage of libvirtd since the daemon is heavily threaded. To address both of these potential performance problems, we introduced the idea of a log “category” object which has to be statically declared in each source file that intends to emit log messages. This category object has internal state which determines whether log messages from that source file will be sent on to a log output or just dropped. Without going into too much detail, the key benefit this brought, is that we can determine if any single log message needs to be output with only 2 lockless, integer comparisons in the common case. IOW this removed the repeated string comparisons (only need to be done once now) and removed the mutex locking. With this improvement the logging benchmark takes only 3 seconds. IOW there is no appreciable overhead from logging, except for those files where it is explicitly requested.

This performance improvement was released last month in the 1.2.3 release of libvirt, and thus in Fedora rawhide repositories.

EDIT: BTW, if you have an older libvirt you can edit /etc/libvirt/libvirtd.conf, set “log_buffer_size=0” and then restart libvirtd, which will disable the internal ring buffer and thus avoid the worst of the CPU wastage.