For my day-to-day development work I currently have four separate physical servers: one old x86_64 server for file storage, two new x86_64 servers and one new aarch64 server. Even with a fast fibre internet connection, downloading the never-ending stream of Fedora RPM updates takes non-negligible time. I also have cause to install distro chroots on a reasonably frequent basis for testing various things related to containers & virtualization, which involves yet more RPM downloads. So I decided it was time to investigate the setup of a local caching proxy for Fedora YUM repositories. I could have figured this out myself, but I fortunately knew that Matthew Booth had already set up exactly the kind of system I wanted, and he shared the necessary config steps that are outlined below.
The general idea is that we will reconfigure the YUM repository location on each machine needing updates to point to a local apache server, instead of the Fedora mirror manager metalink locations. This apache server will be set up using mod_proxy to rewrite requests to point to the offsite upstream download location, but will also be told to use a local squid server to access the remote site, thereby caching the downloads.
Apache setup
Apache needs to be installed, if not already present:
# dnf install httpd
A new drop-in config file addition for apache is created with two mod_proxy directives. The ProxyPass directive tells apache that any requests for http://<our-ip>/fedora/* should be translated into requests to the remote site http://dl.fedoraproject.org/pub/fedora/linux/*. The ProxyRemote directive tells apache that it should not make direct connections to the remote site, but instead use the local proxy server running on port 3128. IOW, requests that would go to dl.fedoraproject.org will instead get sent to the local squid server.
# cat > /etc/httpd/conf.d/yumcache.conf <<EOF
ProxyPass /fedora/ http://dl.fedoraproject.org/pub/fedora/linux/
ProxyPass /fedora-secondary/ http://dl.fedoraproject.org/pub/fedora-secondary/
ProxyRemote * http://localhost:3128/
EOF
The ‘fedora-secondary’ ProxyPass is just there for my aarch64 machine – not required if you are x86_64 only
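To make the ProxyPass translation concrete, here is a minimal Python sketch of the prefix rewriting described above. It only illustrates the mapping and says nothing about how mod_proxy works internally. The function name translate() and the sample request path are made up for the example.

```python
# Illustration only: mimics the prefix mapping set up by the two ProxyPass lines.
PROXY_PASS = {
    "/fedora/": "http://dl.fedoraproject.org/pub/fedora/linux/",
    "/fedora-secondary/": "http://dl.fedoraproject.org/pub/fedora-secondary/",
}


def translate(path):
    """Return the upstream URL for a local request path, or None if unmapped."""
    for prefix, upstream in PROXY_PASS.items():
        if path.startswith(prefix):
            # Keep everything after the local prefix and append it to the upstream URL.
            return upstream + path[len(prefix):]
    return None


if __name__ == "__main__":
    # Hypothetical request path, used purely for demonstration.
    print(translate("/fedora/updates/22/x86_64/repodata/repomd.xml"))
```

Because both prefixes end in a slash, a request for /fedora-secondary/ is never mistaken for one under /fedora/.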
The out of the box SELinux configuration prevents apache from making network requests, so it is necessary to toggle an SELinux boolean flag before starting apache. The -P flag makes the change persist across reboots
# setsebool -P httpd_can_network_relay=1
With that done, we can start apache and set it to run on future boots too
# systemctl start httpd.service
# systemctl enable httpd.service
Squid setup
Squid needs to be installed, if not already present:
# dnf install squid
The out of the box configuration for squid needs a few small tweaks to optimize it for YUM repo mirroring. The default cache replacement policy purges the least recently used objects from the cache. This is not ideal for YUM repositories – if the YUM update needs 100 RPMs downloading and only 95 of them fit in cache, by the time the last package is downloaded we’ll be pushing the first package out of cache again, which means the next machine will have a cache miss. The LFUDA policy keeps popular objects in the cache regardless of size and optimizes the byte hit rate at the expense of object hit rate. Some RPMs can be really rather large, so the default maximum object size of 4 MB is totally inadequate. Increasing it to 8 GB is probably overkill, but will ensure we always attempt to cache any RPM regardless of its size. The cache_dir directive is there to tell squid to use threads for accessing objects to give greater concurrency. The last two directives are critical, telling squid not to cache the repomd.xml files whose contents change frequently – without this you’ll often see YUM trying to fetch outdated repo data files which no longer exist.
# cat >> /etc/squid/squid.conf <<EOF
cache_replacement_policy heap LFUDA
maximum_object_size 8192 MB
cache_dir aufs /var/spool/squid 16000 16 256 max-size=8589934592
acl repomd url_regex /repomd\.xml$
cache deny repomd
EOF
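The problem with least-recently-used eviction described above can be reproduced with a toy simulation. This is only a sketch under simplifying assumptions: every package is treated as the same size, and it models a plain LRU cache rather than squid's actual heap policies. The function name second_pass_hits() is made up for the example.

```python
from collections import OrderedDict


def second_pass_hits(num_packages, capacity):
    """Fetch the same package list twice through an LRU cache.

    Returns the number of cache hits seen during the second pass,
    which stands in for the second machine running the same update.
    """
    cache = OrderedDict()
    hits = 0
    for attempt in (1, 2):
        for pkg in range(num_packages):
            if pkg in cache:
                cache.move_to_end(pkg)         # mark as most recently used
                if attempt == 2:
                    hits += 1
            else:
                cache[pkg] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used entry
    return hits


if __name__ == "__main__":
    print(second_pass_hits(100, 95))   # cache slightly too small
    print(second_pass_hits(100, 100))  # everything fits
```

With 100 packages and room for 95, each fetch on the second pass evicts a package that is about to be requested, so the second machine gets no benefit from the cache at all.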
With that configured, squid can be started and set to run on future boots
# systemctl start squid.service
# systemctl enable squid.service
Firewall setup
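If a firewall is present on the cache machine, it is necessary to allow remote access to apache. This can be enabled with firewall-cmd. The --permanent flag only updates the saved configuration, so a reload is needed for the rule to take effect immediately
# firewall-cmd --add-service=http --permanent
# firewall-cmd --reload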
Client setup
With the cache server setup out of the way, all that remains is to update the Fedora YUM config files on each client machine to point to the local server. There is a convenient tool called ‘fedrepos’ which can do this, avoiding the need to open an editor and change the files manually.
# dnf install fedrepos
# fedrepos baseurl http://yumcache.mydomain/fedora --no-metalink
NB on the aarch64 machine, we need to point to fedora-secondary instead
# fedrepos baseurl http://yumcache.mydomain/fedora-secondary --no-metalink
Replace ‘yumcache.mydomain’ with the hostname or IP address of the server running the apache+squid cache of course. If the cache is working as expected you should see YUM achieve 100 MB/s download speed when it gets a cache hit.
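For anyone curious what the client side change amounts to, the following Python sketch rewrites one repo section to use a baseurl instead of a metalink. It is not how fedrepos is implemented. The function name, the sample repo text and the directory layout appended to the base URL are all assumptions for illustration. Note that configparser drops comments when writing the file back out.

```python
import configparser
import io


def point_repo_at_cache(repo_text, repo_id, baseurl):
    """Return repo file text with repo_id using baseurl instead of a metalink."""
    # Interpolation is disabled so values are stored exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repo_text)
    parser.remove_option(repo_id, "metalink")
    parser.set(repo_id, "baseurl", baseurl)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical, trimmed down repo definition used only for demonstration.
SAMPLE = """[fedora]
name=Fedora $releasever - $basearch
metalink=https://mirrors.fedoraproject.org/metalink?repo=fedora-$releasever&arch=$basearch
enabled=1
"""

if __name__ == "__main__":
    print(point_repo_at_cache(
        SAMPLE, "fedora",
        "http://yumcache.mydomain/fedora/releases/$releasever/Everything/$basearch/os/"))
```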
I am pleased to announce that a new public release of libvirt-sandbox, version 0.6.0, is now available from:
http://sandbox.libvirt.org/download/
The packages are GPG signed with
Key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R)
The libvirt-sandbox package provides an API layer on top of libvirt-gobject which facilitates the creation of application sandboxes using virtualization technology. An application sandbox is a virtual machine or container that runs a single application binary, directly from the host OS filesystem. In other words there is no separate guest operating system install to build or manage.
At this point in time libvirt-sandbox can create sandboxes using either LXC or KVM, and should in theory be extendable to any libvirt driver.
This release contains a mixture of new features and bugfixes.
The first major feature is the ability to provide block devices to sandboxes. Most of the time sandboxes only want/need filesystems, but there are some use cases where block devices are useful. For example, some applications (like databases) can directly use raw block devices for storage. Another one is where a tool actually wishes to be able to format filesystems and have this done inside the container. The complexity with exposing block devices is giving the sandbox tools a predictable path for accessing the device which does not change across hypervisors. To solve this, instead of allowing users of virt-sandbox to specify a block device name, they provide an opaque tag name. The block device is then made available at a path /dev/disk/by-tag/TAGNAME, which symlinks back to whatever hypervisor specific disk name was used.
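From inside the sandbox, a tool can follow the tag symlink to discover which device it really points to. The Python sketch below shows one way to do that. The function name resolve_tagged_disk() and its tag_dir parameter are hypothetical helpers rather than part of libvirt-sandbox.

```python
import os


def resolve_tagged_disk(tag, tag_dir="/dev/disk/by-tag"):
    """Return the real device path behind a tag, or None if the tag does not exist."""
    link = os.path.join(tag_dir, tag)
    if not os.path.lexists(link):
        return None
    # realpath follows the symlink to the hypervisor specific device node.
    return os.path.realpath(link)


if __name__ == "__main__":
    # "data" is a made-up tag name used purely for demonstration.
    print(resolve_tagged_disk("data"))
```

Application code only ever needs the stable tag path. Resolving it is mainly useful for debugging or logging.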
The second major feature is the ability to provide a custom root filesystem for the sandbox. The original intent of the sandbox tool was that it provide an easy way to confine and execute applications that are installed on the host filesystem, so by default the host / filesystem is mapped to the sandbox / filesystem read-only. There are some use cases, however, where the user may wish to have a completely different root filesystem. For example, they may wish to execute applications from some separate disk image. So virt-sandbox now allows the user to map in a different root filesystem for the sandbox.
Both of these features were developed as part of a Google Summer of Code 2015 project which is aiming to enhance libvirt-sandbox so that it is capable of executing images distributed by the Docker container image repository service. The motivation for this goes back to the original reason for creating the libvirt-sandbox project in the first place, which was to provide a hypervisor agnostic framework for sandboxing applications, as a higher level above the libvirt API. Once this work is complete it’ll be possible to launch Docker images via libvirt QEMU, KVM or LXC, with no need for the Docker toolchain itself.
The detailed list of changes in this release is:
- API/ABI incompatible change, soname increased
- Prevent use of virt-sandbox-service as non-root upfront
- Fix misc memory leaks
- Block SIGHUP from the dhclient binary to prevent accidental death if the controlling terminal is closed & reopened
- Add support for re-creating libvirt XML from sandbox config to facilitate upgrades
- Switch to standard gobject introspection autoconf macros
- Add ability to set filters on network interfaces
- Search /usr/lib instead of /lib for systemd unit files, as the former is the canonical location even when / and /usr are merged
- Only set SELinux labels on hosts that support SELinux
- Explicitly link to selinux, instead of relying on indirect linkage
- Update compiler warning flags
- Fix misc docs comments
- Don’t assume use of SELinux in virt-sandbox-service
- Fix path checks for SUSE in virt-sandbox-service
- Add support for AppArmor profiles
- Mount /var after other FS to ensure host image is available
- Ensure state/config dirs can be accessed when QEMU is running non-root for qemu:///system
- Fix mounting of host images in QEMU sandboxes
- Mount images as ext4 instead of ext3
- Allow use of non-raw disk images as filesystem mounts
- Check if required static libs are available at configure time to prevent silent fallback to shared linking
- Require libvirt-glib >= 0.2.1
- Add support for loading lzma and gzip compressed kmods
- Check for supported libvirt URIs when starting guests to ensure clear error message upfront
- Add LIBVIRT_SANDBOX_INIT_DEBUG env variable to allow debugging of kernel boot messages and sandbox init process setup
- Add support for exposing block devices to sandboxes with a predictable name under /dev/disk/by-tag/TAGNAME
- Use devtmpfs instead of tmpfs for auto-populating /dev in QEMU sandboxes
- Allow setup of sandbox with custom root filesystem instead of inheriting from host’s root.
- Allow execution of apps from non-matched ld-linux.so / libc.so, eg executing F19 binaries on F22 host
- Use passthrough mode for all QEMU filesystems
If you read nothing else, just take note that in the Liberty release cycle Nova has deprecated usage of libvirt versions < 0.10.2, and in the Mxxxxx release cycle support for running with libvirt < 0.10.2 will be explicitly dropped.
OpenStack has a fairly aggressive policy of updating the minimum required versions of python modules it depends on. Many python modules are updated pretty frequently and (bitter) experience has shown that updates will often not be API compatible, even across seemingly minor version number changes. Maintaining working OpenStack code across different incompatible versions of the same module is tricky to get right and will inevitably be fragile without good testing coverage. While OpenStack has a huge level of testing, it cannot reasonably be expected to track the matrix of different incompatible python module versions. So (reluctantly) I accept that OpenStack has chosen the only practical approach, which is to increase the min required version of a library anytime it is found to be incompatible with an older version, or whenever there is a new feature required that is only present in a newer version. Now this does create pain in that the versions of python modules shipped in most distros are going to be too old to satisfy OpenStack’s needs. Thus when deploying OpenStack the distro provided versions must be updated to something newer. Fortunately, most OpenStack deployment tools mitigate the pain for users by taking ownership of installation and management of the full python stack, whether a 3rd party module, or an OpenStack provided module and this works pretty well in general.
It is important to contrast this with the situation found for dependencies on non-python modules, and in particular for Nova, the hypervisor platform that is targeted. While OpenStack does get some testing coverage of the hypervisor control plane, it is inconsequential when placed in the context of testing done by the hypervisor vendors themselves. The vendors will of course have tested the control plane themselves, both directly and often in the context of higher level apps such as oVirt and OpenStack. Beyond that though, the vendors will test a whole suite of guest operating systems to ensure they deploy and operate in a functionally correct manner. For Windows guests, there will be certifications of accelerated guest drivers via WHQL and the OS as a whole with Microsoft’s SVVP. The vendor will benchmark and validate scalability and performance of the hypervisor on a multitude of compute workloads, and against various different storage and network technologies. For government related deployments, the platform will go through Common Criteria Certifications and security audits. Finally of course, the vendor will have a team of people maintaining the version they ship, most critically of course, to deal with security errata. I should note that I’m thinking about Open Source hypervisors primarily here and the difference between upstream releases and productized downstream releases. For closed source hypervisors you only ever get access to the productized release.
This is all a long winded way of saying that it is a very hard sell for OpenStack to require users to update their hypervisor versions to something OpenStack has tested, in preference to the version that the vendor ordinarily ships & supports. The benefit of OpenStack’s testing of the hypervisor control plane does not come anywhere close to offsetting the costs of losing the testing, certification & support work that the vendor has put onto the hypervisor platform as a whole. There are also costs suffered directly by the user wrt platform upgrades, as distinct from application upgrades. It is fairly common for organizations to go through their own internal build and certification process when deploying a new operating system and/or hypervisor platform. This will include jobs such as integrating with their network services, particularly authentication & authorization engines, service monitoring frameworks, auditing systems and backup services. In addition the OS/hypervisor is also likely to undergo testing against any hardware platforms/models that the organization may have standardized on. It may take as long as 6 months, or even 12, before some organizations are ready to deploy a new hypervisor platform released by a vendor. Once an organization has deployed a platform, they will naturally wish to maximise its useful lifetime before upgrading to newer versions. This is in stark contrast to applications that an organization runs on the platforms which may be upgraded very frequently in a matter of weeks or even days. It is sad that there can be such time lags for platforms but not applications, but unfortunately this is just the way many organizations’ IT support works.
For these reasons, OpenStack needs to take a different approach to hypervisor platforms, and be pretty conservative about updating the minimum required version. The costs on users will be quite large and not something that can be mitigated by deployment tools that OpenStack can provide, unless the organization is one of the minority that is nimble enough to cope with a continuous deployment model and has enough in house expertise to take on a degree of hypervisor maintenance. In cases where Nova does wish to update the minimum required version there needs to be a fairly compelling set of benefits to Nova that outweigh the costs that will be imposed on the downstream users. Mere prettiness / cleanliness of the code is exceedingly unlikely to count as a compelling benefit.
Looking specifically at the Libvirt + KVM platform dependency in Nova, back in November 2013 we increased the minimum required libvirt from 0.9.6 to 0.9.11. This had the cost of dropping the ability to run Nova on the (then current) Ubuntu LTS platform. This cost was largely mitigated by the fact that Canonical provide the Cloud Archive add-on repository which ships newer libvirt and KVM versions specifically for use with OpenStack, so users had an easy way out in that case. The compelling benefit to Nova though, was that it enabled OpenStack to depend on the new libvirt-python module that had been split off from the main libvirt package and made available on PyPI. This made it possible for OpenStack testing to set up virtualenvs with specific libvirt python versions in common with its approach for any other python modules. More importantly this new libvirt-python has support for the Python 3 platform, so unblocking that porting item for Nova. As a result, the upgrade from 0.9.6 to 0.9.11 was a clear net win on balance.
The benefit of increasing the min required libvirt to values beyond 0.9.11 is harder to articulate. It would enable removal of a few workarounds in OpenStack but nothing that is imposing an undue burden on Nova Libvirt driver maintenance at this time. Mostly the problem with older versions is that they simply lack a lot of functionality compared to current versions, so there will be an increasingly large set of OpenStack features which will not work at all on such old versions. They also get comparatively less testing by developers, vendors and users alike, so as time goes by we’re less likely to spot incompatibilities with old versions which will ultimately affect the experience users have when deploying OpenStack. It is less clear cut when to draw the line though, in these cases. To help guide our decision making, a list of currently shipping libvirt, kvm and libguestfs versions across distros is maintained. For the community focused distros with short lifetimes (short == less than 2 years from release to end-of-life), it is quite simple to just drop them as supported targets when they go end of life. So from the POV of Fedora, at time of writing, we’ll only care about Nova supporting Libvirt >= 1.1.3. For the enterprise focused distros with long lifetimes (long == more than 2 years, often 5-10 years), it is hard to decide when to drop them as a supported target. As mentioned earlier, enterprise organizations will typically have quite a time lag between a new release coming out and it being something that is widely deployed. Despite RHEL-7 having been available since June 2014, it is not uncommon for organizations to still be using RHEL-6 for new platform deployments. Officially, RHEL-6 is a supported platform by Red Hat until at least 2020, but clearly Nova will not wish to continue targeting it for that length of time. So there is a question of when it is reasonable for Nova to end support for the RHEL-6 platform.
Nova already dropped support for Python 2.6, so RHEL-6 users will need to use the Software Collections Layer to get Python 2.7 access, and Red Hat’s OpenStack product is now RHEL-7 based only, so clearly Nova on RHEL-6 is entering its twilight years.
Looking at the current distro support matrix for libvirt versions it was decided that support for Debian Wheezy and OpenSuse 12.2 was reasonable to drop, but at this time Nova will continue to support RHEL-6 vintage libvirt. To provide users with greater advance notice it was agreed that dropping of libvirt/kvm versions should require issuance of a deprecation warning for one release cycle. So in the Liberty release, Nova will now print out a warning if run on libvirt < 0.10.2, and in the Mxxxx release cycle this will turn into a fatal error. So anyone currently deployed on libvirt 0.9.11 -> 0.10.1 has advance warning to plan for an upgrade of their hypervisor platform. I suspect that RHEL-6 may well get the chop 1 cycle later, eg we’d issue a warning in Mxxx and drop it in Nxxxx release, as RHEL-7 would have been available for 2 years by that point and should be taking the overwhelming majority of KVM hypervisor deployments.
One of the things to come out of the discussion around incrementing the libvirt minimum version was that we haven’t really articulated what our policy is in this area. As one of the lead maintainers of the Nova libvirt driver, this blog post is an attempt to set out my views of the matter. As you can see there is no simple answer, but the intent is to be as conservative as practical to minimize the number of users who are likely to be impacted by decisions to increase the minimum version. It also became clear that we need to do a better job of articulating our approach to required platform versions to users in documentation. Previously there had been an attempt to categorize Nova hypervisor platforms/drivers into three groups, primarily according to the level of testing they have in the OpenStack or 3rd party CI systems. The intention behind this is fine, but the usefulness to users is somewhat limited because OpenStack CI obviously only tests a handful of very specific hypervisor platforms. So this classification gives you confidence that a Nova driver has been tested, but not confidence that it has been tested with your particular versions. So functionality that OpenStack claims is tested & operational may not be available on your platform due to version differences. To address this, OpenStack needs to provide more detailed information to users, in particular it must distinguish between what versions of a hypervisor Nova is technically capable of running against, vs the versions of a hypervisor that have been validated by CI. Armed with this knowledge, where those versions differ, it is reasonable for the user to look to their hypervisor vendor for confirmation that their own testing can provide an equivalent level of assurance to the OpenStack CI testing. The user also has the option of running the OpenStack CI tests themselves against their own specific deployment platform.
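The warn-for-one-cycle policy can be pictured with a short Python sketch. This is not the actual Nova code, just an illustration of the logic. The constant and function names are invented for the example, and the version numbers are the ones quoted above.

```python
import warnings

# Version thresholds taken from the text; the names are hypothetical.
MIN_LIBVIRT_VERSION = (0, 9, 11)       # hard minimum in the current cycle
NEXT_MIN_LIBVIRT_VERSION = (0, 10, 2)  # becomes the hard minimum one cycle later


def parse_version(text):
    """Turn '0.10.2' into (0, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_libvirt_version(text):
    """Raise if the version is unsupported, warn if it is deprecated."""
    version = parse_version(text)
    if version < MIN_LIBVIRT_VERSION:
        raise RuntimeError("libvirt %s is older than the required 0.9.11" % text)
    if version < NEXT_MIN_LIBVIRT_VERSION:
        warnings.warn(
            "libvirt %s is deprecated, 0.10.2 will be required next cycle" % text,
            DeprecationWarning)
        return "deprecated"
    return "ok"


if __name__ == "__main__":
    print(check_libvirt_version("1.1.3"))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.2" would sort before "0.9.11".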
On the theme of providing users with more information about hypervisor capabilities, the Nova feature support matrix which was previously held in a wiki has been turned into a piece of formal documentation maintained in Nova itself. The intent is to continue to expand this to provide more fine grained information about features and eventually annotate them with any caveats about minimum required versions of the hypervisor in the associated notes for each feature item.
One of the issues encountered when debugging libvirt guest problems with Nova, is that it isn’t always entirely obvious why the guest XML is configured the way it is. For a while now, libvirt has had the ability to record arbitrary application specific metadata in the guest XML. Each application simply declares the XML namespace it wishes to use and can then record whatever it wants. Libvirt will treat this metadata as a black box, never attempting to interpret or modify it. In the Juno release I worked on a blueprint to make use of this feature to record some interesting information about Nova.
The initial set of information recorded is as follows:
- Version – the Nova version number, and any vendor specific package suffix (eg RPM release number). This is useful as the user reporting a bug is often not entirely clear what particular RPM version was installed when the guest was first booted.
- Name – the Nova instance display name. While you can correlate Nova instances to libvirt guests using the UUID, users reporting bugs often only tell you the display name. So recording this in the XML is handy to correlate which XML config corresponds to which Nova guest they’re talking about.
- Creation time – the time at which Nova booted the guest. Sometimes useful when trying to understand the sequence in which things happened.
- Flavour – the Nova flavour name, memory, disk, swap, ephemeral and vcpus settings. Flavours can be changed by the admin after a guest is booted, so having the original values recorded against the guest XML is again handy.
- Owner – the tenant user ID and name, as well as their project
- Root image – the glance image ID, if the guest was booted from an image
The Nova version number information in particular has already proved very useful in a couple of support tickets, showing that the VM instance was not booted under the software version that was initially claimed. There is still scope for augmenting this information further though. When working on another support issue it would have been handy to know the image properties and flavour extra specs that were set, as the user’s bug report also gave misleading / incorrect information in this area. Information about cinder block devices would also be useful to have access to, for cases where the guest isn’t booting from an image.
While all this info is technically available from the Nova database, it is far easier (and less dangerous) to ask the user to provide the libvirt XML configuration than to have them run random SQL queries. Standard OS trouble shooting tools such as sosreport from RHEL/Fedora already collect the libvirt guest XML when run. As a result, the bug report is more likely to contain this useful data in the initial filing, avoiding the need to ask the user to collect further data after the fact.
To give an example of what the data looks like, a Nova guest booted with
$ nova boot --image cirros-0.3.0-x86_64-uec --flavor m1.tiny vm1
gets the following data recorded
$ virsh -c qemu:///system dumpxml instance-00000001
<domain type='kvm' id='2'>
  <name>instance-00000001</name>
  <uuid>d0e51bbd-cbbd-4abc-8f8c-dee2f23ded12</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="2015.1"/>
      <nova:name>vm1</nova:name>
      <nova:creationTime>2015-02-19 18:23:44</nova:creationTime>
      <nova:flavor name="m1.tiny">
        <nova:memory>512</nova:memory>
        <nova:disk>1</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="ef53a6031fc643f2af7add439ece7e9d">admin</nova:user>
        <nova:project uuid="60a60883d7de429aa45f8f9d689c1fd6">demo</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="2344a0fc-a34b-4e2d-888e-01db795fc89a"/>
    </nova:instance>
  </metadata>
  ...snip...
</domain>
The intention is that as long as the XML namespace URI (http://openstack.org/xmlns/libvirt/nova/1.0) isn’t changed, the data reported here will not change in a backwards incompatible manner. IOW, we will add further elements or attributes to the Nova metadata, but not change or remove existing elements or attributes. So if OpenStack related troubleshooting / debugging tools want to extract this data from the libvirt XML they can be reasonably well assured of compatibility across future Nova releases.
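As an illustration of how a troubleshooting tool might pull this data out, here is a small Python sketch using the standard library XML parser and the namespace URI shown above. The function name nova_metadata() and the shape of the returned dict are my own choices for the example, and the sample document is a trimmed copy of the dump above.

```python
import xml.etree.ElementTree as ET

NOVA_NS = {"nova": "http://openstack.org/xmlns/libvirt/nova/1.0"}


def nova_metadata(domain_xml):
    """Extract a few Nova metadata fields from libvirt guest XML, or return None."""
    root = ET.fromstring(domain_xml)
    inst = root.find("metadata/nova:instance", NOVA_NS)
    if inst is None:
        return None  # guest was not created by Nova, or predates this feature
    flavor = inst.find("nova:flavor", NOVA_NS)
    return {
        "version": inst.find("nova:package", NOVA_NS).get("version"),
        "name": inst.findtext("nova:name", namespaces=NOVA_NS),
        "created": inst.findtext("nova:creationTime", namespaces=NOVA_NS),
        "flavor": flavor.get("name"),
        "memory": int(flavor.findtext("nova:memory", namespaces=NOVA_NS)),
    }


# Trimmed copy of the example XML, used only for demonstration.
SAMPLE_XML = """<domain type='kvm' id='2'>
  <name>instance-00000001</name>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="2015.1"/>
      <nova:name>vm1</nova:name>
      <nova:creationTime>2015-02-19 18:23:44</nova:creationTime>
      <nova:flavor name="m1.tiny">
        <nova:memory>512</nova:memory>
      </nova:flavor>
    </nova:instance>
  </metadata>
</domain>"""

if __name__ == "__main__":
    print(nova_metadata(SAMPLE_XML))
```

Looking elements up by namespace URI rather than by the literal "nova:" prefix means the code keeps working even if a document chooses a different prefix for the same namespace.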
In the Kilo development cycle there have been patches submitted to record a similar kind of data for VMware guests, though obviously it uses a config data format relevant to VMware rather than XML. Sadly this useful debugging improvement for VMware had its feature freeze exception request rejected, pushing it out to Liberty, which is rather a shame :-(
This blog describes an error reporting / troubleshooting feature added to Nova a while back which people are probably not generally aware of.
One of the early things I worked on in the Nova libvirt driver was integration of support for the libvirt event notification system. This allowed Nova to get notified when instances are shut down by the guest OS administrator, instead of having to wait for the background power state sync task to run. Supporting libvirt events was a theoretically simple concept, but in practice there was a nasty surprise. The key issue was that we needed to have a native thread running the libvirt event loop, while the rest of Nova uses green threads. The code kept getting bizarre deadlocks, which were eventually traced to use of the python logging APIs in the native thread. Since eventlet had monkeypatched the thread mutex primitives, the logging APIs couldn’t be safely called from the native thread as they’d try to obtain a green mutex from a native thread context.
Eventlet has a concept of a backdoor port, which lets you connect to the python process using telnet and get an interactive python prompt. After turning this on, I got a stack trace of all green and native threads and eventually figured out the problem, which was great. Sadly the eventlet backdoor is not something anyone sane would ever enable out of the box on production systems – you don’t really want to allow remote command execution to anyone who can connect to a TCP port :-) Another debugging option is to use Python’s native debugger, but this is again something you have to enable ahead of time and won’t be something admins enable out of the box on production systems. It is possible to connect to a running python process with GDB and get a crude stack trace, but that’s not great either as it requires python debuginfo libraries installed. It would be possible to build an administrative debugging API for Nova using the REST API, but that only works if the REST API & message bus are working – not something that’s going to be much use when Nova itself has deadlocked the entire python interpreter.
After this debugging experience I decided to propose something that I’ve had on previous complex systems, a facility that allows an admin to trigger a live error report. Crucially this facility must not require any kind of deployment setup tasks and thus be guaranteed available at all times, especially on production systems where debugging options are limited by the need to maintain service to users. I called the proposal the “Guru Meditation Report” in reference to old Amiga crash reports. I did a quick proof of concept to test the idea, but Solly Ross turned the idea into a complete feature for OpenStack, adding it to the Oslo Incubator in the openstack.common.report namespace and integrating with Nova. This shipped with the Icehouse release of Nova.
Service integration & usage
Integration into projects is quite straightforward: the openstack-common.conf file needs to list the relevant modules to import from oslo-incubator
$ grep report openstack-common.conf
module=report
module=report.generators
module=report.models
module=report.views
module=report.views.json
module=report.views.text
module=report.views.xml
then each service process needs to initialize the error report system. This just requires a single import line and method call from the main method
$ cat nova/cmd/nova-compute
...snip...
from nova.openstack.common.report import guru_meditation_report as gmr
...snip...
def main():
    gmr.TextGuruMeditation.setup_autorun(version)
    ...run eventlet service or whatever...
The setup_autorun method installs a signal handler connected to SIGUSR1 which will dump an error report to stderr when triggered.
So from Icehouse onwards, if any Nova process is mis-behaving you can simply run something like
$ kill -USR1 `pgrep nova-compute`
to get a detailed error report of the process state sent to stderr. On RHEL-7 / Fedora systems this data should end up going into the systemd journal for that service. On other systems you may be lucky enough for the init service to have redirected stderr to a log file, or unlucky enough to have it sent to /dev/null. Did I mention the systemd journal is a really excellent feature when troubleshooting service problems :-)
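The underlying mechanism is easy to demonstrate in a few lines of plain Python. The sketch below is not the oslo-incubator implementation and knows nothing about green threads. It simply installs a SIGUSR1 handler that writes a stack trace for every native thread to stderr. All function names here are invented for the example.

```python
import signal
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a plain-text stack trace for every native thread in this process."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s)" % (ident, names.get(ident, "unknown")))
        lines.extend(line.rstrip() for line in traceback.format_stack(frame))
        lines.append("")
    return "\n".join(lines)


def dump_report(signum, frame):
    """Signal handler: write the report to stderr."""
    sys.stderr.write(format_thread_stacks())
    sys.stderr.flush()


def install_handler():
    """Hook SIGUSR1 where available; must be called from the main thread."""
    if hasattr(signal, "SIGUSR1"):
        signal.signal(signal.SIGUSR1, dump_report)


if __name__ == "__main__":
    install_handler()
    print(format_thread_stacks())
```

Since the handler needs no configuration and no network access, it remains usable even when the rest of the service is wedged, provided the interpreter still gets a chance to run the signal handler.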
Error report information
In the oslo-incubator code there are 5 standard sections defined for the error report
- Config – dump of all configuration settings loaded by oslo.config – useful because the config settings loaded in memory don’t necessarily match what is currently stored in /etc/nova/nova.conf on disk – eg admin may have modified the config and forgotten to reload the services.
- Package – information about the software package, as passed into the setup_autorun method previously. This lets you know the OpenStack release version number, the vendor who packaged it and any vendor specific version info such as the RPM release number. This is again key, because what’s installed on the host currently may not match the version that’s actually running. You can’t always trust the admins to give you correct info when reporting bugs, so having software report itself is more reliable.
- Process – information about the running process including the process ID, parent process ID, user ID, group ID and scheduler state.
- Green Threads – stack trace of every eventlet green thread currently in existence
- Native Threads – stack trace of every native thread currently in existence
The report framework is modular, so it is possible to register new generator functions which add further data to the error report. This is useful if there is application-specific data worth including that would not otherwise be suitable for inclusion in oslo-incubator directly. The data model is separated from the output formatting code, so it is possible to output the report in a number of different data formats. The reports which get sent to stderr use a crude plain text format, but it is possible to have reports generated in XML, JSON, or another completely custom format.
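The split between generators, data model and formatters can be sketched roughly as follows. This is an illustration of the design only, not the actual oslo-incubator API: register_section, build_report, render_text and render_json are hypothetical names, and the "Package" values are placeholder data.

```python
import json
import os

# Hypothetical registry: section title -> function returning plain data.
_generators = {}


def register_section(title, generator):
    """Register a callable that returns a dict of data for one section."""
    _generators[title] = generator


def build_report():
    """Run every generator; the result is pure data with no formatting."""
    return {title: gen() for title, gen in _generators.items()}


def render_text(report):
    """Render the data model in a crude plain-text layout."""
    out = []
    for title, data in report.items():
        out.append("=" * 72)
        out.append("====%s====" % title.center(64))
        out.append("=" * 72)
        for key in sorted(data):
            out.append("%s = %s" % (key, data[key]))
    return "\n".join(out)


def render_json(report):
    """Render the same data model as JSON."""
    return json.dumps(report, indent=2, sort_keys=True)


def process_section():
    """Basic facts about the running process."""
    return {"pid": os.getpid(), "ppid": os.getppid(),
            "uid": os.getuid(), "gid": os.getgid()}


register_section("Process", process_section)
# An application-specific section is just one more registered function.
register_section("Package", lambda: {"product": "Example Service",
                                     "version": "0.1"})
```

Because build_report() returns plain dictionaries, adding a new output format means writing one more render function, without touching any of the generators.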
Future improvements
Triggering from a UNIX signal and printing to stderr is a very simple and reliable approach that we can guarantee will almost always work no matter what operational state the OpenStack deployment as a whole is in. It should not be considered the only possible approach though. I can see that it may be desirable to also wire this up to the RPC messaging bus, so a cloud admin can remotely generate an error report for a service and get the response back over the message bus in an XML or JSON format. This wouldn’t replace the SIGUSR1 based stderr dumps, but rather augment them, as we want to retain the ability to trigger reports even if the rabbitmq bus connection is dead/broken for some reason.
AFAIK, this error report system is only wired up into the Nova project at this time. It is desirable to bring this feature over to projects like Neutron, Cinder, Glance, Keystone too, so it can be considered an openstack wide standard system for admins to collect data for troubleshooting. As explained above, this is no more difficult than adding the modules to openstack-common.conf and then adding a single method call to the service startup main method. Those projects might like to register extra error report sections to provide further data, but that’s by no means required for initial integration.
Having error reports triggered on demand by the admin is nice, but I think there is also value in having error reports triggered automatically in response to unexpected error conditions. For example if an RPC request to boot a new VM instance fails, it could be desirable to save a detailed error report, rather than just having an exception hit the logs with no context around it. In such a scenario you would extend the error report generator so that the report included the exception information & stack trace, and also include the headers and/or payload of the RPC request that failed. The error report would probably be written to a file instead of stderr, using JSON or XML. Tools could then be written to analyse error reports and identify commonly recurring problems.
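As a rough illustration of what such automatic reporting might look like, the sketch below wraps a request handler and saves a JSON report when it fails. None of this exists in Nova or oslo-incubator: save_error_report and call_with_report are hypothetical names, the request is assumed to be a JSON-serializable dict, and the code assumes Python 3 for the exception's __traceback__ attribute.

```python
import json
import os
import tempfile
import time
import traceback


def save_error_report(exc, request, directory):
    """Write a JSON error report for `exc` and return the file path."""
    report = {
        "timestamp": time.time(),
        "exception": {
            "type": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exception(
                type(exc), exc, exc.__traceback__),
        },
        # Headers and/or payload of the request that failed.
        "request": request,
    }
    fd, path = tempfile.mkstemp(prefix="error-report-", suffix=".json",
                                dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(report, fh, indent=2)
    return path


def call_with_report(func, request, directory):
    """Run func(request); on failure save a report, then re-raise."""
    try:
        return func(request)
    except Exception as exc:
        save_error_report(exc, request, directory)
        raise
```

The exception is re-raised after the report is saved, so existing logging and error handling carry on unchanged. The report file simply adds the context that a bare log entry lacks.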
Even with the current level of features for the error report system, it has proved its worth in facilitating the debugging of a number of problems in Nova, where things like the eventlet backdoor or python debugger were impractical to use. I look forward to its continued development and broader usage across openstack.
Example report
What follows below is an example error report from the nova-compute service running on one of my development hosts. Notice that oslo.config configuration parameters that were declared with the ‘secret’ flag have their value masked. This is primarily intended to prevent passwords making their way into the error report, since the expectation is that users may attach these reports to public bugs.
========================================================================
==== Guru Meditation ====
========================================================================
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
========================================================================
==== Package ====
========================================================================
product = OpenStack Nova
vendor = OpenStack Foundation
version = 2015.1
========================================================================
==== Threads ====
========================================================================
------ Thread #140157298652928 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157307045632 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157734876928 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158288500480 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158416287488 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158424680192 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/home/berrange/src/cloud/nova/nova/virt/libvirt/host.py:113 in _native_thread
`libvirt.virEventRunDefaultImpl()`
/usr/lib64/python2.7/site-packages/libvirt.py:340 in virEventRunDefaultImpl
`ret = libvirtmod.virEventRunDefaultImpl()`
------ Thread #140157684520704 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158296893184 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158305285888 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157709698816 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158322071296 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157726484224 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158330464000 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157332223744 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157701306112 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157692913408 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157315438336 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158537955136 ------
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:346 in run
`self.wait(sleep_time)`
/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py:85 in wait
`presult = self.do_poll(seconds)`
/usr/lib/python2.7/site-packages/eventlet/hubs/epolls.py:62 in do_poll
`return self.poll.poll(seconds)`
------ Thread #140158313678592 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157718091520 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157323831040 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158338856704 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
========================================================================
==== Green Threads ====
========================================================================
------ Green Thread ------
/usr/bin/nova-compute:9 in <module>
`load_entry_point('nova==2015.1.dev352', 'console_scripts', 'nova-compute')()`
/home/berrange/src/cloud/nova/nova/cmd/compute.py:74 in main
`service.wait()`
/home/berrange/src/cloud/nova/nova/service.py:444 in wait
`_launcher.wait()`
/home/berrange/src/cloud/nova/nova/openstack/common/service.py:187 in wait
`status, signo = self._wait_for_exit_or_signal(ready_callback)`
/home/berrange/src/cloud/nova/nova/openstack/common/service.py:170 in _wait_for_exit_or_signal
`super(ServiceLauncher, self).wait()`
/home/berrange/src/cloud/nova/nova/openstack/common/service.py:133 in wait
`self.services.wait()`
/home/berrange/src/cloud/nova/nova/openstack/common/service.py:473 in wait
`self.tg.wait()`
/home/berrange/src/cloud/nova/nova/openstack/common/threadgroup.py:145 in wait
`x.wait()`
/home/berrange/src/cloud/nova/nova/openstack/common/threadgroup.py:47 in wait
`return self.thread.wait()`
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:175 in wait
`return self._exit_event.wait()`
/usr/lib/python2.7/site-packages/eventlet/event.py:121 in wait
`return hubs.get_hub().switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
No Traceback!
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/green/thread.py:40 in __thread_body
`func(*args, **kwargs)`
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/qpid/selector.py:126 in run
`rd, wr, ex = select(self.reading, self.writing, (), timeout)`
/usr/lib/python2.7/site-packages/eventlet/green/select.py:83 in select
`return hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:214 in main
`result = function(*args, **kwargs)`
/home/berrange/src/cloud/nova/nova/openstack/common/service.py:492 in run_service
`done.wait()`
/usr/lib/python2.7/site-packages/eventlet/event.py:121 in wait
`return hubs.get_hub().switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:214 in main
`result = function(*args, **kwargs)`
/home/berrange/src/cloud/nova/nova/virt/libvirt/host.py:124 in _dispatch_thread
`self._dispatch_events()`
/home/berrange/src/cloud/nova/nova/virt/libvirt/host.py:228 in _dispatch_events
`_c = self._event_notify_recv.read(1)`
/usr/lib64/python2.7/socket.py:380 in read
`data = self._sock.recv(left)`
/usr/lib/python2.7/site-packages/eventlet/greenio.py:464 in recv
`self._trampoline(self, read=True)`
/usr/lib/python2.7/site-packages/eventlet/greenio.py:439 in _trampoline
`mark_as_closed=self._mark_as_closed)`
/usr/lib/python2.7/site-packages/eventlet/hubs/__init__.py:162 in trampoline
`return hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/tpool.py:55 in tpool_trampoline
`_c = _rsock.recv(1)`
/usr/lib/python2.7/site-packages/eventlet/greenio.py:325 in recv
`timeout_exc=socket.timeout("timed out"))`
/usr/lib/python2.7/site-packages/eventlet/greenio.py:200 in _trampoline
`mark_as_closed=self._mark_as_closed)`
/usr/lib/python2.7/site-packages/eventlet/hubs/__init__.py:162 in trampoline
`return hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:214 in main
`result = function(*args, **kwargs)`
/usr/lib/python2.7/site-packages/oslo_utils/excutils.py:92 in inner_func
`return infunc(*args, **kwargs)`
/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py:93 in _executor_thread
`incoming = self.listener.poll()`
/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:121 in poll
`self.conn.consume(limit=1, timeout=timeout)`
/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_qpid.py:755 in consume
`six.next(it)`
/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_qpid.py:685 in iterconsume
`yield self.ensure(_error_callback, _consume)`
/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_qpid.py:602 in ensure
`return method()`
/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_qpid.py:669 in _consume
`timeout=poll_timeout)`
<string>:6 in next_receiver
(source not found)
/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py:674 in next_receiver
`if self._ecwait(lambda: self.incoming, timeout):`
/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py:50 in _ecwait
`result = self._ewait(lambda: self.closed or predicate(), timeout)`
/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py:580 in _ewait
`result = self.connection._ewait(lambda: self.error or predicate(), timeout)`
/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py:218 in _ewait
`result = self._wait(lambda: self.error or predicate(), timeout)`
/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py:197 in _wait
`return self._waiter.wait(predicate, timeout=timeout)`
/usr/lib/python2.7/site-packages/qpid/concurrency.py:59 in wait
`self.condition.wait(timeout - passed)`
/usr/lib/python2.7/site-packages/qpid/concurrency.py:96 in wait
`sw.wait(timeout)`
/usr/lib/python2.7/site-packages/qpid/compat.py:53 in wait
`ready, _, _ = select([self], [], [], timeout)`
/usr/lib/python2.7/site-packages/eventlet/green/select.py:83 in select
`return hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/home/berrange/src/cloud/nova/nova/openstack/common/loopingcall.py:90 in _inner
`greenthread.sleep(-delay if delay < 0 else 0)`
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:34 in sleep
`hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:214 in main
`result = function(*args, **kwargs)`
/home/berrange/src/cloud/nova/nova/openstack/common/loopingcall.py:133 in _inner
`greenthread.sleep(idle)`
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:34 in sleep
`hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
========================================================================
==== Processes ====
========================================================================
Process 27760 (under 27711) [ run by: berrange (1000), state: running ]
========================================================================
==== Configuration ====
========================================================================
cells:
bandwidth_update_interval = 600
call_timeout = 60
capabilities =
hypervisor=xenserver;kvm
os=linux;windows
cell_type = compute
enable = False
manager = nova.cells.manager.CellsManager
mute_child_interval = 300
name = nova
reserve_percent = 10.0
topic = cells
cinder:
cafile = None
catalog_info = volumev2:cinderv2:publicURL
certfile = None
cross_az_attach = True
endpoint_template = None
http_retries = 3
insecure = False
keyfile = None
os_region_name = None
timeout = None
conductor:
manager = nova.conductor.manager.ConductorManager
topic = conductor
use_local = False
workers = None
database:
backend = sqlalchemy
connection = ***
connection_debug = 0
connection_trace = False
db_inc_retry_interval = True
db_max_retries = 20
db_max_retry_interval = 10
db_retry_interval = 1
idle_timeout = 3600
max_overflow = None
max_pool_size = None
max_retries = 10
min_pool_size = 1
mysql_sql_mode = TRADITIONAL
pool_timeout = None
retry_interval = 10
slave_connection = ***
sqlite_db = nova.sqlite
sqlite_synchronous = True
use_db_reconnect = False
use_tpool = False
default:
allow_migrate_to_same_host = True
allow_resize_to_same_host = True
allow_same_net_traffic = True
amqp_auto_delete = False
amqp_durable_queues = False
api_paste_config = /etc/nova/api-paste.ini
api_rate_limit = False
auth_strategy = keystone
auto_assign_floating_ip = False
backdoor_port = None
bandwidth_poll_interval = 600
bindir = /usr/bin
block_device_allocate_retries = 60
block_device_allocate_retries_interval = 3
boot_script_template = /home/berrange/src/cloud/nova/nova/cloudpipe/bootscript.template
ca_file = cacert.pem
ca_path = /home/berrange/src/cloud/data/nova/CA
cert_manager = nova.cert.manager.CertManager
client_socket_timeout = 900
cnt_vpn_clients = 0
compute_available_monitors =
nova.compute.monitors.all_monitors
compute_driver = libvirt.LibvirtDriver
compute_manager = nova.compute.manager.ComputeManager
compute_monitors =
compute_resources =
vcpu
compute_stats_class = nova.compute.stats.Stats
compute_topic = compute
config-dir = None
config-file =
/etc/nova/nova.conf
config_drive_format = iso9660
config_drive_skip_versions = 1.0 2007-01-19 2007-03-01 2007-08-29 2007-10-10 2007-12-15 2008-02-01 2008-09-01
console_host = mustard.gsslab.fab.redhat.com
console_manager = nova.console.manager.ConsoleProxyManager
console_topic = console
consoleauth_manager = nova.consoleauth.manager.ConsoleAuthManager
consoleauth_topic = consoleauth
control_exchange = nova
create_unique_mac_address_attempts = 5
crl_file = crl.pem
db_driver = nova.db
debug = True
default_access_ip_network_name = None
default_availability_zone = nova
default_ephemeral_format = ext4
default_flavor = m1.small
default_floating_pool = public
default_log_levels =
amqp=WARN
amqplib=WARN
boto=WARN
glanceclient=WARN
iso8601=WARN
keystonemiddleware=WARN
oslo.messaging=INFO
qpid=WARN
requests.packages.urllib3.connectionpool=WARN
routes.middleware=WARN
sqlalchemy=WARN
stevedore=WARN
suds=INFO
urllib3.connectionpool=WARN
websocket=WARN
default_notification_level = INFO
default_publisher_id = None
default_schedule_zone = None
defer_iptables_apply = False
dhcp_domain = novalocal
dhcp_lease_time = 86400
dhcpbridge = /usr/bin/nova-dhcpbridge
dhcpbridge_flagfile =
/etc/nova/nova.conf
dmz_cidr =
dmz_mask = 255.255.255.0
dmz_net = 10.0.0.0
dns_server =
dns_update_periodic_interval = -1
dnsmasq_config_file =
ebtables_exec_attempts = 3
ebtables_retry_interval = 1.0
ec2_listen = 0.0.0.0
ec2_listen_port = 8773
ec2_private_dns_show_ip = False
ec2_strict_validation = True
ec2_timestamp_expiry = 300
ec2_workers = 2
enable_new_services = True
enabled_apis =
ec2
metadata
osapi_compute
enabled_ssl_apis =
fake_call = False
fake_network = False
fatal_deprecations = False
fatal_exception_format_errors = False
firewall_driver = nova.virt.libvirt.firewall.IptablesFirewallDriver
fixed_ip_disassociate_timeout = 600
fixed_range_v6 = fd00::/48
flat_injected = False
flat_interface = p1p1
flat_network_bridge = br100
flat_network_dns = 8.8.4.4
floating_ip_dns_manager = nova.network.noop_dns_driver.NoopDNSDriver
force_config_drive = always
force_dhcp_release = True
force_raw_images = True
force_snat_range =
forward_bridge_interface =
all
gateway = None
gateway_v6 = None
heal_instance_info_cache_interval = 60
host = mustard.gsslab.fab.redhat.com
image_cache_manager_interval = 2400
image_cache_subdirectory_name = _base
injected_network_template = /home/berrange/src/cloud/nova/nova/virt/interfaces.template
instance_build_timeout = 0
instance_delete_interval = 300
instance_dns_domain =
instance_dns_manager = nova.network.noop_dns_driver.NoopDNSDriver
instance_format = [instance: %(uuid)s]
instance_name_template = instance-%08x
instance_usage_audit = False
instance_usage_audit_period = month
instance_uuid_format = [instance: %(uuid)s]
instances_path = /home/berrange/src/cloud/data/nova/instances
internal_service_availability_zone = internal
iptables_bottom_regex =
iptables_drop_action = DROP
iptables_top_regex =
ipv6_backend = rfc2462
key_file = private/cakey.pem
keys_path = /home/berrange/src/cloud/data/nova/keys
keystone_ec2_insecure = False
keystone_ec2_url = http://10.33.8.112:5000/v2.0/ec2tokens
l3_lib = nova.network.l3.LinuxNetL3
linuxnet_interface_driver = nova.network.linux_net.LinuxBridgeInterfaceDriver
linuxnet_ovs_integration_bridge = br-int
live_migration_retry_count = 30
lockout_attempts = 5
lockout_minutes = 15
lockout_window = 15
log-config-append = None
log-date-format = %Y-%m-%d %H:%M:%S
log-dir = None
log-file = None
log-format = None
logging_context_format_string = %(asctime)s.%(msecs)03d %(color)s%(levelname)s %(name)s [%(request_id)s %(user_name)s %(project_name)s%(color)s] %(instance)s%(color)s%(message)s
logging_debug_format_suffix = from (pid=%(process)d) %(funcName)s %(pathname)s:%(lineno)d
logging_default_format_string = %(asctime)s.%(msecs)03d %(color)s%(levelname)s %(name)s [-%(color)s] %(instance)s%(color)s%(message)s
logging_exception_prefix = %(color)s%(asctime)s.%(msecs)03d TRACE %(name)s %(instance)s
max_age = 0
max_concurrent_builds = 10
max_header_line = 16384
max_local_block_devices = 3
maximum_instance_delete_attempts = 5
memcached_servers = None
metadata_host = 10.33.8.112
metadata_listen = 0.0.0.0
metadata_listen_port = 8775
metadata_manager = nova.api.manager.MetadataManager
metadata_port = 8775
metadata_workers = 2
migrate_max_retries = -1
mkisofs_cmd = genisoimage
monkey_patch = False
monkey_patch_modules =
nova.api.ec2.cloud:nova.notifications.notify_decorator
nova.compute.api:nova.notifications.notify_decorator
multi_host = True
multi_instance_display_name_template = %(name)s-%(count)d
my_block_storage_ip = 10.33.8.112
my_ip = 10.33.8.112
network_allocate_retries = 0
network_api_class = nova.network.api.API
network_device_mtu = None
network_driver = nova.network.linux_net
network_manager = nova.network.manager.FlatDHCPManager
network_size = 256
network_topic = network
networks_path = /home/berrange/src/cloud/data/nova/networks
non_inheritable_image_properties =
bittorrent
cache_in_nova
notification_driver =
notification_topics =
notifications
notify_api_faults = False
notify_on_state_change = None
novncproxy_base_url = http://10.33.8.112:6080/vnc_auto.html
null_kernel = nokernel
num_networks = 1
osapi_compute_listen = 0.0.0.0
osapi_compute_listen_port = 8774
osapi_compute_workers = 2
ovs_vsctl_timeout = 120
password_length = 12
pci_alias =
pci_passthrough_whitelist =
periodic_enable = True
periodic_fuzzy_delay = 60
policy_default_rule = default
policy_dirs =
policy.d
policy_file = policy.json
preallocate_images = none
project_cert_subject = /C=US/ST=California/O=OpenStack/OU=NovaDev/CN=project-ca-%.16s-%s
public_interface = br100
publish_errors = False
pybasedir = /home/berrange/src/cloud/nova
qpid_heartbeat = 60
qpid_hostname = 10.33.8.112
qpid_hosts =
10.33.8.112:5672
qpid_password = ***
qpid_port = 5672
qpid_protocol = tcp
qpid_receiver_capacity = 1
qpid_sasl_mechanisms =
qpid_tcp_nodelay = True
qpid_topology_version = 1
qpid_username =
quota_cores = 20
quota_driver = nova.quota.DbQuotaDriver
quota_fixed_ips = -1
quota_floating_ips = 10
quota_injected_file_content_bytes = 10240
quota_injected_file_path_length = 255
quota_injected_files = 5
quota_instances = 10
quota_key_pairs = 100
quota_metadata_items = 128
quota_ram = 51200
quota_security_group_rules = 20
quota_security_groups = 10
quota_server_group_members = 10
quota_server_groups = 10
reboot_timeout = 0
reclaim_instance_interval = 0
remove_unused_base_images = True
remove_unused_original_minimum_age_seconds = 86400
report_interval = 10
rescue_timeout = 0
reservation_expire = 86400
reserved_host_disk_mb = 0
reserved_host_memory_mb = 512
resize_confirm_window = 0
resize_fs_using_block_device = False
resume_guests_state_on_host_boot = False
rootwrap_config = /etc/nova/rootwrap.conf
routing_source_ip = 10.33.8.112
rpc_backend = qpid
rpc_conn_pool_size = 30
rpc_response_timeout = 60
rpc_thread_pool_size = 64
run_external_periodic_tasks = True
running_deleted_instance_action = reap
running_deleted_instance_poll_interval = 1800
running_deleted_instance_timeout = 0
scheduler_available_filters =
nova.scheduler.filters.all_filters
scheduler_default_filters =
AvailabilityZoneFilter
ComputeCapabilitiesFilter
ComputeFilter
ImagePropertiesFilter
RamFilter
RetryFilter
ServerGroupAffinityFilter
ServerGroupAntiAffinityFilter
scheduler_manager = nova.scheduler.manager.SchedulerManager
scheduler_max_attempts = 3
scheduler_topic = scheduler
scheduler_weight_classes =
nova.scheduler.weights.all_weighers
security_group_api = nova
send_arp_for_ha = True
send_arp_for_ha_count = 3
service_down_time = 60
servicegroup_driver = db
share_dhcp_address = False
shelved_offload_time = 0
shelved_poll_interval = 3600
shutdown_timeout = 60
snapshot_name_template = snapshot-%s
ssl_ca_file = None
ssl_cert_file = None
ssl_key_file = None
state_path = /home/berrange/src/cloud/data/nova
sync_power_state_interval = 600
syslog-log-facility = LOG_USER
tcp_keepidle = 600
teardown_unused_network_gateway = False
tempdir = None
transport_url = None
until_refresh = 0
update_dns_entries = False
use-syslog = False
use-syslog-rfc-format = False
use_cow_images = True
use_forwarded_for = False
use_ipv6 = False
use_network_dns_servers = False
use_project_ca = False
use_single_default_gateway = False
use_stderr = True
user_cert_subject = /C=US/ST=California/O=OpenStack/OU=NovaDev/CN=%.16s-%.16s-%s
vcpu_pin_set = None
vendordata_driver = nova.api.metadata.vendordata_json.JsonFileVendorData
verbose = True
vif_plugging_is_fatal = True
vif_plugging_timeout = 300
virt_mkfs =
vlan_interface =
vlan_start = 100
vnc_enabled = True
vnc_keymap = en-us
vncserver_listen = 127.0.0.1
vncserver_proxyclient_address = 127.0.0.1
volume_api_class = nova.volume.cinder.API
volume_usage_poll_interval = 0
vpn_flavor = m1.tiny
vpn_image_id = 0
vpn_ip = 10.33.8.112
vpn_key_suffix = -vpn
vpn_start = 1000
wsgi_default_pool_size = 1000
wsgi_keep_alive = True
wsgi_log_format = %(client_ip)s "%(request_line)s" status: %(status_code)s len: %(body_length)s time: %(wall_seconds).7f
xvpvncproxy_base_url = http://10.33.8.112:6081/console
ephemeral_storage_encryption:
cipher = aes-xts-plain64
enabled = False
key_size = 512
glance:
allowed_direct_url_schemes =
api_insecure = False
api_servers =
  http://10.33.8.112:9292
host = 10.33.8.112
num_retries = 0
port = 9292
protocol = http
guestfs:
debug = False
keymgr:
api_class = nova.keymgr.conf_key_mgr.ConfKeyManager
libvirt:
block_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED, VIR_MIGRATE_NON_SHARED_INC
checksum_base_images = False
checksum_interval_seconds = 3600
connection_uri =
cpu_mode = none
cpu_model = None
disk_cachemodes =
disk_prefix = None
gid_maps =
glusterfs_mount_point_base = /home/berrange/src/cloud/data/nova/mnt
hw_disk_discard = None
hw_machine_type = None
image_info_filename_pattern = /home/berrange/src/cloud/data/nova/instances/_base/%(image)s.info
images_rbd_ceph_conf =
images_rbd_pool = rbd
images_type = default
images_volume_group = None
inject_key = False
inject_partition = -2
inject_password = False
iscsi_iface = None
iscsi_use_multipath = False
iser_use_multipath = False
live_migration_bandwidth = 0
live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED
live_migration_uri = qemu+ssh://berrange@%s/system
mem_stats_period_seconds = 10
nfs_mount_options = None
nfs_mount_point_base = /home/berrange/src/cloud/data/nova/mnt
num_aoe_discover_tries = 3
num_iscsi_scan_tries = 5
num_iser_scan_tries = 5
qemu_allowed_storage_drivers =
quobyte_client_cfg = None
quobyte_mount_point_base = /home/berrange/src/cloud/data/nova/mnt
rbd_secret_uuid = None
rbd_user = None
remove_unused_kernels = False
remove_unused_resized_minimum_age_seconds = 3600
rescue_image_id = None
rescue_kernel_id = None
rescue_ramdisk_id = None
rng_dev_path = None
scality_sofs_config = None
scality_sofs_mount_point = /home/berrange/src/cloud/data/nova/scality
smbfs_mount_options =
smbfs_mount_point_base = /home/berrange/src/cloud/data/nova/mnt
snapshot_compression = False
snapshot_image_format = None
snapshots_directory = /home/berrange/src/cloud/data/nova/instances/snapshots
sparse_logical_volumes = False
sysinfo_serial = auto
uid_maps =
use_usb_tablet = False
use_virtio_for_bridges = True
virt_type = kvm
volume_clear = zero
volume_clear_size = 0
volume_drivers =
  aoe=nova.virt.libvirt.volume.LibvirtAOEVolumeDriver
  fake=nova.virt.libvirt.volume.LibvirtFakeVolumeDriver
  fibre_channel=nova.virt.libvirt.volume.LibvirtFibreChannelVolumeDriver
  glusterfs=nova.virt.libvirt.volume.LibvirtGlusterfsVolumeDriver
  gpfs=nova.virt.libvirt.volume.LibvirtGPFSVolumeDriver
  iscsi=nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver
  iser=nova.virt.libvirt.volume.LibvirtISERVolumeDriver
  local=nova.virt.libvirt.volume.LibvirtVolumeDriver
  nfs=nova.virt.libvirt.volume.LibvirtNFSVolumeDriver
  quobyte=nova.virt.libvirt.volume.LibvirtQuobyteVolumeDriver
  rbd=nova.virt.libvirt.volume.LibvirtNetVolumeDriver
  scality=nova.virt.libvirt.volume.LibvirtScalityVolumeDriver
  sheepdog=nova.virt.libvirt.volume.LibvirtNetVolumeDriver
  smbfs=nova.virt.libvirt.volume.LibvirtSMBFSVolumeDriver
wait_soft_reboot_seconds = 120
xen_hvmloader_path = /usr/lib/xen/boot/hvmloader
osapi_v3:
enabled = True
extensions_blacklist =
extensions_whitelist =
oslo_concurrency:
disable_process_locking = False
lock_path = /home/berrange/src/cloud/data/nova
rdp:
enabled = False
html5_proxy_base_url = http://127.0.0.1:6083/
remote_debug:
host = None
port = None
serial_console:
base_url = ws://127.0.0.1:6083/
enabled = False
listen = 127.0.0.1
port_range = 10000:20000
proxyclient_address = 127.0.0.1
spice:
agent_enabled = True
enabled = False
html5proxy_base_url = http://10.33.8.112:6082/spice_auto.html
keymap = en-us
server_listen = 127.0.0.1
server_proxyclient_address = 127.0.0.1
ssl:
ca_file = None
cert_file = None
key_file = None
upgrade_levels:
baseapi = None
cells = None
compute = None
conductor = None
console = None
consoleauth = None
network = None
scheduler = None
workarounds:
disable_libvirt_livesnapshot = True
disable_rootwrap = False