One of the issues encountered when debugging libvirt guest problems with Nova is that it isn’t always entirely obvious why the guest XML is configured the way it is. For a while now, libvirt has had the ability to record arbitrary application-specific metadata in the guest XML. Each application simply declares the XML namespace it wishes to use and can then record whatever it wants; libvirt treats this metadata as a black box, never attempting to interpret or modify it. In the Juno release I worked on a blueprint to make use of this feature to record some interesting information about Nova.
The initial set of information recorded is as follows:
- Version – the Nova version number, and any vendor specific package suffix (eg RPM release number). This is useful as the user reporting a bug is often not entirely clear which particular RPM version was installed when the guest was first booted.
- Name – the Nova instance display name. While you can correlate Nova instances to libvirt guests using the UUID, users reporting bugs often only tell you the display name. So recording this in the XML is handy for working out which XML config corresponds to which Nova guest they’re talking about.
- Creation time – the time at which Nova booted the guest. Sometimes useful when trying to understand the sequence in which things happened.
- Flavour – the Nova flavour name, memory, disk, swap, ephemeral and vcpus settings. Flavours can be changed by the admin after a guest is booted, so having the original values recorded against the guest XML is again handy.
- Owner – the tenant user ID and name, as well as their project ID and name
- Root image – the glance image ID, if the guest was booted from an image
The Nova version number information in particular has already proved very useful in a couple of support tickets, showing that the VM instance was not booted under the software version that was initially claimed. There is still scope for augmenting this information further though. When working on another support issue it would have been handy to know the image properties and flavour extra specs that were set, as the user’s bug report also gave misleading / incorrect information in this area. Information about cinder block devices would also be useful to have access to, for cases where the guest isn’t booting from an image.
While all this info is technically available from the Nova database, it is far easier (and less dangerous) to ask the user to provide the libvirt XML configuration than to have them run random SQL queries. Standard OS troubleshooting tools such as sosreport from RHEL/Fedora already collect the libvirt guest XML when run. As a result, the bug report is more likely to contain this useful data in the initial filing, avoiding the need to ask the user to collect further data after the fact.
To give an example of what the data looks like, a Nova guest booted with
$ nova boot --image cirros-0.3.0-x86_64-uec --flavor m1.tiny vm1
gets the following data recorded:
$ virsh -c qemu:///system dumpxml instance-00000001
<domain type='kvm' id='2'>
  <name>instance-00000001</name>
  <uuid>d0e51bbd-cbbd-4abc-8f8c-dee2f23ded12</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="2015.1"/>
      <nova:name>vm1</nova:name>
      <nova:creationTime>2015-02-19 18:23:44</nova:creationTime>
      <nova:flavor name="m1.tiny">
        <nova:memory>512</nova:memory>
        <nova:disk>1</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="ef53a6031fc643f2af7add439ece7e9d">admin</nova:user>
        <nova:project uuid="60a60883d7de429aa45f8f9d689c1fd6">demo</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="2344a0fc-a34b-4e2d-888e-01db795fc89a"/>
    </nova:instance>
  </metadata>
  ...snip...
</domain>
The intention is that as long as the XML namespace URI (http://openstack.org/xmlns/libvirt/nova/1.0) isn’t changed, the data reported here will not change in a backwards-incompatible manner. IOW, we will add further elements or attributes to the Nova metadata, but not change or remove existing elements or attributes. So if OpenStack-related troubleshooting / debugging tools want to extract this data from the libvirt XML, they can be reasonably well assured of compatibility across future Nova releases.
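To illustrate how a troubleshooting tool might consume this data, here is a minimal sketch that pulls the Nova metadata out of a guest XML dump with ElementTree. The namespace URI matches the one above; the file name and the particular fields extracted are just examples, and a real tool would harvest whatever it needs.

# Minimal sketch: extract Nova metadata from a libvirt guest XML dump,
# e.g. one captured with "virsh dumpxml instance-00000001"
import xml.etree.ElementTree as ET

NOVA = '{http://openstack.org/xmlns/libvirt/nova/1.0}'

def nova_metadata(xml_path):
    root = ET.parse(xml_path).getroot()
    inst = root.find('./metadata/%sinstance' % NOVA)
    if inst is None:
        return None  # guest not booted by Nova, or predates this feature
    return {
        'package': inst.find(NOVA + 'package').get('version'),
        'name': inst.find(NOVA + 'name').text,
        'created': inst.find(NOVA + 'creationTime').text,
        'flavor': inst.find(NOVA + 'flavor').get('name'),
    }

print(nova_metadata('instance-00000001.xml'))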
In the Kilo development cycle there have been patches submitted to record a similar kind of data for VMware guests, though obviously using a config data format relevant to VMware rather than XML. Sadly this useful debugging improvement for VMware had its feature freeze exception request rejected, pushing it out to Liberty, which is rather a shame :-(
This blog describes an error reporting / troubleshooting feature added to Nova a while back which people are probably not generally aware of.
One of the early things I worked on in the Nova libvirt driver was integration of support for the libvirt event notification system. This allowed Nova to get notified when instances are shutdown by the guest OS administrator, instead of having to wait for the background power state sync task to run. Supporting libvirt events was a theoretically simple concept, but in practice there was a nasty surprise. The key issue was that we needed to have a native thread running the libvirt event loop, while the rest of Nova uses green threads. The code kept getting bizarre deadlocks, which were eventually traced to use of the python logging APIs in the native thread. Since eventlet had monkeypatched the thread mutex primitives, the logging APIs couldn’t be safely called from the native thread as they’d try to obtain a green mutex from a native thread context.
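To make the failure mode concrete, here is a minimal illustrative sketch of the pattern (not the actual Nova code): a genuinely native thread, obtained via eventlet’s unpatched threading module, calling into logging code whose locks have been green-patched.

# Illustrative sketch only, assuming eventlet is installed
import eventlet
eventlet.monkey_patch()   # thread / mutex primitives are now green

import logging
from eventlet import patcher

# Grab the *unpatched* threading module so this is a real OS thread,
# similar to how Nova runs the libvirt default event loop
native_threading = patcher.original('threading')

def event_loop():
    # In Nova this would be libvirt.virEventRunDefaultImpl(); any logging
    # call made from this native thread has to take a green mutex from
    # outside the eventlet hub, which is where the bizarre deadlocks came from
    logging.getLogger('demo').debug('tick')

t = native_threading.Thread(target=event_loop)
t.daemon = True
t.start()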
Eventlet has a concept of a backdoor port, which lets you connect to the python process using telnet and get an interactive python prompt. After turning this on, I got a stack trace of all green and native threads and eventually figured out the problem, which was great. Sadly the eventlet backdoor is not something anyone sane would ever enable out of the box on production systems – you don’t really want to allow remote command execution to anyone who can connect to a TCP port :-) Another debugging option is to use Python’s native debugger, but this is again something you have to enable ahead of time and won’t be something admins enable out of the box on production systems. It is possible to connect to a running python process with GDB and get a crude stack trace, but that’s not great either as it requires python debuginfo libraries to be installed. It would be possible to build an administrative debugging API for Nova using the REST API, but that only works if the REST API & message bus are working – not something that’s going to be much use when Nova itself has deadlocked the entire python interpreter.
After this debugging experience I decided to propose something that I’ve had on previous complex systems: a facility that allows an admin to trigger a live error report. Crucially this facility must not require any kind of deployment setup tasks and thus be guaranteed available at all times, especially on production systems where debugging options are limited by the need to maintain service to users. I called the proposal the “Guru Meditation Report” in reference to old Amiga crash reports. I did a quick proof of concept to test the idea, but Solly Ross turned the idea into a complete feature for OpenStack, adding it to the Oslo Incubator in the openstack.common.reports namespace and integrating it with Nova. This shipped with the Icehouse release of Nova.
Service integration & usage
Integration into projects is quite straightforward: the openstack-common.conf file needs to list the relevant modules to import from oslo-incubator
$ grep report openstack-common.conf
module=report
module=report.generators
module=report.models
module=report.views
module=report.views.json
module=report.views.text
module=report.views.xml
then each service process needs to initialize the error report system. This just requires a single import line and a method call from the main method
$ cat nova/cmd/nova-compute
...snip...
from nova.openstack.common.report import guru_meditation_report as gmr
...snip...
def main():
    gmr.TextGuruMeditation.setup_autorun(version)
    ...run eventlet service or whatever...
The setup_autorun method installs a signal handler connected to SIGUSR1 which will dump an error report to stderr when triggered.
So from Icehouse onwards, if any Nova process is mis-behaving you can simply run something like
$ kill -USR1 `pgrep nova-compute`
to get a detailed error report of the process state sent to stderr. On RHEL-7 / Fedora systems this data should end up in the systemd journal for that service. On other systems you may be lucky enough for the init service to have redirected stderr to a log file, or unlucky enough to have it sent to /dev/null. Did I mention that the systemd journal is a really excellent feature when troubleshooting service problems :-)
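For example, on a systemd based distro the resulting report can be read back with journalctl; the exact unit name depends on your packaging, the one below being what RHEL/Fedora typically uses.

$ journalctl -u openstack-nova-compute.service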
Error report information
In the oslo-incubator code there are 5 standard sections defined for the error report
- Config – dump of all configuration settings loaded by oslo.config – useful because the config settings loaded in memory don’t necessarily match what is currently stored in /etc/nova/nova.conf on disk – eg admin may have modified the config and forgotten to reload the services.
- Package – information about the software package, as passed into the setup_autorun method previously. This lets you know the openstack release version number, the vendor who packaged it and any vendor specific version info such as the RPM release number. This is again key, because what’s installed on the host currently may not match the version that’s actually running. You can’t always trust the admins to give you correct info when reporting bugs, so having software report itself is more reliable.
- Process – information about the running process including the process ID, parent process ID, user ID, group ID and scheduler state.
- Green Threads – stack trace of every eventlet green thread currently in existence
- Native Threads – stack trace of every native thread currently in existence
The report framework is modular, so it is possible to register new generator functions which add further data to the error report. This is handy if there is application-specific data worth including that would not otherwise be suitable for inclusion in oslo-incubator directly. The data model is separated from the output formatting code, so it is possible to output the report in a number of different data formats. The reports which get sent to stderr use a crude plain text format, but it is possible to have reports generated in XML, JSON, or another completely custom format.
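As a rough illustration of that extensibility, the sketch below registers an extra section. The module paths follow the oslo-incubator layout used earlier and the class / method names are indicative of the reports framework, but treat the exact signatures as an assumption and check the release you are running.

# Hedged sketch: adding an application specific section to the report
from nova.openstack.common.report import guru_meditation_report as gmr
from nova.openstack.common.report.models import with_default_views as mwdv

def hypervisor_section_generator():
    # Return a simple key/value model; the framework renders it with the
    # text, JSON or XML view matching the report being produced
    return mwdv.ModelWithDefaultViews(data={
        'driver': 'libvirt',
        'event_loop_thread_alive': True,
    })

# Attach the generator so every future report includes the new section
gmr.TextGuruMeditation.register_section('Hypervisor State',
                                        hypervisor_section_generator)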
Future improvements
Triggering from a UNIX signal and printing to stderr is a very simple and reliable approach that we can guarantee will almost always work, no matter what operational state the OpenStack deployment as a whole is in. It should not be considered the only possible approach though. I can see that it may be desirable to also wire this up to the RPC messaging bus, so a cloud admin can remotely generate an error report for a service and get the response back over the message bus in an XML or JSON format. This wouldn’t replace the SIGUSR1 based stderr dumps, but rather augment them, as we want to retain the ability to trigger reports even if the rabbitmq bus connection is dead/broken for some reason.
AFAIK, this error report system is only wired up into the Nova project at this time. It is desirable to bring this feature over to projects like Neutron, Cinder, Glance and Keystone too, so it can be considered an OpenStack-wide standard system for admins to collect data for troubleshooting. As explained above, this is no more difficult than adding the modules to openstack-common.conf and then adding a single method call to the service startup main method. Those projects might like to register extra error report sections to provide further data, but that’s by no means required for initial integration.
Having error reports triggered on demand by the admin is nice, but I think there is also value in having error reports triggered automatically in response to unexpected error conditions. For example if a RPC request to boot a new VM instance fails, it could be desirable to save a detailed error report, rather than just having an exception hit the logs with no context around it. In such a scenario you would extend the error report generator so that the report included the exception information & stack trace, and also include the headers and/or payload of the RPC request that failed. The error report would probably be written to a file instead of stderr, using JSON or XML. Tools could then be written to analyse error reports and identify commonly recurring problems.
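A purely hypothetical sketch of that idea follows; nothing like this exists in Nova today, and the helper name and file layout are invented purely for illustration.

# Hypothetical auto-triggered report on RPC failure; not an existing
# Nova or oslo API, just an illustration of the idea described above
import datetime
import json
import traceback

def write_error_report(rpc_method, rpc_payload,
                       report_dir='/var/log/nova/reports'):
    report = {
        'time': datetime.datetime.utcnow().isoformat(),
        'rpc_method': rpc_method,
        'rpc_payload': rpc_payload,
        'traceback': traceback.format_exc(),
        # a real implementation would also pull in the standard sections
        # (config, threads, ...) via the report framework above
    }
    path = '%s/error-%s.json' % (report_dir, report['time'])
    with open(path, 'w') as f:
        json.dump(report, f, indent=2)
    return path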
Even with the current level of features for the error report system, it has proved its worth in facilitating the debugging of a number of problems in Nova, where things like the eventlet backdoor or python debugger were impractical to use. I look forward to its continued development and broader usage across openstack.
Example report
What follows below is an example error report from the nova-compute service running on one of my development hosts. Notice that oslo.config configuration parameters that were declared with the ‘secret’ flag have their value masked. This is primarily aimed at preventing passwords making their way into the error report, since the expectation is that users may attach these reports to public bugs.
========================================================================
==== Guru Meditation ====
========================================================================
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
========================================================================
==== Package ====
========================================================================
product = OpenStack Nova
vendor = OpenStack Foundation
version = 2015.1
========================================================================
==== Threads ====
========================================================================
------ Thread #140157298652928 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157307045632 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157734876928 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158288500480 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158416287488 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158424680192 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/home/berrange/src/cloud/nova/nova/virt/libvirt/host.py:113 in _native_thread
`libvirt.virEventRunDefaultImpl()`
/usr/lib64/python2.7/site-packages/libvirt.py:340 in virEventRunDefaultImpl
`ret = libvirtmod.virEventRunDefaultImpl()`
------ Thread #140157684520704 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158296893184 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158305285888 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157709698816 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158322071296 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157726484224 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158330464000 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157332223744 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157701306112 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157692913408 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157315438336 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158537955136 ------
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:346 in run
`self.wait(sleep_time)`
/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py:85 in wait
`presult = self.do_poll(seconds)`
/usr/lib/python2.7/site-packages/eventlet/hubs/epolls.py:62 in do_poll
`return self.poll.poll(seconds)`
------ Thread #140158313678592 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157718091520 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140157323831040 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
------ Thread #140158338856704 ------
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/eventlet/tpool.py:72 in tworker
`msg = _reqq.get()`
/usr/lib64/python2.7/Queue.py:168 in get
`self.not_empty.wait()`
/usr/lib64/python2.7/threading.py:339 in wait
`waiter.acquire()`
========================================================================
==== Green Threads ====
========================================================================
------ Green Thread ------
/usr/bin/nova-compute:9 in <module>
`load_entry_point('nova==2015.1.dev352', 'console_scripts', 'nova-compute')()`
/home/berrange/src/cloud/nova/nova/cmd/compute.py:74 in main
`service.wait()`
/home/berrange/src/cloud/nova/nova/service.py:444 in wait
`_launcher.wait()`
/home/berrange/src/cloud/nova/nova/openstack/common/service.py:187 in wait
`status, signo = self._wait_for_exit_or_signal(ready_callback)`
/home/berrange/src/cloud/nova/nova/openstack/common/service.py:170 in _wait_for_exit_or_signal
`super(ServiceLauncher, self).wait()`
/home/berrange/src/cloud/nova/nova/openstack/common/service.py:133 in wait
`self.services.wait()`
/home/berrange/src/cloud/nova/nova/openstack/common/service.py:473 in wait
`self.tg.wait()`
/home/berrange/src/cloud/nova/nova/openstack/common/threadgroup.py:145 in wait
`x.wait()`
/home/berrange/src/cloud/nova/nova/openstack/common/threadgroup.py:47 in wait
`return self.thread.wait()`
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:175 in wait
`return self._exit_event.wait()`
/usr/lib/python2.7/site-packages/eventlet/event.py:121 in wait
`return hubs.get_hub().switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
No Traceback!
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/green/thread.py:40 in __thread_body
`func(*args, **kwargs)`
/usr/lib64/python2.7/threading.py:784 in __bootstrap
`self.__bootstrap_inner()`
/usr/lib64/python2.7/threading.py:811 in __bootstrap_inner
`self.run()`
/usr/lib64/python2.7/threading.py:764 in run
`self.__target(*self.__args, **self.__kwargs)`
/usr/lib/python2.7/site-packages/qpid/selector.py:126 in run
`rd, wr, ex = select(self.reading, self.writing, (), timeout)`
/usr/lib/python2.7/site-packages/eventlet/green/select.py:83 in select
`return hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:214 in main
`result = function(*args, **kwargs)`
/home/berrange/src/cloud/nova/nova/openstack/common/service.py:492 in run_service
`done.wait()`
/usr/lib/python2.7/site-packages/eventlet/event.py:121 in wait
`return hubs.get_hub().switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:214 in main
`result = function(*args, **kwargs)`
/home/berrange/src/cloud/nova/nova/virt/libvirt/host.py:124 in _dispatch_thread
`self._dispatch_events()`
/home/berrange/src/cloud/nova/nova/virt/libvirt/host.py:228 in _dispatch_events
`_c = self._event_notify_recv.read(1)`
/usr/lib64/python2.7/socket.py:380 in read
`data = self._sock.recv(left)`
/usr/lib/python2.7/site-packages/eventlet/greenio.py:464 in recv
`self._trampoline(self, read=True)`
/usr/lib/python2.7/site-packages/eventlet/greenio.py:439 in _trampoline
`mark_as_closed=self._mark_as_closed)`
/usr/lib/python2.7/site-packages/eventlet/hubs/__init__.py:162 in trampoline
`return hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/tpool.py:55 in tpool_trampoline
`_c = _rsock.recv(1)`
/usr/lib/python2.7/site-packages/eventlet/greenio.py:325 in recv
`timeout_exc=socket.timeout("timed out"))`
/usr/lib/python2.7/site-packages/eventlet/greenio.py:200 in _trampoline
`mark_as_closed=self._mark_as_closed)`
/usr/lib/python2.7/site-packages/eventlet/hubs/__init__.py:162 in trampoline
`return hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:214 in main
`result = function(*args, **kwargs)`
/usr/lib/python2.7/site-packages/oslo_utils/excutils.py:92 in inner_func
`return infunc(*args, **kwargs)`
/usr/lib/python2.7/site-packages/oslo_messaging/_executors/impl_eventlet.py:93 in _executor_thread
`incoming = self.listener.poll()`
/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:121 in poll
`self.conn.consume(limit=1, timeout=timeout)`
/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_qpid.py:755 in consume
`six.next(it)`
/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_qpid.py:685 in iterconsume
`yield self.ensure(_error_callback, _consume)`
/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_qpid.py:602 in ensure
`return method()`
/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_qpid.py:669 in _consume
`timeout=poll_timeout)`
<string>:6 in next_receiver
(source not found)
/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py:674 in next_receiver
`if self._ecwait(lambda: self.incoming, timeout):`
/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py:50 in _ecwait
`result = self._ewait(lambda: self.closed or predicate(), timeout)`
/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py:580 in _ewait
`result = self.connection._ewait(lambda: self.error or predicate(), timeout)`
/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py:218 in _ewait
`result = self._wait(lambda: self.error or predicate(), timeout)`
/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py:197 in _wait
`return self._waiter.wait(predicate, timeout=timeout)`
/usr/lib/python2.7/site-packages/qpid/concurrency.py:59 in wait
`self.condition.wait(timeout - passed)`
/usr/lib/python2.7/site-packages/qpid/concurrency.py:96 in wait
`sw.wait(timeout)`
/usr/lib/python2.7/site-packages/qpid/compat.py:53 in wait
`ready, _, _ = select([self], [], [], timeout)`
/usr/lib/python2.7/site-packages/eventlet/green/select.py:83 in select
`return hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/home/berrange/src/cloud/nova/nova/openstack/common/loopingcall.py:90 in _inner
`greenthread.sleep(-delay if delay < 0 else 0)`
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:34 in sleep
`hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
------ Green Thread ------
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:214 in main
`result = function(*args, **kwargs)`
/home/berrange/src/cloud/nova/nova/openstack/common/loopingcall.py:133 in _inner
`greenthread.sleep(idle)`
/usr/lib/python2.7/site-packages/eventlet/greenthread.py:34 in sleep
`hub.switch()`
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py:294 in switch
`return self.greenlet.switch()`
========================================================================
==== Processes ====
========================================================================
Process 27760 (under 27711) [ run by: berrange (1000), state: running ]
========================================================================
==== Configuration ====
========================================================================
cells:
bandwidth_update_interval = 600
call_timeout = 60
capabilities =
hypervisor=xenserver;kvm
os=linux;windows
cell_type = compute
enable = False
manager = nova.cells.manager.CellsManager
mute_child_interval = 300
name = nova
reserve_percent = 10.0
topic = cells
cinder:
cafile = None
catalog_info = volumev2:cinderv2:publicURL
certfile = None
cross_az_attach = True
endpoint_template = None
http_retries = 3
insecure = False
keyfile = None
os_region_name = None
timeout = None
conductor:
manager = nova.conductor.manager.ConductorManager
topic = conductor
use_local = False
workers = None
database:
backend = sqlalchemy
connection = ***
connection_debug = 0
connection_trace = False
db_inc_retry_interval = True
db_max_retries = 20
db_max_retry_interval = 10
db_retry_interval = 1
idle_timeout = 3600
max_overflow = None
max_pool_size = None
max_retries = 10
min_pool_size = 1
mysql_sql_mode = TRADITIONAL
pool_timeout = None
retry_interval = 10
slave_connection = ***
sqlite_db = nova.sqlite
sqlite_synchronous = True
use_db_reconnect = False
use_tpool = False
default:
allow_migrate_to_same_host = True
allow_resize_to_same_host = True
allow_same_net_traffic = True
amqp_auto_delete = False
amqp_durable_queues = False
api_paste_config = /etc/nova/api-paste.ini
api_rate_limit = False
auth_strategy = keystone
auto_assign_floating_ip = False
backdoor_port = None
bandwidth_poll_interval = 600
bindir = /usr/bin
block_device_allocate_retries = 60
block_device_allocate_retries_interval = 3
boot_script_template = /home/berrange/src/cloud/nova/nova/cloudpipe/bootscript.template
ca_file = cacert.pem
ca_path = /home/berrange/src/cloud/data/nova/CA
cert_manager = nova.cert.manager.CertManager
client_socket_timeout = 900
cnt_vpn_clients = 0
compute_available_monitors =
nova.compute.monitors.all_monitors
compute_driver = libvirt.LibvirtDriver
compute_manager = nova.compute.manager.ComputeManager
compute_monitors =
compute_resources =
vcpu
compute_stats_class = nova.compute.stats.Stats
compute_topic = compute
config-dir = None
config-file =
/etc/nova/nova.conf
config_drive_format = iso9660
config_drive_skip_versions = 1.0 2007-01-19 2007-03-01 2007-08-29 2007-10-10 2007-12-15 2008-02-01 2008-09-01
console_host = mustard.gsslab.fab.redhat.com
console_manager = nova.console.manager.ConsoleProxyManager
console_topic = console
consoleauth_manager = nova.consoleauth.manager.ConsoleAuthManager
consoleauth_topic = consoleauth
control_exchange = nova
create_unique_mac_address_attempts = 5
crl_file = crl.pem
db_driver = nova.db
debug = True
default_access_ip_network_name = None
default_availability_zone = nova
default_ephemeral_format = ext4
default_flavor = m1.small
default_floating_pool = public
default_log_levels =
amqp=WARN
amqplib=WARN
boto=WARN
glanceclient=WARN
iso8601=WARN
keystonemiddleware=WARN
oslo.messaging=INFO
qpid=WARN
requests.packages.urllib3.connectionpool=WARN
routes.middleware=WARN
sqlalchemy=WARN
stevedore=WARN
suds=INFO
urllib3.connectionpool=WARN
websocket=WARN
default_notification_level = INFO
default_publisher_id = None
default_schedule_zone = None
defer_iptables_apply = False
dhcp_domain = novalocal
dhcp_lease_time = 86400
dhcpbridge = /usr/bin/nova-dhcpbridge
dhcpbridge_flagfile =
/etc/nova/nova.conf
dmz_cidr =
dmz_mask = 255.255.255.0
dmz_net = 10.0.0.0
dns_server =
dns_update_periodic_interval = -1
dnsmasq_config_file =
ebtables_exec_attempts = 3
ebtables_retry_interval = 1.0
ec2_listen = 0.0.0.0
ec2_listen_port = 8773
ec2_private_dns_show_ip = False
ec2_strict_validation = True
ec2_timestamp_expiry = 300
ec2_workers = 2
enable_new_services = True
enabled_apis =
ec2
metadata
osapi_compute
enabled_ssl_apis =
fake_call = False
fake_network = False
fatal_deprecations = False
fatal_exception_format_errors = False
firewall_driver = nova.virt.libvirt.firewall.IptablesFirewallDriver
fixed_ip_disassociate_timeout = 600
fixed_range_v6 = fd00::/48
flat_injected = False
flat_interface = p1p1
flat_network_bridge = br100
flat_network_dns = 8.8.4.4
floating_ip_dns_manager = nova.network.noop_dns_driver.NoopDNSDriver
force_config_drive = always
force_dhcp_release = True
force_raw_images = True
force_snat_range =
forward_bridge_interface =
all
gateway = None
gateway_v6 = None
heal_instance_info_cache_interval = 60
host = mustard.gsslab.fab.redhat.com
image_cache_manager_interval = 2400
image_cache_subdirectory_name = _base
injected_network_template = /home/berrange/src/cloud/nova/nova/virt/interfaces.template
instance_build_timeout = 0
instance_delete_interval = 300
instance_dns_domain =
instance_dns_manager = nova.network.noop_dns_driver.NoopDNSDriver
instance_format = [instance: %(uuid)s]
instance_name_template = instance-%08x
instance_usage_audit = False
instance_usage_audit_period = month
instance_uuid_format = [instance: %(uuid)s]
instances_path = /home/berrange/src/cloud/data/nova/instances
internal_service_availability_zone = internal
iptables_bottom_regex =
iptables_drop_action = DROP
iptables_top_regex =
ipv6_backend = rfc2462
key_file = private/cakey.pem
keys_path = /home/berrange/src/cloud/data/nova/keys
keystone_ec2_insecure = False
keystone_ec2_url = http://10.33.8.112:5000/v2.0/ec2tokens
l3_lib = nova.network.l3.LinuxNetL3
linuxnet_interface_driver = nova.network.linux_net.LinuxBridgeInterfaceDriver
linuxnet_ovs_integration_bridge = br-int
live_migration_retry_count = 30
lockout_attempts = 5
lockout_minutes = 15
lockout_window = 15
log-config-append = None
log-date-format = %Y-%m-%d %H:%M:%S
log-dir = None
log-file = None
log-format = None
logging_context_format_string = %(asctime)s.%(msecs)03d %(color)s%(levelname)s %(name)s [%(request_id)s %(user_name)s %(project_name)s%(color)s] %(instance)s%(color)s%(message)s
logging_debug_format_suffix = from (pid=%(process)d) %(funcName)s %(pathname)s:%(lineno)d
logging_default_format_string = %(asctime)s.%(msecs)03d %(color)s%(levelname)s %(name)s [-%(color)s] %(instance)s%(color)s%(message)s
logging_exception_prefix = %(color)s%(asctime)s.%(msecs)03d TRACE %(name)s %(instance)s
max_age = 0
max_concurrent_builds = 10
max_header_line = 16384
max_local_block_devices = 3
maximum_instance_delete_attempts = 5
memcached_servers = None
metadata_host = 10.33.8.112
metadata_listen = 0.0.0.0
metadata_listen_port = 8775
metadata_manager = nova.api.manager.MetadataManager
metadata_port = 8775
metadata_workers = 2
migrate_max_retries = -1
mkisofs_cmd = genisoimage
monkey_patch = False
monkey_patch_modules =
nova.api.ec2.cloud:nova.notifications.notify_decorator
nova.compute.api:nova.notifications.notify_decorator
multi_host = True
multi_instance_display_name_template = %(name)s-%(count)d
my_block_storage_ip = 10.33.8.112
my_ip = 10.33.8.112
network_allocate_retries = 0
network_api_class = nova.network.api.API
network_device_mtu = None
network_driver = nova.network.linux_net
network_manager = nova.network.manager.FlatDHCPManager
network_size = 256
network_topic = network
networks_path = /home/berrange/src/cloud/data/nova/networks
non_inheritable_image_properties =
bittorrent
cache_in_nova
notification_driver =
notification_topics =
notifications
notify_api_faults = False
notify_on_state_change = None
novncproxy_base_url = http://10.33.8.112:6080/vnc_auto.html
null_kernel = nokernel
num_networks = 1
osapi_compute_listen = 0.0.0.0
osapi_compute_listen_port = 8774
osapi_compute_workers = 2
ovs_vsctl_timeout = 120
password_length = 12
pci_alias =
pci_passthrough_whitelist =
periodic_enable = True
periodic_fuzzy_delay = 60
policy_default_rule = default
policy_dirs =
policy.d
policy_file = policy.json
preallocate_images = none
project_cert_subject = /C=US/ST=California/O=OpenStack/OU=NovaDev/CN=project-ca-%.16s-%s
public_interface = br100
publish_errors = False
pybasedir = /home/berrange/src/cloud/nova
qpid_heartbeat = 60
qpid_hostname = 10.33.8.112
qpid_hosts =
10.33.8.112:5672
qpid_password = ***
qpid_port = 5672
qpid_protocol = tcp
qpid_receiver_capacity = 1
qpid_sasl_mechanisms =
qpid_tcp_nodelay = True
qpid_topology_version = 1
qpid_username =
quota_cores = 20
quota_driver = nova.quota.DbQuotaDriver
quota_fixed_ips = -1
quota_floating_ips = 10
quota_injected_file_content_bytes = 10240
quota_injected_file_path_length = 255
quota_injected_files = 5
quota_instances = 10
quota_key_pairs = 100
quota_metadata_items = 128
quota_ram = 51200
quota_security_group_rules = 20
quota_security_groups = 10
quota_server_group_members = 10
quota_server_groups = 10
reboot_timeout = 0
reclaim_instance_interval = 0
remove_unused_base_images = True
remove_unused_original_minimum_age_seconds = 86400
report_interval = 10
rescue_timeout = 0
reservation_expire = 86400
reserved_host_disk_mb = 0
reserved_host_memory_mb = 512
resize_confirm_window = 0
resize_fs_using_block_device = False
resume_guests_state_on_host_boot = False
rootwrap_config = /etc/nova/rootwrap.conf
routing_source_ip = 10.33.8.112
rpc_backend = qpid
rpc_conn_pool_size = 30
rpc_response_timeout = 60
rpc_thread_pool_size = 64
run_external_periodic_tasks = True
running_deleted_instance_action = reap
running_deleted_instance_poll_interval = 1800
running_deleted_instance_timeout = 0
scheduler_available_filters =
nova.scheduler.filters.all_filters
scheduler_default_filters =
AvailabilityZoneFilter
ComputeCapabilitiesFilter
ComputeFilter
ImagePropertiesFilter
RamFilter
RetryFilter
ServerGroupAffinityFilter
ServerGroupAntiAffinityFilter
scheduler_manager = nova.scheduler.manager.SchedulerManager
scheduler_max_attempts = 3
scheduler_topic = scheduler
scheduler_weight_classes =
nova.scheduler.weights.all_weighers
security_group_api = nova
send_arp_for_ha = True
send_arp_for_ha_count = 3
service_down_time = 60
servicegroup_driver = db
share_dhcp_address = False
shelved_offload_time = 0
shelved_poll_interval = 3600
shutdown_timeout = 60
snapshot_name_template = snapshot-%s
ssl_ca_file = None
ssl_cert_file = None
ssl_key_file = None
state_path = /home/berrange/src/cloud/data/nova
sync_power_state_interval = 600
syslog-log-facility = LOG_USER
tcp_keepidle = 600
teardown_unused_network_gateway = False
tempdir = None
transport_url = None
until_refresh = 0
update_dns_entries = False
use-syslog = False
use-syslog-rfc-format = False
use_cow_images = True
use_forwarded_for = False
use_ipv6 = False
use_network_dns_servers = False
use_project_ca = False
use_single_default_gateway = False
use_stderr = True
user_cert_subject = /C=US/ST=California/O=OpenStack/OU=NovaDev/CN=%.16s-%.16s-%s
vcpu_pin_set = None
vendordata_driver = nova.api.metadata.vendordata_json.JsonFileVendorData
verbose = True
vif_plugging_is_fatal = True
vif_plugging_timeout = 300
virt_mkfs =
vlan_interface =
vlan_start = 100
vnc_enabled = True
vnc_keymap = en-us
vncserver_listen = 127.0.0.1
vncserver_proxyclient_address = 127.0.0.1
volume_api_class = nova.volume.cinder.API
volume_usage_poll_interval = 0
vpn_flavor = m1.tiny
vpn_image_id = 0
vpn_ip = 10.33.8.112
vpn_key_suffix = -vpn
vpn_start = 1000
wsgi_default_pool_size = 1000
wsgi_keep_alive = True
wsgi_log_format = %(client_ip)s "%(request_line)s" status: %(status_code)s len: %(body_length)s time: %(wall_seconds).7f
xvpvncproxy_base_url = http://10.33.8.112:6081/console
ephemeral_storage_encryption:
cipher = aes-xts-plain64
enabled = False
key_size = 512
glance:
allowed_direct_url_schemes =
api_insecure = False
api_servers =
http://10.33.8.112:9292
host = 10.33.8.112
num_retries = 0
port = 9292
protocol = http
guestfs:
debug = False
keymgr:
api_class = nova.keymgr.conf_key_mgr.ConfKeyManager
libvirt:
block_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED, VIR_MIGRATE_NON_SHARED_INC
checksum_base_images = False
checksum_interval_seconds = 3600
connection_uri =
cpu_mode = none
cpu_model = None
disk_cachemodes =
disk_prefix = None
gid_maps =
glusterfs_mount_point_base = /home/berrange/src/cloud/data/nova/mnt
hw_disk_discard = None
hw_machine_type = None
image_info_filename_pattern = /home/berrange/src/cloud/data/nova/instances/_base/%(image)s.info
images_rbd_ceph_conf =
images_rbd_pool = rbd
images_type = default
images_volume_group = None
inject_key = False
inject_partition = -2
inject_password = False
iscsi_iface = None
iscsi_use_multipath = False
iser_use_multipath = False
live_migration_bandwidth = 0
live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED
live_migration_uri = qemu+ssh://berrange@%s/system
mem_stats_period_seconds = 10
nfs_mount_options = None
nfs_mount_point_base = /home/berrange/src/cloud/data/nova/mnt
num_aoe_discover_tries = 3
num_iscsi_scan_tries = 5
num_iser_scan_tries = 5
qemu_allowed_storage_drivers =
quobyte_client_cfg = None
quobyte_mount_point_base = /home/berrange/src/cloud/data/nova/mnt
rbd_secret_uuid = None
rbd_user = None
remove_unused_kernels = False
remove_unused_resized_minimum_age_seconds = 3600
rescue_image_id = None
rescue_kernel_id = None
rescue_ramdisk_id = None
rng_dev_path = None
scality_sofs_config = None
scality_sofs_mount_point = /home/berrange/src/cloud/data/nova/scality
smbfs_mount_options =
smbfs_mount_point_base = /home/berrange/src/cloud/data/nova/mnt
snapshot_compression = False
snapshot_image_format = None
snapshots_directory = /home/berrange/src/cloud/data/nova/instances/snapshots
sparse_logical_volumes = False
sysinfo_serial = auto
uid_maps =
use_usb_tablet = False
use_virtio_for_bridges = True
virt_type = kvm
volume_clear = zero
volume_clear_size = 0
volume_drivers =
aoe=nova.virt.libvirt.volume.LibvirtAOEVolumeDriver
fake=nova.virt.libvirt.volume.LibvirtFakeVolumeDriver
fibre_channel=nova.virt.libvirt.volume.LibvirtFibreChannelVolumeDriver
glusterfs=nova.virt.libvirt.volume.LibvirtGlusterfsVolumeDriver
gpfs=nova.virt.libvirt.volume.LibvirtGPFSVolumeDriver
iscsi=nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver
iser=nova.virt.libvirt.volume.LibvirtISERVolumeDriver
local=nova.virt.libvirt.volume.LibvirtVolumeDriver
nfs=nova.virt.libvirt.volume.LibvirtNFSVolumeDriver
quobyte=nova.virt.libvirt.volume.LibvirtQuobyteVolumeDriver
rbd=nova.virt.libvirt.volume.LibvirtNetVolumeDriver
scality=nova.virt.libvirt.volume.LibvirtScalityVolumeDriver
sheepdog=nova.virt.libvirt.volume.LibvirtNetVolumeDriver
smbfs=nova.virt.libvirt.volume.LibvirtSMBFSVolumeDriver
wait_soft_reboot_seconds = 120
xen_hvmloader_path = /usr/lib/xen/boot/hvmloader
osapi_v3:
enabled = True
extensions_blacklist =
extensions_whitelist =
oslo_concurrency:
disable_process_locking = False
lock_path = /home/berrange/src/cloud/data/nova
rdp:
enabled = False
html5_proxy_base_url = http://127.0.0.1:6083/
remote_debug:
host = None
port = None
serial_console:
base_url = ws://127.0.0.1:6083/
enabled = False
listen = 127.0.0.1
port_range = 10000:20000
proxyclient_address = 127.0.0.1
spice:
agent_enabled = True
enabled = False
html5proxy_base_url = http://10.33.8.112:6082/spice_auto.html
keymap = en-us
server_listen = 127.0.0.1
server_proxyclient_address = 127.0.0.1
ssl:
ca_file = None
cert_file = None
key_file = None
upgrade_levels:
baseapi = None
cells = None
compute = None
conductor = None
console = None
consoleauth = None
network = None
scheduler = None
workarounds:
disable_libvirt_livesnapshot = True
disable_rootwrap = False
When you first get involved in the OpenStack project as a developer, most people will probably recommend that you use DevStack. When I first started hacking, I skipped this because it wasn’t reliable on Fedora at that time, but these days it works just fine and there are even basic instructions for DevStack on Fedora. Last week I decided to finally give DevStack a go, since my hand-crafted dev environment was getting kind of nasty. The front page on the DevStack website says it is only supported on Fedora 16, but don’t let that put you off; aside from one bug which does not appear to be distro specific, it all seemed to work correctly. What follows is an overview of what I did / learnt.
Setting up the virtual machine
I don’t really like letting scripts like DevStack mess around with my primary development environment, particularly when there is little to no documentation about what changes they will be making and they ask for unrestricted sudo (sigh) privileges! Thus running DevStack inside a virtual machine was the obvious way to go. Yes, this means actual VMs run by Nova will be forced to use plain QEMU emulation (or nested KVM if you are brave), but for dev purposes this is fine, since the VMs don’t need to do anything except boot. My host is Fedora 17, and for simplicity I decided that my guest dev environment would also be Fedora 17. With that decided, installing the guest was a simple matter of running virt-install on the host as root
# virt-install --name f17x86_64 --ram 2000 --file /var/lib/libvirt/images/f17x86_64.img --file-size 20 --accelerate --location http://mirror2.hs-esslingen.de/fedora/linux//releases/17/Fedora/x86_64/os/ --os-variant fedora17
I picked the defaults for all installer options, except for reducing the swap file size down to a more sensible 500 MB (rather than the 4 G it suggested). NB if copying this, you probably want to change the URL used to point to your own best mirror location.
Once installation completed, run through the firstboot wizard, creating yourself an unprivileged user account, then login as root. First add the user to the wheel group, to enable it to run sudo commands:
# gpasswd -a YOURUSERNAME wheel
The last step before getting onto DevStack is to install GIT
# yum -y install git
Setting up DevStack
The recommended way to use DevStack is to simply check it out of GIT and run the latest code available. I like to keep all my source code checkouts in one place, so I’m using $HOME/src/openstack for this project
$ mkdir -p $HOME/src/openstack
$ cd $HOME/src/openstack
$ git clone git://github.com/openstack-dev/devstack.git
Arguably you can now just kick off the stack.sh script at this point, but there are some modifications that are a good idea to make first. This involves creating a “localrc” file in the top level directory of the DevStack checkout
$ cd devstack
$ cat > localrc <<EOF
# Stop DevStack polluting /opt/stack
DESTDIR=$HOME/src/openstack
# Switch to use QPid instead of RabbitMQ
disable_service rabbit
enable_service qpid
# Replace with your primary interface name
HOST_IP_IFACE=eth0
PUBLIC_INTERFACE=eth0
VLAN_INTERFACE=eth0
FLAT_INTERFACE=eth0
# Replace with whatever password you wish to use
MYSQL_PASSWORD=badpassword
SERVICE_TOKEN=badpassword
SERVICE_PASSWORD=badpassword
ADMIN_PASSWORD=badpassword
# Pre-populate glance with a minimal image and a Fedora 17 image
IMAGE_URLS="http://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-uec.tar.gz,http://berrange.fedorapeople.org/images/2012-11-15/f17-x86_64-openstack-sda.qcow2"
EOF
With the localrc created, now just kick off the stack.sh script
$ ./stack.sh
At time of writing there is a bug in DevStack which will cause it to fail to complete correctly – it checks for the existence of paths before it has created them. Fortunately, just running it a second time is a simple workaround
$ ./unstack.sh
$ ./stack.sh
From a completely fresh Fedora 17 desktop install, stack.sh will take a while to complete, as it installs a large number of pre-requisite RPMs and downloads the appliance images. Once it has finished it should tell you what URL the Horizon web interface is running on. Point your browser at it and login as “admin” with the password provided in your localrc file earlier.
Because we told DevStack to use $HOME/src/openstack as the base directory, a small permissions tweak is needed to allow QEMU to access disk images that will be created during testing.
$ chmod o+rx $HOME
Note that SELinux can be left ENFORCING, as it will just “do the right thing” with the VM disk image labelling.
UPDATE: if you want to use the Horizon web interface, then you do in fact need to set SELinux to permissive mode, since Apache won’t be allowed to access your GIT checkout where the Horizon files live.
$ sudo su -
# setenforce 0
# vi /etc/sysconfig/selinux
...change to permissive...
UPDATE: If you want to use Horizon, you must also manually install Node.js from a 3rd party repository, because it is not yet included in the Fedora package repositories:
# yum localinstall --nogpgcheck http://nodejs.tchol.org/repocfg/fedora/nodejs-stable-release.noarch.rpm
# yum -y install nodejs nodejs-compat-symlinks
# systemctl restart httpd.service
Testing DevStack
Before going any further, it is a good idea to make sure that things are operating somewhat normally. DevStack has created a file containing the environment variables required to communicate with OpenStack, so load that first
$ . openrc
Now check what images are available in glance. If you used the IMAGE_URLS example above, glance will have been pre-populated
$ glance image-list
+--------------------------------------+---------------------------------+-------------+------------------+-----------+--------+
| ID | Name | Disk Format | Container Format | Size | Status |
+--------------------------------------+---------------------------------+-------------+------------------+-----------+--------+
| 32b06aae-2dc7-40e9-b42b-551f08e0b3f9 | cirros-0.3.0-x86_64-uec-kernel | aki | aki | 4731440 | active |
| 61942b99-f31c-4155-bd6c-d51971d141d3 | f17-x86_64-openstack-sda | qcow2 | bare | 251985920 | active |
| 9fea8b4c-164b-4f54-8e74-b53966e858a6 | cirros-0.3.0-x86_64-uec-ramdisk | ari | ari | 2254249 | active |
| ec3e9b72-0970-44f2-b442-58d0042448f7 | cirros-0.3.0-x86_64-uec | ami | ami | 25165824 | active |
+--------------------------------------+---------------------------------+-------------+------------------+-----------+--------+
Incidentally the glance sort ordering is less than helpful here – it appears to be sorting based on the UUID strings rather than the image names :-(
Before booting an instance, Nova likes to be given an SSH public key, which it will inject into the guest filesystem to allow admin login
$ nova keypair-add --pub-key $HOME/.ssh/id_rsa.pub mykey
Finally an image can be booted
$ nova boot --key-name mykey --image f17-x86_64-openstack-sda --flavor m1.tiny f17demo1
+------------------------+--------------------------------------+
| Property | Value |
+------------------------+--------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | NsddfbJtR6yy |
| config_drive | |
| created | 2012-11-19T15:00:51Z |
| flavor | m1.tiny |
| hostId | |
| id | 6ee509f9-b612-492b-b55b-a36146e6833e |
| image | f17-x86_64-openstack-sda |
| key_name | mykey |
| metadata | {} |
| name | f17demo1 |
| progress | 0 |
| security_groups | [{u'name': u'default'}] |
| status | BUILD |
| tenant_id | dd3d27564c6043ef87a31404aeb01ac5 |
| updated | 2012-11-19T15:00:55Z |
| user_id | 72ae640f50434d07abe7bb6a8e3aba4e |
+------------------------+--------------------------------------+
Since we’re running QEMU inside a KVM guest, booting the image will take a little while – several minutes or more. Just keep running the ‘nova list’ command to keep an eye on it, until it shows up as ACTIVE
$ nova list
+--------------------------------------+----------+--------+------------------+
| ID | Name | Status | Networks |
+--------------------------------------+----------+--------+------------------+
| 6ee509f9-b612-492b-b55b-a36146e6833e | f17demo1 | ACTIVE | private=10.0.0.2 |
+--------------------------------------+----------+--------+------------------+
Just to prove that it really is working, login to the instance with SSH
$ ssh ec2-user@10.0.0.2
The authenticity of host '10.0.0.2 (10.0.0.2)' can't be established.
RSA key fingerprint is 9a:73:e5:1a:39:e2:f7:a5:10:a7:dd:bc:db:6e:87:f5.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.2' (RSA) to the list of known hosts.
[ec2-user@f17demo1 ~]$ sudo su -
[root@f17demo1 ~]#
Working with DevStack
The DevStack setup runs all the python services under a screen session. To stop/start individual services, attach to the screen session with the ‘rejoin-stack.sh’ script. Each service is running under a separate screen “window”. Switch to the window containing the service to be restarted, and just Ctrl-C it and then use bash history to run the same command again.
$ ./rejoin-stack.sh
Sometimes the entire process set will need to be restarted. In this case, just kill the screen session entirely, which causes all the OpenStack services to go away. Then the same ‘rejoin-stack.sh’ script can be used to start them all again.
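If you prefer a one-liner for killing the whole session, something like this should work, assuming DevStack’s default screen session name of “stack”:

$ screen -S stack -X quit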
One annoyance is that unless you have the screen session open, the debug messages from Nova don’t appear to end up anywhere useful. I’ve taken to editing the “stack-screen” file to make each service log to a local file in its checkout, e.g. I changed
stuff "cd /home/berrange/src/openstack/nova && sg libvirtd /home/berrange/src/openstack/nova/bin/nova-compute"
to
stuff "cd /home/berrange/src/openstack/nova && sg libvirtd /home/berrange/src/openstack/nova/bin/nova-compute 2>&1 | tee nova-compute.log"
The Folsom release of OpenStack has been out for a few weeks now, and I had intended to write this post much earlier, but other things (moving house, getting married & travelling to LinuxCon Europe / KVM Forum all in the space of a month) got in the way. There are highlighted release notes, but I wanted to give a little more detail on some of the changes I was involved with making to the libvirt driver and what motivated them.
XML configuration
First off was a change in the way Nova generates libvirt XML configurations. Previously the libvirt driver in Nova used the Cheetah templating system to generate its XML configurations. The problem with this is that a lot of information needed to be passed into the template as parameters, so Nova was inventing an ad-hoc internal configuration format for libvirt guest config, which was then further translated into the proper guest config in the template. The resulting code was hard to maintain and understand, because the logic for constructing the XML was effectively spread across both the template file and the libvirt driver code with no consistent structure. Thus the first big change that went into the libvirt driver during Folsom was to introduce a formal set of Python configuration objects to represent the libvirt XML config. The libvirt driver code now directly populates these config objects with the required data, and then simply serializes the objects to XML. The use of Cheetah has been completely eliminated, and the code structure is significantly clearer as a result. There is a wiki page describing this in a little more detail.
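To make the pattern a little more concrete, here is a heavily simplified sketch of the idea – the real classes live in nova.virt.libvirt.config and cover far more of the libvirt schema than this toy example:

# Simplified sketch of the config-object pattern; not the actual Nova classes
import xml.etree.ElementTree as ET

class GuestDisk(object):
    def __init__(self, source, target):
        self.source = source
        self.target = target

    def to_xml_element(self):
        disk = ET.Element("disk", type="file", device="disk")
        ET.SubElement(disk, "source", file=self.source)
        ET.SubElement(disk, "target", dev=self.target, bus="virtio")
        return disk

class Guest(object):
    def __init__(self, name, memory_kb, vcpus):
        self.name = name
        self.memory_kb = memory_kb
        self.vcpus = vcpus
        self.devices = []

    def to_xml(self):
        dom = ET.Element("domain", type="kvm")
        ET.SubElement(dom, "name").text = self.name
        ET.SubElement(dom, "memory").text = str(self.memory_kb)
        ET.SubElement(dom, "vcpu").text = str(self.vcpus)
        devices = ET.SubElement(dom, "devices")
        for dev in self.devices:
            devices.append(dev.to_xml_element())
        return ET.tostring(dom).decode()

# The driver populates the objects with data, then serializes them in one step
guest = Guest("instance-00000001", 512 * 1024, 1)
guest.devices.append(GuestDisk("/var/lib/nova/instances/instance-00000001/disk", "vda"))
print(guest.to_xml())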
CPU model configuration
The primary downside of removing the Cheetah templating is that it is no longer possible for admins deploying Nova to make ad-hoc changes to the libvirt guest XML that is used. Personally I’d actually argue that this is a good thing, because the ability to make ad-hoc changes meant there was less motivation to directly address the missing features in Nova, but I know plenty of people would disagree with this view :-) It was quickly apparent that the one change a great many people were making to the libvirt XML config was to specify a guest CPU model. If no explicit CPU model is requested in the guest config, KVM will start with a generic, lowest common denominator model that will typically work everywhere. As can be expected, this generic CPU model is not going to offer optimal performance for the guests. For example, if your host has shiny new CPUs with builtin AES encryption instructions, the guest is not going to be able to take advantage of them. Thus the second big change in the Nova libvirt driver was to introduce explicit support for configuring the CPU model. This involves two new Nova config parameters: libvirt_cpu_mode, which chooses between “host-model”, “host-passthrough” and “custom”; and libvirt_cpu_model, which specifies the name of the desired CPU when the mode is set to “custom”. Again there is a wiki page describing this in a little more detail.
Once the ability to choose CPU models was merged, it was decided that the default behaviour should also be changed. Thus if Nova is configured to use KVM as its hypervisor, it will use the “host-model” CPU mode by default. This causes the guest CPU model to be an (almost) exact copy of the host CPU model, offering maximum out of the box performance. There turned out to be one small wrinkle in this choice when using nested KVM though. Due to a combination of problems in libvirt and KVM, use of “host-model” fails for nested KVM. Thus anyone using nested KVM needs to set libvirt_cpu_mode=”none” as a workaround for now. If you’re using KVM on bare metal everything should be fine, which is of course the normal scenario for production deployments.
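As a rough illustration of what these settings translate to (this is just a sketch, not the actual driver code), the modes map onto libvirt’s <cpu> element along these lines:

# Rough sketch of how libvirt_cpu_mode / libvirt_cpu_model map onto the
# libvirt <cpu> element; not the actual Nova implementation
def cpu_element(cpu_mode, cpu_model=None):
    if cpu_mode in ("host-model", "host-passthrough"):
        # Let libvirt copy, or pass straight through, the host CPU definition
        return "<cpu mode='%s'/>" % cpu_mode
    if cpu_mode == "custom":
        # Use a named CPU model from libvirt's CPU map, eg "Nehalem"
        return "<cpu mode='custom' match='exact'><model>%s</model></cpu>" % cpu_model
    # cpu_mode == "none": emit no <cpu> element at all, leaving the hypervisor
    # default (this is the nested KVM workaround mentioned above)
    return ""

print(cpu_element("host-model"))
print(cpu_element("custom", "Nehalem"))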
Time keeping policy
Again on the performance theme, the libvirt Nova driver was updated to set the time keeping policies for KVM guests. Virtual machines on x86 have a number of timers available, including the PIT, RTC, PM-Timer and HPET. Reliable timers are one of the hardest problems to solve in full machine virtualization platforms, and KVM is no exception. It all comes down to the question of what to do when the hypervisor cannot inject a timer interrupt at the correct time, because a different guest is running. There are a number of policies available: inject the missed tick as soon as possible; merge all missed ticks into one and deliver it as soon as possible; temporarily inject missed ticks at a higher rate than normal to “catch up”; or simply discard the missed tick entirely. It turns out that Windows 7 is particularly sensitive to timers, and the default KVM policies for missed ticks were causing frequent crashes, while older Linux guests would often experience severe time drift. Research validated by the oVirt project team has previously identified an optimal set of policies that should keep the majority of guests happy. Thus the libvirt Nova driver was updated to set explicit policies for time keeping with the PIT and RTC timers when using KVM, which should make everything time related much more reliable.
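For reference, the resulting guest configuration contains a <clock> element along the lines of the snippet below (generated here with a few lines of Python purely for illustration); “catchup” for the RTC and “delay” for the PIT are the policies that came out of the research mentioned above:

# Sketch of the kind of <clock> element this change produces for KVM guests
import xml.etree.ElementTree as ET

clock = ET.Element("clock", offset="utc")
ET.SubElement(clock, "timer", name="rtc", tickpolicy="catchup")
ET.SubElement(clock, "timer", name="pit", tickpolicy="delay")
print(ET.tostring(clock).decode())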
Libvirt authentication
The libvirtd daemon can be configured with a number of different authentication schemes. Out of the box it will use PolicyKit to authenticate clients, and thus Nova packages on Fedora / RHEL / EPEL include a PolicyKit configuration file which grants Nova the ability to connect to libvirt. Administrators may, however, decide to use a different authentication scheme, for example SASL. If the chosen scheme requires a username+password, there was previously no way for Nova’s libvirt driver to provide these authentication credentials. Fortunately the libvirt client has the ability to look up credentials in a local file. Unfortunately the way Nova connected to libvirt prevented this from working. Thus the way the Nova libvirt driver used openAuth() was fixed to allow the default credential lookup logic to work. It is now possible to require authentication between Nova and libvirt thus:
# augtool -s set /files/etc/libvirt/libvirtd.conf/auth_unix_rw sasl
Saved 1 file(s)
# saslpasswd2 -a libvirt nova
Password: XYZ
Again (for verification): XYZ
# su - nova -s /bin/sh
$ mkdir -p $HOME/.config/libvirt
$ cat > $HOME/.config/libvirt/auth.conf <<EOF
[credentials-nova]
authname=nova
password=XYZ
[auth-libvirt-localhost]
credentials=nova
EOF
Other changes
Obviously I was not the only person working on the libvirt driver in Folsom, many others contributed work too. Leander Beernaert provided an implementation of the ‘nova diagnostics’ command that works with the libvirt driver, showing the virtual machine CPU, memory, disk and network interface utilization statistics. Pádraig Brady improved the performance of migration by sending the qcow2 image between hosts directly, instead of converting it to a raw file, sending that, and then converting it back to qcow2. Instead of transferring 10 GB of raw data, it can now send just the data actually used, which may be as little as a few hundred MB. In his test case, this reduced the time to migrate from 7 minutes to 30 seconds, which I’m sure everyone will be pleased to hear :-) Pádraig also optimized the file injection code so that it only mounts the guest image once to inject all data, instead of mounting it separately for each injected item. Boris Filippov contributed support for storing VM disk images in LVM volumes, instead of qcow2 files, while Ben Swartzlander contributed support for using NFS files as the backing for virtual block volumes. Vish updated the way libvirt generates XML configuration for disks, to include the “serial” property against each disk, based on the Nova volume ID. This allows the guest OS admin to reliably identify the disk in the guest, using the /dev/disk/by-id/virtio-<volume id> paths, since the /dev/vdXXX device numbers are pretty much randomly assigned by the kernel.
Not directly part of the libvirt driver, but Jim Fehlig enhanced the Nova VM scheduler so that it can take account of the hypervisor, architecture and VM mode (paravirt vs HVM) when choosing which host to boot an image on. This makes it much more practical to run mixed environments of, say, Xen and KVM, or Xen fullvirt vs Xen paravirt, or ARM vs x86, etc. When uploading an image to glance, the admin can tag it with properties specifying the desired hypervisor/architecture/vm_mode. The compute drivers then report what combinations they can support, and the scheduler computes the intersection to figure out which hosts are valid candidates for running the image.
When launching a virtual machine, Nova has the ability to inject various files into the disk image immediately prior to boot up. This is used to perform the following setup operations:
- Add an authorized SSH key for the root account
- Configure init to reset SELinux labelling on /root/.ssh
- Set the login password for the root account
- Copy data into a number of user specified files
- Create the meta.js file
- Configure network interfaces in the guest
This file injection is handled by the code in the nova.virt.disk.api module. The code which does the actual injection is designed around the assumption that the filesystem in the guest image can be mapped into a location in the host filesystem. There are a number of ways this can be done, so Nova has a pluggable API for mounting guest images in the host, defined by the nova.virt.disk.mount module, with the following implementations (a sketch of the overall pattern follows the list):
- Loop – Use losetup to create a loop device. Then use kpartx to map the partitions within the device, and finally mount the designated partition. Alternatively, on new enough kernels, the loop device’s built-in partition support is used instead of kpartx.
- NBD – Use qemu-nbd to run an NBD server and attach with the kernel NBD client to expose a device. Mapping partitions is then handled as per the Loop module.
- GuestFS – Use libguestfs to inspect the image and setup a FUSE mount for all partitions or logical volumes inside the image.
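To give a feel for what the pluggable mount interface looks like, here is an illustrative sketch – the real API lives in nova.virt.disk.mount and its class and method names differ from this toy version, which also skips the partition mapping step entirely:

# Illustrative sketch of the pluggable mount pattern; not the real Nova API,
# and partition handling (kpartx / builtin loop partitions) is omitted
import subprocess

class MountBase(object):
    """Map a guest image into the host filesystem at mount_dir."""

    def __init__(self, image, mount_dir):
        self.image = image
        self.mount_dir = mount_dir
        self.device = None

    def get_device(self):
        """Expose the image as a host block device."""
        raise NotImplementedError()

    def unget_device(self):
        """Release the host block device."""
        raise NotImplementedError()

    def mount(self):
        self.get_device()
        subprocess.check_call(["mount", self.device, self.mount_dir])

    def umount(self):
        subprocess.check_call(["umount", self.mount_dir])
        self.unget_device()

class LoopMount(MountBase):
    def get_device(self):
        # losetup only understands raw format images
        out = subprocess.check_output(["losetup", "--find", "--show", self.image])
        self.device = out.strip().decode()

    def unget_device(self):
        subprocess.check_call(["losetup", "--detach", self.device])

class NbdMount(MountBase):
    def get_device(self):
        # qemu-nbd can expose any image format that QEMU understands
        self.device = "/dev/nbd0"  # a real implementation picks a free device
        subprocess.check_call(["qemu-nbd", "--connect", self.device, self.image])

    def unget_device(self):
        subprocess.check_call(["qemu-nbd", "--disconnect", self.device])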
The Loop module can only handle raw format files, while the NBD module can handle any format that QEMU supports. While they have the ability to access partitions, the code handling this is very dumb. It requires the Nova global ‘libvirt_inject_partition’ config parameter to specify which partition number to inject into. The result is that every image you upload to glance must be partitioned in exactly the same way. It would be much better if this used a metadata parameter associated with the image. The GuestFS module is much more advanced and inspects the guest OS to figure out arbitrarily partitioned images and even LVM based images.
Nova has an “img_handlers” configuration parameter which defines the order in which the three mount modules above are to be tried. It tries to mount the image with each one in turn, until one succeeds. This is quite crude code really – it has already been hacked to avoid trying the Loop module if Nova knows it is using QCow2. It also has to be changed by the Nova admin if they’re using LXC, otherwise you can end up using KVM with LXC guests, which is probably not what you want. The try-and-fallback paradigm also has the undesirable behaviour of masking errors that you would really rather consider fatal to the boot process.
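The fallback behaviour is roughly equivalent to the loop below (a simplification using the mounter classes sketched earlier, not the real Nova code), which shows why the original cause of a failure is so easily lost:

# Simplified illustration of the try-and-fallback behaviour driven by the
# img_handlers ordering; names here are just placeholders
def mount_image(image, mount_dir, img_handlers, mounters):
    # img_handlers is the admin-configured ordering, eg ["loop", "nbd", "guestfs"]
    for name in img_handlers:
        mounter = mounters[name](image, mount_dir)
        try:
            mounter.mount()
            return mounter
        except Exception:
            # The real cause of the failure is discarded here, so errors that
            # ought to be fatal get masked by trying the next module instead
            continue
    raise RuntimeError("no mount module could handle %s" % image)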
As mentioned earlier, the file injection code uses the mount modules to map the guest image filesystem into a temporary directory in the host (such as /tmp/openstack-XXXXXX). It then runs various commands like chmod, chown, mkdir, tee, etc to manipulate files in the guest image. Of course Nova runs as an unprivileged user, and the guest files to be changed are typically owned by root. This means all the file injection commands need to run via Nova’s rootwrap utility to gain root privileges. Needless to say, this has the undesirable consequence that the code injecting files into a guest image in fact has privileges that allow it to write to arbitrary areas of the host filesystem. One mistake in handling symlinks and you have the potential for a carefully crafted guest image to result in compromise of the host OS. It should come as little surprise that this has already resulted in a security vulnerability / CVE against Nova.
The solution to this class of security problems is to decouple the file injection code from the host filesystem. This can be done by introducing a “VFS” (Virtual File System) interface which defines a formal API for the various logical operations that need to be performed on a guest filesystem. With that it is possible to provide an implementation that uses the libguestfs native python API, rather than FUSE mounts. As well as being inherently more secure, avoiding the FUSE layer will improve performance, and allow Nova to utilize libguestfs APIs that don’t map into FUSE, such as its Augeas support for parsing config files. Nova still needs to work in scenarios where libguestfs is not available though, so a second implementation of the VFS APIs will be required based on the existing Loop/Nbd device mount approach. The security of the non-libguestfs support has not changed with this refactoring work, but de-coupling the file injection code from the host filesystem does make it easier to write unit tests for this code. The file injection code can be tested by mocking out the VFS layer, while the VFS implementations can be tested by mocking out the libguestfs or command execution APIs.
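To sketch what such a VFS interface might look like (method names and the libguestfs calls here are illustrative rather than a description of the final Nova API):

# Illustrative sketch of the VFS idea: a narrow interface for the logical
# operations the injection code needs, plus a libguestfs backed implementation
# that never touches the host filesystem. Not the final Nova API.
class VFS(object):
    def setup(self):
        raise NotImplementedError()

    def teardown(self):
        raise NotImplementedError()

    def make_path(self, path):
        raise NotImplementedError()

    def append_file(self, path, content):
        raise NotImplementedError()

    def set_ownership(self, path, user, group):
        raise NotImplementedError()

class VFSGuestFS(VFS):
    """Backend using the libguestfs appliance: no host-side mounts at all."""

    def __init__(self, image):
        self.image = image
        self.handle = None

    def setup(self):
        import guestfs
        self.handle = guestfs.GuestFS()
        self.handle.add_drive(self.image)
        self.handle.launch()
        # inspect_os() / mount() calls to map the guest filesystems go here

    def make_path(self, path):
        self.handle.mkdir_p(path)

    def append_file(self, path, content):
        self.handle.write_append(path, content)

    def set_ownership(self, path, user, group):
        # user and group are numeric IDs here; the real code would need to
        # resolve names against the guest's own passwd/group databases
        self.handle.chown(user, group, path)

    def teardown(self):
        self.handle.close()

The file injection code would then only ever talk to a VFS object, with a second, mount-based implementation slotting in behind the same interface for hosts where libguestfs is not available.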
Incidentally, if you’re wondering why libguestfs does all its work inside a KVM appliance, its man page describes the security issues this approach protects against, versus just directly mounting guest images on the host.