Getting started with LXC using libvirt
For quite a while now, libvirt has had an LXC driver that uses Linux's namespace and cgroups features to provide container-based virtualization. Before continuing I should point out that the libvirt LXC driver does not have any direct need for the userspace tools from the LXC sf.net project, since it directly leverages the APIs the Linux kernel exposes to userspace. There are in fact many other potential users of the kernel's namespace APIs which have their own userspace, such as OpenVZ, Linux-VServer and Parallels. This blog post will concern itself solely with the native libvirt LXC support.
Connecting to the LXC driver
At this point in time, there is only one URI available for connecting to the libvirt LXC driver, lxc:///, which gets you a privileged connection. There is not yet any support for unprivileged libvirtd instances using containers, due to restrictions of the kernel's DAC security model. I'm hoping this may be refined in the future.
If you're familiar with using libvirt in combination with KVM, then it is likely you are just relying on libvirt picking the right URI by default. Each host can only have one default URI for libvirt though, and KVM will usually take precedence over LXC. You can discover what libvirt has decided the default URI is:
# virsh uri
qemu:///system
So when using tools like virsh you'll need to specify the LXC URI somehow. The first way is to use the '-c URI' or '--connect URI' arguments that most libvirt-based applications accept:
# virsh -c lxc:/// uri
lxc:///
The second option is to explicitly override the default libvirt URI for your session using the LIBVIRT_DEFAULT_URI environment variable.
# export LIBVIRT_DEFAULT_URI=lxc:///
# virsh uri
lxc:///
For the sake of brevity, all the examples that follow presume that export LIBVIRT_DEFAULT_URI=lxc:/// has been set.
A simple “Hello World” LXC container
The Hello World equivalent for LXC is probably a container which just runs /bin/sh with the main root filesystem / network interfaces all still being visible. What you're gaining here is not security, but rather a way to manage resource utilization of everything spawned from that initial process. The libvirt LXC driver currently does most of its resource controls using cgroups, but will also leverage the network traffic shaper directly for network controls that need to be applied per virtual network interface rather than per cgroup.
Anyone familiar with libvirt will know that creating a new guest requires an XML document specifying its configuration. Machine-based virtualization requires either a kernel/initrd or a virtual BIOS to boot, and can create a fully virtualized (hvm) or paravirtualized (xen) machine. Container virtualization, by contrast, just wants to know the path of the binary to spawn as the container's "init" (aka the process with PID 1). The virtualization type for containers is thus referred to in libvirt as "exe". Aside from the virtualization type and the path of the initial process, the only other required XML parameters are the guest name, initial memory limit and a text console device. Putting this together, creating the "Hello World" container requires an XML configuration that looks like this:
# cat > helloworld.xml <<EOF
<domain type='lxc'>
  <name>helloworld</name>
  <memory>102400</memory>
  <os>
    <type>exe</type>
    <init>/bin/sh</init>
  </os>
  <devices>
    <console type='pty'/>
  </devices>
</domain>
EOF
This configuration can be imported into libvirt in the normal manner
# virsh define helloworld.xml
Domain helloworld defined from helloworld.xml
then started
# virsh start helloworld
Domain helloworld started

# virsh list
 Id    Name           State
----------------------------------
 31417 helloworld     running
The ID values assigned by the libvirt LXC driver are the process IDs of the libvirt_lxc helper processes that libvirt launches. This helper is what actually creates the container, spawning the initial process, after which it just sits around handling console I/O. Speaking of the console, it can now be accessed with virsh:
# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
sh-4.2#
That 'sh' prompt is the shell process inside the container. All the container processes are visible outside the container as regular processes:
# ps -axuwf
...
root     31417  0.0  0.0  42868  1252 ?      Ss   16:17  0:00 /usr/libexec/libvirt_lxc --name helloworld --console 27 --handshake 30 --background
root     31418  0.0  0.0  13716  1692 pts/39 Ss+  16:17  0:00  \_ /bin/sh
...
Inside the container, PID numbers are distinct, starting again from ‘1’.
# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
sh-4.2# ps -axuwf
USER     PID %CPU %MEM   VSZ  RSS TTY    STAT START TIME COMMAND
root       1  0.0  0.0 13716 1692 pts/39 Ss   16:17 0:00 /bin/sh
The container will shut down when the 'init' process exits, so in this example when 'exit' is run in the container's shell. Alternatively, issue the usual 'virsh destroy' to kill it off.
# virsh destroy helloworld
Domain helloworld destroyed
Finally remove its configuration
# virsh undefine helloworld
Domain helloworld has been undefined
Adding custom mounts to the “Hello World” container
The "Hello World" container shared the same root filesystem as the primary (host) OS. What if the application inside the container requires custom data in certain locations? For example, using containers to sandbox apache servers might require a custom /etc/httpd and /var/www. This can easily be achieved by specifying one or more filesystem devices in the initial XML configuration. Let's create some custom locations to pass to the "Hello World" container.
# mkdir -p /export/helloworld/config
# touch /export/helloworld/config/hello.txt
# mkdir -p /export/helloworld/data
# touch /export/helloworld/data/world.txt
Now edit the helloworld.xml file and add in
<filesystem type='mount'>
  <source dir='/export/helloworld/config'/>
  <target dir='/etc/httpd'/>
</filesystem>
<filesystem type='mount'>
  <source dir='/export/helloworld/data'/>
  <target dir='/var/www'/>
</filesystem>
Now after defining and starting the container again, it should see the custom mounts
# virsh define helloworld.xml
Domain helloworld defined from helloworld.xml
# virsh start helloworld
Domain helloworld started
# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
sh-4.2# ls /etc/httpd/
hello.txt
sh-4.2# ls /var/www/
world.txt
sh-4.2# exit
# virsh undefine helloworld
Domain helloworld has been undefined
A private root filesystem with busybox
So far the container has shared the root filesystem with the host OS. This may be OK if the application running in the container is going to run as an unprivileged user ID and you are careful not to mess up your host OS. If you want to do things like running DHCP inside the container, or have things running as root, then you almost certainly want a private root filesystem in the container. In this example, we'll use the busybox tools to set up the simplest possible private root for "Hello World". First create a new directory and copy the busybox binary into position:
mkdir -p /export/helloworld/root
cd /export/helloworld/root
mkdir -p bin var/www etc/httpd
cd bin
cp /sbin/busybox busybox
cd /export/helloworld/root
The next step is to set up symlinks for all the busybox commands you intend to use. For example:
for i in ls cat rm find ps echo date kill sleep \
         true false test pwd sh which grep head wget
do
  ln -s busybox /export/helloworld/root/bin/$i
done
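As an aside, if your busybox binary happens to have been built with its installer feature enabled, it can create the applet symlinks for you. This is just an optional sketch, not something the rest of the walkthrough depends on:

# Have busybox install symlinks for all its applets into the private bin/ dir
# (only works if busybox was compiled with the --install feature)
cd /export/helloworld/root/bin
./busybox --install -s .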
Now all that is required is to add another filesystem device to the XML configuration:
<filesystem type='mount'>
  <source dir='/export/helloworld/root'/>
  <target dir='/'/>
</filesystem>
With that added to the XML, follow the same steps to define and start the guest again
# virsh define helloworld.xml
Domain helloworld defined from helloworld.xml
# virsh start helloworld
Domain helloworld started
Now when accessing the guest console a completely new filesystem should be visible
# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
# ls
bin  dev  etc  proc  selinux  sys  var
# ls bin/
busybox  echo   grep  ls   rm     test  which
cat      false  head  ps   sh     true
date     find   kill  pwd  sleep  wget
# cat /proc/mounts
rootfs / rootfs rw 0 0
devpts /dev/pts devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=666 0 0
/dev/mapper/vg_t500wlan-lv_root / ext4 rw,seclabel,relatime,user_xattr,barrier=1,data=ordered 0 0
devpts /dev/pts devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=666 0 0
devfs /dev tmpfs rw,seclabel,nosuid,relatime,mode=755 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
proc /proc/sys proc ro,relatime 0 0
/sys /sys sysfs ro,seclabel,relatime 0 0
selinuxfs /selinux selinuxfs ro,relatime 0 0
devpts /dev/ptmx devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=666 0 0
/dev/mapper/vg_t500wlan-lv_root /etc/httpd ext4 rw,seclabel,relatime,user_xattr,barrier=1,data=ordered 0 0
/dev/mapper/vg_t500wlan-lv_root /var/www ext4 rw,seclabel,relatime,user_xattr,barrier=1,data=ordered 0 0
Custom networking in the container
The examples thus far have all just inherited access to the host network interfaces. This may or may not be desirable. It is of course possible to configure private networking for the container. Conceptually this works in much the same way as with KVM. Currently it is possible to choose between libvirt's bridge, network or direct networking modes, giving ethernet bridging, NAT/routing, or VEPA respectively. When configuring private networking, the host OS will get a 'vethNNN' device for each container NIC, and the container will see its own 'ethNNN' and 'lo' devices. The XML configuration additions are just the same as what's required for KVM, for example:
<interface type='network'>
  <mac address='52:54:00:4d:2b:cd'/>
  <source network='default'/>
</interface>
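For comparison, the other two modes look roughly like the snippets below. The bridge name 'br0', the physical device 'eth0' and the MAC addresses are placeholders for whatever actually exists on your host:

<!-- bridge mode: attach to an existing host bridge (placeholder br0) -->
<interface type='bridge'>
  <mac address='52:54:00:4d:2b:ce'/>
  <source bridge='br0'/>
</interface>

<!-- direct mode: attach via macvtap to a physical NIC in VEPA mode (placeholder eth0) -->
<interface type='direct'>
  <mac address='52:54:00:4d:2b:cf'/>
  <source dev='eth0' mode='vepa'/>
</interface>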
Define and start the container as before, then compare the network interfaces in the container to what is in the host
# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:16:61:DA
          inet6 addr: fe80::5054:ff:fe16:61da/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:93 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5076 (4.9 KiB)  TX bytes:468 (468.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
We have a choice of configuring the guest eth0 manually, or just launching a DHCP client. To do manual configuration try
# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
# ifconfig eth0 192.168.122.50
# route add 0.0.0.0 gw 192.168.122.1 eth0
# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.122.1   255.255.255.255 UGH   0      0        0 eth0
192.168.122.0   *               255.255.255.0   U     0      0        0 eth0
# ping 192.168.122.1
PING 192.168.122.1 (192.168.122.1): 56 data bytes
64 bytes from 192.168.122.1: seq=0 ttl=64 time=0.786 ms
64 bytes from 192.168.122.1: seq=1 ttl=64 time=0.157 ms
^C
--- 192.168.122.1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.157/0.471/0.786 ms
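Alternatively, busybox ships a small DHCP client, udhcpc. The following is only a sketch: it assumes your busybox build includes the udhcpc applet, that you have symlinked it into bin/ along with the other commands, and that a lease-handling script (normally /usr/share/udhcpc/default.script) has been copied into the private root, since udhcpc relies on such a script to actually apply the address it obtains.

# Run busybox's DHCP client against the container's eth0 (assumes the udhcpc
# applet and its default.script helper are present in the private root)
# udhcpc -i eth0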
Am I running in an LXC container?
Some programs may wish to know if they have been launched inside a libvirt container. To assist them, the initial process is given two environment variables, LIBVIRT_LXC_NAME and LIBVIRT_LXC_UUID
# echo $LIBVIRT_LXC_NAME
helloworld
# echo $LIBVIRT_LXC_UUID
a099376e-a803-ca94-f99c-d9a8f9a30088
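A minimal detection sketch based purely on these two variables might look like the following. Note this only works for the init process and anything that inherits its environment:

#!/bin/sh
# Hypothetical helper: report whether we were spawned as a libvirt LXC init
if [ -n "$LIBVIRT_LXC_UUID" ]; then
    echo "Inside libvirt LXC container $LIBVIRT_LXC_NAME ($LIBVIRT_LXC_UUID)"
else
    echo "Not a libvirt LXC container init process"
fi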
An aside about CGroups and LXC
Every libvirt LXC container gets placed inside a dedicated cgroup, $CGROUPROOT/libvirt/lxc/$CONTAINER-NAME. Libvirt expects the memory, devices, freezer, cpu and cpuacct cgroups controllers to be mounted on the host OS. Work on leveraging cgroups inside LXC with libvirt is still ongoing, but there are already APIs to set/get memory and CPU limits, with networking to follow soon. This could be a topic for a blog post of its own, so it won't be discussed further here.
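As a rough illustration (exact option names and units vary across libvirt versions, so treat this as a sketch rather than a reference), the existing tunables can be inspected and changed from virsh:

# Memory limits, backed by the 'memory' cgroup controller (sizes in KiB here)
virsh memtune helloworld
virsh memtune helloworld --hard-limit 102400

# CPU shares, backed by the 'cpu' cgroup controller
virsh schedinfo helloworld
virsh schedinfo helloworld --set cpu_shares=2048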
An aside about LXC security, or lack thereof
You might think that since we can create a private root filesystem, it'd be cool to run an entire Fedora/RHEL OS in the container. I strongly caution against doing this. The DAC (discretionary access control) system on which LXC currently relies for all security is known to be incomplete, so it is entirely possible to accidentally or intentionally break out of the container and/or impose a DoS attack on the host OS. Repeat after me: "LXC is not yet secure. If I want real security I will use KVM". There is a plan to make LXC DAC more secure, but that is nowhere near finished. We also plan to integrate sVirt with LXC so that MAC will mitigate holes in the DAC security model.
An aside about Fedora >= 15, SystemD and autofs
If you are attempting to try any of this on Fedora 16 or later, there is currently an unresolved problem with autofs that breaks much use of containers. The root problem is that we are unable to unmount autofs mount points after switching into the private filesystem namespace. Unfortunately SystemD uses autofs in its default configuration for several types of mounts. So if you find containers fail to start, then as a temporary hack you can try disabling all of SystemD's autofs mount points:
for i in `systemctl --full | grep automount | awk '{print $1}'`
do
  systemctl stop $i
done
We hope to resolve this in a more satisfactory way in the near future.
The complete final example XML configuration
# cat helloworld.xml
<domain type='lxc'>
  <name>helloworld</name>
  <memory>102400</memory>
  <os>
    <type>exe</type>
    <init>/bin/sh</init>
  </os>
  <devices>
    <console type='pty'/>
    <filesystem type='mount'>
      <source dir='/export/helloworld/root'/>
      <target dir='/'/>
    </filesystem>
    <filesystem type='mount'>
      <source dir='/export/helloworld/config'/>
      <target dir='/etc/httpd'/>
    </filesystem>
    <filesystem type='mount'>
      <source dir='/export/helloworld/data'/>
      <target dir='/var/www'/>
    </filesystem>
    <interface type='network'>
      <source network='default'/>
    </interface>
  </devices>
</domain>
Best easter egg ever – I had no idea libvirt didn’t rely upon LXC userspace. Great post, these docs need to get shared/advertised more.
I tried this on RHEL6.1, and it mostly worked. I end up with:
[root@foo ~]# pstree -pal `pidof libvirt_lxc`
libvirt_lxc,2167 --name helloworld --console 20 --background
└─sh,2169
…but I can’t use “virsh console helloworld”, I get:
[root@foo ~]# virsh console helloworld
Connected to domain helloworld
Escape character is ^]
error: internal error cannot find default console device
Is that a bug against RHEL6.1 libvirt, or am I doing something wrong?
The initial helloworld.xml wasn't quite right; it doesn't allow "virsh console helloworld" to work. The <devices> element needs to wrap the console entry like so:

<domain type='lxc'>
  <name>helloworld</name>
  <memory>102400</memory>
  <os>
    <type>exe</type>
    <init>/bin/sh</init>
  </os>
  <devices>
    <console type='pty'/>
  </devices>
</domain>
Then the example works under RHEL 6.1 as well as Fedora.
@lans thanks for pointing out the XML mistake, I have corrected that.
@lans the LXC code in RHEL-6.1 had quite a few problematic bugs, so I don’t recommend testing with that. Better to wait for RHEL-6.2 or try Fedora.
Hello Daniel,
I am eagerly waiting for LXC to be fully functional and want to play with it.
Any idea when the final version of LXC will come out?
thanks,
Amitabh
@Amitabh there isn't really any concept of a "final" release. It is more just a process of continual innovation and improvement. I guess by "final" you are probably asking when you will be able to run an arbitrary full OS in a container. I think that is still quite a long way off, to the extent I wouldn't make any predictions. Application sandboxing is where LXC is most useful in the short-to-medium term IMHO.
[…] Several sessions happened simultaneously. The ones I recall off the top of my head: Fedora packaging, Puppet. Shanks and myself also did a demo of SSSD and helped out people configure SSSD on their laptops. Later in the day, I joined Izhar for a little bit on LXC (Linux Containers). I've never tried out LXC before, apart from reading about it on the inter-webs. We started off by discussing pros and cons of LXC vs using regular virtual machines. At least for him, the main bottleneck w/ VMs seems to be I/O. With LXC there is apparently no I/O bottleneck as there are no disk images, and a very small footprint on the host. Primarily useful for application sandboxing (examples: deploying Plone or Drupal like CMS). Izhar gave a quick demo of LXC on his laptop and I did a quick try using Dan Berrange's post of Getting started w/ LXC. […]
Are the paths under "A private root filesystem with busybox" correct?
For example (this is for all code in that section), shouldn’t it be like this?
mkdir /export/helloworld/root
cd /export/helloworld/root
mkdir -p bin var/www etc/httpd
cd bin
cp /sbin/busybox busybox
cd /export/helloworld/root
and
for i in ls cat rm find ps echo date kill sleep \
true false test pwd sh which grep head wget
do
ln -s busybox /export/helloworld/root/bin/$i
done
@Daniel – Thanks for your response. What I mean by "final release" is fully functional, so that vendors like Red Hat will start offering it with full support.
I agree its continuous innovation/improvement.
Thanks,
Amitabh
@anders I agree with you, I think that's a bug.
[…] the words of Dan Berrange: Repeat after me “LXC is not yet secure. [. . […]
When I add the root context and start the container I get this error:
Failed to query file context on /export/helloworld/root: No data available
Any idea what’s wrong ?
Sounds like either you have SELinux disabled, or the filesystem at /export doesn't support SELinux. This ought to be non-fatal, but depending on what libvirt version you have, you might be hitting a bug causing it to be fatal.
I do have selinux disabled. Do I need to have it enabled to be able to have this in my config:
One more question: how can I have apache running without killing the container when I exit the console? I start it by using virsh console, then start apache, but when I exit, the container dies. Is there another way to start apache in the container?
How do you start apache in the container?
I have a basic question, if you can help. I have an LXC container up and running; my biggest problem at the moment is that loopback devices don't work. Is there some way I can get this to work?
2.6.32-358.11.1.el6.x86_64
lxc-templates-0.8.0-1.el6.rf.x86_64
lxc-libs-0.8.0-1.el6.rf.x86_64
lxc-0.8.0-1.el6.rf.x86_64
libvirt-client-0.10.2-18.el6_4.9.x86_64
virt-viewer-0.5.2-18.el6_4.2.x86_64
virt-manager-0.9.0-18.el6.x86_64
Virsh command line tool of libvirt 0.10.2
See web site at http://libvirt.org/
Compiled with support for:
Hypervisors: QEMU/KVM LXC ESX Test
Networking: Remote Network Bridging Interface netcf Nwfilter VirtualPort
Storage: Dir Disk Filesystem SCSI Multipath iSCSI LVM
Miscellaneous: Daemon Nodedev SELinux Secrets Debug DTrace Readline
————————————————————
Some side questions ….
I'm confused by the differences between defining your LXC container with XML using virsh and the lxc-create method. Most of what I find when googling references the lxc-* tools.
I'm working strictly on RHEL 6.4, and lxc-* does not seem to be part of the standard packages.
Under the covers do these do the same thing?
What mailing list would be best for posting questions like this?
Thx
Hello Daniel,
first of all I want to thank you for the great project “libvirt”!
We have been using libvirt for many years to control some KVM VMs.
Recently I have tried to migrate one kvm-based SLES VM to libvirt_lxc.
But what does "migrate" mean here?! Of course, creating a new XML config file. Further, I have implemented a small wrapper which loopback-mounts the hdd img file before starting the container.
Everything seems to be OK with this. I can start the migrated VM and it is really running and working. But a shutdown of the container in the end leads to a power-off of the host. So, the power-off action of the container is executed on the host system. ;(
On the host I’m running SLES11.3 and libvirt-1.1.1 (libvirt-1.0.2 does the same).
So, my question: What are the correct steps, to migrate a libvirt/kvm-VM to libvirt_lxc? Especially regarding the modifications inside the existing hard disk image? Or could this be a problem of the cgroups implementation of SLES11.3?
Thanks very much in advance for your help!!!
Uwe
Looks like I’m having some trouble to make this go. I’m using
—
virsh -V
Virsh command line tool of libvirt 0.9.8
See web site at http://libvirt.org/
Compiled with support for:
Hypervisors: Xen QEmu/KVM UML OpenVZ LXC Test
Networking: Remote Daemon Network Bridging Nwfilter VirtualPort
Storage: Dir Disk Filesystem SCSI Multipath iSCSI LVM
Miscellaneous: AppArmor Secrets Debug Readline
–
Which looks like it’s got support for LXC, but then when I try to run that file it bails out with:
–
Error: Failed to define a domain for lxc.xml
Error: internal error unexpected domain type lxc, expecting one of these: qemu, kqemu, kvm, xen
–
That output was originally in Spanish, but it basically says no go. Any idea?
OK, scratch that. It works, but only if you do:
–
sudo virsh -c lxc:/// define helloworld.xml
Domain helloworld defined from helloworld.xml
–
That's explicitly defining the URI to connect to from the command. It doesn't work if you define a default URI using ENV variables.
JJ, if you add your user to the ‘libvirtd’ group you can run virsh without sudo. Environment variables are not (generally) carried over to the sudo environment.
Sorry, just realized that ‘libvirtd’ may be a Debian/Ubuntu convention, it may be different on your platform. But something along those lines.
If i have setup a macvlan on the host, how could i use it with libvirt? What would be the libvirt xml look like? I tried tag and but failed to get it working.
Any help will be appreciated.
Nice tutorial for CentOS 7. It works! I have tested it. The tutorial is German, but easy to read.
https://der-linux-admin.de/2014/08/centos-7-centos-7-im-lxc-container
[…] Of particular concern was security, because the “DAC (discretionary access control) system on which LXC [originally] relie[d] for all security is… […]