Setting up a Sheepdog cluster and exporting a volume to a KVM guest

Posted: October 11th, 2011 | Filed under: Fedora, libvirt, Virt Tools | Tags: , , , , , | 3 Comments »

There were recently patches posted to libvir-list to improve the Ceph support in the KVM driver. While trying to review them it quickly became clear I did not have enough knowledge of Ceph to approve the code. So I decided it was time to setup some clustered storage devices to test libvirt with. I decided to try out Ceph, GlusterFS and Sheepdog, and by virtue of Sheepdog compiling the fastest, that is the first one I have tried and thus responsible for this blog post.

Host setup

If you have Fedora 16, sheepdog can directly installed using yum

# yum install sheepdog

Sheepdog relies on corosync to maintain cluster membership, so the first step is to configure that. Corosync ships with an example configuration file, but since I’ve not used it before, I chose to just use the example configuration recommended by the Sheepdog website. So on the 2 hosts I wanted to participate in the cluster I created:

# cat > /etc/cluster/cluster.conf <EOF
compatibility: whitetank
totem {
  version: 2
  secauth: off
  threads: 0
  interface {
    ringnumber: 0
    bindnetaddr: -YOUR IP HERE-
    mcastaddr: 226.94.1.1
    mcastport: 5405
  }
}
logging {
  fileline: off
  to_stderr: no
  to_logfile: yes
  to_syslog: yes
  logfile: /var/log/cluster/corosync.log
  debug: off
  timestamp: on
  logger_subsys {
    subsys: AMF
    debug: off
  }
}
amf {
  mode: disabled
}
EOF

Obviously remembering to change the ‘bindnetaddr‘ parameter. One thing to be aware of is that this configuration allows any host in the same subnet to join the cluster, no authentication or encryption is required. I believe corosync has some support for encryption keys, but I have not explored this. If you don’t trust the network, this should definitely be examined. Then it is simply a matter of starting the corosync and sheepdog, each on each node:

# service corosync start
# service sheepdog start

If all went to plan, it should be possible to see all hosts in the sheepdog cluster, from any node:

# collie node list
   Idx - Host:Port              Number of vnodes
------------------------------------------------
     0 - 192.168.1.2:7000    	64
*    1 - 192.168.1.3:7000    	64

The final step in initializing the nodes is to create a storage cluster across the nodes. This command only needs to be run on one of the nodes

# collie cluster format --copies=2
# collie cluster info
running

Ctime                Epoch Nodes
2011-10-11 10:50:01      1 [192.168.1.2:7000, 192.168.1.3:7000]

Volume setup

libvirt has a storage management API for creating/managing volumes, but there is not currently a driver for sheepdog. So for the time being, volumes need to be created manually using the qemu-img command. All that is required is a volume name and a size. So on any of the nodes:

$ qemu-img create sheepdog:demo 1G

The more observant people might notice that this command can be run by any user on the host, no authentication required. Even if the host is locked down to not allow unprivileged user logins, this still means that any compromised QEMU instance can access all the sheepdog storage. Not cool. Some form of authentication is clearly needed before this can be used for production.

With the default Fedora configuration of sheepdog, all the disk volumes end up being stored under /var/lib/sheepdog, so make sure that directory has plenty of free space.

Guest setup

Once a volume has been created, setting up a guest to use it, is just a matter of using a special XML configuration block for the guest disk.

<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='sheepdog' name='demo'/>
  <target dev='vdb' bus='virtio'/>
</disk>

Notice how although this is a network block device, there is no need to provide a hostname of the storage server. Every virtualization host is a member of the storage cluster, and vica-verca, so the storage is “local” as far as QEMU is concerned. Inside the guest there is nothing special to worry about, a regular virtio block device appears, in this case /dev/vdb. As data is written to the block device in the guest, the data should end up in /var/lib/sheepdog on all nodes in the cluster.

One final caveat to mention, is that live migration of guests between hosts is not currently supported with Sheepdog.

Edit: Live migration *is* supported with sheepdog 0.2.0 and later.