sheepdog / sheepdog

Distributed Storage System for QEMU
Home Page: http://sheepdog.github.io/sheepdog/
License: GNU General Public License v2.0
Sheepdog: Distributed Storage System for KVM
============================================

Overview
--------

Sheepdog is a distributed storage system for QEMU. It provides highly
available block-level storage volumes to virtual machines. Sheepdog
supports advanced volume management features such as snapshotting,
cloning, and thin provisioning. Sheepdog is open source software,
released under the terms of the GNU General Public License version 2.

For the latest information about Sheepdog, please visit our website:

  http://sheepdog.github.io/sheepdog/

And (recommended for newcomers) the wiki:

  https://github.com/sheepdog/sheepdog/wiki/

Requirements
------------

* Three or more x86-64 machines
* Corosync cluster engine

Install
-------

Please read the INSTALL file distributed with this package for detailed
instructions on installing or compiling from source.

Usage
-----

* Cluster Management Backends

  Sheepdog uses a cluster management backend to manage membership and to
  broadcast messages to the cluster nodes. For now, sheepdog can use the
  local driver (for development on a single box), corosync (the default),
  zookeeper, and Accord.

* Local Driver

  This driver uses the UNIX IPC mechanism to manage membership on a
  single box, where we start multiple 'sheep' processes to simulate a
  cluster. It is very easy and fast to set up and is especially useful
  for testing functionality without involving any other software.

  To set up a 3-node cluster using the local driver in one line of bash:

  $ mkdir /path/to/store
  $ for i in 0 1 2; do sheep -c local /path/to/store/$i -z $i -p 700$i; done

* Configure corosync.

  Nearly every modern Linux distribution has x86_64 corosync binaries
  pre-built and available via its repositories. We recommend you use
  these packages if they are available on your distribution.
  For Debian-based systems:

  $ sudo aptitude install corosync libcorosync-dev

  For RPM-based systems:

  $ sudo yum install corosynclib-devel

  See our wiki and the corosync(8) and corosync.conf(5) man pages for
  further details.

* Setup Sheepdog

  1. Launch sheepdog on each machine of the cluster.

     $ sheep /store_dir

     Note: /store_dir is a directory used to store objects. The
     directory must be on a filesystem with xattr support. In the case
     of ext3, you need to add 'user_xattr' to the mount options.

     $ sudo mount -o remount,user_xattr /store_device

  2. Format the cluster.

     $ dog cluster format --copies=3

     --copies specifies the default number of data copies. In this
     case, the replicated data is stored on three machines.

  3. Check the cluster state.

     The following list shows that Sheepdog is running on 32 nodes.

     $ dog node list
     Idx  Node id (FNV-1a) - Host:Port
     ------------------------------------------------
       0  0308164db75cff7e - 10.68.13.15:7000  *
       1  03104d8b4315c8e4 - 10.68.13.1:7000
       2  0ab18c565bc14aea - 10.68.13.3:7000
       3  0c0d27f0ac395f5d - 10.68.13.16:7000
       4  127ee4802991f308 - 10.68.13.13:7000
       5  135ff2beab2a9809 - 10.68.14.5:7000
       6  17bd6240eab65870 - 10.68.14.4:7000
       7  1cf35757cbf47d7b - 10.68.13.10:7000
       8  1df9580b8960a992 - 10.68.13.11:7000
       9  29307d3fa5a04f78 - 10.68.14.12:7000
      10  29dcb3474e31d4f3 - 10.68.14.15:7000
      11  29e089c98dd2a144 - 10.68.14.16:7000
      12  2a118b7e2738f479 - 10.68.13.4:7000
      13  3d6aea26ba79d75f - 10.68.13.6:7000
      14  42f9444ead801767 - 10.68.14.11:7000
      15  562c6f38283d09fe - 10.68.14.2:7000
      16  5dd5e540cca1556a - 10.68.14.6:7000
      17  6c12a5d10f10e291 - 10.68.14.13:7000
      18  6dae1d955ca72d96 - 10.68.13.7:7000
      19  711db0f5fa40b412 - 10.68.14.14:7000
      20  7c6b95212ee7c085 - 10.68.14.9:7000
      21  7d010c31bf11df73 - 10.68.13.2:7000
      22  82c43e908b1f3f01 - 10.68.13.12:7000
      23  931d2de0aaf61cf5 - 10.68.13.8:7000
      24  961d9d391e6021e7 - 10.68.13.14:7000
      25  9a3ef6fa1081026c - 10.68.13.9:7000
      26  b0b3d300fed8bc26 - 10.68.14.10:7000
      27  b0f08fb98c8f5edc - 10.68.14.8:7000
      28  b9cc316dc5aba880 - 10.68.13.5:7000
      29  d9eda1ec29c2eeeb - 10.68.14.7:7000
      30  e53cebb2617c86fd - 10.68.14.1:7000
      31  ea46913c4999ccdf - 10.68.14.3:7000

* Create a virtual machine image

  1. Create a 256 GB virtual machine image for Alice.

     $ qemu-img create sheepdog:Alice 256G

  2. You can also convert existing KVM images to Sheepdog ones.

     $ qemu-img convert ~/amd64.raw sheepdog:Bob

  3. List Sheepdog images with the following command.

     $ dog vdi list
       name        id    size    used  shared    creation time   object id
     --------------------------------------------------------------------
       Bob          0  2.0 GB  1.6 GB  0.0 MB 2010-03-23 16:16      80000
       Alice        0  256 GB  0.0 MB  0.0 MB 2010-03-23 16:16      40000

* Boot the virtual machine

  1. Boot the virtual machine.

     $ qemu-system-x86_64 -hda sheepdog:Alice

  2. The following command shows which images are in use.

     $ dog vm list
     Name            |Vdi size |Allocated| Shared  | Status
     ----------------+---------+---------+---------+------------
     Bob             |  2.0 GB |  1.6 GB |  0.0 MB | running on xx.xx.xx.xx
     Alice           |  256 GB |  0.0 MB |  0.0 MB | not running

* Snapshot

  1. Take a snapshot.

     $ qemu-img snapshot -c name sheepdog:Alice

     The -c flag is currently meaningless.

  2. After taking the snapshot, a new virtual machine image is added as
     a non-current image.

     $ dog vdi list
       name        id    size    used  shared    creation time   object id
     --------------------------------------------------------------------
       Bob          0  2.0 GB  1.6 GB  0.0 MB 2010-03-23 16:16      80000
       Alice        0  256 GB  0.0 MB  0.0 MB 2010-03-23 16:21      c0000
     s Alice        1  256 GB  0.0 MB  0.0 MB 2010-03-23 16:16      40000

  3. You can boot from the snapshot image by specifying the tag id.

     $ qemu-system-x86_64 -hda sheepdog:Alice:1

* Cloning from a snapshot

  1. Create a Charlie image as a clone of Alice's image.

     $ qemu-img create -b sheepdog:Alice:1 sheepdog:Charlie

  2. Charlie's image is added to the virtual machine list.
     $ dog vdi list
       name        id    size    used  shared    creation time   object id
     --------------------------------------------------------------------
       Bob          0  2.0 GB  1.6 GB  0.0 MB 2010-03-23 16:16      80000
       Alice        0  256 GB  0.0 MB  0.0 MB 2010-03-23 16:21      c0000
     s Alice        1  256 GB  0.0 MB  0.0 MB 2010-03-23 16:16      40000
       Charlie      0  256 GB  0.0 MB  0.0 MB 2010-03-23 16:23     100000

Test Environment
----------------

- Debian squeeze amd64
- Debian lenny amd64

===============================================================================

Copyright (C) 2009-2011, Nippon Telegraph and Telephone Corporation.
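The node ids in the `dog node list` output above are labelled FNV-1a hashes. As a minimal sketch, this is the standard 64-bit FNV-1a function; which bytes sheepdog actually feeds into it (address, port, or both) is not shown here, so the sample input is only an assumption:

```python
def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a hash, as named in the `dog node list` header."""
    h = 0xcbf29ce484222325                      # FNV-1a 64-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001b3) % (1 << 64)     # FNV 64-bit prime, mod 2^64
    return h

# Hypothetical input; sheepdog's exact hashed bytes may differ.
print(f"{fnv1a_64(b'10.68.13.15:7000'):016x}")
```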
Here is a simple example: a cluster with 3 nodes and --copies 2.
All nodes have about 80-90% of their space used.
When I kill a node, the cluster tries to replicate the missing copies from the lost node, but there is obviously not enough space.
I think sheepdog should behave like this:
dog node info
Id Size Used Avail Use%
0 4.6 GB 4.1 GB 479 MB 89%
1 5.0 GB 3.8 GB 1.1 GB 77%
2 5.0 GB 4.1 GB 894 MB 82%
Total 15 GB 12 GB 2.5 GB 83%
df -h /mnt/sheep/0
/dev/sda6 4,7G 4,2G 479M 90% /mnt/sheep/0
dog cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Sat Oct 4 10:34:30 2014
Epoch Time Version
2014-10-04 10:34:30 1 [192.168.10.4:7000, 192.168.10.5:7000, 192.168.10.6:7000]
root@test004:~# dog cluster info -v
Cluster status: running, auto-recovery enabled
Cluster store: plain with 2 redundancy policy
Cluster vnode mode: node
Cluster created at Sat Oct 4 10:34:30 2014
dog node kill 2
dog node info
Id Size Used Avail Use%
0 4.6 GB 4.6 GB 2.7 MB 99%
1 5.0 GB 5.0 GB 1.5 MB 99%
Total 9.6 GB 9.6 GB 4.2 MB 99%
/var/lib/sheepdog/sheep.log
Oct 04 10:37:39 ERROR [rw 4593] prealloc(385) failed to preallocate space, No space left on device
Oct 04 10:37:39 ERROR [rw 4593] err_to_sderr(108) diskfull, oid=fd38150000005b
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(404) cannot access any replicas of fd38150000005b at epoch 1
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(405) clients may see old data
Oct 04 10:37:39 ERROR [rw 4593] recover_replication_object(412) can not recover oid fd38150000005b
Oct 04 10:37:39 ERROR [rw 4593] recover_object_work(576) failed to recover object fd38150000005b
dog vdi check
Server has no space for new objects
Sheepdog daemon version 0.8.0_353_g4d282d3
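The numbers above already show why the recovery must fail: re-replication keeps the amount of stored data roughly constant while the raw capacity shrinks. A quick sanity check of that arithmetic (hypothetical helper, not a dog command):

```python
def recovery_fits(used_total_gb: float, capacities_gb: list, lost_node: int) -> bool:
    """After losing `lost_node`, the same replicated data set must fit on
    the remaining nodes; return whether there is enough raw space."""
    remaining = sum(c for i, c in enumerate(capacities_gb) if i != lost_node)
    return used_total_gb <= remaining

# The cluster above: ~12 GB used across nodes of 4.6/5.0/5.0 GB, --copies 2.
# Killing node 2 leaves 9.6 GB of raw space for 12 GB of data.
print(recovery_fits(12, [4.6, 5.0, 5.0], lost_node=2))  # → False
```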
Summary:
I'm working with 3 nodes.
A node leaves the cluster for some amount of time.
VDIs are deleted on the other nodes.
When rejoining the cluster, the node doesn't remove the orphan objects.
It does remove them if its metadata is removed before rejoining.
root@test006:~# dog node list
Id Host:Port V-Nodes Zone
0 192.168.10.4:7000 127 67807424
1 192.168.10.5:7000 129 84584640
2 192.168.10.6:7000 129 101361856
root@test006:~# dog vdi list
Name Id Size Used Shared Creation time VDI id Copies Tag
test4 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:44 fd2de3 3
test1 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:43 fd32fc 3
test3 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:44 fd3662 3
test2 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:43 fd3815 3
root@test006:~# dog node info
Id Size Used Avail Use%
0 216 GB 3.4 GB 213 GB 1%
1 220 GB 3.4 GB 216 GB 1%
2 220 GB 3.4 GB 216 GB 1%
Total 655 GB 10 GB 645 GB 1%
Total virtual image size 20 GB
root@test006:~# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheep0 220G 3.5G 217G 2% /mnt/sheep/0
(I kill node id 2 and remove 3 of the 4 vdis)
root@test005:~# dog node kill 2
root@test005:~# dog vdi delete test4
root@test005:~# dog vdi delete test3
root@test005:~# dog vdi delete test2
root@test005:~# dog vdi list
Name Id Size Used Shared Creation time VDI id Copies Tag
test1 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:43 fd32fc 3
root@test005:~# dog node info
Id Size Used Avail Use%
0 216 GB 912 MB 215 GB 0%
1 220 GB 912 MB 219 GB 0%
Total 436 GB 1.8 GB 434 GB 0%
Total virtual image size 5.0 GB
root@test005:~# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheepdog 220G 945M 219G 1% /mnt/sheep/0
(I insert node id 2 back and check the used space)
root@test006:~# script/run_sheep.sh
root@test006:~# dog node list
Id Host:Port V-Nodes Zone
0 192.168.10.4:7000 127 67807424
1 192.168.10.5:7000 129 84584640
2 192.168.10.6:7000 129 101361856
root@test006:~# dog vdi list
Name Id Size Used Shared Creation time VDI id Copies Tag
test1 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:43 fd32fc 3
root@test006:~# dog node info
Id Size Used Avail Use%
0 216 GB 912 MB 215 GB 0%
1 220 GB 912 MB 219 GB 0%
2 218 GB 1.5 GB 216 GB 0%
Total 653 GB 3.3 GB 650 GB 0%
Total virtual image size 5.0 GB
root@test006:~# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheep0 220G 3.5G 217G 2% /mnt/sheep/0
(Notice the used space of vg00-sheep0 didn't vary, and node info is showing
1.5 GB, which is neither 3.5 GB nor 912 MB, as it should be.)
(I repeat the same steps, but this time I remove /var/lib/sheepdog before
rejoining the cluster.)
root@test006:~/script# dog node list
Id Host:Port V-Nodes Zone
0 192.168.10.4:7000 127 67807424
1 192.168.10.5:7000 129 84584640
2 192.168.10.6:7000 129 101361856
root@test006:~/script# dog vdi list
Name Id Size Used Shared Creation time VDI id Copies Tag
test4 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:50 fd2de3 3
test1 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:48 fd32fc 3
test3 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:49 fd3662 3
test2 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:49 fd3815 3
root@test006:~/script# dog node info
Id Size Used Avail Use%
0 216 GB 3.4 GB 213 GB 1%
1 220 GB 3.4 GB 216 GB 1%
2 220 GB 3.4 GB 216 GB 1%
Total 655 GB 10 GB 645 GB 1%
Total virtual image size 20 GB
root@test006:~/script# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheep0 220G 3.5G 217G 2% /mnt/sheep/0
root@test005:~# dog node kill 2
root@test005:~# dog vdi delete test4
root@test005:~# dog vdi delete test3
root@test005:~# dog vdi delete test2
root@test005:~# dog node info
Id Size Used Avail Use%
0 216 GB 912 MB 215 GB 0%
1 220 GB 912 MB 219 GB 0%
Total 436 GB 1.8 GB 434 GB 0%
Total virtual image size 5.0 GB
root@test005:~# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheepdog 220G 945M 219G 1% /mnt/sheep/0
(re-join the cluster)
root@test006:~/script# rm -r /var/lib/sheepdog/*
root@test006:~/script# ./run_sheep.sh
root@test006:~/script# dog node info
Id Size Used Avail Use%
0 216 GB 912 MB 215 GB 0%
1 220 GB 912 MB 219 GB 0%
2 220 GB 912 MB 219 GB 0%
Total 655 GB 2.7 GB 653 GB 0%
Total virtual image size 5.0 GB
root@test006:~/script# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheep0 220G 947M 219G 1% /mnt/sheep/0
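The experiment above suggests the missing cleanup is a set difference: an orphan object exists in the rejoining node's object directory but belongs to no live VDI. A hypothetical sketch of that check (illustration only, not sheepdog code; object ids are taken from the vdi list above):

```python
def find_orphans(local_objects, live_objects):
    """Return object ids present in the local store but referenced by no
    current VDI -- the garbage a rejoining node should delete."""
    return set(local_objects) - set(live_objects)

# Node rejoins still holding objects of the deleted test2/test3/test4 VDIs,
# while only test1 (fd32fc) survived the deletion.
print(sorted(find_orphans({"fd2de3", "fd3662", "fd3815", "fd32fc"}, {"fd32fc"})))
```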
Clearly, a document explaining how to upgrade an existing sheepdog cluster to a new version is required.
Hello!
Recently I've tried to set up sheepdog storage using Fedora 22 and its libvirt and virt-manager packages. Currently this distribution has libvirt 1.2.17 and virt-manager 1.2.0.
During my test cases, I've successfully built an rpm package of sheepdog 0.9.2 taken from the github repo.
I've also configured sheepdog with a corosync cluster.
When I try to connect to the sheepdog cluster remotely from virt-manager, I get errors while checking the connection parameters for the sheepdog storage. I've tried to follow your wiki at https://github.com/sheepdog/sheepdog/wiki/Libvirt, but it doesn't work as explained there.
Help, if you have enough time!
Would you like to replace more #defines of constant values with enumerations, to make their relationships explicit?
Hi,
Still playing with sheepdog and running IO benchmarks in my VMs, I've hit another problem (see issue #26) while running dbench in the VMs: I have a sheep process eating 100% of a CPU on 2 nodes (and the filesystem is unresponsive in 2 of my VMs using sheepdog). On these 2 nodes, the collie
command does not respond (locked).
On the still working node, I have:
collie cluster info
Cluster status: running
Cluster created at Fri Mar 2 19:14:22 2012
Epoch Time Version
2012-03-02 19:14:22 1 [172.17.3.20:7000, 172.17.3.21:7000, 172.17.3.22:7000]
$ collie node info
Id Size Used Use%
0 1.8 TB 18 GB 0%
1 1.8 TB 19 GB 1%
2 1.8 TB 15 GB 0%
Total 5.3 TB 52 GB 0%
Total virtual image size 40 GB
$ collie vdi list
Name Id Size Used Shared Creation time VDI id
squeeze-test_3 1 10 GB 8.4 GB 956 MB 2012-03-03 01:04 406758
squeeze-test_2 1 10 GB 8.1 GB 960 MB 2012-03-03 01:04 40690b
squeeze-test_1 1 10 GB 8.3 GB 964 MB 2012-03-03 01:04 406abe
s squeeze-test 1 10 GB 1.3 GB 0.0 MB 2012-03-02 19:15 ecf746
squeeze-test 2 10 GB 0.0 MB 1.3 GB 2012-03-03 01:03 ecf747
A strace on the sheep process shows it is looping on:
epoll_wait(5, {{EPOLLERR|EPOLLHUP, {u32=67111312, u64=140600116513168}}, {EPOLLERR|EPOLLHUP,{u32=4026533136, u64=140599780967696}}, {EPOLLERR|EPOLLHUP, {u32=4026533408, u64=140599780967968}}, {EPOLLERR|EPOLLHUP, {u32=4026533680, u64=140599780968240}}, {EPOLLERR|EPOLLHUP, {u32=4026533952, u64=140599780968512}}, {EPOLLERR|EPOLLHUP, {u32=4026534224, u64=140599780968784}}, {EPOLLERR|EPOLLHUP, {u32=4026534496, u64=140599780969056}}, {EPOLLERR|EPOLLHUP, {u32=4026534768, u64=140599780969328}}, {EPOLLERR|EPOLLHUP, {u32=4026535040, u64=140599780969600}}, {EPOLLERR|EPOLLHUP, {u32=4026535312, u64=140599780969872}}, {EPOLLERR|EPOLLHUP, {u32=4026535584, u64=140599780970144}}, {EPOLLERR|EPOLLHUP, {u32=67146816, u64=140600116548672}}, {EPOLLERR|EPOLLHUP, {u32=4022399216, u64=140599776833776}}, {EPOLLERR|EPOLLHUP, {u32=4022399488, u64=140599776834048}}, {EPOLLERR|EPOLLHUP, {u32=4022399760, u64=140599776834320}}, {EPOLLERR|EPOLLHUP, {u32=4022400032, u64=140599776834592}}, {EPOLLERR|EPOLLHUP, {u32=4022400304, u64=140599776834864}}, {EPOLLERR|EPOLLHUP, {u32=4022400576, u64=140599776835136}}, {EPOLLERR|EPOLLHUP, {u32=4022400848, u64=140599776835408}}, {EPOLLERR|EPOLLHUP, {u32=4022401120, u64=140599776835680}}, {EPOLLERR|EPOLLHUP, {u32=4022401392, u64=140599776835952}}, {EPOLLERR|EPOLLHUP, {u32=4022401664, u64=140599776836224}}, {EPOLLERR|EPOLLHUP, {u32=4022401936, u64=140599776836496}}, {EPOLLERR|EPOLLHUP, {u32=4022402480, u64=140599776837040}}, {EPOLLERR|EPOLLHUP, {u32=4022402752, u64=140599776837312}}, {EPOLLERR|EPOLLHUP, {u32=67187136, u64=140600116588992}}, {EPOLLERR|EPOLLHUP, {u32=88097008, u64=140600137498864}}, {EPOLLERR|EPOLLHUP, {u32=88097280, u64=140600137499136}}, {EPOLLERR|EPOLLHUP, {u32=4022402208, u64=140599776836768}}, {EPOLLERR|EPOLLHUP, {u32=88097552, u64=140600137499408}}, {EPOLLERR|EPOLLHUP, {u32=88097824, u64=140600137499680}}, {EPOLLERR|EPOLLHUP, {u32=88098096, u64=140600137499952}}}, 128, 1000) = 32
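The EPOLLERR|EPOLLHUP flood is consistent with dead file descriptors never being removed from the epoll set: EPOLLHUP is reported even when not requested, and in level-triggered mode every epoll_wait() call returns immediately with the same events, burning 100% CPU. This is an assumption about the cause, not a confirmed diagnosis; a minimal Linux-only reproduction of the effect:

```python
import select
import socket

a, b = socket.socketpair()          # `a` stands in for a sheep connection fd
ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN)

b.close()                           # the peer goes away

# EPOLLHUP arrives even though we only asked for EPOLLIN, and it is
# level-triggered: until the fd is unregistered, every poll() returns
# instantly with the same event -- a busy loop if the caller never
# drops the dead fd from the set.
events = ep.poll(timeout=1)
print(bool(events and events[0][1] & select.EPOLLHUP))

ep.unregister(a.fileno())           # the fix: remove dead fds from the set
ep.close()
a.close()
```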
After killing the stuck sheep process on one node, the cluster seems to be back to a normal functional state:
$ collie cluster info
Cluster status: running
Cluster created at Fri Mar 2 19:14:22 2012
Epoch Time Version
2012-03-03 14:22:17 3 [172.17.3.20:7000, 172.17.3.21:7000, 172.17.3.22:7000]
2012-03-03 14:22:17 2 [172.17.3.20:7000, 172.17.3.22:7000]
2012-03-02 19:14:22 1 [172.17.3.20:7000, 172.17.3.21:7000, 172.17.3.22:7000]
And one of my VMs is back to a functional state (IO unblocked) but not the other one (the one running on the node where I killed the sheep process).
I started setting up a sheepdog cluster. I started with 3 nodes and recently added a fourth one.
When I add the fourth one everything is fine, but after I reweight the cluster, the check command fails:
~# dog cluster check
fix vdi test-4
PANIC: can't find a valid vnode
dog exits unexpectedly (Aborted).
dog.c:329: crash_handler
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf20f) [0x7ff655f2a20f]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34) [0x7ff655ba41e4]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x147) [0x7ff655ba7397]
sheep.h:95: oid_to_vnodes
vdi.c:1680: do_vdi_check
common.c:182: parse_vdi
cluster.c:491: cluster_check
dog.c:441: main
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf4) [0x7ff655b90994]
dog() [0x403b48]
I am trying to run Sheepdog on top of a ZFS mount point. I keep getting an error:
Jan 01 12:12:14 [main] queue_request(354) CREATE_AND_WRITE_OBJ, 1
Jan 01 12:12:14 [gway 232] do_process_work(1233) 1, ad75c000003ff, 1
Jan 01 12:12:14 [gway 232] gateway_forward_request(262) ad75c000003ff
Jan 01 12:12:14 [gway 232] default_create_and_write(320) failed to open /var/lib/sheepdog/disc0/obj/000ad75c000003ff.tmp: Invalid argument
Jan 01 12:12:14 [gway 232] err_to_sderr(128) oid=ad75c000003ff, Invalid argument
Jan 01 12:12:14 [gway 232] gateway_forward_request(302) fail to write local 3
Jan 01 12:12:14 [gway 232] gateway_forward_request(307) nr_sent 0, err 3
Jan 01 12:12:14 [gway 232] do_process_work(1240) failed: 1, ad75c000003ff , 1, 3
Jan 01 12:12:14 [main] gateway_op_done(99) leaving sheepdog cluster
I can write to the mount point as a user. I have also made sure that xattr is turned on on the zfs filesystem. Is there something else I am missing?
I have verified that sheepdog works fine with an ext3 filesystem.
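One common cause of "Invalid argument" from open() on ZFS is the O_DIRECT flag, which older ZFS-on-Linux releases rejected with EINVAL; sheepdog can open object files with direct IO. That is an assumption about the cause, not a confirmed diagnosis. A small probe to check whether a given filesystem accepts O_DIRECT:

```python
import errno
import os
import tempfile

def supports_o_direct(directory: str) -> bool:
    """Probe whether files in `directory` can be opened with O_DIRECT
    (a filesystem that lacks direct IO rejects it with EINVAL)."""
    path = os.path.join(directory, ".odirect_probe")
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o600)
    except OSError as e:
        if e.errno == errno.EINVAL:
            return False                # filesystem refuses direct IO
        raise
    os.close(fd)
    os.unlink(path)
    return True

print(supports_o_direct(tempfile.gettempdir()))
```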
As the title says, I have 2 servers in my LAN (both with sheepdog installed), but one cannot see the other in "collie node list", like this:
[root@kvm-1 ~]# collie node list
M Id Host:Port V-Nodes Zone
[root@kvm-2 ~]# collie node list
M Id Host:Port V-Nodes Zone
and I have stopped iptables and disabled selinux, but each node still only sees itself. Help me please, thanks a lot!
I have a test sheepdog setup consisting of 3 nodes which are also kvm hosts. I am using Debian squeeze with backports and sheepdog built from git (at rev 86a25e9). The sheepdog storage is formatted with 2 copies of the data.
Installed packages on the nodes:
qemu-kvm 1.0+dfsg-8bpo60+1bpo60+1
linux-image-3.2.0-0.bpo. 3.2.4-1
VMs are started with libvirt.
I do not have many logs, but the segfault in syslog looks like:
kvm[17438]: segfault at 401390 ip 00007fd16f3e04d1 sp 00007fd17441fb60 error 4 in kvm
Note that I also have a "reference" VM running on one node of the cluster, using a raw image disk as storage, that does not segfault.
Is this a known issue?
David
I would like to point out that identifiers like "__SHEEP_H__" and "__list_del" do not conform to the naming conventions of the C/C++ language standards (identifiers containing double underscores are reserved for the implementation).
Would you like to adjust your selection of unique names?
Hi, I've experienced this problem: I have a two-node cluster with replica count = 2 using corosync. If I have network problems between the two nodes, corosync detects the new topology and so does sheepdog; but when connectivity is restored, corosync shows the correct new configuration with two nodes while sheepdog does not. I have to restart one node to let the cluster know the nodes are up again and start a recovery.
Sheepdog is version 0.8.0 (built by proxmox).
When I run dog vdi list -a localhost
, it displays:
It seems that sheepdog does not support the localhost option.
And I think the option is necessary. Some higher-level program may generate a dog vdi create XXXX -a localhost
command when using sheepdog as a backend.
So I suggest we add 'localhost' support in dog.c. Adding a simple check in the "getopt_long switch case 'a'" statement works well. Maybe there is a better solution.
Thanks.
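The suggested fix can be as small as resolving a hostname before the address is validated. A sketch of the idea in Python rather than the dog.c C code (the helper name is hypothetical):

```python
import socket

def normalize_addr(addr: str) -> str:
    """Accept 'localhost' (or any hostname) wherever an IPv4 literal is
    expected, by resolving it first -- the kind of check dog.c could
    perform in its getopt_long 'a' case."""
    try:
        socket.inet_aton(addr)      # already a dotted-quad IPv4 literal
        return addr
    except OSError:
        return socket.gethostbyname(addr)

print(normalize_addr("localhost"))
```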
I tried to build RPMs for both i586 and x86_64 and found that the 32-bit build is completely broken.
Here is my patch to fix the build on 32-bit hosts (the packages build, but I have not checked how they work):
diff -ru sheepdog-0.9.2.orig/configure.ac sheepdog-0.9.2/configure.ac
--- sheepdog-0.9.2.orig/configure.ac 2015-06-17 10:09:20.000000000 +0300
+++ sheepdog-0.9.2/configure.ac 2015-07-28 18:32:24.051878895 +0300
@@ -197,6 +197,7 @@
-badflag -D__gnuc_va_list=va_list -D__attribute\(x\)="
AM_CONDITIONAL(BUILD_SHA1_HW, [[[[ $host = *x86_64* ]]]])
+AM_CONDITIONAL(X86_64, [[[[ $host = *x86_64* ]]]])
AC_ARG_ENABLE([fatal-warnings],
[ --enable-fatal-warnings : enable fatal warnings. ],
diff -ru sheepdog-0.9.2.orig/include/compiler.h sheepdog-0.9.2/include/compiler.h
--- sheepdog-0.9.2.orig/include/compiler.h 2015-06-17 10:09:20.000000000 +0300
+++ sheepdog-0.9.2/include/compiler.h 2015-07-28 10:12:28.430291588 +0300
@@ -164,6 +164,12 @@
#define cpu_has_avx cpu_has(X86_FEATURE_AVX)
#define cpu_has_osxsave cpu_has(X86_FEATURE_OSXSAVE)
+#else /* __x86_64__ */
+
+#define cpu_has_ssse3 0
+#define cpu_has_avx 0
+#define cpu_has_osxsave 0
+
#endif /* __x86_64__ */
#endif /* SD_COMPILER_H */
diff -ru sheepdog-0.9.2.orig/lib/fec.c sheepdog-0.9.2/lib/fec.c
--- sheepdog-0.9.2.orig/lib/fec.c 2015-06-17 10:09:20.000000000 +0300
+++ sheepdog-0.9.2/lib/fec.c 2015-07-28 10:17:44.588557313 +0300
@@ -737,5 +737,8 @@
lost[0] = (unsigned char *)buf;
ec_init_tables(ed, 1, cm, ec_tbl);
- ec_encode_data_sse(len, ed, 1, ec_tbl, input, lost);
+ if (cpu_has_ssse3)
+ ec_encode_data_sse(len, ed, 1, ec_tbl, input, lost);
+ else
+ ec_encode_data(len, ed, 1, ec_tbl, input, lost);
}
diff -ru sheepdog-0.9.2.orig/lib/Makefile.am sheepdog-0.9.2/lib/Makefile.am
--- sheepdog-0.9.2.orig/lib/Makefile.am 2015-06-17 10:09:20.000000000 +0300
+++ sheepdog-0.9.2/lib/Makefile.am 2015-07-29 10:01:00.601828973 +0300
@@ -15,7 +15,7 @@
libsheepdog_a_SOURCES = event.c logger.c net.c util.c rbtree.c strbuf.c \
sha1.c option.c work.c sockfd_cache.c fec.c sd_inode.c
-libsheepdog_a_LIBADD = isa-l/bin/ec_base.o \
+libsheepdog_a_LIBADD_ = isa-l/bin/ec_base.o \
isa-l/bin/ec_highlevel_func.o \
isa-l/bin/ec_multibinary.o \
isa-l/bin/gf_2vect_dot_prod_sse.o \
@@ -27,6 +27,16 @@
isa-l/bin/gf_vect_mul_avx.o \
isa-l/bin/gf_vect_mul_sse.o
+libsheepdog_a_LIBADD_32 = isa-l/bin/ec_base.o \
+ isa-l/bin/ec_highlevel_func.o \
+ isa-l/bin/ec_multibinary.o
+
+if !X86_64
+arch = 32
+endif
+
+libsheepdog_a_LIBADD = $(libsheepdog_a_LIBADD_$(arch))
+
if BUILD_SHA1_HW
libsheepdog_a_SOURCES += sha1_ssse3.S
endif
@@ -43,7 +53,7 @@
@$(CHECK_STYLE) $(libsheepdog_a_SOURCES)
libisa.a:
- cd isa-l/ && $(MAKE) && cd ..
+ cd isa-l/ && $(MAKE) arch=$(arch) && cd ..
clean:
cd isa-l/ && $(MAKE) clean && cd ..
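The fec.c hunk above guards the SSE path behind cpu_has_ssse3 and the compiler.h hunk adds zero stubs for non-x86_64 builds. The same runtime-dispatch idea, sketched in Python for illustration (reading /proc/cpuinfo is a Linux-specific assumption; sheepdog's cpu_has() macros use CPUID, and the encode functions here are stand-ins):

```python
def cpu_has(flag: str) -> bool:
    """Linux-only sketch: look a CPU feature flag up in /proc/cpuinfo."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith(("flags", "Features")):
                    return flag in line.split()
    except OSError:
        pass
    return False            # unknown platform: assume no acceleration

def ec_encode(data: bytes) -> bytes:
    # Mirror of the patched fec.c logic: take the accelerated path only
    # when the CPU supports it, otherwise fall back to the generic one.
    if cpu_has("ssse3"):
        return data         # stand-in for ec_encode_data_sse()
    return data             # stand-in for ec_encode_data()

print(cpu_has("ssse3"))
```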
The manpage says:
-m, --mode [safe|quorum|unsafe]
This option controls the behavior when there are too few nodes for the configured redundancy. Mode 'safe' will halt cluster IO when (nr_nodes < nr_copies).
Mode 'quorum' will halt cluster IO when (nr_nodes < nr_copies/2 + 1). Mode 'unsafe' will never halt the cluster and therefore data loss may result.
The last line implies data loss cannot occur in safe or quorum mode, yet it's unclear why data loss would occur if only 1 node were available (when data may be read and written just fine?), or even if 0 nodes were available (when cluster IO is therefore effectively halted).
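The man-page conditions can be written down directly; for three copies, 'safe' halts IO with fewer than 3 nodes while 'quorum' halts with fewer than 2. A direct transcription (illustration, not sheepdog code):

```python
def cluster_halted(nr_nodes: int, nr_copies: int, mode: str) -> bool:
    """Halt decision exactly as the manpage excerpt above states it."""
    if mode == "safe":
        return nr_nodes < nr_copies
    if mode == "quorum":
        return nr_nodes < nr_copies // 2 + 1
    if mode == "unsafe":
        return False
    raise ValueError(f"unknown mode: {mode}")

# With nr_copies = 3: quorum keeps IO running down to 2 nodes.
print([cluster_halted(n, 3, "quorum") for n in (1, 2, 3)])  # → [True, False, False]
```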
Hi ALL,
I created a cluster using zookeeper as the cluster driver. When the cluster goes down and is then restarted, large logs appear on all instances started on this cluster, but the log only contains the info below in /var/log/libvirt/qemu/instance-name.log:
qemu-kvm: cannot find aio_req
How can I resolve this problem?
Thanks very much!
Hi,
I would like to test and deploy sheepdog as our VM disk backend because it looks like a great project, but I have some questions first:
Thanks,
Phillipp
0 192.168.2.44:7000 3 738371776
1 192.168.2.45:7000 3 755148992
2 192.168.2.46:7000 3 771926208
3 192.168.2.47:7000 3 788703424
dog vdi create -P test 10G
dd if=/dev/urandom bs=1M count=2048 | dog vdi write test
Jul 15 10:16:26 WARN [rw 6866] read_erasure_object(233) can not read 7c2b250000082b idx 0
Jul 15 10:16:26 WARN [rw 6867] read_erasure_object(233) can not read 7c2b250000082d idx 0
Jul 15 10:16:26 WARN [rw 6865] read_erasure_object(233) can not read 7c2b250000082e idx 1
Jul 15 10:37:37 WARN [rw 6832] read_erasure_object(233) can not read 7c2b2500000179 idx 0
Jul 15 10:37:37 WARN [rw 6867] read_erasure_object(233) can not read 7c2b250000017b idx 1
Jul 15 10:37:37 WARN [rw 6865] read_erasure_object(233) can not read 7c2b2500000180 idx 1
Jul 15 10:39:05 WARN [rw 6867] read_erasure_object(233) can not read 7c2b2500000008 idx 2
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
I insert node id 3 back:
Jul 15 10:38:28 NOTICE [main] cluster_recovery_completion(781) all nodes are recovered, epoch 3
Jul 15 10:39:05 INFO [main] local_vdi_state_checkpoint_ctl(1467) freeing vdi state checkpoint at epoch 2
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6832] read_erasure_object(233) can not read 7c2b2500000006 idx 2
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
...
Jul 15 10:40:26 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:40:57 ERROR [main] check_request_epoch(172) old node version 5, 4 (READ_PEER)
Jul 15 10:40:57 ERROR [main] check_request_epoch(172) old node version 5, 4 (READ_PEER)
Jul 15 10:40:57 ERROR [main] check_request_epoch(172) old node version 5, 4 (READ_PEER)
Jul 15 10:40:57 INFO [main] local_vdi_state_checkpoint_ctl(1467) freeing vdi state checkpoint at epoch 3
Jul 15 10:40:58 INFO [main] recover_object_main(948) object recovery progress 1%
Jul 15 10:40:59 INFO [main] recover_object_main(948) object recovery progress 2%
...
Jul 15 10:41:25 INFO [main] recover_object_main(948) object recovery progress 54%
Jul 15 10:41:26 INFO [main] recover_object_main(948) object recovery progress 55%
Jul 15 10:41:26 ERROR [rw 6832] err_to_sderr(74) diskfull, oid=7c2b250000057d
Jul 15 10:41:26 ERROR [rw 6832] recover_object_work(584) failed to recover object 7c2b250000057d
Jul 15 10:41:26 ERROR [rw 6986] err_to_sderr(74) diskfull, oid=7c2b250000057e
Jul 15 10:41:26 ERROR [rw 6986] recover_object_work(584) failed to recover object 7c2b250000057e
Jul 15 10:41:26 ERROR [rw 6865] err_to_sderr(74) diskfull, oid=7c2b250000057f
Jul 15 10:41:26 ERROR [rw 6865] recover_object_work(584) failed to recover object 7c2b250000057f
Good day.
I've done everything as described in the instructions (Ubuntu 12.04 + CentOS 6.3).
I set up corosync on the internal network 10.10.10.* (the corosync address in the config is 10.10.10.0); in the logs:
Feb 01 12:50:52 corosync [CPG ] chosen downlist: sender r(0) ip(10.10.10.13) ; members(old:2 left:0)
Feb 01 12:50:52 corosync [MAIN ] Completed service synchronization, ready to provide service.
I set up sheepdog with:
sheep /mnt (or /home to test)
collie cluster format --copies=3
[root@host log]# collie node list
M Id Host:Port V-Nodes Zone
Then sheepfs:
sheepfs /home/sheep/fs/
[root@host log]# cat /home/sheep/fs/cluster/info
Cluster status: running
Cluster created at Fri Feb 1 12:52:15 2013
Epoch Time Version
2013-02-01 12:52:15 1 [10.10.10.13:7000, 10.10.10.14:7000, 10.10.10.15:7000]
And now it's time to mount a volume:
echo test > /home/sheep/fs/vdi/mount or echo test > /mnt/fs/vdi/mount (on the ubuntu nodes)
The result is: bash: echo: write error: Invalid argument. I also tried to insert "test" into the file with vi, but it can't save the file either.
What am I doing wrong?
Thanks and best regards, Nikolay.
The subject says it all. It's too easy to run 'collie cluster format' and destroy one's repository.
The reproduction procedure is as follows.
However, it does not occur 100% of the time; it depends on timing.
$ sheep -p 7000 -z 0 -l dir=/var/log/sheep0 /var/lib/sheepdog/data0
$ sheep -p 7001 -z 1 -l dir=/var/log/sheep1 /var/lib/sheepdog/data1
$ sheep -p 7002 -z 2 -l dir=/var/log/sheep2 /var/lib/sheepdog/data2
$ ls -la /var/lib/sheepdog/data*/obj/
/var/lib/sheepdog/data0/obj/:
total 0
drwxr-x--- 3 root root 19 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .stale
/var/lib/sheepdog/data1/obj/:
total 0
drwxr-x--- 3 root root 19 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .stale
/var/lib/sheepdog/data2/obj/:
total 0
drwxr-x--- 3 root root 19 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .stale
$ dog cluster format
using backend plain store
$ ls -la /var/lib/sheepdog/data*/obj/
/var/lib/sheepdog/data0/obj/:
total 0
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
/var/lib/sheepdog/data1/obj/:
total 0
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
/var/lib/sheepdog/data2/obj/:
total 0
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
I think it is a race condition at cluster format.
I've tested with embedded debugging statements;
here is the debugging patch.
$ git diff
diff --git a/lib/util.c b/lib/util.c
index 21e0143..90b4f66 100644
--- a/lib/util.c
+++ b/lib/util.c
@@ -485,6 +485,7 @@ int rmdir_r(const char *dir_path)
 	ret = purge_directory(dir_path);
 	if (ret == 0)
 		ret = rmdir(dir_path);
+	sd_notice("rmdir %s", dir_path);
 	return ret;
 }
diff --git a/sheep/plain_store.c b/sheep/plain_store.c
index 876582c..97c5078 100644
--- a/sheep/plain_store.c
+++ b/sheep/plain_store.c
@@ -230,6 +230,7 @@ static int make_stale_dir(const char *path)
 	char p[PATH_MAX];
 	snprintf(p, PATH_MAX, "%s/.stale", path);
+	sd_notice("mkdir %s", p);
 	if (xmkdir(p, sd_def_dmode) < 0) {
 		sd_err("%s failed, %m", p);
 		return SD_RES_EIO;
The log messages are below.
$ grep NOTICE /var/log/sheep*/sheep.log
/var/log/sheep0/sheep.log:Dec 16 10:52:15 NOTICE [main] nfs_init(607) nfs server service is not compiled
/var/log/sheep0/sheep.log:Dec 16 10:52:59 NOTICE [main] make_stale_dir(233) mkdir /var/lib/sheepdog/data0/obj/.stale
/var/log/sheep0/sheep.log:Dec 16 10:52:59 NOTICE [util] rmdir_r(488) rmdir /var/lib/sheepdog/data0/obj/.stale
/var/log/sheep1/sheep.log:Dec 16 10:52:15 NOTICE [main] nfs_init(607) nfs server service is not compiled
/var/log/sheep1/sheep.log:Dec 16 10:52:59 NOTICE [main] make_stale_dir(233) mkdir /var/lib/sheepdog/data1/obj/.stale
/var/log/sheep1/sheep.log:Dec 16 10:52:59 NOTICE [util] rmdir_r(488) rmdir /var/lib/sheepdog/data1/obj/.stale
/var/log/sheep2/sheep.log:Dec 16 10:52:15 NOTICE [main] nfs_init(607) nfs server service is not compiled
/var/log/sheep2/sheep.log:Dec 16 10:52:59 NOTICE [main] make_stale_dir(233) mkdir /var/lib/sheepdog/data2/obj/.stale
/var/log/sheep2/sheep.log:Dec 16 10:52:59 NOTICE [util] rmdir_r(488) rmdir /var/lib/sheepdog/data2/obj/.stale
I was uploading data and ran into the panic below. Reproducible.
Configuration: 3 zookeeper, 14 sheep
Jun 26 10:11:26 EMERG [http 5107] lock_table_lookup_acquire(353) PANIC: Failed to init node /sheepdog/lock/7214379
Jun 26 10:11:26 EMERG [http 5107] crash_handler(268) sheep exits unexpectedly (Aborted).
Jun 26 10:11:27 EMERG [http 5107] sd_backtrace(833) sheep.c:289: crash_handler
Jun 26 10:11:27 EMERG [http 5107] sd_backtrace(847) /lib64/libpthread.so.0(+0xf70f) [0x7f34053c070f]
Jun 26 10:11:27 EMERG [http 5107] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x7f340454d924]
Jun 26 10:11:28 EMERG [http 5107] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x7f340454f104]
Jun 26 10:11:28 EMERG [http 5107] sd_backtrace(833) zookeeper.c:355: lock_table_lookup_acquire
Jun 26 10:11:28 EMERG [http 5107] sd_backtrace(833) kv.c:737: onode_allocate_extents
Jun 26 10:11:28 EMERG [http 5107] sd_backtrace(833) kv.c:876: onode_append_data
Jun 26 10:11:28 EMERG [http 5107] sd_backtrace(833) swift.c:255: swift_put_object
Jun 26 10:11:29 EMERG [http 5107] sd_backtrace(833) swift.c:349: swift_handle_request
Jun 26 10:11:29 EMERG [http 5107] sd_backtrace(833) http.c:283: http_run_request
Jun 26 10:11:29 EMERG [http 5107] sd_backtrace(833) work.c:350: worker_routine
Jun 26 10:11:29 EMERG [http 5107] sd_backtrace(847) /lib64/libpthread.so.0(+0x79d0) [0x7f34053b89d0]
Jun 26 10:11:30 EMERG [http 5107] sd_backtrace(847) /lib64/libc.so.6(clone+0x6c) [0x7f3404603b6c]
[root@buyer sheepdog-0.9.2]# make
Making all in lib
make[1]: Entering directory `/usr/local/src/sheepdog-0.9.2/lib'
cd isa-l/ && make && cd ..
make[2]: Entering directory `/usr/local/src/sheepdog-0.9.2/lib/isa-l'
make[2]: Nothing to be done for `default'.
make[2]: Leaving directory `/usr/local/src/sheepdog-0.9.2/lib/isa-l'
CC net.o
In file included from ../include/internal_proto.h:25,
from ../include/sheep.h:16,
from net.c:31:
../include/fec.h: In function 'ec_encode':
../include/fec.h:180: error: 'cpu_has_ssse3' undeclared (first use in this function)
../include/fec.h:180: error: (Each undeclared identifier is reported only once
../include/fec.h:180: error: for each function it appears in.)
../include/fec.h: In function 'ec_decode_buffer':
../include/fec.h:210: error: 'cpu_has_ssse3' undeclared (first use in this function)
make[1]: *** [net.o] Error 1
make[1]: Leaving directory `/usr/local/src/sheepdog-0.9.2/lib'
make: *** [all-recursive] Error 1
This problem is reported in
http://www.mail-archive.com/[email protected]/msg01006.html
In the cluster_recovery_completion function, I think the line `memcmp(vnode_info->nodes, recovereds, sizeof(*recovereds) * nr_recovereds) == 0` has a bug,
because vnode_info->nodes and recovereds can hold different nr_vnodes values after recovery completes: the nr_vnodes of vnode_info->nodes is recalculated by the recalculate_vnodes function, while the nr_vnodes of recovereds is a fixed value.
In this case sd_store->cleanup() will not be executed.
Is this a bug?
I will make it easy to write an init script