sheepdog / sheepdog

Distributed Storage System for QEMU
Home Page: http://sheepdog.github.io/sheepdog/
License: GNU General Public License v2.0
Sheepdog: Distributed Storage System for KVM
============================================

Overview
--------

Sheepdog is a distributed storage system for QEMU. It provides highly
available block-level storage volumes to virtual machines. Sheepdog
supports advanced volume management features such as snapshotting,
cloning, and thin provisioning. Sheepdog is open source software,
released under the terms of the GNU General Public License version 2.

For the latest information about Sheepdog, please visit our website:

  http://sheepdog.github.io/sheepdog/

And (recommended for newcomers) the wiki:

  https://github.com/sheepdog/sheepdog/wiki/

Requirements
------------

* Three or more x86-64 machines
* Corosync cluster engine

Install
-------

Please read the INSTALL file distributed with this package for detailed
instructions on installing or compiling from source.

Usage
-----

* Cluster Management Backends

  Sheepdog uses a cluster management backend to manage membership and to
  broadcast messages to the cluster nodes. For now, sheepdog can use the
  local driver (for development on a single box), corosync (the default),
  zookeeper, and Accord.

* Local Driver

  This driver uses the UNIX IPC mechanism to manage membership on a
  single box, where we start multiple 'sheep' processes to simulate a
  cluster. It is very easy and fast to set up and is especially useful
  for testing functionality without involving any other software.

  To set up a 3-node cluster using the local driver in one line of bash:

  $ mkdir /path/to/store
  $ for i in 0 1 2; do sheep -c local /path/to/store/$i -z $i -p 700$i; done

* Configure corosync.

  Nearly every modern Linux distribution has x86_64 corosync binaries
  pre-built and available via its repositories. We recommend you use
  these packages if they are available on your distribution.
  For Debian-based systems:

  $ sudo aptitude install corosync libcorosync-dev

  For RPM-based systems:

  $ sudo yum install corosynclib-devel

  See our wiki and the corosync(8) and corosync.conf(5) man pages for
  further details.

* Setup Sheepdog

  1. Launch sheepdog on each machine of the cluster.

     $ sheep /store_dir

     Note: /store_dir is a directory used to store objects. The
     directory must be on a filesystem with xattr support. In the case
     of ext3, you need to add 'user_xattr' to the mount options.

     $ sudo mount -o remount,user_xattr /store_device

  2. Format the cluster.

     $ dog cluster format --copies=3

     --copies specifies the default number of data copies. In this
     case, the replicated data is stored on three machines.

  3. Check the cluster state.

     The following list shows that Sheepdog is running on 32 nodes.

     $ dog node list
     Idx  Node id (FNV-1a) - Host:Port
     ------------------------------------------------
       0  0308164db75cff7e - 10.68.13.15:7000  *
       1  03104d8b4315c8e4 - 10.68.13.1:7000
       2  0ab18c565bc14aea - 10.68.13.3:7000
       3  0c0d27f0ac395f5d - 10.68.13.16:7000
       4  127ee4802991f308 - 10.68.13.13:7000
       5  135ff2beab2a9809 - 10.68.14.5:7000
       6  17bd6240eab65870 - 10.68.14.4:7000
       7  1cf35757cbf47d7b - 10.68.13.10:7000
       8  1df9580b8960a992 - 10.68.13.11:7000
       9  29307d3fa5a04f78 - 10.68.14.12:7000
      10  29dcb3474e31d4f3 - 10.68.14.15:7000
      11  29e089c98dd2a144 - 10.68.14.16:7000
      12  2a118b7e2738f479 - 10.68.13.4:7000
      13  3d6aea26ba79d75f - 10.68.13.6:7000
      14  42f9444ead801767 - 10.68.14.11:7000
      15  562c6f38283d09fe - 10.68.14.2:7000
      16  5dd5e540cca1556a - 10.68.14.6:7000
      17  6c12a5d10f10e291 - 10.68.14.13:7000
      18  6dae1d955ca72d96 - 10.68.13.7:7000
      19  711db0f5fa40b412 - 10.68.14.14:7000
      20  7c6b95212ee7c085 - 10.68.14.9:7000
      21  7d010c31bf11df73 - 10.68.13.2:7000
      22  82c43e908b1f3f01 - 10.68.13.12:7000
      23  931d2de0aaf61cf5 - 10.68.13.8:7000
      24  961d9d391e6021e7 - 10.68.13.14:7000
      25  9a3ef6fa1081026c - 10.68.13.9:7000
      26  b0b3d300fed8bc26 - 10.68.14.10:7000
      27  b0f08fb98c8f5edc - 10.68.14.8:7000
      28  b9cc316dc5aba880 - 10.68.13.5:7000
      29  d9eda1ec29c2eeeb - 10.68.14.7:7000
      30  e53cebb2617c86fd - 10.68.14.1:7000
      31  ea46913c4999ccdf - 10.68.14.3:7000

* Create a virtual machine image

  1. Create a 256 GB virtual machine image for Alice.

     $ qemu-img create sheepdog:Alice 256G

  2. You can also convert existing KVM images to Sheepdog ones.

     $ qemu-img convert ~/amd64.raw sheepdog:Bob

  3. List Sheepdog images with the following command.

     $ dog vdi list
       name        id    size    used  shared    creation time   object id
     --------------------------------------------------------------------
       Bob          0  2.0 GB  1.6 GB  0.0 MB 2010-03-23 16:16      80000
       Alice        0  256 GB  0.0 MB  0.0 MB 2010-03-23 16:16      40000

* Boot the virtual machine

  1. Boot the virtual machine.

     $ qemu-system-x86_64 -hda sheepdog:Alice

  2. The following command shows which images are in use.

     $ dog vm list
     Name            |Vdi size |Allocated| Shared  | Status
     ----------------+---------+---------+---------+------------
     Bob             |  2.0 GB |  1.6 GB |  0.0 MB | running on xx.xx.xx.xx
     Alice           |  256 GB |  0.0 MB |  0.0 MB | not running

* Snapshot

  1. Take a snapshot.

     $ qemu-img snapshot -c name sheepdog:Alice

     The -c flag is currently meaningless.

  2. After taking the snapshot, a new virtual machine image is added as
     a non-current image.

     $ dog vdi list
       name        id    size    used  shared    creation time   object id
     --------------------------------------------------------------------
       Bob          0  2.0 GB  1.6 GB  0.0 MB 2010-03-23 16:16      80000
       Alice        0  256 GB  0.0 MB  0.0 MB 2010-03-23 16:21      c0000
     s Alice        1  256 GB  0.0 MB  0.0 MB 2010-03-23 16:16      40000

  3. You can boot from the snapshot image by specifying the tag id.

     $ qemu-system-x86_64 -hda sheepdog:Alice:1

* Cloning from a snapshot

  1. Create a Charlie image as a clone of Alice's image.

     $ qemu-img create -b sheepdog:Alice:1 sheepdog:Charlie

  2. Charlie's image is added to the virtual machine list.
     $ dog vdi list
       name        id    size    used  shared    creation time   object id
     --------------------------------------------------------------------
       Bob          0  2.0 GB  1.6 GB  0.0 MB 2010-03-23 16:16      80000
       Alice        0  256 GB  0.0 MB  0.0 MB 2010-03-23 16:21      c0000
     s Alice        1  256 GB  0.0 MB  0.0 MB 2010-03-23 16:16      40000
       Charlie      0  256 GB  0.0 MB  0.0 MB 2010-03-23 16:23     100000

Test Environment
----------------

- Debian squeeze amd64
- Debian lenny amd64

===============================================================================

Copyright (C) 2009-2011, Nippon Telegraph and Telephone Corporation.
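The node ids in the `dog node list` output above are labelled FNV-1a hashes. As a minimal sketch, this is the standard 64-bit FNV-1a function; which bytes sheepdog actually feeds into it (address, port, or both) is not shown here, so the sample input is only an assumption:

```python
def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a hash, as named in the `dog node list` header."""
    h = 0xcbf29ce484222325                      # FNV-1a 64-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001b3) % (1 << 64)     # FNV 64-bit prime, mod 2^64
    return h

# Hypothetical input; sheepdog's exact hashed bytes may differ.
print(f"{fnv1a_64(b'10.68.13.15:7000'):016x}")
```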
Here is a simple example: a cluster with 3 nodes and --copies 2.
All nodes have about 80-90% of their space used.
When I kill a node, the cluster tries to replicate the missing copies from the lost node, but there is obviously not enough space.
I think sheepdog should behave like this:
dog node info
Id Size Used Avail Use%
0 4.6 GB 4.1 GB 479 MB 89%
1 5.0 GB 3.8 GB 1.1 GB 77%
2 5.0 GB 4.1 GB 894 MB 82%
Total 15 GB 12 GB 2.5 GB 83%
df -h /mnt/sheep/0
/dev/sda6 4,7G 4,2G 479M 90% /mnt/sheep/0
dog cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Sat Oct 4 10:34:30 2014
Epoch Time Version
2014-10-04 10:34:30 1 [192.168.10.4:7000, 192.168.10.5:7000, 192.168.10.6:7000]
root@test004:~# dog cluster info -v
Cluster status: running, auto-recovery enabled
Cluster store: plain with 2 redundancy policy
Cluster vnode mode: node
Cluster created at Sat Oct 4 10:34:30 2014
dog node kill 2
dog node info
Id Size Used Avail Use%
0 4.6 GB 4.6 GB 2.7 MB 99%
1 5.0 GB 5.0 GB 1.5 MB 99%
Total 9.6 GB 9.6 GB 4.2 MB 99%
/var/lib/sheepdog/sheep.log
Oct 04 10:37:39 ERROR [rw 4593] prealloc(385) failed to preallocate space, No space left on device
Oct 04 10:37:39 ERROR [rw 4593] err_to_sderr(108) diskfull, oid=fd38150000005b
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(404) cannot access any replicas of fd38150000005b at epoch 1
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(405) clients may see old data
Oct 04 10:37:39 ERROR [rw 4593] recover_replication_object(412) can not recover oid fd38150000005b
Oct 04 10:37:39 ERROR [rw 4593] recover_object_work(576) failed to recover object fd38150000005b
dog vdi check
Server has no space for new objects
Sheepdog daemon version 0.8.0_353_g4d282d3
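The numbers above already show why the recovery must fail: re-replication keeps the amount of stored data roughly constant while the raw capacity shrinks. A quick sanity check of that arithmetic (hypothetical helper, not a dog command):

```python
def recovery_fits(used_total_gb: float, capacities_gb: list, lost_node: int) -> bool:
    """After losing `lost_node`, the same replicated data set must fit on
    the remaining nodes; return whether there is enough raw space."""
    remaining = sum(c for i, c in enumerate(capacities_gb) if i != lost_node)
    return used_total_gb <= remaining

# The cluster above: ~12 GB used across nodes of 4.6/5.0/5.0 GB, --copies 2.
# Killing node 2 leaves 9.6 GB of raw space for 12 GB of data.
print(recovery_fits(12, [4.6, 5.0, 5.0], lost_node=2))  # → False
```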
Summary:
I'm working with 3 nodes.
A node leaves the cluster for some amount of time.
VDIs are deleted on the other nodes.
When rejoining the cluster, the node doesn't remove the orphan objects.
It does remove them if its metadata is removed before rejoining.
root@test006:~# dog node list
Id Host:Port V-Nodes Zone
0 192.168.10.4:7000 127 67807424
1 192.168.10.5:7000 129 84584640
2 192.168.10.6:7000 129 101361856
root@test006:~# dog vdi list
Name Id Size Used Shared Creation time VDI id Copies Tag
test4 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:44 fd2de3 3
test1 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:43 fd32fc 3
test3 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:44 fd3662 3
test2 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:43 fd3815 3
root@test006:~# dog node info
Id Size Used Avail Use%
0 216 GB 3.4 GB 213 GB 1%
1 220 GB 3.4 GB 216 GB 1%
2 220 GB 3.4 GB 216 GB 1%
Total 655 GB 10 GB 645 GB 1%
Total virtual image size 20 GB
root@test006:~# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheep0 220G 3.5G 217G 2% /mnt/sheep/0
(I kill node id 2 and remove 3 of the 4 vdis)
root@test005:~# dog node kill 2
root@test005:~# dog vdi delete test4
root@test005:~# dog vdi delete test3
root@test005:~# dog vdi delete test2
root@test005:~# dog vdi list
Name Id Size Used Shared Creation time VDI id Copies Tag
test1 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:43 fd32fc 3
root@test005:~# dog node info
Id Size Used Avail Use%
0 216 GB 912 MB 215 GB 0%
1 220 GB 912 MB 219 GB 0%
Total 436 GB 1.8 GB 434 GB 0%
Total virtual image size 5.0 GB
root@test005:~# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheepdog 220G 945M 219G 1% /mnt/sheep/0
(I insert node id 2 back and check the used space)
root@test006:~# script/run_sheep.sh
root@test006:~# dog node list
Id Host:Port V-Nodes Zone
0 192.168.10.4:7000 127 67807424
1 192.168.10.5:7000 129 84584640
2 192.168.10.6:7000 129 101361856
root@test006:~# dog vdi list
Name Id Size Used Shared Creation time VDI id Copies Tag
test1 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:43 fd32fc 3
root@test006:~# dog node info
Id Size Used Avail Use%
0 216 GB 912 MB 215 GB 0%
1 220 GB 912 MB 219 GB 0%
2 218 GB 1.5 GB 216 GB 0%
Total 653 GB 3.3 GB 650 GB 0%
Total virtual image size 5.0 GB
root@test006:~# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheep0 220G 3.5G 217G 2% /mnt/sheep/0
(Notice the used space of vg00-sheep0 didn't vary, and node info is showing
1.5 GB, which is neither 3.5 GB nor 912 MB, as it should be.)
(I repeat the same steps, but this time I remove /var/lib/sheepdog before
rejoining the cluster.)
root@test006:~/script# dog node list
Id Host:Port V-Nodes Zone
0 192.168.10.4:7000 127 67807424
1 192.168.10.5:7000 129 84584640
2 192.168.10.6:7000 129 101361856
root@test006:~/script# dog vdi list
Name Id Size Used Shared Creation time VDI id Copies Tag
test4 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:50 fd2de3 3
test1 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:48 fd32fc 3
test3 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:49 fd3662 3
test2 0 5.0 GB 864 MB 0.0 MB 2014-09-10 09:49 fd3815 3
root@test006:~/script# dog node info
Id Size Used Avail Use%
0 216 GB 3.4 GB 213 GB 1%
1 220 GB 3.4 GB 216 GB 1%
2 220 GB 3.4 GB 216 GB 1%
Total 655 GB 10 GB 645 GB 1%
Total virtual image size 20 GB
root@test006:~/script# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheep0 220G 3.5G 217G 2% /mnt/sheep/0
root@test005:~# dog node kill 2
root@test005:~# dog vdi delete test4
root@test005:~# dog vdi delete test3
root@test005:~# dog vdi delete test2
root@test005:~# dog node info
Id Size Used Avail Use%
0 216 GB 912 MB 215 GB 0%
1 220 GB 912 MB 219 GB 0%
Total 436 GB 1.8 GB 434 GB 0%
Total virtual image size 5.0 GB
root@test005:~# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheepdog 220G 945M 219G 1% /mnt/sheep/0
(re-join the cluster)
root@test006:~/script# rm -r /var/lib/sheepdog/*
root@test006:~/script# ./run_sheep.sh
root@test006:~/script# dog node info
Id Size Used Avail Use%
0 216 GB 912 MB 215 GB 0%
1 220 GB 912 MB 219 GB 0%
2 220 GB 912 MB 219 GB 0%
Total 655 GB 2.7 GB 653 GB 0%
Total virtual image size 5.0 GB
root@test006:~/script# df -h /mnt/sheep/0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg00-sheep0 220G 947M 219G 1% /mnt/sheep/0
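The experiment above suggests the missing cleanup is a set difference: an orphan object exists in the rejoining node's object directory but belongs to no live VDI. A hypothetical sketch of that check (illustration only, not sheepdog code; object ids are taken from the vdi list above):

```python
def find_orphans(local_objects, live_objects):
    """Return object ids present in the local store but referenced by no
    current VDI -- the garbage a rejoining node should delete."""
    return set(local_objects) - set(live_objects)

# Node rejoins still holding objects of the deleted test2/test3/test4 VDIs,
# while only test1 (fd32fc) survived the deletion.
print(sorted(find_orphans({"fd2de3", "fd3662", "fd3815", "fd32fc"}, {"fd32fc"})))
```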
Clearly, a document explaining how to upgrade an existing sheepdog cluster to a new version is required.
Hello!
Recently I've tried to set up sheepdog storage using Fedora 22 and its libvirt and virt-manager packages. Currently this distribution has libvirt 1.2.17 and virt-manager 1.2.0.
During my test cases, I've successfully built an rpm package of sheepdog 0.9.2 taken from the github repo.
I've also configured sheepdog with a corosync cluster.
When I try to connect to the sheepdog cluster remotely from virt-manager, I get errors while checking the connection parameters for the sheepdog storage. I've tried to follow your wiki at https://github.com/sheepdog/sheepdog/wiki/Libvirt, but it doesn't work as explained there.
Help, if you have enough time!
Would you like to replace more #defines of constant values with enumerations, to make their relationships explicit?
Hi,
Still playing with sheepdog and running IO benchmarks in my VMs, I've hit another problem (see issue #26) while running dbench in the VMs: I have a sheep process eating 100% of a CPU on 2 nodes (and the filesystem is unresponsive in 2 of my VMs using sheepdog). On these 2 nodes, the collie
command does not respond (locked).
On the still working node, I have:
collie cluster info
Cluster status: running
Cluster created at Fri Mar 2 19:14:22 2012
Epoch Time Version
2012-03-02 19:14:22 1 [172.17.3.20:7000, 172.17.3.21:7000, 172.17.3.22:7000]
$ collie node info
Id Size Used Use%
0 1.8 TB 18 GB 0%
1 1.8 TB 19 GB 1%
2 1.8 TB 15 GB 0%
Total 5.3 TB 52 GB 0%
Total virtual image size 40 GB
$ collie vdi list
Name Id Size Used Shared Creation time VDI id
squeeze-test_3 1 10 GB 8.4 GB 956 MB 2012-03-03 01:04 406758
squeeze-test_2 1 10 GB 8.1 GB 960 MB 2012-03-03 01:04 40690b
squeeze-test_1 1 10 GB 8.3 GB 964 MB 2012-03-03 01:04 406abe
s squeeze-test 1 10 GB 1.3 GB 0.0 MB 2012-03-02 19:15 ecf746
squeeze-test 2 10 GB 0.0 MB 1.3 GB 2012-03-03 01:03 ecf747
A strace on the sheep process shows it is looping on:
epoll_wait(5, {{EPOLLERR|EPOLLHUP, {u32=67111312, u64=140600116513168}}, {EPOLLERR|EPOLLHUP,{u32=4026533136, u64=140599780967696}}, {EPOLLERR|EPOLLHUP, {u32=4026533408, u64=140599780967968}}, {EPOLLERR|EPOLLHUP, {u32=4026533680, u64=140599780968240}}, {EPOLLERR|EPOLLHUP, {u32=4026533952, u64=140599780968512}}, {EPOLLERR|EPOLLHUP, {u32=4026534224, u64=140599780968784}}, {EPOLLERR|EPOLLHUP, {u32=4026534496, u64=140599780969056}}, {EPOLLERR|EPOLLHUP, {u32=4026534768, u64=140599780969328}}, {EPOLLERR|EPOLLHUP, {u32=4026535040, u64=140599780969600}}, {EPOLLERR|EPOLLHUP, {u32=4026535312, u64=140599780969872}}, {EPOLLERR|EPOLLHUP, {u32=4026535584, u64=140599780970144}}, {EPOLLERR|EPOLLHUP, {u32=67146816, u64=140600116548672}}, {EPOLLERR|EPOLLHUP, {u32=4022399216, u64=140599776833776}}, {EPOLLERR|EPOLLHUP, {u32=4022399488, u64=140599776834048}}, {EPOLLERR|EPOLLHUP, {u32=4022399760, u64=140599776834320}}, {EPOLLERR|EPOLLHUP, {u32=4022400032, u64=140599776834592}}, {EPOLLERR|EPOLLHUP, {u32=4022400304, u64=140599776834864}}, {EPOLLERR|EPOLLHUP, {u32=4022400576, u64=140599776835136}}, {EPOLLERR|EPOLLHUP, {u32=4022400848, u64=140599776835408}}, {EPOLLERR|EPOLLHUP, {u32=4022401120, u64=140599776835680}}, {EPOLLERR|EPOLLHUP, {u32=4022401392, u64=140599776835952}}, {EPOLLERR|EPOLLHUP, {u32=4022401664, u64=140599776836224}}, {EPOLLERR|EPOLLHUP, {u32=4022401936, u64=140599776836496}}, {EPOLLERR|EPOLLHUP, {u32=4022402480, u64=140599776837040}}, {EPOLLERR|EPOLLHUP, {u32=4022402752, u64=140599776837312}}, {EPOLLERR|EPOLLHUP, {u32=67187136, u64=140600116588992}}, {EPOLLERR|EPOLLHUP, {u32=88097008, u64=140600137498864}}, {EPOLLERR|EPOLLHUP, {u32=88097280, u64=140600137499136}}, {EPOLLERR|EPOLLHUP, {u32=4022402208, u64=140599776836768}}, {EPOLLERR|EPOLLHUP, {u32=88097552, u64=140600137499408}}, {EPOLLERR|EPOLLHUP, {u32=88097824, u64=140600137499680}}, {EPOLLERR|EPOLLHUP, {u32=88098096, u64=140600137499952}}}, 128, 1000) = 32
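The EPOLLERR|EPOLLHUP flood is consistent with dead file descriptors never being removed from the epoll set: EPOLLHUP is reported even when not requested, and in level-triggered mode every epoll_wait() call returns immediately with the same events, burning 100% CPU. This is an assumption about the cause, not a confirmed diagnosis; a minimal Linux-only reproduction of the effect:

```python
import select
import socket

a, b = socket.socketpair()          # `a` stands in for a sheep connection fd
ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN)

b.close()                           # the peer goes away

# EPOLLHUP arrives even though we only asked for EPOLLIN, and it is
# level-triggered: until the fd is unregistered, every poll() returns
# instantly with the same event -- a busy loop if the caller never
# drops the dead fd from the set.
events = ep.poll(timeout=1)
print(bool(events and events[0][1] & select.EPOLLHUP))

ep.unregister(a.fileno())           # the fix: remove dead fds from the set
ep.close()
a.close()
```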
After killing the stuck sheep process on one node, the cluster seems to be back to a normal functional state:
$ collie cluster info
Cluster status: running
Cluster created at Fri Mar 2 19:14:22 2012
Epoch Time Version
2012-03-03 14:22:17 3 [172.17.3.20:7000, 172.17.3.21:7000, 172.17.3.22:7000]
2012-03-03 14:22:17 2 [172.17.3.20:7000, 172.17.3.22:7000]
2012-03-02 19:14:22 1 [172.17.3.20:7000, 172.17.3.21:7000, 172.17.3.22:7000]
And one of my VMs is back to a functional state (IO unblocked) but not the other one (the one running on the node where I killed the sheep process).
I started setting up a sheepdog cluster. I started with 3 nodes and recently added a fourth one.
When I add the fourth one everything is fine, but after I reweight the cluster, the check command fails:
~# dog cluster check
fix vdi test-4
PANIC: can't find a valid vnode
dog exits unexpectedly (Aborted).
dog.c:329: crash_handler
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf20f) [0x7ff655f2a20f]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x34) [0x7ff655ba41e4]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x147) [0x7ff655ba7397]
sheep.h:95: oid_to_vnodes
vdi.c:1680: do_vdi_check
common.c:182: parse_vdi
cluster.c:491: cluster_check
dog.c:441: main
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf4) [0x7ff655b90994]
dog() [0x403b48]
I am trying to run Sheepdog on top of a ZFS mount point. I keep getting an error:
Jan 01 12:12:14 [main] queue_request(354) CREATE_AND_WRITE_OBJ, 1
Jan 01 12:12:14 [gway 232] do_process_work(1233) 1, ad75c000003ff, 1
Jan 01 12:12:14 [gway 232] gateway_forward_request(262) ad75c000003ff
Jan 01 12:12:14 [gway 232] default_create_and_write(320) failed to open /var/lib/sheepdog/disc0/obj/000ad75c000003ff.tmp: Invalid argument
Jan 01 12:12:14 [gway 232] err_to_sderr(128) oid=ad75c000003ff, Invalid argument
Jan 01 12:12:14 [gway 232] gateway_forward_request(302) fail to write local 3
Jan 01 12:12:14 [gway 232] gateway_forward_request(307) nr_sent 0, err 3
Jan 01 12:12:14 [gway 232] do_process_work(1240) failed: 1, ad75c000003ff , 1, 3
Jan 01 12:12:14 [main] gateway_op_done(99) leaving sheepdog cluster
I can write to the mount point as a user. I have also made sure that xattr is turned on on the zfs filesystem. Is there something else I am missing?
I have verified that sheepdog works fine with an ext3 filesystem.
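One common cause of "Invalid argument" from open() on ZFS is the O_DIRECT flag, which older ZFS-on-Linux releases rejected with EINVAL; sheepdog can open object files with direct IO. That is an assumption about the cause, not a confirmed diagnosis. A small probe to check whether a given filesystem accepts O_DIRECT:

```python
import errno
import os
import tempfile

def supports_o_direct(directory: str) -> bool:
    """Probe whether files in `directory` can be opened with O_DIRECT
    (a filesystem that lacks direct IO rejects it with EINVAL)."""
    path = os.path.join(directory, ".odirect_probe")
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o600)
    except OSError as e:
        if e.errno == errno.EINVAL:
            return False                # filesystem refuses direct IO
        raise
    os.close(fd)
    os.unlink(path)
    return True

print(supports_o_direct(tempfile.gettempdir()))
```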
As the title says, I have 2 servers in my LAN (both with sheepdog installed), but one cannot see the other in "collie node list", like this:
[root@kvm-1 ~]# collie node list
M Id Host:Port V-Nodes Zone
[root@kvm-2 ~]# collie node list
M Id Host:Port V-Nodes Zone
and I have stopped iptables and disabled selinux, but each node still only sees itself. Help me please, thanks a lot!
I have a test sheepdog setup consisting of 3 nodes which are also kvm hosts. I am using Debian squeeze with backports and sheepdog built from git (at rev 86a25e9). The sheepdog storage is formatted with 2 copies of the data.
Installed packages on the nodes:
qemu-kvm 1.0+dfsg-8bpo60+1bpo60+1
linux-image-3.2.0-0.bpo. 3.2.4-1
VMs are started with libvirt.
I do not have many logs, but the segfault in syslog looks like:
kvm[17438]: segfault at 401390 ip 00007fd16f3e04d1 sp 00007fd17441fb60 error 4 in kvm
Note that I also have a "reference" VM running on one node of the cluster, using a raw image disk as storage, that does not segfault.
Is this a known issue?
David
I would like to point out that identifiers like "__SHEEP_H__" and "__list_del" do not conform to the naming conventions of the C/C++ language standards (identifiers containing double underscores are reserved for the implementation).
Would you like to adjust your selection of unique names?
Hi, I've experienced this problem: I have a two-node cluster with replica count = 2 using corosync. If I have network problems between the two nodes, corosync detects the new topology and so does sheepdog; but when connectivity is restored, corosync shows the correct new configuration with two nodes while sheepdog does not. I have to restart one node to let the cluster know the nodes are up again and start a recovery.
Sheepdog is version 0.8.0 (built by proxmox).
When I run dog vdi list -a localhost
, it displays:
It seems that sheepdog does not support the localhost option.
And I think the option is necessary. Some higher-level program may generate a dog vdi create XXXX -a localhost
command when using sheepdog as a backend.
So I suggest we add 'localhost' support in dog.c. Adding a simple check in the "getopt_long switch case 'a'" statement works well. Maybe there is a better solution.
Thanks.
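The suggested fix can be as small as resolving a hostname before the address is validated. A sketch of the idea in Python rather than the dog.c C code (the helper name is hypothetical):

```python
import socket

def normalize_addr(addr: str) -> str:
    """Accept 'localhost' (or any hostname) wherever an IPv4 literal is
    expected, by resolving it first -- the kind of check dog.c could
    perform in its getopt_long 'a' case."""
    try:
        socket.inet_aton(addr)      # already a dotted-quad IPv4 literal
        return addr
    except OSError:
        return socket.gethostbyname(addr)

print(normalize_addr("localhost"))
```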
I tried to build RPMs for both i586 and x86_64 and found that the 32-bit build is completely broken.
Here is my patch to fix the build on 32-bit hosts (the packages build, but I have not checked how they work):
diff -ru sheepdog-0.9.2.orig/configure.ac sheepdog-0.9.2/configure.ac
--- sheepdog-0.9.2.orig/configure.ac 2015-06-17 10:09:20.000000000 +0300
+++ sheepdog-0.9.2/configure.ac 2015-07-28 18:32:24.051878895 +0300
@@ -197,6 +197,7 @@
-badflag -D__gnuc_va_list=va_list -D__attribute\(x\)="
AM_CONDITIONAL(BUILD_SHA1_HW, [[[[ $host = *x86_64* ]]]])
+AM_CONDITIONAL(X86_64, [[[[ $host = *x86_64* ]]]])
AC_ARG_ENABLE([fatal-warnings],
[ --enable-fatal-warnings : enable fatal warnings. ],
diff -ru sheepdog-0.9.2.orig/include/compiler.h sheepdog-0.9.2/include/compiler.h
--- sheepdog-0.9.2.orig/include/compiler.h 2015-06-17 10:09:20.000000000 +0300
+++ sheepdog-0.9.2/include/compiler.h 2015-07-28 10:12:28.430291588 +0300
@@ -164,6 +164,12 @@
#define cpu_has_avx cpu_has(X86_FEATURE_AVX)
#define cpu_has_osxsave cpu_has(X86_FEATURE_OSXSAVE)
+#else /* __x86_64__ */
+
+#define cpu_has_ssse3 0
+#define cpu_has_avx 0
+#define cpu_has_osxsave 0
+
#endif /* __x86_64__ */
#endif /* SD_COMPILER_H */
diff -ru sheepdog-0.9.2.orig/lib/fec.c sheepdog-0.9.2/lib/fec.c
--- sheepdog-0.9.2.orig/lib/fec.c 2015-06-17 10:09:20.000000000 +0300
+++ sheepdog-0.9.2/lib/fec.c 2015-07-28 10:17:44.588557313 +0300
@@ -737,5 +737,8 @@
lost[0] = (unsigned char *)buf;
ec_init_tables(ed, 1, cm, ec_tbl);
- ec_encode_data_sse(len, ed, 1, ec_tbl, input, lost);
+ if (cpu_has_ssse3)
+ ec_encode_data_sse(len, ed, 1, ec_tbl, input, lost);
+ else
+ ec_encode_data(len, ed, 1, ec_tbl, input, lost);
}
diff -ru sheepdog-0.9.2.orig/lib/Makefile.am sheepdog-0.9.2/lib/Makefile.am
--- sheepdog-0.9.2.orig/lib/Makefile.am 2015-06-17 10:09:20.000000000 +0300
+++ sheepdog-0.9.2/lib/Makefile.am 2015-07-29 10:01:00.601828973 +0300
@@ -15,7 +15,7 @@
libsheepdog_a_SOURCES = event.c logger.c net.c util.c rbtree.c strbuf.c \
sha1.c option.c work.c sockfd_cache.c fec.c sd_inode.c
-libsheepdog_a_LIBADD = isa-l/bin/ec_base.o \
+libsheepdog_a_LIBADD_ = isa-l/bin/ec_base.o \
isa-l/bin/ec_highlevel_func.o \
isa-l/bin/ec_multibinary.o \
isa-l/bin/gf_2vect_dot_prod_sse.o \
@@ -27,6 +27,16 @@
isa-l/bin/gf_vect_mul_avx.o \
isa-l/bin/gf_vect_mul_sse.o
+libsheepdog_a_LIBADD_32 = isa-l/bin/ec_base.o \
+ isa-l/bin/ec_highlevel_func.o \
+ isa-l/bin/ec_multibinary.o
+
+if !X86_64
+arch = 32
+endif
+
+libsheepdog_a_LIBADD = $(libsheepdog_a_LIBADD_$(arch))
+
if BUILD_SHA1_HW
libsheepdog_a_SOURCES += sha1_ssse3.S
endif
@@ -43,7 +53,7 @@
@$(CHECK_STYLE) $(libsheepdog_a_SOURCES)
libisa.a:
- cd isa-l/ && $(MAKE) && cd ..
+ cd isa-l/ && $(MAKE) arch=$(arch) && cd ..
clean:
cd isa-l/ && $(MAKE) clean && cd ..
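The fec.c hunk above guards the SSE path behind cpu_has_ssse3 and the compiler.h hunk adds zero stubs for non-x86_64 builds. The same runtime-dispatch idea, sketched in Python for illustration (reading /proc/cpuinfo is a Linux-specific assumption; sheepdog's cpu_has() macros use CPUID, and the encode functions here are stand-ins):

```python
def cpu_has(flag: str) -> bool:
    """Linux-only sketch: look a CPU feature flag up in /proc/cpuinfo."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith(("flags", "Features")):
                    return flag in line.split()
    except OSError:
        pass
    return False            # unknown platform: assume no acceleration

def ec_encode(data: bytes) -> bytes:
    # Mirror of the patched fec.c logic: take the accelerated path only
    # when the CPU supports it, otherwise fall back to the generic one.
    if cpu_has("ssse3"):
        return data         # stand-in for ec_encode_data_sse()
    return data             # stand-in for ec_encode_data()

print(cpu_has("ssse3"))
```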
The manpage says:
-m, --mode [safe|quorum|unsafe]
This option controls the behavior when there are too few nodes for the configured redundancy. Mode 'safe' will halt cluster IO when (nr_nodes < nr_copies).
Mode 'quorum' will halt cluster IO when (nr_nodes < nr_copies/2 + 1). Mode 'unsafe' will never halt the cluster and therefore data loss may result.
The last line implies data loss cannot occur in safe or quorum mode, yet it's unclear why data loss would occur if only 1 node were available (when data may be read and written just fine?), or even if 0 nodes were available (when cluster IO is therefore effectively halted).
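The man-page conditions can be written down directly; for three copies, 'safe' halts IO with fewer than 3 nodes while 'quorum' halts with fewer than 2. A direct transcription (illustration, not sheepdog code):

```python
def cluster_halted(nr_nodes: int, nr_copies: int, mode: str) -> bool:
    """Halt decision exactly as the manpage excerpt above states it."""
    if mode == "safe":
        return nr_nodes < nr_copies
    if mode == "quorum":
        return nr_nodes < nr_copies // 2 + 1
    if mode == "unsafe":
        return False
    raise ValueError(f"unknown mode: {mode}")

# With nr_copies = 3: quorum keeps IO running down to 2 nodes.
print([cluster_halted(n, 3, "quorum") for n in (1, 2, 3)])  # → [True, False, False]
```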
Hi ALL,
I created a cluster using zookeeper as the cluster driver. When the cluster goes down and is then restarted, large logs appear on all instances started on this cluster, but the log only contains the info below in /var/log/libvirt/qemu/instance-name.log:
qemu-kvm: cannot find aio_req
How can I resolve this problem?
Thanks very much!
Hi,
I would like to test and deploy sheepdog as our VM disk backend because it looks like a great project, but I have some questions first:
Thanks,
Phillipp
0 192.168.2.44:7000 3 738371776
1 192.168.2.45:7000 3 755148992
2 192.168.2.46:7000 3 771926208
3 192.168.2.47:7000 3 788703424
dog vdi create -P test 10G
dd if=/dev/urandom bs=1M count=2048 | dog vdi write test
Jul 15 10:16:26 WARN [rw 6866] read_erasure_object(233) can not read 7c2b250000082b idx 0
Jul 15 10:16:26 WARN [rw 6867] read_erasure_object(233) can not read 7c2b250000082d idx 0
Jul 15 10:16:26 WARN [rw 6865] read_erasure_object(233) can not read 7c2b250000082e idx 1
Jul 15 10:37:37 WARN [rw 6832] read_erasure_object(233) can not read 7c2b2500000179 idx 0
Jul 15 10:37:37 WARN [rw 6867] read_erasure_object(233) can not read 7c2b250000017b idx 1
Jul 15 10:37:37 WARN [rw 6865] read_erasure_object(233) can not read 7c2b2500000180 idx 1
Jul 15 10:39:05 WARN [rw 6867] read_erasure_object(233) can not read 7c2b2500000008 idx 2
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
I insert node id 3 back:
Jul 15 10:38:28 NOTICE [main] cluster_recovery_completion(781) all nodes are recovered, epoch 3
Jul 15 10:39:05 INFO [main] local_vdi_state_checkpoint_ctl(1467) freeing vdi state checkpoint at epoch 2
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6832] read_erasure_object(233) can not read 7c2b2500000006 idx 2
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.45:7000, op name: READ_PEER
...
Jul 15 10:40:26 WARN [rw 6832] sheep_exec_req(1188) failed No object found, remote address: 192.168.2.44:7000, op name: READ_PEER
Jul 15 10:40:57 ERROR [main] check_request_epoch(172) old node version 5, 4 (READ_PEER)
Jul 15 10:40:57 ERROR [main] check_request_epoch(172) old node version 5, 4 (READ_PEER)
Jul 15 10:40:57 ERROR [main] check_request_epoch(172) old node version 5, 4 (READ_PEER)
Jul 15 10:40:57 INFO [main] local_vdi_state_checkpoint_ctl(1467) freeing vdi state checkpoint at epoch 3
Jul 15 10:40:58 INFO [main] recover_object_main(948) object recovery progress 1%
Jul 15 10:40:59 INFO [main] recover_object_main(948) object recovery progress 2%
...
Jul 15 10:41:25 INFO [main] recover_object_main(948) object recovery progress 54%
Jul 15 10:41:26 INFO [main] recover_object_main(948) object recovery progress 55%
Jul 15 10:41:26 ERROR [rw 6832] err_to_sderr(74) diskfull, oid=7c2b250000057d
Jul 15 10:41:26 ERROR [rw 6832] recover_object_work(584) failed to recover object 7c2b250000057d
Jul 15 10:41:26 ERROR [rw 6986] err_to_sderr(74) diskfull, oid=7c2b250000057e
Jul 15 10:41:26 ERROR [rw 6986] recover_object_work(584) failed to recover object 7c2b250000057e
Jul 15 10:41:26 ERROR [rw 6865] err_to_sderr(74) diskfull, oid=7c2b250000057f
Jul 15 10:41:26 ERROR [rw 6865] recover_object_work(584) failed to recover object 7c2b250000057f
Good day.
I've done everything as described in the instructions (Ubuntu 12.04 + CentOS 6.3).
I set up corosync on the internal network 10.10.10.* (the corosync address in the config is 10.10.10.0); in the logs:
Feb 01 12:50:52 corosync [CPG ] chosen downlist: sender r(0) ip(10.10.10.13) ; members(old:2 left:0)
Feb 01 12:50:52 corosync [MAIN ] Completed service synchronization, ready to provide service.
I set up sheepdog with:
sheep /mnt (or /home to test)
collie cluster format --copies=3
[root@host log]# collie node list
M Id Host:Port V-Nodes Zone
Then sheepfs:
sheepfs /home/sheep/fs/
[root@host log]# cat /home/sheep/fs/cluster/info
Cluster status: running
Cluster created at Fri Feb 1 12:52:15 2013
Epoch Time Version
2013-02-01 12:52:15 1 [10.10.10.13:7000, 10.10.10.14:7000, 10.10.10.15:7000]
And now it's time to mount a volume:
echo test > /home/sheep/fs/vdi/mount or echo test > /mnt/fs/vdi/mount (on the ubuntu nodes)
The result is: bash: echo: write error: Invalid argument. I also tried to insert "test" into the file with vi, but it can't save the file either.
What am I doing wrong?
Thanks and best regards, Nikolay.
The subject says it all. It's too easy to run 'collie cluster format' and destroy one's repository.
The reproduction procedure is as follows.
However, it does not occur 100% of the time; it depends on timing.
$ sheep -p 7000 -z 0 -l dir=/var/log/sheep0 /var/lib/sheepdog/data0
$ sheep -p 7001 -z 1 -l dir=/var/log/sheep1 /var/lib/sheepdog/data1
$ sheep -p 7002 -z 2 -l dir=/var/log/sheep2 /var/lib/sheepdog/data2
$ ls -la /var/lib/sheepdog/data*/obj/
/var/lib/sheepdog/data0/obj/:
total 0
drwxr-x--- 3 root root 19 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .stale
/var/lib/sheepdog/data1/obj/:
total 0
drwxr-x--- 3 root root 19 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .stale
/var/lib/sheepdog/data2/obj/:
total 0
drwxr-x--- 3 root root 19 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .stale
$ dog cluster format
using backend plain store
$ ls -la /var/lib/sheepdog/data*/obj/
/var/lib/sheepdog/data0/obj/:
total 0
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
/var/lib/sheepdog/data1/obj/:
total 0
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
/var/lib/sheepdog/data2/obj/:
total 0
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
I think it is a race condition at cluster format.
I've tested with embedded debugging statements;
here is the debugging patch.
$ git diff
diff --git a/lib/util.c b/lib/util.c
index 21e0143..90b4f66 100644
--- a/lib/util.c
+++ b/lib/util.c
@@ -485,6 +485,7 @@ int rmdir_r(const char *dir_path)
 	ret = purge_directory(dir_path);
 	if (ret == 0)
 		ret = rmdir(dir_path);
+	sd_notice("rmdir %s", dir_path);
 	return ret;
 }
diff --git a/sheep/plain_store.c b/sheep/plain_store.c
index 876582c..97c5078 100644
--- a/sheep/plain_store.c
+++ b/sheep/plain_store.c
@@ -230,6 +230,7 @@ static int make_stale_dir(const char *path)
 	char p[PATH_MAX];
 	snprintf(p, PATH_MAX, "%s/.stale", path);
+	sd_notice("mkdir %s", p);
 	if (xmkdir(p, sd_def_dmode) < 0) {
 		sd_err("%s failed, %m", p);
 		return SD_RES_EIO;
The log messages are below.
$ grep NOTICE /var/log/sheep*/sheep.log
/var/log/sheep0/sheep.log:Dec 16 10:52:15 NOTICE [main] nfs_init(607) nfs server service is not compiled
/var/log/sheep0/sheep.log:Dec 16 10:52:59 NOTICE [main] make_stale_dir(233) mkdir /var/lib/sheepdog/data0/obj/.stale
/var/log/sheep0/sheep.log:Dec 16 10:52:59 NOTICE [util] rmdir_r(488) rmdir /var/lib/sheepdog/data0/obj/.stale
/var/log/sheep1/sheep.log:Dec 16 10:52:15 NOTICE [main] nfs_init(607) nfs server service is not compiled
/var/log/sheep1/sheep.log:Dec 16 10:52:59 NOTICE [main] make_stale_dir(233) mkdir /var/lib/sheepdog/data1/obj/.stale
/var/log/sheep1/sheep.log:Dec 16 10:52:59 NOTICE [util] rmdir_r(488) rmdir /var/lib/sheepdog/data1/obj/.stale
/var/log/sheep2/sheep.log:Dec 16 10:52:15 NOTICE [main] nfs_init(607) nfs server service is not compiled
/var/log/sheep2/sheep.log:Dec 16 10:52:59 NOTICE [main] make_stale_dir(233) mkdir /var/lib/sheepdog/data2/obj/.stale
/var/log/sheep2/sheep.log:Dec 16 10:52:59 NOTICE [util] rmdir_r(488) rmdir /var/lib/sheepdog/data2/obj/.stale
I was uploading data and ran into the panic below. Reproducible.
Configuration: 3 zookeeper, 14 sheep
Jun 26 10:11:26 EMERG [http 5107] lock_table_lookup_acquire(353) PANIC: Failed to init node /sheepdog/lock/7214379
Jun 26 10:11:26 EMERG [http 5107] crash_handler(268) sheep exits unexpectedly (Aborted).
Jun 26 10:11:27 EMERG [http 5107] sd_backtrace(833) sheep.c:289: crash_handler
Jun 26 10:11:27 EMERG [http 5107] sd_backtrace(847) /lib64/libpthread.so.0(+0xf70f) [0x7f34053c070f]
Jun 26 10:11:27 EMERG [http 5107] sd_backtrace(847) /lib64/libc.so.6(gsignal+0x34) [0x7f340454d924]
Jun 26 10:11:28 EMERG [http 5107] sd_backtrace(847) /lib64/libc.so.6(abort+0x174) [0x7f340454f104]
Jun 26 10:11:28 EMERG [http 5107] sd_backtrace(833) zookeeper.c:355: lock_table_lookup_acquire
Jun 26 10:11:28 EMERG [http 5107] sd_backtrace(833) kv.c:737: onode_allocate_extents
Jun 26 10:11:28 EMERG [http 5107] sd_backtrace(833) kv.c:876: onode_append_data
Jun 26 10:11:28 EMERG [http 5107] sd_backtrace(833) swift.c:255: swift_put_object
Jun 26 10:11:29 EMERG [http 5107] sd_backtrace(833) swift.c:349: swift_handle_request
Jun 26 10:11:29 EMERG [http 5107] sd_backtrace(833) http.c:283: http_run_request
Jun 26 10:11:29 EMERG [http 5107] sd_backtrace(833) work.c:350: worker_routine
Jun 26 10:11:29 EMERG [http 5107] sd_backtrace(847) /lib64/libpthread.so.0(+0x79d0) [0x7f34053b89d0]
Jun 26 10:11:30 EMERG [http 5107] sd_backtrace(847) /lib64/libc.so.6(clone+0x6c) [0x7f3404603b6c]
[root@buyer sheepdog-0.9.2]# make
Making all in lib
make[1]: Entering directory `/usr/local/src/sheepdog-0.9.2/lib'
cd isa-l/ && make && cd ..
make[2]: Entering directory `/usr/local/src/sheepdog-0.9.2/lib/isa-l'
make[2]: Nothing to be done for `default'.
make[2]: Leaving directory `/usr/local/src/sheepdog-0.9.2/lib/isa-l'
CC net.o
In file included from ../include/internal_proto.h:25,
from ../include/sheep.h:16,
from net.c:31:
../include/fec.h: In function 'ec_encode':
../include/fec.h:180: error: 'cpu_has_ssse3' undeclared (first use in this function)
../include/fec.h:180: error: (Each undeclared identifier is reported only once
../include/fec.h:180: error: for each function it appears in.)
../include/fec.h: In function 'ec_decode_buffer':
../include/fec.h:210: error: 'cpu_has_ssse3' undeclared (first use in this function)
make[1]: *** [net.o] Error 1
make[1]: Leaving directory `/usr/local/src/sheepdog-0.9.2/lib'
make: *** [all-recursive] Error 1
This problem is reported in
http://www.mail-archive.com/[email protected]/msg01006.html
In the cluster_recovery_completion function, I think the line `memcmp(vnode_info->nodes, recovereds, sizeof(*recovereds) * nr_recovereds) == 0` has a bug,
because vnode_info->nodes and recovereds can hold different nr_vnodes values after recovery completes: the nr_vnodes of vnode_info->nodes is recalculated by the recalculate_vnodes function, while the nr_vnodes of recovereds is a fixed value.
In this case sd_store->cleanup() will not be executed.
Is this a bug?
I will make it easy to write an init script