
cluster-lab's Introduction

rpi-golang

Docker image containing Go (golang) that is compatible with the Raspberry Pi.

Build the Docker Image

make build

Run the Docker Image and print the version of the installed Go

make version

Push Docker Image to Docker Hub

  • First, log in to Docker Hub with docker login, providing your username, password, and email address
  • Second, push the Docker image to the official Docker Hub
make push


cluster-lab's Issues

Node registers itself as master but should be slave

In the Hypriot community channel (https://gitter.im/hypriot/talk), lispmeister reported:

@MathiasRenner we tested the new (0.1.1) cluster lab SD image and we see more than one node declaring itself swarm master. Still trying to figure out why that's happening. Using the old image (0.1) with a patched cluster-start.sh script works reliably though.

Despite our own tests, and community users confirming that the latest release works fine, this error occurred in my environment tonight as well.

Quickfix:
On the node that should be a slave, run

sudo systemctl start cluster-stop
sudo systemctl start cluster-start

Let's debug and fix this:
The relevant parts of log output of journalctl:

...
Jan 01 01:00:13 slave1 ntpd[545]: Listen normally on 3 eth0 192.168.0.20 UDP 123

The IP address is assigned via DHCP. OK.

Jan 01 01:00:13 slave1 ntpd[545]: Listening on routing socket on fd #24 for interface updates
Jan 01 01:00:13 slave1 systemd[1]: Started LSB: Start NTP daemon.
Jan 01 01:00:13 slave1 ifup[246]: Restarting ntp (via systemctl): ntp.service.
Jan 01 01:00:14 slave1 dhclient[254]: bound to 192.168.0.20 -- renewal in 375250 seconds.
Jan 01 01:00:14 slave1 ifup[246]: bound to 192.168.0.20 -- renewal in 375250 seconds.
Jan 01 01:00:14 slave1 ntpdate[588]: the NTP socket is in use, exiting
Jan 01 01:00:14 slave1 systemd[1]: Reloading OpenBSD Secure Shell server.
Jan 01 01:00:14 slave1 sshd[261]: Received SIGHUP; restarting.
Jan 01 01:00:14 slave1 systemd[1]: Reloaded OpenBSD Secure Shell server.
Jan 01 01:00:14 slave1 sshd[261]: Could not load host key: /etc/ssh/ssh_host_rsa_key
Jan 01 01:00:14 slave1 sshd[261]: Could not load host key: /etc/ssh/ssh_host_dsa_key
Jan 01 01:00:14 slave1 sshd[261]: Could not load host key: /etc/ssh/ssh_host_ecdsa_key
Jan 01 01:00:14 slave1 sshd[261]: Could not load host key: /etc/ssh/ssh_host_ed25519_key
Jan 01 01:00:14 slave1 sshd[261]: Server listening on 0.0.0.0 port 22.
Jan 01 01:00:14 slave1 sshd[261]: Server listening on :: port 22.
Jan 01 01:00:14 slave1 ntpd_intres[312]: parent died before we finished, exiting
Jan 01 01:00:15 slave1 ntpdate[667]: the NTP socket is in use, exiting
Jan 01 01:00:15 slave1 systemd[1]: Reloading OpenBSD Secure Shell server.
Jan 01 01:00:15 slave1 sshd[261]: Received SIGHUP; restarting.
Jan 01 01:00:15 slave1 systemd[1]: Reloaded OpenBSD Secure Shell server.
Jan 01 01:00:15 slave1 sshd[261]: Could not load host key: /etc/ssh/ssh_host_rsa_key
Jan 01 01:00:15 slave1 sshd[261]: Could not load host key: /etc/ssh/ssh_host_dsa_key
Jan 01 01:00:15 slave1 sshd[261]: Could not load host key: /etc/ssh/ssh_host_ecdsa_key
Jan 01 01:00:15 slave1 sshd[261]: Server listening on 0.0.0.0 port 22.
Jan 01 01:00:15 slave1 sshd[261]: Server listening on :: port 22.
Jan 01 01:00:15 slave1 sshd[261]: Could not load host key: /etc/ssh/ssh_host_ed25519_key
Jan 31 22:01:56 slave1 systemd[1]: Time has been changed
Jan 31 22:02:01 slave1 rc.local[263]: Creating SSH2 RSA key; this may take some time ...
Jan 31 22:02:01 slave1 rc.local[263]: 2048 a2:ce:f9:0a:db:e3:57:20:5a:63:68:4b:a7:4a:24:8b /etc
Jan 31 22:02:04 slave1 cluster-start.sh[413]: W: Failed to fetch http://mirrordirector.raspbian
Jan 31 22:02:04 slave1 cluster-start.sh[413]: W: Failed to fetch http://mirrordirector.raspbian
Jan 31 22:02:04 slave1 cluster-start.sh[413]: W: Failed to fetch https://packagecloud.io/Hyprio
Jan 31 22:02:04 slave1 cluster-start.sh[413]: W: Some index files failed to download. They have
Jan 31 22:02:04 slave1 cluster-start.sh[413]: install required packages
Jan 31 22:02:06 slave1 rc.local[263]: Creating SSH2 DSA key; this may take some time ...
Jan 31 22:02:06 slave1 rc.local[263]: 1024 8c:2c:1a:d2:dc:95:70:b4:7a:34:24:27:d4:0c:bc:6b /etc
Jan 31 22:02:06 slave1 rc.local[263]: Creating SSH2 ECDSA key; this may take some time ...
Jan 31 22:02:06 slave1 rc.local[263]: 256 85:e4:b0:ac:97:72:c3:bf:6b:49:b6:1a:f5:e5:ba:08 /etc/
Jan 31 22:02:07 slave1 rc.local[263]: Creating SSH2 ED25519 key; this may take some time ...
Jan 31 22:02:07 slave1 rc.local[263]: 256 a2:94:18:ac:b9:57:ed:ff:3c:06:6e:2a:7b:3b:02:d8 /etc/
Jan 31 22:02:07 slave1 cluster-start.sh[413]: WARNING: The following packages cannot be authent
Jan 31 22:02:07 slave1 cluster-start.sh[413]: libavahi-client3 avahi-utils vlan
Jan 31 22:02:07 slave1 cluster-start.sh[413]: E: There are problems and -y was used without --f
Jan 31 22:02:07 slave1 cluster-start.sh[413]: create vlan with tag 200 on eth0
Jan 31 22:02:07 slave1 kernel: 8021q: 802.1Q VLAN Support v1.8
Jan 31 22:02:07 slave1 cluster-start.sh[413]: configure avahi only on eth0.200 \(vlan with id 2
Jan 31 22:02:07 slave1 avahi-daemon[347]: Files changed, reloading.
Jan 31 22:02:07 slave1 avahi-daemon[347]: No service file found in /etc/avahi/services.
Jan 31 22:02:07 slave1 avahi-daemon[347]: Files changed, reloading.
Jan 31 22:02:07 slave1 avahi-daemon[347]: No service file found in /etc/avahi/services.
Jan 31 22:02:07 slave1 avahi-daemon[347]: Files changed, reloading.
Jan 31 22:02:07 slave1 avahi-daemon[347]: No service file found in /etc/avahi/services.
Jan 31 22:02:07 slave1 avahi-daemon[347]: Files changed, reloading.
Jan 31 22:02:07 slave1 avahi-daemon[347]: No service file found in /etc/avahi/services.
Jan 31 22:02:07 slave1 avahi-daemon[347]: Files changed, reloading.
Jan 31 22:02:07 slave1 avahi-daemon[347]: No service file found in /etc/avahi/services.
Jan 31 22:02:07 slave1 avahi-daemon[347]: Files changed, reloading.
Jan 31 22:02:07 slave1 avahi-daemon[347]: No service file found in /etc/avahi/services.
Jan 31 22:02:07 slave1 avahi-daemon[347]: Files changed, reloading.
Jan 31 22:02:07 slave1 avahi-daemon[347]: No service file found in /etc/avahi/services.
Jan 31 22:02:07 slave1 avahi-daemon[347]: Joining mDNS multicast group on interface eth0.200.IP
Jan 31 22:02:07 slave1 avahi-daemon[347]: New relevant interface eth0.200.IPv4 for mDNS.
Jan 31 22:02:07 slave1 avahi-daemon[347]: Registering new address record for 192.168.200.5 on e
Jan 31 22:02:07 slave1 cluster-start.sh[413]: #-----------------
Jan 31 22:02:07 slave1 cluster-start.sh[413]: # check if leader
Jan 31 22:02:07 slave1 cluster-start.sh[413]: #-----------------
Jan 31 22:02:07 slave1 cluster-start.sh[413]: set ip address on vlan 200
Jan 31 22:02:07 slave1 cluster-start.sh[413]: /usr/local/bin/cluster-start.sh: line 215: avahi-
Jan 31 22:02:07 slave1 avahi-daemon[347]: Withdrawing address record for 192.168.200.5 on eth0.
Jan 31 22:02:07 slave1 avahi-daemon[347]: Leaving mDNS multicast group on interface eth0.200.IP
Jan 31 22:02:07 slave1 avahi-daemon[347]: Interface eth0.200.IPv4 no longer relevant for mDNS.
Jan 31 22:02:07 slave1 cluster-start.sh[413]: if CLUSTERMASTERIP is empty then this machine is 
Jan 31 22:02:07 slave1 cluster-start.sh[413]: #####################
Jan 31 22:02:07 slave1 cluster-start.sh[413]: #                   #
Jan 31 22:02:07 slave1 cluster-start.sh[413]: # configure node as #
Jan 31 22:02:07 slave1 cluster-start.sh[413]: #                   #
Jan 31 22:02:07 slave1 cluster-start.sh[413]: #  cluster master   #
Jan 31 22:02:07 slave1 cluster-start.sh[413]: #                   #
Jan 31 22:02:07 slave1 cluster-start.sh[413]: #####################
...

What's truncated at this line:

Jan 31 22:02:04 slave1 cluster-start.sh[413]: W: Failed to fetch http://mirrordirector.raspbian

... is a .gpg at the end. Without this key, installing the avahi-browse client fails later (the log says WARNING: The following packages cannot be authent [...])

Thus, the error must occur somewhere before the Failed to fetch. Any ideas?

Initial thoughts....

  1. In a previous incarnation, the cluster was working on the B+ as well as the 2B. Is the new direction 2B only?
  2. Your readme points to localhost for the location of the SD card images.

firstboot_done: command not found

Hi,

Starting from the basic HypriotOS image (the latest one), I installed the cluster-lab package, and when I run systemctl start cluster-start, at the end I get:

firstboot_done: command not found

So I hope it was just a harmless message that I can ignore.

At least the Consul UI is up and running.

Btw, on the project's GitHub home page, the package is called cluster-lab and not hypriot-cluster-lab.

Add additional delay in startup sequence

Since Consul takes some time to complete its startup process on the slave nodes, the swarm join functionality does not register the current node in Consul's key-value store.

An additional delay between starting Consul and running swarm join should help to overcome this issue.
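Rather than a fixed sleep, a small wait loop could poll Consul's HTTP API until it answers before the swarm join is issued. This is only a sketch; the function name, URL, and timeout are illustrative, not part of the actual cluster-lab scripts.

```shell
#!/bin/bash
# Sketch: block until Consul's HTTP API responds (or a timeout expires)
# before running 'swarm join'. All names and addresses are illustrative.
wait_for_consul() {
  local url="$1" timeout="${2:-30}" waited=0
  until curl -fs "$url/v1/status/leader" >/dev/null 2>&1; do
    sleep 1
    waited=$((waited + 1))
    if [ "$waited" -ge "$timeout" ]; then
      return 1   # Consul never became ready within the timeout
    fi
  done
  return 0
}

# Hypothetical usage in the startup sequence:
# wait_for_consul "http://192.168.200.1:8500" 60 && docker run -d hypriot/rpi-swarm join ...
```

A timeout keeps the node from hanging forever if Consul never comes up, which a plain sleep cannot guarantee.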

Consul broken?

Hi,

I started my PicoCluster and the Cluster Lab with sd-card-rpi-v0.5.14.img and upgraded it.

As noted in #36, I saw that Docker was no longer working, so I removed the /etc/docker/daemon.json file, but to no avail. Docker and the Cluster Lab start fine, but the consul container is restarting constantly.

From what I can see on my master node:

HypriotOS/armv7: pirate@pico-master in ~
$ docker ps
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS                         PORTS                    NAMES
c0398b11ece3        hypriot/rpi-swarm:1.2.2    "/swarm manage --repl"   9 seconds ago       Up 7 seconds                   0.0.0.0:2378->2375/tcp   cluster_lab_swarmmanage
7495b4163adb        hypriot/rpi-swarm:1.2.2    "/swarm join --advert"   11 seconds ago      Up 9 seconds                   2375/tcp                 cluster_lab_swarm
82457f97e74d        hypriot/rpi-consul:0.6.4   "/consul agent -serve"   15 seconds ago      Restarting (1) 2 seconds ago                            cluster_lab_consul
$ docker logs cluster_lab_consul
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
==> dial tcp 192.168.200.1:8301: getsockopt: connection refused
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Joining cluster...
==> dial tcp 192.168.200.1:8301: getsockopt: connection refused

and:

$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/etc/systemd/system/docker.service; enabled)
   Active: active (running) since Fri 2016-05-27 21:34:19 UTC; 55s ago
     Docs: https://docs.docker.com
 Main PID: 1116 (docker)
   CGroup: /system.slice/docker.service
           ├─1116 /usr/bin/docker daemon --storage-driver overlay --host fd:// --debug --host tcp://192.168.200.31:2375 --cluster-advertise 192.168.200.31:2375 --cluster-sto...
           ├─1121 docker-containerd -l /var/run/docker/libcontainerd/docker-containerd.sock --runtime docker-runc --debug --metrics-interval=0
           ├─1395 docker-containerd-shim 7495b4163adb8d323bfb41671212d75aef65d04ca5264519aa90f4dbd0f91e12 /var/run/docker/libcontainerd/7495b4163adb8d323bfb41671212d75aef65d...
           ├─1476 docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 2378 -container-ip 172.17.0.3 -container-port 2375
           └─1480 docker-containerd-shim c0398b11ece30d3c24cc0c8c5ec1851302dff382b983472719dbecb0ba64036a /var/run/docker/libcontainerd/c0398b11ece30d3c24cc0c8c5ec1851302dff...

May 27 21:34:48 pico-master docker[1116]: time="2016-05-27T21:34:48.214344343Z" level=debug msg="logs: begin stream"
May 27 21:34:48 pico-master docker[1116]: time="2016-05-27T21:34:48.219792562Z" level=debug msg="logs: end stream"
May 27 21:34:53 pico-master docker[1116]: time="2016-05-27T21:34:53.723090296Z" level=debug msg="received containerd event: &types.Event{Type:\"start-container\",...x5748bd7d}"
May 27 21:34:53 pico-master docker[1116]: time="2016-05-27T21:34:53.726771710Z" level=debug msg="event unhandled: type:\"start-container\" id:\"82457f97e74d5251a6...464384893 "
May 27 21:34:53 pico-master docker[1116]: time="2016-05-27T21:34:53Z" level=debug msg="containerd: process exited" id=82457f97e74d5251a6f5b5a619f7bd61db00b1c5c92e...temPid=1784
May 27 21:34:53 pico-master docker[1116]: time="2016-05-27T21:34:53.924103578Z" level=debug msg="received containerd event: &types.Event{Type:\"exit\", Id:\"82457...x5748bd7d}"
May 27 21:34:58 pico-master docker[1116]: time="2016-05-27T21:34:58.647580399Z" level=warning msg="Registering as \"192.168.200.31:2375\" in discovery failed: can...n sessions"
May 27 21:34:58 pico-master docker[1116]: time="2016-05-27T21:34:58.683884447Z" level=error msg="discovery error: Get http://192.168.200.31:8500/v1/kv/docker/node...on refused"
May 27 21:34:58 pico-master docker[1116]: time="2016-05-27T21:34:58.684968644Z" level=error msg="discovery error: Put http://192.168.200.31:8500/v1/kv/docker/node...on refused"
May 27 21:34:58 pico-master docker[1116]: time="2016-05-27T21:34:58.686406219Z" level=error msg="discovery error: Unexpected watch error"
Hint: Some lines were ellipsized, use -l to show in full.
$ sudo systemctl status cluster-lab -l
● cluster-lab.service - hypriot-cluster-lab
   Loaded: loaded (/etc/systemd/system/cluster-lab.service; enabled)
   Active: active (exited) since Fri 2016-05-27 21:34:30 UTC; 12min ago
 Main PID: 888 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/cluster-lab.service
           └─975 dhclient eth0.200

May 27 21:33:36 pico-master cluster-lab[327]: dpkg-query: error: error writing to '<standard output>': Broken pipe
May 27 21:33:46 pico-master cluster-lab[327]: Device "eth0.200" does not exist.
May 27 21:33:46 pico-master cluster-lab[327]: dpkg-query: error: error writing to '<standard output>': Broken pipe
May 27 21:33:47 pico-master cluster-lab[327]: dpkg-query: error: error writing to '<standard output>': Broken pipe
May 27 21:33:49 pico-master cluster-lab[888]: dpkg-query: error: error writing to '<standard output>': Broken pipe
May 27 21:33:49 pico-master cluster-lab[888]: dpkg-query: error: error writing to '<standard output>': Broken pipe
May 27 21:33:53 pico-master dhclient[965]: DHCPREQUEST on eth0.200 to 255.255.255.255 port 67
May 27 21:33:53 pico-master dhclient[965]: DHCPACK from 192.168.200.1
May 27 21:34:30 pico-master systemd[1]: Started hypriot-cluster-lab.

What else do you need? How can I fix it?

Thanks,
Nicolas

Is it possible to run Cluster Lab on wlan0 instead of eth0?

Hello, I've followed the instructions in this Hypriot blog post. Could Cluster Lab be set up using the Pi 3's WiFi chip instead of eth0 (Ethernet)?

I tried to edit /etc/cluster-lab/cluster.conf manually. When I run cluster-lab stop and then start, it shows me:

Got more than one IP address on wlan0.200: 192.168.200.1

Is it a bad idea to build a cluster over WLAN? If not, how do I configure Cluster Lab on the Pi 3?
Thanks!

Fix false failing test

When the Cluster Lab starts up, it often shows the following error

[FAIL]   Consul is able to talk to Docker-Engine on port 7946 (Serf)

while actually everything is working.

So we need to give it more time, or otherwise make the test more reliable.
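One way to make the test more reliable is a generic retry wrapper around each check, so a service that is still starting gets a few chances before the [FAIL] verdict. A minimal sketch; the function name and the usage line are hypothetical, not the actual cluster-lab health code:

```shell
#!/bin/bash
# Sketch: re-run a check command up to N times before declaring failure,
# so slow-starting services do not produce false negatives.
retry_check() {
  local tries="$1"; shift
  local i
  for i in $(seq 1 "$tries"); do
    if "$@"; then
      return 0           # check passed
    fi
    sleep 2              # give the service a moment to come up
  done
  return 1               # still failing after all attempts
}

# Hypothetical use for the failing Serf check (port 7946 from the test above):
# retry_check 5 nc -z "$CLUSTER_NODE_IP" 7946
```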

Not Working on Pi 3B

Hello,

I recently decided to try the Hypriot Cluster Lab on my Pi 3Bs and 2Bs. On the 2Bs it worked perfectly, but the 3Bs refuse to boot: when I plug them in, the red light turns on and nothing happens. Help?

Thanks in advance,
Zach Hilman

Refactor Cluster Lab

Refactor the Cluster-Lab to make it more reliable:

  • cluster-lab check dependencies: add a script for an initial check of the Cluster Lab dependencies
    • is the Internet uplink eth0 present and working
    • are the necessary OS packages installed
    • are the necessary processes running/not running (Docker, Avahi, DHCP daemon, ...)
    • are the necessary Docker images present
  • make the Cluster Lab more resilient:
    • DHCP is currently a single point of failure; move the DNS masquerading to the new Consul leader whenever the Consul leader changes
    • cluster-lab stop: cleans up everything that 'cluster-lab start' created
    • as part of 'cluster-lab start', do an initial 'cluster-lab stop'
    • add a trap mechanism to 'cluster-lab start' to automatically clean up if there was a failure
  • cluster-lab check: check the Cluster Lab node's health; is everything as it should be after 'cluster-lab start'
  • integration tests with Travis:
    • ability to start the Cluster Lab with a fixed role (leader vs. non-leader) via an ENV variable
    • test an individual Cluster Lab node with Serverspec
  • integration tests with Vagrant, with 3 nodes and Serverspec
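The proposed trap mechanism could look roughly like this. A sketch only: the function name is illustrative and the real start steps are elided.

```shell
#!/bin/bash
# Sketch: arm an ERR trap at the top of 'cluster-lab start' so any failing
# step triggers the same cleanup that 'cluster-lab stop' performs.
cluster_cleanup() {
  echo "start failed - cleaning up"
  # cluster-lab stop   # would undo everything 'start' created so far
}

set -eE                   # abort on error, inherit the ERR trap in functions
trap 'cluster_cleanup' ERR

# ... the actual start steps would run here ...

trap - ERR                # disarm once startup has succeeded
set +eE
```

This way a half-configured node cleans itself up instead of lingering in a broken state until the next 'cluster-lab stop'.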

Test with armbian on cubietruck

Based on a fresh Jessie Armbian installation, following the blog guide as a base.

I installed docker-compose from dpkg directly, since the following did not provide that package, only the docker-hypriot package:

curl -s https://packagecloud.io/install/repositories/Hypriot/Schatzkiste/script.deb.sh | sudo bash

Armbian uses init instead of systemd by default, which leads to:

$ sudo systemctl start cluster-start
Failed to get D-Bus connection: Unknown error -1

I found the following solution to change over to systemd: http://forum.armbian.com/index.php/topic/342-dbus-error-when-using-systemctl/

The last, not so easily solved, problem is that the kernel does not support VXLAN, which means overlay networks do not work. To make this work, the kernel has to be recompiled with the option Device Drivers -> Network device support -> Network core driver support -> Virtual eXtensible Local Area Network (VXLAN) [m].
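Before rebuilding anything, a quick dry-run load can tell whether the running kernel already provides the module. A sketch; `modprobe -n` only simulates the load:

```shell
#!/bin/bash
# Sketch: dry-run load of the vxlan module to see whether the running
# kernel supports it (required for Docker overlay networks).
check_vxlan() {
  modprobe -n vxlan >/dev/null 2>&1 || return 1
  return 0
}

if check_vxlan; then
  echo "vxlan module available: overlay networks should work"
else
  echo "vxlan missing: recompile the kernel with CONFIG_VXLAN=m"
fi
```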

Deb issue on vagrant up

==> follower1: dpkg-deb: building package 'hypriot-cluster-lab-src' in 'hypriot-cluster-lab-src_0.2.12-1.deb'.
==> follower1: /cluster-lab-src/vagrant
==> follower1: dpkg: error processing archive ./hypriot-cluster-lab-src_0.1.1-1.deb (--install):
==> follower1:  cannot access archive: No such file or directory
==> follower1: Errors were encountered while processing:
==> follower1:  ./hypriot-cluster-lab-src_0.1.1-1.deb
==> follower1: /tmp/vagrant-shell: line 39: cluster-lab: command not found

This happens on each box, on the latest master.

Allow docker swarm token to work

We are finding that Consul is unstable for large networks; members do not join properly, for instance. For more flexibility, it would be great to allow the use of a hosted swarm that uses token:// with a unique token address rather than Consul. This means the system needs to come up with a slightly modified token, and you should get the token from the user or by running docker run swarm create, which will give you a GUID. I haven't tried it, but docker run hypriot/rpi-swarm create should do the same.

The specific change is to modify cluster-lab.conf so that there is a DOCKER_CONSUL_CONF for Consul and a DOCKER_SWARM_CONF for non-Consul. Then the cluster-lab script needs to be able to switch between them; perhaps add a new command to the cluster-lab script. There are a bunch of different approaches, but the core lines that need to change are the following, assuming $TOKEN holds the unique id:

DOCKER_OPTS='{\\n \
  \"storage-driver\": \"overlay\", \\n \
  \"hosts\": [\"fd://\", \"tcp://${CLUSTER_NODE_IP}:2375\"], \\n \
  \"cluster-advertise\": \"${CLUSTER_NODE_IP}:2375\", \\n \
  \"cluster-store\": \"consul://${CLUSTER_NODE_IP}:8500\", \\n \
  \"label\": [\"hypriot.arch=${ARCHITECTURE}\",\"hypriot.hierarchy=${CLUSTER_NODE_ROLE}\"] \\n \
}'

For the slaves, you need to be smart about which swarm image you execute, depending roughly on Intel vs. ARM (I have not yet figured out all the escapes needed):

DOCKER_OPTS='{\\n \
  \"storage-driver\": \"overlay\", \\n \
  \"swarm\"  \\n \
  \"swarm-image\": \"hypriot/rpi-swarm", \\n \  
  \"swarm-advertise\": \"${CLUSTER_TOKEN}", \\n \
  \"label\": [\"hypriot.arch=${ARCHITECTURE}\",\"hypriot.hierarchy=${CLUSTER_NODE_ROLE}\"] \\n \
}'

For the master, you need all of the above, plus

  \"swarmmaster\": \"overlay\", \\n \

Avahi (and avahi browse) delay

Hi,

On a Raspberry Pi Model B Revision 2.0 with 512 MB RAM,
using http://downloads.hypriot.com/hypriot_20160121-235123_clusterlab.img.zip
I had to modify the start script for the system to pick up the master correctly.

(Otherwise 192.168.200.1 was always assigned on node-1, so I had to debug the avahi-browse output, and it only 'refreshes' when the daemon is restarted.)

echo -e "#-----------------\n# check if leader\n#-----------------"

setip 192.168.200.254
sleep 5
systemctl restart avahi-daemon.service
sleep 5

CLUSTERMASTERIP=$(avahi-browse _cluster._tcp -t -r -p | grep 'os-release=hypriot' | grep '^=' | grep ';Cluster-Master' |  grep 'eth0\.' | grep IPv4 | awk -F ';' 'BEGIN { format="%s\n" }{ printf(format,$8) }')

Is this something worth a change (by me, via a PR), or do you have other plans related to this?

Takes out router

I've tried this three times: I turn the Pi on with the Hypriot cluster image, a few seconds later the router's login page becomes unresponsive, and after a minute or so all network connectivity is lost. I have to restart the router to get connectivity back.

Hopefully the information below will help:
Pi -> Gigabit switch -> Virgin Media SuperHub 2ac

SuperHub software version: V1.01.11, Hardware: 1.03

I'm 99% sure the router supports VLANs, and the switch should definitely support VLANs.

Docker Daemon

By default, the Docker daemon is started with ExecStart=/usr/bin/docker daemon -H fd://. This behavior must be overridden for a JSON configuration to apply.
I suggest writing a file in /etc/systemd/system/docker.service.d containing

[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon

This modifies the default start behavior. Be sure to call systemctl daemon-reload before starting Docker again.

/etc/resolv.conf points to private name servers

While bringing up a cluster-lab-imaged Pi to act as a DHCP server, I noticed that resolv.conf points to private nameservers in Germany.

I know the expected use case is for the cluster-lab hosts to have DHCP provide their IP/name-resolution data, but on the off chance that someone is exploring (or hits a serious edge case like a travelling cloud-in-a-box), would it make sense to have it point to the Google public nameservers (8.8.8.8 and 8.8.4.4) by default?
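The suggested fallback would amount to a two-line resolv.conf:

```ini
# /etc/resolv.conf -- suggested public default, as proposed above
nameserver 8.8.8.8
nameserver 8.8.4.4
```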

Set Network config (IP Address, Netmask, Gateway) and DNS

Hi,

I plan to use CL in a DMZ context where I have no DHCP nor DNS by default, so I don't know whether the issue should be reported here or in hypriot/flash. Let me know which is best.

In the end, what I would like is something like:

  • --ipaddress: the IP of the node with a /24 or similar, e.g. --ipaddress=192.168.8.100/24; a variant could be separate --ipaddress and --netmask flags, depending on the expected format
  • --gateway: the IP of my gateway to the internet, e.g. --gateway=192.168.8.1
  • --dns: to set the DNS servers, at least one but ideally a list, e.g. --dns="8.8.8.8 8.8.4.4"

It would lead to:

flash --hostname cl-master --ipaddress=192.168.8.100/24 --gateway=192.168.8.1 --dns="8.8.8.8 8.8.4.4" http://downloads.hypriot.com/hypriot-rpi-20151128-152209-docker-swarm-cluster.img.zip

Would it make sense?

Example for docker vagrant provider ?

Hello,

The Cluster Lab's main example shows using Vagrant with VirtualBox. I'm on a headless CentOS 7 system with only Docker available.

I've been able to run a vagrant + docker example such as : https://github.com/bubenkoff/vagrant-docker-example

But when I try to add --provider docker to the first vagrant instruction, I get:

[root@test-vm vagrant]# /usr/bin/vagrant up --provider=docker
No usable default provider could be found for your system.

Or is using the docker provider for cluster-lab simply not possible?

Not able to run the command: docker -H tcp://192.168.200.1:2378 info

I have installed the latest cluster lab on my MacBook running El Capitan. I am able to successfully log in to the vagrant leader via vagrant ssh leader from the Mac. But when I run sudo su and then docker -H tcp://192.168.200.1:2378 info, I get FATA[0000] Cannot connect to the Docker daemon. Is 'docker -d' running on this host?

Thanks.
Vasu

Systemd Enhancements

The current implementation does not fully support systemd control: after executing systemctl disable clusterlab, it still starts up after rebooting.

UI container can't find ./dockerui

Anyone else having trouble starting the UI container?

docker -H MASTER_IP:2378 run --cidfile=dockeruipull -d -p 9000:9000 --env="constraint:node==hypriot11" --name dockerui1 hypriot/rpi-dockerui -e http://MASTER_IP:2378
7496213a97ad621da51d809dc16c6e6d4574a5958fc48355aff36b71d0efca15
Error response from daemon: Cannot start container 7496213a97ad621da51d809dc16c6e6d4574a5958fc48355aff36b71d0efca15: [8] System error: exec: "./dockerui": stat ./dockerui: no such file or directory

Consul container not starting

When running the OS for the first time, I can see that the Docker container for Consul is not starting:

$ docker ps
CONTAINER ID        IMAGE                COMMAND                  CREATED              STATUS                         PORTS                    NAMES
3528081eb5dc        hypriot/rpi-swarm    "/swarm manage consul"   About a minute ago   Up About a minute              0.0.0.0:2378->2375/tcp   bin_swarmmanage_1
db3898d5fc5f        hypriot/rpi-consul   "/consul agent -serve"   About a minute ago   Restarting (1) 9 seconds ago                            bin_consul_1
7d9b0936d53e        hypriot/rpi-swarm    "/swarm join --advert"   About a minute ago   Up About a minute              2375/tcp                 bin_swarm_1

So I ran docker logs bin_consul_1 to see what was happening:

==> Error starting agent: Failed to get advertise address: Multiple private IPs found. Please configure one.
==> WARNING: BootstrapExpect Mode is specified as 1; this is the same as Bootstrap mode.
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Error starting agent: Failed to get advertise address: Multiple private IPs found. Please configure one.

I've seen in the blog's comments that I am not the only one with this issue.

Move config from daemon.json to docker.service

To make cluster-lab compatible with our latest image-builder-rpi v0.5.17, which is now compatible with docker-machine, we also have to move the config from daemon.json back to docker.service.

Configuration for docker daemon

A configuration option for the docker daemon would be nice.

Let's say I want to run a local Docker registry and use it with all my daemons. I would have to change the /etc/default/docker file to add --insecure-registry myregistrydomain.com:5000 for every daemon in my cluster. Currently the only way is to change the cluster-start.sh file, since the config will be overwritten otherwise.
It would be even better if I could also supply this config via the flash tool; that would make for a very easy setup.
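Until such an option exists, one workaround that survives cluster-start.sh rewrites could be a systemd drop-in, mirroring the drop-in approach from the Docker Daemon issue above. A sketch; the file name is a placeholder and the registry address is the example from this issue:

```ini
# /etc/systemd/system/docker.service.d/registry.conf (hypothetical path)
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// --insecure-registry myregistrydomain.com:5000
```

Run systemctl daemon-reload and restart Docker afterwards for the drop-in to take effect.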

Some thoughts...

cluster-lab

  • What happens if the master goes down?
  • Should this use port 8400 or 8500? docker run -ti --rm hypriot/rpi-consul members -rpc-addr=192.168.200.1:8400. I also saw that Consul uses many ports; could you explain which port does what?
  • It's more logical for the end user to run systemctl start cluster-lab than cluster-start. Why not start it automatically from the .deb package, where you only enable the services?
  • Maybe have a .conf file somewhere, where you can specify e.g. network settings
  • Could you combine the two systemd services into one? Or make a cluster-lab service with ExecStart and ExecStop pointing to your files. If required, you could also make a second service that runs Before shutdown and stops the cluster-lab.
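On the port question: to the best of my knowledge, Consul's defaults in the 0.6 era are the following (worth double-checking against the Consul documentation):

```
8300  server RPC (Raft, server-to-server)
8301  Serf LAN gossip (agent-to-agent on the LAN)
8302  Serf WAN gossip (server-to-server across datacenters)
8400  CLI RPC (what the members -rpc-addr example above talks to)
8500  HTTP API and web UI
8600  DNS interface
```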

Overall, this is a major step in the right direction.
After some months of tuning, this will be a very good starting point for enthusiasts like me.

kubernetes-on-arm

Can you add my kubernetes-on-arm as a related project? It is very related.
This week I'm probably going to release v0.6.2 of my repo, which will feature HypriotOS support.
That puts Kubernetes just this far away:

# Install
wget <deb file> from github
dpkg -i <deb file>
kube-config install

# After that, kubectl, hyperkube, etcdctl and all required binaries are in $PATH
kube-config enable-master

# Now Kubernetes is up and running...
kubectl get no,po,svc,rc,secrets,serviceAccounts --all-namespaces

# To connect a worker, just do this from another node
kube-config enable-worker [master-ip]

# Spin up dns and a registry as cluster addons
kube-config enable-addon dns
kube-config enable-addon registry

cluster-stop.sh line 33

Hi,

sed -i -e 's/use-ipv6=no/use-ipv6=yes' /etc/avahi/avahi-daemon.conf

should be

sed -i -e 's/use-ipv6=no/use-ipv6=yes/' /etc/avahi/avahi-daemon.conf

Thanks.

sed command failing with vagrant and Docker 1.11

While starting the cluster-lab with vagrant up after a vagrant destroy, I get the following log output:

==> follower2: Setting up hypriot-cluster-lab-src (0.2.12-1) ...
==> follower2: Created symlink from /etc/systemd/system/multi-user.target.wants/cluster-lab.service to /etc/systemd/system/cluster-lab.service.
==> follower2: cp:
==> follower2: cannot stat ‘/etc/systemd/system/docker.service’
==> follower2: : No such file or directory
==> follower2: sed: can't read /etc/systemd/system/docker.service: No such file or directory

A docker info against Swarm results in the following output:

root@follower1:/home/vagrant# DOCKER_HOST=tcp://192.168.200.1:2378 docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 3
 (unknown): 192.168.200.45:2375
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels:
  └ Error: Cannot connect to the docker engine endpoint
  └ UpdatedAt: 2016-06-08T05:03:41Z
 (unknown): 192.168.200.1:2375
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels:
  └ Error: Cannot connect to the docker engine endpoint
  └ UpdatedAt: 2016-06-08T04:58:31Z
 (unknown): 192.168.200.26:2375
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels:
  └ Error: Cannot connect to the docker engine endpoint
  └ UpdatedAt: 2016-06-08T05:01:01Z
Plugins:
 Volume:
 Network:
Kernel Version: 4.2.0-30-generic
Operating System: linux
Architecture: amd64
CPUs: 0
Total Memory: 0 B
Name: e62a0f42529d
Docker Root Dir:
Debug mode (client): false
Debug mode (server): false
WARNING: No kernel memory limit support

A docker info against the local Docker installation results in

root@follower1:/home/vagrant# docker info
Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 1.11.2
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 4.2.0-30-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 992.9 MiB
Name: follower1
ID: FJYP:QGBI:QQRC:DCXS:OEOW:36JV:JMPV:DTFV:6B6K:C4XO:PEQO:LJYE
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

A cluster-lab health shows

root@follower1:/home/vagrant# cluster-lab health

Internet Connection
  [PASS]   eth1 exists
  [PASS]   eth1 has an ip address
  [PASS]   Internet is reachable
  [PASS]   DNS works

Networking
  [PASS]   eth1.200 exists
  [PASS]   eth1.200 has correct IP from vlan network
  [PASS]   Cluster leader is reachable
  [PASS]   eth1.200 has exactly one IP
  [PASS]   eth1.200 has no local link address
  [PASS]   Avahi process exists
  [PASS]   Avahi is using eth1.200

Docker
  [PASS]   Docker is running
  [FAIL]   Docker is configured to use Consul as key-value store
  [FAIL]   Docker is configured to listen via tcp at port 2375
  [FAIL]   Docker listens on 192.168.200.26 via tcp at port 2375 (Docker-Engine)

Consul
  [PASS]   Consul Docker image exists
  [PASS]   Consul Docker container is running
  [PASS]   Consul is listening on port 8300
  [PASS]   Consul is listening on port 8301
  [PASS]   Consul is listening on port 8302
  [PASS]   Consul is listening on port 8400
  [PASS]   Consul is listening on port 8500
  [PASS]   Consul is listening on port 8600
  [PASS]   Consul API works
  [PASS]   Cluster-Node is pingable with IP 192.168.200.26
  [PASS]   Cluster-Node is pingable with IP 192.168.200.45
  [PASS]   Cluster-Node is pingable with IP 192.168.200.1
  [PASS]   No Cluster-Node is in status 'failed'
  [FAIL]   Consul is able to talk to Docker-Engine on port 7946 (Serf)

Swarm
  [PASS]   Swarm-Join Docker container is running
  [PASS]   Swarm-Manage Docker container is running
  [PASS]   Number of Swarm and Consul nodes is equal which means our cluster is healthy

It seems the Docker daemon was not configured correctly by the cluster-lab.

I guess the problem is related to the following lines:
https://github.com/hypriot/cluster-lab/blob/master/package/usr/local/lib/cluster-lab/docker_lib#L79-L81

@firecyberice What do you think?
