
coreos / quartermaster


A framework for managing containerized storage systems on top of Kubernetes

License: Apache License 2.0

Languages: Go 98.46%, Makefile 1.54%

quartermaster's Introduction


Overview

Quartermaster is a framework for managing containerized storage systems such as Ceph, GlusterFS, NFS-Ganesha, Rook, and others on top of Kubernetes. Quartermaster enables the deployment, installation, and integration of these types of storage systems on a Kubernetes cluster. It abstracts this complexity and presents the client with a simple storage deployment model that is fully integrated with Kubernetes. By simplifying the deployment of storage systems, Quartermaster makes it possible to easily and reliably deploy, upgrade, and check the status of a desired storage system in a Kubernetes cluster. Once deployed, a Quartermaster-managed storage system can be used to fulfill persistent volume claim requests. Quartermaster can also be used to help set up and test PersistentVolumes provided by containerized storage systems deployed in Kubernetes. Today, Quartermaster supports GlusterFS, but it is designed to be easily extended to support a multitude of storage systems on top of Kubernetes.

Project status: Alpha

Quartermaster is under heavy development. We are looking at maturing the Quartermaster framework as well as adding support for more storage systems.

More information

Getting Started

Kubernetes Development Environment Setup

You can use minikube, but only for mock or NFS storage deployments, since minikube is only a single node. To create a Kubernetes cluster with storage, it is recommended that you use Matchbox/Bootkube or Kubernetes-CentOS. Check the driver documentation for any requirements on these systems.

Deploying quartermaster

Deploy Quartermaster to the kube-system namespace:

kubectl run -n kube-system kube-qm --image=quay.io/lpabon/qm

Now that Quartermaster is running, you can deploy one of the following storage systems onto your Kubernetes cluster.

  • Mock: A fake storage system deployment used for development.
  • NFS: The simplest storage system deployment.
  • GlusterFS: A reliable distributed file system.
  • Swift: OpenStack Swift object storage.

Developers

Developers, please see the following documentation on how to build and contribute to the quartermaster project.

Licensing

Quartermaster is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

quartermaster's People

Contributors

barakmich, ericchiang, lpabon, thiagodasilva, xiang90

quartermaster's Issues

clean up driver API

We need to have a clean and documented driver API for driver writers to consume.

Ideally a good godoc should also be provided.
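
For discussion, the sketch below shows the rough shape such a driver contract could take. It is a hypothetical illustration only; the type and method names (Driver, StorageCluster, StorageNode, AddCluster, AddNode, and so on) are not taken from the Quartermaster code base.

package storagedriver

// StorageCluster and StorageNode stand in for the third-party resources the
// operator watches; real definitions would carry the full spec and status.
type StorageCluster struct {
	Name  string
	Nodes []StorageNode
}

type StorageNode struct {
	Name  string
	Image string
}

// Driver is the contract a storage backend (mock, nfs, glusterfs, swift)
// would implement. The operator calls these hooks as it reconciles the
// StorageCluster and StorageNode resources.
type Driver interface {
	// Init is called once when the driver is registered with the operator.
	Init() error

	// AddCluster provisions backend-side state for a new StorageCluster.
	AddCluster(c *StorageCluster) error

	// AddNode joins a node to an existing cluster and returns the updated
	// node (for example, with the deployment it created recorded in status).
	AddNode(c *StorageCluster, n *StorageNode) (*StorageNode, error)

	// DeleteNode removes a node from the cluster.
	DeleteNode(c *StorageCluster, n *StorageNode) error

	// DeleteCluster tears the cluster down.
	DeleteCluster(c *StorageCluster) error
}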

setup CI

Travis CI should be good enough initially. Not sure how we should do that before we open source this, though.

README doc

Obviously, we need a README to talk about:

  • the motivation for QM
  • a high-level introduction to the project
  • a quick start guide
  • a link to our initial GlusterFS demo

operator.go unable to create gluster deployment on LXD

Environment:

Canonical distribution of Kubernetes (CDK) via $ conjure-up on top of LXD (Ubuntu 16.04 LTS)

$ juju status

Model                         Controller                Cloud/Region         Version  SLA
conjure-canonical-kubern-4d5  conjure-up-localhost-b36  localhost/localhost  2.2.1    unsupported

App                    Version  Status  Scale  Charm                  Store       Rev  OS      Notes
easyrsa                3.0.1    active      1  easyrsa                jujucharms   12  ubuntu
etcd                   2.3.8    active      3  etcd                   jujucharms   40  ubuntu
flannel                0.7.0    active      4  flannel                jujucharms   20  ubuntu
kubeapi-load-balancer  1.10.3   active      1  kubeapi-load-balancer  jujucharms   16  ubuntu  exposed
kubernetes-master      1.7.0    active      1  kubernetes-master      jujucharms   35  ubuntu
kubernetes-worker      1.7.0    active      3  kubernetes-worker      jujucharms   40  ubuntu  exposed

Unit                      Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*                active    idle   4        10.182.159.175                  Certificate Authority connected.
etcd/0                    active    idle   6        10.182.159.112  2379/tcp        Healthy with 3 known peers
etcd/1*                   active    idle   1        10.182.159.171  2379/tcp        Healthy with 3 known peers
etcd/2                    active    idle   0        10.182.159.186  2379/tcp        Healthy with 3 known peers
kubeapi-load-balancer/0*  active    idle   5        10.182.159.90   443/tcp         Loadbalancer ready.
kubernetes-master/0*      active    idle   2        10.182.159.174  6443/tcp        Kubernetes master running.
  flannel/0               active    idle            10.182.159.174                  Flannel subnet 10.1.45.1/24
kubernetes-worker/0       active    idle   3        10.182.159.219  80/tcp,443/tcp  Kubernetes worker running.
  flannel/3               active    idle            10.182.159.219                  Flannel subnet 10.1.81.1/24
kubernetes-worker/1*      active    idle   7        10.182.159.165  80/tcp,443/tcp  Kubernetes worker running.
  flannel/1*              active    idle            10.182.159.165                  Flannel subnet 10.1.96.1/24
kubernetes-worker/2       active    idle   8        10.182.159.49   80/tcp,443/tcp  Kubernetes worker running.
  flannel/2               active    idle            10.182.159.49                   Flannel subnet 10.1.76.1/24

Machine  State    DNS             Inst id        Series  AZ  Message
0        started  10.182.159.186  juju-0361f0-0  xenial      Running
1        started  10.182.159.171  juju-0361f0-1  xenial      Running
2        started  10.182.159.174  juju-0361f0-2  xenial      Running
3        started  10.182.159.219  juju-0361f0-3  xenial      Running
4        started  10.182.159.175  juju-0361f0-4  xenial      Running
5        started  10.182.159.90   juju-0361f0-5  xenial      Running
6        started  10.182.159.112  juju-0361f0-6  xenial      Running
7        started  10.182.159.165  juju-0361f0-7  xenial      Running
8        started  10.182.159.49   juju-0361f0-8  xenial      Running

Relation           Provides               Consumes               Type
certificates       easyrsa                etcd                   regular
certificates       easyrsa                kubeapi-load-balancer  regular
certificates       easyrsa                kubernetes-master      regular
certificates       easyrsa                kubernetes-worker      regular
cluster            etcd                   etcd                   peer
etcd               etcd                   flannel                regular
etcd               etcd                   kubernetes-master      regular
cni                flannel                kubernetes-master      regular
cni                flannel                kubernetes-worker      regular
loadbalancer       kubeapi-load-balancer  kubernetes-master      regular
kube-api-endpoint  kubeapi-load-balancer  kubernetes-worker      regular
cni                kubernetes-master      flannel                subordinate
kube-control       kubernetes-master      kubernetes-worker      regular
cni                kubernetes-worker      flannel                subordinate
$ kubectl version:

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T22:55:19Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Description

Following the demo:

kubectl run -n kube-system kube-qm --image=quay.io/lpabon/qm
kubectl apply -f quartermaster/examples/glusterfs/matchbox-bootkube.yaml

Expected Behavior:

$ kubectl get po --all-namespaces
NAME                                  READY     STATUS    RESTARTS   AGE          
demo-busybox-9rw9g                    1/1       Running   0          14s                                      
demo-busybox-gcjj1                    1/1       Running   0          14s                                      
demo-busybox-qw50t                    1/1       Running   0          14s                                      
gluster-4094793719-2213454022-8p27z   1/1       Running   0          3m                                       
gluster-4132804602-820420739-bwrpx    1/1       Running   0          3m                                       
gluster-4170815485-1520607404-f1p92   1/1       Running   0          3m                                       
heketi-4126678016-r530m               1/1       Running   0          4m                                       
nginx                                 1/1       Running   0          14s         

The gluster pods are up.

System Behavior (on cdk):

$ kubectl get po --all-namespaces

NAMESPACE     NAME                                    READY     STATUS    RESTARTS   AGE
default       default-http-backend-dz3s7              1/1       Running   0          16h
default       heketi-3769712726-vxr0s                 1/1       Running   0          40m
default       nginx-ingress-controller-7wsjd          1/1       Running   0          16h
default       nginx-ingress-controller-bspjv          1/1       Running   0          16h
default       nginx-ingress-controller-n8v5d          1/1       Running   0          16h
kube-system   heapster-v1.4.0-2637284905-c26mz        4/4       Running   1          16h
kube-system   kube-dns-3097350089-wpx1d               3/3       Running   0          17h
kube-system   kube-qm-3053120389-kfb2n                1/1       Running   0          48m
kube-system   kubernetes-dashboard-1265873680-q0pgg   1/1       Running   0          17h
kube-system   monitoring-influxdb-grafana-v4-ztqqn    2/2       Running   0          17h
                            

The gluster pods are missing!

$ kubectl -n kube-system logs kube-qm-3053120389-kfb2n

main INFO 2017/08/09 17:52:41 main.go:51: Version: v0.0.4
operator INFO 2017/08/09 17:52:41 operator.go:101: storage driver nfs loaded
operator INFO 2017/08/09 17:52:41 operator.go:101: storage driver mock loaded
operator INFO 2017/08/09 17:52:41 operator.go:101: storage driver glusterfs loaded
operator INFO 2017/08/09 17:52:41 operator.go:101: storage driver swift loaded
operator INFO 2017/08/09 17:52:41 operator.go:131: connection to Kubernetes established. Cluster version v1.7.0
operator INFO 2017/08/09 17:52:41 tpr.go:84: storage-status.storage.coreos.com TPR created
operator INFO 2017/08/09 17:52:41 tpr.go:84: storage-node.storage.coreos.com TPR created
operator INFO 2017/08/09 17:52:41 tpr.go:84: storage-cluster.storage.coreos.com TPR created
operator DEBUG 2017/08/09 17:52:50 operator.go:150: register event handlers
operator INFO 2017/08/09 17:52:50 operator.go:189: Waiting for synchronization with Kubernetes TPR
operator DEBUG 2017/08/09 17:52:50 operator.go:167: addDeployment trigger for deployment add
operator DEBUG 2017/08/09 17:52:50 operator.go:167: addDeployment trigger for deployment add
operator DEBUG 2017/08/09 17:52:50 operator.go:167: addDeployment trigger for deployment add
operator DEBUG 2017/08/09 17:52:50 operator.go:167: addDeployment trigger for deployment add
operator INFO 2017/08/09 17:52:50 operator.go:195: Synchronization complete
operator DEBUG 2017/08/09 17:57:50 operator.go:175: addDeployment trigger for deployment update
operator DEBUG 2017/08/09 17:57:50 operator.go:175: addDeployment trigger for deployment update
operator DEBUG 2017/08/09 17:57:50 operator.go:175: addDeployment trigger for deployment update
operator DEBUG 2017/08/09 17:57:50 operator.go:175: addDeployment trigger for deployment update
operator ERROR 2017/08/09 17:57:50 client.go:61: EOF
operator ERROR 2017/08/09 17:57:54 client.go:76: EOF
operator INFO 2017/08/09 17:59:36 cluster_operator.go:96: reconcile storagenode key default/gluster
glusterfs DEBUG 2017/08/09 17:59:36 glusterfs.go:95: msg add cluster cluster gluster
glusterfs DEBUG 2017/08/09 17:59:36 heketi_kube.go:228: service account created
operator DEBUG 2017/08/09 17:59:36 operator.go:167: addDeployment trigger for deployment add
operator DEBUG 2017/08/09 17:59:36 operator.go:175: addDeployment trigger for deployment update
operator DEBUG 2017/08/09 17:59:36 operator.go:175: addDeployment trigger for deployment update
operator DEBUG 2017/08/09 17:59:37 operator.go:175: addDeployment trigger for deployment update
operator DEBUG 2017/08/09 18:00:57 operator.go:175: addDeployment trigger for deployment update
glusterfs DEBUG 2017/08/09 18:01:00 heketi_kube.go:207: heketi deployed
glusterfs DEBUG 2017/08/09 18:01:00 heketi_kube.go:270: service account created
glusterfs DEBUG 2017/08/09 18:01:00 heketi_kube.go:50: Got: http://10.152.183.222:8080
glusterfs DEBUG 2017/08/09 18:01:35 glusterfs.go:115: Created cluster gluster cluster id 10e7f7d4ed34f85b3151fa5decc922a3
operator INFO 2017/08/09 18:01:35 cluster_operator.go:171: update gluster
operator INFO 2017/08/09 18:01:35 cluster_operator.go:273: created storage node gluster-4094793719
operator DEBUG 2017/08/09 18:01:35 operator.go:153: enqueueStorageNode trigger for storage add
operator ERROR 2017/08/09 18:01:35 operator.go:365: unable to create deployment gluster-4094793719: Deployment.apps "gluster-4094793719" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
operator ERROR 2017/08/09 18:01:35 operator.go:246: reconciliation failed: unable to create deployment gluster-4094793719: Deployment.apps "gluster-4094793719" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
E0809 18:01:35.818959       1 operator.go:246] reconciliation failed: unable to create deployment gluster-4094793719: Deployment.apps "gluster-4094793719" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
operator DEBUG 2017/08/09 18:01:35 operator.go:153: enqueueStorageNode trigger for storage add
operator INFO 2017/08/09 18:01:35 cluster_operator.go:273: created storage node gluster-4132804602
operator ERROR 2017/08/09 18:01:35 operator.go:365: unable to create deployment gluster-4132804602: Deployment.apps "gluster-4132804602" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
operator ERROR 2017/08/09 18:01:35 operator.go:246: reconciliation failed: unable to create deployment gluster-4132804602: Deployment.apps "gluster-4132804602" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
E0809 18:01:35.827979       1 operator.go:246] reconciliation failed: unable to create deployment gluster-4132804602: Deployment.apps "gluster-4132804602" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
operator DEBUG 2017/08/09 18:01:35 operator.go:153: enqueueStorageNode trigger for storage add
operator INFO 2017/08/09 18:01:35 cluster_operator.go:273: created storage node gluster-4170815485
operator ERROR 2017/08/09 18:01:35 operator.go:365: unable to create deployment gluster-4170815485: Deployment.apps "gluster-4170815485" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
operator ERROR 2017/08/09 18:01:35 operator.go:246: reconciliation failed: unable to create deployment gluster-4170815485: Deployment.apps "gluster-4170815485" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
E0809 18:01:35.836527       1 operator.go:246] reconciliation failed: unable to create deployment gluster-4170815485: Deployment.apps "gluster-4170815485" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy
. . .

The ERROR repeats after a couple of minutes.

Any help is appreciated.

GlusterFS could recognize the IP if using the cluster network

If the deployment is a simple test using the cluster network, the driver can determine the IP of the storage network by using the cluster network IP.

Can a service be used here? The biggest issue Gluster has is that it creates new ports as bricks are created. It would be great if a range of ports could be added to a service, if a service is used. Maybe a service does not need to be used at all? What happens to the IP address of a pod that is not part of a service but never moves off its node?

Check out this idea of how to update the /etc/hosts file on each container from this github repo: https://github.com/suquant/glusterd
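
For illustration, a driver could fall back to the cluster network simply by reading the pod's cluster IP from the API server and registering that as the storage address. The sketch below is an assumption-laden example, not Quartermaster code: it uses a recent client-go (whose Get signature differs from the client vendored in 2017), and storageAddressFor, namespace, and podName are made-up names.

package glusterfs

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// storageAddressFor returns a pod's cluster-network IP so it can be reused as
// the GlusterFS peer address when no dedicated storage network exists.
// namespace and podName are placeholders supplied by the caller.
func storageAddressFor(clientset kubernetes.Interface, namespace, podName string) (string, error) {
	pod, err := clientset.CoreV1().Pods(namespace).Get(context.TODO(), podName, metav1.GetOptions{})
	if err != nil {
		return "", fmt.Errorf("getting pod %s/%s: %w", namespace, podName, err)
	}
	if pod.Status.PodIP == "" {
		return "", fmt.Errorf("pod %s/%s has no IP assigned yet", namespace, podName)
	}
	return pod.Status.PodIP, nil
}

The caveat raised above still applies: a pod's cluster IP changes when the pod is rescheduled, which is what makes the /etc/hosts approach in the linked repository (or a Service with a stable address) attractive.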

Specifying an image for StorageNode

It is possible to override the image used by describing it in the StorageNode. The concern here is that this would make it hard to upgrade easily and also hard to keep the same version and image across nodes.
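
One way to keep that concern manageable, sketched below purely as an illustration, is to treat the per-node image as an optional override of a cluster-wide default; imageForNode and the image strings are hypothetical, not part of Quartermaster.

package main

import "fmt"

// imageForNode is a hypothetical helper: the StorageNode image is treated as
// an optional override and the cluster-wide image is the default, so a
// routine upgrade normally only changes the cluster-level field.
func imageForNode(clusterImage, nodeImage string) string {
	if nodeImage != "" {
		return nodeImage // explicit per-node override
	}
	return clusterImage
}

func main() {
	// "example/gluster:v1" is a placeholder image name.
	fmt.Println(imageForNode("example/gluster:v1", "")) // prints example/gluster:v1
}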

clean up Mock storage implementation

The Mock storage implementation serves as an example for users to understand how a driver works. We need to make our Mock storage easy to read and understand.
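
As a rough illustration of how small a readable mock can be, here is a minimal sketch written against the hypothetical Driver interface from the driver-API issue above; it is not the project's actual mock code.

package storagedriver

import "log"

// mockDriver is a minimal driver that only logs what the operator asks it to
// do; useful as a readable example and for exercising the operator without
// real storage.
type mockDriver struct{}

func (m *mockDriver) Init() error { return nil }

func (m *mockDriver) AddCluster(c *StorageCluster) error {
	log.Printf("mock: add cluster %s", c.Name)
	return nil
}

func (m *mockDriver) AddNode(c *StorageCluster, n *StorageNode) (*StorageNode, error) {
	log.Printf("mock: add node %s to cluster %s", n.Name, c.Name)
	return n, nil // nothing to provision; return the node unchanged
}

func (m *mockDriver) DeleteNode(c *StorageCluster, n *StorageNode) error {
	log.Printf("mock: delete node %s", n.Name)
	return nil
}

func (m *mockDriver) DeleteCluster(c *StorageCluster) error {
	log.Printf("mock: delete cluster %s", c.Name)
	return nil
}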

Create a node_operator.go

Base it on how cluster_operator.go is done, moving the relevant code from operator.go into node_operator.go.
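
A possible shape for the split, sketched as an assumption rather than a description of the existing code, reusing the hypothetical Driver and resource types from the driver-API sketch above:

package storagedriver

import "fmt"

// NodeOperator is a hypothetical shape for node_operator.go, mirroring the
// cluster operator: it owns the StorageNode work and a single reconcile
// entry point, so operator.go only wires the pieces together.
type NodeOperator struct {
	driver Driver // storage backend for the node's cluster
}

// Reconcile converges the actual state of one StorageNode (its deployment,
// its membership in the backend) toward the spec.
func (o *NodeOperator) Reconcile(c *StorageCluster, n *StorageNode) error {
	updated, err := o.driver.AddNode(c, n)
	if err != nil {
		return fmt.Errorf("unable to add node %s: %w", n.Name, err)
	}
	_ = updated // a real implementation would persist the updated status
	return nil
}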

QM can not communicate with kubernetes api 10.96.0.1:443

Kicking the tires with QM. I'm not sure whether this is a QM issue or a kubeadm/CNI issue, but I'm wondering if anyone has seen this or has any ideas. Kubernetes itself seems operational and I can run various containers like nginx and busybox with no issues, so I'm wondering if something with QM is causing the problem.

Environment:

  • 4 node cluster with centos 7 (1 master and 3 nodes)
  • kubeadm init successful with versions 1.6.2 (latest) and 1.5.6 (tried multiple versions in case RBAC was my issue)
  • cni add-on is weave (also tried flannel, calico)

Pods Running:

# kubectl get pods -o wide --all-namespaces
NAMESPACE     NAME                                      READY     STATUS             RESTARTS   AGE       IP                NODE
default       nfs-bb-pod1                               1/1       Running            0          2m        10.38.0.0         cns17.rhs
default       nginx-pod1                                1/1       Running            0          58s       10.34.0.0         cns16.rhs
kube-system   dummy-2088944543-27dn4                    1/1       Running            0          27m       192.168.122.215   cnsmaster15.rhs
kube-system   etcd-cnsmaster15.rhs                      1/1       Running            0          27m       192.168.122.215   cnsmaster15.rhs
kube-system   kube-apiserver-cnsmaster15.rhs            1/1       Running            0          27m       192.168.122.215   cnsmaster15.rhs
kube-system   kube-controller-manager-cnsmaster15.rhs   1/1       Running            0          27m       192.168.122.215   cnsmaster15.rhs
kube-system   kube-discovery-1769846148-tkflt           1/1       Running            0          27m       192.168.122.215   cnsmaster15.rhs
kube-system   kube-dns-2924299975-9r58s                 4/4       Running            0          27m       10.32.0.3         cnsmaster15.rhs
kube-system   kube-proxy-2b1h0                          1/1       Running            0          18m       192.168.122.217   cns17.rhs
kube-system   kube-proxy-2x6sv                          1/1       Running            0          18m       192.168.122.216   cns16.rhs
kube-system   kube-proxy-3t57s                          1/1       Running            0          18m       192.168.122.218   cns18.rhs
kube-system   kube-proxy-cb1hg                          1/1       Running            0          27m       192.168.122.215   cnsmaster15.rhs
kube-system   kube-qm-847486390-lts2t                   0/1       CrashLoopBackOff   7          15m       10.40.0.0         cns18.rhs
kube-system   kube-scheduler-cnsmaster15.rhs            1/1       Running            0          27m       192.168.122.215   cnsmaster15.rhs
kube-system   weave-net-8jg0v                           2/2       Running            0          18m       192.168.122.216   cns16.rhs
kube-system   weave-net-qp71m                           2/2       Running            0          18m       192.168.122.218   cns18.rhs
kube-system   weave-net-smlp9                           2/2       Running            0          22m       192.168.122.215   cnsmaster15.rhs
kube-system   weave-net-xwzqh                           2/2       Running            0          18m       192.168.122.217   cns17.rhs

Services:

# kubectl get svc --all-namespaces
NAMESPACE     NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
default       kubernetes   10.96.0.1    <none>        443/TCP         30m
kube-system   kube-dns     10.96.0.10   <none>        53/UDP,53/TCP   30m

Endpoints:

# kubectl get endpoints --all-namespaces
NAMESPACE     NAME                      ENDPOINTS                   AGE
default       kubernetes                192.168.122.215:6443        31m
kube-system   kube-controller-manager   <none>                      26m
kube-system   kube-dns                  10.32.0.3:53,10.32.0.3:53   31m
kube-system   kube-scheduler            <none>                      26m

Notice I can run pods such as busybox and nginx with no issues.

qm describe:

Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath			Type		Reason		Message
  ---------	--------	-----	----			-------------			--------	------		-------
  5m		5m		1	{default-scheduler }					Normal		Scheduled	Successfully assigned kube-qm-847486390-lts2t to cns18.rhs
  5m		5m		1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal		Created		Created container with docker id 27461e2167c5; Security:[seccomp=unconfined]
  5m		5m		1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal		Started		Started container with docker id 27461e2167c5
  5m		5m		1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal		Started		Started container with docker id efa3bc71ac10
  5m		5m		1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal		Created		Created container with docker id efa3bc71ac10; Security:[seccomp=unconfined]
  4m		4m		1	{kubelet cns18.rhs}					Warning		FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "kube-qm" with CrashLoopBackOff: "Back-off 10s restarting failed container=kube-qm pod=kube-qm-847486390-lts2t_kube-system(674f0070-30c4-11e7-afd2-525400ab92fa)"

  4m	4m	1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal	Created		Created container with docker id bf93136ea8b8; Security:[seccomp=unconfined]
  4m	4m	1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal	Started		Started container with docker id bf93136ea8b8
  3m	3m	2	{kubelet cns18.rhs}					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "kube-qm" with CrashLoopBackOff: "Back-off 20s restarting failed container=kube-qm pod=kube-qm-847486390-lts2t_kube-system(674f0070-30c4-11e7-afd2-525400ab92fa)"

  3m	3m	1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal	Created		Created container with docker id 6deb71862d46; Security:[seccomp=unconfined]
  3m	3m	1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal	Started		Started container with docker id 6deb71862d46
  3m	2m	3	{kubelet cns18.rhs}					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "kube-qm" with CrashLoopBackOff: "Back-off 40s restarting failed container=kube-qm pod=kube-qm-847486390-lts2t_kube-system(674f0070-30c4-11e7-afd2-525400ab92fa)"

  2m	2m	1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal	Created		Created container with docker id 8a727dbcb596; Security:[seccomp=unconfined]
  2m	2m	1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal	Started		Started container with docker id 8a727dbcb596
  4m	31s	13	{kubelet cns18.rhs}	spec.containers{kube-qm}	Warning	BackOff		Back-off restarting failed docker container
  1m	31s	7	{kubelet cns18.rhs}					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "kube-qm" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=kube-qm pod=kube-qm-847486390-lts2t_kube-system(674f0070-30c4-11e7-afd2-525400ab92fa)"

  5m	18s	6	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal	Pulling	pulling image "quay.io/lpabon/qm"
  5m	17s	6	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal	Pulled	Successfully pulled image "quay.io/lpabon/qm"
  16s	16s	1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal	Created	Created container with docker id c61ab4b81350; Security:[seccomp=unconfined]
  16s	16s	1	{kubelet cns18.rhs}	spec.containers{kube-qm}	Normal	Started	Started container with docker id c61ab4b81350

qm pod error:

[root@cnsmaster15 kubernetes]# kubectl -n kube-system logs kube-qm-847486390-lts2t
main INFO 2017/05/04 12:22:59 main.go:50: Version: v0.0.3-10-g2edceac
operator INFO 2017/05/04 12:22:59 operator.go:88: storage driver nfs loaded
operator INFO 2017/05/04 12:22:59 operator.go:88: storage driver mock loaded
operator INFO 2017/05/04 12:22:59 operator.go:88: storage driver glusterfs loaded
operator INFO 2017/05/04 12:22:59 operator.go:88: storage driver swift loaded
operator ERROR 2017/05/04 12:23:29 operator.go:118: communicating with server failed: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout

Telnet to the API server from a node:

# telnet 10.96.0.1 443
Trying 10.96.0.1...
Connected to 10.96.0.1.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

API seems to not be available

I ran the following command from the README (changed the namespace to storage for my test cluster).
kubectl run -n storage kube-qm --image=quay.io/lpabon/qm

However, when walking through the GlusterFS examples, the API seems to be unavailable and the cluster cannot get the TPR for StorageCluster. I wget-ed the cluster.yaml, and changed the hostnames and IPs to match nodes within my cluster.

$ wget https://raw.githubusercontent.com/lpabon/quartermaster/master/examples/glusterfs/cluster.yaml
$ kubectl get all -n storage
NAME                          READY     STATUS             RESTARTS   AGE
po/kube-qm-1394339067-qsgpd   0/1       CrashLoopBackOff   14         49m

NAME             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/kube-qm   1         1         1            0           49m

NAME                    DESIRED   CURRENT   READY     AGE
rs/kube-qm-1394339067   1         1         0         49m
$ kubectl create -n storage -f cluster.yaml 
error: unable to recognize "cluster.yaml": no matches for storage.coreos.com/, Kind=StorageCluster

Nodes may only utilize a single storage cluster

Currently, while working through an example, I have a three-node cluster as follows:

+----------+ +----------+ +----------+
| node1    | | node2    | | node3    |
|          | |          | |          |
|          | |          | |          |
| RAID5    | | RAID5    | | RAID5    |
|  - 2TB   | |  - 2TB   | |  - 2 TB  |
|          | |          | |          |
| NVMe     | |          | |          |
|  - 120GB | |          | |          |
|          | |          | |          |
|          | |          | |          |
+----------+ +----------+ +----------+

In this configuration, I have two distinct classes of storage.

I then attempted to define two storage clusters, with the RAID 5 volumes of nodes 1, 2, and 3 being members of one StorageCluster and the NVMe disk of node 1 being a separate storage cluster.

In attempting to define both storage pools as distinct clusters, I run into issues with the host paths being used, e.g.:

    volumeMounts:
    - mountPath: /var/lib/glusterd
      name: glusterfs-config
...
  volumes:
  - hostPath:
      path: /var/lib/glusterfs-container/glusterd
    name: glusterfs-config

In this case the glusterfs-config hostPath will always conflict with the definition of any other StorageCluster on the host.

Potentially, using the cluster UUID as part of the host path could make it deterministic from the host's perspective, yet still allow multiple StorageClusters to be defined.
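
A minimal sketch of that proposal, assuming the driver is free to choose the hostPath layout; glusterdHostPath is a made-up helper and the UUID shown is just an example value.

package main

import (
	"fmt"
	"path/filepath"
)

// glusterdHostPath builds a hostPath that is unique per StorageCluster by
// embedding the cluster UUID, so two clusters sharing a node no longer
// collide on /var/lib/glusterfs-container/glusterd. The directory layout is
// illustrative, not the project's actual convention.
func glusterdHostPath(clusterUUID string) string {
	return filepath.Join("/var/lib/glusterfs-container", clusterUUID, "glusterd")
}

func main() {
	// e.g. /var/lib/glusterfs-container/10e7f7d4ed34f85b3151fa5decc922a3/glusterd
	fmt.Println(glusterdHostPath("10e7f7d4ed34f85b3151fa5decc922a3"))
}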

doc move to Documentation

Please rename the docs folder to Documentation. This has two advantages:

  1. It is sorted to the top on GitHub.
  2. All of the other CoreOS projects keep their docs in this folder, so pulling docs in the future will be easier.

Regression: Reconciliation failed to add new node

glusterfs DEBUG 2017/05/27 17:11:46 heketi_kube.go:50: Got: http://10.3.0.230:8080
glusterfs ERROR 2017/05/27 17:11:46 glusterfs.go:239: unable to add node gluster-4132804602: Failed to get list of pods
operator ERROR 2017/05/27 17:11:46 operator.go:381: unable to add node gluster-4132804602: Failed to get list of pods
operator ERROR 2017/05/27 17:11:46 operator.go:246: reconciliation failed: unable to add node gluster-4132804602: Failed to get list of pods
E0527 17:11:46.292331       1 operator.go:246] reconciliation failed: unable to add node gluster-4132804602: Failed to get list of pods
