embercsi / ember-csi
Multi-vendor CSI plugin supporting over 80 storage drivers
License: Other
Make sure that the service can shut down gracefully.
Currently we have an abrupt shutdown on KeyboardInterrupt that gives no grace period for ongoing operations to complete.
We need a configurable grace period and the ability to shut down gracefully when the CO requests it.
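A minimal sketch of the idea with the grpc Python library, assuming a plain grpc.server() instance (the X_CSI_GRACEFUL_SHUTDOWN_TIMEOUT variable name is invented for illustration):

```python
import os
import signal
import threading
from concurrent import futures

import grpc

# Configurable grace period; the env var name is hypothetical.
GRACE = float(os.environ.get('X_CSI_GRACEFUL_SHUTDOWN_TIMEOUT', '30'))

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
server.add_insecure_port('unix:///csi-data/csi.sock')
server.start()

stopped = threading.Event()

def _graceful_stop(signum, frame):
    # stop(grace) rejects new RPCs but lets in-flight ones finish for up
    # to GRACE seconds instead of killing them on KeyboardInterrupt.
    server.stop(GRACE).wait()
    stopped.set()

signal.signal(signal.SIGTERM, _graceful_stop)
signal.signal(signal.SIGINT, _graceful_stop)
stopped.wait()
```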
We should have a Kubernetes-specific metadata persistence plugin using CRDs.
This plugin could live inside the cinderlib-CSI driver or be an external Python library.
All concurrent requests share the same request ID: when we receive a new request it overwrites the request ID of the previous thread, and they all start reporting the same ID.
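One way to avoid the overwriting would be to keep the ID in thread-local storage, along the lines of this sketch (names are illustrative, not the actual Ember-CSI code):

```python
import itertools
import threading

_local = threading.local()   # one slot per thread
_counter = itertools.count()

def set_request_id():
    # Each request-handling thread gets its own ID instead of sharing
    # a single attribute that every new request overwrites.
    _local.request_id = 'req-%d' % next(_counter)

def get_request_id():
    return getattr(_local, 'request_id', 'req-unset')
```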
There is no CI, and we should add at least one CI job on Travis.
This job could use the LVM driver to run the tests.
When we do a ControllerPublishVolume we get the following warning:
/home/geguileo/code/reuse-cinder-drivers/cinderlib-csi/.venv/lib/python2.7/site-packages/oslo_versionedobjects/fields.py:368: FutureWarning: u'think.localdomain' is an invalid UUID. Using UUIDFields with invalid UUIDs is no longer supported, and will be removed in a future release. Please update your code to input valid UUIDs or accept ValueErrors for invalid UUIDs. See https://docs.openstack.org/oslo.versionedobjects/latest/reference/fields.html#oslo_versionedobjects.fields.UUIDField for further details
FutureWarning)
We shouldn't be getting any warnings unrelated to the CSI plugin.
The plugin doesn't have a logging mechanism, only a couple of print statements.
It should have a proper logging mechanism and more data should be logged.
The logging mechanism should be tied to the logging mechanism provided by cinderlib (coming from oslo.logging) if possible. That way even internal driver information will be logged.
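As a sketch of the direction (not the actual Ember-CSI setup), oslo.log can be configured once at startup so the plugin and the cinderlib/driver internals share the same handlers:

```python
from oslo_config import cfg
from oslo_log import log as logging

CONF = cfg.CONF
logging.register_options(CONF)    # register the standard oslo.log options
logging.setup(CONF, 'ember-csi')  # configure handlers for all oslo loggers

LOG = logging.getLogger(__name__)
LOG.info('Ember-CSI logging initialized')
```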
CSI v1.1 is out and includes the new Expand Volume feature. We should add this feature to Ember-CSI and also check whether any other changes to the spec require additional work.
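For reference, a hypothetical sketch of the new RPC; the `types` protobuf module and the `_get_vol` helper only mirror the style of the existing code and are not the actual implementation:

```python
GB = 1024 ** 3

def ControllerExpandVolume(self, request, context):
    vol = self._get_vol(volume_id=request.volume_id)  # hypothetical helper
    # Round the requested bytes up to whole GiB, as Cinder sizes are in GiB.
    new_size_gb = (request.capacity_range.required_bytes + GB - 1) // GB
    vol.extend(new_size_gb)  # cinderlib volumes support extend()
    return types.ControllerExpandVolumeResponse(
        capacity_bytes=new_size_gb * GB,
        node_expansion_required=True)
```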
Hi,
Using cinderlib/CSI with either XtremIO or RBD, I see the following issue: attach seems to work but mount fails with the following error:
MountVolume.MountDevice failed for volume "pvc-9bb7c9a98c3e11e8" : rpc error: code = Unavailable desc = grpc: the connection is unavailable
[root@kt-c7kb9 contrib]# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
kclaim Bound pvc-9bb7c9a98c3e11e8 1Gi RWO cinderlib 3m
[root@kt-c7kb9 contrib]# oc get all
NAME READY STATUS RESTARTS AGE
pod/csi-cinderlib-controller-0 3/3 Running 0 4m
pod/csi-cinderlib-ds-hqm4w 2/2 Running 0 4m
pod/sleep 0/1 ContainerCreating 0 4m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/csi-cinderlib-ds 1 1 1 1 1 4m
NAME DESIRED CURRENT AGE
statefulset.apps/csi-cinderlib-controller 1 1 4m
[root@kt-c7kb9 contrib]# oc describe pod sleep
Name: sleep
Namespace: csi
Node: localhost/10.19.139.83
Start Time: Fri, 20 Jul 2018 17:02:26 +0000
Labels:
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"sleep","namespace":"csi"},"spec":{"containers":[{"args":["sleep","1000000"],"image...
openshift.io/scc=anyuid
Status: Pending
IP:
Containers:
busybox:
Container ID:
Image: busybox
Image ID:
Port:
Host Port:
Args:
sleep
1000000
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
Mounts:
/tmp/mysql from myclaim-mount (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5krt9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
myclaim-mount:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: kclaim
ReadOnly: false
default-token-5krt9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5krt9
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations:
Events:
Type Reason Age From Message
Normal Scheduled 4m default-scheduler Successfully assigned csi/sleep to localhost
Normal SuccessfulAttachVolume 4m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-9bb7c9a98c3e11e8"
Warning FailedMount 6s (x10 over 4m) kubelet, localhost MountVolume.MountDevice failed for volume "pvc-9bb7c9a98c3e11e8" : rpc error: code = Unavailable desc = grpc: the connection is unavailable
Warning FailedMount 6s (x2 over 2m) kubelet, localhost Unable to mount volumes for pod "sleep_csi(adda6e0b-8c3e-11e8-99ed-0017a4772454)": timeout expired waiting for volumes to attach or mount for pod "csi"/"sleep". list of unmounted volumes=[myclaim-mount]. list of unattached volumes=[myclaim-mount default-token-5krt9]
I also see the following in kubelet logs:
E0720 17:06:44.675459 7093 kubelet.go:1610] Unable to mount volumes for pod "sleep_csi(adda6e0b-8c3e-11e8-99ed-0017a4772454)": timeout expired waiting for volumes to attach or mount for pod "csi"/"sleep". list of unmounted volumes=[myclaim-mount]. list of unattached volumes=[myclaim-mount default-token-5krt9]; skipping pod
E0720 17:06:44.675494 7093 pod_workers.go:186] Error syncing pod adda6e0b-8c3e-11e8-99ed-0017a4772454 ("sleep_csi(adda6e0b-8c3e-11e8-99ed-0017a4772454)"), skipping: timeout expired waiting for volumes to attach or mount for pod "csi"/"sleep". list of unmounted volumes=[myclaim-mount]. list of unattached volumes=[myclaim-mount default-token-5krt9]
I0720 17:06:44.917838 7093 operation_generator.go:489] MountVolume.WaitForAttach entering for volume "pvc-9bb7c9a98c3e11e8" (UniqueName: "kubernetes.io/csi/com.redhat.cinderlib-csi^d8510041-dca7-4f10-8dd2-bf8612f1007d") pod "sleep" (UID: "adda6e0b-8c3e-11e8-99ed-0017a4772454") DevicePath "csi-bef4cd4d9c8fb774e7572baa224f885b2e3294a9f46fcdcf218c285114b8e859"
I0720 17:06:44.919864 7093 operation_generator.go:498] MountVolume.WaitForAttach succeeded for volume "pvc-9bb7c9a98c3e11e8" (UniqueName: "kubernetes.io/csi/com.redhat.cinderlib-csi^d8510041-dca7-4f10-8dd2-bf8612f1007d") pod "sleep" (UID: "adda6e0b-8c3e-11e8-99ed-0017a4772454") DevicePath "csi-bef4cd4d9c8fb774e7572baa224f885b2e3294a9f46fcdcf218c285114b8e859"
E0720 17:06:44.920299 7093 csi_attacher.go:282] kubernetes.io/csi: attacher.MountDevice failed to check STAGE_UNSTAGE_VOLUME: rpc error: code = Unavailable desc = grpc: the connection is unavailable
E0720 17:06:44.920372 7093 nestedpendingoperations.go:267] Operation for ""kubernetes.io/csi/com.redhat.cinderlib-csi^d8510041-dca7-4f10-8dd2-bf8612f1007d"" failed. No retries permitted until 2018-07-20 17:08:46.920347416 +0000 UTC m=+11877.628616092 (durationBeforeRetry 2m2s). Error: "MountVolume.MountDevice failed for volume "pvc-9bb7c9a98c3e11e8" (UniqueName: "kubernetes.io/csi/com.redhat.cinderlib-csi^d8510041-dca7-4f10-8dd2-bf8612f1007d") pod "sleep" (UID: "adda6e0b-8c3e-11e8-99ed-0017a4772454") : rpc error: code = Unavailable desc = grpc: the connection is unavailable"
Any idea what might be causing this?
Thanks
Kiran
All our testing relies on functional tests. We should also add unit tests to increase coverage.
The CI is already running the single existing unit test that we have, so all that's left to do is write the tests; the gate will run them automatically.
Block volume attach and detach are not working since we are making some assumptions that are not correct and were clarified in the CSI spec v1.0.
Our incorrect assumptions were:
In reality it is like this:
Per the CSI spec the name of a CSI driver must conform to the following requirements:
The name MUST follow domain name notation format (https://tools.ietf.org/html/rfc1035#section-2.3.1).
It SHOULD include the plugin's host company name and the plugin name, to minimize the possibility of collisions.
It MUST be 63 characters or less, beginning and ending with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), dots (.), and alphanumerics between.
This field is REQUIRED.
CSI 0.x required reverse domain notation. But CSI 1.0 dropped the "reverse" part of the requirement. The expectation is that the name will be normal (forward) domain notation.
So for CSI 1.0, instead of io.ember-csi.[x], the driver name should be [x].ember-csi.io.
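A small validator for the quoted rules, as a standalone sketch:

```python
import re

# 63 chars max, alphanumeric at both ends, dashes/dots/alphanumerics between.
NAME_RE = re.compile(r'^[a-zA-Z0-9]([a-zA-Z0-9.-]{0,61}[a-zA-Z0-9])?$')

def valid_csi_name(name):
    return len(name) <= 63 and bool(NAME_RE.match(name))

assert valid_csi_name('ember-csi.io')
assert not valid_csi_name('.ember-csi.io')  # must start alphanumeric
```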
Hi @kirankt ,
I tried to test ember-csi following the document: https://github.com/kirankt/ember-csi/tree/master/examples/openshift. Once I created the PVC, the running csi-controller crashed, and the csi-driver and external-attacher can't run, so the PVC stays Pending.
Here is the ceph.conf and keyring: http://pastebin.test.redhat.com/643939
[root@qe-shlao-ceph-openstack-master-etcd-1 openshift]# oc describe pvc
Name: ember-csi-pvc
Namespace: ember-csi
StorageClass: ember-csi-sc
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner=io.ember-csi
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 1m (x26 over 7m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "io.ember-csi" or manually created by system administrator
[root@qe-shlao-ceph-openstack-master-etcd-1 openshift]# oc get all
NAME READY STATUS RESTARTS AGE
pod/csi-controller-0 2/3 CrashLoopBackOff 2 1m
pod/csi-node-gx8np 2/2 Running 0 1h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/csi-node 3 1 1 1 1 <none> 1h
NAME DESIRED CURRENT AGE
statefulset.apps/csi-controller 1 1 1h
[root@qe-shlao-ceph-openstack-master-etcd-1 openshift]# oc describe pod/csi-controller-0
Name: csi-controller-0
Namespace: ember-csi
Priority: 0
PriorityClassName: <none>
Node: qe-shlao-ceph-openstack-node-1/172.16.122.87
Start Time: Mon, 10 Sep 2018 12:44:43 -0400
Labels: app=csi-controller
controller-revision-hash=csi-controller-6844c8bc
statefulset.kubernetes.io/pod-name=csi-controller-0
Annotations: openshift.io/scc=ember-csi-scc
Status: Running
IP: 172.16.122.87
Controlled By: StatefulSet/csi-controller
Containers:
external-provisioner:
Container ID: docker://e07dd8bfdabeee6e2020ee33c19cfd7736f4ba40379fc96aae56d6fec0d07b3f
Image: quay.io/k8scsi/csi-provisioner:v0.3.0
Image ID: docker-pullable://quay.io/k8scsi/csi-provisioner@sha256:d45e03c39c1308067fd46d69d8e01475cc0c9944c897f6eded4df07e75e5d3fb
Port: <none>
Host Port: <none>
Args:
--v=5
--provisioner=io.ember-csi
--csi-address=/csi-data/csi.sock
State: Running
Started: Mon, 10 Sep 2018 12:44:44 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/csi-data from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from ember-csi-controller-sa-token-hb2zz (ro)
external-attacher:
Container ID: docker://e265819115d3012ec959ef8e89796b415e35d1b1e2b55b3d75d93db0bf870910
Image: quay.io/k8scsi/csi-attacher:v0.3.0
Image ID: docker-pullable://quay.io/k8scsi/csi-attacher@sha256:44b7d518e00d437fed9bdd6e37d3a9dc5c88ca7fc096ed2ab3af9d3600e4c790
Port: <none>
Host Port: <none>
Args:
--v=5
--csi-address=/csi-data/csi.sock
State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 10 Sep 2018 12:46:57 -0400
Finished: Mon, 10 Sep 2018 12:47:57 -0400
Ready: False
Restart Count: 2
Environment: <none>
Mounts:
/csi-data from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from ember-csi-controller-sa-token-hb2zz (ro)
csi-driver:
Container ID: docker://962b98ee798f6b16d90bb78192dd1819d22115935e157a03791939beb9d6587d
Image: akrog/ember-csi:master
Image ID: docker-pullable://docker.io/akrog/ember-csi@sha256:331881bdb41d8004ca7960e730c495c6151dabbca2153d3f40266f62f9943c1d
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 10 Sep 2018 12:47:12 -0400
Finished: Mon, 10 Sep 2018 12:47:44 -0400
Ready: False
Restart Count: 3
Environment:
PYTHONUNBUFFERED: 0
CSI_ENDPOINT: unix:///csi-data/csi.sock
KUBE_NODE_NAME: (v1:spec.nodeName)
CSI_MODE: controller
X_CSI_PERSISTENCE_CONFIG: {"storage":"crd"}
X_CSI_BACKEND_CONFIG: {"volume_backend_name": "rbd", "volume_driver": "cinder.volume.drivers.rbd.RBDDriver", "rbd_user": "cinder", "rbd_pool": "cinder_volumes", "rbd_ceph_conf": "/etc/ceph/ceph.conf", "rbd_keyring_conf": "/etc/ceph/keyring"}
Mounts:
/csi-data from socket-dir (rw)
/dev from dev-dir (rw)
/etc/ceph from ceph-secrets (rw)
/etc/iscsi from iscsi-dir (rw)
/etc/localtime from localtime (rw)
/etc/lvm from lvm-dir (rw)
/etc/multipath from multipath-dir (rw)
/etc/multipath.conf from multipath-conf (rw)
/lib/modules from modules-dir (rw)
/run/udev from run-dir (rw)
/var/lock/lvm from lvm-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from ember-csi-controller-sa-token-hb2zz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
socket-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
iscsi-dir:
Type: HostPath (bare host directory volume)
Path: /etc/iscsi
HostPathType: Directory
run-dir:
Type: HostPath (bare host directory volume)
Path: /run/udev
HostPathType:
dev-dir:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType:
lvm-dir:
Type: HostPath (bare host directory volume)
Path: /etc/lvm
HostPathType: Directory
lvm-lock:
Type: HostPath (bare host directory volume)
Path: /var/lock/lvm
HostPathType:
multipath-dir:
Type: HostPath (bare host directory volume)
Path: /etc/multipath
HostPathType:
multipath-conf:
Type: HostPath (bare host directory volume)
Path: /etc/multipath.conf
HostPathType:
modules-dir:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
localtime:
Type: HostPath (bare host directory volume)
Path: /etc/localtime
HostPathType:
ceph-secrets:
Type: Secret (a volume populated by a Secret)
SecretName: ceph-secrets
Optional: false
ember-csi-controller-sa-token-hb2zz:
Type: Secret (a volume populated by a Secret)
SecretName: ember-csi-controller-sa-token-hb2zz
Optional: false
QoS Class: BestEffort
Node-Selectors: node-role.kubernetes.io/compute=true
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m default-scheduler Successfully assigned ember-csi/csi-controller-0 to qe-shlao-ceph-openstack-node-1
Normal Created 3m kubelet, qe-shlao-ceph-openstack-node-1 Created container
Normal Started 3m kubelet, qe-shlao-ceph-openstack-node-1 Started container
Normal Pulled 3m kubelet, qe-shlao-ceph-openstack-node-1 Container image "quay.io/k8scsi/csi-provisioner:v0.3.0" already present on machine
Normal Pulled 2m (x2 over 3m) kubelet, qe-shlao-ceph-openstack-node-1 Container image "quay.io/k8scsi/csi-attacher:v0.3.0" already present on machine
Normal Created 2m (x2 over 3m) kubelet, qe-shlao-ceph-openstack-node-1 Created container
Normal Started 2m (x2 over 3m) kubelet, qe-shlao-ceph-openstack-node-1 Started container
Normal Pulling 1m (x3 over 3m) kubelet, qe-shlao-ceph-openstack-node-1 pulling image "akrog/ember-csi:master"
Normal Pulled 1m (x3 over 3m) kubelet, qe-shlao-ceph-openstack-node-1 Successfully pulled image "akrog/ember-csi:master"
Normal Created 1m (x3 over 3m) kubelet, qe-shlao-ceph-openstack-node-1 Created container
Normal Started 1m (x3 over 3m) kubelet, qe-shlao-ceph-openstack-node-1 Started container
Warning BackOff 1m (x3 over 2m) kubelet, qe-shlao-ceph-openstack-node-1 Back-off restarting failed container
Warning BackOff 1m kubelet, qe-shlao-ceph-openstack-node-1 Back-off restarting failed container
[root@qe-shlao-ceph-openstack-master-etcd-1 openshift]# oc logs pod/csi-controller-0 -c external-attacher
I0910 16:46:57.265147 1 main.go:74] Version: v0.3.0-1-g76ebff7
I0910 16:46:57.266518 1 connection.go:88] Connecting to /csi-data/csi.sock
I0910 16:46:57.266743 1 connection.go:115] Still trying, connection is CONNECTING
I0910 16:46:57.266920 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:46:58.267125 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:46:59.309251 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:47:00.485537 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:47:01.551533 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:47:02.526749 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:47:03.496711 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:47:04.571547 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:47:05.397918 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:47:06.260679 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:47:07.099535 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
I0910 16:47:08.020025 1 connection.go:115] Still trying, connection is TRANSIENT_FAILURE
[root@qe-shlao-ceph-openstack-master-etcd-1 openshift]# oc logs pod/csi-controller-0 -c csi-driver
X_CSI_SYSTEM_FILES not specified.
Starting Ember CSI v0.0.2 in controller only mode (cinderlib: v0.2.2, cinder: v13.0.0.0rc2.dev74, CSI spec: v0.2.0)
There are no automated tests of any kind, so everything has to be tested manually.
This is cumbersome, considering the number of different drivers supported by this plugin.
While launching cinderlib-csi in "node" mode, the following error shows up:
[kthyagar@kt-c7kb7 cinderlib-csi{master}]$ python cinderlib_csi/cinderlib_csi.py
Starting cinderlib CSI v0.0.2 in node only mode (cinderlib: v0.2.1, cinder: v11.1.1, CSI spec: v0.2.0)
Traceback (most recent call last):
File "cinderlib_csi/cinderlib_csi.py", line 1005, in
main()
File "cinderlib_csi/cinderlib_csi.py", line 982, in main
node_id=node_id)
TypeError: __init__() got multiple values for keyword argument 'node_id'
[kthyagar@kt-c7kb7 cinderlib-csi{master}]$ env | grep CSI
X_CSI_BACKEND_CONFIG={"volume_backend_name": "rbd", "volume_driver": "cinder.volume.drivers.rbd.RBDDriver", "rbd_user": "cinder", "rbd_pool": "cinder_volumes", "rbd_ceph_conf": "/etc/ceph/ceph.conf", "rbd_keyring_conf": "/etc/ceph/ceph.client.cinder.keyring"}
CSI_MODE=node
In my local environment, I have gotten past this error, only to encounter many other issues deep within cinderlib itself.
Please let me know if you need any further info to help debug this.
We should make the best use of CSI volume parameters.
We can use them to specify the pool in a multi-pool driver, set extra specs, and even set QoS settings.
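A sketch of how the CreateVolume parameters could be split up; the `pool` key and the `extra-`/`qos-` prefixes are invented conventions for illustration:

```python
def split_parameters(parameters):
    """Split StorageClass-style parameters into pool, extra specs and QoS."""
    pool = parameters.get('pool')
    extra_specs = {k[len('extra-'):]: v for k, v in parameters.items()
                   if k.startswith('extra-')}
    qos = {k[len('qos-'):]: v for k, v in parameters.items()
           if k.startswith('qos-')}
    return pool, extra_specs, qos

# e.g. {'pool': 'ssd', 'qos-total_iops_sec': '1000', 'extra-thin': 'true'}
```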
The backend storage credentials are currently listed in the CSI driver manifest; we should use ConfigMaps to store them securely and separate that data from the manifest itself.
Ember-CSI could support encryption using LUKS. Most of the plumbing is already there in OS-Brick, so we just need to add some code to Ember-CSI to expose this through the CSI spec.
iSCSI attach/detach speed is not optimal due to a big critical section in the os-brick code.
This section can be reduced using a combination of exclusive and shared locking on the same file lock.
The work should be done in upstream os-brick and then pulled in here once it is released, but if it takes a while to get merged we could consider carrying the patch downstream in the Ember-CSI containers until it does.
We are reporting the Ember-CSI version in the GetPluginInfo response under the vendor_version field, which is not good enough to know when an individual component has changed.
We could have the same Ember-CSI version but an updated cinder, os-brick, or cinderlib version, and the plugin would not be reporting this change to the CO.
For upgrades we'll need finer-grained version reporting from the plugin, as we'll need multiple plugin versions running simultaneously while we update/upgrade. According to the spec this is acceptable as long as the vendor_version is different:
All instances of the same version (see vendor_version of GetPluginInfoResponse) of the Plugin SHALL return the same set of capabilities, regardless of both: (a) where instances are deployed on the cluster as well as; (b) which RPCs an instance is serving.
One way to do this would be to concatenate the Ember-CSI version with the OpenStack release and the date.
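For example, something along these lines (EMBER_VERSION and OPENSTACK_RELEASE are stand-ins for wherever the real values come from):

```python
import datetime

EMBER_VERSION = '0.0.2'      # stand-in
OPENSTACK_RELEASE = 'rocky'  # stand-in

def vendor_version():
    # Different Ember/OpenStack/build combinations produce different strings.
    return '%s-%s-%s' % (EMBER_VERSION, OPENSTACK_RELEASE,
                         datetime.date.today().strftime('%Y%m%d'))
```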
We currently have stage and unstage calls for convenience, but we could remove them for a small performance gain, as there would be fewer gRPC calls.
Now that cinderlib supports rbd-nbd to attach RBD volumes we should include it in our containers, as it'd be the preferred connection tool for RBD volumes to avoid feature support mismatch between the kernel module and the Ceph cluster.
Right now we only support a single controller, deployed as a StatefulSet. It would be good to support multiple controller plugins to increase the reliability, and hopefully the performance, of the plugin.
We were doing soft-deletes when deleting a volume that has snapshots, but when we later deleted the snapshots the soft-deleted volume was not being deleted.
The reason was that we were not correctly setting the status of the volume to 'deleted': we were creating a new status attribute on the cinderlib Volume object instead of changing the OVO's status attribute. Since the OVO's attributes are the only ones that get serialized, we ended up saving the same status we had before instead of the 'deleted' one we wanted.
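The gist of the fix, as a sketch (the `_ovo` attribute reflects cinderlib's internals as described above, not a public API):

```python
def mark_deleted(vol):
    # Wrong: vol.status = 'deleted' creates a new attribute on the wrapper
    # object, so the serialized OVO keeps its old status.
    vol._ovo.status = 'deleted'  # right: change the field that is persisted
    vol.save()                   # persist through the persistence plugin
```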
Google's gRPC Python library doesn't work with eventlet out of the box.
Ember-CSI has a workaround for it for non-stream connections.
It would be best to create a proper fix for the grpc Python library to add eventlet support and remove the workaround from our code, as this would ensure that new versions of the library don't break us.
The plugin has code meant to handle the three different modes (Controller, Node, Controller & Node) via the CSI_MODE environment variable.
Only the joined Controller & Node functionality has been tested, so it's very likely that the other modes are not working as expected.
Some of the configuration parameter names within a backend configuration are unnecessarily long, for example volume_backend_name, volume_driver, and use_multipath_for_image_xfer.
It would be convenient if Ember could abstract the real cinderlib names and simplify them for users.
Also, the volume driver names are way too long, since they need to specify the whole driver namespace; it would be great if we could have aliases. A sketch of the idea follows.
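Every short name and alias below is hypothetical:

```python
ALIASES = {
    'name': 'volume_backend_name',
    'driver': 'volume_driver',
    'multipath': 'use_multipath_for_image_xfer',
}
DRIVER_ALIASES = {
    'rbd': 'cinder.volume.drivers.rbd.RBDDriver',
}

def expand_config(config):
    """Translate user-friendly keys/values into the real cinderlib names."""
    expanded = {ALIASES.get(k, k): v for k, v in config.items()}
    if 'volume_driver' in expanded:
        drv = expanded['volume_driver']
        expanded['volume_driver'] = DRIVER_ALIASES.get(drv, drv)
    return expanded
```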
Listing volumes when running with CSI v0.3 will raise an exception:
2019-03-05 13:02:50 ERROR root [req-140029023167384] Exception calling application: 'All' object has no attribute 'TOPOLOGIES': AttributeError: 'All' object has no attribute 'TOPOLOGIES'
2019-03-05 13:02:50.626 27782 ERROR root Traceback (most recent call last):
2019-03-05 13:02:50.626 27782 ERROR root File "/home/geguileo/code/reuse-cinder-drivers/ember/.tox/py27/lib/python2.7/site-packages/grpc/_server.py", line 385, in _call_behavior
2019-03-05 13:02:50.626 27782 ERROR root return behavior(argument, context), True
2019-03-05 13:02:50.626 27782 ERROR root File "/home/geguileo/code/reuse-cinder-drivers/ember/ember_csi/common.py", line 124, in dolog
2019-03-05 13:02:50.626 27782 ERROR root result = f(self, request, context)
2019-03-05 13:02:50.626 27782 ERROR root File "/home/geguileo/code/reuse-cinder-drivers/ember/ember_csi/common.py", line 219, in checker
2019-03-05 13:02:50.626 27782 ERROR root return f(self, request, context)
2019-03-05 13:02:50.626 27782 ERROR root File "/home/geguileo/code/reuse-cinder-drivers/ember/ember_csi/common.py", line 76, in wrapper
2019-03-05 13:02:50.626 27782 ERROR root return func(self, request, context)
2019-03-05 13:02:50.626 27782 ERROR root File "/home/geguileo/code/reuse-cinder-drivers/ember/ember_csi/base.py", line 352, in CreateVolume
2019-03-05 13:02:50.626 27782 ERROR root volume = self._convert_volume_type(vol)
2019-03-05 13:02:50.626 27782 ERROR root File "/home/geguileo/code/reuse-cinder-drivers/ember/ember_csi/v0_3_0/csi.py", line 83, in _convert_volume_type
2019-03-05 13:02:50.626 27782 ERROR root if self.TOPOLOGIES:
2019-03-05 13:02:50.626 27782 ERROR root AttributeError: 'All' object has no attribute 'TOPOLOGIES'
The latest container image (ea711db5e3da) is broken. E.g.:
$ docker run -it embercsi/ember-csi:latest /usr/bin/ember-csi
Traceback (most recent call last):
File "/usr/bin/ember-csi", line 6, in
from pkg_resources import load_entry_point
File "/usr/lib/python2.7/site-packages/pkg_resources/init.py", line 3126, in
@_call_aside
File "/usr/lib/python2.7/site-packages/pkg_resources/init.py", line 3110, in _call_aside
f(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/pkg_resources/init.py", line 3139, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/usr/lib/python2.7/site-packages/pkg_resources/init.py", line 583, in _build_master
return cls._build_from_requirements(requires)
File "/usr/lib/python2.7/site-packages/pkg_resources/init.py", line 596, in _build_from_requirements
dists = ws.resolve(reqs, Environment())
File "/usr/lib/python2.7/site-packages/pkg_resources/init.py", line 789, in resolve
raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (urllib3 1.24.1 (/usr/lib/python2.7/site-packages), Requirement.parse('urllib3<1.24,>=1.21.1'), set(['requests']))
The number of gRPC workers used by the server is hardcoded to 10; we should be able to change it.
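A sketch of the change; the X_CSI_GRPC_WORKERS variable name is an assumption following the existing X_CSI_* convention:

```python
import os
from concurrent import futures

import grpc

workers = int(os.environ.get('X_CSI_GRPC_WORKERS', '10'))  # default as today
server = grpc.server(futures.ThreadPoolExecutor(max_workers=workers))
```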
[root@kt-c7kb7 kirankt]# oc create -f pvc.yml
persistentvolumeclaim/ember-csi-pvc created
[root@kt-c7kb7 kirankt]# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ember-csi-pvc Bound pvc-077174ec-ab36-11e8-a917-0017a4772406 1Gi RWO ember-csi-sc 6s
[root@kt-c7kb7 kirankt]# oc logs pod/ember-csi-node-z9qtr -c ember-csi-driver | less
X_CSI_SYSTEM_FILES not specified.
Starting Ember CSI v0.0.2 in node only mode (cinderlib: v0.2.2, cinder: v13.0.0.0rc2.dev46, CSI spec: v0.2.0)
Supported filesystems are: cramfs, minix, xfs, btrfs, ext2, ext3, ext4
Running as node
Debugging feature is ENABLED with ember_csi.rpdb and OFF. Toggle it with SIGUSR1.
Now serving on unix:///csi-data/csi.sock...
=> 2018-08-29 02:48:12.582640 GRPC [139733851126792]: GetPluginInfo without params
<= 2018-08-29 02:48:12.582699 GRPC in 0s [139733851126792]: GetPluginInfo returns
name: "io.ember-csi"
vendor_version: "0.0.2"
manifest {
key: "cinder-version"
value: "13.0.0.0rc2.dev46"
}
manifest {
key: "cinderlib-version"
value: "0.2.2"
}
manifest {
key: "mode"
value: "node"
}
manifest {
key: "persistence"
value: "CRDPersistence"
}
=> 2018-08-29 02:48:12.593167 GRPC [139733851127632]: NodeGetId without params
<= 2018-08-29 02:48:12.593220 GRPC in 0s [139733851127632]: NodeGetId returns
node_id: "10.19.139.83"
[root@kt-c7kb7 kirankt]# oc create -f app.yml
pod/my-csi-app created
[root@kt-c7kb7 kirankt]# oc logs pod/ember-csi-node-z9qtr -c ember-csi-driver-f
Error from server (BadRequest): container ember-csi-driver-f is not valid for pod ember-csi-node-z9qtr
[root@kt-c7kb7 kirankt]# oc logs pod/ember-csi-node-z9qtr -c ember-csi-driver -f
X_CSI_SYSTEM_FILES not specified.
Starting Ember CSI v0.0.2 in node only mode (cinderlib: v0.2.2, cinder: v13.0.0.0rc2.dev46, CSI spec: v0.2.0)
Supported filesystems are: cramfs, minix, xfs, btrfs, ext2, ext3, ext4
Running as node
Debugging feature is ENABLED with ember_csi.rpdb and OFF. Toggle it with SIGUSR1.
Now serving on unix:///csi-data/csi.sock...
=> 2018-08-29 02:48:12.582640 GRPC [139733851126792]: GetPluginInfo without params
<= 2018-08-29 02:48:12.582699 GRPC in 0s [139733851126792]: GetPluginInfo returns
name: "io.ember-csi"
vendor_version: "0.0.2"
manifest {
key: "cinder-version"
value: "13.0.0.0rc2.dev46"
}
manifest {
key: "cinderlib-version"
value: "0.2.2"
}
manifest {
key: "mode"
value: "node"
}
manifest {
key: "persistence"
value: "CRDPersistence"
}
=> 2018-08-29 02:48:12.593167 GRPC [139733851127632]: NodeGetId without params
<= 2018-08-29 02:48:12.593220 GRPC in 0s [139733851127632]: NodeGetId returns
node_id: "10.19.139.83"
=> 2018-08-29 02:49:04.294972 GRPC [139733851127632]: NodeGetCapabilities without params
<= 2018-08-29 02:49:04.295237 GRPC in 0s [139733851127632]: NodeGetCapabilities returns
capabilities {
rpc {
type: STAGE_UNSTAGE_VOLUME
}
}
=> 2018-08-29 02:49:04.299448 GRPC [139733814980688]: NodeStageVolume with params
volume_id: "93d59676-93bc-4b7a-95ee-93e4ccba0e67"
publish_info {
key: "connection_info"
value: "{\"connector\": {\"initiator\": \"iqn.1994-05.com.redhat:b2a7cfc30af\", \"ip\": \"10.19.139.83\", \"platform\": \"x86_64\", \"host\": \"kt-c7kb9.cloud.lab.eng.bos.redhat.com\", \"do_local_attach\": false, \"os_type\": \"linux2\", \"multipath\": false}, \"conn\": {\"driver_volume_type\": \"rbd\", \"data\": {\"secret_uuid\": null, \"volume_id\": \"93d59676-93bc-4b7a-95ee-93e4ccba0e67\", \"auth_username\": \"cinder\", \"secret_type\": \"ceph\", \"name\": \"cinder_volumes/volume-93d59676-93bc-4b7a-95ee-93e4ccba0e67\", \"discard\": true, \"keyring\": \"[client.cinder]\\n\\tkey = AQD2o5RalmhvMhAApYsRfGUfL1A1m0aXgQsaLw==\\n\", \"cluster_name\": \"ceph\", \"hosts\": [\"172.31.142.11\", \"172.31.142.12\", \"172.31.142.13\"], \"auth_enabled\": true, \"ports\": [\"6789\", \"6789\", \"6789\"]}}}"
}
staging_target_path: "/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/csi/pv/pvc-077174ec-ab36-11e8-a917-0017a4772406/globalmount"
volume_capability {
mount {
fs_type: "ext4"
}
access_mode {
mode: SINGLE_NODE_WRITER
}
}
volume_attributes {
key: "storage.kubernetes.io/csiProvisionerIdentity"
value: "1535510864943-8081-io.ember-csi"
}
!! 2018-08-29 02:49:04.831390 GRPC in 1s [139733814980688]: Unexpected exception on NodeStageVolume ()
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ember_csi/ember_csi.py", line 128, in dolog
result = f(self, request, context)
File "/usr/lib/python2.7/site-packages/ember_csi/ember_csi.py", line 169, in checker
return f(self, request, context)
File "/usr/lib/python2.7/site-packages/ember_csi/ember_csi.py", line 211, in wrapper
return func(self, request, context)
File "/usr/lib/python2.7/site-packages/ember_csi/ember_csi.py", line 858, in NodeStageVolume
conn.attach()
File "/usr/lib/python2.7/site-packages/cinderlib/objects.py", line 715, in attach
device = self.connector.connect_volume(self.conn_info['data'])
File "/usr/lib/python2.7/site-packages/nos_brick/__init__.py", line 32, in connect_volume
self._execute('which', 'rbd')
File "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52, in _execute
result = self.__execute(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py", line 143, in custom_execute
on_completion=on_completion, *cmd, **kwargs)
File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 391, in execute
env=env_variables)
File "/usr/lib/python2.7/site-packages/eventlet/green/subprocess.py", line 58, in __init__
subprocess_orig.Popen.__init__(self, args, 0, *argss, **kwds)
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
=> 2018-08-29 02:49:05.402198 GRPC [139733813464912]: NodeGetCapabilities without params
<= 2018-08-29 02:49:05.402368 GRPC in 0s [139733813464912]: NodeGetCapabilities returns
capabilities {
rpc {
type: STAGE_UNSTAGE_VOLUME
}
}
=> 2018-08-29 02:49:05.408270 GRPC [139733813538896]: NodeStageVolume with params
volume_id: "93d59676-93bc-4b7a-95ee-93e4ccba0e67"
publish_info {
key: "connection_info"
value: "{\"connector\": {\"initiator\": \"iqn.1994-05.com.redhat:b2a7cfc30af\", \"ip\": \"10.19.139.83\", \"platform\": \"x86_64\", \"host\": \"kt-c7kb9.cloud.lab.eng.bos.redhat.com\", \"do_local_attach\": false, \"os_type\": \"linux2\", \"multipath\": false}, \"conn\": {\"driver_volume_type\": \"rbd\", \"data\": {\"secret_uuid\": null, \"volume_id\": \"93d59676-93bc-4b7a-95ee-93e4ccba0e67\", \"auth_username\": \"cinder\", \"secret_type\": \"ceph\", \"name\": \"cinder_volumes/volume-93d59676-93bc-4b7a-95ee-93e4ccba0e67\", \"discard\": true, \"keyring\": \"[client.cinder]\\n\\tkey = AQD2o5RalmhvMhAApYsRfGUfL1A1m0aXgQsaLw==\\n\", \"cluster_name\": \"ceph\", \"hosts\": [\"172.31.142.11\", \"172.31.142.12\", \"172.31.142.13\"], \"auth_enabled\": true, \"ports\": [\"6789\", \"6789\", \"6789\"]}}}"
}
staging_target_path: "/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/csi/pv/pvc-077174ec-ab36-11e8-a917-0017a4772406/globalmount"
volume_capability {
mount {
fs_type: "ext4"
}
access_mode {
mode: SINGLE_NODE_WRITER
}
}
volume_attributes {
key: "storage.kubernetes.io/csiProvisionerIdentity"
value: "1535510864943-8081-io.ember-csi"
}
!! 2018-08-29 02:49:05.939252 GRPC in 1s [139733813538896]: Unexpected exception on NodeStageVolume ()
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ember_csi/ember_csi.py", line 128, in dolog
result = f(self, request, context)
File "/usr/lib/python2.7/site-packages/ember_csi/ember_csi.py", line 169, in checker
return f(self, request, context)
File "/usr/lib/python2.7/site-packages/ember_csi/ember_csi.py", line 211, in wrapper
return func(self, request, context)
File "/usr/lib/python2.7/site-packages/ember_csi/ember_csi.py", line 858, in NodeStageVolume
conn.attach()
File "/usr/lib/python2.7/site-packages/cinderlib/objects.py", line 715, in attach
device = self.connector.connect_volume(self.conn_info['data'])
File "/usr/lib/python2.7/site-packages/nos_brick/__init__.py", line 32, in connect_volume
self._execute('which', 'rbd')
File "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52, in _execute
result = self.__execute(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py", line 143, in custom_execute
on_completion=on_completion, *cmd, **kwargs)
File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 391, in execute
env=env_variables)
File "/usr/lib/python2.7/site-packages/eventlet/green/subprocess.py", line 58, in __init__
subprocess_orig.Popen.__init__(self, args, 0, *argss, **kwds)
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
=> 2018-08-29 02:49:07.009324 GRPC [139733814981168]: NodeGetCapabilities without params
<= 2018-08-29 02:49:07.009471 GRPC in 0s [139733814981168]: NodeGetCapabilities returns
capabilities {
rpc {
type: STAGE_UNSTAGE_VOLUME
}
}
=> 2018-08-29 02:49:07.012987 GRPC [139733814981288]: NodeStageVolume with params
The README file, being the only documentation of the project, is really bloated.
It should be simplified, and there should be proper readthedocs documentation.
We should use the Kubernetes end-to-end storage tests to verify that the plugin works, and make them run in our CI.
Hi,
I am trying to deploy the Ember CSI driver as an all-in-one deployment largely based on the examples/kubevirt/csi.yml template. The driver seems to start up fine but the PVCs never get mounted. Peeking into the driver logs, I do not see any stage/unstage activity. Here is what I see:
[root@kt-c7kb7 openshift]# oc logs pod/ember-csi-aio-pod -c ember-driver -f
X_CSI_SYSTEM_FILES not specified.
Starting Ember CSI v0.0.2 (cinderlib: v0.2.2, cinder: v13.0.0.0rc2.dev59, CSI spec: v0.2.0)
Supported filesystems are: cramfs, minix, xfs, btrfs, ext2, ext3, ext4
Running as all with backend RBDDriver v1.2.0
Debugging feature is ENABLED with ember_csi.rpdb and OFF. Toggle it with SIGUSR1.
Now serving on unix:///csi-data/csi.sock...
=> 2018-08-31 02:02:10.670840 GRPC [140641126232144]: GetPluginInfo without params
<= 2018-08-31 02:02:10.670919 GRPC in 0s [140641126232144]: GetPluginInfo returns
name: "io.ember-csi"
vendor_version: "0.0.2"
manifest {
key: "cinder-driver"
value: "RBDDriver"
}
manifest {
key: "cinder-driver-supported"
value: "True"
}
manifest {
key: "cinder-driver-version"
value: "1.2.0"
}
manifest {
key: "cinder-version"
value: "13.0.0.0rc2.dev59"
}
manifest {
key: "cinderlib-version"
value: "0.2.2"
}
manifest {
key: "mode"
value: "all"
}
manifest {
key: "persistence"
value: "CRDPersistence"
}
=> 2018-08-31 02:02:10.680554 GRPC [140641126244552]: NodeGetId without params
<= 2018-08-31 02:02:10.680600 GRPC in 0s [140641126244552]: NodeGetId returns
node_id: "kt-c7kb9.cloud.lab.eng.bos.redhat.com"
=> 2018-08-31 02:02:10.879080 GRPC [140641126244792]: GetPluginInfo without params
<= 2018-08-31 02:02:10.879124 GRPC in 0s [140641126244792]: GetPluginInfo returns
name: "io.ember-csi"
vendor_version: "0.0.2"
manifest {
key: "cinder-driver"
value: "RBDDriver"
}
manifest {
key: "cinder-driver-supported"
value: "True"
}
manifest {
key: "cinder-driver-version"
value: "1.2.0"
}
manifest {
key: "cinder-version"
value: "13.0.0.0rc2.dev59"
}
manifest {
key: "cinderlib-version"
value: "0.2.2"
}
manifest {
key: "mode"
value: "all"
}
manifest {
key: "persistence"
value: "CRDPersistence"
}
=> 2018-08-31 02:02:10.881839 GRPC [140641126244912]: Probe without params
<= 2018-08-31 02:02:10.881926 GRPC in 0s [140641126244912]: Probe returns nothing
=> 2018-08-31 02:02:10.884483 GRPC [140641126245032]: GetPluginCapabilities without params
<= 2018-08-31 02:02:10.884548 GRPC in 0s [140641126245032]: GetPluginCapabilities returns
capabilities {
service {
type: CONTROLLER_SERVICE
}
}
=> 2018-08-31 02:02:10.886930 GRPC [140641126245152]: ControllerGetCapabilities without params
<= 2018-08-31 02:02:10.887150 GRPC in 0s [140641126245152]: ControllerGetCapabilities returns
capabilities {
rpc {
type: CREATE_DELETE_VOLUME
}
}
capabilities {
rpc {
type: PUBLISH_UNPUBLISH_VOLUME
}
}
capabilities {
rpc {
type: LIST_VOLUMES
}
}
capabilities {
rpc {
type: GET_CAPACITY
}
}
=> 2018-08-31 02:02:16.133739 GRPC [140641126245272]: GetPluginCapabilities without params
<= 2018-08-31 02:02:16.133782 GRPC in 0s [140641126245272]: GetPluginCapabilities returns
capabilities {
service {
type: CONTROLLER_SERVICE
}
}
=> 2018-08-31 02:02:16.138038 GRPC [140641126246352]: ControllerGetCapabilities without params
<= 2018-08-31 02:02:16.138178 GRPC in 0s [140641126246352]: ControllerGetCapabilities returns
capabilities {
rpc {
type: CREATE_DELETE_VOLUME
}
}
capabilities {
rpc {
type: PUBLISH_UNPUBLISH_VOLUME
}
}
capabilities {
rpc {
type: LIST_VOLUMES
}
}
capabilities {
rpc {
type: GET_CAPACITY
}
}
=> 2018-08-31 02:02:16.142287 GRPC [140641126116360]: GetPluginInfo without params
<= 2018-08-31 02:02:16.142336 GRPC in 0s [140641126116360]: GetPluginInfo returns
name: "io.ember-csi"
vendor_version: "0.0.2"
manifest {
key: "cinder-driver"
value: "RBDDriver"
}
manifest {
key: "cinder-driver-supported"
value: "True"
}
manifest {
key: "cinder-driver-version"
value: "1.2.0"
}
manifest {
key: "cinder-version"
value: "13.0.0.0rc2.dev59"
}
manifest {
key: "cinderlib-version"
value: "0.2.2"
}
manifest {
key: "mode"
value: "all"
}
manifest {
key: "persistence"
value: "CRDPersistence"
}
=> 2018-08-31 02:02:16.147523 GRPC [140641126116960]: CreateVolume with params
name: "pvc-e2535f29acc111e8"
capacity_range {
required_bytes: 1073741824
}
volume_capabilities {
mount {
}
access_mode {
mode: SINGLE_NODE_WRITER
}
}
creating volume
<= 2018-08-31 02:02:16.255768 GRPC in 0s [140641126116960]: CreateVolume returns
volume {
capacity_bytes: 1073741824
id: "d228f642-aec3-4af0-a0a5-b392fd91a9e2"
}
=> 2018-08-31 02:02:30.073385 GRPC [140641126116120]: ControllerPublishVolume with params
volume_id: "d228f642-aec3-4af0-a0a5-b392fd91a9e2"
node_id: "kt-c7kb9.cloud.lab.eng.bos.redhat.com"
volume_capability {
mount {
fs_type: "ext4"
}
access_mode {
mode: SINGLE_NODE_WRITER
}
}
volume_attributes {
key: "storage.kubernetes.io/csiProvisionerIdentity"
value: "1535680902823-8081-io.ember-csi.aio"
}
<= 2018-08-31 02:02:30.914675 GRPC in 1s [140641126116120]: ControllerPublishVolume returns
publish_info {
key: "connection_info"
value: "{\"connector\": {\"initiator\": \"iqn.1994-05.com.redhat:b2a7cfc30af\", \"ip\": \"10.19.139.83\", \"platform\": \"x86_64\", \"host\": \"kt-c7kb9.cloud.lab.eng.bos.redhat.com\", \"do_local_attach\": false, \"os_type\": \"linux2\", \"multipath\": false}, \"conn\": {\"driver_volume_type\": \"rbd\", \"data\": {\"secret_uuid\": null, \"volume_id\": \"d228f642-aec3-4af0-a0a5-b392fd91a9e2\", \"auth_username\": \"cinder\", \"secret_type\": \"ceph\", \"name\": \"cinder_volumes/volume-d228f642-aec3-4af0-a0a5-b392fd91a9e2\", \"discard\": true, \"keyring\": \"[client.cinder]\\n\\tkey = AQD2o5RalmhvMhAApYsRfGUfL1A1m0aXgQsaLw==\\n\", \"cluster_name\": \"ceph\", \"hosts\": [\"172.31.142.11\", \"172.31.142.12\", \"172.31.142.13\"], \"auth_enabled\": true, \"ports\": [\"6789\", \"6789\", \"6789\"]}}}"
}
^C
[root@kt-c7kb7 openshift]# oc get all
NAME READY STATUS RESTARTS AGE
pod/ember-csi-aio-pod 4/4 Running 0 3m
pod/my-csi-app 0/1 ContainerCreating 0 2m
We can now publish a volume for RW and then publish it again for RO, which should not be allowed, as they are incompatible.
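A sketch of the missing check (the caller would look up the volume's existing publication; only the comparison is shown):

```python
import grpc

def check_publish_compatible(request, context, existing_readonly):
    """Abort if the volume is already published with a conflicting mode.

    existing_readonly is None when the volume isn't published yet, else
    the readonly flag of the current publication.
    """
    if existing_readonly is not None and existing_readonly != request.readonly:
        context.abort(grpc.StatusCode.ALREADY_EXISTS,
                      'Volume already published with an incompatible mode')
```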
In commit 5021ca5 we added more sensible directory defaults for Ember-CSI, as it'd be used in containers, but the way it was done doesn't support changing them just by changing the state_path parameter.
There is no document on how to contribute to the project, not even guidelines on the preferred commit message format.
Ember-CSI volume listing doesn't work because we are not using the right gRPC types when returning the data.
When running list-volumes using CSC we see:
$ kubectl exec -c csc csi-controller-0 csc controller list-volumes
Exception calling application: Parameter to MergeFrom() must be instance of same class: expected csi.v1.ListVolumesResponse.Entry got csi.v1.CreateVolumeResponse.
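Based on that error, the response has to be built with the right nested type; a sketch, where `types` stands for the generated CSI v1 protobuf module:

```python
def list_volumes_response(volumes, next_token=''):
    # Each entry must be a ListVolumesResponse.Entry wrapping a Volume,
    # not a CreateVolumeResponse.
    entries = [types.ListVolumesResponse.Entry(volume=v) for v in volumes]
    return types.ListVolumesResponse(entries=entries, next_token=next_token)
```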
The Kubernetes example in the repository deploys CSI nodes in such a way that a second attachment of the PV formats the volume again, losing the data.
This happens because the lsblk command doesn't return the filesystem of the block device on the second attach, so the system thinks it's empty and formats it. lsblk uses udev data to report the filesystem type, so we need to mount /dev/udev into the container for it to be able to report it.
We should support multi-attach for block volumes even if we cannot support it for mount volumes.
[ This is not a duplicate of #100 ]
There is not enough randomization in the request ID generation, so we may end up seeing the same request ID in different requests within a short period of time.
This happens even if we send requests serially.
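A sketch of a collision-resistant alternative, deriving the IDs from uuid4:

```python
import uuid

def new_request_id():
    # 122 bits of randomness; collisions in a short window are negligible.
    return 'req-%s' % uuid.uuid4()
```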
When we enable the probe by passing "enable_probe": true in X_CSI_EMBER_CONFIG, the gRPC Probe call failure ratio skyrockets.
This seems to be caused by a collision on the name of the key-value CRD instance used to check Kubernetes access, as all nodes for the same backend use the same name.
If the CSI plugin is deployed on a clean system it works fine, but if there is another app that creates CRDs (e.g. KubeVirt), deploying the CSI driver after the fact causes crashes:
The log in the CSI driver container is:
Starting Ember CSI v0.0.2 in node only mode (cinderlib: v0.2.2.dev0, cinder: v11.1.1, CSI spec: v0.2.0)
Traceback (most recent call last):
File "/usr/bin/ember-csi", line 11, in <module>
load_entry_point('ember-csi', 'console_scripts', 'ember-csi')()
File "/csi/ember_csi/ember_csi.py", line 1073, in main
node_id=node_id)
File "/csi/ember_csi/ember_csi.py", line 736, in __init__
**cinderlib_config)
File "/cinderlib/cinderlib/cinderlib.py", line 155, in global_setup
cls.set_persistence(persistence_config)
File "/cinderlib/cinderlib/cinderlib.py", line 133, in set_persistence
cls.persistence = persistence.setup(persistence_config)
File "/cinderlib/cinderlib/persistence/__init__.py", line 83, in setup
invoke_kwds=config,
File "/usr/lib/python2.7/site-packages/stevedore/driver.py", line 61, in __init__
warn_on_missing_entrypoint=warn_on_missing_entrypoint
File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 81, in __init__
verify_requirements)
File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 194, in _load_plugins
self._on_load_failure_callback(self, ep, err)
File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 186, in _load_plugins
verify_requirements,
File "/usr/lib/python2.7/site-packages/stevedore/named.py", line 158, in _load_one_plugin
verify_requirements,
File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 218, in _load_one_plugin
obj = plugin(*invoke_args, **invoke_kwds)
File "/csi/ember_csi/cl_crd.py", line 365, in __init__
CRD.ensure_crds_exist()
File "/csi/ember_csi/cl_crd.py", line 85, in ensure_crds_exist
crds = K8S.ext_api.list_custom_resource_definition().to_dict()['items']
File "/usr/lib/python2.7/site-packages/kubernetes/client/apis/apiextensions_v1beta1_api.py", line 496, in list_custom_resource_definition
(data) = self.list_custom_resource_definition_with_http_info(**kwargs)
File "/usr/lib/python2.7/site-packages/kubernetes/client/apis/apiextensions_v1beta1_api.py", line 593, in list_custom_resource_definition_with_http_info
collection_formats=collection_formats)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 321, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 163, in __call_api
return_data = self.deserialize(response_data, response_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 236, in deserialize
return self.__deserialize(data, response_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 276, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 620, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 254, in __deserialize
for sub_data in data]
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 276, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 620, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 276, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 620, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 276, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 620, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 276, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 620, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 259, in __deserialize
for k, v in iteritems(data)}
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 259, in <dictcomp>
for k, v in iteritems(data)}
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 276, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 620, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 259, in __deserialize
for k, v in iteritems(data)}
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 259, in <dictcomp>
for k, v in iteritems(data)}
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 276, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 620, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 259, in __deserialize
for k, v in iteritems(data)}
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 259, in <dictcomp>
for k, v in iteritems(data)}
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 276, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 620, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 259, in __deserialize
for k, v in iteritems(data)}
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 259, in <dictcomp>
for k, v in iteritems(data)}
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 276, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 620, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 276, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 622, in __deserialize_model
instance = klass(**kwargs)
File "/usr/lib/python2.7/site-packages/kubernetes/client/models/v1beta1_json_schema_props_or_array.py", line 52, in __init__
self.json_schemas = json_schemas
File "/usr/lib/python2.7/site-packages/kubernetes/client/models/v1beta1_json_schema_props_or_array.py", line 74, in json_schemas
raise ValueError("Invalid value for `json_schemas`, must not be `None`")
ValueError: Invalid value for `json_schemas`, must not be `None`
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/util.py", line 268, in _run_finalizers
finalizer()
File "/usr/lib64/python2.7/multiprocessing/util.py", line 201, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 482, in _terminate_pool
cls._help_stuff_finish(inqueue, task_handler, len(pool))
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 725, in _help_stuff_finish
inqueue.not_empty.acquire()
AttributeError: 'Queue' object has no attribute 'not_empty'
Steps to reproduce:
oc apply -f https://github.com/kubevirt/kubevirt/releases/download/v0.7.0/kubevirt.yaml
oc apply -f csi.yml
oc logs -f pod/csi-node-6rxqb -c csi-driver
We are returning an incorrect value for snapshot sizes: we return the volume size, which in general will be greater than the actual snapshot size.
Since the CSI spec allows us to return an undefined value, we should do so, as we can't tell the real size via cinderlib.
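A sketch of what that would look like; `types` stands for the generated CSI protobuf module, and the snapshot attributes mirror cinderlib's:

```python
def snapshot_to_csi(snap):
    # In the protobufs, leaving size_bytes at its default (0) reports the
    # size as undefined instead of lying with the volume size.
    return types.Snapshot(
        snapshot_id=snap.id,
        source_volume_id=snap.volume_id,
        ready_to_use=True)
```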
Ember-CSI is set up to use CSI spec v1 and the snapshotter sidecar has been deployed. We can create snapshots, but when we try to create a volume from a snapshot we get the following error in the logs:
2019-01-29 13:02:13 INFO ember_csi.common [req-140230550950928] => GRPC CreateVolume pvc-d8750d67-23c5-11e9-a1c2-5254000952e0
2019-01-29 13:02:13 ERROR ember_csi.common [req-140230550950928] !! GRPC CreateVolume failed in 0s with Unexpected exception (Protocol message Volume has no "volume_content_source" field.)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ember_csi/common.py", line 123, in dolog
result = f(self, request, context)
File "/usr/lib/python2.7/site-packages/ember_csi/common.py", line 218, in checker
return f(self, request, context)
File "/usr/lib/python2.7/site-packages/ember_csi/common.py", line 75, in wrapper
return func(self, request, context)
File "/usr/lib/python2.7/site-packages/ember_csi/base.py", line 351, in CreateVolume
return self._convert_volume_type(vol)
File "/usr/lib/python2.7/site-packages/ember_csi/v1_0_0/csi.py", line 114, in _convert_volume_type
volume = types.Volume(**parameters)
ValueError: Protocol message Volume has no "volume_content_source" field.
2019-01-29 13:02:13 ERROR root [req-140230550950928] Exception calling application: Protocol message Volume has no "volume_content_source" field.: ValueError: Protocol message Volume has no "volume_content_source" field.
2019-01-29 13:02:13.977 1 ERROR root Traceback (most recent call last):
2019-01-29 13:02:13.977 1 ERROR root File "/usr/lib64/python2.7/site-packages/grpc/_server.py", line 385, in _call_behavior
2019-01-29 13:02:13.977 1 ERROR root return behavior(argument, context), True
2019-01-29 13:02:13.977 1 ERROR root File "/usr/lib/python2.7/site-packages/ember_csi/common.py", line 159, in wrapper
2019-01-29 13:02:13.977 1 ERROR root return f(*args, **kwargs)
2019-01-29 13:02:13.977 1 ERROR root File "/usr/lib/python2.7/site-packages/ember_csi/common.py", line 123, in dolog
2019-01-29 13:02:13.977 1 ERROR root result = f(self, request, context)
2019-01-29 13:02:13.977 1 ERROR root File "/usr/lib/python2.7/site-packages/ember_csi/common.py", line 218, in checker
2019-01-29 13:02:13.977 1 ERROR root return f(self, request, context)
2019-01-29 13:02:13.977 1 ERROR root File "/usr/lib/python2.7/site-packages/ember_csi/common.py", line 75, in wrapper
2019-01-29 13:02:13.977 1 ERROR root return func(self, request, context)
2019-01-29 13:02:13.977 1 ERROR root File "/usr/lib/python2.7/site-packages/ember_csi/base.py", line 351, in CreateVolume
2019-01-29 13:02:13.977 1 ERROR root return self._convert_volume_type(vol)
2019-01-29 13:02:13.977 1 ERROR root File "/usr/lib/python2.7/site-packages/ember_csi/v1_0_0/csi.py", line 114, in _convert_volume_type
2019-01-29 13:02:13.977 1 ERROR root volume = types.Volume(**parameters)
2019-01-29 13:02:13.977 1 ERROR root ValueError: Protocol message Volume has no "volume_content_source" field.
2019-01-29 13:02:13.977 1 ERROR root
If we deploy Ember-CSI with CSI spec v1 and the snapshotter sidecar, the sidecar keeps calling the Probe method and isn't able to start successfully.
This is caused by a 1-second timeout on the call, as explained in snapshotter issue 89; we have proposed a PR to resolve it, but in the meantime we should be able to disable the probe via an environment variable.
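A sketch of the interim knob, reading the existing "enable_probe" key from X_CSI_EMBER_CONFIG (whether this exact key and default match the real code is an assumption):

```python
import json
import os

ember_config = json.loads(os.environ.get('X_CSI_EMBER_CONFIG', '{}'))
# When disabled, Probe can return success without touching the backend.
PROBE_ENABLED = bool(ember_config.get('enable_probe', False))
```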
The container only has cramfs and minix filesystem support.
We should add packages to support other filesystems, like xfs, ext3, ext4, btrfs, etc.
Otherwise the new possibility of provisioning filesystems is mostly useless.
Our CI only runs csi-sanity against LVM; it would be beneficial for the project if other storage solutions could be hooked into the CI so that patches are tested against different storage systems.
CSI sanity versions supporting snapshots expect the CSI plugin to support deleting a volume that has snapshots, but in our plugin this will depend on the Cinder implementation. This means that in some cases we'll fail the delete volume operation.
The CSI spec is not clear on this, as mentioned in container-storage-interface/spec#346, but we'll assume that csi-sanity is behaving in the intended manner until this is clearly defined on the CSI spec.
Since we changed to using oslo.log for logging we are no longer showing the CSI and Cinder versions.
Ember-CSI only supports the split controller/node architecture. It would be interesting if it could also support an all-in-one architecture.
In Kubernetes, with this architecture we could remove the controller publish and unpublish calls, which should make things slightly faster since there are fewer gRPC calls.