AWS EFS Operator for OpenShift Dedicated

This is an operator to manage read-write-many access to AWS EFS volumes in an OpenShift Dedicated cluster.

Overview

The operator watches for instances of a custom resource called SharedVolume. One SharedVolume enables mounting an EFS access point by creating a PersistentVolumeClaim you can use in a volume definition in a pod. Such mounts are ReadWriteMany -- i.e. assuming proper ownership and permissions, the contents are readable and writable by multiple containers, in different pods, on different worker nodes, in different namespaces or availability zones.

  • Pods in the same namespace can use the same SharedVolume's PersistentVolumeClaim to mount the same access point.

  • A SharedVolume specifying the same access point can be created in a different namespace, enabling pods in that namespace to mount the same access point (see the sketch below).

  • You can create SharedVolumes specifying different access points to create distinct data stores.
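For instance, the same access point can be exposed in two namespaces by creating a SharedVolume in each. A minimal sketch, where the namespaces team-a and team-b are illustrative placeholders:

apiVersion: aws-efs.managed.openshift.io/v1alpha1
kind: SharedVolume
metadata:
  name: shared-data
  namespace: team-a
spec:
  accessPointID: fsap-0123456789abcdef
  fileSystemID: fs-1234cdef
---
apiVersion: aws-efs.managed.openshift.io/v1alpha1
kind: SharedVolume
metadata:
  name: shared-data
  namespace: team-b
spec:
  accessPointID: fsap-0123456789abcdef
  fileSystemID: fs-1234cdef

Pods in team-a and team-b can then mount the resulting PersistentVolumeClaims and read and write the same data.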

Installing

This operator is available via OperatorHub. More detailed information can be found here.

Usage

AWS EFS and Access Points

(A detailed discussion of EFS is beyond the scope of this document.)

Create an EFS file system, configured appropriately with respect to VPC, availability zones, etc.

Create a separate access point for each distinct data store you wish to access from your cluster. Be sure to configure ownership and permissions that will allow read and/or write access by your pod's uid/gid as desired.

Access points need not be backed by separate EFS file systems.
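For illustration, an access point can be created with the AWS CLI. The file system ID, uid/gid, and path below are placeholders; the right settings depend on your application:

$ aws efs create-access-point \
    --file-system-id fs-1234cdef \
    --posix-user Uid=1000,Gid=1000 \
    --root-directory 'Path=/app-data,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=0755}'

The posix-user determines the uid/gid that file operations are performed as through the access point, and CreationInfo sets ownership and permissions on the root directory if it does not already exist.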

Working with SharedVolume resources

Create a SharedVolume.

This operator's custom resource, SharedVolume (which can be abbreviated sv), requires two pieces of information:

  • The ID of the EFS file system, which will look something like fs-1234cdef.
  • The ID of the Access Point, which will look something like fsap-0123456789abcdef.

Here is an example SharedVolume definition:

apiVersion: aws-efs.managed.openshift.io/v1alpha1
kind: SharedVolume
metadata:
  name: sv1
spec:
  accessPointID: fsap-0123456789abcdef
  fileSystemID: fs-1234cdef

If the above definition is in the file /tmp/sv1.yaml, create the resource with the command:

$ oc create -f /tmp/sv1.yaml
sharedvolume.aws-efs.managed.openshift.io/sv1 created

Note that a SharedVolume is namespace scoped. Create it in the same namespace in which you wish to run the pods that will use it.
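For example, assuming a namespace called my-project:

$ oc create -f /tmp/sv1.yaml -n my-project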

Monitor the SharedVolume.

Watch the SharedVolume using oc get:

$ oc get sv sv1
NAME   FILE SYSTEM   ACCESS POINT             PHASE    CLAIM     MESSAGE
sv1    fs-1234cdef   fsap-0123456789abcdef    Pending

When the operator has finished its work, the PHASE will become Ready and a name will appear in the CLAIM column:

$ oc get sv sv1
NAME   FILE SYSTEM   ACCESS POINT             PHASE   CLAIM     MESSAGE
sv1    fs-1234cdef   fsap-0123456789abcdef    Ready   pvc-sv1   

Check the PersistentVolumeClaim.

The CLAIM is the name of a PersistentVolumeClaim created by the operator in the same namespace as the SharedVolume. Validate that the PersistentVolumeClaim is ready for use by ensuring it is Bound:

$ oc get pvc pvc-sv1
NAME      STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-sv1   Bound    pv-proj2-sv1   1          RWX            efs-sc         23s

Create Pod(s).

Use the PersistentVolumeClaim in a pod's volume definition. For example:

apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  volumes:
    - name: efsap1
      persistentVolumeClaim:
        claimName: pvc-sv1
  containers:
    - name: test-efs-pod
      image: centos:latest
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      volumeMounts:
        - mountPath: /mnt/efs-data
          name: efsap1
If the above definition is in the file /tmp/pod1.yaml, create the pod:

$ oc create -f /tmp/pod1.yaml
pod/pod1 created

Validate access.

Within the pod's container, you should see the specified mountPath with the ownership and permissions you used when you created the access point in AWS. This should allow read and/or write access with normal POSIX semantics.

$ oc rsh pod1
sh-4.4$ cd /mnt/efs-data
sh-4.4$ ls -lFd .
drwxrwxr-x. 2 1000123456 root 6144 May 14 16:47 ./
sh-4.4$ echo "Hello world" > f1
sh-4.4$ cat f1
Hello world

Cleaning up

Once all pods using a SharedVolume have been destroyed, delete the SharedVolume:

$ oc delete sv sv1
sharedvolume.aws-efs.managed.openshift.io "sv1" deleted

The associated PersistentVolumeClaim is deleted automatically.

Note that the data in the EFS file system persists even if all associated SharedVolumes have been deleted. A new SharedVolume to the same access point will reveal that same data to attached pods.

Uninstalling

Uninstalling currently requires the following steps:

  1. Delete all workloads using PersistentVolumeClaims generated by the operator.
  2. Remove all instances of the SharedVolume CR from all namespaces. The operator will automatically remove the associated PVs and PVCs.
  3. Uninstall the operator via OCM:
    • Navigate to Operators => Installed Operators.
    • Find and click "AWS EFS Operator".
    • Click Actions => Uninstall Operator.
    • Click "Uninstall".
  4. Delete the SharedVolume CRD. This will trigger deletion of the remaining operator-owned resources. This must be done as cluster-admin:
      $ oc delete crd/sharedvolumes.aws-efs.managed.openshift.io

Troubleshooting

If you uninstall the operator while SharedVolume resources still exist, attempting to delete the CRD or SharedVolume CRs will hang on finalizers. In this state, attempting to delete workloads using PersistentVolumeClaims associated with the operator will also hang. If this happens, reinstall the operator, which will reconcile the current state appropriately and allow any pending deletions to complete. Then perform the uninstallation steps in order.
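To check for leftover SharedVolumes before retrying, you can list them across all namespaces and inspect a stuck resource's finalizers (sv1 here is a placeholder name):

$ oc get sharedvolumes --all-namespaces
$ oc get sv sv1 -o jsonpath='{.metadata.finalizers}'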

Limitations, Caveats, Known Issues

Size doesn't matter

You may notice that the PersistentVolumeClaim (and its associated PersistentVolume) created at the behest of your SharedVolume has a CAPACITY value. This is meaningless. The backing file system is elastic (hence the name) and grows as needed to a maximum of 47.9 TiB, unless it hits some other limit (e.g. a quota) first. However, the Kubernetes APIs for PersistentVolume and PersistentVolumeClaim require that a value be specified. The number we chose is arbitrary.

Don't edit SharedVolumes

You can't switch out an access point or file system identifier in flight. If you need to connect your pod to a different access point, create a new SharedVolume. If you no longer need the old one, delete it.

We feel strongly enough about this that the operator is designed to try to "un-edit" your SharedVolume if it detects a change.

Don't mess with generated PersistentVolumeClaims (or PersistentVolumes)

PersistentVolumeClaims are normally under the user's purview. However, deleting or modifying the PersistentVolumeClaim (or PersistentVolume) associated with a SharedVolume can leave it in an unusable state, even if the operator is able to resurrect the resources themselves.

The only supported way to delete a PersistentVolumeClaim (or PersistentVolume) associated with a SharedVolume is to delete the SharedVolume and let the operator do the rest.

Under the hood

The operator has two controllers. One monitors the resources necessary to run the AWS EFS CSI driver. These are set up once and should never change, except on operator upgrade.

The other controller is responsible for SharedVolume resources. It monitors all namespaces, allowing SharedVolumes to be created in any namespace. It creates a PersistentVolume/PersistentVolumeClaim pair for each SharedVolume. Though the PersistentVolume is technically cluster-scoped, it is inextricably bound to its PersistentVolumeClaim, which is namespace-scoped.
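As a rough sketch of what this pair looks like for the sv1 example above -- illustrative only, not the operator's exact output -- the generated PersistentVolume points at the access point via the EFS CSI driver's volume handle and is pre-bound to the generated claim:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-proj2-sv1
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Gi               # arbitrary; see "Size doesn't matter" above
  storageClassName: efs-sc
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-1234cdef::fsap-0123456789abcdef   # <file system>::<access point>
  claimRef:
    kind: PersistentVolumeClaim
    name: pvc-sv1
    namespace: proj2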

See the design for more.

aws-efs-operator's Issues

How to create a custom-named PVC?

Hi

Everything works in my setup; however, I have one question: how can I create a custom-named PVC when creating a SharedVolume?

apiVersion: aws-efs.managed.openshift.io/v1alpha1
kind: SharedVolume
metadata:
  name: okd-efs-1
  namespace: 3scale
spec:
  accessPointID: fsap-024004066af9dfb96
  fileSystemID: fs-269990f7
status:
  claimRef:
    apiGroup: ''
    kind: PersistentVolumeClaim
    name: pvc-okd-efs-1
  phase: Ready

  • Here is the PVC:
NAME                    STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-okd-efs-1           Bound     pv-3scale-okd-efs-1                        1Gi        RWX            efs-sc         14m
  • Here is the PV:
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS   REASON   AGE
pv-3scale-okd-efs-1                        1Gi        RWX            Retain           Bound    3scale/pvc-okd-efs-1           efs-sc                  15m

Suppose I wanted a PVC named Shared-Storage; how can I do that?

Statics don't get deleted

When the operator is installed, the statics controller creates the following resources:

  • Namespace scoped:
    • DaemonSet (running the CSI driver)
    • ServiceAccount
  • Cluster scoped:
    • SecurityContextConstraints
    • CSIDriver
    • StorageClass

Currently (master commit 7eb38e7) there is no cleanup of these resources when the operator is uninstalled. (There is a delete_statics.sh script, but it needs to be run by an admin.)

Jira: OSD-4083

Handling PVCs directly rather than using SharedVolume

Many applications create PVC objects directly, but the AWS EFS operator seems to create them only indirectly, via SharedVolume objects.

This design approach precludes using the AWS EFS operator with a significant set of applications. As an added complication, the PVC created in association with a SharedVolume has a name generated from the SharedVolume's name. Applications therefore not only cannot create a PVC directly against the efs-sc storage class, they also cannot make assumptions about the PVC's name, further limiting the range of use cases possible with the operator.

Would it be possible to get the operator to process PVC objects directly when they reference the efs-sc storage class, thus allowing a much wider range of scenarios where an application's configuration only has room for the name of the storage class and creates PVCs directly against that storage class?

SecurityContextConstraints don't get cleaned up

#21 almost fixed #15, but the OwnerReference on the SecurityContextConstraints efs-csi-scc doesn't seem to have the same effect as it does on all the other objects. The root cause is still being investigated, but this may be an upstream bug.

Operator projects using the removed APIs in k8s 1.22 require changes.

Problem Description

Kubernetes has been deprecating APIs that are removed and no longer available in 1.22. Operator projects using these API versions will not work on Kubernetes 1.22 or on any distribution based on it, such as OpenShift 4.9+. The APIs most likely to affect this project are:

  • apiextensions.k8s.io/v1beta1: (Used for CRDs and available since v1.16)
  • rbac.authorization.k8s.io/v1beta1: (Used for RBAC/rules and available since v1.8)
  • admissionregistration.k8s.io/v1beta1 (Used for Webhooks and available since v1.16)

Therefore, it looks like this project's distributions in the repository do not contain any version compatible with k8s 1.22/OCP 4.9 (more info). Some findings from checking the published distributions follow:

NOTE: The above findings concern only the manifests shipped inside the distribution, not the codebase.

How to solve

It would be very nice to see new distributions of this project that no longer use these APIs, so that they work on Kubernetes 1.22 and newer and can be published in the community-operators collection. OpenShift 4.9, for example, will no longer ship operators that still use v1beta1 extension APIs.

Due to the number of options available to build Operators, it is hard to provide direct guidance on updating your operator to support Kubernetes 1.22. Recent versions of the OperatorSDK (greater than 1.0.0) and Kubebuilder (greater than 3.0.0) scaffold projects with the latest versions of these APIs (for everything generated by the tools). See the guides to upgrade your projects with OperatorSDK Golang, Ansible, Helm, or the Kubebuilder one. For APIs other than the ones mentioned above, you will have to check your code for usage of removed API versions and upgrade to newer APIs. The details of this depend on your codebase.

If this project only needs to migrate the API for CRDs, and it was built with an OperatorSDK version lower than 1.0.0, then you may be able to solve it with an OperatorSDK version >= v0.18.x and < 1.0.0:

$ operator-sdk generate crds --crd-version=v1
INFO[0000] Running CRD generator.
INFO[0000] CRD generation complete.
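The regenerated manifest will use apiextensions.k8s.io/v1, where the validation schema moves under each entry in spec.versions and becomes required. A minimal sketch for this operator's CRD, abbreviated for illustration:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: sharedvolumes.aws-efs.managed.openshift.io
spec:
  group: aws-efs.managed.openshift.io
  names:
    kind: SharedVolume
    plural: sharedvolumes
    shortNames:
      - sv
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:                  # required in v1; was top-level "validation" in v1beta1
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                accessPointID:
                  type: string
                fileSystemID:
                  type: string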

Alternatively, you can try to upgrade your manifests with controller-gen (version >= v0.4.1):

If this project does not use Webhooks:

$ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role paths="./..."

If this project is using Webhooks:

  1. Add the markers sideEffects and admissionReviewVersions to your webhook (Example with sideEffects=None and admissionReviewVersions={v1,v1beta1}: memcached-operator/api/v1alpha1/memcached_webhook.go):

  2. Run the command:

$ controller-gen crd:trivialVersions=true,preserveUnknownFields=false rbac:roleName=manager-role webhook paths="./..."

For further information and tips see the comment.

aws-efs operator pod in CrashLoopBackOff state

Hello team,

I am using OSD 4.9.0 on my production cluster. After installing the aws-efs operator, I noticed that the efs-operator pod is in a CrashLoopBackOff state. Below are the pod logs from the openshift-operators namespace.

{"level":"info","ts":1636367532.7972922,"logger":"cmd","msg":"Operator Version: 0.0.1"}
{"level":"info","ts":1636367532.7973247,"logger":"cmd","msg":"Go Version: go1.16"}
{"level":"info","ts":1636367532.7973301,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1636367532.7973354,"logger":"cmd","msg":"Version of operator-sdk: v0.16.0"}
{"level":"info","ts":1636367532.7976806,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1636367535.6674385,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1636367535.6674695,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1636367538.522777,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1636367538.5229383,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1636367541.3824425,"logger":"cmd","msg":"Found. Checking whether update is needed.","resource":{"namespace":"openshift-operators","name":"efs-csi-sa"}}
{"level":"info","ts":1636367541.3824894,"logger":"cmd","msg":"No update needed."}
{"level":"info","ts":1636367541.3878217,"logger":"cmd","msg":"Found. Checking whether update is needed.","resource":{"name":"efs-csi-scc"}}
{"level":"info","ts":1636367541.3879697,"logger":"cmd","msg":"No update needed."}
{"level":"info","ts":1636367541.3955388,"logger":"cmd","msg":"Found. Checking whether update is needed.","resource":{"namespace":"openshift-operators","name":"efs-csi-node"}}
{"level":"info","ts":1636367541.395598,"logger":"cmd","msg":"Update needed. Updating..."}
{"level":"info","ts":1636367541.4047499,"logger":"cmd","msg":"Updated.","resource":{"namespace":"openshift-operators","name":"efs-csi-node"}}
{"level":"error","ts":1636367541.4048483,"logger":"cmd","msg":"Failed to retrieve.","resource":{"name":"efs.csi.aws.com"},"error":"no matches for kind \"CSIDriver\" in version \"storage.k8s.io/v1beta1\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nopenshift/aws-efs-operator/pkg/util.(*EnsurableImpl).Ensure\n\t/workdir/pkg/util/ensurable.go:87\nopenshift/aws-efs-operator/pkg/controller/statics.EnsureStatics\n\t/workdir/pkg/controller/statics/statics.go:201\nmain.main\n\t/workdir/cmd/manager/main.go:167\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
{"level":"info","ts":1636367541.4092035,"logger":"cmd","msg":"Found. Checking whether update is needed.","resource":{"name":"efs-sc"}}
{"level":"info","ts":1636367541.4092307,"logger":"cmd","msg":"No update needed."}
{"level":"error","ts":1636367541.4092348,"logger":"cmd","msg":"Couldn't bootstrap static resources","error":"Encountered 1 error(s) ensuring statics","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nmain.main\n\t/workdir/cmd/manager/main.go:168\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}

Can someone please help me with this?

Thanks,
Pushkar

EOL Freeze

Issue to block PRs and merges now that the operator is EOL.
