kubernetes-sigs / aws-fsx-csi-driver

CSI Driver of Amazon FSx for Lustre https://aws.amazon.com/fsx/lustre/

License: Apache License 2.0

Languages: Go 85.90%, Shell 9.08%, Makefile 1.87%, Python 1.87%, Dockerfile 0.66%, Mustache 0.63%
Topics: aws, fsx, csi, kubernetes, k8s-sig-aws

aws-fsx-csi-driver's Introduction


Amazon FSx for Lustre CSI Driver

Overview

The Amazon FSx for Lustre Container Storage Interface (CSI) Driver implements the CSI specification for container orchestrators (COs) to manage the lifecycle of Amazon FSx for Lustre file systems.

Troubleshooting

For help with troubleshooting, please refer to our troubleshooting doc.

Installation

For installation and deployment instructions, please refer to our installation doc.

CSI Specification Compatibility Matrix

AWS FSx for Lustre CSI Driver \ CSI Version v0.3.0 v1.x.x
v1.2.0 no yes
v1.1.0 no yes
v1.0.0 no yes
v0.10.1 no yes
v0.10.0 no yes
v0.9.0 no yes
v0.8.3 no yes
v0.8.2 no yes
v0.8.1 no yes
v0.8.0 no yes
v0.7.1 no yes
v0.7.0 no yes
v0.6.0 no yes
v0.5.0 no yes
v0.4.0 no yes
v0.3.0 no yes
v0.2.0 no yes
v0.1.0 yes no

Features

The following CSI interfaces are implemented:

  • Controller Service: CreateVolume, DeleteVolume, ControllerExpandVolume, ControllerGetCapabilities, ValidateVolumeCapabilities
  • Node Service: NodePublishVolume, NodeUnpublishVolume, NodeGetCapabilities, NodeGetInfo, NodeGetId
  • Identity Service: GetPluginInfo, GetPluginCapabilities, Probe

FSx for Lustre CSI Driver on Kubernetes

The following sections are Kubernetes-specific. If you are a Kubernetes user, use them for driver features, installation steps, and examples.

Kubernetes Version Compatibility Matrix

AWS FSx for Lustre CSI Driver \ Kubernetes Version v1.11 v1.12 v1.13 v1.14-16 v1.17+
v1.2.0 no no no no yes
v1.1.0 no no no no yes
v1.0.0 no no no no yes
v0.10.1 no no no no yes
v0.10.0 no no no no yes
v0.9.0 no no no no yes
v0.8.3 no no no no yes
v0.8.2 no no no no yes
v0.8.1 no no no no yes
v0.8.0 no no no no yes
v0.7.1 no no no no yes
v0.7.0 no no no no yes
v0.6.0 no no no no yes
v0.5.0 no no no no yes
v0.4.0 no no no yes yes
v0.3.0 no no no yes yes
v0.2.0 no no no yes yes
v0.1.0 yes yes yes no no

Container Images

FSx CSI Driver Version Image
v1.2.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v1.2.0
v1.1.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v1.1.0
v1.0.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v1.0.0
v0.10.1 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.10.1
v0.10.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.10.0
v0.9.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.9.0
v0.8.3 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.8.3
v0.8.2 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.8.2
v0.8.1 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.8.1
v0.8.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.8.0
v0.7.1 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.7.1
v0.7.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.7.0
v0.6.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.6.0
v0.5.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.5.0
v0.4.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.4.0
v0.3.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.3.0
v0.2.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.2.0
v0.1.0 public.ecr.aws/fsx-csi-driver/aws-fsx-csi-driver:v0.1.0

Features

  • Static provisioning - the FSx for Lustre file system needs to be created manually first; it can then be mounted inside a container as a volume using the driver.
  • Dynamic provisioning - uses a persistent volume claim (PVC) to let Kubernetes create the FSx for Lustre file system for you and consume the volume from inside a container (see the sketch after the notes below).
  • Mount options - mount options can be specified in the StorageClass to define how the volume should be mounted.

Notes:

  • For dynamically provisioned volumes, only one subnet is allowed inside a StorageClass's parameters.subnetId. This is a limitation enforced by FSx for Lustre.
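
For illustration, here is a minimal dynamic provisioning sketch. The subnet and security group IDs are placeholders, and the exact parameter names supported depend on the driver version, so treat this as an outline rather than an authoritative manifest and check the installation doc and examples for your release.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-0123456789abcdef0        # placeholder; only one subnet is allowed
  securityGroupIds: sg-0123456789abcdef0    # placeholder; must allow Lustre traffic on port 988
mountOptions:
  - flock
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: fsx-sc
  resources:
    requests:
      storage: 3600Gi   # FSx for Lustre capacities come in fixed increments

Applying the PVC triggers CreateVolume in the controller service, which calls the FSx CreateFileSystem API and binds the resulting file system to a PersistentVolume.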

Examples

Before running the examples, you need to:

  • Familiarize yourself with how to set up Kubernetes on AWS and, if you are using static provisioning, create an FSx for Lustre file system.
  • When creating the FSx for Lustre file system, make sure its VPC is accessible from the Kubernetes cluster's VPC and that network traffic is allowed by the security groups.
    • For the FSx for Lustre VPC, you can either create the file system inside the same VPC as the Kubernetes cluster or use VPC peering.
    • For the security groups, make sure port 988 is allowed on the security groups attached to the Lustre file system's ENI.
  • Install the FSx for Lustre CSI driver following the Installation steps. A rough static provisioning sketch follows this list.
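
As referenced above, here is a rough static provisioning sketch. The file system ID and DNS name are placeholders, and the set of volumeAttributes required varies between driver and file system versions, so use the manifests in the examples directory as the source of truth.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: fsx-pv
spec:
  capacity:
    storage: 1200Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - flock
  csi:
    driver: fsx.csi.aws.com
    volumeHandle: fs-0123456789abcdef0                           # placeholder file system ID
    volumeAttributes:
      dnsname: fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com  # placeholder DNS name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: fsx-pv
  resources:
    requests:
      storage: 1200Gi

The node service then mounts the file system with mount -t lustre <dnsname>@tcp:/fsx at the pod's target path, as shown in the NodePublishVolume logs later on this page.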

Example links

Development

Please go through the CSI spec and the general CSI driver development guidelines to get a basic understanding of CSI drivers before you start.

Requirements

  • Golang 1.21.0+

Dependency

Dependencies are managed through Go modules. To build the project, first enable Go modules with export GO111MODULE=on, then run: make

Testing

  • To execute all unit tests, run: make test
  • To execute sanity tests, run: make test-sanity
  • To execute e2e tests, run: make test-e2e

License

This library is licensed under the Apache 2.0 License.

aws-fsx-csi-driver's People

Contributors

arielevs, benoitbayol, berry2012, buzzsurfr, chenrui333, christopherhein, chyz198, d-nishi, dimitricole, gkao123, jacobwolfaws, jeffwan, jmwurst, jpeddicord, k8s-ci-robot, khoang98, leakingtapan, mberga14, nckturner, nikhita, nxf5025, olemarkus, patrickghadban, siddharthsalot, torredil, watsonso, wawa0210, wongma7, wuxingro, yiweng-amz


aws-fsx-csi-driver's Issues

Add security group constraints in dynamic provision documentation

Creating a PV in dynamic provisioning mode results in a network failure complaining that

 The file system cannot be created because the default security group in the subnet provided or the provided security groups do not permit Lustre LNET network traffic on port 988

This should be documented in the README.md example.

I0308 00:25:41.624635       1 controller.go:125] ControllerGetCapabilities: called with args &csi.ControllerGetCapabilitiesRequest{XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
I0308 00:25:41.628548       1 controller.go:39] CreateVolume: called with args &csi.CreateVolumeRequest{Name:"pvc-ffd932c1-4137-11e9-9659-0206ced0c378", CapacityRange:(*csi.CapacityRange)(0xc00042a4b0), VolumeCapabilities:[]*csi.VolumeCapability{(*csi.VolumeCapability)(0xc0001da500)}, Parameters:map[string]string{"s3ImportPath":"s3://dl-benchmark-result", "s3OutputPath":"s3://dl-benchmark-result/export", "securityGroupIds":"sg-xxxxxxxx", "subnetId":"subnet-xxxxxxxxxx"}, ControllerCreateSecrets:map[string]string(nil), VolumeContentSource:(*csi.VolumeContentSource)(nil), AccessibilityRequirements:(*csi.TopologyRequirement)(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
E0308 00:25:42.233480       1 driver.go:88] GRPC error: rpc error: code = Internal desc = Could not create volume "pvc-ffd932c1-4137-11e9-9659-0206ced0c378": CreateFileSystem failed: InvalidNetworkSettings: The file system cannot be created because the default security group in the subnet provided or the provided security groups do not permit Lustre LNET network traffic on port 988
	status code: 400, request id: xxxxx-1a24-43d3-a751-xxxxx

Grants controller necessary permissions through policy file

As a customer, I'd like to use a policy file to grant permissions to node groups rather than using aws_access_key_id and aws_secret_access_key. It's hard to provide fine-grained control because the documentation doesn't state the minimum permissions the CSI driver needs, so I have to grant admin access to avoid permission errors and failures.

Solution:
I would suggest explicitly listing the required permissions in a policy file that can then be attached to the node group.

/feature

Permission usage in blog post (v0.1.0)

Hi all,
as mentioned on the blog post I'm using Github issues for my "support" question.

In the above blog post you run EKS 1.12, so it likely used v0.1.0 of this project. Yet the blog post doesn't seem to use a secret.yaml, which is the only permission-granting method that the README of v0.1.0 mentions. The second option, giving policies to node roles, is only described on current master, which isn't compatible with Kubernetes below 1.14. I want to use AWS FSx with AWS EKS, which doesn't support 1.14 yet.

Can you confirm that I indeed need a secret.yaml when reproducing the blog post with v0.1.0?

With just the instructions from the blog post I'm getting this on kubelet:

.....eu-central-1.compute.internal kubelet[1040]: E0807 13:41:58.313421    1040 kubelet.go:1612] Unable to mount volumes for pod "fsx-linux2_default(da2492b0-b912-11e9-8413-02ed4ee22636)": timeout expired waiting for volumes to attach or mount for pod "default"/"fsx-linux2". list of unmounted volumes=[persistent-storage]. list of unattached volumes=[persistent-storage default-token-vl9v8]; skipping pod
.....eu-central-1.compute.internal kubelet[1040]: E0807 13:41:58.313453    1040 pod_workers.go:186] Error syncing pod da2492b0-b912-11e9-8413-02ed4ee22636 ("fsx-linux2_default(da2492b0-b912-11e9-8413-02ed4ee22636)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"fsx-linux2". list of unmounted volumes=[persistent-storage]. list of unattached volumes=[persistent-storage default-token-vl9v8]
.eu-central-1.compute.internal kubelet[1040]: I0807 13:41:58.613433    1040 kuberuntime_manager.go:513] Container {Name:fsx-plugin Image:amazon/aws-fsx-csi-driver:v0.1.0 Command:[] Args:[--endpoint=$(CSI_ENDPOINT) --logtostderr --v=5] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:CSI_ENDPOINT Value:unix:///var/lib/csi/sockets/pluginproxy/csi.sock ValueFrom:nil} {Name:AWS_ACCESS_KEY_ID Value: ....ValueFrom:&EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjec
.eu-central-1.compute.internal kubelet[1040]: I0807 13:41:58.613501    1040 kuberuntime_manager.go:513] Container {Name:csi-attacher Image:quay.io/k8scsi/csi-attacher:v0.4.2 Command:[] Args:[--csi-address=$(ADDRESS) --v=5] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:ADDRESS Value:/var/lib/csi/sockets/pluginproxy/csi.sock ValueFrom:nil}] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:socket-dir ReadOnly:false MountPath:/var/lib/....csi/sockets/pluginproxy/ SubPath: MountPropagation:<nil>} {Name:csi-c
.....eu-central-1.compute.internal kubelet[1040]: E0807 13:41:58.617821    1040 kuberuntime_manager.go:733] container start failed: CreateContainerConfigError: secrets "aws-secret" not found
.....eu-central-1.compute.internal kubelet[1040]: I0807 13:41:58.617842    1040 kuberuntime_manager.go:757] checking backoff for container "csi-attacher" in pod "fsx-csi-controller-0_kube-system(e008ee80-b8f0-11e9-8413-02ed4ee22636)"
.....eu-central-1.compute.internal kubelet[1040]: I0807 13:41:58.617909    1040 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=csi-attacher pod=fsx-csi-controller-0_kube-system(e008ee80-b8f0-11e9-8413-02ed4ee22636)
.....eu-central-1.compute.internal kubelet[1040]: E0807 13:41:58.617938    1040 pod_workers.go:186] Error syncing pod e008ee80-b8f0-11e9-8413-02ed4ee22636 ("fsx-csi-controller-0_kube-system(e008ee80-b8f0-11e9-8413-02ed4ee22636)"), skipping: [failed to "StartContainer" for "fsx-plugin" with CreateContainerConfigError: "secrets \"aws-secret\" not found"
.....eu-central-1.compute.internal kubelet[1040]: , failed to "StartContainer" for "csi-attacher" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=csi-attacher pod=fsx-csi-controller-0_kube-system(e008ee80-b8f0-11e9-8413-02ed4ee22636)"

At kubectl I just get a "timeout attaching volume".

What permissions exactly are meant by ".. should have enough permission to create FSx for Lustre filesystem."? Also, can I supply a session token in the secret manifest? Using which key?

Thanks a lot.

/triage support

P.S.: I can mount the FSx volume on the hosting node, so Lustre vs. node VPC/SG should be OK.

Add more e2e tests

Is your feature request related to a problem? Please describe.

  • static provisioning
  • dynamic provisioning with s3

Additional context
Add any other context or screenshots about the feature request here.

How to install v0.1.0?

Now that master branch no longer works on 1.13, is there a one-line way to install v0.1.0 of the driver? Similar to:

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-fsx-csi-driver/master/deploy/kubernetes/manifest.yaml

Integrate with Prow Job

Is your feature request related to a problem? Please describe.
Integrate with Prow Job for following targets:

  • unit test
  • verify
  • e2e

/cc @wongma7

Incorrect project name?

/kind bug

Project name is Amazon FSx for Lustre, not AWS FSx for Lustre. This needs to be fixed.

Claim reference not updated in PV

/kind bug

What happened?
I created a PVC referencing a PV; when I deleted the PVC, the claim reference was not updated in the PV, nor when I created another PVC referencing the same PV (the new PVC stays Pending forever).

What you expected to happen?
The claim reference should be updated following the PVC modifications.

How to reproduce it (as minimally and precisely as possible)?
Create a PV and a PVC referencing this PV, then delete the PVC. The PV claim still references the deleted PVC.

Anything else we need to know?:
The workaround is to delete the PV every time the PVC needs to change.

Environment

  • Kubernetes version (use kubectl version): 1.14.8
  • Driver version: 0.0.2

Cut v0.3.0 tag

Changelog

/triage support

Add helm chart

Is your feature request related to a problem? Please describe.
Add helm chart

Update ServiceAccount, ClusterRole and ClusterRoleBinding name

Is your feature request related to a problem? Please describe.
In order to run multiple CSI drivers within the cluster, they should use different names for ServiceAccount, ClusterRole and ClusterRoleBinding for the driver controller service and node service.

Describe the solution you'd like in detail
See EBS CSI driver's manifest as a reference.
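
As an illustration of the requested change, a sketch of driver-specific RBAC names follows; the object names here (fsx-csi-controller-sa, fsx-csi-external-provisioner-role, and the binding) are hypothetical and only meant to show names that would not collide with other CSI drivers in the same cluster.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fsx-csi-controller-sa                 # hypothetical driver-specific name
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fsx-csi-external-provisioner-binding  # hypothetical
subjects:
  - kind: ServiceAccount
    name: fsx-csi-controller-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: fsx-csi-external-provisioner-role     # hypothetical
  apiGroup: rbac.authorization.k8s.io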

fsx-csi-controller-0 stuck in CrashLoopBackOff after upgrade to CSI 1.0 on master and downgrade to v0.1.0 with kiam

/kind bug

What happened?
I was on the master tag and my cluster upgraded to this weekend's version, which includes the changes for CSI 1.0 (for k8s 1.14).

I downgraded to v0.1.0 (I have k8s 1.13) by adding the right annotations for kiam in node.yaml and controller.yaml (not using secret.yaml since kiam is in place).

I cannot get the controller working again. The fsx-plugin log says:
could not get metadata from AWS: EC2 instance metadata is not available

What you expected to happen?
Running state on everything

How to reproduce it (as minimally and precisely as possible)?
See above.

Anything else we need to know?:
Not sure if it is driver related or kiam related :/

Environment

  • Kubernetes version (use kubectl version): 1.13
  • Driver version: v0.1.0

Dynamic provisioning doesn't support multiple subnets

What happened?

  Warning    ProvisioningFailed    7s    fsx.csi.aws.com_fsx-csi-controller-0_0d924577-4c05-11e9-8258-9298fcb694b4  failed to provision volume with StorageClass "fsx-sc": rpc error: code = Internal desc = Could not create volume "pvc-1ad68536-4c05-11e9-8a3f-0a169557a7b4": CreateFileSystem failed: BadRequest: 1 validation error detected: Value '[subnet-02dd4dbb726318783,subnet-05a291af34a4215e4,subnet-0e4121bf71352369c,subnet-0f4c76eabb0bef2f6,subnet-00fbb4354a2f78801,subnet-0dd49959579368dc2]' at 'subnetIds' failed to satisfy constraint: Member must satisfy constraint: [Member must have length less than or equal to 24, Member must have length greater than or equal to 15, Member must satisfy regular expression pattern: ^(subnet-[0-9a-f]{8,})$]
             status code: 400, request id: 1460fc50-593f-4f32-af2b-baef539109a1

What you expected to happen?
Can not attach multiple subnets

How to reproduce it (as minimally and precisely as possible)?

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: subnet-02dd4dbb726318783,subnet-05a291af34a4215e4,subnet-0e4121bf71352369c,subnet-0f4c76eabb0bef2f6,subnet-00fbb4354a2f78801,subnet-0dd49959579368dc2
  securityGroupIds: sg-069230325a57c415e

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version): 1.11
  • Driver version: 18.06

Debug skipped test cases

Is your feature request related to a problem? Please describe.
We have several e2e conformance tests that are skipped:

skip="\[Disruptive\]|should.provision.storage.with.mount.options|should.not.mount./.map.unused.volumes.in.a.pod|should unmount if pod is force deleted while kubelet is down"

We need to debug why they fail and how to fix them.

Transfer ownership to kubernetes-sigs as sig-aws subproject

The intent of this pull request is to complete the steps necessary to migrate aws/aws-fsx-csi-driver repository to k8s-sig-aws as a subproject github.com/kubernetes-sigs/aws-fsx-csi-driver.

The steps below use rules for new and donated repositories:

Complete check-list:

  • Must contain the topic for the sponsoring SIG - e.g. k8s-sig-api-machinery. (Added through the Manage topics link on the repo page.)
  • Must adopt the Kubernetes Code of Conduct
  • All code projects use the Apache License version 2.0. Documentation repositories must use the Creative Commons License version 4.0.
  • Must adopt the CNCF CLA bot, merge bot and Kubernetes PR commands/bots.
  • All OWNERS of the project must also be active SIG members
  • SIG membership must vote using lazy consensus to create a new repository
  • SIG must already have identified all of their existing subprojects and code, with valid OWNERS files, in sigs.yaml
  • All contributors must have signed the CNCF Individual CLA (https://github.com/cncf/cla/blob/master/individual-cla.pdf) or CNCF Corporate CLA (https://github.com/cncf/cla/blob/master/corporate-cla.pdf)
  • Licenses of dependencies are acceptable; project owners can ping @caniszczyk for review of third party deps

Implement create FSx for Lustre with s3 integration

Sample storageclass spec:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
    name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
    lustreImportPath: s3://my-ml-model/optional-import-prefix/
    lustreExportPath: s3://my-ml-model/optional-export-prefix/
    lustreImportedFileChunkSize: 1024
    subnetId: subnet-056da83524edbe641
    securityGroupIds: sg-086f61ea73388fb6b,sg-0145e55e976000c9e

New Parameter Definitions:

  • lustreImportPath (optional): the path to s3 bucket (including optional prefix) where the data will be imported
  • lustreExportPath (optional): the path to s3 bucket (including optional prefix) where the data will be exported. The path must contain the same s3 bucket as specified in lustreImportPath
  • lustreImportedFileChunkSize (optional): for files imported from a data repository, this value determines the stripe count and the maximum amount of data per file (in MiB) stored on a single physical disk

Ref:

Pods on different nodes fail to mount a static FSx for Lustre instance

/kind bug

What happened?
After creating a static FSx for Lustre file system, I am able to mount it into a pod from one node. However, if I schedule the pod onto a different node, which is also running an fsx-csi-node pod, it times out waiting for the mount.

What you expected to happen?
The mount to work consistently across nodes.

How to reproduce it (as minimally and precisely as possible)?
In an EKS cluster with an on demand c5.4xlarge node and a p2.xlarge spot node, use a nodeselector to schedule to the on demand node, observe working state, then reschedule to the spot instance.

Anything else we need to know?:
The instances use the same IAM role but are in different subnets.

The csi-provisioner reports

I1214 01:00:34.267838       1 controller.go:902] Provisioning succeeded, removing PVC 22038550-1e0d-11ea-8f97-06c5af6bfebe from claims in progress
I1214 01:00:34.273701       1 controller.go:979] Final error received, removing PVC 22038550-1e0d-11ea-8f97-06c5af6bfebe from claims in progress

Environment

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.8-eks-b8860f", GitCommit:"b8860f6c40640897e52c143f1b9f011a503d6e46", GitTreeState:"clean", BuildDate:"2019-11-25T00:55:38Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
  • Driver version:
    0.2.0

FSx for Lustre allows two new sizes for filesystems

Is your feature request related to a problem?/Why is this needed
Per Amazon FSx for Lustre Reduces Minimum File System Size to 1.2 TBs, FSx for Lustre works for sizes of 1,200 GiB, 2,400 GiB, or increments of 3,600 GiB.

/feature

Describe the solution you'd like in detail
Configure this driver to also allow filesystem sizes of 1200 GiB and 2400 GiB (in addition to all other increments of 3600 GiB).
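
For example, with this change a dynamically provisioned claim could request one of the smaller capacities directly. This is a hypothetical sketch (the StorageClass name fsx-sc is illustrative), assuming the driver maps the request onto an allowed FSx size:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-small-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: fsx-sc
  resources:
    requests:
      storage: 1200Gi   # allowed FSx sizes: 1200 GiB, 2400 GiB, or multiples of 3600 GiB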

Describe alternatives you've considered
None. Parity to match AWS feature/service.

Additional context
None

Surface Lustre mount errors

Is your feature request related to a problem? Please describe.
/bug

kubectl logs -f fsx-csi-node-6v8g5 -c fsx-plugin -n kube-system
…
I0311 22:17:04.776307       1 node.go:113] NodeGetCapabilities: called with args 
I0311 22:17:04.776878       1 node.go:113] NodeGetCapabilities: called with args 
I0311 22:17:04.781762       1 node.go:43] NodePublishVolume: called with args volume_id:"fs-0cb16fe804b83c4b7" target_path:"/var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount" volume_capability:<mount:<> access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_attributes:<key:"dnsname" value:"fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com" > 
I0311 22:17:04.781832       1 node.go:77] NodePublishVolume: creating dir /var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount
I0311 22:17:04.781856       1 node.go:82] NodePublishVolume: mounting fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com@tcp:/fsx at /var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount
I0311 22:17:04.781871       1 mount_linux.go:146] Mounting cmd (mount) with arguments ([-t lustre fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com@tcp:/fsx /var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount])
E0311 22:18:06.734519       1 mount_linux.go:151] Mount failed: exit status 5
Mounting command: mount
Mounting arguments: -t lustre fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com@tcp:/fsx /var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount
Output: mount.lustre: mount fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com@tcp:/fsx at /var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount failed: Input/output error
Is the MGS running?
 
E0311 22:18:06.734603       1 driver.go:88] GRPC error: rpc error: code = Internal desc = Could not mount "fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com@tcp:/fsx" at "/var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount": mount failed: exit status 5
Mounting command: mount
Mounting arguments: -t lustre fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com@tcp:/fsx /var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount
Output: mount.lustre: mount fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com@tcp:/fsx at /var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount failed: Input/output error
Is the MGS running?
 
I0311 22:19:21.789527       1 node.go:113] NodeGetCapabilities: called with args 
I0311 22:19:21.790112       1 node.go:113] NodeGetCapabilities: called with args 
I0311 22:19:21.795139       1 node.go:43] NodePublishVolume: called with args volume_id:"fs-0cb16fe804b83c4b7" target_path:"/var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount" volume_capability:<mount:<> access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_attributes:<key:"dnsname" value:"fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com" > 
I0311 22:19:21.795210       1 node.go:77] NodePublishVolume: creating dir /var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount
I0311 22:19:21.795235       1 node.go:82] NodePublishVolume: mounting fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com@tcp:/fsx at /var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount
I0311 22:19:21.795247       1 mount_linux.go:146] Mounting cmd (mount) with arguments ([-t lustre fs-0cb16fe804b83c4b7.fsx.us-east-1.amazonaws.com@tcp:/fsx /var/lib/kubelet/pods/517e11f2-443f-11e9-ab62-0ea6535d36ce/volumes/kubernetes.io~csi/tensorpack-fsx/mount])
kubectl get pod my-fsx -n kubeflow -o wide
NAME      READY     STATUS              RESTARTS   AGE       IP        NODE                              NOMINATED NODE
my-fsx    0/1       ContainerCreating   0          1h        <none>    ip-192-168-206-203.ec2.internal   <none>

flock mountoption

We are running a bioinformatics pipeline with Nextflow, which requires file locking when creating the local database used to manage subprocesses. It needs the flock or localflock mount option enabled. Unfortunately, I got an error message saying the mount option is not supported.

Availability zone topology awareness

Is your feature request related to a problem? Please describe.
FSx file systems reside in a single Availability Zone. They can be accessed from other AZs in the region, but that incurs data transfer charges.

From the FSx for Lustre creation page: "For best performance with your file system, access it from compute instances within the Availability Zone that it resides in. Accessing your file system from other Availability Zones also incurs data transfer charges. For more information, see Data Transfer on the Amazon EC2 Pricing page."

Describe the solution you'd like in detail
By default, CreateVolume should create PVs with a node affinity matching the zone the file system is created in (derived from the subnet in the StorageClass). A hand-written sketch of the resulting PV follows.
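
As a rough illustration (not the driver's current behavior), a provisioned PV with zone-aware node affinity might look like the following; the PV name, file system ID, DNS name, and zone are placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: fsx-pv-dynamic
spec:
  capacity:
    storage: 1200Gi
  accessModes:
    - ReadWriteMany
  csi:
    driver: fsx.csi.aws.com
    volumeHandle: fs-0123456789abcdef0    # placeholder file system ID
    volumeAttributes:
      dnsname: fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com   # placeholder
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a              # zone of the subnet in the StorageClass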

Describe alternatives you've considered
Dynamic provisioning would still work without this change but this would be a good improvement.

Additional context

No data available when opening a file in lustre

/kind bug

What happened?
We created the file system manually and are using static provisioning to create the PVC. It gets mounted successfully and I can list the files in FSx.
-rwxr-xr-x 1 root root 2.6K Apr 11 14:43 romain_manual_fix.csv

However, when I cat a file to see its content I get "No data available", and when I open it in an editor it is empty.

What you expected to happen?

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-07-10T10:13:58Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.8-eks-7c34c0", GitCommit:"7c34c0d2f2d0f11f397d55a46945193a0e22d8f3", GitTreeState:"clean", BuildDate:"2019-03-01T22:49:39Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
  • Driver version:

Add node selector for linux node only

Is your feature request related to a problem? Please describe.
Since the driver won't work on Windows, we need to add a node selector to the driver manifest so that it is only deployed to Linux nodes. A trimmed sketch follows.
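
A heavily trimmed node DaemonSet skeleton showing the idea; everything except the nodeSelector (container args, volumes, RBAC) is omitted, and the image tag is simply the one referenced elsewhere on this page:

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: fsx-csi-node
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fsx-csi-node
  template:
    metadata:
      labels:
        app: fsx-csi-node
    spec:
      nodeSelector:
        kubernetes.io/os: linux   # use beta.kubernetes.io/os on clusters older than v1.14
      containers:
        - name: fsx-plugin
          image: amazon/aws-fsx-csi-driver:v0.1.0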

Add e2e tests

Is your feature request related to a problem? Please describe.
We should add e2e tests for supported features:

  • static provisioning
  • dynamic provisioning
  • dynamic provisioning with s3
  • flock mount option

Upgrade to golang 1.12

Is your feature request related to a problem? Please describe.
Upgrade to golang 1.12

Add version command

./aws-fsx-csi-driver --version should return the version with the following information in JSON format:

  • git commit sha
  • driver version
  • go compiler version
  • build date
  • Platform OS and Arch

Removes driver manifest's dependency on secret

Is your feature request related to a problem? Please describe.
Currently, the AWS secret is a hard dependency of the driver manifest file even when an instance profile role is used for IAM authentication. We should update the secret handling in the manifest file to make it optional instead of a hard dependency.

Migration to use shared testing orchestration framework

Is your feature request related to a problem? Please describe.
Migration to use shared test orchestration framework. We will develop a shared test orchestration framework that contains the code for creating/tearing down test clusters (EKS and Kops), and for setting up drivers for testing. We should migrate to use the new framework once it is ready.

Additional context
This is independent of the kubernetes testing framework/utility that provides an easy way to retrieve / create / delete / assert API objects for testing.

Related to: aws/aws-k8s-tester#48

/cc @wongma7

Implement s3 export in CSI

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like in detail
The current S3 export flow requires users to run a container in privileged mode with the CAP_SYS_ADMIN capability. It is insecure to expose this permission to the end user just for exports.

Describe alternatives you've considered
This functionality should be handled by the CSI driver if s3Export is specified in the StorageClass.

Additional context
No

MountOptions not honored

/kind bug

What happened?
Following the static provisioning example with a StorageClass containing:

mountOptions:
- flock

Results in the fsx-csi-node logging:

NodePublishVolume: mounting fs-02a9921e59e81b716.fsx.us-east-1.amazonaws.com@tcp:/fsx at /var/lib/kubelet/pods/fe090689-7b05-46fd-8095-3389ff3d80fa/volumes/kubernetes.io~csi/fsx-pv/mount with options []

And the volume is mounted without flock:

172.19.184.185@tcp:/fsx on /var/lib/kubelet/pods/fe090689-7b05-46fd-8095-3389ff3d80fa/volumes/kubernetes.io~csi/fsx-pv/mount type lustre (rw,lazystatfs)

What you expected to happen?

The mountOptions attribute should be respected.

How to reproduce it (as minimally and precisely as possible)?

Deploy the example manifests in https://github.com/kubernetes-sigs/aws-fsx-csi-driver/tree/master/examples/kubernetes/static_provisioning

Environment

  • Kubernetes version (use kubectl version): 1.15.6
  • Driver version: tried both v0.2.0 and latest
