skbn's Introduction


Skbn

Skbn is a tool for copying files and directories between Kubernetes and cloud storage providers. It is named after the 1981 video game Sokoban. Skbn uses an in-memory buffer for the copy process, to avoid excessive memory consumption. Skbn currently supports the following providers:

  • AWS S3
  • Minio S3
  • Azure Blob Storage
  • Google Cloud Storage

Install

Prerequisites

  1. git
  2. dep

From a release

Download the latest release from the Releases page, or use the Docker image.

From source

mkdir -p $GOPATH/src/github.com/maorfr && cd $_
git clone https://github.com/maorfr/skbn.git && cd skbn
make

Usage

Copy files from Kubernetes to S3

skbn cp \
    --src k8s://<namespace>/<podName>/<containerName>/<path> \
    --dst s3://<bucket>/<path>

Copy files from S3 to Kubernetes

skbn cp \
    --src s3://<bucket>/<path> \
    --dst k8s://<namespace>/<podName>/<containerName>/<path>

Copy files from Kubernetes to Azure Blob Storage

skbn cp \
    --src k8s://<namespace>/<podName>/<containerName>/<path> \
    --dst abs://<account>/<container>/<path>

Copy files from Azure Blob Storage to Kubernetes

skbn cp \
    --src abs://<account>/<container>/<path> \
    --dst k8s://<namespace>/<podName>/<containerName>/<path>

Copy files from Kubernetes to Google Cloud Storage

skbn cp \
    --src k8s://<namespace>/<podName>/<containerName>/<path> \
    --dst gcs://<bucket>/<path>

Advanced usage

Copy files from source to destination in parallel

skbn cp \
    --src ... \
    --dst ... \
    --parallel <n>
  • n is the number of files to be copied in parallel (for full parallelism use 0)

Set in-memory buffer size

Skbn copies files using an in-memory buffer. To control the buffer size:

skbn cp \
    --src ... \
    --dst ... \
    --buffer-size <f>
  • f is the in-memory buffer size (in MB) to use for file copies. Use this flag with caution in conjunction with --parallel, since copying many files in parallel multiplies the amount of buffered data held in memory
  • The default buffer-size is 6.75 MB, which was chosen based on benchmarking

Minio S3 support

Skbn supports copying files to and from a Minio S3 endpoint. To let skbn know how your Minio is configured, set the following environment variables:

AWS_ACCESS_KEY_ID=<your username>
AWS_SECRET_ACCESS_KEY=<your password>
AWS_S3_ENDPOINT=http(s)://<host>:<port>
AWS_S3_NO_SSL=true # disables SSL
AWS_S3_FORCE_PATH_STYLE=true # enforce path style bucket access
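
For illustration, the sketch below shows how such settings typically map onto an aws-sdk-go S3 client pointed at a custom endpoint. It describes the general pattern only, not skbn's actual implementation.

// Minimal sketch (not skbn's actual code): how AWS_S3_ENDPOINT, AWS_S3_NO_SSL
// and AWS_S3_FORCE_PATH_STYLE typically translate into an aws-sdk-go S3 client
// pointed at a Minio endpoint.
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess, err := session.NewSession(&aws.Config{
		Region:           aws.String(os.Getenv("AWS_REGION")),
		Endpoint:         aws.String(os.Getenv("AWS_S3_ENDPOINT")), // e.g. http://minio:9000
		DisableSSL:       aws.Bool(os.Getenv("AWS_S3_NO_SSL") == "true"),
		S3ForcePathStyle: aws.Bool(os.Getenv("AWS_S3_FORCE_PATH_STYLE") == "true"),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Credentials come from the default AWS chain, so AWS_ACCESS_KEY_ID and
	// AWS_SECRET_ACCESS_KEY carry the Minio username and password.
	svc := s3.New(sess)
	out, err := svc.ListBuckets(&s3.ListBucketsInput{})
	if err != nil {
		log.Fatal(err)
	}
	for _, b := range out.Buckets {
		fmt.Println(aws.StringValue(b.Name))
	}
}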

Added bonus section

Copy files from S3 to Azure Blob Storage

skbn cp \
    --src s3://<bucket>/<path> \
    --dst abs://<account>/<container>/<path>

Copy files from Azure Blob Storage to S3

skbn cp \
    --src abs://<account>/<container>/<path> \
    --dst s3://<bucket>/<path>

Copy files from Kubernetes to Kubernetes

skbn cp \
    --src k8s://<namespace>/<podName>/<containerName>/<path> \
    --dst k8s://<namespace>/<podName>/<containerName>/<path>

Copy files from S3 to S3

skbn cp \
    --src s3://<bucket>/<path> \
    --dst s3://<bucket>/<path>

Copy files from Azure Blob Storage to Azure Blob Storage

skbn cp \
    --src abs://<account>/<container>/<path> \
    --dst abs://<account>/<container>/<path>

Credentials

Kubernetes

Skbn tries to get credentials in the following order:

  1. if the KUBECONFIG environment variable is set, skbn uses the current context from that config file
  2. otherwise, if ~/.kube/config exists, skbn uses the current context from that file with an out-of-cluster client configuration
  3. if ~/.kube/config does not exist, skbn assumes it is running inside a pod and uses an in-cluster client configuration (see the sketch below)
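
The lookup order above matches the standard client-go fallback pattern. The sketch below illustrates that pattern only; it is not skbn's exact code.

// Minimal sketch (not skbn's exact code) of the usual client-go fallback:
// explicit KUBECONFIG, then ~/.kube/config, then in-cluster configuration.
package main

import (
	"log"
	"os"
	"path/filepath"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

func buildConfig() (*rest.Config, error) {
	// 1. An explicit KUBECONFIG wins.
	if path := os.Getenv("KUBECONFIG"); path != "" {
		return clientcmd.BuildConfigFromFlags("", path)
	}
	// 2. Fall back to ~/.kube/config if it exists (out-of-cluster).
	home, _ := os.UserHomeDir()
	kubeconfig := filepath.Join(home, ".kube", "config")
	if _, err := os.Stat(kubeconfig); err == nil {
		return clientcmd.BuildConfigFromFlags("", kubeconfig)
	}
	// 3. Otherwise assume we are running inside a pod (in-cluster).
	return rest.InClusterConfig()
}

func main() {
	config, err := buildConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	_ = clientset // use the clientset with the resolved configuration
}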

AWS

Skbn uses the default AWS credentials chain. In addition, the AWS_REGION environment variable should be set (default is eu-central-1).

Azure Blob Storage

Skbn uses AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY environment variables for authentication.

Google Cloud Storage

Skbn uses Google Application Default Credentials. It first looks for the GOOGLE_APPLICATION_CREDENTIALS environment variable; if that is not defined, it looks for the default service account, and returns an error if none is configured.
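
For illustration only (this is not skbn's code), the standard Google Cloud Storage client resolves Application Default Credentials on its own in roughly this order:

// Minimal sketch: cloud.google.com/go/storage resolves Application Default
// Credentials itself, checking GOOGLE_APPLICATION_CREDENTIALS first and then
// falling back to the default service account.
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx) // fails here if no credentials are configured
	if err != nil {
		log.Fatalf("no usable Google credentials: %v", err)
	}
	defer client.Close()

	// "my-bucket" is a hypothetical bucket name used only for illustration.
	bucket := client.Bucket("my-bucket")
	_ = bucket
}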

Examples

  1. In-cluster example
  2. Code example

skbn's People

Contributors

alexbarta, ivanovoleg, jaloliebherr, maorfr, simon3

skbn's Issues

Support for local filesystem

Is it possible to support a file: prefix to be able to copy from/to the local filesystem from within the container itself? It would be useful to be able to download/upload things from remote stores into the local/shared container volume mount as an init container, etc.

Support Compression Option (gzip)

This is a great tool for backups but ideally we could have an option to compress the files on the way to cloud storage.

This seems possible by piping the stream through the compress/gzip package.

As for reading files, it appears the readers may already support decompression based on the content disposition.
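
A minimal sketch of the io.Pipe plus compress/gzip approach described above (illustrative only, not wired into skbn's copy path):

// Compressing a stream on the fly with io.Pipe and compress/gzip.
package main

import (
	"compress/gzip"
	"io"
	"log"
	"os"
	"strings"
)

// gzipStream returns a reader that yields the gzip-compressed contents of src.
func gzipStream(src io.Reader) io.Reader {
	pr, pw := io.Pipe()
	go func() {
		gw := gzip.NewWriter(pw)
		_, err := io.Copy(gw, src)
		if cerr := gw.Close(); err == nil {
			err = cerr
		}
		pw.CloseWithError(err) // propagate any error to the reading side
	}()
	return pr
}

func main() {
	// Stand-in for the stream that would normally come from a pod.
	src := strings.NewReader("example payload")
	// The compressed reader would be handed to the cloud storage uploader
	// instead of the raw file stream.
	if _, err := io.Copy(os.Stdout, gzipStream(src)); err != nil {
		log.Fatal(err)
	}
}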

[Question] Any examples to restore from thinBackup zipped file into Jenkins in GKE?

https://github.com/helm/charts/tree/master/stable/jenkins

Restore from backup
To restore a backup, you can use the kube-tasks underlying tool called skbn, which copies files from cloud storage to Kubernetes. The best way to do it would be using a Job to copy files from the desired backup tag to the Jenkins pod. See the skbn in-cluster example for more details.


I have uploaded into Google Storage:

gsutil cp ~/Desktop/jenkins-full.tgz gs://migrate_sample

so the:

Authenticated URL: https://storage.cloud.google.com/migrate_sample/jenkins-full.tgz?authuser=1
gsutil URI: gs://migrate_sample.tgz

How do I use the following jobs.yaml? It seems to be configured for AWS rather than GCP.

apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: skbn
  name: skbn
spec:
  template:
    metadata:
      labels:
        app: skbn
      annotations:
        iam.amazonaws.com/role: skbn # We are using kube2iam <- Swap skbn -> kube2iam and create ServiceAccount? https://github.com/jtblin/kube2iam/blob/master/examples/eks-example.yml
    spec:
      restartPolicy: OnFailure
      serviceAccountName: skbn
      containers:
      - name: skbn
        image: maorfr/skbn
        command: ["skbn"]
        args:
        - cp
        - --src
        - k8s://namespace/pod/container/path/to/copy/from
        - --dst
        - s3://bucket/path/to/copy/to
        imagePullPolicy: IfNotPresent
        env:
        - name: AWS_REGION
          value: eu-central-1

Also, in this case I would need to swap src and dst from the example to:

    - --src
    - gs://migrate_sample/jenkins-full.tgz
    - --dst
    - k8s://namespace/pod/container/path/to/copy/to

How do we get the k8s://namespace/pod/container/path/to/copy/to value?

Assuming the namespace is jenkins, would it be:

k8s://jenkins/my-jenkins-0/var/backups

Files failing to copy does not exit the process

I am not sure whether this is due to the parallelism setting, but if a file fails to copy (in our case from Kubernetes to AWS), the parent process is not terminated. This leaves copy jobs spinning and making no progress for days at a time. The desired behaviour is that, should a file fail to copy, the entire process exits with a non-zero status code so the copy can be retried if desired.
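
One common way to get the requested behaviour (an illustrative sketch, not skbn's current implementation) is to run the per-file copies under an errgroup, so the first failure cancels the remaining transfers and the process exits with a non-zero status:

// Per-file copies under errgroup: any single failure aborts the whole run.
package main

import (
	"context"
	"fmt"
	"log"

	"golang.org/x/sync/errgroup"
)

// copyFile is a hypothetical stand-in for a single skbn file transfer.
func copyFile(ctx context.Context, path string) error {
	select {
	case <-ctx.Done():
		return ctx.Err() // another file already failed; stop early
	default:
		fmt.Println("copied", path)
		return nil
	}
}

func main() {
	files := []string{"a.txt", "b.txt", "c.txt"} // hypothetical file list

	g, ctx := errgroup.WithContext(context.Background())
	for _, f := range files {
		f := f // capture loop variable
		g.Go(func() error { return copyFile(ctx, f) })
	}
	if err := g.Wait(); err != nil {
		log.Fatalf("copy failed: %v", err) // exits with a non-zero status code
	}
}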

Help with permissions.

I am trying to restore files from S3 to a Jenkins Helm installation.

I have skbn installed locally and get the following error. I have tried running as an admin and the Jenkins service account we have in AWS.

error in Stream: Unauthorized dst: file: default/[pod-id]/jenkins-master/var/jenkins_home/jobs/admin/config.xml

I also setup the in-cluster from the examples and get the same error.

The path in Jenkins is owned by root

If I do not add the AWS secret and just use ~/.kube/config I get the following error:

2019/09/19 10:17:15 AccessDenied: Access Denied
	status code: 403, request id: 926596AA106D4016, host id: QgFAcBC8IHi2KgMjnYvz2uL7Ozx0[...]

What could I be missing in any of the solutions I have tried?

go mod tidy build fail

CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags "-X main.GitTag=0.6.0 -X main.GitCommit=326c955" -o bin/skbn cmd/skbn.go

github.com/maorfr/skbn/pkg/skbn

../../../test/pkg/mod/github.com/maorfr/[email protected]/pkg/skbn/abs.go:87:83: not enough arguments in call to bu.Download
have (context.Context, number, number, azblob.BlobAccessConditions, bool)
want (context.Context, int64, int64, azblob.BlobAccessConditions, bool, azblob.ClientProvidedKeyOptions)
../../../test/pkg/mod/github.com/maorfr/[email protected]/pkg/skbn/kube.go:229:19: clientset.Core undefined (type *kubernetes.Clientset has no field or method Core)
../../../test/pkg/mod/github.com/maorfr/[email protected]/pkg/skbn/skbn.go:135:14: undefined: nio
make: *** [Makefile:21: build] Error 1

Parallelize transfers

Please parallelize the file transfers, ~4 transfers concurrently.

This will greatly speed up the overall transfer process.

AWS SSO Support

I don't think this currently supports SSO, right?
It looks like skbn looks for AWS credentials (including an access key and secret), but when using SSO you would not have an access key or secret. All CLI commands work fine, but the tool fails, probably because it cannot find the access key and secret in the config/credentials files.

Possibility to exclude files

Hi,

Would it be possible to exclude certain paths/folders from being copied?

Is this feature desirable for your project? If I were to implement such a feature and send a PR, do you have any tips on where it would best be implemented in your code base?

Thanks

[Question] go get github.com/maorfr/skbn/pkg/skbn fails?

go get github.com/maorfr/skbn/pkg/skbn

github.com/maorfr/skbn/pkg/skbn

../../../../../golib/pkg/mod/github.com/maorfr/[email protected]/pkg/skbn/abs.go:87:24: not enough arguments in call to bu.BlobURL.Download
        have (context.Context, number, number, azblob.BlobAccessConditions, bool)
        want (context.Context, int64, int64, azblob.BlobAccessConditions, bool, azblob.ClientProvidedKeyOptions)
../../../../../golib/pkg/mod/github.com/maorfr/[email protected]/pkg/skbn/kube.go:229:18: clientset.Core undefined (type *kubernetes.Clientset has no field or method Core)

asdf current golang
golang 1.17.5

macOS Monterey 12.0.1

Fails when filesize is too large

I am using the Cassandra backup tool Cain (https://github.com/maorfr/cain) to back up a Cassandra datacenter, and it is failing when backing up files which are 500 GB in size.

Having looked at the code for this project, I've noticed that it uses the default UploadPartSize of 5 megabytes (https://github.com/aws/aws-sdk-go/blob/fe72a52350a8962175bb71c531ec9724ce48abd8/service/s3/s3manager/upload.go#L26), which gives the ability to upload files up to about 52 GB in size.

Please can you consider increasing this value to 64 megabytes, so that skbn is able to upload files up to roughly 671 GB in size?
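
For reference, the part size is configurable on the aws-sdk-go uploader. The sketch below (not skbn's code) shows the general idea; with S3's 10,000-part limit, 64 MiB parts allow objects of roughly 671 GB:

// Raising the uploader's part size lifts the maximum object size, since S3
// multipart uploads are capped at 10,000 parts (64 MiB x 10,000 is about 671 GB).
package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())

	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 64 * 1024 * 1024 // 64 MiB instead of the 5 MiB default
	})

	f, err := os.Open("large-backup.tar") // hypothetical large file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	_, err = uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-bucket"), // hypothetical bucket
		Key:    aws.String("backups/large-backup.tar"),
		Body:   f,
	})
	if err != nil {
		log.Fatal(err)
	}
}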

Cannot install skbn: make: *** [Makefile:46: bootstrap] Error 1

I am using Ubuntu 18.

(base) hahamark@hahamark-ThinkPad-X1-Carbon-7th:/src/github.com/maorfr$ git clone https://github.com/maorfr/skbn.git && cd skbn
fatal: could not create work tree dir 'skbn': Permission denied
(base) hahamark@hahamark-ThinkPad-X1-Carbon-7th:/src/github.com/maorfr$ sudo git clone https://github.com/maorfr/skbn.git && cd skbn
Cloning into 'skbn'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 641 (delta 1), reused 5 (delta 1), pack-reused 634
Receiving objects: 100% (641/641), 109.93 KiB | 453.00 KiB/s, done.
Resolving deltas: 100% (305/305), done.
(base) hahamark@hahamark-ThinkPad-X1-Carbon-7th:/src/github.com/maorfr/skbn$ make
dep ensure
/src/github.com/maorfr/skbn is not within a known GOPATH/src
make: *** [Makefile:46: bootstrap] Error 1

any ideas of how to fix it?

Go modules

Could the dependency configuration please be updated to modern go mod?

Minor parallelism improvements

Can we please provide a faster default for the number of parallel file transfers? I think a default setting of 4 concurrent file transfers would really improve out of the box performance.

  • 4 matches similar concurrency settings for FTP clients.
  • 4 integrates well with modern CPU core count configurations.
  • 4 has low risk of triggering transfer errors in Kubernetes.
  • 4 has low risk of triggering transfer errors in S3.

Also, I wonder whether skbn should reject a value of 0 for the concurrency setting. Full concurrency copies all files at once, which tends to trigger throttling errors: a file tree with very many files effectively looks like a denial-of-service attack from S3's point of view.
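
As a rough illustration of bounded concurrency (not skbn's code), a buffered channel used as a semaphore caps the number of in-flight transfers at a fixed value such as 4:

// Capping in-flight transfers with a buffered channel used as a semaphore,
// instead of starting every copy at once.
package main

import (
	"fmt"
	"sync"
)

func main() {
	files := []string{"a", "b", "c", "d", "e", "f"} // hypothetical file list
	const maxInFlight = 4                           // proposed default concurrency

	sem := make(chan struct{}, maxInFlight)
	var wg sync.WaitGroup

	for _, f := range files {
		wg.Add(1)
		sem <- struct{}{} // blocks once maxInFlight transfers are running
		go func(path string) {
			defer wg.Done()
			defer func() { <-sem }()
			fmt.Println("copying", path) // stand-in for the real transfer
		}(f)
	}
	wg.Wait()
}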

Improve performance and resilience with tar or rsync-style transfers

Hi,

I noticed that skbn is a bit slow and fragile. The transfers can take hours to complete, and if they are interrupted, then the user is currently required to restart the whole process. Not fun!

Fortunately, this is something that rsync, or even tarballs, can improve, because these tools reduce transfer overhead.

I would like to see skbn behave more like that. Then I can have more confidence that everything is running smoothly in my Kubernetes cluster :)

Release ARM64 and AMD64 macOS binaries

Please release Apple M1 chip ARM64 binaries in addition to Apple Intel chip AMD64 binaries, so that new Mac users can benefit from skbn pre-built binaries.

[bug] Incorrect scheme used for google cloud storage

skbn uses the gcs:// scheme for GCP storage buckets, but the correct scheme is gs://.

gsutil ls gcs://leeroy-backups/
InvalidUrlError: Unrecognized scheme "gcs"
gsutil ls gs://leeroy-backups
echo $status
0

Official docs

I'm assuming gcs was valid at some point?
