GithubHelp home page GithubHelp logo

alexcoco / gcs-rsync Goto Github PK

View Code? Open in Web Editor NEW

This project forked from cboudereau/gcs-rsync

0.0 0.0 0.0 162 KB

Lightweight Google Cloud Storage sync Rust Client with better performance than gsutil rsync

Home Page: https://docs.rs/gcs-rsync/

License: MIT License

Rust 99.64% Dockerfile 0.36%

gcs-rsync's Introduction

gcs-rsync

build codecov License:MIT docs.rs crates.io crates.io (recent) docker

Lightweight and efficient Rust gcs rsync for Google Cloud Storage.

gcs-rsync is faster than gsutil rsync according to the following benchmarks.

no hard limit to 32K objects or specific conf to compute state.

How to install as crate

Cargo.toml

[dependencies]
gcs-rsync = "0.4"

How to install as cli tool

cargo install --example gcs-rsync gcs-rsync

~/.cargo/bin/gcs-rsync

How to run with docker

Mirror local folder to gcs

docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToUpload>:/source:ro superbeeeeeee/gcs-rsync -r -m /source gs://<YourBucket>/<YourFolderToUpload>/

Mirror gcs to folder

docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m gs://<YourBucket>/<YourFolderToUpload>/ /dest

Mirror partial gcs with prefix to folder

docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m gs://<YourBucket>/<YourFolderToUpload>/<YourPrefix> /dest

Include or Exclude files using glob pattern

CLI gcs-rsync

-i (include glob pattern) and -x (exclude glob pattern) multiple times.

An example where any json or toml are included recursively except any test.json or test.toml recursively

docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m -i **/*.json -i **/*.toml -x **/test.json -x **/test.toml
 gs://<YourBucket>/YourFolderToUpload>/ /dest

Library

with_includes and with_excludes client builders are used to fill includes and excludes glob patterns.

Benchmark

Important note about gsutil: The gsutil ls command does not list all object items by default but instead list all prefixes while adding the -r flag slowdown gsutil performance. The ls performance command is very different to the rsync implementation.

new files only (first time sync)

  • gcs-rsync: 2.2s/7MB
  • gsutil: 9.93s/47MB

winner: gcs-rsync

gcs-rsync sync bench

rm -rf ~/Documents/test4 && cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync
real         2.20
user         0.13
sys          0.21
             7606272  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1915  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                 394  messages sent
                1255  messages received
                   0  signals received
                  54  voluntary context switches
                5814  involuntary context switches
           636241324  instructions retired
           989595729  cycles elapsed
             3895296  peak memory footprint

gsutil sync bench

rm -rf ~/Documents/gsutil_test4 && mkdir ~/Documents/gsutil_test4 && /usr/bin/time -lp --  gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/
Operation completed over 215 objects/50.3 KiB.
real         9.93
user         8.12
sys          2.35
            47108096  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              196391  page reclaims
                   1  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
               36089  messages sent
               87309  messages received
                   5  signals received
               38401  voluntary context switches
               51924  involuntary context switches
            12986389  instructions retired
            12032672  cycles elapsed
              593920  peak memory footprint

no change (second time sync)

  • gcs-rsync: 0.78s/8MB
  • gsutil: 2.18s/47MB

winner: gcs-rsync (due to size and mtime check before crc32c like gsutil does)

gcs-rsync sync bench

cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync
real         1.79
user         0.13
sys          0.12
             7864320  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1980  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                 397  messages sent
                1247  messages received
                   0  signals received
                  42  voluntary context switches
                4948  involuntary context switches
           435013936  instructions retired
           704782682  cycles elapsed
             4141056  peak memory footprint

gsutil sync bench

/usr/bin/time -lp --  gsutil -m -q rsync -r gs://test-bucket/sync_test4/ ~/Documents/gsutil_test4/
real         2.18
user         1.37
sys          0.66
            46899200  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              100108  page reclaims
                1732  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                6311  messages sent
               12752  messages received
                   4  signals received
                6145  voluntary context switches
               14219  involuntary context switches
            13133297  instructions retired
            13313536  cycles elapsed
              602112  peak memory footprint

gsutil rsync config

gsutil -m -q rsync -r -d ./your-dir gs://your-bucket
/usr/bin/time -lp --  gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/

About authentication

All default functions related to authentication use GOOGLE_APPLICATION_CREDENTIALS env var as default conf like official Google libraries do on other languages (golang, dotnet)

Other functions (from and from_file) provide the custom integration mode.

For more info about OAuth2, see the related README in the oauth2 mod.

How to run tests

Unit tests

cargo test --lib

Integration tests + Unit tests

TEST_SERVICE_ACCOUNT=<PathToAServiceAccount> TEST_BUCKET=<BUCKET> TEST_PREFIX=<PREFIX> cargo test --no-fail-fast

Examples

Upload object

cargo run --release --example upload_object "<YourBucket>" "<YourPrefix>" "<YourFilePath>"

Download object

cargo run --release --example download_object "<YourBucket>" "<YourObjectName>" "<YourAbsoluteExistingDirectory>"

Download public object

cargo run --release --example download_public_object "<YourBucket>" "<YourObjectName>" "<YourAbsoluteExistingDirectory>"

Delete object

cargo run --release --example delete_object "<YourBucket>" "<YourPrefix>/<YourFileName>"

List objects

cargo run --release --example list_objects "<YourBucket>" "<YourPrefix>"

List objects with default service account

GOOGLE_APPLICATION_CREDENTIALS=<PathToJson> cargo r --release --example list_objects_service_account "<YourBucket>" "<YourPrefix>"

List objects

list a bucket having more than 60K objects

time cargo run --release --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>" | wc -l

Profiling

Humans are terrible at guessing-about-performance

export CARGO_PROFILE_RELEASE_DEBUG=true
sudo -- cargo flamegraph --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"
cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"

Native bin build (static shared lib)

docker rust rust:alpine3.14
apk add --no-cache musl-dev pkgconfig openssl-dev

LDFLAGS="-static -L/usr/local/musl/lib" LD_LIBRARY_PATH=/usr/local/musl/lib:$LD_LIBRARY_PATH CFLAGS="-I/usr/local/musl/include" PKG_CONFIG_PATH=/usr/local/musl/lib/pkgconfig cargo build --release --target=x86_64-unknown-linux-musl --example bucket_to_folder_sync

gcs-rsync's People

Contributors

cboudereau avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.