dvc-fs's Introduction

DVC filesystem abstraction layer

Note: This is an old archive of the original repository, before the code was restructured.

→ Please click here to navigate to the current maintained version ←

DVC filesystem abstraction layer (0.8.3)

This package provides a high-level API for easily writing, reading, and listing files inside DVC. It can be used in automation systems integrated with data pipelines.

dvc-fs provides basic compatibility (still a work in progress) with the PyFilesystem2 API.

💾 Installation

To install this package, run:

  $ python3 -m pip install "dvc-fs==0.8.3"

Or with Poetry:

  $ poetry add dvc-fs

โ“ Usage

Using via PyFielsystem2:

The dvc-fs package is integrated with PyFilesystem, so you can do:

from fs import open_fs
fs1 = open_fs("dvc://github.com/covid-genomics/data-artifacts") # Clone by https
fs2 = open_fs("dvc://git@github.com/covid-genomics/data-artifacts") # Clone by ssh
fs3 = open_fs("dvc://<PAT>@github.com/covid-genomics/data-artifacts") # Clone by https with personal access token
# You can also use a plain HTTPS URL and set the GIT_TOKEN environment variable.
# In that case the Personal Access Token is injected into the clone URL.
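
The token injection mentioned above can be pictured with a small standard-library sketch. This is not dvc-fs's actual code: `inject_token` is a hypothetical helper written for illustration, and the rule it applies (only touch HTTPS URLs that carry no credentials yet) is an assumption about sensible behaviour.

```python
import os
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def inject_token(clone_url: str, token: Optional[str]) -> str:
    """Hypothetical helper: put a token into the userinfo part of an HTTPS URL."""
    parts = urlsplit(clone_url)
    # Leave the URL untouched when there is no token, when the scheme is
    # not HTTPS, or when the URL already carries credentials.
    if not token or parts.scheme != "https" or "@" in parts.netloc:
        return clone_url
    return urlunsplit(parts._replace(netloc=f"{token}@{parts.netloc}"))


# Read the token from the environment, as the README describes for GIT_TOKEN.
url = inject_token(
    "https://github.com/covid-genomics/data-artifacts",
    os.environ.get("GIT_TOKEN"),
)
print(url)
```

If GIT_TOKEN is unset, the URL is printed unchanged.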

Usage is then as follows:

from fs import open_fs
with open_fs("dvc://github.com/covid-genomics/data-artifacts") as fs:
    fs.writetext("fs_test/fasta2.txt", "TEST")

Explicitly creating DVCFS:

This method lets you explicitly create a DVCFS instance in your application:

from dvc_fs.fs import DVCFS
with DVCFS("https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git") as fs:
    for path in fs.walk.files():
        # Print all paths in repo
        print(path)

Basic features

List all files with walk():

from dvc_fs.fs import DVCFS
with DVCFS(
    "https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git"
) as fs:
    for path in fs.walk.files():
        print(path)

Check if the file exists:

from dvc_fs.fs import DVCFS
from os import environ
with DVCFS(
    dvc_repo=f"https://{environ['GIT_TOKEN']}@github.com/covid-genomics/data-artifacts.git"
) as fs:
    for path in ["cirk//BVic HIs_SH2021.xlsx"]:
        print(fs.exists(path=path))

Removing files:

from dvc_fs.fs import DVCFS
with DVCFS(
    "https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/data-artifacts.git"
) as fs:
    fs.writetext("data/to_remove.txt", "TEST STRING ABCDEF 123456")
    fs.remove("data/to_remove.txt")

Writing to the repository from various sources

Read and write contents:

from dvc_fs.fs import DVCFS
with DVCFS("https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git") as fs:
    contents = fs.readtext('data/1.txt')
    print(f"THIS IS CONTENTS: {contents}")
    fs.writetext("test.txt", contents+"!")

You can also use the DVC high-level API directly via the client:

from dvc_fs.fs import DVCFS
from dvc_fs.client import Client, DVCPathUpload
# Git repo with DVC configured
with DVCFS("https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git") as fs:
    fs.bulk_update([
        # Upload local file ~/local_file_path.txt to DVC repo under path data/1.txt
        DVCPathUpload("data/1.txt", "~/local_file_path.txt"),
    ])

The upload operator supports various types of data inputs that you can feed into it.

Uploading a string as a file:

from dvc_fs import DVCStringUpload
from dvc_fs.fs import DVCFS
from datetime import datetime
with DVCFS("https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git") as fs:
    fs.bulk_update([
        DVCStringUpload("data/1.txt", f"This will be saved into DVC. Current time: {datetime.now()}"),
    ])

Uploading a local file using its path:

from dvc_fs import DVCPathUpload
from dvc_fs.fs import DVCFS
with DVCFS("https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git") as fs:
    fs.bulk_update([
        DVCPathUpload("data/1.txt", "~/local_file_path.txt"),
    ])

Uploading content generated by a Python function:

from dvc_fs import DVCCallbackUpload
from dvc_fs.fs import DVCFS
with DVCFS("https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git") as fs:
    fs.bulk_update([
        DVCCallbackUpload("data/1.txt", lambda: "Test data"),
    ])
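
The three upload variants above differ only in where the content comes from. The sketch below illustrates that idea with plain Python; it is not how dvc-fs implements its upload classes. The names `StringSource`, `PathSource`, `CallbackSource` and `collect` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Protocol


class UploadSource(Protocol):
    """Anything that names a destination path and can produce its content."""

    dvc_path: str

    def read(self) -> str: ...


@dataclass
class StringSource:
    dvc_path: str
    content: str

    def read(self) -> str:
        # The content is already in memory.
        return self.content


@dataclass
class PathSource:
    dvc_path: str
    local_path: str

    def read(self) -> str:
        # Expand "~" and read the local file as text.
        return Path(self.local_path).expanduser().read_text()


@dataclass
class CallbackSource:
    dvc_path: str
    callback: Callable[[], str]

    def read(self) -> str:
        # The content is produced lazily, only when it is needed.
        return self.callback()


def collect(sources: List[UploadSource]) -> Dict[str, str]:
    """Resolve every source to text, keyed by its destination path."""
    return {source.dvc_path: source.read() for source in sources}


print(collect([
    StringSource("data/1.txt", "Test data"),
    CallbackSource("data/2.txt", lambda: "Generated data"),
]))
```

Because every source exposes the same `read()` method, a bulk operation can accept a mixed list without caring which kind each entry is.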

Download operations work similarly to uploads, with the same syntax:

from dvc_fs import DVCCallbackDownload
from dvc_fs.fs import DVCFS
# Download DVC file data/1.txt and print it on the screen
with DVCFS("https://<GITHUB_PERSONAL_TOKEN>@github.com/covid-genomics/dvc_repo.git") as fs:
    fs.bulk_update([
        DVCCallbackDownload("data/1.txt", lambda content: print(content)),
    ])

Versioning

To bump the project version before a release, use the following command (for developers):

    $ poetry run bump2version minor
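
As a rough illustration of what a minor bump does to a version number, here is a minimal sketch. It assumes the usual major.minor.patch scheme; the project's own bump2version configuration may differ, and `bump` is a hypothetical function, not part of bump2version.

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a major.minor.patch version and reset the lower parts."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump("0.8.3", "minor"))  # prints 0.9.0
```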

dvc-fs's People

Contributors

styczynski, piotrstyczynski, hasanozdem1r, gizzio
