
fs.googledrivefs's Introduction

fs.googledrivefs


An implementation of the pyfilesystem2 file system interface for Google Drive

Installation

  pip install fs.googledrivefs

Usage

  from google.oauth2.credentials import Credentials
  from fs.googledrivefs import GoogleDriveFS

  credentials = Credentials(oauth2_access_token,
    refresh_token=oauth2_refresh_token,
    token_uri="https://www.googleapis.com/oauth2/v4/token",
    client_id=oauth2_client_id,
    client_secret=oauth2_client_secret)

  fs = GoogleDriveFS(credentials=credentials)

  # fs is now a standard pyfilesystem2 file system. Alternatively, you can use the opener...

  from fs.opener import open_fs

  fs2 = open_fs("googledrive:///?access_token=<oauth2 access token>&refresh_token=<oauth2 refresh token>&client_id=<oauth2 client id>&client_secret=<oauth2 client secret>")

  # fs2 is now a standard pyfilesystem2 file system

Default Google Authentication

If your application is accessing the Google Drive API as a GCP Service Account, fs.googledrivefs will default to authenticating using the Service Account credentials specified by the GOOGLE_APPLICATION_CREDENTIALS environment variable. This can greatly simplify the URLs used by the opener:

  from fs.opener import open_fs

  fs2 = open_fs("googledrive:///required/path")

You can also use the same method of authentication when using GoogleDriveFS directly:

  import google.auth
  from fs.googledrivefs import GoogleDriveFS

  credentials, _ = google.auth.default()
  fs = GoogleDriveFS(credentials=credentials)

Using fs.googledrivefs with an organisation's Google Account

While access to the Google Drive API is straightforward to enable for a personal Google Account, a user of an organisation's Google Account will typically only be able to enable an API in the context of a GCP Project. The user can then configure a Service Account to access all or a subset of the user's files with fs.googledrivefs using the following steps:

  • create a GCP Project
  • enable the Google Drive API for that Project
  • create a Service Account for that Project
  • share any Drive directory (or file) with that Service Account (using the account's email address)

Notes on forming FS URLs for GCP Service Accounts

Say that your drive is structured as follows:

/alldata
  /data1
  /data2
   :

Also say that you have given your application's Service Account access to everything in data1. If your application opens /alldata/data1 using fs.opener.open_fs(), then fs.googledrivefs must first fetch the info for alldata, to which it has no access, and so the operation fails.

To address this, we can tell fs.googledrivefs to treat data1 as the root directory by supplying the file id of data1 as the request parameter root_id. The FS URL you would now use is googledrive:///?root_id=12345678901234567890:

  from fs.opener import open_fs

  fs2 = open_fs("googledrive:///?root_id=12345678901234567890")

You can also pass rootId when using GoogleDriveFS directly:

  import google.auth
  from fs.googledrivefs import GoogleDriveFS

  credentials, _ = google.auth.default()
  fs = GoogleDriveFS(credentials=credentials, rootId="12345678901234567890")

Note that any file or directory's id is readily accessible from its web URL.
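For instance, a folder opened in the browser at https://drive.google.com/drive/folders/<id> carries the id as its last path segment. A small sketch of pulling the id out of the common URL shapes (extract_drive_id is a hypothetical helper, not part of this package):

```python
import re

def extract_drive_id(url):
    """Pull the file/folder id out of a Google Drive web URL.

    Covers the common shapes:
      https://drive.google.com/drive/folders/<id>
      https://drive.google.com/file/d/<id>/view
      https://drive.google.com/open?id=<id>
    """
    match = re.search(r"(?:/folders/|/d/|[?&]id=)([\w-]+)", url)
    return match.group(1) if match else None
```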

Development

To run the tests, set the following environment variables:

  • GOOGLEDRIVEFS_TEST_CLIENT_ID - your client id (see Google Developer Console)
  • GOOGLEDRIVEFS_TEST_CLIENT_SECRET - your client secret (see Google Developer Console)
  • GOOGLEDRIVEFS_TEST_CREDENTIALS_PATH - path to a json file which will contain the credentials

Then generate the credentials json file by running

  python tests/generate-credentials.py

Then run the tests by executing

  pytest

in the root directory (note that if GOOGLEDRIVEFS_TEST_CREDENTIALS_PATH isn't set, the test suite will try to use the default Google credentials). The tests may take an hour or two to complete. They create and destroy many files and directories, mostly under the /test-googledrivefs directory in the user's Google Drive, and a few in the root directory.

Note that, if your tests are run using a service account, you can set the root id using GOOGLEDRIVEFS_TEST_ROOT_ID.

fs.googledrivefs's People

Contributors

jvtm, msb, rkhwaja, zmej-serow


fs.googledrivefs's Issues

Suggestions to make it easier/better

Really useful package, thanks! A few suggestions to make it better:

  1. Please include install instructions.
  2. Can you change the logging so it uses a different logger or logs at DEBUG level? Currently it uses the root logger at INFO level, so if you make hundreds of calls you get a lot of log messages with no way to silence them.
  3. Many people using Google Drive are primarily trying to access their own account. It would be great if you could make this simpler, as the oauth2 docs are wordy and not very clear. I have written some code for this leveraging pydrive which you could use as a base:
  from pydrive.auth import GoogleAuth
  import logging
  import json
  from os.path import expanduser

  HOME = expanduser("~")

  logging.getLogger("googleapiclient").setLevel(logging.ERROR)
  logging.getLogger("oauth2client").setLevel(logging.ERROR)

  def gdrive(force=False, credsfile=f"{HOME}/.gdrive.json"):
      """Return credentials for google drive.

      onetime setup::
          pip install fs.googledrivefs
          enable google drive api
          create creds
              credentials/create oauth clientid
              select web application
              authorised javascript origins http://localhost:8080
              authorised redirect urls http://localhost:8080/ [NOTE THE / on the end]
          download client_secrets.json and move to ~/.gdrive.json

      Usage to open google drive using pyfilesystem::
          fs1 = fs.open_fs(f"googledrive://{gdrive()}")

      :param force: force reauthentication. needed if token expired or cancelled.
      :param credsfile: location of client_secrets.json downloaded from google
      :return: authorisation string for google drive
      """
      # get creds that are needed to obtain tokens
      creds = json.load(open(credsfile))
      web = creds["web"]
      client_id = web["client_id"]
      client_secret = web["client_secret"]

      # get tokens once per device and save in credsfile
      if force or "refresh_token" not in web:
          gauth = GoogleAuth()
          gauth.settings["client_config_file"] = credsfile
          gauth.settings["get_refresh_token"] = True
          gauth.LocalWebserverAuth()
          web["access_token"] = gauth.credentials.access_token
          web["refresh_token"] = gauth.credentials.refresh_token
          json.dump(creds, open(credsfile, "w"))
      access_token = web["access_token"]
      refresh_token = web["refresh_token"]

      return (f"?client_id={client_id}&client_secret={client_secret}&"
              f"access_token={access_token}&refresh_token={refresh_token}")
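On the logging point (suggestion 2 above): until fs.googledrivefs logs through a named logger, one blunt workaround is to raise the root logger's level. This hides the library's INFO messages, at the cost of silencing all INFO-level logging process-wide:

```python
import logging

# Coarse workaround (not a library feature): fs.googledrivefs currently logs
# via the root logger at INFO level, so raising the root level hides those
# messages. It also hides INFO logging from everything else in the process.
logging.getLogger().setLevel(logging.WARNING)
```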

Speed up tests by speeding up the implementation

  • Would like to use HTTP caching. Can't find confirmation of whether the default HTTP transport has caching and etag support turned on, and there is no documentation on how to customize the transport
  • Another option is to use batch requests
  • Implement more pyfilesystem functions like copy, copydir, move, movedir, download, upload, which can be made more efficient by using what's available within Google Drive
  • Remove superfluous calls by using all the information that comes back from calls, i.e. less testing with exists() and then throwing away the info
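The last point can be sketched as follows; lookup_metadata stands in for a hypothetical single-call metadata fetch returning a dict or None:

```python
def process(lookup_metadata, path):
    # One metadata lookup replaces exists() followed by a second lookup:
    # fetch once, branch on the result, and reuse the data already in hand
    meta = lookup_metadata(path)   # single API round-trip
    if meta is None:
        return None                # the exists() == False case
    return meta["name"]            # reuse the fetched metadata
```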

Support 3.11

A 3.11 wheel is needed for protobuf, and maybe also for typed-ast, wrapt and lazy-object-proxy.

Caching implementation for files

Is there any form of cache implementation for files?

Something similar to fs.remote or CacheFS?

It would really be helpful to cache the files (probably using some algorithms such as LRU) because each call to access a file (especially small files like images and such) would call an API which is really slow.
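There is no built-in cache as of this issue. A lightweight stopgap is to memoise whole-file reads with functools.lru_cache; CachedReader below is a hypothetical sketch, not part of fs.googledrivefs, and it never invalidates on remote changes:

```python
from functools import lru_cache

class CachedReader:
    """Hypothetical sketch (not part of fs.googledrivefs): memoise
    whole-file reads from any pyfilesystem2-style FS object so that
    repeated reads of small files skip the Drive API. Never notices
    remote changes; call invalidate() when staleness matters."""

    def __init__(self, fs, maxsize=128):
        self._fs = fs
        # LRU-evicting memoisation of the underlying readbytes call
        self._read = lru_cache(maxsize=maxsize)(fs.readbytes)

    def readbytes(self, path):
        return self._read(path)

    def invalidate(self):
        self._read.cache_clear()
```

A WrapFS subclass would be the more idiomatic pyfilesystem2 home for the same idea, since it could intercept every read path rather than just readbytes.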

Shared Drive support

Hi,

For a project using Google Drive as one possible input source for files, we spotted that shared drives are not supported.

I have a local patch that adds support for:

  • file.list
  • file.get
  • a bit more optimized binary reads (iterator / next_chunk(), not logging the contents) in openbin() helper
  • more direct download() which can skip creating additional local tempfile (this code path gets used by eg copy_file())

So that basic functionality for iterating the filesystem and downloading / copying files (to another pyfilesystem) works.


My question is: would it be worth sending this patch as a pull request? Perhaps as-is (read operations, some internals ready for other operations, download optimization), or maybe after checking whether the write operations also work. For my use case, reading was enough.

The changes make the URL opener and the GoogleDriveFS constructor accept an optional drive_id, and it works well in combination with root_id (i.e. a directory that exists within that shared drive).

I'm not sure how well the current tests could cover the case of a shared drive.

Note: I also ended up applying ttl_cache decorators to a few internal methods for speedups, but that's another story :)

makedirs fails on root

On the Google Drive root, makedir works:

fs.open_fs(f"googledrive://{gdrive()}").makedir("ttt")

but makedirs raises an exception:

fs.open_fs(f"googledrive://{gdrive()}").makedirs("qqq")

KeyError Traceback (most recent call last)
in
----> 1 fs.open_fs(f"googledrive://{gdrive()}").makedirs("qqq")

~\Anaconda3\Lib\site-packages\fs\base.py in makedirs(self, path, permissions, recreate)
1066 self.check()
1067 with self._lock:
-> 1068 dir_paths = tools.get_intermediate_dirs(self, path)
1069 for dir_path in dir_paths:
1070 try:

~\Anaconda3\Lib\site-packages\fs\tools.py in get_intermediate_dirs(fs, dir_path)
77 for path in recursepath(abspath(dir_path), reverse=True):
78 try:
---> 79 resource = fs.getinfo(path)
80 except ResourceNotFound:
81 intermediates.append(abspath(path))

~\Anaconda3\Lib\site-packages\fs\googledrivefs\googledrivefs.py in getinfo(self, path, namespaces)
244 if metadata is None or isinstance(metadata, list):
245 raise ResourceNotFound(path=path)
--> 246 return self._infoFromMetadata(metadata)
247
248 def setinfo(self, path, info): # pylint: disable=redefined-outer-name,too-many-branches,unused-argument

~\Anaconda3\Lib\site-packages\fs\googledrivefs\googledrivefs.py in _infoFromMetadata(self, metadata)
214 rawInfo = {
215 "basic": {
--> 216 "name": "" if isRoot else metadata["name"],
217 "is_dir": isFolder
218 },

KeyError: 'name'
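The KeyError above suggests the Drive API metadata for the root folder simply has no "name" field. A minimal sketch of the guard that would avoid it (safe_name is a hypothetical helper, not the library's actual code):

```python
def safe_name(metadata, is_root):
    # The Drive API metadata for the root omits "name", so fall back to ""
    # instead of indexing the key directly
    return "" if is_root else metadata.get("name", "")
```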

Fix intermittent failures on test runs

This is only to account for temporary GoogleDrive API failures, not for intermittent bugs in this project itself.

Possibilities

  • Just rerun tests on failure some number of times
  • Handle certain types of errors internally in the code by retrying, if you can distinguish them as being temporary

fails opening a folder

This fails with a refresh error. It seems it is only possible to open the root folder as a filesystem.

gd = fs.open_fs(f"googledrive://{creds.gdrive()}/eedata")
gd.listdir("/")

Note that this works:

gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
gd = gd.makedirs("eedata", recreate=True)
gd.listdir("/")

This also fails, saying the directory exists!

gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
gd = gd.makedir("eedata", recreate=True)
gd.listdir("/")

fails copying a file

copy_file fails when copying a file to the root of Google Drive, though it does work when copying to subfolders:

from fs.copy import copy_file
local = fs.open_fs("")
gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
copy_file(local, "xg.npy", gd, "xg.npy")

[root:INFO]:openbin: xg.npy, wb, -1 (googledrivefs.py:333, time=Dec-10 20:11)
[root:INFO]:Looking up xg.npy (googledrivefs.py:43, time=Dec-10 20:11)
[root:INFO]:Not found in [] (googledrivefs.py:45, time=Dec-10 20:11)
[root:INFO]:Looking up (googledrivefs.py:43, time=Dec-10 20:11)
[root:INFO]:Not found in [] (googledrivefs.py:45, time=Dec-10 20:11)

ResourceNotFound Traceback (most recent call last)
in
4 local = fs.open_fs("")
5 gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
----> 6 copy_file(local, "xg.npy", gd, "xg.npy")

~\Anaconda3\lib\site-packages\fs\copy.py in copy_file(src_fs, src_path, dst_fs, dst_path)
141 else:
142 with _src_fs.openbin(src_path) as read_file:
--> 143 _dst_fs.upload(dst_path, read_file)
144
145

~\Anaconda3\lib\site-packages\fs\base.py in upload(self, path, file, chunk_size, **options)
1320 """
1321 with self._lock:
-> 1322 with self.openbin(path, mode="wb", **options) as dst_file:
1323 tools.copy_file_data(file, dst_file, chunk_size=chunk_size)
1324

~\Anaconda3\lib\site-packages\fs\googledrivefs\googledrivefs.py in openbin(self, path, mode, buffering, **options)
346 # make sure that the parent directory exists if we're writing
347 if parsedMode.writing and parentDirItem is None:
--> 348 raise ResourceNotFound(parentDir)
349 return _UploadOnClose(fs=self, path=path, thisMetadata=item, parentMetadata=parentDirItem, parsedMode=parsedMode)
350

ResourceNotFound: resource '' not found

googledrivefs.makedirs() fails intermittently

We run an automatic process that creates a lot of nested folders on Google Drive.
googledrivefs.makedirs('parentfolder/username/dataset name', recreate=True) fails intermittently if both 'parentfolder/username' and 'parentfolder/username/dataset name' do not exist.

  File "/home/runner/work/fides/fides/.direnv/python-3.8.6/lib/python3.8/site-packages/fs/base.py", line 1076, in makedirs
    self.makedir(path, permissions=permissions)
  File "/home/runner/work/fides/fides/.direnv/python-3.8.6/lib/python3.8/site-packages/fs/googledrivefs/googledrivefs.py", line 338, in makedir
    raise ResourceNotFound(path=path)
fs.errors.ResourceNotFound: resource 'ci/Campaigns/70633 - Redacted UK/Data' not found

It seems the Drive API is eventually consistent somewhere, and there is a race condition in GoogleDriveFS.
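Until this is fixed in the library, one workaround is to retry the failing call with backoff. retry_call below is a generic, hypothetical helper; which exception types are safely retriable depends on the operation:

```python
import time

def retry_call(func, *args, attempts=5, base_delay=0.5,
               retriable=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff
    on the given retriable exception types; re-raise on the last attempt."""
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except retriable:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

For example, retry_call(googledrivefs.makedirs, path, recreate=True, retriable=(fs.errors.ResourceNotFound,)) would paper over the race, at the cost of slower failures for genuinely missing paths.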

"New Folder" Subfolder on copy_fs

I'm trying to copy/upload a local folder to my googledrive via fs.googledrivefs with the following code:

from google.oauth2.credentials import Credentials
from fs.googledrivefs import GoogleDriveFS
from fs.osfs import OSFS
from fs.copy import copy_fs
import json
import os

# path on localdrive with one file in it
test_path = ...

# load necessary gdrive information
with open(os.path.join(base_path, 'files', 'gdrive_credentials'), 'r') as f:
    fs_content = f.read()
tokens = json.loads(fs_content)


credentials = Credentials(tokens['access_token'],
                          refresh_token=tokens['refresh_token'],
                          token_uri="https://www.googleapis.com/oauth2/v4/token",
                          client_id=tokens['client_id'],
                          client_secret=tokens['client_secret'])
fs = GoogleDriveFS(credentials=credentials)

# Create new subfolder on gdrive
fs.makedir('test')
info = fs.getinfo('test')
root_id = info.raw['sharing']['id']
#upload file from local drive
copy_fs(OSFS(test_path), GoogleDriveFS(credentials=credentials, rootId=root_id))
fs.close()

After doing so, the 'test' folder has a subfolder named 'New Folder', which is definitely not on the local drive.

I'm using Linux Mint 20 with Python 3.8.

And one question: is there an easier way of selecting the destination folder on Google Drive than by searching for the rootId and passing it to GoogleDriveFS?
