
fs.googledrivefs's Issues

fails opening a folder

This fails with a refresh error. It seems it is only possible to open the root folder as a filesystem.

gd = fs.open_fs(f"googledrive://{creds.gdrive()}/eedata")
gd.listdir("/")

Note that this works:

gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
gd = gd.makedirs("eedata", recreate=True)
gd.listdir("/")

This also fails, saying the directory already exists:

gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
gd = gd.makedir("eedata", recreate=True)
gd.listdir("/")

Fix intermittent failures on test runs

This is only to account for temporary Google Drive API failures, not for intermittent bugs in this project itself.

Possibilities

  • Just rerun tests on failure some number of times
  • Handle certain types of errors internally by retrying, if they can be distinguished as temporary (see the sketch below)
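
A minimal retry sketch, assuming transient failures surface as googleapiclient.errors.HttpError with a retryable status code; the decorator name, status set, and backoff parameters are all illustrative, not part of this project:

import time
from functools import wraps
from googleapiclient.errors import HttpError

RETRYABLE = {403, 429, 500, 503}  # assumed set of transient statuses

def retry_transient(attempts=4, base_delay=1.0):
    """Retry a Drive API call on transient HttpErrors with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except HttpError as e:
                    if e.resp.status not in RETRYABLE or attempt == attempts - 1:
                        raise
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator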

makedirs fails on root

On the Google Drive root, makedir works:

fs.open_fs(f"googledrive://{gdrive()}").makedir("ttt")

but makedirs raises an exception:

fs.open_fs(f"googledrive://{gdrive()}").makedirs("qqq")

KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>
----> 1 fs.open_fs(f"googledrive://{gdrive()}").makedirs("qqq")

~\Anaconda3\Lib\site-packages\fs\base.py in makedirs(self, path, permissions, recreate)
   1066         self.check()
   1067         with self._lock:
-> 1068             dir_paths = tools.get_intermediate_dirs(self, path)
   1069             for dir_path in dir_paths:
   1070                 try:

~\Anaconda3\Lib\site-packages\fs\tools.py in get_intermediate_dirs(fs, dir_path)
     77     for path in recursepath(abspath(dir_path), reverse=True):
     78         try:
---> 79             resource = fs.getinfo(path)
     80         except ResourceNotFound:
     81             intermediates.append(abspath(path))

~\Anaconda3\Lib\site-packages\fs\googledrivefs\googledrivefs.py in getinfo(self, path, namespaces)
    244         if metadata is None or isinstance(metadata, list):
    245             raise ResourceNotFound(path=path)
--> 246         return self._infoFromMetadata(metadata)
    247
    248     def setinfo(self, path, info):  # pylint: disable=redefined-outer-name,too-many-branches,unused-argument

~\Anaconda3\Lib\site-packages\fs\googledrivefs\googledrivefs.py in _infoFromMetadata(self, metadata)
    214         rawInfo = {
    215             "basic": {
--> 216                 "name": "" if isRoot else metadata["name"],
    217                 "is_dir": isFolder
    218             },

KeyError: 'name'
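
A hedged workaround until this is fixed: create each path segment with makedir, which does work at the root. The helper below is an illustrative sketch, not part of the library:

import fs
from fs.errors import DirectoryExists

def makedirs_from_root(gd, path):
    # Build the path one segment at a time with makedir(), which works
    # at the root, instead of makedirs(), which raises KeyError there.
    current = ""
    for part in path.strip("/").split("/"):
        current = f"{current}/{part}"
        try:
            gd.makedir(current)
        except DirectoryExists:
            pass

gd = fs.open_fs(f"googledrive://{gdrive()}")
makedirs_from_root(gd, "qqq/nested")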

Suggestions to make it easier/better

Really useful package, thanks! A few suggestions to make it better:

  1. Please include install instructions.
  2. Can you change the logging so it uses a dedicated logger or logs at debug level? Currently it uses the root logger at info level, so hundreds of calls produce a flood of log messages with no way to silence them.
  3. Many people using Google Drive are primarily trying to access their own account. It would be great if you could make this simpler, as the OAuth2 docs are a bit wordy and not very clear. I have written some code for this leveraging pydrive which you could use as a base:
    from pydrive.auth import GoogleAuth
    import logging
    import json
    from os.path import expanduser

    HOME = expanduser("~")

    logging.getLogger("googleapiclient").setLevel(logging.ERROR)
    logging.getLogger("oauth2client").setLevel(logging.ERROR)

    def gdrive(force=False, credsfile=f"{HOME}/.gdrive.json"):
        """Return credentials for Google Drive.

        One-time setup::
            pip install fs.googledrivefs
            enable the Google Drive API
            create creds:
                credentials / create OAuth client id
                select web application
                authorised JavaScript origins: http://localhost:8080
                authorised redirect URLs: http://localhost:8080/ [NOTE THE / on the end]
                download client_secrets.json and move it to ~/.gdrive.json

        Usage to open Google Drive using pyfilesystem::
            fs1 = fs.open_fs(f"googledrive://{gdrive()}")

        :param force: force reauthentication; needed if the token expired or was cancelled.
        :param credsfile: location of client_secrets.json downloaded from Google
        :return: authorisation string for Google Drive
        """
        # client creds that are needed to obtain tokens
        creds = json.load(open(credsfile))
        web = creds["web"]
        client_id = web["client_id"]
        client_secret = web["client_secret"]

        # get tokens once per device and save them in credsfile
        if force or "refresh_token" not in web:
            gauth = GoogleAuth()
            gauth.settings["client_config_file"] = credsfile
            gauth.settings["get_refresh_token"] = True
            gauth.LocalWebserverAuth()
            web["access_token"] = gauth.credentials.access_token
            web["refresh_token"] = gauth.credentials.refresh_token
            json.dump(creds, open(credsfile, "w"))
        access_token = web["access_token"]
        refresh_token = web["refresh_token"]

        return f"?client_id={client_id}&client_secret={client_secret}&" \
               f"access_token={access_token}&refresh_token={refresh_token}"

Support 3.11

Need a 3.11 wheel for protobuf, and possibly also for typed-ast, wrapt, and lazy-object-proxy.

"New Folder" Subfolder on copy_fs

I'm trying to copy/upload a local folder to my Google Drive via fs.googledrivefs with the following code:

from google.oauth2.credentials import Credentials
from fs.googledrivefs import GoogleDriveFS
from fs.osfs import OSFS
from fs.copy import copy_fs
import json
import os

# path on the local drive with one file in it
test_path = ...

# load the necessary gdrive tokens (base_path is defined elsewhere)
with open(os.path.join(base_path, 'files', 'gdrive_credentials'), 'r') as f:
    fs_content = f.read()
tokens = json.loads(fs_content)


credentials = Credentials(tokens['access_token'],
                          refresh_token=tokens['refresh_token'],
                          token_uri="https://www.googleapis.com/oauth2/v4/token",
                          client_id=tokens['client_id'],
                          client_secret=tokens['client_secret'])
fs = GoogleDriveFS(credentials=credentials)

# Create new subfolder on gdrive
fs.makedir('test')
info = fs.getinfo('test')
root_id = info.raw['sharing']['id']
#upload file from local drive
copy_fs(OSFS(test_path), GoogleDriveFS(credentials=credentials, rootId=root_id))
fs.close()

After doing so, the 'test' folder has a subfolder named 'New Folder', which is definitely not on the local drive.

I'm using Linux Mint 20 with Python 3.8.

And one question: is there an easier way of selecting the destination folder on Google Drive than searching for the rootId and passing it to GoogleDriveFS? One possibility is sketched below.
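
A hedged alternative (a sketch, not something confirmed by this project's docs): pyfilesystem's opendir() returns a SubFS scoped to a subdirectory, which avoids extracting the rootId at all. Assuming drive_fs is the GoogleDriveFS instance from the snippet above:

from fs.copy import copy_fs
from fs.osfs import OSFS

# Create the destination folder and copy into a SubFS rooted there,
# instead of constructing a second GoogleDriveFS with rootId.
drive_fs.makedir('test', recreate=True)
copy_fs(OSFS(test_path), drive_fs.opendir('test'))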

Speed up tests by speeding up the implementation

  • Would like to use HTTP caching. Can't find confirmation of whether the default HTTP transport has caching and ETag support turned on, nor any documentation on how to customize the transport
  • Another option is to use batch requests (see the sketch after this list)
  • Implement more pyfilesystem functions like copy, copydir, move, movedir, download, and upload, which can be made more efficient by using what's available within Google Drive
  • Remove superfluous calls by using all the information that comes back from each call, i.e. less calling exists() and then throwing the info away
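
A minimal batch-request sketch using google-api-python-client; whether it fits this project's internals is untested, and the requested fields are illustrative:

from googleapiclient.discovery import build

def fetch_metadata_batch(credentials, file_ids):
    """Fetch metadata for several files in one HTTP round trip."""
    service = build("drive", "v3", credentials=credentials)
    results = {}

    def callback(request_id, response, exception):
        if exception is None:
            results[request_id] = response

    batch = service.new_batch_http_request(callback=callback)
    for file_id in file_ids:
        batch.add(service.files().get(fileId=file_id, fields="id, name, mimeType"),
                  request_id=file_id)
    batch.execute()
    return results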

fails copying a file

This fails when copying a file to the root of Google Drive; copy_file does work when copying to subfolders:

from fs.copy import copy_file
local = fs.open_fs("")
gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
copy_file(local, "xg.npy", gd, "xg.npy")

[root:INFO]:openbin: xg.npy, wb, -1 (googledrivefs.py:333, time=Dec-10 20:11)
[root:INFO]:Looking up xg.npy (googledrivefs.py:43, time=Dec-10 20:11)
[root:INFO]:Not found in [] (googledrivefs.py:45, time=Dec-10 20:11)
[root:INFO]:Looking up (googledrivefs.py:43, time=Dec-10 20:11)
[root:INFO]:Not found in [] (googledrivefs.py:45, time=Dec-10 20:11)

ResourceNotFound                          Traceback (most recent call last)
<ipython-input> in <module>
      4 local = fs.open_fs("")
      5 gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
----> 6 copy_file(local, "xg.npy", gd, "xg.npy")

~\Anaconda3\lib\site-packages\fs\copy.py in copy_file(src_fs, src_path, dst_fs, dst_path)
    141     else:
    142         with _src_fs.openbin(src_path) as read_file:
--> 143             _dst_fs.upload(dst_path, read_file)
    144
    145

~\Anaconda3\lib\site-packages\fs\base.py in upload(self, path, file, chunk_size, **options)
   1320         """
   1321         with self._lock:
-> 1322             with self.openbin(path, mode="wb", **options) as dst_file:
   1323                 tools.copy_file_data(file, dst_file, chunk_size=chunk_size)
   1324

~\Anaconda3\Lib\site-packages\fs\googledrivefs\googledrivefs.py in openbin(self, path, mode, buffering, **options)
    346         # make sure that the parent directory exists if we're writing
    347         if parsedMode.writing and parentDirItem is None:
--> 348             raise ResourceNotFound(parentDir)
    349         return _UploadOnClose(fs=self, path=path, thisMetadata=item, parentMetadata=parentDirItem, parsedMode=parsedMode)
    350

ResourceNotFound: resource '' not found

googledrivefs.makedirs() fails intermittently

We run an automatic process that creates a lot of nested folders on Google Drive.
googledrivefs.makedirs('parentfolder/username/dataset name', recreate=True) fails intermittently if both 'parentfolder/username' and 'parentfolder/username/dataset name' do not exist.

  File "/home/runner/work/fides/fides/.direnv/python-3.8.6/lib/python3.8/site-packages/fs/base.py", line 1076, in makedirs
    self.makedir(path, permissions=permissions)
  File "/home/runner/work/fides/fides/.direnv/python-3.8.6/lib/python3.8/site-packages/fs/googledrivefs/googledrivefs.py", line 338, in makedir
    raise ResourceNotFound(path=path)
fs.errors.ResourceNotFound: resource 'ci/Campaigns/70633 - Redacted UK/Data' not found

It seems the Drive API is eventually consistent somewhere and there is a race condition in GoogleDriveFS. A caller-side stopgap is sketched below.
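
A hedged retry sketch, assuming the failure surfaces as fs.errors.ResourceNotFound as in the traceback above; the attempt count and backoff are illustrative:

import time
from fs.errors import ResourceNotFound

def makedirs_with_retry(drive_fs, path, attempts=5, base_delay=1.0):
    # Retry around the race: a freshly created parent folder may not yet
    # be visible to the next Drive API call.
    for attempt in range(attempts):
        try:
            return drive_fs.makedirs(path, recreate=True)
        except ResourceNotFound:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)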

Caching implementation for files

Is there any form of cache implementation for files?

Something similar to fs.remote or CacheFS?

It would really be helpful to cache files (probably using an algorithm such as LRU), because every access to a file (especially small files like images) makes an API call, which is really slow. A minimal client-side sketch follows.
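
A minimal client-side sketch (not an existing feature of this project), assuming reads go through readbytes() and the cached files are effectively immutable; gdrive() is the credentials helper from the earlier issues:

from functools import lru_cache

import fs

gd = fs.open_fs(f"googledrive://{gdrive()}")

@lru_cache(maxsize=256)
def read_cached(path):
    # One API round trip per distinct path; stale data is possible if
    # the file changes on Drive after the first read.
    return gd.readbytes(path)

image = read_cached("/images/logo.png")  # hypothetical path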

Shared Drive support

Hi,

For a project using Google Drive as one possible input source for files, we spotted that shared drives are not supported.

I have a local patch that adds support for:

  • file.list
  • file.get
  • somewhat more optimized binary reads (iterator / next_chunk(), not logging the contents) in the openbin() helper
  • a more direct download() which can skip creating an additional local tempfile (this code path is used by e.g. copy_file())

So that basic functionality for iterating the filesystem and downloading / copying files (to another pyfilesystem) works.


My question is, would it be worth sending this patch as a pull request? Perhaps as-is (read operations, some internals ready for other operations, download optimization), or maybe after checking whether the write operations also work. For my use case, reading was enough.

The changes work so that the URL opener and the GoogleDriveFS initializer accept an optional drive_id, and it works well together with also specifying root_id (so, a directory that exists within that shared drive); a usage sketch is below.
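
For illustration only: usage might look like this, where the keyword names drive_id and root_id come from the patch description above and are not part of the released API, and the IDs and token values are placeholders:

from google.oauth2.credentials import Credentials
from fs.googledrivefs import GoogleDriveFS

credentials = Credentials("access-token",
                          refresh_token="refresh-token",
                          token_uri="https://www.googleapis.com/oauth2/v4/token",
                          client_id="client-id",
                          client_secret="client-secret")

# drive_id selects the shared drive; root_id scopes to a folder inside it.
shared_fs = GoogleDriveFS(credentials=credentials,
                          drive_id="0AExampleSharedDriveId",
                          root_id="1ExampleFolderIdInsideDrive")
print(shared_fs.listdir("/"))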

I'm not sure how well the current tests could cover the case of a shared drive.

Note: I ended up also applying some ttl_cache decorators to a few internal methods for speedups, but that's another story :)
