rkhwaja / fs.googledrivefs
Implementation of a pyfilesystem2 filesystem for Google Drive
License: MIT License
Use the **options argument of openbin/upload
This fails with a refresh error. It seems it is only possible to open the root folder as a filesystem.
gd = fs.open_fs(f"googledrive://{creds.gdrive()}/eedata")
gd.listdir("/")
Note that this works:
gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
gd = gd.makedirs("eedata", recreate=True)
gd.listdir("/")
This also fails, saying that the directory exists:
gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
gd = gd.makedir("eedata", recreate=True)
gd.listdir("/")
This is only to account for temporary Google Drive API failures, not for intermittent bugs in this project itself.
Possibilities
On the Google Drive root, makedir works:
fs.open_fs(f"googledrive://{gdrive()}").makedir("ttt")
but makedirs raises an exception:
fs.open_fs(f"googledrive://{gdrive()}").makedirs("qqq")
KeyError Traceback (most recent call last)
in
----> 1 fs.open_fs(f"googledrive://{gdrive()}").makedirs("qqq")
~\Anaconda3\Lib\site-packages\fs\base.py in makedirs(self, path, permissions, recreate)
1066 self.check()
1067 with self._lock:
-> 1068 dir_paths = tools.get_intermediate_dirs(self, path)
1069 for dir_path in dir_paths:
1070 try:
~\Anaconda3\Lib\site-packages\fs\tools.py in get_intermediate_dirs(fs, dir_path)
77 for path in recursepath(abspath(dir_path), reverse=True):
78 try:
---> 79 resource = fs.getinfo(path)
80 except ResourceNotFound:
81 intermediates.append(abspath(path))
~\Anaconda3\Lib\site-packages\fs\googledrivefs\googledrivefs.py in getinfo(self, path, namespaces)
244 if metadata is None or isinstance(metadata, list):
245 raise ResourceNotFound(path=path)
--> 246 return self._infoFromMetadata(metadata)
247
248 def setinfo(self, path, info): # pylint: disable=redefined-outer-name,too-many-branches,unused-argument
~\Anaconda3\Lib\site-packages\fs\googledrivefs\googledrivefs.py in _infoFromMetadata(self, metadata)
214 rawInfo = {
215 "basic": {
--> 216 "name": "" if isRoot else metadata["name"],
217 "is_dir": isFolder
218 },
KeyError: 'name'
The doc at https://developers.google.com/drive/api/v3/reference/files now says that it's writable.
I think that it was not writable in the v2 API (https://developers.google.com/drive/api/v2/reference/files)
At the moment it just does enough to pass the base pyfilesystem2 tests
https://developers.google.com/drive/api/v3/reference/files/update
Would like to update createdTime and modifiedTime.
modifiedTime can be updated using setinfo, but createdTime can only be set when the item is created.
KeyError: 'size' in _infoFromMetadata.
Probably the same error occurs with Google Sheets etc. too.
oauth2client has been deprecated
Maybe allows more efficient mirror implementation
https://developers.google.com/drive/api/v3/reference/files#resource
Follow https://medium.com/@cjolowicz/hypermodern-python-5-documentation-13219991028c
Cover the additional, non-inherited methods on the filesystem objects
Looks like this never worked
Really useful package, thanks! A few suggestions to make it better:
import json
import logging
import os

from pydrive.auth import GoogleAuth

HOME = os.path.expanduser("~")

logging.getLogger("googleapiclient").setLevel(logging.ERROR)
logging.getLogger("oauth2client").setLevel(logging.ERROR)

def gdrive(force=False, credsfile=f"{HOME}/.gdrive.json"):
    """ return credentials for google drive

    onetime setup::
        pip install fs.googledrivefs
        enable google drive api
        create creds
        credentials/create oauth clientid
        select web application
        authorised javascript origins http://localhost:8080
        authorised redirect urls http://localhost:8080/ [NOTE THE / on the end]
        download client_secrets.json and move to ~/.gdrive.json

    Usage to open google drive using pyfilesystem::
        fs1 = fs.open_fs(f"googledrive://{gdrive()}")

    :param force: force reauthentication. needed if token expired or cancelled.
    :param credsfile: location of client_secrets.json downloaded from google
    :return: authorisation string for google drive
    """
    # get creds that are needed to obtain tokens
    with open(credsfile) as f:
        creds = json.load(f)
    web = creds["web"]
    client_id = web["client_id"]
    client_secret = web["client_secret"]

    # get tokens once per device and save in credsfile
    if force or "refresh_token" not in web:
        gauth = GoogleAuth()
        gauth.settings["client_config_file"] = credsfile
        gauth.settings["get_refresh_token"] = True
        gauth.LocalWebserverAuth()
        web["access_token"] = gauth.credentials.access_token
        web["refresh_token"] = gauth.credentials.refresh_token
        with open(credsfile, "w") as f:
            json.dump(creds, f)

    access_token = web["access_token"]
    refresh_token = web["refresh_token"]
    return f"?client_id={client_id}&client_secret={client_secret}&" \
           f"access_token={access_token}&refresh_token={refresh_token}"
Should be able to use requests.
They won't remove the dependency; it would have to be replaced with the Cloud Client Libraries for Python.
Attempts to download Google-native files (e.g. Docs, Sheets, Slides) produce an error when using get_media():
Only files with binary content can be downloaded. Use Export with Google Docs files.
A potential solution could be to inspect the file's mimeType metadata to choose an export format, then use export_media() -- though exports are limited to files up to 10MB.
Here's some more info from a Stack Overflow article.
thanks!
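A sketch of that approach, assuming an authorized googleapiclient Drive v3 `service` object is available; the export MIME-type mapping below is illustrative, not exhaustive:

```python
import io

# Illustrative mapping from Google-native types to an export format
EXPORT_TYPES = {
    "application/vnd.google-apps.document": "application/pdf",
    "application/vnd.google-apps.spreadsheet": "text/csv",
    "application/vnd.google-apps.presentation": "application/pdf",
}

def download_file(service, file_id):
    """Download a file, exporting it first if it is a Google-native type."""
    from googleapiclient.http import MediaIoBaseDownload  # lazy import

    meta = service.files().get(fileId=file_id, fields="mimeType").execute()
    if meta["mimeType"] in EXPORT_TYPES:
        # Native files have no binary content; export them (10MB export limit)
        request = service.files().export_media(
            fileId=file_id, mimeType=EXPORT_TYPES[meta["mimeType"]])
    else:
        request = service.files().get_media(fileId=file_id)
    buffer = io.BytesIO()
    downloader = MediaIoBaseDownload(buffer, request)
    done = False
    while not done:
        _, done = downloader.next_chunk()
    return buffer.getvalue()
```

Inside openbin()/download() the filesystem would additionally have to decide which export format to report, since an exported file's size and extension differ from the Drive metadata.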
https://developers.google.com/drive/api/v2/multi-parenting
Deadline is 2020-09-30
Need a Python 3.11 wheel for protobuf, and maybe for typed-ast, wrapt and lazy-object-proxy.
I'm trying to copy/upload a local folder to my googledrive via fs.googledrivefs with the following code:
from google.oauth2.credentials import Credentials
from fs.googledrivefs import GoogleDriveFS
from fs.osfs import OSFS
from fs.copy import copy_fs
import json
import os
# path on localdrive with one file in it
test_path = ...
# load necessary gdrive information
with open(os.path.join(base_path, 'files', 'gdrive_credentials'), 'r') as f:
    fs_content = f.read()
tokens = json.loads(fs_content)
credentials = Credentials(tokens['access_token'],
                          refresh_token=tokens['refresh_token'],
                          token_uri="https://www.googleapis.com/oauth2/v4/token",
                          client_id=tokens['client_id'],
                          client_secret=tokens['client_secret'])
fs = GoogleDriveFS(credentials=credentials)
# Create new subfolder on gdrive
fs.makedir('test')
info = fs.getinfo('test')
root_id = info.raw['sharing']['id']
#upload file from local drive
copy_fs(OSFS(test_path), GoogleDriveFS(credentials=credentials, rootId=root_id))
fs.close()
After doing so, the 'test' folder has a subfolder named 'New Folder', which is definitely not on the local drive.
I'm using Linux Mint 20 with Python 3.8.
And one question: Is there an easier way of selecting the destination folder on the googledrive than by searching for the rootId and passing it to GoogleDriveFS?
Maybe wait for a resolution on PyFilesystem/pyfilesystem2#301
This fails when copying a file to the root of Google Drive; copy_file does work when copying to subfolders:
from fs.copy import copy_file
local = fs.open_fs("")
gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
copy_file(local, "xg.npy", gd, "xg.npy")
ResourceNotFound Traceback (most recent call last)
in
4 local = fs.open_fs("")
5 gd = fs.open_fs(f"googledrive://{creds.gdrive()}")
----> 6 copy_file(local, "xg.npy", gd, "xg.npy")
~\Anaconda3\lib\site-packages\fs\copy.py in copy_file(src_fs, src_path, dst_fs, dst_path)
141 else:
142 with _src_fs.openbin(src_path) as read_file:
--> 143 _dst_fs.upload(dst_path, read_file)
144
145
~\Anaconda3\lib\site-packages\fs\base.py in upload(self, path, file, chunk_size, **options)
1320 """
1321 with self._lock:
-> 1322 with self.openbin(path, mode="wb", **options) as dst_file:
1323 tools.copy_file_data(file, dst_file, chunk_size=chunk_size)
1324
~\Anaconda3\lib\site-packages\fs\googledrivefs\googledrivefs.py in openbin(self, path, mode, buffering, **options)
346 # make sure that the parent directory exists if we're writing
347 if parsedMode.writing and parentDirItem is None:
--> 348 raise ResourceNotFound(parentDir)
349 return _UploadOnClose(fs=self, path=path, thisMetadata=item, parentMetadata=parentDirItem, parsedMode=parsedMode)
350
ResourceNotFound: resource '' not found
We run an automatic process that creates a lot of nested folders on Google Drive.
googledrivefs.makedirs('parentfolder/username/dataset name', recreate=True) fails intermittently if both 'parentfolder/username' and 'parentfolder/username/dataset name' do not exist.
File "/home/runner/work/fides/fides/.direnv/python-3.8.6/lib/python3.8/site-packages/fs/base.py", line 1076, in makedirs
self.makedir(path, permissions=permissions)
File "/home/runner/work/fides/fides/.direnv/python-3.8.6/lib/python3.8/site-packages/fs/googledrivefs/googledrivefs.py", line 338, in makedir
raise ResourceNotFound(path=path)
fs.errors.ResourceNotFound: resource 'ci/Campaigns/70633 - Redacted UK/Data' not found
It seems the Drive API is eventually consistent somewhere, and there is a race condition in GoogleDriveFS.
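One possible workaround until the race is fixed (a hypothetical helper, not part of this project): retry the call with exponential backoff when the transient error surfaces, passing fs.errors.ResourceNotFound as the exception type:

```python
import time

def with_retries(fn, exc_types, attempts=5, base_delay=1.0):
    """Wrap fn so that transient exc_types errors are retried with backoff."""
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except exc_types:
                if attempt == attempts - 1:
                    raise  # give up after the last attempt
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    return wrapper
```

For example: safe_makedirs = with_retries(googledrivefs.makedirs, (ResourceNotFound,), base_delay=0.5), then safe_makedirs('parentfolder/username/dataset name', recreate=True).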
Is there any form of cache implementation for files?
Something similar to fs.remote or CacheFS?
It would be really helpful to cache files (probably using an eviction algorithm such as LRU), because each file access (especially for small files like images) makes an API call, which is really slow.
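A minimal sketch of the idea (names hypothetical; a real implementation would also need invalidation on writes): an LRU wrapper that memoizes whole-file reads of any pyfilesystem object via functools.lru_cache:

```python
from functools import lru_cache

class ReadCache:
    """Hypothetical LRU cache of whole-file reads over a filesystem object."""
    def __init__(self, backing_fs, maxsize=128):
        self._fs = backing_fs
        # Each distinct path hits the backing API once until evicted
        self._cached_read = lru_cache(maxsize=maxsize)(self._fs.readbytes)

    def readbytes(self, path):
        return self._cached_read(path)
```

A fuller version would live as a WrapFS subclass so the rest of the FS interface passes through unchanged.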
Hi,
For a project using Google Drive as one possible input source for files, we spotted that shared drives are not supported.
I have a local patch that adds support for:
- file.list
- file.get
- next_chunk() (not logging the contents) in openbin()
- helperdownload(), which can skip creating an additional local tempfile (this code path gets used by e.g. copy_file())
So basic functionality for iterating the filesystem and downloading / copying files (to another pyfilesystem) works.
Links:
My question is, would it be worth sending this patch as a pull request? Perhaps as-is (read operations, some internals ready for other operations, download optimization), or maybe after checking whether the write operations also work. For my use case, reading was enough.
The changes make the URL opener and the GoogleDriveFS init accept an optional drive_id, and it works well together with also specifying root_id (so, a directory that exists within that shared drive).
I'm not sure how well the current tests could cover the case of a shared drive.
Note: I also ended up applying ttl_cache decorators to a few internal methods for speedups, but that's another story :)
Options:
https://developers.google.com/drive/api/v3/reference/files/list
Filter on:
Return an iterator which returns paths and hides the paging stuff
Return the incompleteSearch flag? Throw an error? Optionally throw an error?
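A sketch of the iterator option, assuming an authorized Drive v3 `service` object; paging follows the pageToken/nextPageToken scheme from the files.list reference above:

```python
def iter_files(service, query=None, page_size=100):
    """Yield (id, name) pairs from files.list, hiding the paging machinery."""
    token = None
    while True:
        response = service.files().list(
            q=query,
            pageSize=page_size,
            pageToken=token,
            fields="nextPageToken, incompleteSearch, files(id, name)",
        ).execute()
        for item in response.get("files", []):
            yield item["id"], item["name"]
        token = response.get("nextPageToken")
        if token is None:
            break
```

The incompleteSearch flag is available on each response here, so the wrapper could surface it or (optionally) raise, whichever of the options above is chosen.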