
googledrive's Introduction

googledrive


Overview

googledrive allows you to interact with files on Google Drive from R.

Installation

Install the CRAN version:

install.packages("googledrive")

Or install the development version from GitHub:

# install.packages("pak")
pak::pak("tidyverse/googledrive")

Usage

Please see the package website: https://googledrive.tidyverse.org

Here’s a teaser that uses googledrive to view some of the files you see on https://drive.google.com (up to n_max = 25, in this case):

library("googledrive")
drive_find(n_max = 25)
#> # A dribble: 25 × 3
#>    name                       id                                drive_resource
#>    <chr>                      <drv_id>                          <list>        
#>  1 2021-09-16_r_logo.jpg      1dandXB0QZpjeGQq_56wTXKNwaqgsOa9D <named list>  
#>  2 2021-09-16_r_about.html    1XfCI_orH4oNUZh06C4w6vXtno-BT_zmZ <named list>  
#>  3 2021-09-16_imdb_latin1.csv 163YPvqYmGuqQiEwEFLg2s1URq4EnpkBw <named list>  
#>  4 2021-09-16_chicken.txt     1axJz8GSmecSnaYBx0Sb3Gb-SXVaTzKw7 <named list>  
#>  5 2021-09-16_chicken.pdf     14Hd6_VQAeEgcwBBJamc-FUlnXhp117T2 <named list>  
#>  6 2021-09-16_chicken.jpg     1aslW1T-B8UKzAEotDWpmRFaMyMux5-it <named list>  
#>  7 2021-09-16_chicken.csv     1Mj--zJYZJSMKsNVjk2tYFef5LnCsNoDT <named list>  
#>  8 pqr                        143iq-CswFTwJTjVfKkcFMDW0jYqDeUj2 <named list>  
#>  9 mno                        1gcUTnFbsF6uioJrLCsVQ78_F1wEzyNtI <named list>  
#> 10 jkl                        17T40phn99w0hY-B_Ev0deTvVg9fmUSnt <named list>  
#> # ℹ 15 more rows

Contributing

If you’d like to contribute to the development of googledrive, please read these guidelines.

Please note that the googledrive project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Privacy

Privacy policy

googledrive's People

Contributors

batpigandme, falnesio, hadley, ianmcook, jcheng5, jennybc, jimhester, lucymcgowan, michaelchirico, smingerson


googledrive's Issues

Abbreviated file ids

The full ids are very long. I wonder if there is any official support or convention for using, e.g., the first 7 characters, as we do with SHA-1 hashes in git.

I suppose even in the absence of that, we could make things feel that way in some places in these two packages.
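Here is a minimal sketch of the git-style idea. It assumes there is no official Drive convention for short ids. abbrev_id() is a hypothetical helper: it starts with 7 characters and widens the prefix until the abbreviations are unique among the ids at hand. It guarantees nothing about uniqueness across all of Drive.

```r
# Hypothetical helper: abbreviate Drive file ids, git-style.
# Uniqueness is only checked within the ids passed in, not across Drive.
abbrev_id <- function(ids, n = 7) {
  short <- substr(ids, 1, n)
  # Widen the prefix until every abbreviation is unique (or we reach the longest id)
  while (anyDuplicated(short) && n < max(nchar(ids))) {
    n <- n + 1
    short <- substr(ids, 1, n)
  }
  short
}

ids <- c("1dandXB0QZpjeGQq_56wTXKNwaqgsOa9D", "1XfCI_orH4oNUZh06C4w6vXtno-BT_zmZ")
abbrev_id(ids)
#> [1] "1dandXB" "1XfCI_o"
```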

API key

We should send an API key with every request, because there are useful things that can be done without auth, such as accessing world-readable files.

@craigcitro is going to tell us about a specific header to send that clarifies whether the API key or auth token should be consulted re: quota, in the case where both are present in the request.

drive_mkdir() interface

This feels like it needs some work. It should take inspiration from dir.create() and the shell's mkdir. For example, there shouldn't be separate arguments dir and path. I think that should be just path.
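Here is a minimal sketch of what a single path argument could mean. It assumes the last component is the new folder's name and everything before it is the parent. split_mkdir_path() is hypothetical, and base R's dirname() and basename() do the splitting.

```r
# Hypothetical sketch: one `path` argument, split into parent folder + new name
split_mkdir_path <- function(path) {
  path <- sub("/+$", "", path)      # drop any trailing slash, e.g. "foo/bar/"
  parent <- dirname(path)
  list(
    parent = if (identical(parent, ".")) NULL else parent,  # "." means no parent given
    name = basename(path)
  )
}

split_mkdir_path("test_8675309/foo/yo/baz")
```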

pkgdown

This is not urgent, but it's another thing that is easier to start sooner than later. I found it helpful to poke around the _pkgdown.yml of other tidyverse packages that have gone before you. The pkgdown site for pkgdown itself 🤓 also has good basic info.


  • Add template _pkgdown.yml:
    template:
      package: tidytemplate
      default_assets: false
  • Add to .Rbuildignore

  • Run build_site(), then push.

  • In the repo's admin settings, use docs/ as the GitHub Pages source and turn off the wiki.

  • Update the URL on the GitHub repo page.

  • Add the URL to DESCRIPTION.

drive_mkdir doesn't allow for multiple folders with the same name in the same path

As mentioned in #24, I think there may be a small bug in the get_leaf() function. When I run:

drive_mkdir("baz", path = "test_8675309/foo/yo")

I see

## Error: The path 'test_8675309/foo/yo/' identifies more than one file:
## File of type folder, with id 0B0Gh-SuuA2nTOEVQX1BteW1Fa3M.
## File of type folder, with id 0B0Gh-SuuA2nTN3VMYWdOZlhrM0k.

but on Drive I have:
test_8675309
+-- foo
| +-- yo (id 0B0Gh-SuuA2nTOEVQX1BteW1Fa3M)
+-- yo (id 0B0Gh-SuuA2nTN3VMYWdOZlhrM0k)

So test_8675309/foo/yo should in fact be unique, since the other folder it's fussing about is actually test_8675309/yo.

Also, when checking through the path, the match() here doesn't allow for nested folders with the same name. For example, if we had "foo/foo", which is perfectly logical, match() only matches the first, so both folders would get depth 1. Instead, each should be listed as a candidate at both depth 1 and depth 2.
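To illustrate, here is a sketch with hypothetical candidate folders (the ids are made up). match() returns only the first position of each name. Pairing each candidate with every position its name occupies, via which(), gives both depths.

```r
path_parts <- strsplit("foo/foo", "/", fixed = TRUE)[[1]]

# Hypothetical candidates from a name search: two folders called "foo"
candidates <- data.frame(name = c("foo", "foo"), id = c("id_a", "id_b"),
                         stringsAsFactors = FALSE)

# match() only ever reports the first position, so both get depth 1
match(candidates$name, path_parts)
#> [1] 1 1

# Pair every candidate with every depth at which its name occurs
depths <- lapply(candidates$name, function(nm) which(path_parts == nm))
expanded <- data.frame(
  id = rep(candidates$id, lengths(depths)),
  depth = unlist(depths),
  stringsAsFactors = FALSE
)
expanded
```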

MIME type helpers and checkers

  • condense into a lookup function
  • should be used in drive_upload() and drive_list()
  • a function where the user gives an input (a file) and it tells you what mime type(s) it can be uploaded as
  • sketch:
drive_mimetype(name = , type = , mime_type = )

name would be the name of a file to check, type would be a human-readable type, and mime_type would be the full mime type. You'd just fill in one.

other notes:

  • drive_upload() shouldn't default to folder
  • "Google Doc" "Google Sheet" "Google Slide"
  • in drive.mime_type
    • create a "type" column for the ones without extensions (generally the Google types), for easier lookup
    • create a "group" column (maybe later)
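Here is a minimal sketch of the drive_mimetype(name = , type = , mime_type = ) idea. It uses a small hypothetical lookup table with a "type" column for the extension-less Google types. The table contents, and the use of tools::file_ext() for the name case, are assumptions rather than the package's actual table.

```r
# Hypothetical mime type table (a real one would be much larger)
mime_tbl <- data.frame(
  ext       = c("csv", "pdf", "jpg", "txt", NA, NA, NA, NA),
  type      = c(NA, NA, NA, NA, "document", "spreadsheet", "presentation", "folder"),
  mime_type = c("text/csv", "application/pdf", "image/jpeg", "text/plain",
                "application/vnd.google-apps.document",
                "application/vnd.google-apps.spreadsheet",
                "application/vnd.google-apps.presentation",
                "application/vnd.google-apps.folder"),
  stringsAsFactors = FALSE
)

# Sketch: the user fills in exactly one of the three arguments
drive_mimetype <- function(name = NULL, type = NULL, mime_type = NULL) {
  supplied <- !vapply(list(name, type, mime_type), is.null, logical(1))
  if (sum(supplied) != 1) {
    stop("Supply exactly one of `name`, `type`, or `mime_type`.")
  }
  if (!is.null(name)) {
    ext <- tolower(tools::file_ext(name))   # look up by file extension
    return(mime_tbl[mime_tbl$ext %in% ext, ])
  }
  if (!is.null(type)) {
    return(mime_tbl[mime_tbl$type %in% type, ])
  }
  mime_tbl[mime_tbl$mime_type %in% mime_type, ]
}

drive_mimetype(name = "chicken.csv")$mime_type
#> [1] "text/csv"
```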

Helpers for working with q in drive_find()

We need some internal support for q handling. I'll come back to flesh this out. Brain dump and notes from #7.

Forming q. At its simplest, people have to provide q verbatim. But it might be nice to accept a vector of q's or multiple q's in ..., which drive_list() would then concatenate with "and". It feels like there's also a need for an internal "append this to q" helper.

How do we (or do we) reconcile defaults (such as “not being in trash”) with the user’s supplied q and/or the q clauses we build up? As for the user, should they be forced to declare whether their q should trump everything or be added to the defaults?

Do we attempt to pull user-supplied q apart into clauses? Have some clause-checking machinery, then “and” it all back together?
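Here is one possible shape for the internal "append this to q" helper, assuming clauses are simply and-ed together. and_q() is hypothetical. It wraps each clause in parentheses so that a user clause containing `or` cannot change the meaning of the defaults. It does not try to pull user-supplied q apart.

```r
# Hypothetical helper: combine q clauses (from ... or vectors) with "and"
and_q <- function(...) {
  clauses <- unlist(list(...), use.names = FALSE)
  clauses <- clauses[nzchar(clauses)]       # ignore empty strings
  if (length(clauses) == 0) {
    return(NULL)                            # no q at all
  }
  # Parenthesize each clause so an "or" inside one stays contained
  paste0("(", clauses, ")", collapse = " and ")
}

default_q <- "trashed = false"
user_q <- c("mimeType = 'text/csv'", "name contains 'chicken' or name contains 'imdb'")
and_q(default_q, user_q)
```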

drive_upload overwrite should check if the name is unique

I think this can be solved by changing

    old_doc <- drive_list(path = path,
                          pattern = paste0("^", name, "$"),
                          verbose = FALSE)
    if (!is.null(old_doc)) {
      id <- old_doc$id[1]
    }

to

    old_doc <- drive_list(path = output, verbose = FALSE)
    if (!is.null(old_doc)) {
      id <- old_doc$id
    }

Struggling with drive_mv() function

Hi,

I'm unsure whether this is the proper place to ask questions; it's probably not a bug. I am struggling to move a Google Sheet using drive_mv(). The task is pretty straightforward; here are the details.

file_to_move = drive_file("1TrAW3z8OglPzMPrdhhIBrv85OVZR6wdW5wDWuxbBm5g")
destination_folder = drive_file("0B4md9-n1qmKTflE4ZHo5aXBLSmpyR0JuREg3OEFXa0V0SGozMTU0RXFQdTdWaXJFVDVGemc")

Both statements work fine, and both files open successfully via drive_browse(), i.e.:
drive_browse(file_to_move)
drive_browse(destination_folder)

Unfortunately, the final step

drive_mv(file_to_move, folder=destination_folder)

gives me this error message: "Error in regmatches(path, m)[[1]] : subscript out of bounds"

I have been searching for the solution but still have no luck.

Best regards,
Prakasit

Scopes

We need to work through scenarios where googledrive kicks off auth but then a googlesheets request is made. Right now that would trigger a second auth process, because httr will recognize it as distinct. Or at least it should, once #3 is handled. The token obtained by googledrive won't have the necessary scopes to do Google Sheets work. I'm not sure yet what to do about all this. I think it connects to our eventual use of gauth.

Fields handling

Moved out of #7.

We need to rationalize fields and the columns that appear in the result. Do we equate these two? This feels connected with writing a general handler for an instance of file resource (Drive-speak).

README.Rmd

Also not urgent, but good to initiate once the UI starts to settle down. Below is boilerplate content, so use your judgment.


  • Switch from .md to .Rmd

  • Install instructions:

```{r, eval = FALSE}
# The easiest way to get readr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just readr:
install.packages("readr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/readr")
```
  • Overview: 1-2 paragraphs including brief description/goals. Add h2, and set:
    home:
      strip_header: true
  • Link to appropriate R4DS chapter:
    If you are new to readr, the best place to start is the
    [data import chapter](http://r4ds.had.co.nz/data-import.html) in R
    for data science.
  • Add link to _pkgdown.yaml
      links:
      - text: Learn more
        href: http://r4ds.had.co.nz/data-import.html
  • Usage section shows a few examples.

Uniform handling of messages

We should work out some policies about messages, warnings, and errors, covering both the form of the messages and the mechanics of how we'll make them. There's lots of diversity right now.

NULL parents and "Shared with me" files

The parents field is frequently NULL for files that are "Shared with me". This makes such files impossible to access via drive_path(), because these paths are technically not rooted.

Current workaround: pull all "Shared with me" files and isolate the one you want with pattern.

drive_list(q = "sharedWithMe = true", pattern = "A reactive build system")

Path handling toolkit

I have a second generation solution for resolving a requested path, such as foo/bar/baz.

I'd like to look ahead a bit and package it more generally, i.e. for use beyond drive_ls().

@LucyMcGowan Will you help me list the ways other functions will need to work with paths?

drive_list(): User can supply path. We need to make sure such a path exists and is uniquely defined. Then determine file id for the terminal leaf and list it.

Functions that create or move files: We need to determine the maximal existing stem, in case some folders need to be created. Example: user asks to write a file to foo/bar/baz/yo/myfile.txt. We determine that foo/bar exists and that we need to create baz within foo/bar and yo within foo/bar/baz.
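Here is a sketch of the "maximal existing stem" computation for the foo/bar/baz/yo/myfile.txt example. plan_path() is hypothetical. exists_fun stands in for a real Drive lookup; here it just checks a character vector of known paths.

```r
# Hypothetical sketch: longest existing folder prefix + folders still to create
plan_path <- function(path, exists_fun) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  folders <- parts[-length(parts)]               # last part is the file name
  prefixes <- vapply(seq_along(folders),
                     function(i) paste(folders[seq_len(i)], collapse = "/"),
                     character(1))
  exists <- vapply(prefixes, exists_fun, logical(1))
  # Number of leading folders that already exist
  n_stem <- if (all(exists)) length(exists) else which(!exists)[1] - 1
  list(
    stem = if (n_stem > 0) prefixes[n_stem] else NULL,
    to_create = prefixes[seq_along(prefixes) > n_stem],  # created in this order
    name = parts[length(parts)]
  )
}

on_drive <- c("foo", "foo/bar")
plan_path("foo/bar/baz/yo/myfile.txt", function(p) p %in% on_drive)
```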

What else do we need?

Bake in the base URL for Drive API

I just changed where the base URL for Drive API is stored and it caused changes in most of the .R files. That doesn't seem desirable.

I don't think this package makes any calls to URLs that are not under the main Drive base URL. If I've got that right, then the base URL should get baked into the function that builds requests.

A slight variant of this is the files URL. That should probably NOT be stored anywhere, but, instead, get built into requests by specifying path = "files" (in the API sense, not the drive_list() sense!)

Added later: I think, in fact, that api_url is ready and waiting to be used here. I see it in every built request.
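Here is a minimal sketch of baking the base URL into the request builder, so callers pass only path = "files" in the API sense. build_drive_url() and .drive_base_url are hypothetical names. The URL is the Drive v3 base, and pageSize is a files.list query parameter.

```r
# Hypothetical: the Drive v3 base URL lives in exactly one place
.drive_base_url <- "https://www.googleapis.com/drive/v3"

# Hypothetical builder: callers supply only the API path, e.g. "files"
build_drive_url <- function(path, query = list()) {
  url <- paste(.drive_base_url, path, sep = "/")
  if (length(query) > 0) {
    # Percent-encode each value (spaces, quotes, "=" inside q clauses, etc.)
    values <- vapply(query, function(x) {
      utils::URLencode(as.character(x), reserved = TRUE)
    }, character(1))
    url <- paste0(url, "?", paste(names(query), values, sep = "=", collapse = "&"))
  }
  url
}

build_drive_url("files", list(pageSize = 25))
#> [1] "https://www.googleapis.com/drive/v3/files?pageSize=25"
```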

Re-wrangle stored info re: Drive API

From #20

You might want to consider a two-layer request builder, like here:
https://github.com/tidyverse/googlesheets2/blob/8caf705a92b7c0638783ce478eb0449a3a92c1de/R/gs_build_request.R#L47

The inner layer, gs_build_request(), is very dumb and mechanical. But it would work if someone really knew what they were doing and only wanted help setting the base URL, expanding path params, and managing auth.

The outer layer, gs_generate_request(), is smarter and what I plan to use! It needs to be given a known endpoint and then it looks up everything we know, checks that input looks good, and massages it into shape before calling gs_build_request().
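Here is a sketch of the two-layer idea, translated to Drive. All names here (.endpoints, drive_build_request(), drive_generate_request()) are hypothetical, and the endpoint registry is a tiny stand-in for the real discovery information. The inner layer only expands path params and builds the URL; auth is omitted. The outer layer looks up the endpoint and rejects unknown parameters before delegating. That is a much thinner check than the "massaging" described above.

```r
# Hypothetical endpoint registry
.endpoints <- list(
  drive.files.get  = list(method = "GET", path = "files/{fileId}",
                          params = c("fileId", "fields")),
  drive.files.list = list(method = "GET", path = "files",
                          params = c("q", "pageSize", "fields"))
)

# Inner layer: dumb and mechanical (path param expansion + query string)
drive_build_request <- function(method, path, params = list(),
                                base_url = "https://www.googleapis.com/drive/v3") {
  for (nm in names(params)) {
    token <- paste0("{", nm, "}")
    if (grepl(token, path, fixed = TRUE)) {
      path <- sub(token, params[[nm]], path, fixed = TRUE)
      params[[nm]] <- NULL                   # consumed as a path param
    }
  }
  url <- paste(base_url, path, sep = "/")
  if (length(params) > 0) {
    values <- vapply(params, function(x) {
      utils::URLencode(as.character(x), reserved = TRUE)
    }, character(1))
    url <- paste0(url, "?", paste(names(params), values, sep = "=", collapse = "&"))
  }
  list(method = method, url = url)
}

# Outer layer: knows the endpoints and checks input before delegating
drive_generate_request <- function(endpoint, params = list()) {
  ep <- .endpoints[[endpoint]]
  if (is.null(ep)) stop("Unknown endpoint: ", endpoint)
  bad <- setdiff(names(params), ep$params)
  if (length(bad) > 0) {
    stop("Unrecognized parameter(s): ", paste(bad, collapse = ", "))
  }
  drive_build_request(ep$method, ep$path, params)
}

drive_generate_request("drive.files.get", list(fileId = "abc123", fields = "name"))$url
#> [1] "https://www.googleapis.com/drive/v3/files/abc123?fields=name"
```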

Mimetype table

  • Store this in .drive environment.
  • Centralize all the info needed to decide how to act based on mime type.

"Valve" that only lets a dribble with one row through

Maybe we need something like just_one(msg) that is dribble in, dribble out: it passes a dribble through if it has exactly one row and otherwise errors with msg. It seems like we could use it internally, and maybe users would too, to write pipelines that start with drive_search() etc. but need to verify that only one file is being operated on.
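Here is a minimal sketch of the valve. It assumes a dribble can be treated as a data frame for this purpose, so plain data frames stand in for dribbles below.

```r
# Hypothetical valve: dribble (data frame) in, the same dribble out
just_one <- function(d, msg = "Expected exactly one file.") {
  stopifnot(is.data.frame(d))
  if (nrow(d) != 1) {
    stop(sprintf("%s Found %d.", msg, nrow(d)), call. = FALSE)
  }
  d                                  # passes through unchanged
}

files <- data.frame(name = c("chicken.csv", "chicken.txt"), id = c("id1", "id2"),
                    stringsAsFactors = FALSE)
just_one(files[1, ])                 # one row: returned as-is
try(just_one(files))                 # two rows: error
```

Because it returns its input unchanged, it can sit in the middle of a pipeline, e.g. `some_dribble |> just_one()` in R >= 4.1.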

drive_upload(): when name or id already exists

  • In order to upload with a new mimeType, it seems you have to both upload & change the metadata simultaneously
  • It seems the only way to do so is with a multipart upload
    • don't know how to do that without writing a temporary file
  • This was asked about on SO.
  • Addressed a bit here: r-lib/httr#253

Elevate `type` to an explicit argument of drive_list()

Needs to happen after #19 gets merged.

We need to decide what values of type are acceptable.

Summarizing discussion from #7, which I'm closing.

@LucyMcGowan proposed this mapping:

type          q
document      q="mimeType='application/vnd.google-apps.document'"
spreadsheet   q="mimeType='application/vnd.google-apps.spreadsheet'"
presentation  q="mimeType='application/vnd.google-apps.presentation'"
folder        q="mimeType='application/vnd.google-apps.folder'"
file          q="mimeType !='application/vnd.google-apps.folder'"

Jenny pointed out the need to be inspired by this dropdown menu when searching files on Drive in the browser:

I think this will ultimately require beefing up the mime type table in some way, so we can lookup a human-friendly value and retrieve the proper mime type, to build a q clause.
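Here is a sketch of the lookup, hard-coding the mapping proposed above. type_to_q() is hypothetical. A real version would read from the beefed-up mime type table instead of a local vector. match.arg() gives validation and partial matching for free.

```r
# Hypothetical: human-friendly type -> q clause, per the proposed mapping
type_to_q <- function(type) {
  folder <- "application/vnd.google-apps.folder"
  mime <- c(document     = "application/vnd.google-apps.document",
            spreadsheet  = "application/vnd.google-apps.spreadsheet",
            presentation = "application/vnd.google-apps.presentation",
            folder       = folder)
  type <- match.arg(type, c(names(mime), "file"))  # errors on unknown types
  if (type == "file") {
    return(sprintf("mimeType != '%s'", folder))    # anything that isn't a folder
  }
  sprintf("mimeType = '%s'", mime[[type]])
}

type_to_q("spreadsheet")
#> [1] "mimeType = 'application/vnd.google-apps.spreadsheet'"
```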

Early feature / experiment request: Sharing settings

I think this is already on the to do list, but allow me to be specific ...

I would dearly love to get access to this info via API. Go to a Google Sheet. Click on the blue Share button in the upper right corner and you get this popup:

[Screenshot of the Share popup, 2017-05-02]

Obtain a googledrive-specific client secret and id

Looks like you're using the secret and id from googlesheets here in googledrive

[Screenshot, 2017-05-10]

We should create an app/project for this package specifically. I should probably be the one to do that, using the existing account that owns the googlesheets project.

Upload folder & contents

  • currently with drive_upload() you can input a folder, and it will upload a folder of that name to google drive, but not the contents of the folder 🙃🌴

change `drive_file()` to output a dribble

  • drive_file() should output something similar to a single row output from drive_list()
  • perhaps develop a drive tibble (dribble) method
  • switch to gfiles as only an internal thing

Make it easy to get top-level listing of My Drive

I now realize that drive.files.list lists My Drive, by default, without regard to folder hierarchy. I put this in drive_list():

  ## if path reduces to root (i.e., "My Drive"), make it an explicit NULL
  path <- rationalize_path(path)

which should be revisited. I now think path = "~" or path = "~/" or path = "/" should get a listing of files with root as direct parent. Which is different from path = NULL.
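Here is a sketch of the proposed distinction. It assumes the Drive alias 'root' is used for My Drive's root folder in the q clause. path_to_parent_q() is hypothetical and only covers the two cases discussed here.

```r
# Hypothetical sketch: NULL vs. explicit root paths
path_to_parent_q <- function(path = NULL) {
  if (is.null(path)) {
    return(NULL)                      # no restriction: all of My Drive, any depth
  }
  if (path %in% c("~", "~/", "/")) {
    return("'root' in parents")       # only files whose direct parent is root
  }
  stop("Resolving non-root paths to a folder id is not sketched here.", call. = FALSE)
}

path_to_parent_q("~/")
```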

Tidyverse-wide project

Create a project to own API key and client id and secret that we will use uniformly across googledrive, googlesheets*, and bigrquery.

Download a file

How do you download a file using the already authenticated client? It's not mentioned among the package's functionality. Is this supported already?

Turn on appveyor

I think it's a good time to fire up appveyor. In #9 I expand the checking on travis. Between that and this, we're in pretty good shape re: automatic checking.

devtools::use_appveyor() should do the trick. I don't think there's anything about this package that would require non-default settings. But it is possible that I have to be the one to activate it on appveyor within the tidyverse organization, after you do the bit here in the repo.

"house" code style

@LucyMcGowan

There is a developing notion of a tidyverse code style. I might point this out at times but it feels like it would be more efficient to do one massive run at this.

To get a sense of what I'm talking about, install lintr from CRAN and, in the googledrive package/project, run devtools::lint(). You should get a long report of all the style "violations". There is a small number of themes here, but they each come up a lot.

Maybe you can do one massive lint-related PR?

Also let's figure out if your editor / dev process could be changed so that more of this just happens automatically.

Interface of drive_list()

I think you should make heavy reference to list.files(). Specifically:

  • path: Should default to meaning "list my Drive files", which strikes me as the closest match for "current working directory". But then user should also be able to specify a folder here as well.
  • pattern: that's what the regex should be called, instead of search
  • I think type should possibly be elevated to an argument in the signature of drive_ls() itself, as opposed to getting mopped up in ....
  • Lots of the other args of list.files() are also worth considering. Good to ask yourself: is there an obvious translation of this into the world of Drive?

In general, you need to think about how to help people craft searches. Obviously step one is simply to link to the file searching guide, as we've done. And to include some good examples. But this is an example of something that might be ripe for an add-in down the road.

Suggestion to change prefix of functions from "gd"

Feel free to ignore this if you have already considered it, but "gd" can be interpreted as an expletive.

My suggestion would be to use "gdr" instead. It does break the two-letter prefix consistency with googlesheets, but avoids the expletive interpretation.

Use on.exit() for test cleanup

Wherever possible, let's use on.exit() to specify clean up actions for tests. This will result in more things getting cleaned up properly in a wider variety of situations.

Here's an example of what I mean:

https://github.com/tidyverse/readxl/blob/d182cd3b0f84097fb9cb12c6cbfa442802840128/tests/testthat/test-read-excel.R#L14

Basically, right before or just after you create something, put an on.exit() action in place to ensure its removal.
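Here is the pattern, sketched in base R with a local temp file standing in for a test file on Drive. In the tests, the on.exit() action would remove the Drive file instead. Because the cleanup is registered before the file is written, it runs even if a later check fails.

```r
write_and_check <- function() {
  tmp <- tempfile(fileext = ".txt")
  on.exit(unlink(tmp), add = TRUE)  # registered before the file is created
  writeLines("chicken", tmp)
  stopifnot(file.exists(tmp))       # if this failed, cleanup would still run
  tmp                               # path is returned; on.exit() deletes the file on the way out
}

path <- write_and_check()
file.exists(path)
#> [1] FALSE
```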

Once you read this @LucyMcGowan, you can close it.

Use as.POSIXct() for dates

When processing Drive file metadata, default to as.POSIXct() over as.Date() in places like these:

last_modified = as.Date(proc_res$modifiedTime),
created = as.Date(proc_res$createdTime),

Or maybe use httr::parse_http_date()?

Main message: store dates as POSIXct and implement that with some consistent choice of conversion function.
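Here is one consistent choice, sketched under the assumption that Drive metadata timestamps are RFC 3339 strings in UTC, such as "2017-05-10T19:54:10.123Z". httr::parse_http_date() targets HTTP-date strings (the format of headers like Date), so it may not fit these fields. parse_drive_time() is a hypothetical name.

```r
# Hypothetical converter for Drive timestamps such as modifiedTime/createdTime
parse_drive_time <- function(x) {
  # %OS reads fractional seconds; the trailing Z is matched literally (UTC)
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
}

modified <- "2017-05-10T19:54:10.123Z"
as.Date(modified)           # the time of day is silently dropped
#> [1] "2017-05-10"
parse_drive_time(modified)  # keeps date and time as POSIXct
```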

create demos or articles of quirky use cases

some I've thought of so far:

  • ability to push slides made in RStudio to drive every time you render
  • creating a "live" resume like this with Google Drive + shiny (or a semi-live one that just needs re-rendering like this)
  • storing an .rda file on Google drive & using it in an R analysis (like #17)
  • collaborating on a .Rmd on Google Drive & rendering in R (tricky, because versioning...would work fine if it was only edited on the Google Drive, but hard to manage that)
  • creating a document & adding a tibble of people by sending them an email (with glue!)
