
r-lib / gargle


Infrastructure for calling Google APIs from R, including auth

Home Page: https://gargle.r-lib.org/

License: Other

Languages: R 99.09%, HTML 0.91%
Topics: r, google, authentication, rstats, package

gargle's Introduction

gargle

CRAN status Codecov test coverage R-CMD-check

The goal of gargle is to take some of the agonizing pain out of working with Google APIs. This includes functions and classes for handling common credential types and for preparing, executing, and processing HTTP requests.

The target user of gargle is an R package author who is wrapping one of the ~250 Google APIs listed in the APIs Explorer. gargle aims to play roughly the same role as Google’s official client libraries, but for R. gargle may also be useful to useRs making direct calls to Google APIs, who are prepared to navigate the details of low-level API access.

gargle’s functionality falls into two main domains:

  • Auth. The token_fetch() function calls a series of concrete credential-fetching functions to obtain a valid access token (or it quietly dies trying).
    • This covers explicit service accounts, application default credentials, Google Compute Engine, (experimentally) workload identity federation, and the standard OAuth2 browser flow.
    • gargle offers the Gargle2.0 class, which extends httr::Token2.0. It is the default class for user OAuth 2.0 credentials. There are two main differences from httr::Token2.0: greater emphasis on the user’s email (e.g. Google identity) and default token caching is at the user level.
  • Requests and responses. A family of functions helps to prepare HTTP requests (possibly with reference to an API spec derived from a Discovery Document), make requests, and process the response.

See the articles for holistic advice on how to use gargle.

Installation

You can install the released version of gargle from CRAN with:

install.packages("gargle")

And the development version from GitHub with:

# install.packages("pak")
pak::pak("r-lib/gargle")

Basic usage

gargle is a low-level package and does not do anything visibly exciting on its own. But here’s a bit of usage in an interactive scenario where a user confirms they want to use a specific Google identity and loads an OAuth2 token.

library(gargle)

token <- token_fetch()
#> The gargle package is requesting access to your Google account.
#> Enter '1' to start a new auth process or select a pre-authorized account.
#> 1: Send me to the browser for a new auth process.
#> 2: [email protected]
#> 3: [email protected]
#> Selection: 2

token
#> ── <Token (via gargle)> ─────────────────────────────────────────────────────
#> oauth_endpoint: google
#>            app: gargle-clio
#>          email: [email protected]
#>         scopes: ...userinfo.email
#>    credentials: access_token, expires_in, refresh_token, scope, token_type, id_token
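In a non-interactive setting (e.g. on a server or in CI), a credential-fetching function can also be called directly. A minimal sketch, assuming a service account key has been downloaded to the (hypothetical) path shown:

library(gargle)

# non-interactive auth with a downloaded service account key
token <- credentials_service_account(
  scopes = "https://www.googleapis.com/auth/userinfo.email",
  path = "/path/to/service-account-key.json"
)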

Here’s an example of using request and response helpers to make a one-off request to the Web Fonts Developer API. We show the most popular web font families served by Google Fonts.

library(gargle)

req <- request_build(
  method = "GET",
  path = "webfonts/v1/webfonts",
  params = list(
    sort = "popularity"
  ),
  key = gargle_api_key(),
  base_url = "https://www.googleapis.com"
)
resp <- request_make(req)
out <- response_process(resp)

out <- out[["items"]][1:8]
sort(vapply(out, function(x) x[["family"]], character(1)))
#> [1] "Lato"             "Material Icons"   "Montserrat"       "Noto Sans JP"    
#> [5] "Open Sans"        "Poppins"          "Roboto"           "Roboto Condensed"

Please note that the ‘gargle’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Privacy policy

gargle's People

Contributors

acroz, akgold, batpigandme, campbead, collinberke, craigcitro, hadley, jcheng5, jdtrat, jennybc, jimhester, jimjam-slam, maelle, markedmondson1234, michaelchirico, michaelquinn32, samterfa, wlongabaugh, yihui

gargle's Issues

Force re-auth

Think about making an explicit method for this vs. it being implied by supplying a novel email.

Doctor function or mode

As you fall through the credential functions, perhaps those errors should be logged in the internal state, for possible post mortem inspection via dr_token() or the like.

Add userinfo.email scope via default arg

Still not convinced how to handle this, because it makes it look like a user-supplied scope would displace that scope, when in fact we would add the user-supplied scope(s) to it.
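One possible shape (a sketch; add_email_scope() is hypothetical, not an existing gargle function):

# always union the userinfo.email scope with whatever the user supplies
add_email_scope <- function(scopes = NULL) {
  union(scopes, "https://www.googleapis.com/auth/userinfo.email")
}

# e.g. add_email_scope("https://www.googleapis.com/auth/drive")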

Expose scope control in client packages

This will happen in client packages, e.g. googledrive. But it's a general pattern, so I want to track this issue here.

Wrappers like googledrive::drive_auth() should expose scopes, just like they expose email, etc. The default should be some big, generous scope like "https://www.googleapis.com/auth/drive". But this lets a more advanced, more conservative user request the minimal scope needed for their goals.

I could even present a chooser of the scopes they might want to consider for a specific package.

We could even check requests against the token to tell people whether their token supports the requested action. I would have to see how cryptic the errors sent by the API are. If they are informative, it would be nice to just let the API tell people the token is underscoped.

cc @jimhester
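For concreteness, a sketch of what such a wrapper could look like (hypothetical signature; the real googledrive::drive_auth() may differ):

drive_auth <- function(email = NULL,
                       scopes = "https://www.googleapis.com/auth/drive",
                       cache = NULL) {
  # a more conservative user could pass a narrower scope instead
  token <- gargle::token_fetch(scopes = scopes, email = email, cache = cache)
  # ... stash the token in the package's internal auth state ...
  invisible(token)
}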

Release gargle 0.1.3

Prepare for release:

  • Check that description is informative
  • Check licensing of included files
  • usethis::use_cran_comments()
  • devtools::check()
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Polish pkgdown reference index
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • Update cran-comments.md
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Update install instructions in README
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Shiny auth?

It is with great excitement that I see the activity here (I've been watching since @jennybc pinged me about its prototype a while back). I hope to move all the auth parts of googleAuthR over to depend on this package for its next major version, which is well overdue. Hopefully it will also mean more cross-compatibility between all the Google API packages out there.

Is multi-user Shiny authentication on the road map for this package? That was pretty much the reason I went down the rabbit hole of googleAuthR, as it was a tougher nut to crack due to the token needing to be a Shiny reactive object, which meant keeping the token in a global of the package didn't work. I got around that at the time with a with_shiny wrapper function that added an argument to each API call function with the token explicitly supplied, but there must be a better way.

If I can help at all, I will try to chip in, at the very least with documentation or something once it's developed.

Tighten up the errors

As opposed to current stop_glue(), use abort() + glue() for simple stuff. For errors used in multiple places, use a function that creates a classed condition.
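A sketch of the second pattern (the helper and class names are illustrative):

# build and signal a classed condition so callers can catch specific errors
stop_gargle <- function(message, class, ..., .envir = parent.frame()) {
  rlang::abort(
    glue::glue(message, .envir = .envir),
    class = c(class, "gargle_error"),
    ...
  )
}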

Use for General Authentication?

I'd like to authenticate users within my company (all of whom have Gmail accounts) for uses other than Google APIs specifically (eg, dashboard permissions). I need a way to verify their identity; after that I can deal with verifying the permissions associated with their identity. Would using gargle to verify that the user is who they say they are make sense for this use case, in your opinion?

Auth and Deauth Function Factories

As a package developer, I'd like to easily create auth and deauth functions, so that I can easily implement gargle.

As far as I can tell, only the scopes, error message, and package name for googledrive::drive_auth are unique to googledrive (and nothing is unique to googledrive::drive_deauth). Having factories for these functions would make gargle implementation more straightforward.

I'll submit a PR shortly to close this, but wanted an issue for discussions in case my PR doesn't go as smoothly as I expect.
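A rough sketch of the factory idea (all names are hypothetical; the eventual API may differ):

make_auth_fn <- function(package, scopes) {
  force(package); force(scopes)
  function(email = NULL, path = NULL, cache = NULL) {
    gargle::token_fetch(
      scopes = scopes, email = email, path = path,
      cache = cache, package = package
    )
  }
}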

Credential mismatch

It is possible we'd want to catch these errors and offer some helpful advice:

401: Invalid Credentials

When you're messing with API key, client id, secret, and cached OAuth tokens, it's pretty easy to get into a mismatch situation, which will generally result in a 401.

For example, we can actually check if the app baked into a cached token is the same as the current app. That can be a reason for failed token refresh. The remedy is to re-initiate OAuth flow.

If the app does match the one used to create the cached token, the user should then consider whether the API key comes from a different project than the app credentials. I don't know of a way to check for this, so it could just be a suggestion.

https://developers.google.com/drive/v3/web/handle-errors
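A sketch of the app-match check mentioned above (helper name is hypothetical; the fields follow httr::Token2.0):

# does the OAuth app baked into a cached token match the app in current use?
same_app <- function(cached_token, current_app) {
  identical(cached_token$app$key, current_app$key)
}
# if FALSE, refresh will likely fail with a 401; re-initiate the OAuth flow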

Publishing gargle to CRAN or other repo

Hello, I'm using install_github("r-lib/gargle") in a Travis CI job and this fails about 50% of the time because of GitHub rate limits. I do not have the same issue with normal CRAN packages, so the question is: do you have any plans to publish gargle on CRAN or another repo in the near future?

Thank you.

Remove credentials_travis()?

Here's the current definition of credentials_travis():

https://github.com/r-lib/gargle/blob/master/R/travis-credentials.R

In English: it checks that the TRAVIS env var is "true" and then calls credentials_service_account().
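In code, that amounts to roughly this (a paraphrase, not the exact source):

credentials_travis <- function(scopes = NULL, ...) {
  # only relevant on Travis CI
  if (!identical(Sys.getenv("TRAVIS"), "true")) {
    return(NULL)
  }
  credentials_service_account(scopes = scopes, ...)
}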

From where I sit, it looks like I could delete it.

We certainly don't use it on travis for testing gargle itself or the downstream wrappers I look after. bigrquery, googledrive, etc. use the same approach as gargle, which is to decrypt a service account token, embedded in the shipped package.

@craigcitro What do you think?

Should I be using the nominal or actual scopes in the hash?

Historically, going back to & deriving from httr, the token hash incorporates the nominal scope(s), i.e. the requested scope(s).

But from studying my current token collection, I see that this is not the same as actual scope(s).

With the email, we take the definitive email from the token itself. Indeed, we must, because there's not necessarily any other way for gargle to know the email.

Should I be doing anything about this? Documenting it well for now.

options(width = 999)
#knitr::opts_knit$set(width = 999)
devtools::load_all(".")
#> Loading gargle

# load every token in the default OAuth cache
tokens <- cache_load(gargle_default_oauth_cache_path())

# nominal scopes: what was requested (stored in token$params$scope)
nominal <- purrr::map(tokens, c("params", "scope"))
nominal <- purrr::map_chr(nominal, ~ commapse(base_scope(normalize_scopes(.x))))

# actual scopes: what the token really carries (token$credentials$scope,
# a single space-delimited string, possibly absent)
actual <- purrr::map(tokens, c("credentials", "scope"))
f <- function(s) {
  if (is.null(s)) return(NA_character_)
  strsplit(s, split = "\\s+")[[1]]
}
actual <- purrr::map(actual, f)
actual <- purrr::map_chr(actual, ~ commapse(base_scope(normalize_scopes(.x))))

tibble::tibble(nominal, actual)
#> # A tibble: 14 x 2
#>    nominal                               actual                                                   
#>    <chr>                                 <chr>                                                    
#>  1 email, ...bigquery, ...cloud-platform ...bigquery, ...cloud-platform, ...userinfo.email, openid
#>  2 email                                 ""                                                       
#>  3 email, mouthwash                      ""                                                       
#>  4 email, ...drive                       ...drive, ...plus.me, ...userinfo.email                  
#>  5 email                                 ...plus.me, ...userinfo.email                            
#>  6 email, ...spreadsheets.readonly       ...plus.me, ...spreadsheets.readonly, ...userinfo.email  
#>  7 email, ...drive                       ...drive, ...userinfo.email, openid                      
#>  8 email, ...drive                       ...drive, ...userinfo.email, openid                      
#>  9 email, ...drive                       ...drive, ...userinfo.email, openid                      
#> 10 email, ...drive                       ...drive, ...userinfo.email, openid                      
#> 11 email, ...drive                       ...drive, ...plus.me, ...userinfo.email                  
#> 12 email, ...drive.readonly              ...drive.readonly, ...plus.me, ...userinfo.email         
#> 13 email, ...bigquery, ...cloud-platform ...bigquery, ...cloud-platform, ...userinfo.email, openid
#> 14 email, ...spreadsheets                ...spreadsheets, ...userinfo.email, openid

Created on 2019-04-16 by the reprex package (v0.2.1.9000)

Response processing + error message helper

Seems like we should encode all our knowledge about error messages in one place. bigrquery currently has this:

signal_reason <- function(reason, message) {
  if (is.null(reason)) {
    # no structured reason from the API: throw a plain error
    stop(message, call. = FALSE)
  } else {
    # otherwise signal a classed condition, e.g. class "bigrquery_<reason>"
    cl <- c(paste0("bigrquery_", reason), "error", "condition")
    message <- paste0(message, " [", reason, "]")

    cond <- structure(list(message = message), class = cl)
    stop(cond)
  }
}

Prepare repo for Google bot

@craigcitro I did not fill in the year or copyright holder in LICENSE. What should be there? You have write access here so you can tell me what to do or do it yourself.

I'm not expecting any action on this over the holidays!! Just thought I'd do it since the related convo keeps going over on bigquery. If you want to tell someone to watch this space or contribute in this space, now at least it exists.

Enable APIs

I've gone here:

https://console.developers.google.com

logged in with the necessary Google identity, and enabled the Google Drive and Sheets APIs for the gargle-associated project. I suspect more will need to be enabled in the long run, hence this issue, but I decided to do this on an as-needed basis. None of this matters until I adopt the new app credentials, but I will do that soon; that's in #6.

@craigcitro Let me know if there are other APIs I should immediately enable.

Refactor is_legit_token() into ... check_token()?

is_legit_token() is a janky function I've passed from package to package, eventually landing here.

I think it should probably become a checker (one possible shape is sketched below):

  • Lose verbose and put it under the same chattiness control as everything else.
  • It's used in googledrive, so adjust accordingly there.
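The sketch (check_token() does not exist yet; the classed abort() follows the direction in "Tighten up the errors" above):

check_token <- function(token) {
  if (!is_legit_token(token)) {
    rlang::abort("Token is not a valid OAuth2 token.", class = "gargle_error_bad_token")
  }
  invisible(token)
}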

"scopes" or "scope"?

This is a note to self.

I wish we referred to scopes everywhere because, in general, the character vector or space-delimited string that holds them can hold multiple scopes.

As much as possible, I have made this true for the surface of gargle.

But httr generally uses the singular form, scope. So there are still some instances of scope here internally that are either absolutely required in order to call a function in httr or adjacent to such usage.

Packages that could use gargle

Packages that could conceivably sign on to the gargle bandwagon and perhaps lead to some good de-duplication of developer and user pain:

just a quick "issue as brain dump", others are welcome to chime in

Before long, I'd like to reach out to others for awareness and invite some discussion.

Get rid of the magrittr dependency

Thoughts @hadley? Currently %>% is used 5x in gce-token.R and that's it. It wouldn't be hard to remove. So far gargle's dependencies are quite minimal and I've gone to some trouble to not use purrr. If it's likely to get pulled in eventually, it would be nice to predict that and save myself some grief. But I think it can be avoided.

Suggested function renaming

get_application_default_credentials -> credentials_app_default
get_gce_credentials ->                 credentials_gce
get_service_account_credentials ->     credentials_service_account
get_travis_credentials ->              credentials_travis
get_user_oauth2_credentials ->         credentials_user_oauth2

A common prefix is better for autocomplete, IMO.

Refactor token caching

  • Cache gargle-fetched tokens only or primarily at the user level.
  • Be able to auto-discover tokens (already true with httr caching), including service tokens (not already true). I now think the service token part is separate from this.
  • Consider designating a primary Google identity.
  • Store tokens ... in a tibble instead of a list? Look tokens up by variables that hold app, scopes, and email, versus the current hashing scheme? Remember that you don't need an exact match on scopes; you just need to verify that the needed scopes are present (as sketched below).
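That scope check could look something like this (helper name is hypothetical):

# a cached token is usable if its scopes cover the needed scopes,
# even if the two sets are not identical
scopes_cover <- function(token_scopes, needed_scopes) {
  all(needed_scopes %in% token_scopes)
}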

Rename options like gargle.oauth_cache

We are converging on the convention gargle_oauth_cache in tidyverse/r-lib, generally.

@craigcitro I see a few options in credential functions you contributed:

use_ip <- getOption("gargle.gce.use_ip", FALSE)

timeout <- getOption("gargle.gce.timeout", default = 0.8)

Do you / do Googlers really care about those options being named gargle.gce.timeout and gargle.gce.use_ip?

API keys and tokens

Notes re: managing the API key. These notes are probably more relevant to packages that use gargle than to gargle itself.

If sending an API key and an OAuth token at the same time, they must both come from the same project. Note also that you cannot refresh an existing token with a new app. When the app changes, all tokens must be remade.

Do not send an API key with a token from a service account.

cc @craigcitro

Nicer ergonomics around scopes

Default scope checking should be based on equality or inclusion.

But many APIs have scope situations that are more complicated. For example, Gmail's "full" scope supersedes the "compose" scope, but there's no way to detect that without external knowledge.

In this comment @craigcitro also points out a relevant situation re: bigquery.
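A sketch of how such external knowledge might be encoded (the lookup table and helper are hypothetical):

# hypothetical: scopes known to imply (supersede) other scopes
scope_implies <- list(
  "https://mail.google.com/" = "https://www.googleapis.com/auth/gmail.compose"
)

expand_scopes <- function(scopes) {
  unique(c(scopes, unlist(scope_implies[scopes], use.names = FALSE)))
}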

Kill the zombie tidyverse/gargle AppVeyor project

Now that @jimhester has straightened out some of the weird ownership issues, gargle has a proper AppVeyor project below the r-lib organization. But the previous history & logs did not come along for the ride. Before long, I should kill the lingering project in the tidyverse account, so I don't confuse myself later.

Solidify (my understanding of) credentials_gce()

My notes.

Can the option gargle.gce.use_ip be renamed to gargle_gce_use_ip, in order to obey the same convention as other gargle options?

Ditto for gargle.gce.timeout.

The error handling in gce_metadata_request() is different from everything else in gargle, at this point. Specifically, I'm comparing to response_process() and related helpers.

Beyond this, I would have to get myself in a position to actually play with this myself.

Dealing with multiple authentication scopes/APIs

One thing I've coded around in httr is dealing with authentication when you have multiple Google login emails/APIs/scopes in one script. For instance a typical workflow for me is to download from Google Analytics and Search Console, then upload it to Cloud Storage and BigQuery.

At the beginning there was much confusion from me and users when the scopes/auth failed to match the service, and the dreaded auth errors started up which I'm sure you know are a pain to debug.

My workaround for this was to ensure explicit authentication each time, and to make sure the options specified in the auth cache file overwrote anything existing in the R environment so that you could write code like:

ga_auth("home.oauth")
...get Google Analytics data...
bq_auth("work.oauth")
...get BigQuery data...

...or set up a shared token with mutual scopes:

gar_auth("both.oauth", scopes = c("google_analytics", "big_query"))
...get Google Analytics data...
...get BigQuery data...

I realised one day that the .httr-oauth file sometimes actually contains a list of multiple authentications, so I think it's possible it did support this, but I never found an easy way to make it work. I actually added code to make sure that only the first token stored in .httr-oauth is used if there is a list of tokens, as pragmatically debugging one token at a time seemed much easier.

For this reason I also took out the cache = TRUE/FALSE option and made it explicitly a filename each time as well, to avoid accidental overwrite of, say, a .httr-oauth file.

So my question is: how is this handled in gargle, and can I help at all?

P.S. Is there an example of gargle being used in a package now, or some kind of getting-started guide I can use to contribute? Is it only googlesheets4 for now? Is gargle in a state where I can start a branch and experiment with it?

Extend support of application default credentials

I've not given much attention to application default credentials for this first release. I've used it successfully now (with a service account token), but I've got a few lingering questions and ideas for future refinement/extended support.

Also, parking notes and links here.


Google Cloud Application Default Credentials (ADC) are not credentials. ADC is a strategy to locate Google Cloud Service Account credentials. If the environment variable GOOGLE_APPLICATION_CREDENTIALS is set, ADC will use the filename that the variable points to for service account credentials.

from https://www.jhanley.com/google-cloud-application-default-credentials

That does not fully capture all the locations checked by gargle::credentials_app_default(), but that is the first place it checks.

Official ADC docs: https://cloud.google.com/docs/authentication/production and https://cloud.google.com/sdk/docs/

I put in some minimal docs for gargle::credentials_app_default() via 4256749.

credentials_app_default() looks for a file at a path encapsulated by credentials_app_default_path(). Here's where it looks, in order, where ALL_CAPS indicates an env var:

  • GOOGLE_APPLICATION_CREDENTIALS
  • CLOUDSDK_CONFIG/application_default_credentials.json
  • (APPDATA %||% SystemDrive %||% C:)/gcloud/application_default_credentials.json (Windows)
  • ~/.config/gcloud/application_default_credentials.json (not Windows)

If a file exists at the path returned by credentials_app_default_path(), we parse it as JSON.

It is assumed that it (via info$type) declares itself to be an OAuth or service account token.

If it's OAuth, there's a bit of fiddling with scopes, then a new httr::Token2.0 is instantiated "by hand".

  • Question: how does one even end up with an OAuth2 token stored as this sort of JSON? I'm thinking maybe via the gcloud cli?
  • If I knew exactly how to do this, gargle could offer a way to write existing user tokens this way, as a means of moving a project over to a server. For people who resist service account tokens and insist on user tokens, this would be a nice accommodation.

Question: should I just leave credentials_app_default() as is? Possible tweaks:

  • Use init_oauth2.0() instead of httr::Token2.0$new()? This seems to be a preferred workflow and yet I don't think it's possible, because we want to shove an existing refresh token in there.
  • Is the new gargle::Gargle2.0 subclass of httr::Token2.0 relevant?

If this is a service account token, we call credentials_service_account() and then we're done.
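To summarize the flow just described (a sketch of the logic, not the exact gargle source):

fetch_adc <- function(scopes = NULL) {
  path <- credentials_app_default_path()
  if (!file.exists(path)) {
    return(NULL)
  }
  info <- jsonlite::fromJSON(path)
  if (identical(info$type, "service_account")) {
    # a service account key: hand off and we're done
    credentials_service_account(scopes = scopes, path = path)
  } else {
    # an "authorized_user" file: rebuild an httr::Token2.0 "by hand"
    # (scope fiddling and token construction omitted in this sketch)
    NULL
  }
}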
