ropensci / roadoi

Use Unpaywall with R

Home Page: https://docs.ropensci.org/roadoi

License: Other

R 100.00%

open-access oadoi r webclient code4lib altmetrics rstats r-package peer-reviewed unpaywall

roadoi's Introduction

roadoi - Use Unpaywall with R

roadoi interacts with the Unpaywall REST API, an openly available web interface that returns metadata about open access versions of scholarly works.

This client supports the most recent API Version 2.

API Documentation: https://unpaywall.org/products/api

How do I use it?

Use the oadoi_fetch() function in this package to get open access status information and full-text links from Unpaywall.

roadoi::oadoi_fetch(dois = c("10.1038/ng.3260", "10.1093/nar/gkr1047"), 
                    email = "[email protected]")
#> # A tibble: 2 x 21
#>   doi      best_oa_location  oa_locations oa_locations_emb…
#>   <chr>    <list>            <list>       <list>           
#> 1 10.1038… <tibble [1 × 11]> <tibble [1 … <tibble [0 × 0]> 
#> 2 10.1093… <tibble [1 × 10]> <tibble [6 … <tibble [0 × 0]> 
#> # … with 17 more variables: data_standard <int>,
#> #   is_oa <lgl>, is_paratext <lgl>, genre <chr>,
#> #   oa_status <chr>, has_repository_copy <lgl>,
#> #   journal_is_oa <lgl>, journal_is_in_doaj <lgl>,
#> #   journal_issns <chr>, journal_issn_l <chr>,
#> #   journal_name <chr>, publisher <chr>,
#> #   published_date <chr>, year <chr>, title <chr>,
#> #   updated_resource <chr>, authors <list>

There are no hard API restrictions. However, providing an email address is required, and a limit of 100k requests per day is suggested. If you need to access more data, use the data dump instead.

RStudio Addin

This package also has an RStudio Addin for easily finding free full texts in RStudio.

How do I get it?

Install and load from CRAN:

install.packages("roadoi")
library(roadoi)

To install the development version, use the devtools package:

devtools::install_github("ropensci/roadoi")
library(roadoi)

Documentation

See https://docs.ropensci.org/roadoi/ to get started.

roadoi's Issues

Move readme figures into ./man/figures

See https://docs.ropensci.org/roadoi/ and this comment from Hadley: ropensci-org/rotemplate#19 (comment)

You can also have a look where other packages store readme.md figures:

Prepare CRAN submission

Because oaDOI was officially announced as the backend for Unpaywall, it is safe to make a stable version available via CRAN.

Prevent submission of DOIs with non-standard whitespace

httr::GET("https://api.unpaywall.org/v2/10.1177/ 0042098012452322&[email protected]", 
    httr::timeout(5))
#> Error in curl::curl_fetch_memory(url, handle = handle): Timeout was reached: [api.unpaywall.org] Operation timed out after 5003 milliseconds with 0 bytes received

Created on 2021-09-09 by the reprex package (v2.0.0)
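
One way to guard against this, sketched under assumptions: strip all whitespace from each DOI before building the request URL. `clean_doi()` is a hypothetical helper, not part of roadoi, and it assumes that whitespace inside a DOI is always an input error.

```r
# Hypothetical helper (not part of roadoi): remove every whitespace
# character, including Unicode separators such as non-breaking spaces.
clean_doi <- function(dois) {
  dois <- enc2utf8(as.character(dois))
  gsub("[\\s\\p{Z}]+", "", dois, perl = TRUE)
}

clean_doi(c("10.1177/ 0042098012452322", " 10.1038/ng.3260\t"))
#> [1] "10.1177/0042098012452322" "10.1038/ng.3260"
```

Cleaning the input this way would let the request go out with a well-formed DOI instead of running into the timeout shown above.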

Use GET endpoint instead of POST method

from @jasonpriem #1

We completely rebuilt a lot of oaDOI over the last month in order to deal with the massive traffic we were getting. As part of that, we found that it was actually easier to scale (using Heroku Dynos) if we got one DOI per request, using the GET endpoint. That's reflected in the new API docs, which no longer mention the POST. It's also for the most part easier for people to use.

So, if you get the chance or have the interest, we'd love it if this wrapper could use the GET endpoint as well.

Error: Timeout was reached

Hi- I was successfully using roadoi several months ago (thanks for the tool!) on large queries, using:

oa_out = roadoi::oadoi_fetch(dois = this_doi, email = "[email protected]")

But since I have come back to the project in the past two weeks, I have not been able to query more than a small number of DOIs at a time before getting the error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached

For example, to extract data for 700 DOIs, I had to restart about 20 times, successfully downloading data for 5-50 DOIs at a time before timing out.

Any idea what is going wrong? I would like to be able to query tens of thousands of DOIs in the near future. Thanks for your help.

enhancement: subtitles and output-type

For the compliance checking use case it's important to know the type of the output.
Some OA policies apply only to outputs of type journal-article, whilst there might be slightly different compliance rules for a book-chapter, book, or monograph, et cetera.

I know both the subtitle field and the type field are available via the CrossRef API.
Could both of these fields please be represented in the default output from oadoi_fetch?

Subtitle is useful because often some output titles are just "Introduction" and the subtitle contains the more informative bit... container.title might also be useful to report from oadoi_fetch, although perhaps it was a conscious decision to leave outputs without journal titles?

Or is it simply that the oadoi API does not return these fields? If so I should report this to them, because output type really does matter for this use case!

Error : 'vec_dim' is not an exported object from 'namespace:vctrs'

I ran into this issue after installing the package today and running this example code from the Vignette:

library(roadoi)
roadoi::oadoi_fetch(dois = c(
    "10.1186/s12864-016-2566-9",
    "10.1103/physreve.88.012814"), 
  email = "[email protected]")

Maybe vctrs:::vec_dim() used to be exported from that namespace but no longer is?

Write a vignette

Provide long-form documentation, including nice use cases.

How to write vignettes: http://r-pkgs.had.co.nz/vignettes.html

Addin throws error when nothing was found and when multiple

... and when multiple empty lines are submitted

API limit is now 100k

In fact, there is no limit, but using the data dump is recommended when the number of records exceeds 100k per day:
https://oadoi.org/api

Upgrade client to support V2

Will become default on 1 October.

Documentation: https://oadoi.org/api/v2

Error "SSL certificate problem" with oadoi_fetch function

Hi! I have been getting an error when using the oadoi_fetch function (see attached screenshot below) for at least a week. Any idea what is causing it and how to solve it? Thank you!

Question about stop condition in roadoi::oadoi_fetch

Hi, first off thanks for this tool:)
I was wondering if there's a particular reason for stopping the script when either a DOI is incorrect or oaDOI doesn't return results within the timeout. When the list of DOIs given as an argument to the function is large, the probability of getting no results at all increases: if the last DOI in the list is incorrect, the result of the function will not be available, even for the correct DOIs (the same goes for timeouts).
Is there any reason for not using a tryCatch mechanism to 'remember' the erroneous DOIs? (or is it a feature not yet implemented? ;-)
Thnxs,
Caspar Treijtel (Library of the University of Amsterdam)
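
The idea in this question can be sketched as follows. This is a minimal illustration, not roadoi's implementation: `fetch_safely()` is a hypothetical wrapper, and `fetch_one` stands for any function that fetches a single DOI and may throw an error (for example, a call to oadoi_fetch() with one DOI).

```r
# Hypothetical wrapper: call `fetch_one` per DOI, keep successful results,
# and record the DOIs that raised an error instead of stopping.
# Note: a NULL result is also treated as a failure in this sketch.
fetch_safely <- function(dois, fetch_one) {
  results <- list()
  failed <- character(0)
  for (doi in dois) {
    res <- tryCatch(fetch_one(doi), error = function(e) NULL)
    if (is.null(res)) {
      failed <- c(failed, doi)
    } else {
      results[[doi]] <- res
    }
  }
  list(results = results, failed = failed)
}

# Stand-in fetcher for demonstration only: errors on an empty DOI.
fake_fetch <- function(doi) {
  if (!nzchar(doi)) stop("invalid DOI")
  data.frame(doi = doi, is_oa = NA, stringsAsFactors = FALSE)
}

out <- fetch_safely(c("10.1038/ng.3260", ""), fake_fetch)
names(out$results)  # DOIs that succeeded
out$failed          # DOIs to retry or inspect
```

With this shape, one bad DOI or one timeout costs only that single record, and the failed DOIs can be retried in a second pass.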

Sys.sleep() in between API requests?

Let's assume that the API provider wants me to send requests one by one, i.e., not to send a new request before the previous one has been handled. I wonder if a Sys.sleep() anywhere inside the oadoi_fetch_ function would do the trick? Or would it need to be (if possible) in the "scope" of the plyr::llply function?

The reason why I'm asking is that I'm learning proper ways to deal with APIs, and your roadoi code acts as an excellent, well-built tutorial. In other words, my question does not refer to the Unpaywall API as such.

Thanks for any pointers!
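
A minimal sketch of the idea, not roadoi's actual code: a plain loop sends one request at a time and pauses after each one. Because R runs the loop sequentially, the next request only starts once the previous call has returned. `fetch_politely()` and its `delay` argument are hypothetical names, and `fetch_one` stands for any function that performs a single request.

```r
# Hypothetical helper: run `fetch_one` on each id in turn, sleeping
# `delay` seconds between consecutive calls.
fetch_politely <- function(ids, fetch_one, delay = 1) {
  out <- vector("list", length(ids))
  for (i in seq_along(ids)) {
    # Wrap in list() so that a NULL result does not drop the slot.
    out[i] <- list(fetch_one(ids[[i]]))
    # Pause between requests, but not after the last one.
    if (i < length(ids)) Sys.sleep(delay)
  }
  out
}

# Demonstration with a stand-in function instead of a real API call.
res <- fetch_politely(c("a", "b", "c"), toupper, delay = 0.1)
```

The same pause could sit inside the function passed to an apply-style iterator, as long as that iterator runs sequentially rather than in parallel.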

Error : object ‘textAreaInput’ is not exported by 'namespace:shiny'

Apologies for my late start on reviewing your package!

I had trouble installing from CRAN. I'll move on to trying the development version after this.

> install.packages("roadoi")
Installing package into ‘/home/ross/R/x86_64-pc-linux-gnu-library/3.2’
(as ‘lib’ is unspecified)
--2017-06-12 07:29:55--  https://cran.rstudio.com/src/contrib/roadoi_0.2.tar.gz
Resolving cran.rstudio.com (cran.rstudio.com)... 52.85.67.12
Connecting to cran.rstudio.com (cran.rstudio.com)|52.85.67.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 559051 (546K) [application/x-gzip]
Saving to: ‘/tmp/RtmpAkRkEv/downloaded_packages/roadoi_0.2.tar.gz’

     0K .......... .......... .......... .......... ..........  9%  451K 1s
    50K .......... .......... .......... .......... .......... 18%  216K 2s
   100K .......... .......... .......... .......... .......... 27% 3.83M 1s
   150K .......... .......... .......... .......... .......... 36% 2.76M 1s
   200K .......... .......... .......... .......... .......... 45%  172K 1s
   250K .......... .......... .......... .......... .......... 54% 3.07M 1s
   300K .......... .......... .......... .......... .......... 64% 3.60M 0s
   350K .......... .......... .......... .......... .......... 73% 2.58M 0s
   400K .......... .......... .......... .......... .......... 82% 4.33M 0s
   450K .......... .......... .......... .......... .......... 91% 2.76M 0s
   500K .......... .......... .......... .......... .....     100% 4.61M=0.8s

2017-06-12 07:29:56 (728 KB/s) - ‘/tmp/RtmpAkRkEv/downloaded_packages/roadoi_0.2.tar.gz’ saved [559051/559051]

* installing *source* package ‘roadoi’ ...
** package ‘roadoi’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
Error : object ‘textAreaInput’ is not exported by 'namespace:shiny'
ERROR: lazy loading failed for package ‘roadoi’
* removing ‘/home/ross/R/x86_64-pc-linux-gnu-library/3.2/roadoi’
Warning in install.packages :
  installation of package ‘roadoi’ had non-zero exit status

The downloaded source packages are in
	‘/tmp/RtmpAkRkEv/downloaded_packages’
> sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.2.5

different results between roadoi & unpaywall

For http://doi.org/10.1016/j.tree.2012.08.021 I get different results when using roadoi and unpaywall.org. Why? In theory they both use the same source API: https://oadoi.org/

roadoi gives is_free_to_read FALSE and oa_color_lang "closed"
but unpaywall.org gives it a green tab (actually it should probably be a blue tab?)

How and why do roadoi and unpaywall differ sometimes?

email starting with capital letter rejected

An email address starting with a capital letter is rejected, while one starting with a lowercase letter is accepted:

> !grepl(roadoi:::email_regex(), "[email protected]")
[1] TRUE
> !grepl(roadoi:::email_regex(), "[email protected]")
[1] FALSE
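
One possible fix, sketched with a deliberately simplified pattern: make the match case-insensitive. The pattern below is an assumption for illustration and is not the expression returned by roadoi:::email_regex(); the addresses are placeholders.

```r
# Simplified, illustrative email pattern (not roadoi's own regex).
simple_email_regex <- "^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$"

is_valid_email <- function(x) {
  # ignore.case = TRUE accepts upper- and lowercase letters alike.
  grepl(simple_email_regex, x, ignore.case = TRUE)
}

is_valid_email(c("Name@example.org", "name@example.org", "not-an-email"))
#> [1]  TRUE  TRUE FALSE
```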

No DOI should result in empty list with same format

Not sure if this is the same as #19.

If I try to process an incomplete list of DOIs like:

DOIs <- c("10.1016/j.jbiotec.2010.07.030", "10.1186/1471-2164-11-245", "")
OAStatus <- oadoi_fetch(dois = DOIs, email = "[email protected]")

I got the following result:

Error: Columns doi, data_standard, is_oa, journal_issns, journal_name, publisher, title, updated must be 1d atomic vectors or lists
In addition: Warning message:
In is.na(req$journal_is_oa) :
is.na() applied to non-(list or vector) of type 'NULL'

I got the incomplete DOI list from a publication database, and I would like to merge the result easily with the existing data. There are certainly workarounds for the problem, but it would be nice if this worked out of the box.
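
Until this works out of the box, one workaround can be sketched as follows: drop missing or empty DOIs before the request, then merge the result back so that every input row is kept. `fetch_keep_rows()` is a hypothetical helper, and `fetch_fun` is a stand-in for the real call (for example oadoi_fetch() with an email argument). The sketch assumes the fetcher returns a data frame with a `doi` column and only atomic columns; the real result contains list columns, for which a join function that supports them may be needed.

```r
# Hypothetical helper: fetch only usable DOIs, then left-join the
# result onto the full input so that no input row is lost.
fetch_keep_rows <- function(dois, fetch_fun) {
  input <- data.frame(doi = dois, stringsAsFactors = FALSE)
  valid <- unique(dois[!is.na(dois) & nzchar(trimws(dois))])
  fetched <- fetch_fun(valid)
  # Left join: inputs without a usable DOI get NA in the fetched columns.
  merge(input, fetched, by = "doi", all.x = TRUE, sort = FALSE)
}

# Stand-in fetcher for demonstration only.
fake_fetch <- function(dois) {
  data.frame(doi = dois, is_oa = TRUE, stringsAsFactors = FALSE)
}

DOIs <- c("10.1016/j.jbiotec.2010.07.030", "10.1186/1471-2164-11-245", "")
fetch_keep_rows(DOIs, fake_fetch)
```

The empty entry stays in the output with missing values, so the result has one row per input and can be combined with the existing publication data.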

add "email" to API call

Thanks so much for making this R wrapper, it's fantastic! !!! yay :)

When you get a chance, could you add "email" as a param to the call signature and your usage example, and email=EMAIL to the api request? Tracking the oadoi (via roadoi) usage allows us to get in touch if something breaks, and also means we can report stats to our funders, which helps us keep our non-profit afloat. :) Thanks!

what is "blue" open access?

I notice from the README that you have green, gold, and blue options for oa_color.

The oaDOI API seemingly only returns gold & green(?): https://oadoi.org/api

I'm guessing "blue" OA might be equivalent to when is_boai_license is TRUE but I couldn't find this explicitly stated anywhere. So at the very least some clarification is needed.

Personally I'm not a fan of the terms 'gold' and 'green' - they are oft misunderstood and aren't very fitting words for things they describe ('gold' often implies expensive, but it doesn't have to be). But gold and green are popular terms and thus are probably here to stay whether I like it or not.

The same can't be said for blue OA. I think very few people in the world know what blue OA is and thus I'd urge you not to use that term in this package. The functionality is great - keep that, it's good to know what subset of gold OA is BOAI-compliant. Just rename it BOAI-compliant OA or something like that?

Fix CRAN warnings

tidyr::unnest() fails because of the duplicated column name updated

rebranding of oadoi

oaDOI has become Unpaywall Data
http://unpaywall.org/data

To Do:

change API endpoint
change links
use Unpaywall Data instead of oaDOI in documentation files