ropensci / roadoi

Use Unpaywall with R

Home Page: https://docs.ropensci.org/roadoi

License: Other

R 100.00%
open-access oadoi r webclient code4lib altmetrics rstats r-package peer-reviewed unpaywall

roadoi's Introduction

roadoi - Use Unpaywall with R

roadoi interacts with the Unpaywall REST API, an openly available web interface that returns metadata about open access versions of scholarly works.

This client supports the most recent API Version 2.

API Documentation: https://unpaywall.org/products/api

How do I use it?

Use the oadoi_fetch() function in this package to get open access status information and full-text links from Unpaywall.

# replace the placeholder address with your own email
roadoi::oadoi_fetch(dois = c("10.1038/ng.3260", "10.1093/nar/gkr1047"), 
                    email = "name@example.org")
#> # A tibble: 2 x 21
#>   doi      best_oa_location  oa_locations oa_locations_emb…
#>   <chr>    <list>            <list>       <list>           
#> 1 10.1038… <tibble [1 × 11]> <tibble [1 … <tibble [0 × 0]> 
#> 2 10.1093… <tibble [1 × 10]> <tibble [6 … <tibble [0 × 0]> 
#> # … with 17 more variables: data_standard <int>,
#> #   is_oa <lgl>, is_paratext <lgl>, genre <chr>,
#> #   oa_status <chr>, has_repository_copy <lgl>,
#> #   journal_is_oa <lgl>, journal_is_in_doaj <lgl>,
#> #   journal_issns <chr>, journal_issn_l <chr>,
#> #   journal_name <chr>, publisher <chr>,
#> #   published_date <chr>, year <chr>, title <chr>,
#> #   updated_resource <chr>, authors <list>

There are no API restrictions. However, you must provide an email address, and Unpaywall suggests a rate limit of 100,000 requests per day. If you need more data, use the data dump instead.
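
To stay within the suggested limit, a long DOI vector can be split into batches before it is passed to oadoi_fetch(). The following is a minimal sketch, not part of roadoi: the helper name split_dois() and the default batch size of 100,000 are assumptions made for illustration.

```r
# Hypothetical helper (not part of roadoi): split a DOI vector into
# batches so that each batch holds at most `batch_size` DOIs.
split_dois <- function(dois, batch_size = 100000L) {
  stopifnot(batch_size >= 1)
  if (length(dois) == 0) return(list())
  # Assign each DOI a batch index (1, 1, ..., 2, 2, ...) and split on it
  unname(split(dois, ceiling(seq_along(dois) / batch_size)))
}

batches <- split_dois(c("10.1038/ng.3260", "10.1093/nar/gkr1047"),
                      batch_size = 1)
length(batches)  # one batch per DOI in this toy example
```

Each element of the returned list could then be passed to roadoi::oadoi_fetch() in a separate run.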

RStudio Addin

This package also has an RStudio Addin for easily finding free full-texts in RStudio.

How do I get it?

Install and load from CRAN:

install.packages("roadoi")
library(roadoi)

To install the development version, use the devtools package:

devtools::install_github("ropensci/roadoi")
library(roadoi)

Documentation

See https://docs.ropensci.org/roadoi/ to get started.

Meta

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

License: MIT

Please use the issue tracker for bug reporting and feature requests.

roadoi's People

Contributors

ahobert, delwen, karthik, njahn82, sckott

Forkers

mtub ahobert delwen

roadoi's Issues

Use GET endpoint instead of POST method

from @jasonpriem #1

We completely rebuilt a lot of oaDOI over the last month in order to deal with the massive traffic we were getting. As part of that, we found that it was actually easier to scale (using Heroku Dynos) if we got one DOI per request, using the GET endpoint. That's reflected in the new API docs, which no longer mention the POST. It's also for the most part easier for people to use.

So, if you get the chance or have the interest, we'd love it if this wrapper could use the GET endpoint as well.
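
As an illustration of the one-DOI-per-request pattern, the sketch below builds one GET URL per DOI. It is not roadoi's implementation: the helper name build_unpaywall_url() is hypothetical, and the base URL follows the current Unpaywall v2 pattern, which is an assumption here rather than something stated in this issue.

```r
# Sketch only: build one GET request URL per DOI.
# The base URL is assumed (current Unpaywall v2 pattern), not taken
# from the issue text above.
build_unpaywall_url <- function(doi, email,
                                base = "https://api.unpaywall.org/v2/") {
  paste0(
    base,
    utils::URLencode(doi),                    # leaves "/" in the DOI as-is
    "?email=",
    utils::URLencode(email, reserved = TRUE)  # encodes "@" as %40
  )
}

# One URL, and therefore one request, per DOI
urls <- vapply(
  c("10.1038/ng.3260", "10.1093/nar/gkr1047"),
  build_unpaywall_url,
  character(1),
  email = "name@example.org"
)
```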

Error: Timeout was reached

Hi- I was successfully using roadoi several months ago (thanks for the tool!) on large queries, using:

oa_out = roadoi::oadoi_fetch(dois = this_doi, email = "name@example.org")

But since I have come back to the project in the past two weeks, I have not been able to query more than a small number of DOIs at a time before getting the error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached

For example, to extract data for 700 DOIs, I had to restart about 20 times, successfully downloading data for 5-50 DOIs at a time before timing out.

Any idea what is going wrong? I would like to be able to query tens of thousands of DOIs in the near future. Thanks for your help.
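
One possible workaround for intermittent timeouts is to retry a failed call after a short wait. This is a minimal sketch under stated assumptions: with_retry() is a hypothetical helper, not a roadoi function, and the wait times are arbitrary.

```r
# Hypothetical helper (not part of roadoi): call f(...) up to
# `max_tries` times, waiting longer after each failure.
with_retry <- function(f, ..., max_tries = 3, wait = 1) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(f(...), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    # Wait 1x, 2x, 4x, ... the base wait before the next attempt
    if (i < max_tries) Sys.sleep(wait * 2^(i - 1))
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Usage (requires network access, so it is left commented out):
# oa_out <- with_retry(roadoi::oadoi_fetch,
#                      dois = this_doi, email = "name@example.org")
```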

enhancement: subtitles and output-type

For the compliance checking use case it's important to know the type of the output.
Some OA policies only apply to outputs of type journal-article whilst there might be slightly different compliance rules for a book-chapter, book, or monograph et cetera...

I know both the subtitle field and the type field are available via the CrossRef API.
Could both of these fields please be represented in the default output from oadoi_fetch?

Subtitle is useful because often some output titles are just "Introduction" and the subtitle contains the more informative bit... container.title might also be useful to report from oadoi_fetch, although perhaps it was a conscious decision to leave outputs without journal titles?

Or is it simply that the oadoi API does not return these fields? If so I should report this to them, because output type really does matter for this use case!

Question about stop condition in roadoi::oadoi_fetch

Hi, first off thanks for this tool:)
I was wondering if there's a particular reason for stopping the script when either a DOI is incorrect or oaDOI doesn't return results within the timeout. When the list of DOIs given as an argument to the function is large, the probability of getting no results at all increases, i.e. when the last DOI in the list is incorrect, the result of the function will not be available, even for the correct DOIs (same with timeouts).
Is there any reason for not using a tryCatch mechanism to 'remember' the erroneous DOIs? (or is it a feature not yet implemented? ;-)
Thanks,
Caspar Treijtel (Library of the University of Amsterdam)
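
The tryCatch idea could look roughly like the sketch below. It is not how roadoi is implemented: fetch_safely() and its fetch_one argument are hypothetical names, and fetch_one stands for any function that takes a single DOI.

```r
# Hypothetical wrapper (not part of roadoi): fetch DOIs one at a time
# and record failures instead of stopping at the first error.
# `fetch_one` is any function taking a single DOI, for example
#   function(doi) roadoi::oadoi_fetch(dois = doi, email = "name@example.org")
fetch_safely <- function(dois, fetch_one) {
  results <- list()
  ok <- character(0)
  failed <- character(0)
  for (doi in dois) {
    out <- tryCatch(fetch_one(doi), error = function(e) NULL)
    if (is.null(out)) {
      failed <- c(failed, doi)               # remember the erroneous DOI
    } else {
      results[[length(results) + 1]] <- out  # keep the successful result
      ok <- c(ok, doi)
    }
  }
  names(results) <- ok
  list(results = results, failed = failed)
}
```

With this shape, the results for the correct DOIs remain available, and the `failed` vector can be inspected or retried afterwards.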

Sys.sleep() in between API requests?

Let's assume that the API provider wants me to send requests one by one, i.e. not sending a new request before the previous one is taken care of. I wonder if a Sys.sleep() anywhere inside the oadoi_fetch_ function would do the trick? Or, would it need to be (if possible) in the "scope" of the plyr::llply function?

The reason why I'm asking is that I'm learning proper ways to deal with APIs, and your roadoi code acts as an excellent, well-built tutorial. In other words, my question does not refer to the Unpaywall API as such.

Thanks for any pointers!
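
A generic sketch of the pattern asked about, assuming a blocking per-DOI request function: the pause goes inside the function that is applied to each DOI, so it runs after one request has returned and before the next one starts. The names fetch_throttled() and fetch_one are hypothetical, and base lapply() stands in for plyr::llply().

```r
# Sketch only: pause after each request. `fetch_one` stands in for a
# per-DOI request function; lapply() stands in for plyr::llply().
fetch_throttled <- function(dois, fetch_one, pause = 1) {
  lapply(dois, function(doi) {
    res <- fetch_one(doi)  # blocks until this request has returned
    Sys.sleep(pause)       # then wait before the next one starts
    res
  })
}

# Toy usage with a stand-in function instead of a real API call
out <- fetch_throttled(c("a", "b"), toupper, pause = 0)
```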

Error : object ‘textAreaInput’ is not exported by 'namespace:shiny'

Apologies for my late start in getting around to reviewing your package!

I had trouble installing from CRAN; I'll move on to trying the development version after this.

> install.packages("roadoi")
Installing package into ‘/home/ross/R/x86_64-pc-linux-gnu-library/3.2’
(as ‘lib’ is unspecified)
--2017-06-12 07:29:55--  https://cran.rstudio.com/src/contrib/roadoi_0.2.tar.gz
Resolving cran.rstudio.com (cran.rstudio.com)... 52.85.67.12
Connecting to cran.rstudio.com (cran.rstudio.com)|52.85.67.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 559051 (546K) [application/x-gzip]
Saving to: ‘/tmp/RtmpAkRkEv/downloaded_packages/roadoi_0.2.tar.gz’

     0K .......... .......... .......... .......... ..........  9%  451K 1s
    50K .......... .......... .......... .......... .......... 18%  216K 2s
   100K .......... .......... .......... .......... .......... 27% 3.83M 1s
   150K .......... .......... .......... .......... .......... 36% 2.76M 1s
   200K .......... .......... .......... .......... .......... 45%  172K 1s
   250K .......... .......... .......... .......... .......... 54% 3.07M 1s
   300K .......... .......... .......... .......... .......... 64% 3.60M 0s
   350K .......... .......... .......... .......... .......... 73% 2.58M 0s
   400K .......... .......... .......... .......... .......... 82% 4.33M 0s
   450K .......... .......... .......... .......... .......... 91% 2.76M 0s
   500K .......... .......... .......... .......... .....     100% 4.61M=0.8s

2017-06-12 07:29:56 (728 KB/s) - ‘/tmp/RtmpAkRkEv/downloaded_packages/roadoi_0.2.tar.gz’ saved [559051/559051]

* installing *source* package ‘roadoi’ ...
** package ‘roadoi’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
Error : object ‘textAreaInput’ is not exported by 'namespace:shiny'
ERROR: lazy loading failed for package ‘roadoi’
* removing ‘/home/ross/R/x86_64-pc-linux-gnu-library/3.2/roadoi’
Warning in install.packages :
  installation of package ‘roadoi’ had non-zero exit status

The downloaded source packages are in
	‘/tmp/RtmpAkRkEv/downloaded_packages’
> sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.2.5
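
The error message suggests that the installed shiny version does not export textAreaInput. A quick way to check this before installing is sketched below; exports_symbol() is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: TRUE if an installed package exports `symbol`,
# FALSE if it does not or if the package is not installed.
exports_symbol <- function(pkg, symbol) {
  requireNamespace(pkg, quietly = TRUE) &&
    symbol %in% getNamespaceExports(pkg)
}

if (!exports_symbol("shiny", "textAreaInput")) {
  message("shiny is missing or does not export textAreaInput(); ",
          "try install.packages(\"shiny\") first")
}
```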

No DOI should result in empty list with same format

Not sure if this is the same as #19.

If I try to process an incomplete list of DOIs like:

DOIs <- c("10.1016/j.jbiotec.2010.07.030","10.1186/1471-2164-11-245","")
OAStatus <- oadoi_fetch(dois= DOIs,email = "name@example.org")

I got the following result:

Error: Columns doi, data_standard, is_oa, journal_issns, journal_name, publisher, title, updated must be 1d atomic vectors or lists
In addition: Warning message:
In is.na(req$journal_is_oa) :
is.na() applied to non-(list or vector) of type 'NULL'

I got the incomplete DOI list from a publication database, and I would like to merge the result easily with the existing data. There are certainly workarounds for the problem, but it would be nice if this worked out of the box.
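
One such workaround, sketched here as an illustration rather than as package behaviour, is to drop missing or empty DOIs before the request and keep a logical index so the results can be matched back to the original rows.

```r
# Hypothetical pre-processing step (not part of roadoi): drop missing
# or empty DOIs before the request.
dois <- c("10.1016/j.jbiotec.2010.07.030", "10.1186/1471-2164-11-245", "")

# TRUE where a usable DOI is present; keeps the original row positions
has_doi <- !is.na(dois) & nzchar(trimws(dois))
valid_dois <- dois[has_doi]

# The request itself needs network access, so it is left commented out:
# oa <- roadoi::oadoi_fetch(dois = valid_dois, email = "name@example.org")
```

A left join of the original data with the fetched result on the doi column would then keep the rows that have no DOI.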

add "email" to API call

Thanks so much for making this R wrapper, it's fantastic! Yay :)

When you get a chance, could you add "email" as a param to the call signature and your usage example, and email=EMAIL to the api request? Tracking the oadoi (via roadoi) usage allows us to get in touch if something breaks, and also means we can report stats to our funders, which helps us keep our non-profit afloat. :) Thanks!

what is "blue" open access?

I notice from the README that you have green, gold, and blue options for oa_color.

The oaDOI API seemingly only returns gold & green(?): https://oadoi.org/api

I'm guessing "blue" OA might be equivalent to when is_boai_license is TRUE but I couldn't find this explicitly stated anywhere. So at the very least some clarification is needed.

Personally I'm not a fan of the terms 'gold' and 'green' - they are oft misunderstood and aren't very fitting words for things they describe ('gold' often implies expensive, but it doesn't have to be). But gold and green are popular terms and thus are probably here to stay whether I like it or not.

The same can't be said for blue OA. I think very few people in the world know what blue OA is and thus I'd urge you not to use that term in this package. The functionality is great - keep that, it's good to know what subset of gold OA is BOAI-compliant. Just rename it BOAI-compliant OA or something like that?

Fix CRAN warnings

tidyr::unnest() fails because of the duplicated column name updated
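
For illustration, duplicated column names can be made unique with base R before unnesting. This is a generic sketch; it is not necessarily how the warning was fixed in the package, and the column vector below is made up for the example.

```r
# Sketch only: give duplicated column names a suffix so they are unique.
cols <- c("doi", "updated", "is_oa", "updated")
unique_cols <- make.unique(cols, sep = "_")
unique_cols  # the second "updated" becomes "updated_1"
```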
