ropensci-archive / refimpact

:no_entry: ARCHIVED :no_entry: API Wrapper for the UK REF 2014 Impact Case Studies Database

License: Other

R 100.00%

uk research-funding research-improvement directed-graph directed-graphs text-mining research-policy r rstats r-package

refimpact's Introduction

refimpact

This repository has been archived. The former README is now in README-NOT.md.

refimpact's Issues

Fails to build

The CI fails to build the package:

* checking for file 'refimpact/DESCRIPTION' ... OK
* preparing 'refimpact':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building 'refimpact.Rmd' using rmarkdown
refimpact: API wrapper for the UK REF 2014 Impact Case Studies Database.
Run ?refimpact for help, or see the vignette.
Quitting from lines 71-74 (refimpact.Rmd) 
Error: processing vignette 'refimpact.Rmd' failed with diagnostics:
incorrect number of dimensions
--- failed re-building 'refimpact.Rmd'

SUMMARY: processing the following file failed:
  'refimpact.Rmd'

Consider object_verb function naming pattern

This is minor, but the function names could follow the object_verb naming guidelines suggested in the rOpenSci Packaging Guide, for example casestudies_get, institutions_get, etc., or even ref_casestudies_get, ref_institutions_get, ... to make it clear that these all come from this package. This would make it more like the stringi example in the packaging guidelines. For such a short package, though, this is more a matter of preference than any sort of requirement. (But since rOpenSci is designed to keep the R jungle tidy, why not start with even the smallest vines?)

Expand ?ref_get with examples of each argument

Create the README.md file directly from a README.Rmd

A Code of Conduct as a community guideline should be added

devtools::use_code_of_conduct()

Consider how to make package backwards compatible

Address release/NEWS.md mismatch

NEWS.md: This file has no details besides declaring the "initial release", although there are two tagged releases in the GitHub repository. The release announcement in NEWS.md could link to the v1.0 release tag.

No CONTRIBUTING or CONDUCT.md files are present

May be a duplicate of #10 - need to see if this is an either/or

Review function and package documentation

This is where the package is weakest: the documentation is limited to the bare minimum for each function, and even that says little about the context of the options or inputs. For instance, get_case_studies() did not really explain what the ID is or why I might use it instead of a UKPRN. I suggest that this be vastly expanded in a Details section and added as an overview to refimpact-package.Rd. I had to do considerable testing and reading on the UK REF website before I understood what the inputs to the functions were.

Extend unit tests

For instance, the single test in test_tag_types.R is:

test_that("get_tag_types() returns a tibble", {
  # skip_on_cran()
  expect_equal(dim(get_tag_types()),c(13,2))
})

which not only fails to test whether the return value is a tibble, but also only matches dimensions known in advance. A more robust test might also compare the values returned to the known tags from the REF website, or check that the return value is in fact a tibble.
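As a rough sketch of the kind of structural check suggested here, the base R helper below reports what is wrong with a tag-types table instead of comparing hard-coded dimensions. The helper name tag_type_problems and the expected column names ID and Name are assumptions made for illustration; they are not taken from refimpact or the REF API. In a real test suite the same checks could be wrapped in testthat expectations.

```r
# Hypothetical helper (not part of refimpact): list structural problems with a
# tag-types table. An empty character vector means every check passed.
# The default column names are an assumption, not confirmed against the API.
tag_type_problems <- function(x, expected_cols = c("ID", "Name")) {
  problems <- character(0)
  if (!inherits(x, "tbl_df")) {
    problems <- c(problems, "not a tibble")
  }
  if (!identical(names(x), expected_cols)) {
    problems <- c(problems, "unexpected column names")
  }
  if (NROW(x) == 0) {
    problems <- c(problems, "no rows")
  }
  problems
}

# Toy input: a plain data.frame with the assumed columns, so only the
# tibble check should be reported.
toy <- data.frame(ID = 1:2, Name = c("a", "b"))
toy_problems <- tag_type_problems(toy)
```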

Investigate 'Namespace in Imports field not imported from: ‘curl’'

Functionality needed in many functions, i.e. API calls and validation, could be separated and stored in a utils.R file.

Use httr to work with and manipulate http requests

Think about whether hybrid data/API makes sense

As someone who was actually an individual submitted in this exercise (although not for a case study) from one of the listed institutions, and who was personally involved in managing the staff submitted from my institution, my first impression was that this would be a really cool package to have for accessing this data. However, the more I experimented with the package, the more I wondered why the API approach was needed. The data is static, so the only reasons not to package the data are copyright and size. Most of the information (except some of the case studies - see http://impact.ref.ac.uk/CaseStudies/Terms.aspx) is governed by a CC license, and so could easily be packaged as data. The only objection on size applies to the case studies themselves, but again, if the documentation or README.md said more about the motivation, I would have a better idea of just how large this is (and whether its size rules out simply providing it as one large flattened data.frame or "tibble").

The following static tables from the API are CC licensed and could easily be packaged as built-in objects:

institutions: a 155 x 5 data.frame, 20.8k in size
units_of_assessment: 36 x 3
tag_types: 13 x 2
values: This is much larger, but the entire table could be flattened in a way that links to tag_types, if we are willing to strongly suspend the principles of relational data normalization (something most users may not know or care about).

This seems to gut the functions from the package, since it leaves only get_case_studies(), which might be appropriately handled through an API call. But here I suggest the package could really add value with data-handling functions that link the static data objects to the structure of what get_case_studies() returns, such as ways to flatten the lists that are elements of the return objects from that function. For instance, the return object from get_case_studies(ID = c(27,29)) is a 2 x 19 tibble, but several of those columns (e.g. Continent) are variable-length lists. Many users who are not experts in dissecting R objects are going to have trouble with the nesting of lists within data.frames.
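To make the flattening idea concrete, here is a minimal base R sketch that expands one list column so that each element gets its own row. The toy data frame stands in for the result of get_case_studies(); its column names and values are made up, and the helper unnest_column is hypothetical. It assumes the list elements are plain atomic vectors, which may not match what the API actually returns.

```r
# Toy stand-in for a get_case_studies() result; names and values are invented.
studies <- data.frame(CaseStudyId = c(27L, 29L), stringsAsFactors = FALSE)
studies$Region <- list(c("Europe", "Asia"), "Europe")

# Hypothetical helper: expand one list column into one row per element.
# Assumes each element is an atomic vector; rows whose element has length
# zero are dropped.
unnest_column <- function(df, col) {
  lens <- lengths(df[[col]])
  keep <- setdiff(names(df), col)
  out <- df[rep(seq_len(nrow(df)), lens), keep, drop = FALSE]
  out[[col]] <- unlist(df[[col]], use.names = FALSE)
  rownames(out) <- NULL
  out
}

flat <- unnest_column(studies, "Region")
```

The result is an ordinary data.frame with repeated identifiers, which is easier to filter or join against the static tables than a nested list column.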

In addition, by having the smaller objects as built-in data, the inputs to get_case_studies() can be checked for valid values, rather than relying on the API to reject a non-existent ID, for instance.
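A minimal sketch of that kind of up-front validation is shown below. The table units_of_assessment is a placeholder for a built-in data object, and the assumption that its IDs run from 1 to 36 is illustrative only; validate_uoa is a hypothetical helper, not a function in refimpact.

```r
# Placeholder for a built-in data object; the real IDs would come from the
# packaged units-of-assessment table (1:36 is an assumption).
units_of_assessment <- data.frame(ID = 1:36)

# Hypothetical helper: stop early with a clear message when an ID is not in
# the built-in table, instead of waiting for the API to reject the request.
validate_uoa <- function(uoa, valid = units_of_assessment$ID) {
  bad <- setdiff(uoa, valid)
  if (length(bad) > 0) {
    stop("Unknown unit of assessment ID(s): ",
         paste(bad, collapse = ", "), call. = FALSE)
  }
  invisible(uoa)
}

ok <- validate_uoa(c(5, 12))
bad_call <- tryCatch(validate_uoa(c(5, 99)), error = function(e) conditionMessage(e))
```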

Look at shortening the "get all data" example in the vignette

devtools::install_github("perrystephenson/refimpact")
uoa_table <- refimpact::ref_get("ListUnitsOfAssessment")
tt <- lapply(uoa_table$ID, function(x) refimpact::ref_get("SearchCaseStudies", query = list(UoA = x)))

Helper functions to reshape dataset

The Maintainer field is missing in DESCRIPTION

Write vignette with some sort of text mining analysis

Run goodpractice::gp() on package and fix issues

Repo topics to review

👋 @perrystephenson

I've added GitHub repo topics to increase its discoverability

Please review and tweak until you're happy! 😺

Use character vectors as input for get_case_studies()

Investigate whether I can use donttest instead of dontrun in example code

Add more detail to the ?refimpact help docs

Appveyor builds and webhooks

👋 @perrystephenson!

I'm currently looking for "broken" Appveyor hooks across all ropensci repos. refimpact's hook is broken (see https://github.com/ropensci/refimpact/settings/hooks/10055678): the latest deliveries failed, so your latest commits weren't built on Appveyor.

I see builds are under your account, which is fine. You'll need to contact Appveyor at [email protected] to ask them (how) to fix the webhook. I did that for other repos; they'll want to know this about the latest delivery:

X-GitHub-Delivery: 7b312b20-7266-11e7-8a00-03912236c505
Time 2017-07-27 02:56:53

Thanks!
