pakillo / template

A template for data analysis projects structured as R packages (or not)

Home Page: http://pakillo.github.io/template/

R 100.00%
r workflow project-template research-compendium

template package

Generic template for data analysis projects structured as R packages

The template package automates creation of new projects with all the necessary scaffolding: different folders for data, scripts, and functions, plus (optionally) additional files required for an R package structure. It can simultaneously create and synchronise a new repository on GitHub so one can start working immediately.

template can create projects with or without an R package structure. Structuring data analysis projects as R packages (a.k.a. “research compendia”) can bring some advantages (e.g. see this blogpost, this repo, these and these slides or read Marwick et al.). But there are also good reasons why an R package structure may not always be needed or convenient.

Installation

# install.packages("remotes")
remotes::install_github("Pakillo/template")

Usage

First, load the package:

library("template")

Now run the function new_project to create a directory with all the scaffolding (slightly modified from R package structure). For example, to start a new project about tree growth, just use:

new_project("treegrowth")

This will create a new RStudio project with folders such as data-raw, data, analyses, manuscript and R (see "Developing the project" below).

You can create a GitHub repository for the project at the same time:

new_project("treegrowth", github = TRUE, private.repo = FALSE)

You can choose either a public or a private repository. Note that to create a GitHub repo you will need to have configured your system as explained in https://usethis.r-lib.org/articles/articles/usethis-setup.html.

There are other options you can choose, like setting up testthat or continuous integration (Travis-CI, GitHub Actions…), or skipping the R package structure altogether. See ?new_project for all options.

Developing the project

  1. Now edit README.Rmd and the DESCRIPTION file with some basic information about your project: title, brief description, licence, package dependencies, etc.

  2. Place original (raw) data in the data-raw folder. Save all R scripts (or Rmarkdown documents) used for data preparation in the same folder.

  3. Save final (clean, tidy) datasets in the data folder. You may write documentation for these data.

  4. R scripts or Rmarkdown documents used for data analyses may be placed in the analyses folder. The final manuscript/report may be placed in the manuscript folder. You could use one of the many Rmarkdown templates available out there (e.g. rticles, rrtools or rmdTemplates).

  5. If you write custom functions, place them in the R folder. Document all your functions with Roxygen. Write tests for your functions and place them in the tests folder.

  6. If your analysis uses functions from other CRAN packages, include these as dependencies (Imports) in the DESCRIPTION file (e.g. using usethis::use_package() or rrtools::add_dependencies_to_description()). Also, use Roxygen @import or @importFrom in your function definitions, or alternatively package::function(), to import these dependencies in the namespace.

  7. I recommend using an advanced tool like targets to manage your project workflow. A simpler alternative might be writing a makefile or master script to organise and execute all parts of the analysis. A template makefile is included with this package (use makefile = TRUE when calling new_project).

  8. Render Rmarkdown reports using rmarkdown::render, and use the RStudio Build menu to create/update documentation, run tests, build the package, etc.

  9. Record the exact dependencies of your project. One option is simply running sessionInfo(), but many more sophisticated alternatives exist. For example, automagic::make_deps_file() or renv::snapshot() will create a file recording the exact versions of all packages used, which can be used to recreate that environment in the future or on another computer. If you want to use Docker, you could use e.g. containerit::dockerfile() or rrtools::use_dockerfile().

  10. Archive your repository (e.g. in Zenodo), get a DOI, and include citation information in your README.
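As a rough illustration of steps 7 and 9, a master script could look something like the sketch below. It is not the makefile shipped with the package. The function names run_scripts() and run_project(), the file name session_info.txt, and the convention of prefixing script names so that alphabetical order equals execution order (e.g. data-raw/01_clean.R) are all assumptions made for this example. It uses base R only.

```r
# Hypothetical master script (e.g. saved in the project root).
# Assumption: script names are prefixed (01_, 02_, ...) so that
# alphabetical order is the intended execution order.

run_scripts <- function(dir) {
  # List the R scripts in `dir`; returns character(0) if the folder is missing
  scripts <- sort(list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE))
  for (script in scripts) {
    message("Running ", script)
    # Evaluate each script in a fresh environment so objects do not leak
    # from one script into the next
    source(script, local = new.env())
  }
  invisible(scripts)
}

run_project <- function(root = ".") {
  # Work from the project root so relative paths inside scripts resolve
  old_wd <- setwd(root)
  on.exit(setwd(old_wd), add = TRUE)

  # Data preparation first, then the analyses
  ran <- c(run_scripts("data-raw"), run_scripts("analyses"))

  # Record the R version and loaded package versions next to the results
  writeLines(capture.output(sessionInfo()), "session_info.txt")

  invisible(ran)
}
```

Calling run_project() from the project root would then run everything in order and leave a plain-text record of the session. Rmarkdown reports could be handled in the same loop by calling rmarkdown::render on each document instead of source. For anything beyond a handful of scripts, a dedicated tool such as targets is probably the better choice, since this sketch reruns every step each time.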

template's Issues

example?

I really like this template and have learned a lot by studying its organisation.

Do you have an example of this template being used for a real-world report or publication?

Consider the drake R package?

From the README:

Write a makefile or master script to organise and execute all parts of the analysis.

The drake R package is like Make, but for R-focused data science projects. It is not in the reproducibility guide you mentioned, but it is part of rOpenSci.

package versions

Hello,

This is a really good idea (i.e. using R packages to facilitate reproducibility) and I have been doing this for some time. I would like to suggest an additional step: being able to replicate the R version and package versions, since changes in package code/functions over time can break your analysis code. Ideally we would be able to set an R version and package versions in the DESCRIPTION file. I remember some rather recent discussion about this on the Bioconductor Developer mailing list but, as far as I understood, this is not possible at the moment (probably because R packages, although a practical solution, were not designed for the purpose we want to use them for). I have recently resolved this using the liftr package and Docker. I made a 'quick and dirty' example at reproducibleAnalysis for you to have a look at if you would like. I don't know if this is the best solution, but it may be a starting point to work from. I'd also be glad to hear any comments you might have!

Kind Regards,
Jason

add makefile

makefile.R for now, just rendering Rmarkdown documents in the appropriate order, running R scripts, or calling functions (e.g. read data, make_figures, etc.)
