erikaduan / abs_labour_force_report

A minimal viable example of an R report generation workflow and CI/CD pipeline using ABS labour force open data

License: Creative Commons Zero v1.0 Universal

R 100.00%
r cicd github-actions

abs_labour_force_report's Introduction

ABS labour force report automation

This repository contains a minimal viable example of an R data visualisation and report generation workflow using ABS labour force open data.

The contents of this repository have been created to support the Automating R Markdown report generation - Part 2 tutorial in my r_tips repository.

Rmd tips

  • As referenced in this GitHub issue, rmarkdown::render() does not currently handle paths well: the output_dir argument causes rendered figures to be written with absolute paths. A workaround is to render inside ./code using xfun::in_dir("code", ...) and then move the outputs into ./output.
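
The workaround above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the helper names `render_report()` and `move_output()`, the default folder names, and the choice to move only the rendered file are all assumptions. It relies on `xfun::in_dir()` to change the working directory temporarily and on `rmarkdown::render()` returning the path of the rendered file.

```r
# Sketch only: helper names and folder layout are assumptions.

# Move a rendered file into `output_dir`, creating the folder if needed.
move_output <- function(file, output_dir) {
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  target <- file.path(output_dir, basename(file))
  # Copy then delete, because file.rename() can fail across file systems
  if (!file.copy(file, target, overwrite = TRUE)) {
    stop("Could not copy ", file, " to ", target, call. = FALSE)
  }
  unlink(file)
  invisible(target)
}

# Render `rmd_file` from inside `code_dir`, then move the result.
render_report <- function(rmd_file, code_dir = "code", output_dir = "output") {
  # The expression is evaluated with `code_dir` as the working directory,
  # so output_dir is never passed to render().
  rendered <- xfun::in_dir(
    code_dir,
    normalizePath(rmarkdown::render(rmd_file, quiet = TRUE))
  )
  move_output(rendered, output_dir)
}

# Example call (hypothetical file name):
# render_report("report.Rmd")
```

Copying and then deleting the file, rather than renaming it, is a defensive choice for cases where the two folders sit on different file systems.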

CI/CD automation tips

  • Use renv to manage package versions and commit your renv.lock file to your repository. The renv package automatically creates a second .gitignore file in the project's renv folder, which prevents the private project library renv/library from being committed.

  • Load the minimum set of packages required. For example, load dplyr instead of tidyverse if you are only performing simple data transformations, and avoid using pacman::p_load().

  • The renv package uses static analysis to determine which packages are used, i.e. it scans your code for calls to library(pkg), require(pkg) or pkg::. Because renv cannot detect packages loaded indirectly, avoid loading packages with lapply(packages, library, character.only = TRUE) as described here.

    # Recommended due to renv static analysis approach 
    library("here")  
    library("readr")  
    
    # Also recommended for extra code reproducibility
    here::here(...)
    readr::read_csv(...)
    
    # Not recommended 
    packages <- c("here", "readr")
    invisible(lapply(packages, library, character.only = TRUE))
    
  • Pandoc is a standalone tool, not an R package, and it is not bundled with the rmarkdown package (RStudio ships its own copy of pandoc). The required version of pandoc therefore needs to be specified explicitly in the YAML pipeline.

    steps:
      # Checks out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2
    
      # Sets up pandoc which is required for knitting HTML reports  
      - uses: r-lib/actions/setup-pandoc@v2
        with:
          pandoc-version: '2.17.1' 
    
  • R first needs to be set up in the virtual environment that runs the job.

    steps:
      - name: Setup R version 4.1.2
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: '4.1.2' 
    
  • The template CI/CD code for using renv to install R package dependencies is found here, based on a GitHub actions renv cache issue recorded here.

    env:
        RENV_PATHS_ROOT: ~/.local/share/renv
    
    steps:
      # Set up R packages cache for workflow reruns 
      - name: Cache R packages
        uses: actions/cache@v1
        with:
           path: ${{ env.RENV_PATHS_ROOT }}
           key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
           restore-keys: |-
              ${{ runner.os }}-renv-
    
      # Install the libcurl development files, typically needed to build the curl R package
      - run: sudo apt-get install -y --no-install-recommends libcurl4-openssl-dev
    
      # Install renv and project specific R packages 
      - name: Restore R packages
        shell: Rscript {0}
        run: |
          if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
          renv::restore()
    
  • Write self-contained scripts. Each script should load the packages it needs itself, rather than relying on one separate script that loads all R libraries. This minimises errors when one job cannot access the outputs of another job.

  • I personally prefer running scripts as separate steps, for better job progress monitoring.

      # Execute R scripts
      - name: Extract data from ABS labour force data API
        run: Rscript code/01_extract_data.R
    
      - name: Clean raw labour force data
        run: Rscript code/02_clean_data.R  
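
As an illustration of the self-contained script tip, a cleaning step might look something like the sketch below. The function name `clean_labour_force()`, the file paths and the transformation are hypothetical and are not taken from this repository. The sketch uses only base R with namespace-qualified calls, reads its input from disk instead of relying on objects created by another script, and stops with a clear message if the previous step's output is missing.

```r
# Hypothetical sketch of a self-contained cleaning step.
# The function name, paths and transformation are illustrative assumptions.

clean_labour_force <- function(raw_path, clean_path) {
  # Fail early with a clear message if the previous step's output is missing
  if (!file.exists(raw_path)) {
    stop("Input file not found: ", raw_path, call. = FALSE)
  }

  # Read the input from disk rather than from objects left by another script
  raw <- utils::read.csv(raw_path, stringsAsFactors = FALSE)

  # Example transformation: drop rows that contain missing values
  clean <- raw[stats::complete.cases(raw), , drop = FALSE]

  # Write the result to disk so that the next step can pick it up
  dir.create(dirname(clean_path), showWarnings = FALSE, recursive = TRUE)
  utils::write.csv(clean, clean_path, row.names = FALSE)
  invisible(clean)
}

# Example call (hypothetical paths):
# clean_labour_force("data/raw_labour_force.csv", "data/clean_labour_force.csv")
```

Because each step exchanges data through files, a failed step surfaces as its own failed entry in the workflow log, which fits the preference for running scripts as separate steps.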
    
