
This project forked from rcc-uchicago/r-large-scale


Materials for RCC workshop, "Large-scale data analysis in R."

License: Other

R 42.62% Shell 6.20% Python 33.71% Makefile 12.45% C++ 5.03%


Large-scale data analysis in R

The R computing environment has become an important tool for quantitative research, from computational biology to financial modeling. In this hands-on workshop, we will explore commonly used strategies to efficiently analyze large-scale data sets in R. Participants will learn to automate their R analyses on a compute cluster, profile memory usage, call fast C++ routines in R, and implement simple parallelization strategies, including multithreaded and distributed computing. The aim is to learn these techniques through hands-on "live coding"; we will analyze several medium to large-scale data sets.

Objectives: Attendees will (1) learn how to automate R analyses on a compute cluster; (2) use simple techniques to profile memory usage in R; (3) learn how to make more effective use of memory in R; (4) use multithreading to speed up R computations; (5) learn how to call C++ code from R using Rcpp; (6) write scripts to distribute "embarrassingly parallel" R computations using the Slurm job scheduler on the RCC Midway compute cluster; (7) learn through "live coding."
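To give a flavor of two of the techniques listed above (checking memory usage and simple parallelization), here is a minimal sketch. It is not taken from the workshop materials: the simulated matrix `X`, the chunk count, and the choice of `parallel::mclapply` (which runs forked processes rather than threads) are all assumptions made for illustration.

```r
# Illustrative sketch only; not part of the workshop materials.
library(parallel)

# Simulate a moderately large numeric matrix
# (1000 x 1000 doubles, roughly 8 million bytes).
set.seed(1)
n <- 1000
X <- matrix(rnorm(n * n), n, n)

# Report how much memory the matrix occupies.
print(object.size(X), units = "MB")

# Split the column indices into 4 contiguous chunks.
chunks <- splitIndices(ncol(X), 4)

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
ncores <- if (.Platform$OS.type == "windows") 1 else 2

# Compute the column means of each chunk in parallel, then combine.
out <- mclapply(chunks,
                function(j) colMeans(X[, j, drop = FALSE]),
                mc.cores = ncores)
means <- unlist(out)

# Check the parallel result against the serial computation.
stopifnot(isTRUE(all.equal(means, colMeans(X))))
```

For a computation this small, the overhead of forking will likely outweigh any speedup; the point is only to show the split, apply, and combine pattern that larger analyses can reuse.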

Prerequisites

All participants are expected to bring a laptop with a Mac, Linux or Windows operating system. Further, participants should be comfortable interacting with the UNIX shell and programming in a non-graphical R environment (not RStudio). An RCC user account is recommended, but not required.

What's included

This git repository (the "workshop packet") includes:

  • README.md: This file.

  • conduct.md: Code of Conduct.

  • LICENSE.md: License information for the materials in this repository.

  • slides.pdf: The slides for the workshop.

  • slides.Rmd: R Markdown source used to generate these slides.

  • Makefile: GNU Makefile containing commands to generate the slides from the R Markdown source.

Other information

Credits

These materials were developed by Peter Carbonetto at the University of Chicago. Thank you to Matthew Stephens for his support and guidance. Thanks also to Gao Wang for sharing the Python script for profiling memory usage.
