stufield / git-staa-577

Slides, code, cheat sheets, and RStudio lab notebooks for the "Applied Machine Learning" course, Spring 2019


GitHub Repository for STAA 577

Overview

RStudio lab notebooks, full R code, cheat sheets, resources, and ad hoc notes from the “Applied Machine Learning” course, Spring 2019.


Why use GitHub?

We have decided to place the course materials in a GitHub repository:

  1. to familiarize you with this widely used collaborative coding tool
  2. so that you will have access to them beyond your tenure at CSU when you venture into the official job market.

Jenny Bryan and Jim Hester summarize the benefits of GitHub in a fantastic reference. If you ever plan to use version control with GitHub, I strongly recommend reading it in detail.


Course Lab Content

  • Intro Labs
    • Lab 00: Basic Exploring
    • Lab 01: Subsetting (data frames)
    • Lab 02: Data Wrangling with dplyr and the tidyverse
    • Lab 03: Skipped to synchronize the course with the ISLR textbook
  • Lab 04: Classification
    • The S&P Stock Market Data Set
    • Logistic Regression
    • Discriminant Analysis
    • KNN: K-Nearest Neighbors
  • Lab 05: Cross Validation
    • The Auto Data Set
    • Cross Validation (by hand)
    • LOOCV (leave-one-out)
    • K-fold CV
    • The Bootstrap
  • Lab 06: Subset Selection
    • The Hitters Data Set
    • Subset Selection
    • Shrinkage Methods: Ridge Regression
    • Shrinkage Methods: The Lasso
  • Lab 07: Beyond Linearity
    • The Wage Data Set
    • Polynomial Regression
    • Polynomial Logistic Regression
    • Spline Regression
    • Generalized Additive Models
  • Lab 08: Tree-based Methods
    • The Carseats Data Set
    • Classification Trees
    • Regression Trees
    • Bagging
      • Random Forest
    • Boosting
    • Appendices
    • Resources
  • Lab 09: Support Vector Machines
    • Create training data
    • Support Vector Classifier
    • Support Vector Machine
    • ROC curves
  • Lab 10: Unsupervised Learning
    • Principal Component Analysis (PCA)
    • K-means Clustering
    • Hierarchical Clustering

Datasets for STAA 577

  • nycflights13
    • New York City airport flight data from 2013 (must install)
    • install with install.packages("nycflights13", repos="http://cran.rstudio.com")
  • iris
    • classic iris flower data set from Fisher (comes with R installed)
  • mtcars
    • canonical Motor Trend (USA) car data set (comes with R installed)
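
As a minimal sketch of how these data sets can be loaded: iris and mtcars ship with base R, while the flights data lives in the CRAN package nycflights13 and is only available after installation. The guard with requireNamespace() and the object name flights are illustrative choices here, not part of the course materials.

```r
# Built-in data sets need no installation.
data(iris)
data(mtcars)

dim(iris)    # 150 rows, 5 columns
dim(mtcars)  # 32 rows, 11 columns

# The flights data requires the nycflights13 package.
# Load it only if the package is installed, so the script
# still runs on a fresh R installation.
if (requireNamespace("nycflights13", quietly = TRUE)) {
  flights <- nycflights13::flights
  print(dim(flights))
} else {
  message("nycflights13 is not installed; see the install command above.")
}
```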

Cheatsheets

Previewing HTML on GitHub

  • Fairly useful tool to preview HTML docs without having to clone the repository
  • Right-click the *.html file, copy the link, then go here and paste the GitHub-specific HTML link

Sad But True

Stu’s Looping Rules for R

  1. Always use a vectorized solution over iteration when possible, otherwise … go to #2.
  2. Use a functional, usually one of the apply() family or a loop-wrapper function, because R is a functional language and functionals are more readable, unless …
    • modifying in place: if you are modifying or transforming certain subsets (columns) of a data frame.
    • recursive problems: whenever an iteration depends on the previous iteration, a loop is better suited because each call made by a functional runs in its own scope and cannot readily modify variables outside it, such as the result of the previous iteration.
    • while loops: in problems where it is unknown how many iterations will be performed, while-loops are well suited and preferred over a functional.
  3. If you must use a loop, ensure the following:
    • Initialize new objects: prior to the loop, allocate the necessary space ahead of time. Do NOT “grow” a vector on-the-fly within a loop (this is terribly slow).
    • Optimize operations: do NOT perform operations inside the loop that could be done either up front or applied in a vectorized fashion following the loop. Enter the loop, do the bare minimum, then get out.
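
The rules above can be sketched on a toy problem. This is an illustrative example only; the vector x and the object names are made up for the demonstration and do not come from the course labs. It shows the same computation done vectorized (rule 1), with a functional (rule 2), and with a pre-allocated loop (rule 3), followed by a recursive case where a loop is the natural fit.

```r
x <- c(1, 4, 9, 16)

# Rule 1: prefer the vectorized solution.
roots_vec <- sqrt(x)

# Rule 2: otherwise use a functional; vapply() also checks
# that every result is a single numeric value.
roots_fun <- vapply(x, function(v) sqrt(v), numeric(1))

# Rule 3: if a loop is required, allocate the result up front
# instead of growing it inside the loop.
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Recursive problem: each value depends on the previous two,
# so a pre-allocated loop is well suited.
n <- 10
fib <- numeric(n)
fib[1:2] <- 1
for (i in 3:n) {
  fib[i] <- fib[i - 1] + fib[i - 2]
}
```

All three versions of the square-root computation give the same result; they differ in speed and readability as the input grows.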

Hadley Wickham Links

Jenny Bryan’s Links

Max Kuhn’s Links

Modeling Framework (thx Max Kuhn)

Memory Usage and rsample:

The rsample package is smarter than you might think.

Vignettes

What is the Tidyverse?

Information about the:


Created on 2019-01-27 by Rmarkdown (v1.11) and R version 3.5.2 (2018-12-20).
