
dlab-berkeley / r-machine-learning-legacy


D-Lab's 6-hour introduction to machine learning in R. Learn the fundamentals of machine learning, regression, and classification, using tidymodels in R.

License: Other

R 100.00%
machine-learning r data-science tidymodels classification regression rsample

r-machine-learning-legacy's Issues

Simplify and update renv package installation process

The renv setup can be trimmed to remove unnecessary packages (speeding up installation and reinstallation), and the instructions replaced with an exact step-by-step process for participants to follow. Some participants struggled with package installation as it is currently written.

https://github.com/dlab-berkeley/R-Machine-Learning/blob/ee158d8506d68e72498a36d7484a403ed5b4b506/README.md?plain=1#L43-L75

Also, install_github() should be replaced with the standard install.packages() where possible. For example, remotes::install_github("tidymodels/discrim") can simply be install.packages("discrim"), now that discrim v1.0.0 has been released on CRAN.

The current instructions also caused problems because they assume participants have already run install.packages("remotes"), which is never stated explicitly. If there's a good reason to keep any install_github() calls, add that step.
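A minimal sketch of the simplified steps, assuming CRAN is reachable through the cloud.r-project.org mirror. The `owner/repo` string is a hypothetical placeholder, not a package this workshop actually uses:

```r
cran <- "https://cloud.r-project.org"

# Install discrim from CRAN instead of GitHub (v1.0.0 is on CRAN)
if (!requireNamespace("discrim", quietly = TRUE)) {
  install.packages("discrim", repos = cran)
}

# Only needed if some package must still come from GitHub:
# install remotes explicitly first so install_github() is available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes", repos = cran)
}
# remotes::install_github("owner/repo")  # hypothetical placeholder
```

Guarding each call with requireNamespace() keeps re-runs fast, since already-installed packages are skipped.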

Add datahub and binder links and icons

@pssachdeva I just noticed this repo is missing Binder links in the README: https://github.com/dlab-berkeley/R-Machine-Learning/blob/ee158d8506d68e72498a36d7484a403ed5b4b506/README.md?plain=1#L81-L87

Could you add and test them in time for today's 10am workshop? We have folks from partner organizations who may need a Binder version.

Also, it would be great if you could add icon buttons like the ones in R Fundamentals.

Thanks!

Clarify installation workflow

From the instructions, it is unclear which step is supposed to be performed first:

  1. The first line of Part 1: renv::init()
  2. Step 5 of the installation instructions: install.packages(c("tidyverse", "tidymodels", "here", "pROC", "glmnet", "ranger", "rpart", "xgboost", "rpart.plot", "doParallel", "palmerpenguins", "ISLR2", "klaR", "stacks"))

04.Decision_Trees.RMD `makeCluster()` function error

When running the Part 4 .Rmd locally, I get an error at line 197:

```r
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)
```

I believe these lines set up parallel processing to speed up heavier computation? I've dug into the docs, but the arguments get deep into computer science, beyond my scope at this time. The rest of the workflow still runs without these lines, which is what I've done so far.
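The two lines above start a cluster of background R worker processes with parallel::makeCluster() and register it as the foreach parallel backend via doParallel. A hedged sketch that falls back to sequential execution when the workers fail to connect (as in the makePSOCKcluster error reported in another issue) might look like this. The tryCatch() fallback and the toy foreach loop are assumptions of this sketch, not the workshop's code:

```r
library(parallel)
library(doParallel)  # attaches foreach as well
library(foreach)

# Leave one core free; detectCores() can return NA on some platforms
n_cores <- parallel::detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# Try to start a PSOCK cluster; return NULL instead of stopping on failure
cl <- tryCatch(
  parallel::makeCluster(n_workers),
  error = function(e) {
    message("Cluster setup failed, running sequentially: ", conditionMessage(e))
    NULL
  }
)

if (is.null(cl)) {
  foreach::registerDoSEQ()            # sequential backend
} else {
  doParallel::registerDoParallel(cl)  # parallel backend
}

# Toy loop that runs on whichever backend was registered
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

# Shut the workers down when finished and reset to sequential
if (!is.null(cl)) {
  parallel::stopCluster(cl)
  foreach::registerDoSEQ()
}
```

Because the fallback only changes the registered backend, downstream code runs the same way either way, just more slowly without workers.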

Binder link option doesn't give an RStudio environment to run code

In the installation instructions, this step:

If you don't have a Berkeley CalNet ID, you can still run these lessons in the cloud by clicking this button: Binder_link_here

This leads you to a Jupyter environment that cannot run code in the .Rmd lesson files.

I suggest we remove this option.

Use `renv` to ensure that attendees have the required packages

While the README contains instructions for installing the packages needed for the workshop, it might be useful to add an renv setup so that users can simply run renv::init() at the beginning of the first notebook to make sure all necessary packages are installed.
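One common renv workflow, sketched here under the assumption that the repository commits an renv.lock file (not confirmed for this repo), is for maintainers to record package versions with renv::snapshot() and for participants to install them with renv::restore():

```r
# Maintainer side (run once in the repo, then commit renv.lock):
# renv::init()      # create the project library and renv.lock
# renv::snapshot()  # update renv.lock after adding or removing packages

# Participant side, e.g. the first chunk of the first notebook:
if (!requireNamespace("renv", quietly = TRUE)) {
  install.packages("renv", repos = "https://cloud.r-project.org")
}
if (file.exists("renv.lock")) {
  renv::restore(prompt = FALSE)  # install the versions recorded in renv.lock
}
```

Keeping the lockfile small (only packages the notebooks load) also addresses the slow (re-)installation mentioned in another issue.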

Datahub-Rstudio package loading issues

When I open this workshop on DataHub via nbgitpuller and run the first line:

renv::init()

I get the error message:

```
> knitr::opts_chunk$set(echo = TRUE)
Error in loadNamespace(x) : there is no package called ‘knitr’
```

When I try to install knitr manually, it takes a very long time.

Error producing metrics and confusion plot

In lines 251-256 of 04-decision-trees.rmd, these two plots throw errors. I think it's because tree_fit_viz_metr and tree_fit_viz_mat are not defined yet. Here is the relevant code:

```r
# Metrics
(tree_fit_viz_metr + labs(title = "Non-tuned")) / (visualize_class_eval(tree_fit_tuned) + labs(title = "Tuned"))

# Confusion matrix
(tree_fit_viz_mat + labs(title = "Non-tuned")) / (visualize_class_conf(tree_fit_tuned) + labs(title = "Tuned"))
```

Solutions for 04_regularization

Solutions for 04_regularization should include:

```r
library(tidymodels)  # provides %>%, filter(), initial_split(), training(), testing()

# Import data, dropping rows with a missing bill length
penguins <- palmerpenguins::penguins %>%
  filter(!is.na(bill_length_mm))

# Set seed
set.seed(23)

# Perform split
penguin_split <- penguins %>% initial_split(prop = 0.80)
penguins_train <- training(penguin_split)
penguins_test <- testing(penguin_split)
```

Participants will run into an error if missing (NA) values in bill_length_mm are not dropped.
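A quick check of how many rows the filter removes, and that no missing bill lengths remain, is sketched below. It uses only dplyr and palmerpenguins; the names n_missing and penguins_clean are illustrative:

```r
library(dplyr)

penguins_raw <- palmerpenguins::penguins

# Count rows with a missing bill length before filtering
n_missing <- sum(is.na(penguins_raw$bill_length_mm))

penguins_clean <- penguins_raw %>%
  filter(!is.na(bill_length_mm))

# The filtered data should lose exactly n_missing rows and contain no NAs
stopifnot(
  nrow(penguins_clean) == nrow(penguins_raw) - n_missing,
  !anyNA(penguins_clean$bill_length_mm)
)
```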

Workshop Title

The workshop title should be "R-Introduction-to-Machine-Learning-with-tidymodels".

Stacks still broken

Error message:

```
Error: The inputted candidates argument was not generated with the appropriate control settings. Please see ?control_stack
```
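As the message suggests, add_candidates() expects resampling or tuning results created with one of the stacks control functions (control_stack_grid(), control_stack_resamples(), control_stack_bayes()), which save the out-of-sample predictions and the workflow. A minimal sketch, assuming a toy regression on palmerpenguins with a linear model and an rpart decision tree as candidates (not the workshop's actual models); it needs rpart and glmnet installed:

```r
library(tidymodels)
library(stacks)

set.seed(23)
# Toy data: drop rows with any missing values
penguins <- palmerpenguins::penguins %>% drop_na()
folds <- vfold_cv(penguins, v = 5)

rec <- recipe(body_mass_g ~ bill_length_mm + flipper_length_mm, data = penguins)

# Candidate 1: linear regression, resampled with stack-ready control
lm_res <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit_resamples(resamples = folds, control = control_stack_resamples())

# Candidate 2: decision tree tuned over cost_complexity, stack-ready control
tree_spec <- decision_tree(cost_complexity = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")

tree_res <- workflow() %>%
  add_recipe(rec) %>%
  add_model(tree_spec) %>%
  tune_grid(resamples = folds, grid = 3, control = control_stack_grid())

# Build the data stack, fit the blending model, then fit the kept members
penguin_stack <- stacks() %>%
  add_candidates(lm_res) %>%
  add_candidates(tree_res) %>%
  blend_predictions() %>%
  fit_members()
```

Passing the default control_resamples() or control_grid() instead would not save the predictions or workflow that add_candidates() needs.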

Recommendations for additional restructuring

I delivered the workshop with the current materials, and things went pretty smoothly, but it's too much material for two 3-hour workshops.

I'd recommend splitting it into three or four 2-hour workshops. This could look something like:

  • Part 1: introduction and regression
  • Part 2: preprocessing
  • Part 3: regularization and cross-validation
  • Part 4: more models

Perhaps Parts 1/2 could be merged into a single workshop.

I'd also recommend expanding the "more models" section so it fills a 2-hour slot, with more detail on logistic regression, naive Bayes, and random forests.

Typo in Part3

Line 70 in Part3.md should be set_mode rather than set_model.
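For reference, parsnip sets the engine and mode with set_engine() and set_mode(). A small sketch, assuming a decision tree specification for illustration (the model on line 70 of Part3.md is not shown here):

```r
library(tidymodels)

# set_mode() (not set_model()) tells parsnip whether this is
# a regression or a classification model
tree_spec <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("classification")

tree_spec
```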

Hard-stop errors when running the .Rmd files in my local environment

  • Overview-1.RMD
    I needed to run install.packages("rlang") before the chunk that installs all the packages, or else ggplot2 would not load.

  • 04 - Decision Trees, ln. 196
    Error in makePSOCKcluster(names = spec, ...) : Cluster setup failed. 4 of 4 workers failed to connect.
    This error did not break the .Rmd workflow.

  • 05 - Random Forest
    update_model() function, same error as above.

  • 06 - xgboost
    update_model(), same error, ln. 351.

  • 09 - hlclust, ln. 37
    Error: Problem with mutate() input ..1.
    x there is no package called ‘BBmisc’
    ℹ Input ..1 is across(is.numeric, BBmisc::normalize)
    Fixed by running install.packages("BBmisc") and then library(BBmisc), after which it ran with no problems.
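BBmisc::normalize() standardizes numeric vectors by default (mean 0, standard deviation 1). If the goal is to drop the extra dependency, base R's scale() does the same standardization. This is a swapped-in technique, not what the notebook currently uses, and mtcars stands in for the notebook's data. Newer dplyr/tidyselect versions also expect where(is.numeric) rather than a bare predicate inside across():

```r
library(dplyr)

# Standardize every numeric column (mean 0, sd 1) without BBmisc;
# as.numeric() drops the matrix attributes that scale() adds
cars_std <- mtcars %>%
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))

# Each column should now have mean ~0 and standard deviation 1
colMeans(cars_std)
apply(cars_std, 2, sd)
```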

Participant Instructions to run 01_overview.RMD

It may help students and instructors if we move the "Participant Instructions" closer to the top of the README, in large type, with a message like:

"If you are working in a local computing environment, be sure to prepare for the workshop by running all the code in 01.Overview.RMD before our first meeting. This may take a few minutes, and it is required to run the workshop's code on your own computer."
