karthik / bc

This project forked from swcarpentry/deprecated-bc

Template for Software Carpentry bootcamp site repository.

License: Other

CSS 2.04% Shell 0.08% Python 78.28% R 14.92% TeX 3.76% JavaScript 0.92%

bc's Introduction

I am a research associate professor at UC Berkeley's Institute for Data Science. I'm passionate about making science more open and research software more visible. In 2011, I co-founded the rOpenSci project and served as its founding director through 2024. I am an advisor for The US Research Software Sustainability Institute, The UK Software Sustainability Institute (SSI), and an advisor to Open Journals. Lately, you can find me on BlueSky.

bc's Issues

Improvement

+For a vector, `length(vector_name)` is just the total number of elements.
 +
 +
 +**What happens when you mix types?**
 +
 +R will create a resulting vector that is the least common denominator. The coercion will move towards the one that's easiest to coerce to.
 +
 +**Guess what the following do without running them first**
 +
 +```
 +xx <- c(1.7, "a") 
 +xx <- c(TRUE, 2) 
 +xx <- c("a", TRUE) 
 +```
 +
 +This is called implicit coercion.  You can also coerce vectors explicitly using the `as.<class_name>`. Example

Gavin says: It might be better to use `as.<type_of>` or just `as.`, since you risk confusing this with classes in the OOP sense, and "class" is not strictly correct when referring to the atomic types. Admittedly, it gets complex fast: in, say, `as.matrix()`, the `matrix` part is the class, yet there is no `typeof` "matrix".
Go with whatever you feel the students will be most comfortable with.
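For checking the guesses afterwards, here is a quick sketch using `typeof()`. It is not part of the lesson itself; the `as.numeric()` and `as.character()` lines just show the explicit counterpart.

```r
# Mixing types in c() triggers implicit coercion to the most flexible type
xx <- c(1.7, "a")
typeof(xx)  # "character": 1.7 becomes the string "1.7"
xx <- c(TRUE, 2)
typeof(xx)  # "double": TRUE becomes 1
xx <- c("a", TRUE)
typeof(xx)  # "character": TRUE becomes the string "TRUE"

# Explicit coercion with the as.<type_of>() family
as.numeric("1.7")  # 1.7
as.character(2)    # "2"
```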

Unclear what files are supposed to be used & merged to SWC

It is not immediately clear what the structure of the bootcamp is, which files are main material and which are extraneous, etc. For example, in `./R-basics` there are two apparently key .Rmd files that follow the naming convention `00-title.Rmd`, `01-title.Rmd`, etc., but the remaining .Rmd files don't follow it.

The data-visualistion folder contains a LaTeX source file with R code (an .Rnw file). Is this part of the code that needs to be merged with SWC as part of the pull request? Do you want this reviewed too at this stage?

chunk name Q1

In the knitr lesson:

read_chunk("shared_code.R")

This is usually done in an early chunk such as the first chunk of a document, and we can use the chunk Q1 later in the source document:

{r, chunk_name}

Is Q1 the hypothetical name of a code chunk in "shared_code.R"? If yes, then shouldn't the example below of how to use that chunk in the source document be:

{r, Q1}

Missing knitr example: 1 + 'a'

As we can see from the previous example, 1 + ’a’ should have stopped R because that is not a valid addition operation in R (a number + a string).

The example does not appear in the lesson before this sentence.
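The missing example could look something like this minimal sketch. The `tryCatch()` wrapper is an addition of mine so that a plain script keeps running after the error; inside a knitr chunk, the chunk option `error = TRUE` plays a similar role.

```r
# Adding a number to a string is not a valid operation, so R signals an error
msg <- tryCatch(1 + "a", error = function(e) conditionMessage(e))
# In an English locale this prints "non-numeric argument to binary operator"
print(msg)
```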

Length is not an attribute

((134 lines not shown))
 +# [1] 0
 +```
 +
 +
 +`NaN` means Not a number. it's an undefined value.
 +
 +```
 +0/0
 +NaN.
 +```
 +
 +Each object has an attribute. Attribues can be part of an object of R. These include 
 +
 +* names
 +* dimnames
 +* length

Gavin: `length` isn't an attribute in the R sense, unlike the others you mention:

```
R> attributes(1:10)
NULL
R> attr(1:10, "length")
NULL
R> length(1:10)
[1] 10
```

You could add dimensions to the list instead, since `dim` is an attribute of some objects, namely arrays (including matrices).
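A short sketch of the contrast: a plain vector has no attributes at all, while a matrix carries its dimensions in the `dim` attribute.

```r
x <- 1:10
attributes(x)    # NULL: a plain vector carries no attributes
length(x)        # 10: length is a property of the object, not an attribute

m <- matrix(1:6, nrow = 2)
attributes(m)    # $dim 2 3: the dimensions are stored as an attribute
attr(m, "dim")   # same values as dim(m)
```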

No error

+x <- 5
+rm(x)
+x

Rather than generating an error, how about rerunning `ls()` to show that `x` is gone (or even `exists("x")`)?
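Something along these lines, perhaps. Note that `exists()` takes the object's name as a string; `exists(x)` would try to evaluate `x` first and fail once it has been removed.

```r
x <- 5
exists("x")     # TRUE
rm(x)
exists("x")     # FALSE: x is gone, and no error is raised
"x" %in% ls()   # FALSE: x no longer appears in the workspace listing
```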

Describe package structure

Both the testing and documentation lectures mention R package structures. It would be useful to provide a basic description of R packages and their directory structure.

In testing.Rmd:

Save all tests in a folder. In the case of a package, the folder is located under
inst/tests. Otherwise save into a file that has the word test in the title. example. test_script.R

In documentation.md:

After this markup has been processed by roxygen2 you get a file with the same name and extension .Rd (R documentation) in the man/ folder.
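One way to show students the basic layout without writing it by hand is to let `utils::package.skeleton()` generate a throwaway package. This is only a sketch: the package name `mypkg` and the function `hello()` are hypothetical, and the generated skeleton contains no `tests/` folder, so the testing lesson would still need to explain where tests go.

```r
# Create a hypothetical function in its own environment
e <- new.env()
assign("hello", function() "hello", envir = e)

# Generate a package skeleton in a temporary directory
pkg_root <- tempdir()
utils::package.skeleton(name = "mypkg", list = "hello", environment = e,
                        path = pkg_root, force = TRUE)

# Expect DESCRIPTION, NAMESPACE, R/ (code files) and man/ (.Rd help files)
list.files(file.path(pkg_root, "mypkg"), recursive = TRUE)
```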

Other valid chunk labels

The knitr lesson mentions that there are multiple ways to write a chunk label, but only shows one option.

Below are all valid ways to write chunk labels:

{r, label_name}

Technical error

+**Atomic Vectors**
 +A vector can be a vector of characters, logical, integers or numeric.
 +
 +Create an empty vector with `vector()`
 +
 +```
 +x <- vector()
 +# with a pre-defined length
 +x <- vector(length = 10)
 +# with a length and type
 +vector("character", length = 10)
 +vector("numeric", length = 10)
 +vector("integer", length = 10)
 +vector("logical", length = 10)
 +```
 +The general pattern is `vector(class of object, length)`.  You can also create vectors by concactenating them using the `c()` function.

In: data-structures.

Gavin: Technically, it is `vector(mode of object, length)`, where the mode of the object is an atomic mode or one of a couple of other things ("list" or "expression", IIRC).
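A quick sketch of `vector(mode, length)` with a few of the accepted modes, including the non-atomic ones Gavin mentions:

```r
vector("character", length = 3)   # "" "" ""
vector("numeric", length = 2)     # 0 0
vector("list", length = 2)        # a list holding two NULLs
typeof(vector("expression", 1))   # "expression"
```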

ask = FALSE

+Use old.packages() to keep track of what's out of date.
+update.packages() - with package name will update a single package. Otherwise it will update all interactively. This can take a while if you haven't done it recently. To update everything without any user intervention, use the ask = F argument.

This should be ask = FALSE, as you have in the code block below

lessons/misc-r/full-R-bootcamp/R-basics/01-basics-of-R.md
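A sketch of why `FALSE` is the safer spelling: `F` is an ordinary variable that starts out as `FALSE` and can be overwritten, whereas `FALSE` is a reserved word.

```r
F <- TRUE             # legal, if ill-advised
isTRUE(F)             # TRUE: a later `ask = F` would now mean ask = TRUE
rm(F)                 # removes the user's copy, so base's F is visible again
identical(F, FALSE)   # TRUE
# FALSE <- TRUE       # reserved word: this assignment would be an error
```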

README for instructors to use material

Just to keep all the issues in one location, I pulled this from your recent email:

I’ll create an index of files (linking to each Rmd) in the root README for this material, with instructions for the instructors (guidelines on how to structure their individual bootcamps)

Can have attributes

+```
 + 1/0
 +# [1] Inf
 + 1/Inf
 +# [1] 0
 +```
 +
 +
 +`NaN` means Not a number. it's an undefined value.
 +
 +```
 +0/0
 +NaN.
 +```
 +
 +Each object has an attribute. Attribues can be part of an object of R. These include 

Gavin: Technically, it is "can have attributes". The R Language Definition says:
"All objects except NULL can have one or more attributes attached to them."
And "Attributes are of an R object."
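A sketch of the "can have" wording in practice. The attribute name `units` is made up purely for illustration.

```r
x <- 1:3
attributes(x)              # NULL: the object can have attributes, but need not
attr(x, "units") <- "cm"   # "units" is an arbitrary, made-up attribute name
attributes(x)              # now a list with one element, $units = "cm"

# NULL is the exception: it cannot carry attributes
y <- NULL
res <- tryCatch({ attr(y, "units") <- "cm"; "no error" },
                error = function(e) "error")
res                        # "error"
```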

Error

+
 +```
 +xx <- c(1.7, "a") 
 +xx <- c(TRUE, 2) 
 +xx <- c("a", TRUE) 
 +```
 +
 +This is called implicit coercion.  You can also coerce vectors explicitly using the `as.<class_name>`. Example
 +
 +```
 +as.numeric()
 +as.character()
 +```
 +
 +
 +When you coerce an existing numeric vector with `as.numeric()`, it does nothing.

in Data-structures.md

Gavin says: Technically, this is not true. `1:6` is an integer vector, which `as.numeric()` will coerce to a double:

```
R> x <- 0:6
R> identical(x, as.numeric(x))
[1] FALSE
R> typeof(x)
[1] "integer"
R> typeof(as.numeric(x))
[1] "double"
```

Explicit about single versus plural

It might be useful to be explicit here: atomic vectors can contain only one type of object. Also, note that two of the items in the list are plurals and the other two aren't.

In data-structures

Attribute

((197 lines not shown))
 +> 1 < "2"
 +[1] TRUE
 +> "1" > 2
 +[1] FALSE
 +> 
 +```
 +
 +## Matrix
 +
 +Matrices are a special vector in R. They are not a separate class of object but simply a vector but now with dimensions added on to it. Matrices have rows and columns. 
 +
 +```
 +m <- matrix(nrow = 2, ncol = 2)
 +m
 +dim(m)
 +same as 

Gavin:

The `same as` line should be a comment: `# same as`.
Also, I'd leave this and the `attributes()` line off, unless you are making the point that `dim` really is an attribute.
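If the point is that `dim` really is an attribute, a sketch like this makes it explicit: `dim()` simply reads the attribute that `attributes()` lists.

```r
m <- matrix(nrow = 2, ncol = 2)
dim(m)                                # 2 2
attributes(m)                         # $dim 2 2
identical(dim(m), attributes(m)$dim)  # TRUE
```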

Improvement

In: Data-structures

+`NaN` means Not a number. it's an undefined value.
 +
 +```
 +0/0
 +NaN.
 +```
 +
 +Each object has an attribute. Attribues can be part of an object of R. These include 
 +
 +* names
 +* dimnames
 +* length
 +* class
 +* attributes (contain metadata)
 +
 +For a vector, `length(vector_name)` is just the total number of elements.

Gavin: Arguably, it is for all vectors, even lists. You might want to clarify this a little.
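A sketch of the clarification: for lists, `length()` counts the top-level components, not everything nested inside them; `lengths()` gives the length of each component.

```r
v <- c(3, 1, 4)
length(v)                  # 3 elements
l <- list(1:3, letters, NULL)
length(l)                  # 3: one per component, not 3 + 26 + 0
lengths(l)                 # 3 26 0: the length of each component
```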

data-manipulation/03-split-apply.Rmd

This file needs to be converted from Markdown to RMarkdown. Also, it attempts to read in a file, "data/baby-names2.csv.bz2", which is not found in the repository.

confusing sentence

In R-basics/02-data-structures.Rmd:

With and data frame, you can do `nrow(df)` and `ncol(df)`
rownames are usually 1..n.
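The sentence might be easier to follow next to a tiny example. This is only a sketch, and the data frame is made up.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
nrow(df)       # 3
ncol(df)       # 2
dim(df)        # 3 2
rownames(df)   # "1" "2" "3": default row names run from 1 to n, stored as character
```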

unclassing a factor

In R-basics/02-data-structures.Rmd:

`unclass(x)` strips out the class information.

IMO this is too advanced for an introduction to R data structures. Could you provide an example so that the students could understand how it would be useful to them? I'd suggest replacing this with a demonstration of how to convert from a factor to a character string since that is a much more common operation:

as.character(x) converts a factor to a character vector.
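A sketch contrasting the two, using a made-up factor. It also shows the common pitfall that `as.integer()` returns the internal codes rather than the labels.

```r
f <- factor(c("low", "high", "low"))
levels(f)        # "high" "low": levels are sorted alphabetically by default
unclass(f)       # integer codes 2 1 2, plus a "levels" attribute
as.character(f)  # "low" "high" "low": usually what students want
as.integer(f)    # 2 1 2: the codes, not the original labels
```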

Lower case y

In: lessons/misc-r/full-R-bootcamp/R-basics/01-basics-of-R.md

+Quitting R
+
+type in quit() or q() and answer Y to quit

Nitpicky, but it should be a lower-case y.

mode = "logical"

In: data-structures:

+* tables
 +
 +
 +### Vectors
 +A vector is the most common and basic data structure in `R` and is pretty much the workhorse of R. Vectors can be of two types:
 +
 +* atomic vectors
 +* lists
 +
 +**Atomic Vectors**
 +A vector can be a vector of characters, logical, integers or numeric.
 +
 +Create an empty vector with `vector()`
 +
 +```
 +x <- vector()

Gavin:

Technically, this is a logical vector, as is the one in line 57, because the first argument to `vector()`, `mode`, defaults to `"logical"`:

```
R> x <- vector()
R> is.logical(x)
[1] TRUE
```
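A sketch of the defaults: with no arguments `vector()` returns a zero-length logical vector, and giving only a length yields `FALSE` values.

```r
x <- vector()        # mode defaults to "logical", length to 0
is.logical(x)        # TRUE
length(x)            # 0
vector(length = 3)   # FALSE FALSE FALSE: logical vectors are filled with FALSE
```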

Vectors versus lists,

I wonder if you might delay the distinction between atomic vectors and lists here? Also, you separated vectors and lists in the bullets above, yet here you treat them under the same heading of Vectors. Technically this is correct, but I'm not sure the novice needs to know this.

In: data-structures.

+R also has many data structures. These include
 +
 +* vector
 +* list
 +* matrix
 +* data frame
 +* factors
 +* tables
 +
 +
 +### Vectors
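If the distinction does stay, a small sketch may help show why lists sit under the Vectors heading: both are vectors, but only lists can mix types.

```r
a <- c(1, 2, 3)           # atomic: every element has the same type
l <- list(1, "a", TRUE)   # list: elements may differ in type
is.atomic(a); is.list(a)  # TRUE FALSE
is.atomic(l); is.list(l)  # FALSE TRUE
is.vector(a); is.vector(l)  # TRUE TRUE: both count as vectors
sapply(l, typeof)         # "double" "character" "logical"
```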

R-basics/exercise.md.Rmd

Is this supposed to be completed after lessons 01-basics-of-R and 02-data-structures? Exercises 2 and 3 deal with subsetting data frames, which has not yet been covered at this point. Similarly exercise 5 requires extracting info from a list using the name of the element (in the lesson you only retrieved information using the number of the element).
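For reference, the by-name extraction that exercise 5 relies on, next to the by-position form the lesson does cover. This is a sketch; the list `info` and its values are hypothetical.

```r
info <- list(id = 1, label = "x", tags = c("p", "q"))
info[[2]]          # by position: "x"
info[["label"]]    # by name: "x"
info$label         # the $ shorthand for the same thing
info["tags"]       # single brackets return a sub-list, not the vector itself
```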

Then some minor issues:

  • Is this a Markdown or RMarkdown file? It has both extensions, but is in Markdown format.
  • The first line should be removed since those directions only apply to your past bootcamp.
  • The numbering of the exercises is not preserved when converting to html.

Missing byrow

+
 +```
 +m <- matrix(nrow = 2, ncol = 2)
 +m
 +dim(m)
 +same as 
 +attributes(m)
 +```
 +
 +Matrices are constructed columnwise. 
 +
 +```
 +m <- matrix(1:6, nrow=2, ncol =3)
 +```
 +
 +Other ways to construct a matrix

In: data-structures.md

Gavin:

The options here are missing the obvious one, to fill by rows using the byrow argument, an example of which would logically come immediately before this line.
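A sketch of the missing option, placed next to the default column-wise fill for comparison:

```r
m_col <- matrix(1:6, nrow = 2, ncol = 3)                 # filled column by column
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # filled row by row
m_col[1, ]   # 1 3 5
m_row[1, ]   # 1 2 3
```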

Minor edit

Capitalise "How"


 +Create lists using `list` or coerce other objects using `as.list()`
 +
 +
 +```
 +x <- list(1, "a", TRUE, 1+4i)
 +```
 +
 +```
 +x <- 1:10
 +x <- as.list(x)
 +length(x)
 +```
 +
 +What is the class of `x[1]`?  
 +how about `x[[1]]`?

in Data-structures.md
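The answer to the quoted question, as a quick sketch:

```r
x <- as.list(1:10)
class(x[1])     # "list": single brackets return a list of length 1
class(x[[1]])   # "integer": double brackets extract the element itself
```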

Typo

+seq(1, 10, by = 0.1)
 +```
 +
 +**Other objects**
 +
 +`Inf` is infinity. You can have positive or negative infinity.
 +
 +```
 + 1/0
 +# [1] Inf
 + 1/Inf
 +# [1] 0
 +```
 +
 +
 +`NaN` means Not a number. it's an undefined value.

Gavin: Capitalise "It's"
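The quoted lesson says infinity can be positive or negative but only shows the positive case. A short sketch covering both signs and `NaN`:

```r
1/0           # Inf
-1/0          # -Inf
0/0           # NaN
is.nan(0/0)   # TRUE
is.na(0/0)    # TRUE: NaN also counts as missing
is.nan(NA)    # FALSE: but NA is not NaN
```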

Not usually, must

+| Example | Type |
+| ------- | ---- |
+| "a", "swc" | character |
+| 2, 15.5 | numeric |
+| 2 (usually add a L at end to denote integer) | integer |

In: 02-data-structures.md

Not usually, must

```
R> is.integer(2)
[1] FALSE
R> is.integer(2L)
[1] TRUE
```

regex in data-manipulation exercise

* Reads all file names in your working directory with a `.csv` extension.

I think this is too advanced. Requiring the students to look up how to write a regular expression in R (or convert a glob with glob2rx) will delay them from practicing the data manipulation skills that they just learned.
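For reference, this is roughly what the exercise asks of students, which shows how much it leans on regular expressions. A sketch; the directory and file names are hypothetical.

```r
# Scratch directory with a few hypothetical files
d <- file.path(tempdir(), "csv-demo")
dir.create(d, showWarnings = FALSE)
invisible(file.create(file.path(d, c("a.csv", "b.csv", "notes.txt"))))

list.files(d, pattern = "\\.csv$")          # "a.csv" "b.csv" (regular expression)
glob2rx("*.csv")                            # converts the glob to "^.*\\.csv$"
list.files(d, pattern = glob2rx("*.csv"))   # same result
```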

6 types

It has six; you are missing the raw type. I appreciate you might not want to introduce it, as novices are unlikely to use it, but if you leave it out, perhaps update the line to read:
R has 6 basic atomic classes, the most commonly encountered being
and then you could leave off the complex type as again, most users are unlikely to come into contact with it.
Might also be useful here to explain what is meant by "atomic" - it'll come up again when the students look at matrices/arrays.

R has 5 basic atomic classes
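All six atomic types, as reported by `typeof()`, in a quick sketch. Note that what the lesson calls "numeric" shows up as `"double"`.

```r
typeof(TRUE)          # "logical"
typeof(1L)            # "integer"
typeof(1.5)           # "double" (what the lesson calls "numeric")
typeof(1+2i)          # "complex"
typeof("a")           # "character"
typeof(as.raw(255))   # "raw"
```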

Separate classes

+as.logical(x)
 +# both don't work
 +```
 +
 +**Sometimes there is implicit conversion**
 +```
 +> 1 < "2"
 +[1] TRUE
 +> "1" > 2
 +[1] FALSE
 +> 
 +```
 +
 +## Matrix
 +
 +Matrices are a special vector in R. They are not a separate class of object but simply a vector but now with dimensions added on to it. Matrices have rows and columns. 

Gavin: They are a separate class. Matrices (and arrays) are vectors with dimensions (i.e. a dim attribute)
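A sketch of Gavin's point: dropping the `dim` attribute leaves a plain vector, and adding it back restores the matrix.

```r
m <- matrix(1:6, nrow = 2)
is.matrix(m)          # TRUE
attributes(m)         # $dim 2 3
v <- m
dim(v) <- NULL        # drop the dim attribute...
is.matrix(v)          # FALSE
is.vector(v)          # TRUE: ...and a plain integer vector remains
dim(v) <- c(2L, 3L)   # adding dim back turns it into a matrix again
identical(v, m)       # TRUE
```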
