carpentries-incubator / bioc-intro

Bioconductor data science introduction

Home Page: https://carpentries-incubator.github.io/bioc-intro

License: Other

R 68.05% Shell 0.04% TeX 31.91%

lesson carpentries-incubator bioconductor english r life-sciences beta hacktoberfest

bioc-intro's Introduction

Introduction to genomic data analysis with R and Bioconductor

Contributing

We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way.

We'd like to ask you to familiarize yourself with our Contribution Guide and have a look at the more detailed guidelines on proper formatting, ways to render the lesson locally, and even how to write new episodes.

Please see the current list of issues for ideas for contributing to this repository. For making your contribution, we use the GitHub flow, which is nicely explained in the chapter Contributing to a Project in Pro Git by Scott Chacon.

Look for the tag . This indicates that the maintainers will welcome a pull request fixing this issue.

Useful links

If you're going to be developing lesson material for the first time according to our design principles, consider reading the Carpentries Curriculum Development Handbook
Consult the Lesson Example website to find out more about working with the lesson template

Lesson team

This lesson has been developed and is currently maintained by

Laurent Gatto (maintainer)
Charlotte Soneson
Jenny Drnevich
Robert Castelo
Kevin Rue-Albert

We would also like to acknowledge the contributions of:

Oliver Crook, Sarah Kaspar, Nick Hirschmueller, Lisa Breckels and Maria Doyle for their contributions during the Bioconductor introduction workshop in Heidelberg, as part of EuroBioc2021 |> 2022.
Axelle Loriot, Marco Chiapelle, Manon Martin and Toby Hodges for various contributions and discussions.
lmsimp, alorot, manonmartin, mchiapello, stavares843, JennyZadeh, csdaw, ninja-1337, fursham-h, lagerratrobe, fmichonneau, federicomarini, tobyhodges for pull requests.

If you have contributed but we missed you, apologies, and feel free to add yourself with a PR.

Authors

A list of contributors to the lesson can be found in AUTHORS

Citation

To cite this lesson, please consult with CITATION

Testing locally

To test locally, run the following in the lesson's directory:

sandpaper::serve()

For more details, see the [workbench installation instructions](https://carpentries.github.io/workbench/#installation).

bioc-intro's Issues

Typo in workshop title

The workshop title that appears on every page and every lesson at https://carpentries-incubator.github.io/bioc-intro/index.html has "analyis" misspelled. I am not sure where this lives or I would fix it myself.

Shorten visualisation chapter

I would like to suggest to shorten the visualisation chapter by cutting the parts that we seem to skip systematically, namely

the hexbin example
all the ggplot customisation

and replace the latter with a link to relevant material (a ggplot vignette or chapter of the ggplot2 book) and a link to a ggplot2 figure gallery (there was a question asking about this at the Heidelberg workshop).

Comments, suggestions?

@csoneson , what do you think?

Fig SE.png missing

Missing figure on https://carpentries-incubator.github.io/bioc-intro/60-next-steps/index.html:
Error in knitr::include_graphics("../fig/SE.png"): Cannot find the file(s): "../fig/SE.png"

SE.png is not in https://github.com/carpentries-incubator/bioc-intro/tree/main/fig

Migrate chapter 60_next_steps.Rmd

Rename next steps

During last teaching meeting, Fred suggested the following:

Should we rename ‘Next steps’ with something like ‘Working with Bioconductor’, especially when the intro course is considered as an intro for the domain specific courses (RNA-Seq, single cell, proteomics, …).

I think this makes sense, especially if we want to consider the three bioc-intro/rnaseq/project as a unit.

Unexpected graph generated with 40-visualization.Rmd

The line 872 from 40-visualization.Rmd is somehow not showing the right graph in the generated html. I think it rather prints a figure from chap 4 Starting with data (plot(sex)). I cannot figure out why it is behaving like this.

Move rds/rds serialisation to end

Move readRDS/saveRDS to after SEs (where it makes more sense than for data.frames) and drop save/load.
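
A minimal sketch of the saveRDS/readRDS round trip this issue refers to, using base R only. The object here is a hypothetical stand-in (a small list), not the lesson's SummarizedExperiment data. Unlike load(), readRDS() returns the object, so the reader chooses the variable name when restoring it.

```r
# Hypothetical stand-in for a more complex object such as an SE
original <- list(
  counts  = matrix(1:6, nrow = 3,
                   dimnames = list(c("g1", "g2", "g3"), c("s1", "s2"))),
  samples = data.frame(sample = c("s1", "s2"),
                       condition = c("control", "treated"))
)

# Serialise a single object to a temporary .rds file
path <- tempfile(fileext = ".rds")
saveRDS(original, path)

# readRDS() returns the object; we pick the name it is assigned to
restored <- readRDS(path)

# Check that the round trip preserved the object
round_trip_ok <- identical(original, restored)
round_trip_ok
```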

Hunt for typos in the Next steps episode

Read through the Next steps episode and fix typos.

NB: In this lesson, we will follow The Carpentries guidelines and use the British spelling.

make headings more accessible

Many of the episodes include section headings at the h1 level in the page source. To make the lesson more accessible for visitors using a screen reader, pages should contain only one top-level heading. This is provided by the lesson template, based on the title field in the Rmd metadata. All other sections should be h2 or below (i.e. ## section title in the Rmd source) - I started a branch, accessible-headings, which I will work on to adjust the headings in all the episodes so far.
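
One way to locate the offending headings is sketched below in base R. The helper name find_h1 and the example lines are hypothetical; the sketch assumes headings use the `# title` form and skips lines inside fenced code chunks, where a leading `#` is an R comment rather than a heading.

```r
# Hypothetical helper: return the line numbers of h1 headings ("# ...")
# that are outside fenced code chunks
find_h1 <- function(lines) {
  in_fence <- FALSE
  hits <- integer(0)
  for (i in seq_along(lines)) {
    if (grepl("^\\s*`{3}", lines[i])) {
      # A fence line toggles between prose and code
      in_fence <- !in_fence
    } else if (!in_fence && grepl("^# ", lines[i])) {
      hits <- c(hits, i)
    }
  }
  hits
}

# Made-up episode content for illustration
fence <- strrep("`", 3)
example <- c(
  "---",
  "title: Demo",
  "---",
  "# Top heading",
  paste0(fence, "{r}"),
  "# a comment, not a heading",
  "x <- 1",
  fence,
  "## Fine heading"
)

h1_lines <- find_h1(example)
h1_lines
```

For a real episode, the lines could be obtained with readLines() on the Rmd file.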

Timings

Add lesson timings - to discuss with Axelle

Hunt for typos in the Manipulating and analyzing data episode

Read through the Manipulating and analyzing data with dplyr episode and fix typos.

NB: In this lesson, we will follow The Carpentries guidelines and use the British spelling.

Workshop feedback

The bioc-intro workshop was organised and taught at the Center for Computational Biomedicine, Harvard Medical School, Boston on 3 - 7 April 2023. A total of 35 people registered and the attendance was 17 on day 1, 12 on day 2, 10 on day 3 and 8 on day 4.

Here's both feedback from the post-survey and some thoughts from the trainer and organiser, Chris Magnano and Jaclyn Mallard.

Post-Survey Feedback (n=7):

Overall, feedback from participants was almost universally positive. 5/7 felt the workshop exceeded expectations, with 1 saying it "met all" and 1 saying it "met most".
Most participants said that their skill level analyzing data with R and Bioconductor went from "Novice" to "Basic", which I would say is a good but not over-confident result from a single workshop.
When asked which parts of the workshop were most valuable almost everyone mentioned the data visualization lesson, the majority mentioned the manipulating data lesson, and about a third mentioned the tabular data lesson.
When asked what participants felt least confident about, there was no consensus, with single participants mentioning visualization, base R commands, and dplyr.
Finally, one participant mentioned that they would appreciate more resources for practicing ggplot after the workshop is finished.
A nice quote I wanted to share:

"I also love the training materials. Very organized and simple to follow and practical."

My thoughts

We gave this workshop in 4 3-hour sessions over zoom. The registration to no-show rate was fairly standard for other zoom workshops we and others at HMS have put on. We had some drop off from session to session, which was exacerbated by some folks during the first session who I believe just wanted to grab the materials, the sessions being very long for zoom sessions, and that we were hosting the sessions in the late afternoon right as we got our first days of warm, sunny Spring weather.
These materials are well-made and very polished, with a nice balance of exercises throughout. Participants were engaged and really seemed to get a lot out of it. I thought that the spreadsheet data and the manipulating data lessons are especially well done.
I found the materials easy to prep for as an instructor, and the estimated lesson times were mostly spot on - I didn't feel like I had to skip anything due to running out of time.
It would be nice if the summary/schedule page was expanded to include some overall learning objectives and a description of the workshop, both for participants to see and for instructors to have a blurb to advertise the workshop with.
It might also be useful to split some of the longer lessons up into multiple smaller lessons. The data manipulation lesson especially will probably be broken up by at least a break or two (I had to split it across multiple days). Choosing a good breakpoint or two to split the lesson up into 2-3 smaller lessons would make these points consistent between instructors and would allow some design around where participants might need a post-break refresher to get back into things.
Overall, this was a great workshop to teach and it hits on important skills for working with real-world data.

invert sections "Interacting with R" and "Getting set up" in chapter "R and RStudio"

I would rather invert the two sections "Interacting with R" and "Getting set up" to first start explaining how the console and scripts are working and then present the working directory, project management, etc. where R commands are used.

Do we need UCLouvain-CBIO/rWSBIM1207?

Is it necessary to install UCLouvain-CBIO/rWSBIM1207 for this lesson (in https://github.com/carpentries-incubator/bioc-intro/blob/main/learners/setup.md)? I don't think it is loaded anywhere (accessing the messy data set is done via GitHub).

We need to include the references

There is no reference file (.bib) present in the repo, the references are not pointing to an actual bibliography item. For instance @Zeeberg:2004 in "Starting with data".

Blob instead of raw file being downloaded

In the Starting with Data lesson, on line 74, 'https://github.com/carpentries-incubator/bioc-intro/blob/main/episodes/data/rnaseq.csv' is currently downloaded, which downloads the webpage instead of the datafile.

Instead, 'https://github.com/carpentries-incubator/bioc-intro/raw/main/episodes/data/rnaseq.csv' should be the download url to get the raw csv file.
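
The fix can be illustrated with a short base R sketch. The helper name to_raw_url is hypothetical, and the download call is left commented out because it needs network access; the destination path in it is an assumption.

```r
# URL as copied from the browser: points at the GitHub web page
blob_url <- "https://github.com/carpentries-incubator/bioc-intro/blob/main/episodes/data/rnaseq.csv"

# Hypothetical helper: swap the first "/blob/" path segment for "/raw/"
to_raw_url <- function(url) {
  sub("/blob/", "/raw/", url, fixed = TRUE)
}

raw_url <- to_raw_url(blob_url)
raw_url

# With network access, the raw file could then be fetched, for example:
# download.file(url = raw_url, destfile = "data/rnaseq.csv")
```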

Error building website

I got the following error after building the page:

chmod: cannot access '/opt/R/4.1.0/lib/R/library': No such file or directory

Any idea how to get rid of it?
M

Hunt for typos in the Data Visualisation data episode

Read through the Data Visualisation episode and fix typos.

NB: In this lesson, we will follow The Carpentries guidelines and use the British spelling.

Hunt for typos in the Data Organisation episode

Read through the Data Organisation episode and fix typos.

NB: In this lesson, we will follow The Carpentries guidelines and use the British spelling.

Migrate chapter 30-dplyr.Rmd

First mention of function

Replace

b <- sqrt(a)

with

b <- sqrt(4)

so as to make the code executable and/or provide more context for learners that work through the lesson on their own.

Hunt for typos in the Starting with R episode

Read through the Starting with R episode and fix typos.

NB: In this lesson, we will follow The Carpentries guidelines and use the British spelling.

Migrate chapter 50-joining-tables.Rmd

annot1.csv and annot2.csv are missing

Found another problem: At the end of the https://carpentries-incubator.github.io/bioc-intro/30-dplyr/index.html, the codes say to read in data/annot1.csv and data/annot2.csv but I can't find anywhere in the workshop where the attendees have been told where to get these files. Am I missing it?

Data for data joins

Add links to bioc-project

Add links to specific bioc-project sections in the relevant intro sections.

Highlighting keywords in plain text

A minor comment but wouldn't it be a good idea to highlight important concepts/keywords in the plain text by putting words in bold rather than italics?

eg:

The basis of programming is that we write down instructions for the computer to follow, and then we tell the computer to follow those instructions. We write, or code, instructions in R because it is a common language that both the computer and we can understand. We call the instructions commands and we tell the computer to follow the instructions by executing (also called running) those commands.

There are two main ways of interacting with R: by using the console or by using scripts (plain text files that contain your code). The console pane (in RStudio, the bottom left panel) is the place where commands written in the R language can be typed and executed immediately by the computer. It is also where the results will be shown for commands that have been executed. You can type commands directly into the console and press Enter to execute those commands, but they will be forgotten when you close the session.

Instead of:

The basis of programming is that we write down instructions for the computer to follow, and then we tell the computer to follow those instructions. We write, or code, instructions in R because it is a common language that both the computer and we can understand. We call the instructions commands and we tell the computer to follow the instructions by executing (also called running) those commands.

There are two main ways of interacting with R: by using the console or by using scripts (plain text files that contain your code). The console pane (in RStudio, the bottom left panel) is the place where commands written in the R language can be typed and executed immediately by the computer. It is also where the results will be shown for commands that have been executed. You can type commands directly into the console and press Enter to execute those commands, but they will be forgotten when you close the session.

Migrate chapter 40-visualization.Rmd

Download data in visualisation lesson

tidyverse on SEs

A lesson to use tidyverse-style packages directly on SummarizedExperiment objects?

Suggested by Hugo at Bioc 2021.

Error in rename(., isoform = isoform_num)

Hi all,
I'm Kozo. I am a member of the CAB Multilingual Working Group.

In order to translate this repository using https://bioconductor.crowdin.com/ + https://github.com/bioconductor-translations/bioc-intro-translation ,
I'm converting the Rmd files in this repo to md with the following command.

library(rmarkdown)
render("50-joining-tables.Rmd", md_document())

Then I got the following error.

> render("50-joining-tables.Rmd", md_document())


processing file: 50-joining-tables.Rmd
  |..                                                                                                               |   2%
  ordinary text without R code

  |....                                                                                                             |   3%
label: unnamed-chunk-1 (with options) 
List of 1
 $ include: logi FALSE

  |......                                                                                                           |   5%
  ordinary text without R code

  |........                                                                                                         |   7%
label: jdrinstall0 (with options) 
List of 1
 $ include: logi FALSE

  |..........                                                                                                       |   8%
  ordinary text without R code

  |...........                                                                                                      |  10%
label: jdrinstall (with options) 
List of 1
 $ eval: logi FALSE

  |.............                                                                                                    |  12%
  ordinary text without R code

  |...............                                                                                                  |  14%
label: joindata
  |.................                                                                                                |  15%
  ordinary text without R code

  |...................                                                                                              |  17%
label: jdf1
  |.....................                                                                                            |  19%
  ordinary text without R code

  |.......................                                                                                          |  20%
label: jdf2
  |.........................                                                                                        |  22%
  ordinary text without R code

  |...........................                                                                                      |  24%
label: join1
  |.............................                                                                                    |  25%
  ordinary text without R code

  |...............................                                                                                  |  27%
label: unnamed-chunk-2
  |.................................                                                                                |  29%
  ordinary text without R code

  |..................................                                                                               |  31%
label: joinby
  |....................................                                                                             |  32%
  ordinary text without R code

  |......................................                                                                           |  34%
label: joinex1 (with options) 
List of 1
 $ indent: chr "> > "

  |........................................                                                                         |  36%
  ordinary text without R code

  |..........................................                                                                       |  37%
label: unnamed-chunk-3 (with options) 
List of 6
 $ results  : chr "markup"
 $ fig.cap  : chr "An inner join matches pairs of observation matching in both tables, this dropping those that are unique to one "| __truncated__
 $ echo     : logi FALSE
 $ purl     : logi FALSE
 $ out.width: chr "70%"
 $ fig.align: chr "center"

  |............................................                                                                     |  39%
  ordinary text without R code

  |..............................................                                                                   |  41%
label: unnamed-chunk-4 (with options) 
List of 6
 $ results  : chr "markup"
 $ fig.cap  : chr "Outer joins match observations that appear in at least on table, filling up missing values with `NA` values. Fi"| __truncated__
 $ echo     : logi FALSE
 $ purl     : logi FALSE
 $ out.width: chr "70%"
 $ fig.align: chr "center"

  |................................................                                                                 |  42%
  ordinary text without R code

  |..................................................                                                               |  44%
label: leftjoinex1 (with options) 
List of 1
 $ indent: chr "> > "

  |....................................................                                                             |  46%
  ordinary text without R code

  |......................................................                                                           |  47%
label: rightjoinex1 (with options) 
List of 1
 $ indent: chr "> > "

  |........................................................                                                         |  49%
  ordinary text without R code

  |.........................................................                                                        |  51%
label: innerjoinex1 (with options) 
List of 1
 $ indent: chr "> > "

  |...........................................................                                                      |  53%
  ordinary text without R code

  |.............................................................                                                    |  54%
label: jdf6
  |...............................................................                                                  |  56%
  ordinary text without R code

  |.................................................................                                                |  58%
label: multexple
  |...................................................................                                              |  59%
  ordinary text without R code

  |.....................................................................                                            |  61%
label: unnamed-chunk-5
  |.......................................................................                                          |  63%
  ordinary text without R code

  |.........................................................................                                        |  64%
label: multproblem
  |...........................................................................                                      |  66%
  ordinary text without R code

  |.............................................................................                                    |  68%
label: unnamed-chunk-6 (with options) 
List of 7
 $ results  : chr "markup"
 $ fig.cap  : chr "Joins with duplicated keys in both tables, producing all possible combinations. Figure taken from *R for Data Science*."
 $ echo     : logi FALSE
 $ purl     : logi FALSE
 $ out.width: chr "70%"
 $ fig.align: chr "center"
 $ indent   : chr "> > "

  |...............................................................................                                  |  69%
  ordinary text without R code

  |................................................................................                                 |  71%
label: unnamed-chunk-7 (with options) 
List of 1
 $ indent: chr "> > "

  |..................................................................................                               |  73%
  ordinary text without R code

  |....................................................................................                             |  75%
label: unnamed-chunk-8
  |......................................................................................                           |  76%
  ordinary text without R code

  |........................................................................................                         |  78%
label: unnamed-chunk-9
  |..........................................................................................                       |  80%
  ordinary text without R code

  |............................................................................................                     |  81%
label: morekeys
  |..............................................................................................                   |  83%
  ordinary text without R code

  |................................................................................................                 |  85%
label: unnamed-chunk-10
Quitting from lines 316-321 (50-joining-tables.Rmd) 
Error in rename(., isoform = isoform_num) : 
  object 'isoform_num' not found

>

Can you think of anything that might have caused it?

{-} in headings

the {-} in the headings (eg in chapter "Starting with data") is not interpreted here to remove the section numbering, should we remove them?

References formatting

References aren't displayed properly.

Update SE next steps

SE was also very confusing, just very different from the rest of the material, and it was difficult for participants to conceptualise why/when this would be useful. Especially as the long format, as they were given, already contains all data. Maybe we could provide the 3 independent tables (matrix, col and row data) and they build the SE, which corresponds to reality. A summary exercise could then be to extract some counts for some features/sample from the SE and long table, and confirm they are identical.
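
The proposed exercise could look roughly like the sketch below. All data are made up for illustration (three genes, two samples), and the object names are assumptions rather than the lesson's actual files. The SummarizedExperiment step only runs if that package is installed; the comparison between the count matrix and the long table uses base R.

```r
# Three independent tables, as they might come from an experiment (made-up data)
count_matrix <- matrix(
  c(5L, 0L, 12L, 7L, 3L, 9L), nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)
sample_metadata <- data.frame(
  sample = c("s1", "s2"),
  condition = c("control", "treated"),
  row.names = c("s1", "s2")
)
gene_metadata <- data.frame(
  gene = rownames(count_matrix),
  biotype = "protein_coding",
  row.names = rownames(count_matrix)
)

# The same counts in long format: one row per gene/sample pair
long_table <- data.frame(
  gene   = rep(rownames(count_matrix), times = ncol(count_matrix)),
  sample = rep(colnames(count_matrix), each = nrow(count_matrix)),
  count  = as.vector(count_matrix)
)

# Extract one value from each representation and compare
from_matrix <- unname(count_matrix["geneB", "s2"])
from_long <- long_table$count[long_table$gene == "geneB" &
                              long_table$sample == "s2"]
stopifnot(from_matrix == from_long)

# Optional: build the SE from the three tables and repeat the check
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  se <- SummarizedExperiment::SummarizedExperiment(
    assays  = list(counts = count_matrix),
    colData = sample_metadata,
    rowData = gene_metadata
  )
  from_se <- unname(SummarizedExperiment::assay(se, "counts")["geneB", "s2"])
  stopifnot(from_se == from_long)
}
```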

Review how to install packages

As part of the Bioconductor material, we decided to use BiocManager::install() for package installation. For beginners, we still need to first mention CRAN and install.packages() to bootstrap the use of BiocManager::install().

TODO: make sure this is reflected in the current intro chapters, and repeated in the next steps when mentioning Bioconductor.

Transition to workbench

Dear @zkamvar

When you get a chance, could you please help me transition this repository to the workbench format? I had a first go at it here. I didn't get very far due to limited time, without any plan for the legacy version to transition to workbench - your latest video indicates that there is a roadmap to help with this.

Many thanks in advance and do let me know if I can help.

Laurent

Can't find Bioc packages

From this PR (but it's not the first time this happens).

---
  Standard error:
  → Searching for and installing available dependencies
  The following packages are used in this project, but not available locally:
  
  	SummarizedExperiment, gridExtra, hexbin, knitr, lubridate,
  	patchwork, rmarkdown, tidySummarizedExperiment, tidyverse
  
  renv will attempt to download and install these packages.
  
  The following package(s) were not installed successfully:
  
  	[SummarizedExperiment]: package 'SummarizedExperiment' is not available
  	[tidySummarizedExperiment]: package 'tidySummarizedExperiment' is not available
  
  You may need to manually download and install these packages.
  
  ! Attempting to install missing packages assuming bioc
  ---
  Backtrace:
  1. sandpaper::manage_deps(path = wd, quiet = FALSE)
  2. callr::r(func = callr_manage_deps, args = args, show = !quiet, …
  3. callr:::get_result(output = out, options)
  4. callr:::throw(callr_remote_error(remerr, output), parent = fix_msg(remerr[[3]]))
  ---
  Subprocess backtrace:
  1. renv::install(paste0("bioc::", pkgs), library = renv_lib, project = path)
  2. renv:::retrieve(names(remotes))
  3. local handler(package, renv_retrieve_impl(package))
  4. renv:::renv_retrieve_impl(package)
  5. renv:::renv_available_packages_latest(package)
  6. renv:::stopf("package '%s' is not available", package)
  7. base::stop(sprintf(fmt, ...), call. = call.)
  8. | base::.handleSimpleError(function (e) …
  9. global h(simpleError(msg, call))
  Execution halted
  Error: Process completed with exit code 1.

@zkamvar do you have an idea?

Installing tidyverse package/s

I'm prepping to teach using this workshop and I've made a few minor changes so far, but I've come across a bigger one I wanted to discuss before unilaterally making the change...

At the end of the R & RStudio episode, package installation and loading via library() are first discussed, and I felt that was OK when I went over it. But then we don't use any add on packages until the beginning of the https://carpentries-incubator.github.io/bioc-intro/30-dplyr/index.html lesson. The first code line says to BiocManager::install("tidyverse") but if they have done the set up they should already have tidyverse and this will unnecessarily do a re-install (we've had lots of problems with permissions & installing packages as more people use OneDrive).

Of course, not everyone will have done the set up and will need to install. This got me to thinking that this could be a good place to cover the common Error in library("kslknsknls") : there is no package called ‘kslknsknls’ that everyone runs into all the time. We could flip them and start with the library(tidyverse), then have a little aside of "Did you get an error Error in library("tidyverse") : there is no package called ‘tidyverse’? This means you have not installed the package yet. To install the package, do ...

What do you think?
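
The suggested aside could be backed by a small demonstration such as the sketch below. The helper name load_or_explain is hypothetical and the wording of the hint is only an illustration; the sketch catches the error raised by library() for a package that is not installed and turns it into a message pointing learners to the installation step.

```r
# Hypothetical helper: try to attach a package and, if that fails,
# return a hint instead of stopping with an error
load_or_explain <- function(pkg) {
  tryCatch(
    {
      library(pkg, character.only = TRUE)
      paste0("Package '", pkg, "' loaded.")
    },
    error = function(e) {
      paste0(
        "Could not load '", pkg, "': ", conditionMessage(e),
        " Install it first, for example with BiocManager::install(\"",
        pkg, "\")."
      )
    }
  )
}

# A package name that does not exist, as in the example above
msg <- load_or_explain("kslknsknls")
msg
```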

Hunt for typos in the R and RStudio episode

Read through the R and RStudio episode and fix typos.

NB: In this lesson, we will follow The Carpentries guidelines and use the British spelling.

Provide basic info about packages

A short intro/section about packages in the R intro lesson, with longer and more formal sections (including different sources to install packages from) in @kevinrue.

SE: mention rowRanges

When introducing the SummarizedExperiment class and the main slots, also mention rowRanges.

Clashing filenames

When cloning the repository onto my Mac I got a warning about conflicting filenames:

Cloning into 'bioc-intro'...
remote: Enumerating objects: 226, done.
remote: Counting objects: 100% (226/226), done.
remote: Compressing objects: 100% (183/183), done.
remote: Total 226 (delta 65), reused 152 (delta 36), pack-reused 0
Receiving objects: 100% (226/226), 4.30 MiB | 1.76 MiB/s, done.
Resolving deltas: 100% (65/65), done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'fig/Sorting_Example.png'
  'fig/sorting_example.png'

I took a brief glance at the two images and they appear to be identical. Can one of the files be removed?
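
A quick way to check a list of tracked paths for this kind of collision is sketched below in base R. The helper name find_case_collisions is hypothetical, and the third path is a made-up example. The paths would need to come from a source that sees both files, since a case-insensitive filesystem only holds one of them.

```r
# Paths as tracked in the repository; the third one is a made-up extra
paths <- c(
  "fig/Sorting_Example.png",
  "fig/sorting_example.png",
  "fig/other.png"
)

# Hypothetical helper: group paths that differ only by letter case
find_case_collisions <- function(paths) {
  key <- tolower(paths)
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  split(paths[dup], key[dup])
}

collisions <- find_case_collisions(paths)
collisions
```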

Add figures illustrating pivot_longer and pivot_wider to dplyr episode

Mention R's new pipe operator

In the dplyr section, mention and show |>.
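
A minimal illustration of the native pipe, assuming R 4.1.0 or later (where |> was introduced). The example uses only base R functions and made-up numbers; it shows that the piped form is equivalent to the nested call.

```r
x <- c(1, 4, 9)

# Nested function calls, read from the inside out
nested <- sum(sqrt(x))

# The same computation with the native pipe, read left to right
piped <- x |> sqrt() |> sum()

identical(nested, piped)
```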

SSL error

Register Repositories
Using github PAT from envvar GITHUB_PAT
Error: Error: Failed to install 'vise' from GitHub:
  SSL peer certificate or SSH remote key was not OK: [api.github.com] SSL: no alternative certificate subject name matches target host name 'api.github.com'
Execution halted
Error: Process completed with exit code 1.

See https://github.com/carpentries-incubator/bioc-intro/actions/runs/5600064601/jobs/10241911505

Review Objectives/Questions/Keypoints

Suggestion:

Phrase key-points to answer the questions at the beginning.

More info/intro to Bioconductor

Mention Bioc in "intro to R" (e.g. BiocManager::install), clarify differences between tidyverse, base R and Bioconductor.

Hunt for typos in the Starting with Data episode

Read through the Starting with data episode and fix typos.

NB: In this lesson, we will follow The Carpentries guidelines and use the British spelling.

Hunt for typos in the Joining tables data episode

Read through the Joining tables episode and fix typos.

NB: In this lesson, we will follow The Carpentries guidelines and use the British spelling.

Change in rnaseq.csv requiring addition of quote = "" to read in?

UPDATE: I figured out that I had just copied the link from my browser which had "blob" instead of "raw". You can decline the 2nd and 3rd pull requests I put in. The first one should be modified to change blob to raw in the url path. But it is still better to read in the copy within this repo instead of our other bioconductor-teaching repo.
