carpentries-incubator / bioc-intro

Bioconductor data science introduction

Home Page: https://carpentries-incubator.github.io/bioc-intro

License: Other

R 68.05% Shell 0.04% TeX 31.91%
lesson carpentries-incubator bioconductor english r life-sciences beta hacktoberfest

bioc-intro's Introduction

Introduction to genomic data analysis with R and Bioconductor

Contributing

We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way.

We'd like to ask you to familiarize yourself with our Contribution Guide and have a look at the more detailed guidelines on proper formatting, ways to render the lesson locally, and even how to write new episodes.

Please see the current list of issues for ideas for contributing to this repository. For making your contribution, we use the GitHub flow, which is nicely explained in the chapter Contributing to a Project in Pro Git by Scott Chacon.

Look for the tag good_first_issue. This indicates that the maintainers will welcome a pull request fixing this issue.

Lesson team

This lesson has been developed and is currently maintained by

  • Laurent Gatto (maintainer)
  • Charlotte Soneson
  • Jenny Drnevich
  • Robert Castelo
  • Kevin Rue-Albert

We would also like to acknowledge the contributions of:

  • Oliver Crook, Sarah Kaspar, Nick Hirschmueller, Lisa Breckels and Maria Doyle for their contributions during the Bioconductor introduction workshop in Heidelberg, as part of EuroBioc2021 |> 2022.
  • Axelle Loriot, Marco Chiapelle, Manon Martin and Toby Hodges for various contributions and discussions.
  • lmsimp, alorot, manonmartin, mchiapello, stavares843, JennyZadeh, csdaw, ninja-1337, fursham-h, lagerratrobe, fmichonneau, federicomarini, tobyhodges for pull requests.

If you have contributed but we missed you, apologies, and feel free to add yourself with a PR.

Authors

A list of contributors to the lesson can be found in AUTHORS

Citation

To cite this lesson, please consult CITATION

Testing locally

To test locally, run the following in the lesson's directory:

sandpaper::serve()

For more details, see the [workbench installation instructions](https://carpentries.github.io/workbench/#installation).

bioc-intro's People

Contributors

actions-user, almutlue, chiasinl, csdaw, csmagnano, csoneson, federicomarini, fmichonneau, jdrnevich, lagerratrobe, lgatto, lmsimp, manonmartin, mblue9, mchiapello, ninja-1337, ococrook, stavares843, tobyhodges, zkamvar

bioc-intro's Issues

Shorten visualisation chapter

I would like to suggest shortening the visualisation chapter by cutting the parts that we seem to skip systematically, namely

  • the hexbin example
  • all the ggplot customisation

and replacing the latter with a link to relevant material (a ggplot vignette or chapter of the ggplot2 book) and a link to a ggplot2 figure gallery (there was a question asking about this at the Heidelberg workshop).

Comments, suggestions?

@csoneson , what do you think?

Rename next steps

During the last teaching meeting, Fred suggested the following:

Should we rename ‘Next steps’ with something like ‘Working with Bioconductor’, especially when the intro course is considered as an intro for the domain specific courses (RNA-Seq, single cell, proteomics, …).

I think this makes sense, especially if we want to consider the three bioc-intro/rnaseq/project as a unit.

Unexpected graph generated with 40-visualization.Rmd

Line 872 of 40-visualization.Rmd somehow does not show the right graph in the generated HTML. I think it instead prints a figure from chapter 4, Starting with data (plot(sex)). I cannot figure out why it is behaving like this.

make headings more accessible

Many of the episodes include section headings at the h1 level in the page source. To make the lesson more accessible for visitors using a screen reader, pages should contain only one top-level heading. This is provided by the lesson template, based on the title field in the Rmd metadata. All other sections should be h2 or below (i.e. ## section title in the Rmd source) - I started a branch, accessible-headings, which I will work on to adjust the headings in all the episodes so far.
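
As a rough illustration of the first step of that adjustment, here is a minimal sketch in base R. It assumes headings are written in ATX style (`# title`) and that code chunks are delimited by unindented triple-backtick fences; `demote_h1()` is a hypothetical helper, not part of the lesson template. Lines inside chunks are skipped so that R comments starting with `#` are left alone.

```r
# Fence marker built programmatically to keep this example easy to embed
fence <- strrep("`", 3)

# Hypothetical helper: turn top-level headings into second-level ones,
# ignoring any line that sits inside a fenced code chunk
demote_h1 <- function(lines) {
  # An odd running count of fence lines means we are inside a chunk
  in_chunk <- cumsum(grepl(paste0("^", fence), lines)) %% 2 == 1
  is_h1 <- grepl("^# ", lines) & !in_chunk
  lines[is_h1] <- sub("^# ", "## ", lines[is_h1])
  lines
}

example <- c("# Title", paste0(fence, "{r}"), "# a comment", "x <- 1", fence, "## Sub")
demote_h1(example)
```

On a real episode the input could come from `readLines()` and the result be written back with `writeLines()`. Existing lower-level headings are not shifted by this sketch, and chunks nested in blockquotes would need a looser fence pattern.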

Timings

Add lesson timings - to discuss with Axelle

Workshop feedback

The bioc-intro workshop was organised and taught at the Center for Computational Biomedicine, Harvard Medical School, Boston on 3 - 7 April 2023. A total of 35 people registered, and the attendance was 17 on day 1, 12 on day 2, 10 on day 3 and 8 on day 4.

Here is feedback from the post-survey, along with some thoughts from the trainer and organiser, Chris Magnano and Jaclyn Mallard.

Post-Survey Feedback (n=7):

  • Overall, feedback from participants was almost universally positive. 5/7 felt the workshop exceeded expectations, with 1 saying it "met all" and 1 saying it "met most".

  • Most participants said that their skill level analyzing data with R and Bioconductor went from "Novice" to "Basic", which I would say is a good but not over-confident result from a single workshop.

  • When asked which parts of the workshop were most valuable almost everyone mentioned the data visualization lesson, the majority mentioned the manipulating data lesson, and about a third mentioned the tabular data lesson.

  • When asked what participants felt least confident about, there was no consensus, with single participants mentioning visualization, base R commands, and dplyr.

  • Finally, one participant mentioned that they would appreciate more resources for practicing ggplot after the workshop is finished.

  • A nice quote I wanted to share:

"I also love the training materials. Very organized and simple to follow and practical."

My thoughts

  • We gave this workshop in four 3-hour sessions over Zoom. The registration to no-show rate was fairly standard for other Zoom workshops we and others at HMS have put on. We had some drop-off from session to session, which was exacerbated by some folks during the first session who I believe just wanted to grab the materials, by the sessions being very long for Zoom sessions, and by the fact that we were hosting the sessions in the late afternoon right as we got our first days of warm, sunny spring weather.

  • These materials are well made and very polished, with a nice balance of exercises throughout. Participants were engaged and really seemed to get a lot out of it. I thought that the spreadsheet data and the manipulating data lessons are especially well done.

  • I found the materials easy to prep for as an instructor, and the estimated lesson times were mostly spot on - I didn't feel like I had to skip anything due to running out of time.

  • It would be nice if the summary/schedule page was expanded to include some overall learning objectives and a description of the workshop, both for participants to see and for instructors to have a blurb to advertise the workshop with.
    It might also be useful to split some of the longer lessons up into multiple smaller lessons. The data manipulation lesson especially will probably be broken up by at least a break or two (I had to split it across multiple days). Choosing a good breakpoint or two to split the lesson up into 2-3 smaller lessons would make these points consistent between instructors and would allow some design around where participants might need a post-break refresher to get back into things.

  • Overall, this was a great workshop to teach and it hits on important skills for working with real-world data.

We need to include the references

There is no reference file (.bib) present in the repo, so the references do not point to an actual bibliography item. For instance @Zeeberg:2004 in "Starting with data".

Error building website

I got the following error after building the page:

chmod: cannot access '/opt/R/4.1.0/lib/R/library': No such file or directory

Any idea how to get rid of it?
M

First mention of function

Replace

b <- sqrt(a)

with

b <- sqrt(4)

so as to make the code executable and/or provide more context for learners that work through the lesson on their own.
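
A small sketch of the difference, using only base R. The second half is an assumption about how the original line could also be made runnable, by defining `a` first; it is not taken from the lesson.

```r
# sqrt() is called on a literal value, so this line runs on its own
b <- sqrt(4)
b

# Alternative: keep sqrt(a), but define the input object beforehand
a <- 4
b <- sqrt(a)
b
```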

Highlighting keywords in plain text

A minor comment but wouldn't it be a good idea to highlight important concepts/keywords in the plain text by putting words in bold rather than italics?

eg:

The basis of programming is that we write down instructions for the computer to follow, and then we tell the computer to follow those instructions. We write, or code, instructions in R because it is a common language that both the computer and we can understand. We call the instructions commands and we tell the computer to follow the instructions by executing (also called running) those commands.

There are two main ways of interacting with R: by using the console or by using scripts (plain text files that contain your code). The console pane (in RStudio, the bottom left panel) is the place where commands written in the R language can be typed and executed immediately by the computer. It is also where the results will be shown for commands that have been executed. You can type commands directly into the console and press Enter to execute those commands, but they will be forgotten when you close the session.

Instead of:

The basis of programming is that we write down instructions for the computer to follow, and then we tell the computer to follow those instructions. We write, or code, instructions in R because it is a common language that both the computer and we can understand. We call the instructions commands and we tell the computer to follow the instructions by executing (also called running) those commands.

There are two main ways of interacting with R: by using the console or by using scripts (plain text files that contain your code). The console pane (in RStudio, the bottom left panel) is the place where commands written in the R language can be typed and executed immediately by the computer. It is also where the results will be shown for commands that have been executed. You can type commands directly into the console and press Enter to execute those commands, but they will be forgotten when you close the session.

tidyverse on SEs

A lesson to use tidyverse-style packages directly on SummarizedExperiment objects?

Suggested by Hugo at Bioc 2021.

Error in rename(., isoform = isoform_num)

Hi all,
I'm Kozo. I am a member of the CAB Multilingual Working Group.

In order to translate this repository using https://bioconductor.crowdin.com/ + https://github.com/bioconductor-translations/bioc-intro-translation ,
I'm converting the Rmd files in this repo to md with the following command.

library(rmarkdown)
render("50-joining-tables.Rmd", md_document())

Then I got the following error.

> render("50-joining-tables.Rmd", md_document())


processing file: 50-joining-tables.Rmd
  |..                                                                                                               |   2%
  ordinary text without R code

  |....                                                                                                             |   3%
label: unnamed-chunk-1 (with options) 
List of 1
 $ include: logi FALSE

  |......                                                                                                           |   5%
  ordinary text without R code

  |........                                                                                                         |   7%
label: jdrinstall0 (with options) 
List of 1
 $ include: logi FALSE

  |..........                                                                                                       |   8%
  ordinary text without R code

  |...........                                                                                                      |  10%
label: jdrinstall (with options) 
List of 1
 $ eval: logi FALSE

  |.............                                                                                                    |  12%
  ordinary text without R code

  |...............                                                                                                  |  14%
label: joindata
  |.................                                                                                                |  15%
  ordinary text without R code

  |...................                                                                                              |  17%
label: jdf1
  |.....................                                                                                            |  19%
  ordinary text without R code

  |.......................                                                                                          |  20%
label: jdf2
  |.........................                                                                                        |  22%
  ordinary text without R code

  |...........................                                                                                      |  24%
label: join1
  |.............................                                                                                    |  25%
  ordinary text without R code

  |...............................                                                                                  |  27%
label: unnamed-chunk-2
  |.................................                                                                                |  29%
  ordinary text without R code

  |..................................                                                                               |  31%
label: joinby
  |....................................                                                                             |  32%
  ordinary text without R code

  |......................................                                                                           |  34%
label: joinex1 (with options) 
List of 1
 $ indent: chr "> > "

  |........................................                                                                         |  36%
  ordinary text without R code

  |..........................................                                                                       |  37%
label: unnamed-chunk-3 (with options) 
List of 6
 $ results  : chr "markup"
 $ fig.cap  : chr "An inner join matches pairs of observation matching in both tables, this dropping those that are unique to one "| __truncated__
 $ echo     : logi FALSE
 $ purl     : logi FALSE
 $ out.width: chr "70%"
 $ fig.align: chr "center"

  |............................................                                                                     |  39%
  ordinary text without R code

  |..............................................                                                                   |  41%
label: unnamed-chunk-4 (with options) 
List of 6
 $ results  : chr "markup"
 $ fig.cap  : chr "Outer joins match observations that appear in at least on table, filling up missing values with `NA` values. Fi"| __truncated__
 $ echo     : logi FALSE
 $ purl     : logi FALSE
 $ out.width: chr "70%"
 $ fig.align: chr "center"

  |................................................                                                                 |  42%
  ordinary text without R code

  |..................................................                                                               |  44%
label: leftjoinex1 (with options) 
List of 1
 $ indent: chr "> > "

  |....................................................                                                             |  46%
  ordinary text without R code

  |......................................................                                                           |  47%
label: rightjoinex1 (with options) 
List of 1
 $ indent: chr "> > "

  |........................................................                                                         |  49%
  ordinary text without R code

  |.........................................................                                                        |  51%
label: innerjoinex1 (with options) 
List of 1
 $ indent: chr "> > "

  |...........................................................                                                      |  53%
  ordinary text without R code

  |.............................................................                                                    |  54%
label: jdf6
  |...............................................................                                                  |  56%
  ordinary text without R code

  |.................................................................                                                |  58%
label: multexple
  |...................................................................                                              |  59%
  ordinary text without R code

  |.....................................................................                                            |  61%
label: unnamed-chunk-5
  |.......................................................................                                          |  63%
  ordinary text without R code

  |.........................................................................                                        |  64%
label: multproblem
  |...........................................................................                                      |  66%
  ordinary text without R code

  |.............................................................................                                    |  68%
label: unnamed-chunk-6 (with options) 
List of 7
 $ results  : chr "markup"
 $ fig.cap  : chr "Joins with duplicated keys in both tables, producing all possible combinations. Figure taken from *R for Data Science*."
 $ echo     : logi FALSE
 $ purl     : logi FALSE
 $ out.width: chr "70%"
 $ fig.align: chr "center"
 $ indent   : chr "> > "

  |...............................................................................                                  |  69%
  ordinary text without R code

  |................................................................................                                 |  71%
label: unnamed-chunk-7 (with options) 
List of 1
 $ indent: chr "> > "

  |..................................................................................                               |  73%
  ordinary text without R code

  |....................................................................................                             |  75%
label: unnamed-chunk-8
  |......................................................................................                           |  76%
  ordinary text without R code

  |........................................................................................                         |  78%
label: unnamed-chunk-9
  |..........................................................................................                       |  80%
  ordinary text without R code

  |............................................................................................                     |  81%
label: morekeys
  |..............................................................................................                   |  83%
  ordinary text without R code

  |................................................................................................                 |  85%
label: unnamed-chunk-10
Quitting from lines 316-321 (50-joining-tables.Rmd) 
Error in rename(., isoform = isoform_num) : 
  object 'isoform_num' not found

> 

Can you think of anything that might have caused it?

{-} in headings

The {-} in the headings (e.g. in the chapter "Starting with data") is not interpreted here to remove the section numbering. Should we remove them?
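
If the answer is yes, the markers could be stripped along these lines. This is a minimal base R sketch; `strip_unnumbered()` is a hypothetical helper, and it assumes the marker always sits at the end of an ATX heading line.

```r
# Hypothetical helper: drop a trailing {-} marker from heading lines only
strip_unnumbered <- function(lines) {
  is_heading <- grepl("^#+ ", lines)
  lines[is_heading] <- sub("[[:space:]]*\\{-\\}[[:space:]]*$", "", lines[is_heading])
  lines
}

strip_unnumbered(c("## Starting with data {-}", "Some text {-}", "## Other"))
```

Non-heading lines are left untouched, so a `{-}` that appears in running text is not affected.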

Update SE next steps

SE was also very confusing, just very different from the rest of the material, and it was difficult for participants to conceptualise why/when this would be useful. Especially as the long format, as they were given, already contains all data. Maybe we could provide the 3 independent tables (matrix, col and row data) and they build the SE, which corresponds to reality. A summary exercise could then be to extract some counts for some features/sample from the SE and long table, and confirm they are identical.
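
A minimal sketch of what such an exercise could look like. All object names and values below are made up for illustration. The long-table comparison uses base R only; the SummarizedExperiment part runs only if that package is installed.

```r
# Three independent tables: count matrix, column (sample) data, row (gene) data
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("gene1", "gene2", "gene3"), c("s1", "s2")))
col_data <- data.frame(sample = c("s1", "s2"), group = c("ctrl", "treated"),
                       row.names = c("s1", "s2"))
row_data <- data.frame(gene = rownames(counts), row.names = rownames(counts))

# Equivalent long table: one row per gene/sample pair
long <- data.frame(
  gene = rep(rownames(counts), times = ncol(counts)),
  sample = rep(colnames(counts), each = nrow(counts)),
  count = as.vector(counts)
)

# Extract the same value from both representations
from_long <- long$count[long$gene == "gene2" & long$sample == "s2"]
from_matrix <- counts["gene2", "s2"]

# Build the SE from the three tables and confirm it agrees with the long table
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  se <- SummarizedExperiment::SummarizedExperiment(
    assays = list(counts = counts),
    colData = col_data,
    rowData = row_data
  )
  from_se <- SummarizedExperiment::assay(se, "counts")["gene2", "s2"]
  stopifnot(identical(from_se, from_long))
}
```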

Review how to install packages

As part of the Bioconductor material, we decided to use BiocManager::install() for package installation. For beginners, we still need to first mention CRAN and install.packages() to bootstrap the use of BiocManager::install().
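
One way this bootstrap could be presented is sketched below. `missing_pkgs()` and `install_missing()` are hypothetical helpers, not functions from BiocManager or the lesson; the only package calls used are `install.packages()` and `BiocManager::install()`. The helpers are defined but nothing is installed until `install_missing()` is called.

```r
# Hypothetical helper: which of the requested packages are not yet installed?
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Hypothetical helper: get BiocManager from CRAN if needed, then use it
# to install only the packages that are actually missing
install_missing <- function(pkgs) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) == 0) {
    return(invisible(character(0)))
  }
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(todo)
  invisible(todo)
}

# Example call (not run here because it needs network access):
# install_missing(c("tidyverse", "SummarizedExperiment"))
```

Checking first avoids re-installing packages that learners already have from the setup instructions.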

TODO: make sure this is reflected in the current intro chapters, and recalled in the next steps when mentioning Bioconductor.

Transition to workbench

Dear @zkamvar

When you get a chance, could you please help me transition this repository to the workbench format? I had a first go at it here. I didn't get very far due to limited time, without any plan for the legacy version to transition to workbench - your latest video indicates that there is a roadmap to help with this.

Many thanks in advance and do let me know if I can help.

Laurent

Can't find Bioc packages

From this PR (but it's not the first time this happens):

---
  Standard error:
  → Searching for and installing available dependencies
  The following packages are used in this project, but not available locally:
  
  	SummarizedExperiment, gridExtra, hexbin, knitr, lubridate,
  	patchwork, rmarkdown, tidySummarizedExperiment, tidyverse
  
  renv will attempt to download and install these packages.
  
  The following package(s) were not installed successfully:
  
  	[SummarizedExperiment]: package 'SummarizedExperiment' is not available
  	[tidySummarizedExperiment]: package 'tidySummarizedExperiment' is not available
  
  You may need to manually download and install these packages.
  
  ! Attempting to install missing packages assuming bioc
  ---
  Backtrace:
  1. sandpaper::manage_deps(path = wd, quiet = FALSE)
  2. callr::r(func = callr_manage_deps, args = args, show = !quiet, …
  3. callr:::get_result(output = out, options)
  4. callr:::throw(callr_remote_error(remerr, output), parent = fix_msg(remerr[[3]]))
  ---
  Subprocess backtrace:
  1. renv::install(paste0("bioc::", pkgs), library = renv_lib, project = path)
  2. renv:::retrieve(names(remotes))
  3. local handler(package, renv_retrieve_impl(package))
  4. renv:::renv_retrieve_impl(package)
  5. renv:::renv_available_packages_latest(package)
  6. renv:::stopf("package '%s' is not available", package)
  7. base::stop(sprintf(fmt, ...), call. = call.)
  8. | base::.handleSimpleError(function (e) …
  9. global h(simpleError(msg, call))
  Execution halted
  Error: Process completed with exit code 1.

@zkamvar do you have an idea?

Installing tidyverse package/s

I'm prepping to teach using this workshop and I've made a few minor changes so far, but I've come across a bigger one I wanted to discuss before unilaterally making the change...

At the end of the R & RStudio episode, package installation and loading via library() are first discussed, and I felt that was OK when I went over it. But then we don't use any add on packages until the beginning of the https://carpentries-incubator.github.io/bioc-intro/30-dplyr/index.html lesson. The first code line says to BiocManager::install("tidyverse") but if they have done the set up they should already have tidyverse and this will unnecessarily do a re-install (we've had lots of problems with permissions & installing packages as more people use OneDrive).

Of course, not everyone will have done the set up and will need to install. This got me thinking that this could be a good place to cover the common Error in library("kslknsknls") : there is no package called ‘kslknsknls’ that everyone runs into all the time. We could flip them and start with the library(tidyverse), then have a little aside of "Did you get an error Error in library("tidyverse") : there is no package called ‘tidyverse’? This means you have not installed the package yet. To install the package, do ..."
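
For the aside, the error could be demonstrated safely as in this sketch, which uses base R only. The package name is the deliberately non-existent one from above, and the error is caught so that the script keeps running.

```r
# Try to load a package that does not exist and capture the error message
msg <- tryCatch(
  {
    library("kslknsknls", character.only = TRUE)
    "package loaded"
  },
  error = function(e) conditionMessage(e)
)

# The message names the missing package, pointing learners to the install step
msg
```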

What do you think?

SE: mention rowRanges

When introducing the SummarizedExperiment class and the main slots, also mention rowRanges.

Clashing filenames

When cloning the repository onto my Mac I got a warning about conflicting filenames:

Cloning into 'bioc-intro'...
remote: Enumerating objects: 226, done.
remote: Counting objects: 100% (226/226), done.
remote: Compressing objects: 100% (183/183), done.
remote: Total 226 (delta 65), reused 152 (delta 36), pack-reused 0
Receiving objects: 100% (226/226), 4.30 MiB | 1.76 MiB/s, done.
Resolving deltas: 100% (65/65), done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'fig/Sorting_Example.png'
  'fig/sorting_example.png'

I took a brief glance at the two images and they appear to be identical. Can one of the files be removed?
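
A check for this kind of collision could be sketched as follows in base R. `find_case_collisions()` is a hypothetical helper, and `fig/other.png` is a made-up path added only to show that non-colliding files are ignored.

```r
# Hypothetical helper: group paths that differ only by letter case
find_case_collisions <- function(paths) {
  key <- tolower(paths)
  collides <- duplicated(key) | duplicated(key, fromLast = TRUE)
  split(paths[collides], key[collides])
}

paths <- c("fig/Sorting_Example.png", "fig/sorting_example.png", "fig/other.png")
find_case_collisions(paths)
```

On a case-insensitive filesystem the working tree only holds one file of each group, so the paths would have to come from the git index, for example via `system2("git", "ls-files", stdout = TRUE)`.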

More info/intro to Bioconductor

Mention Bioc in "intro to R" (e.g. BiocManager::install), clarify differences between tidyverse, base R and Bioconductor.

Change in rnaseq.csv requiring addition of quote = "" to read in?

UPDATE: I figured out that I had just copied the link from my browser, which had "blob" instead of "raw". You can decline the 2nd and 3rd pull requests I put in. The first one should be modified to change blob to raw in the URL path. But it is still better to read in the copy within this repo instead of our other bioconductor-teaching repo.
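
For reference, the blob-to-raw fix amounts to something like the sketch below. The URL is a made-up placeholder, not the actual location of rnaseq.csv, and `blob_to_raw()` is a hypothetical helper. A `blob` URL returns the GitHub web page rather than the file itself, whereas the `raw` form serves the file contents.

```r
# Hypothetical helper: rewrite a GitHub "blob" page URL to its "raw" file form
blob_to_raw <- function(url) {
  sub("/blob/", "/raw/", url, fixed = TRUE)
}

# Placeholder URL, for illustration only
url <- "https://github.com/example-org/example-repo/blob/main/data/rnaseq.csv"
blob_to_raw(url)
```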
