GithubHelp home page GithubHelp logo

kozo2 / bioc2020anno_japanese Goto Github PK

View Code? Open in Web Editor NEW

This project forked from jmacdon/bioc2020anno

0.0 2.0 0.0 2.41 MB

Home Page: https://rpubs.com/kozo2/AnnotationWorkshop

Dockerfile 100.00%

bioc2020anno_japanese's Introduction

Introduction to Bioconductor annotation resources

Instructors

Workshop Description

There are various annotation packages provided by the Bioconductor project that can be used to incorporate additional information to results from high-throughput experiments. This can be as simple as mapping Ensembl IDs to corresponding HUGO gene symbols, to much more complex queries involving multiple data sources. In this workshop we will cover the various classes of annotation packages, what they contain, and how to use them efficiently.

Prerequisites

  • Basic knowledge of R syntax
  • Basic understanding of the various annotation sources (NCBI, EBI/EMBL)

Useful background reading

Preparation

To be able to follow along with this workshop, we have created a Docker installation that includes the devel version of R, and all required Bioconductor packages. In order to access this, you need to first install Docker on your own computer. Once you have done that, you can then load the Docker container for this workshop by starting Docker.

How you do this is dependent on your operating system.

Windows

Hit the 'Windows' key (lower left on your keyboard, between Ctrl and Alt), then type Docker. If Docker is installed you should see Docker App highlighted - click Enter to start the App. It can take some time to get started. You can see it's doing something by clicking on the little caret (^) in the lower right of your screen - there should be a little animated Docker icon which indicates it's starting. Once it is started, open a CMD prompt by hitting the Windows key again and typing cmd, then Enter. In the CMD prompt type

docker run -e PASSWORD=<choose_a_password_for_rstudio> -p 8787:8787 jmacdon/bioc2020anno

You can choose any password for rstudio - that's what you will use to log in. It will take some time for the Docker to be downloaded and started, so you might consider doing this ahead of time.

Linux

For Linux, it depends on how you installed. If you used a package installer then presumably Docker will be set to start automatically. Otherwise you need to start the Docker daemon by hand (or set it to start automatically). There are too many variables to give much detailed information here; for those on Linux, the assumption is that you probably know what you are doing and can figure it out from the Docker install page.

To start the daemon, if neccessary, do

sudo dockerd &
## followed by 

sudo docker run hello-world

If Docker is installed correctly it should print something informative. To get the Docker container, it's the same as for Windows.

docker run -e PASSWORD=<choose_a_password_for_rstudio> -p 8787:8787 jmacdon/bioc2020anno

MacOS

There is an installer for MacOS. As with Windows, follow the instructions - it's just a regular drag'n'drop install. Once it's installed and started, open a terminal prompt and as above type.

docker run -e PASSWORD=<choose_a_password_for_rstudio> -p 8787:8787 jmacdon/bioc2020anno

For all operating systems, once the Docker container is initialized, you can access it by opening a browser and typing

http://localhost:8787

Which should present you with an RStudio login. Use rstudio as the username and the password you used to start the Docker.

Workshop Participation

After each type of annotation package is introduced, students will be given the opportunity to practice making their own queries.

R / Bioconductor packages used

  • AnnotationDbi
  • AnnotationHub
  • BSgenome
  • biomaRt
  • ensembldb
  • org.Hs.eg.db
  • TxDb.Hsapiens.UCSC.hg19.knownGene
  • EnsDb.Hsapiens.v79
  • EnsDb.Mmusculus.v79
  • Homo.sapiens
  • BSgenome.Hsapiens.UCSC.hg19
  • hugene20sttranscriptcluster.db

Workshop goals and objectives

Annotating data is a complex task. For any high-throughput experiment the analyst usually starts with a set of identifiers for each thing that was measured, and in order to make the results useful to collaborators these identifiers need to be mapped to other identifiers that are either more familiar to collaborators, or that can be used for further analyses. As an example, RNA-Seq data may only have Entrez Gene IDs for each gene measured, and as part of the output you may want to include the gene symbols, which are more likely to be familiar to a Biologist.

Learning goals

  • Understand what sort of annotation data are available
  • Understand the difference between annotation sources (NCBI and EBI/EMBL)
  • Gain familiarity with the various ways to query annotation packages
  • Get some practice making queries

Learning objectives

  • Be able to use select and mapIds to map between identifiers
  • Be able to extract data from TxDb and EnsDb packages
  • Be able to make queries using biomaRt
  • Extract and utilize various data from AnnotationHub

bioc2020anno_japanese's People

Contributors

j-lawson avatar jmacdon avatar kozo2 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.