This project forked from dworkinlab/bio720

Repository For graduate class (Practical introduction to Bioinformatics & Genomics). McMaster University

Bio720

Course Summary

This is my (ID) GitHub repo for students in both BIO720 (introduction to computational skills for biologists) and BIO722 (practical introduction to bioinformatic and genomic computational skills). The reason I keep these together is mostly history, but also utility: those in BIO722 may need some reminders, so there are other useful links here. The courses are taught through the Biology Department at McMaster University, but most course materials are freely available to anyone interested.

Please note that the other instructors (Dr. Golding and Dr. Evans) may not put their material up here, so this repo may contain only the parts of each class taught by ID.

Please also note that the scripts for BIO722 are here, but most of the readme below is about BIO720.

Instructors

Dr. Brian Golding

Dr. Ben Evans

Dr. Ian Dworkin

Class time and location

BIO722, Wednesdays 1:45 - 3:45, MS TEAMS

Background assumed for students for BIO720

For this class we do not assume that students have a background in programming/scripting or in bioinformatics. We do assume that students have a working knowledge of basic molecular biology and genetics and a basic familiarity with using computers, i.e. you can figure out how to install basic software on a Mac (OS X) or a PC (Windows).

Background assumed for students for BIO722

We assume that you have taken BIO720 or equivalent, that you are comfortable at the UNIX shell and with at least one scripting language (and able to learn a second), and that you know how to remotely access computers (ssh etc.).

What students will need

A laptop with internet access and the ability to install several programs (in particular R, Python and, if not using a Mac (OS X) or Linux, a shell emulator).

Course goals

The primary goal of this course is to give graduate students an opportunity to develop the fundamental computational skills they will need in order to go on and (in the future) develop the appropriate (and more advanced) skills for bioinformatics, genomics, etc.

What this course is not

Because of limitations of time (one two-hour lecture a week for 13-14 weeks), we are purposefully making this a course about fundamental skills. As such, this course will not cover in any detail:

- Genomic analysis pipelines (RNAseq, variant calling and population genomics). These are covered in the winter-spring in BIO722.

- Theory of computer science (nor theory on programming, algorithms, data structures etc).

- Statistics. Despite using `R` for much of this course, it is most definitely not a statistics course. Bio708 (taught by Dr. Ben Bolker and Dr. Jonathan Dushoff) is such a course (also using R as the primary programming environment for statistical modeling).

- A bioinformatics class (i.e. we will not teach any conceptual or theoretical background in bioinformatics. All examples will be real examples, but mostly to illustrate the computational skills necessary to run an analysis, not the why).

Learning Objectives

Topics (some TBD)

It is important to note that in order to keep things flexible depending on how things go with the class, these topics are subject to change if necessary. We will discuss in class.

A. Introduction to UNIX and the command line. (Brian)

  1. Introduction to basic shell commands, logging onto remote systems
  2. Standard UNIX utilities that make your day to day computer work (and bioinformatics) easier.
  3. Using pipes in UNIX (and the model of streaming data), batch processing of data.
  4. Writing shell tools.
  5. Using your UNIX skills for practical bioinformatic problems (probably setting up a BLAST database, and querying some sequences)
  6. (maybe) Regular expressions are your friend. No really. Using grep and its variants (e.g. agrep) and sed and awk for file manipulation and processing.
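
As a minimal sketch of the streaming model and the grep/awk usage listed above (it is not taken from the course materials), the following shell snippet pipes a small FASTA file through standard UNIX utilities. The file name `example.fasta`, its sequences and the `geneA`/`geneB` labels are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Create a tiny example FASTA file in a temporary directory.
# The sequences and gene labels are made up for illustration only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/example.fasta" <<'EOF'
>seq1 geneA
ATGCGTACGT
>seq2 geneB
ATGAAACCCGGG
>seq3 geneA
ATGTTTT
EOF

# Count sequences: every FASTA record starts with a ">" header line,
# and grep -c counts the matching lines.
n_seqs=$(grep -c '^>' "$tmpdir/example.fasta")
echo "Number of sequences: $n_seqs"

# Stream the header lines through a pipeline: take the second field
# (the gene label), sort, count each label, and print "label=count".
gene_counts=$(grep '^>' "$tmpdir/example.fasta" \
  | awk '{print $2}' \
  | sort \
  | uniq -c \
  | awk '{print $2 "=" $1}')
echo "$gene_counts"

# Count total bases: drop the header lines, strip newlines, count characters.
# The final tr removes the padding some wc implementations add.
n_bases=$(grep -v '^>' "$tmpdir/example.fasta" | tr -d '\n' | wc -c | tr -d ' ')
echo "Total bases: $n_bases"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

Each stage reads from the previous one line by line, so nothing has to hold the whole file in memory; the same pipeline shape works on files far larger than this toy example.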

B. Fundamentals of programming using R. (Ian) Link to R portal for class

  1. Fundamentals of programming in R.
  2. How to avoid repetitive strain injury while programming. Control flow in R (for loops, if else, etc). Using the apply family of functions in R. Simple simulations.
  3. Working with data in R. Getting data in. Data munging (subsetting, merging, cleaning). Working with strings in R.
  4. Basics of plotting in R. Other topics TBD.
  5. Reproducible research using markdown for reports and git for version control.
  6. An Introduction to bioinformatic tools in R. Primarily an introduction to BioConductor, and genomic range data.

C. Fundamentals of programming using Python. (This will likely not be taught this year.)

Learning outcomes

After successfully completing this course you will:

- Have a much higher degree of comfort using your computer!

- Be able to write custom UNIX shell scripts to do file copying, moving, editing, parsing and manipulation.

- Be able to write simple R programs to do simple simulations, data parsing (munging), plotting.

- Be able to perform computationally reproducible research, and use version control on your source code.

- Be able to utilize genomic range data and incorporate simple genomic features.

- Understand the fundamental framework of UNIX programs, scripting and why streaming data is so useful for genomics and bioinformatics.

- Know that troubleshooting for installing and using programs, and troubleshooting when writing and using code are normal. You will have developed some tenacity in dealing with such issues and have some ideas on how to approach finding solutions (including your *google-fu*).

Recommended books.

You are responsible for ordering your own copies of these books. Both are excellent with only a small amount of overlap, but we are only highly recommending the first book (BDS) for this class. The reason for this is that this year we are only using UNIX (and shell scripting) and programming in R which are both covered a bit in the BDS book.

Bioinformatics Data Skills, BDS. HIGHLY RECOMMENDED. This book fills an important gap in that it is oriented towards the day to day skills of anyone working in the fields of genomics and bioinformatics. In addition to covering the basic UNIX skills (and why we use UNIX in bioinformatics and genomics), it also gives overviews of the essential file types (.fasta, .fastq, .gff, etc) that are ubiquitous in the field. There is also a nice but brief introduction to the essentials of R, using Bioconductor and in particular range data, and two important chapters on how to organize computational projects (and maximize their reproducibility). Currently (August 29th 2018) this is ~$52.44 on amazon.ca. It is available as an e-book as well from the publisher. The author is still a PhD student (in population genomics), and wrote this in their first year of graduate school, so it is definitely worth supporting.

Practical Computing for Biologists. This book provides a nice, gentle introduction to the basic computational skills all biologists should have. In particular, it introduces the UNIX command line, shell scripting, basic Python programming, regular expressions, working on remote machines and a few other topics. The book is written to be agnostic with respect to discipline (i.e. it is not a bioinformatics book per se), but does a great job of being both very accessible and immediately useful. It seems a bit pricey on Amazon.ca, but look around for used copies (it is 4 years old). If you plan to continue in computational research, this is a fantastic resource.

Important websites

For Brian's section. This will have pertinent links to Brian's section of the course.

R tutorials and screencasts. A link to the exercises, in-class activities and playlists of screencasts I have put together for the R tutorials. I will also be putting assignments up here, and adding more as the semester progresses. Mostly we will be using the excellent DataCamp online interactive 'courses' for the introductory material, and moving on from there.

Readings

Week 1 - BDS Chapter 1, Chapter 2 pages 21-30, Chapter 3 pages 37-45, Chapter 4 pages 57-59.

Also check out here for a review of organizing computational data analysis projects. You don't need to read about version control or using markdown yet.

Week 2 - Chapter 3 pages 45 - 56, Chapter 7 pages 125 - 156. Maybe also worth looking at pages 395 - 398 in Chapter 12.

Week 3 - Chapter 7 pages 140-145 and 157-169 might be really useful. I also recommend the first tutorial on regular expressions listed here. This takes you gently through regular expressions and within an hour you will realize what amazing things you can do.

Week 5 - For more information on some of the basic file types used in genomics (.fastq, .SAM, .BAM) see chapter 10 (only 13 pages) and chapter 11 (pages 355-365). I also suggest reading chapter 6 (pages 109-115). Also, here is a link to a few tools for doing QC at various steps. Not meant to be comprehensive, but if you use some google, chances are someone has already written some tool for QC and sanity checks for some steps similar to your own. In case you want to see another example here is a tutorial on making a BLAST database and querying sequences with it.

Week 6 (Beginning R) - Start with DataCamp assignments (Courses Introduction To R and Intermediate R). Associated readings in BDS chapter 8 pages 175-206.

