Interactive lessons in bioinformatics.

Home Page: http://readIAB.org

License: Other

Python 97.95% TeX 1.95% CSS 0.10%

an-introduction-to-applied-bioinformatics's People

Contributors

Stargazers

Watchers

Forkers

evoldoc jairideout ebolyen ataheri lukeping tastysquirrel zed9 sodiroa bo-maxwell-stevens kschwarzberg edgingtonn1 gtuckerkellogg ajaykshatriya simexin sunxingqiang honglongwu raivivek fixiecoding christopheryoung sjsrey wasade drestmont yiliu001 sameera-d b-rich fw1121 odewahn pepsalehi kchennen randoramma johnchase aungthurhahein ials muntasirali anderspitman gitter-badger tarah28 shiffer1 wmercurio welch16 bdx0 eldeveloper iab-reader1 chaokang2012 daniorerio theoryno3 gvanzin mcolic xiaojieqiu elisemorrison kescobo marimiyapriv hanruiwu yugi1001 lsl5 uznix igoralves1 eal98 juliahull erictleung yourbuddyconner jnaulty ranaivosonherimanitra drilett thedegreeisalie hele4213 gtv-jtm264 mondo023 here0009 robertoalvarezm thecartdepart kelseyatkin jab743 ccwnau hr266 nonsense64 ktaed lukesanch bioxiao gls9000 amilano83 zsailer bobtodd marienhof kylescotshank maggievanderberg gsc0107 cor215 pchinchen housw frap91 nab2000 zydisney andrewsanchez nga24 bigdataguru marencc pablormier reemuw dmvelazquez

an-introduction-to-applied-bioinformatics's Issues

remove iab.algorithms.DNA when skbio dependency is updated beyond 0.1.4

This light wrapper is required for the functionality introduced in skbio's #507, which is used in the MSA chapter, while IAB depends on skbio 0.1.4.

add acknowledgements to readme

link to Bayesian Methods for Hackers, as a source of inspiration

clean up matrix formatting

The code for formatting dynamic programming and traceback matrices is scattered and ugly. It should be consolidated, and all output rows should fit on a single line. Maybe the way to go with this is to load into numpy arrays and use the built-in formatting?
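A minimal sketch of what a consolidated formatter might look like, using only the standard library. The function name `format_dp_matrix` and its parameters are hypothetical and are not part of iab. It assumes the matrix is a list of rows whose first row and column are the initialization cells.

```python
def format_dp_matrix(seq1, seq2, matrix, cell_width=5):
    """Return a dynamic programming matrix as text, one matrix row per line.

    seq1 labels the columns and seq2 labels the rows. The first row and
    column of the matrix are the initialization cells and get a blank label.
    """
    col_labels = [" "] + list(seq1)
    row_labels = [" "] + list(seq2)
    # Header line: an empty corner cell followed by the column labels.
    header = " " * cell_width + "".join(c.rjust(cell_width) for c in col_labels)
    lines = [header]
    for label, row in zip(row_labels, matrix):
        cells = "".join(str(value).rjust(cell_width) for value in row)
        lines.append(label.rjust(cell_width) + cells)
    return "\n".join(lines)


# Hypothetical 3x3 example matrix, used only to show the layout.
example = [[0, -8, -16],
           [-8, 5, -3],
           [-16, -3, 10]]
print(format_dp_matrix("AC", "AC", example))
```

Because every cell is right-justified to a fixed width, each matrix row stays on a single line as long as `cell_width` is at least as wide as the longest value.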

add back beer example and real-world msa

real-world msa could be based on some subset of Greengenes (derived from the cookbook example)

removed these as i didn't get to update them for new msa code yet as part of #81

port project timeline to milestones and issues

This will initially derive content from this document and should be easily editable by me.

move the database searching chapter to applications

six dependency currently requires future==1.13

Details on stack overflow, but this definitely affects the person trying to get started

http://stackoverflow.com/questions/26247431/future-utils-six-not-found-when-trying-to-import-skbio-modules

Genome Assembly/Analysis

http://evomicsorg.wpengine.netdna-cdn.com/wp-content/uploads/2015/01/2015-evomics-assembly.pdf
https://www.coursera.org/course/ads1

add suggested reading in a new "getting started with python" section

possible suggestions:
http://nbviewer.ipython.org/gist/anonymous/11250965
http://learnpythonthehardway.org/
http://www.amazon.com/Practical-Computing-Biologists-Steven-Haddock/dp/0878933913
http://www.amazon.com/Practical-Programming-Introduction-Pragmatic-Programmers/dp/1934356271

rename "alignment-exercises" as "pairwise-alignment-exercises" after BIO 299 assignment based on this is due

don't want to deal with changing url before then

better real-world example of creating tree from biological sequences

The mammal tree based on hemoglobin doesn't work well, I think because the sequences are too similar, but maybe because we need a better distance metric for the aligned sequences that counts using blosum50. Basically want something that has organisms that students can relate to (in that they'll have an intuitive feel for who should be more/less closely related) and that is well annotated so those names are evident.

bug in guide_tree_from_query_sequences

guide_tree_from_query_sequences fails when passed correct data.

    686
    687     guide_dm = DistanceMatrix(guide_dm, seq_ids)
--> 688     guide_lm = average(guide_dm.condensed_form())
    689     guide_tree = to_tree(guide_lm)
    690     if display_tree:

AttributeError: 'DistanceMatrix' object has no attribute 'condensed_form'
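For context, the failing call asks for the condensed (vector) form of the distance matrix, which is the layout SciPy's hierarchical clustering functions accept (`average` here is presumably `scipy.cluster.hierarchy.average`). The sketch below only illustrates that layout in plain Python. The stand-alone `condensed_form` helper is hypothetical and is not the skbio method.

```python
def condensed_form(square):
    """Flatten the upper triangle (above the diagonal) of a symmetric
    distance matrix, row by row, into a one-dimensional list."""
    n = len(square)
    return [square[i][j] for i in range(n) for j in range(i + 1, n)]


# Hypothetical 3x3 distance matrix with a zero diagonal.
dm = [[0.0, 0.25, 0.5],
      [0.25, 0.0, 0.75],
      [0.5, 0.75, 0.0]]
print(condensed_form(dm))
```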

clustering chapter to do items

better name for cluster_greedy - this really should be something like cluster_centroid_distance
images to illustrate the different clustering algorithms - see phone for photos of whiteboard from lecture
add open reference discussion
move cluster functions to iab/algorithms/__init__.py
add some discussion of real world run time for OTU picking (several people asked questions about doing this iteratively - like iterative msa - which is interesting, but runtime would be a limiting factor)
instead of computing all kmer distances in cluster_greedy_kmer, just compute the kmer distances to the cluster centroids (Rob M. pointed this out)
add discussion of why approximations are required (i.e., why can't you compute distances between all pairs of sequences, build a tree, and define OTUs based on clades in the tree?) - this should go in the top of the notebook so it's clear why we don't compare all sequences against all other sequences.
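The centroid-only comparison suggested above could be sketched as follows. This is an illustration under stated assumptions, not the iab implementation: the function names are hypothetical, and the k-mer distance used here (one minus the Jaccard index of the two k-mer sets) may differ from the one in the notebook.

```python
def kmer_distance(seq1, seq2, k=3):
    """Fraction of distinct k-mers that are not shared by both sequences."""
    kmers1 = {seq1[i:i + k] for i in range(len(seq1) - k + 1)}
    kmers2 = {seq2[i:i + k] for i in range(len(seq2) - k + 1)}
    union = kmers1 | kmers2
    if not union:
        return 0.0
    return 1.0 - len(kmers1 & kmers2) / len(union)


def cluster_to_centroids(seqs, threshold, k=3):
    """Greedy clustering that compares each sequence to centroids only.

    seqs is a list of (id, sequence) pairs. A sequence joins the first
    cluster whose centroid is within threshold; otherwise it becomes the
    centroid of a new cluster.
    Returns a list of (centroid_sequence, [member ids]) pairs.
    """
    clusters = []
    for seq_id, seq in seqs:
        for centroid, members in clusters:
            if kmer_distance(seq, centroid, k) <= threshold:
                members.append(seq_id)
                break
        else:
            # No centroid was close enough, so start a new cluster.
            clusters.append((seq, [seq_id]))
    return clusters


seqs = [("s1", "ACGTACGTAC"), ("s2", "ACGTACGTAA"), ("s3", "TTTTGGGGCC")]
for centroid, members in cluster_to_centroids(seqs, threshold=0.5):
    print(centroid, members)
```

With this structure the number of distance computations scales with the number of sequences times the number of centroids, rather than with the number of sequence pairs.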

scikit currently requires numpy==1.8 (API version 9), release ipython comes with version 7?

I updated ipython just before starting into iab, but found this as one of my first dependencies that needed addressing. Should these versions be pinned or maintained in requirements.py?

add suggested reading in a new "getting started with biology" section

possible suggestions:
http://www.amazon.com/Processes-Life-Introduction-Molecular-Biology/dp/0262013053
NIH bookshelf

add graphical matrix formatting for score, dynamic programming, and traceback matrices

this could use seaborn heatmaps, e.g. where high values get darker colors.

related to #73

lifehacker u

We should try to get IAB listed by Lifehacker U

phylogeny chapter text corrections

@EvolDoc is working on proof-reading and critical review of this chapter.

Some other thoughts that I sent to @EvolDoc by email:

I reviewed the phylogeny chapter this morning and remember just how basic/short it actually is (much less developed than the other chapters). So, one thing that would be great to get your input on is what other methods you think are important to introduce students to at this stage. I specifically try to focus it on methods where the math isn't too challenging, and then point them to other resources for learning more. One great thing to add to this chapter would be a discussion of the limitations of a simple method like UPGMA, and what has been done to address those with other methods.

MSA chapter intro text suggestions

@kschwarzberg is currently reviewing this one.

check out Daniel's python tutorial

Review tutorial generated by @wasade here.

iab setup

Hi there, when I downloaded the book and started working with it, I already had all the necessary packages except iab. As I didn't need to install any modules, I didn't see the pip install . line. It would be great if you could mention it in the installation section of the readme, so others can see that the iab module must be installed using pip.

Thanks !

pass through full algorithms section to clean-up and prep as demo

The algorithms section is what I want to present as the example of where I'm hoping this project will lead. Clean up the various chapters to be more presentable.

1-pairwise-alignment needs some copyediting

Various minor typos, thinkos, etc. Mainly submitting this for bookkeeping to associate with pull-request for commits 3ba3f7c and d445c57.

link to big-o cheatsheet

http://bigocheatsheet.com/

thanks for the link @wasade.

Minor typo

There is a minor typing error in getting started - index.ipynb

Variables is written as varaibles.

MSA big-O tweaks

Few comments:

increasing from 25 to 150 to highlight how bad this can be for just two of your millions of miseq sequences
using a log-linear plot may be awesome and will show the diffs between the curves better
the y-axis is not seconds, but an undefined variable that is proportional to time

add Travis build icon to README.md

updates for scikit-bio 0.2.1

updates to teaching web site to consolidate material here

#38
#39
link to IAB

update contents

drop the "General Molecular Biology" section, but maybe merge some of that information into "Getting Started" - removed from README.md, but notebooks are still in place so links don't change for students who are using them this semester
drop all of the python stuff (beyond the scope of what I can do here, and there are a lot of resources out there for learning python - I'll link to those from "Getting Started") - removed from README.md, but notebooks are still in place so links don't change for students who are using them this semester
rename algorithms to "Fundamentals" - done in README.md, but notebooks are still in algorithms directory, so links don't change for students who are using them this semester
add an "Applications" section, where I add new notebooks on measuring diversity (i.e., qiime-like stuff), genome assembly, ...;
drop the 'Statistics' section (beyond the scope, and will link to other materials);
drop the "Other topics" section (pending notebooks that'd fall under that category, but might add some of my reproducibility in computing type stuff there at some point);
remove numbers from all chapter names, now that the chapter layout is ordered in the Index.ipynb (waiting a couple of weeks on this so links don't change for students who are using them this semester)

add mantel and bioenv examples to the biological diversity chapter

Add reference to github issue tracker in Disclaimer, alongside email and pull requests

I feel bad having sent a bunch of email to Greg, but I didn't know the issue tracker was a thing until after I sent him a bunch of email about dependencies in the "Getting Started" page.

presentation of notebook start page needs some work

Currently when launching the notebook server from the top-level directory, the sub-directories are un-ordered. It's also confusing, because some of the directories (licenses, iab) don't contain any notebooks, so show up as blank. Need a better way to handle this so users aren't confused.

Screenshot:

pairwise-alignment working directory is algorithms. When trying to import iab.algorithms, need to change back to project root

http://stackoverflow.com/questions/15514593/importerror-no-module-named-when-trying-to-run-python-script

In my case,

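import os
import sys

# '~' is not expanded automatically in sys.path entries, so expand it first
project_root = os.path.expanduser('~/iab/An-Introduction-To-Applied-Bioinformatics/')
print(os.getcwd())  # confirm the current working directory
sys.path.append(project_root)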

update clustering notebook to use qiime_default_reference...

...and the ssw aligner

add "developer notes"

I think it would make sense to highlight some of the discussions that are included as Developer notes, where we briefly describe things that you'd want to think about if you were developing the functionality described in the text. These could be highlighted in some kind of box to stand out from the rest of the text. An example is in the pairwise alignment chapter:

Next steps: All of those steps are a bit ugly, so as a developer you'd want to make this functionality generally accessible to users. To do that, you'd want to define a function that takes all of the necessary input and provides the aligned sequences and the score as output, without requiring the user to make several function calls. What are the required inputs? What steps would this function need to perform?
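As one possible answer to the questions in that example, here is a minimal sketch of a single function that takes two sequences and scoring parameters and returns the aligned sequences and the score. The name `global_align` is hypothetical. The simple match/mismatch scoring and linear gap penalty are assumptions chosen to keep the example short, and the chapter's own functions may use a substitution matrix instead.

```python
def global_align(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_seq1, aligned_seq2, score).
    """
    n, m = len(seq1), len(seq2)
    # Step 1: initialize the dynamic programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Step 2: fill in the matrix, keeping the best of the three moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Step 3: trace back from the bottom-right cell to the top-left cell.
    aln1, aln2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diagonal_ok = i > 0 and j > 0
        sub = match if diagonal_ok and seq1[i - 1] == seq2[j - 1] else mismatch
        if diagonal_ok and score[i][j] == score[i - 1][j - 1] + sub:
            aln1.append(seq1[i - 1])
            aln2.append(seq2[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aln1.append(seq1[i - 1])
            aln2.append("-")
            i -= 1
        else:
            aln1.append("-")
            aln2.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(aln1)), "".join(reversed(aln2)), score[n][m]


aligned1, aligned2, alignment_score = global_align("ACGT", "AGT")
print(aligned1)
print(aligned2)
print(alignment_score)
```

The user makes one call and never sees the intermediate matrices, which is the kind of interface the developer note is asking readers to think about.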

make iab pip-installable

1-pairwise-alignment: linkrot in BLOSUM link to wikipedia

this is the wrong link
wikipedia article on this topic.
I think you meant this:
wikipedia article on this topic

post multiple sequence alignment exercises when ready

add plot of sequencing technology changes over time

@ElDeveloper created a nice xkcd-style plot, and it'd be great to include the dates of introduction of the different technologies from this page which @walterst pointed me at.

Embedded LaTeX doesn't always render properly on iPad

I'm sure this is an upstream notebook issue, but thought it might be worth tracking here. I noticed it when reading through the pairwise section. If someone else can confirm that would be good.

update all notebooks to use psource magic instead of getsource()

this is a little cleaner, and code has syntax highlighting. see the getting-started/overview notebook for example.

make notebooks run faster

The tests take ~40 minutes via Travis, which is running through all of the notebooks. Once we hit 50 minutes, Travis will abort the tests. Most of the cells run instantly, or with little delay, but some cells take several minutes to complete. @gregcaporaso thoughts on this?

Travis also requires that there is some sort of output printed within a 10-minute window, otherwise the tests will be killed. We're currently okay, but there are some cells that are likely close to this threshold.

improve formatting of substitution matrices

this is related to scikit-bio/scikit-bio#161

website improvements

auto-generated gh-pages is now posted here:
http://caporasolab.us/An-Introduction-To-Applied-Bioinformatics/

i don't really like how narrow the theme is - probably need some work on a better theme. @ebolyen, we should chat about this a little bit.

also, should we re-direct appliedbioinformatics.us to this page, or should we leave as a caporasolab.us url, to bring attention to the lab website?

Mention that iterative/progressive alignment can take a while in the exercises

Just so students aren't surprised if it takes 5 min or so

add automated testing of notebooks

We need a way to automatically run all notebook code cells and make sure there aren't any errors. This could then be hooked up to Travis once #19 is completed.

I started looking into this for QIIME, but ran into issues because QIIME's notebooks only really run external commands (e.g., !validate_mapping_file.py -m ...) and I couldn't easily find a way to check if the external commands failed. However, I think the things I link to here may be helpful for testing the iab notebooks, which are AFAIK all Python code.

Who knows, there might even be a way to do this directly using IPython 2.0 (there wasn't when I looked into it a few months ago).
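A rough sketch of the idea, using only the standard library. The function name `run_notebook_code_cells` is hypothetical, and the code assumes the nbformat 4 JSON layout (a top-level "cells" list), which differs from the layout used by older IPython releases. Lines that use IPython-only syntax (shell escapes and magics) are skipped rather than executed, so this is not a complete test runner.

```python
import json


def run_notebook_code_cells(notebook_path):
    """Execute the code cells of a notebook in order, in one shared namespace.

    Returns a list of (cell_index, exception) pairs, one per failing cell.
    Assumes the nbformat 4 JSON layout (top-level "cells" list).
    """
    with open(notebook_path) as f:
        notebook = json.load(f)
    namespace = {}
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        # Drop lines with IPython-only syntax, which plain Python cannot run.
        lines = [line for line in source.splitlines()
                 if not line.lstrip().startswith(("!", "%"))]
        try:
            exec("\n".join(lines), namespace)
        except Exception as error:
            failures.append((index, error))
    return failures
```

A test script could call this on every notebook and exit with a non-zero status if any list of failures is non-empty, which is the signal a continuous integration service needs.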

