applied-bioinformatics / an-introduction-to-applied-bioinformatics
Interactive lessons in bioinformatics.
Home Page: http://readIAB.org
License: Other
This light wrapper is required for the functionality introduced in skbio's #507, which is used in the MSA chapter, while IAB depends on skbio 0.1.4.
link to Bayesian Methods for Hackers, as a source of inspiration
The code for formatting dynamic programming and traceback matrices is scattered and ugly. It should be consolidated, and all output rows should fit on a single line. Maybe the way to go with this is to load into numpy arrays and use the built-in formatting?
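One way the consolidation could look is a single helper that every chapter calls. The sketch below uses plain string padding rather than numpy, and the name format_matrix and its parameters are hypothetical, not existing iab functions. Each matrix row is emitted as exactly one line, with the sequence characters as row and column labels.

```python
def format_matrix(matrix, row_labels, col_labels, cell_width=4):
    """Return a labelled score (or traceback) matrix as text, one row per line."""
    # Blank corner cell, then one right-aligned cell per column label.
    header = " " * cell_width + "".join(
        str(label).rjust(cell_width) for label in col_labels)
    lines = [header]
    for label, row in zip(row_labels, matrix):
        cells = "".join(str(value).rjust(cell_width) for value in row)
        lines.append(str(label).rjust(cell_width) + cells)
    return "\n".join(lines)


# Toy values only, not the output of a real alignment.
toy_matrix = [[0, 0, 0], [0, 5, 1], [0, 1, 10]]
print(format_matrix(toy_matrix, row_labels="-AC", col_labels="-AC"))
```

Because cells are converted with str(), the same helper would also accept traceback matrices whose entries are arrows or other short strings, as long as cell_width is at least as wide as the longest entry.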
real-world msa could be based on some subset of Greengenes (derived from the cookbook example)
removed these as i didn't get to update them for new msa code yet as part of #81
This will initially derive content from this document and should be easily editable by me.
Details on stack overflow, but this definitely affects the person trying to get started
don't want to deal with changing url before then
The mammal tree based on hemoglobin doesn't work well, I think because the sequences are too similar, but maybe because we need a better distance metric for the aligned sequences that counts using blosum50. Basically want something that has organisms that students can relate to (in that they'll have an intuitive feel for who should be more/less closely related) and that is well annotated so those names are evident.
guide_tree_from_query_sequences fails when passed correct data.
686
687 guide_dm = DistanceMatrix(guide_dm, seq_ids)
--> 688 guide_lm = average(guide_dm.condensed_form())
689 guide_tree = to_tree(guide_lm)
690 if display_tree:
AttributeError: 'DistanceMatrix' object has no attribute 'condensed_form'
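For context, scipy's hierarchical clustering functions such as average take a condensed distance vector (the upper triangle of the square matrix, flattened row by row), and scipy.spatial.distance.squareform converts between the two forms. The sketch below shows in plain Python what that conversion produces. The helper name and toy values are illustrative, and whether this is the right fix depends on the DistanceMatrix API of the skbio version in use, which is not shown here.

```python
def condensed_form(square):
    """Flatten the entries above the diagonal of a symmetric matrix, row by row."""
    n = len(square)
    return [square[i][j] for i in range(n) for j in range(i + 1, n)]


# Toy symmetric distance matrix for three sequences.
toy_dm = [[0.0, 0.2, 0.5],
          [0.2, 0.0, 0.4],
          [0.5, 0.4, 0.0]]
print(condensed_form(toy_dm))  # [0.2, 0.5, 0.4]
```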
cluster_greedy - this really should be something like cluster_centroid_distance
iab/algorithms/__init__.py
cluster_greedy_kmer, just compute the kmer distances to the cluster centroids (Rob M. pointed this out)
I updated ipython just before starting into iab, but found this as one of my first dependencies that needed addressing. Should these versions be pinned or maintained in requirements.py?
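The k-mer suggestion above (compare each sequence only to the cluster centroids) could be sketched roughly as follows. This is not the iab implementation: the function names, the set-based distance (fraction of distinct k-mers not shared), and the default k are all assumptions made for illustration.

```python
def kmer_set(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(seq1, seq2, k=3):
    """Fraction of distinct k-mers that the two sequences do not share."""
    a, b = kmer_set(seq1, k), kmer_set(seq2, k)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_by_centroid_distance(seqs, max_distance, k=3):
    """Greedy clustering: join the first centroid within max_distance,
    otherwise start a new cluster with this sequence as its centroid."""
    clusters = []  # list of (centroid, members) pairs
    for seq in seqs:
        for centroid, members in clusters:
            if kmer_distance(seq, centroid, k) <= max_distance:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


clusters = cluster_by_centroid_distance(
    ["ACGTACGT", "ACGTACGA", "TTTTTTTT"], max_distance=0.5)
print(len(clusters))  # 2
```

Each sequence is compared to at most one centroid per cluster, so no pairwise alignment between all sequences is needed.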
possible suggestions:
http://www.amazon.com/Processes-Life-Introduction-Molecular-Biology/dp/0262013053
NIH bookshelf
this could use seaborn heatmaps, e.g. where high values get darker colors.
related to #73
We should try to get IAB listed by Lifehacker U
@EvolDoc is working on proof-reading and critical review of this chapter.
Some other thoughts that I sent to @EvolDoc by email:
I reviewed the phylogeny chapter this morning and remember just how basic/short it actually is (much less developed than the other chapters). So, one thing that would be great to get your input on is what other methods you think are important to introduce students to at this stage. I specifically try to focus it on methods where the math isn't too challenging, and then point them to other resources for learning more. One great thing to add to this chapter would be a discussion of the limitations of a simple method like UPGMA, and what has been done to address those with other methods.
@kschwarzberg is currently reviewing this one.
Hi there, when I downloaded the book and started working with it, I already had all the necessary packages except iab. As I didn't need to install any modules, I didn't see the pip install . line. It would be great if you could mention it in the installation section of the README, so others can see that the iab module must be installed using pip.
Thanks !
The algorithms section is what I want to present as the example of where I'm hoping this project will lead. Clean up the various chapters to be more presentable.
thanks for the link @wasade.
There is a minor typing error in getting started - index.ipynb
Variables is written as varaibles.
Few comments:
local_pairwise_align_ssw
(addresses #30) in
iab module
algorithms directory, so links don't change for students who are using them this semester
I feel bad having sent a bunch of email to Greg, but I didn't know the issue tracker was a thing until after I sent him a bunch of email about dependencies in the "Getting Started" page.
Currently when launching the notebook server from the top-level directory, the sub-directories are un-ordered. It's also confusing, because some of the directories (licenses, iab) don't contain any notebooks, so show up as blank. Need a better way to handle this so users aren't confused.
Screenshot:
In my case,
import os
os.getcwd()
import sys
# '~' is not expanded in sys.path entries, so expand it explicitly
sys.path.append(os.path.expanduser('~/iab/An-Introduction-To-Applied-Bioinformatics/'))
...and the ssw aligner
I think it would make sense to highlight some of the discussions that are included as Developer notes, where we briefly describe things that you'd want to think about if you were developing the functionality described in the text. These could be highlighted in some kind of box to stand out from the rest of the text. An example is in the pairwise alignment chapter:
Next steps: All of those steps are a bit ugly, so as a developer you'd want to make this functionality generally accessible to users. To do that, you'd want to define a function that takes all of the necessary input and provides the aligned sequences and the score as output, without requiring the user to make several function calls. What are the required inputs? What steps would this function need to perform?
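As one possible answer to that exercise, the sketch below wraps the steps (initialise the matrix, fill it, trace back) in a single function. It assumes a simple match/mismatch score and a linear gap penalty instead of a substitution matrix, and the name align_local_sketch is hypothetical; it is not the function used in the chapter or in skbio.

```python
def align_local_sketch(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (aligned_seq1, aligned_seq2, score).
    """
    n_rows, n_cols = len(seq1) + 1, len(seq2) + 1
    # Step 1: initialise the score matrix with zeros.
    scores = [[0] * n_cols for _ in range(n_rows)]
    best_score, best_pos = 0, (0, 0)

    # Step 2: fill the matrix, remembering the highest-scoring cell.
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            scores[i][j] = max(0,
                               scores[i - 1][j - 1] + sub,
                               scores[i - 1][j] + gap,
                               scores[i][j - 1] + gap)
            if scores[i][j] > best_score:
                best_score, best_pos = scores[i][j], (i, j)

    # Step 3: trace back from the best cell until a zero score is reached.
    aligned1, aligned2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and scores[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if scores[i][j] == scores[i - 1][j - 1] + sub:
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif scores[i][j] == scores[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(aligned1)), "".join(reversed(aligned2)), best_score


print(align_local_sketch("TTACGTTT", "GGACGTGG"))  # ('ACGT', 'ACGT', 8)
```

The required inputs are the two sequences and the scoring parameters; the user makes one call and gets back both aligned sequences and the score.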
this is the wrong link
wikipedia article on this topic.
I think you meant this:
wikipedia article on this topic
@ElDeveloper created a nice xkcd-style plot, and it'd be great to include the dates of introduction of the different technologies from this page which @walterst pointed me at.
I'm sure this is an upstream notebook issue, but thought it might be worth tracking here. I noticed it when reading through the pairwise section. If someone else can confirm that would be good.
this is a little cleaner, and code has syntax highlighting. see the getting-started/overview notebook for example.
The tests take ~40 minutes via Travis, which is running through all of the notebooks. Once we hit 50 minutes, Travis will abort the tests. Most of the cells run instantly, or with little delay, but some cells take several minutes to complete. @gregcaporaso thoughts on this?
Travis also requires that there is some sort of output printed within a 10-minute window, otherwise the tests will be killed. We're currently okay, but there are some cells that are likely close to this threshold.
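If a cell does get close to the no-output limit, one workaround is to print a periodic message from a background thread while the slow code runs. The sketch below is an assumption about how that could be done, not something the IAB test runner currently does; the heartbeat name, interval, and message are made up for illustration.

```python
import threading
import time
from contextlib import contextmanager


@contextmanager
def heartbeat(interval_seconds=300, message="still running..."):
    """Print message every interval_seconds while the with-block runs."""
    stop = threading.Event()

    def beat():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_seconds):
            print(message, flush=True)

    worker = threading.Thread(target=beat, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join()


# Short interval for demonstration; time.sleep stands in for a slow cell.
with heartbeat(interval_seconds=0.05):
    time.sleep(0.2)
```

This only addresses the no-output window; it does nothing about the overall time limit for the test run.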
this is related to scikit-bio/scikit-bio#161
auto-generated gh-pages is now posted here:
http://caporasolab.us/An-Introduction-To-Applied-Bioinformatics/
i don't really like how narrow the theme is - probably need some work on a better theme. @ebolyen, we should chat about this a little bit.
also, should we re-direct appliedbioinformatics.us to this page, or should we leave as a caporasolab.us url, to bring attention to the lab website?
Just so students aren't surprised if it takes 5 min or so
We need a way to automatically run all notebook code cells and make sure there aren't any errors. This could then be hooked up to Travis once #19 is completed.
I started looking into this for QIIME, but ran into issues because QIIME's notebooks only really run external commands (e.g., !validate_mapping_file.py -m ...) and I couldn't easily find a way to check if the external commands failed. However, I think the things I link to here may be helpful for testing the iab notebooks, which are AFAIK all Python code.
Who knows, there might even be a way to do this directly using IPython 2.0 (there wasn't when I looked into it a few months ago).
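For notebooks that contain only Python, a very small runner could look like the sketch below. It assumes the notebook JSON has a top-level "cells" list with "cell_type" and "source" fields (the layout of later notebook formats; older files nest cells differently), skips shell-escape and magic lines because plain exec cannot run them, and uses a hypothetical function name. It is an illustration of the idea, not the tooling linked in this issue.

```python
import json
import traceback


def run_notebook_cells(notebook_json):
    """Execute code cells in order; return a list of (cell_index, error_text)."""
    notebook = json.loads(notebook_json)
    namespace = {}  # shared across cells, like a notebook kernel
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        # Drop IPython-only lines (shell escapes and magics).
        lines = [line for line in source.splitlines()
                 if not line.lstrip().startswith(("!", "%"))]
        try:
            exec("\n".join(lines), namespace)
        except Exception:
            failures.append((index, traceback.format_exc()))
    return failures


demo = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# title"]},
    {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
    {"cell_type": "code", "source": ["undefined_name"]},
]})
failures = run_notebook_cells(demo)
print([index for index, _ in failures])  # [2]
```

A CI script could exit with a non-zero status whenever the returned list is non-empty, which is what a service like Travis checks for.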
Depends on #18 so that we have stuff to test.
functions need to be tested, and many ported to skbio