
applied-bioinformatics / an-introduction-to-applied-bioinformatics


Interactive lessons in bioinformatics.

Home Page: http://readIAB.org

License: Other

Python 97.95% TeX 1.95% CSS 0.10%

an-introduction-to-applied-bioinformatics's People

Contributors

2grep, anderspitman, ccwnau, ebolyen, ellasantanapropper, erictleung, gitter-badger, gregcaporaso, iab-reader1, jab743, jairideout, kelseyatkin, khdc-me, kschwarzberg, ktaed, lewisacidic, lkruse1998, llcooljohn, lsl5, mandel01, maxvonhippel, mribeirodantas, nonsense64, shiffer1, thermokarst, wasade, wvg3, yourbuddyconner


an-introduction-to-applied-bioinformatics's Issues

clean up matrix formatting

The code for formatting dynamic programming and traceback matrices is scattered and ugly. It should be consolidated, and all output rows should fit on a single line. Maybe the way to go with this is to load into numpy arrays and use the built-in formatting?
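A minimal sketch of the numpy idea, assuming the matrices can be held as plain 2D numeric arrays. The helper name format_matrix is hypothetical and not part of the iab module; np.array2string is numpy's own formatter, and a large max_line_width keeps each matrix row on one output line.

```python
import numpy as np


def format_matrix(matrix, max_line_width=10_000):
    """Format a 2D score matrix with numpy, one matrix row per output line.

    format_matrix is a hypothetical helper name used for illustration only.
    """
    data = np.asarray(matrix)
    # A threshold above the element count stops numpy from abbreviating
    # large matrices with "...".
    return np.array2string(
        data, max_line_width=max_line_width, threshold=data.size + 1
    )


# A wide matrix that would wrap at numpy's default line width.
wide = np.zeros((3, 60), dtype=int)
print(format_matrix(wide))
```

Traceback matrices that hold arrow characters would be printed with quotes around each string, so they may still need a small custom formatter.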

better real-world example of creating tree from biological sequences

The mammal tree based on hemoglobin doesn't work well, I think because the sequences are too similar, but maybe because we need a better distance metric for the aligned sequences that counts using blosum50. Basically want something that has organisms that students can relate to (in that they'll have an intuitive feel for who should be more/less closely related) and that is well annotated so those names are evident.
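One way to picture the "counts using blosum50" idea is to compare a plain mismatch fraction with a distance derived from substitution scores. The sketch below is only an illustration: the scores in TOY_SCORES are made-up placeholder values, not BLOSUM50 entries, and the normalization (one minus the observed score over the mean self-score) is an assumption rather than a method taken from the book.

```python
# Hypothetical placeholder scores, NOT real BLOSUM50 values.
TOY_SCORES = {("I", "L"): 2, ("D", "L"): -4, ("D", "I"): -4}
IDENTITY_SCORE = 5


def toy_score(x, y):
    """Score a pair of residues using the placeholder table above."""
    if x == y:
        return IDENTITY_SCORE
    return TOY_SCORES.get(tuple(sorted((x, y))), -1)


def hamming_distance(aligned_a, aligned_b):
    """Fraction of aligned positions that differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(aligned_a, aligned_b)) / len(aligned_a)


def substitution_distance(aligned_a, aligned_b, score=toy_score, gap="-"):
    """1 - observed score / mean self-score, ignoring columns with gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if gap not in (x, y)]
    if not columns:
        return 1.0
    observed = sum(score(x, y) for x, y in columns)
    self_a = sum(score(x, x) for x, _ in columns)
    self_b = sum(score(y, y) for _, y in columns)
    # Clamp at zero; very dissimilar pairs can still exceed 1.
    return max(0.0, 1 - observed / ((self_a + self_b) / 2))


# Both pairs differ at one of four positions, so Hamming cannot separate them,
# but the score-based distance treats L->I as closer than L->D.
print(hamming_distance("LLLL", "LLLI"), hamming_distance("LLLL", "LLLD"))
print(substitution_distance("LLLL", "LLLI"), substitution_distance("LLLL", "LLLD"))
```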

bug in guide_tree_from_query_sequences

guide_tree_from_query_sequences fails when passed correct data.

    686
    687     guide_dm = DistanceMatrix(guide_dm, seq_ids)
--> 688     guide_lm = average(guide_dm.condensed_form())
    689     guide_tree = to_tree(guide_lm)
    690     if display_tree:

AttributeError: 'DistanceMatrix' object has no attribute 'condensed_form'
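For context, the traceback suggests the DistanceMatrix class in use did not provide condensed_form, while scipy's average linkage function expects a condensed distance vector (the upper triangle of the square matrix, row by row). The sketch below shows that conversion in plain Python, assuming the square matrix is available as nested lists; the standalone function here is hypothetical and is not the scikit-bio method. For numpy arrays, scipy.spatial.distance.squareform performs the same conversion.

```python
def condensed_form(square):
    """Return the upper triangle of a square distance matrix, row by row.

    Hypothetical standalone helper; it mirrors the ordering scipy uses for
    condensed distance vectors.
    """
    n = len(square)
    for i, row in enumerate(square):
        if len(row) != n:
            raise ValueError("matrix must be square")
        if row[i] != 0:
            raise ValueError("diagonal must be zero")
        for j in range(i + 1, n):
            if row[j] != square[j][i]:
                raise ValueError("matrix must be symmetric")
    return [square[i][j] for i in range(n) for j in range(i + 1, n)]


dm = [
    [0, 1, 2],
    [1, 0, 3],
    [2, 3, 0],
]
print(condensed_form(dm))
```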

clustering chapter to do items

  • better name for cluster_greedy - this really should be something like cluster_centroid_distance
  • images to illustrate the different clustering algorithms - see phone for photos of whiteboard from lecture
  • add open reference discussion
  • move cluster functions to iab/algorithms/__init__.py
  • add some discussion of real world run time for OTU picking (several people asked questions about doing this iteratively - like iterative msa - which is interesting, but runtime would be a limiting factor)
  • instead of computing all kmer distances in cluster_greedy_kmer, just compute the kmer distances to the cluster centroids (Rob M. pointed this out)
  • add discussion of why approximations are required (i.e., why can't you compute distances between all pairs of sequences, build a tree, and define OTUs based on clades in the tree?) - this should go in the top of the notebook so it's clear why we don't compare all sequences against all other sequences.
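The centroid-only idea from the list above can be sketched as follows. This is an illustration under assumptions, not the book's cluster_greedy_kmer: the function names are hypothetical, the k-mer distance is assumed to be the fraction of distinct k-mers not shared by both sequences, and the first sequence placed in a cluster is assumed to serve as its centroid.

```python
def kmer_set(seq, k):
    """All distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k):
    """Fraction of the distinct k-mers in either sequence that are not shared."""
    kmers_a, kmers_b = kmer_set(a, k), kmer_set(b, k)
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return 1 - len(kmers_a & kmers_b) / len(union)


def cluster_to_centroids(seqs, max_distance, k=3):
    """Greedy clustering that compares each sequence to centroids only.

    Each new sequence costs at most one distance computation per existing
    cluster, rather than one per previously seen sequence.
    """
    clusters = []  # list of (centroid, members) pairs
    for seq in seqs:
        for centroid, members in clusters:
            if kmer_distance(seq, centroid, k) <= max_distance:
                members.append(seq)
                break
        else:
            # No centroid was close enough, so seq starts a new cluster.
            clusters.append((seq, [seq]))
    return clusters


for centroid, members in cluster_to_centroids(
    ["ACGTACGT", "ACGTACGA", "TTTTTTTT"], max_distance=0.25
):
    print(centroid, members)
```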

phylogeny chapter text corrections

@EvolDoc is working on proof-reading and critical review of this chapter.

Some other thoughts that I sent to @EvolDoc by email:

I reviewed the phylogeny chapter this morning and remember just how basic/short it actually is (much less developed than the other chapters). So, one thing that would be great to get your input on is what other methods you think are important to introduce students to at this stage. I specifically try to focus it on methods where the math isn't too challenging, and then point them to other resources for learning more. One great thing to add to this chapter would be a discussion of the limitations of a simple method like UPGMA, and what has been done to address those with other methods.

iab setup

Hi there, when I downloaded the book and started working with it, I already had all the necessary packages except iab. As I didn't need to install any modules, I didn't see the "pip install ." line. It would be great if you could mention it in the installation section of the README, so others can see that the iab module must be installed using pip.

Thanks !

Minor typo

There is a minor typing error in getting started - index.ipynb

"Variables" is written as "varaibles".

MSA big-O tweaks

A few comments:

  • increasing from 25 to 150 to highlight how bad this can be for just two of your millions of MiSeq sequences
  • using a log-linear plot may be awesome and will show the diffs between the curves better
  • the y-axis is not seconds, but an undefined variable that is proportional to time
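A rough worked example of the points above, assuming "25 to 150" refers to sequence length and that the plotted quantity is the number of dynamic programming matrix cells for aligning two sequences (a value proportional to time, not seconds). The function names are hypothetical.

```python
import math


def dp_cells(len_a, len_b):
    """Work for one pairwise alignment, in DP-matrix cells (not seconds)."""
    return len_a * len_b


def log10_series(values):
    """log10 of each value, i.e. what a log-scaled y-axis would display."""
    return [math.log10(v) for v in values]


lengths = [25, 50, 100, 150]
cells = [dp_cells(n, n) for n in lengths]
for n, c, lg in zip(lengths, cells, log10_series(cells)):
    print(f"length {n:>3}: {c:>6} cells  (log10 = {lg:.2f})")
```

Under these assumptions, going from length 25 to length 150 multiplies the work for a single pair by 36, and the log-scaled values stay within a narrow range, which is why a log-linear plot can separate curves that look flat on a linear axis.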

updates for scikit-bio 0.2.1

  • update in install instructions
    • readme
    • website
  • use local_pairwise_align_ssw (addresses #30) in
    • ch 1
    • ch 2
    • ch 3
    • ch 4
    • ch 5
    • ch 6
  • drop aligner functions from iab module

update contents

  • drop the "General Molecular Biology" section, but maybe merge some of that information into "Getting Started" - removed from README.md, but notebooks are still in place so links don't change for students who are using them this semester
  • drop all of the python stuff (beyond the scope of what I can do here, and there are a lot of resources out there for learning python - I'll link to those from "Getting Started") - removed from README.md, but notebooks are still in place so links don't change for students who are using them this semester
  • rename algorithms to "Fundamentals" - done in README.md, but notebooks are still in algorithms directory, so links don't change for students who are using them this semester
  • add an "Applications" section, where I add new notebooks on measuring diversity (i.e., qiime-like stuff), genome assembly, ...;
  • drop the 'Statistics' section (beyond the scope, and will link to other materials);
  • drop the "Other topics" section (pending notebooks that'd fall under that category, but might add some of my reproducibility in computing type stuff there at some point);
  • remove numbers from all chapter names, now that the chapter layout is ordered in the Index.ipynb (waiting a couple of weeks on this so links don't change for students who are using them this semester)

presentation of notebook start page needs some work

Currently when launching the notebook server from the top-level directory, the sub-directories are un-ordered. It's also confusing, because some of the directories (licenses, iab) don't contain any notebooks, so show up as blank. Need a better way to handle this so users aren't confused.

Screenshot: (image not preserved; original file name "screen shot 2014-04-11 at 8 05 35 am")

add "developer notes"

I think it would make sense to highlight some of the discussions that are included as Developer notes, where we briefly describe things that you'd want to think about if you were developing the functionality described in the text. These could be highlighted in some kind of box to stand out from the rest of the text. An example is in the pairwise alignment chapter:

Next steps: All of those steps are a bit ugly, so as a developer you'd want to make this functionality generally accessible to users. To do that, you'd want to define a function that takes all of the necessary input and provides the aligned sequences and the score as output, without requiring the user to make several function calls. What are the required inputs? What steps would this function need to perform?
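As one possible shape for such a function, here is a sketch that takes two sequences plus scoring parameters and returns the aligned sequences and the score in a single call. It assumes global (Needleman-Wunsch) alignment with simple match/mismatch scores and a linear gap penalty, which may differ from what the chapter uses; the name global_align and its parameters are hypothetical.

```python
def global_align(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return (aligned_seq1, aligned_seq2, score) for a global alignment."""
    rows, cols = len(seq1) + 1, len(seq2) + 1

    # Step 1: initialise the score matrix; the first row and column
    # represent prefixes aligned entirely against gaps.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Step 2: fill each cell with the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Step 3: trace back from the bottom-right cell to the top-left cell.
    aligned1, aligned2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1

    return "".join(reversed(aligned1)), "".join(reversed(aligned2)), score[-1][-1]


print(global_align("ACGT", "AGT"))
```

The required inputs here are the two sequences and the scoring scheme; the initialisation, matrix fill, and traceback steps are hidden behind one call.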

make notebooks run faster

The tests take ~40 minutes via Travis, which is running through all of the notebooks. Once we hit 50 minutes, Travis will abort the tests. Most of the cells run instantly, or with little delay, but some cells take several minutes to complete. @gregcaporaso thoughts on this?

Travis also requires that there is some sort of output printed within a 10-minute window, otherwise the tests will be killed. We're currently okay, but there are some cells that are likely close to this threshold.

add automated testing of notebooks

We need a way to automatically run all notebook code cells and make sure there aren't any errors. This could then be hooked up to Travis once #19 is completed.

I started looking into this for QIIME, but ran into issues because QIIME's notebooks only really run external commands (e.g., !validate_mapping_file.py -m ...) and I couldn't easily find a way to check if the external commands failed. However, I think the things I link to here may be helpful for testing the iab notebooks, which are AFAIK all Python code.

Who knows, there might even be a way to do this directly using IPython 2.0 (there wasn't when I looked into it a few months ago).
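A minimal sketch of the idea, assuming notebooks stored in the nbformat 4 JSON layout (a top-level "cells" list) and containing only plain Python. The function name run_notebook is hypothetical. Cells that use IPython magics or "!" shell commands are not valid Python, so this approach would report them as failures rather than run them.

```python
import json
import traceback


def run_notebook(path):
    """Execute every code cell in order; return (cell_index, traceback) failures."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)

    namespace = {"__name__": "__main__"}  # shared across cells, like a kernel
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        try:
            exec(compile(source, f"{path}[cell {index}]", "exec"), namespace)
        except Exception:
            # Record the failure and keep going so all broken cells are reported.
            failures.append((index, traceback.format_exc()))
    return failures
```

A test runner could call run_notebook on each .ipynb file and exit with a non-zero status if any list of failures is non-empty, which is the signal a CI service needs.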
