meyer-lab / thmm

A general Python framework for using hidden Markov models on binary trees or cell lineage trees.

Home Page: https://asmlab.org

License: MIT License

Languages: Python 99.49%, Batchfile 0.29%, Makefile 0.22%
Topics: inference, bioengineering, cell-biology, heterogeneity, drug-response

thmm's Introduction

tHMM

tHMM is a Python 3 package for clustering, visualizing, and analyzing data in the form of lineage trees. This work is now published in Communications Biology: https://www.nature.com/articles/s42003-022-04208-9

Install

$ git clone https://github.com/meyer-lab/tHMM.git

It takes a few seconds to clone the repository.

Software Requirements

This package is supported on macOS and Linux systems.

Python dependencies

tHMM requires virtualenv. Running `make venv` creates a virtual environment and installs all other required packages. The Python packages that will be installed are listed in pyproject.toml and can also be installed directly with `poetry install`.

Documentation

Please find the documentation at https://thmm.readthedocs.io.

License

This project is covered under the MIT License.

thmm's People

Contributors

aarmey, adamcweiner, alifarhat40, dependabot[bot], farnazmdi, github-actions[bot], jclagarde, lkargi, namn12, scottdtaylor95, shak360

Forkers

mcreixell

thmm's Issues

NIH DEBUT Competition - Due May 31st

  • 6-page proposal
  • 2-minute video
  • Evidence of a working prototype
  • Demonstrate that this is marketable (i.e., researchers/pharma/clinicians have a need for this)

Test for `get_mutual_info` keeps failing

The normalized mutual information score keeps returning negative values. The test has been deleted for now so that other work can proceed before the abstract submission.
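
As a reference point for rewriting the test, here is a minimal standard-library sketch of normalized mutual information. It is not the project's `get_mutual_info` implementation, which is not shown here; the function name `normalized_mutual_info` and the arithmetic-mean normalization are assumptions made for illustration. With this definition the score lies in [0, 1] and does not change when state labels are permuted, so a negative value would point to a bug or to a different (for example, chance-adjusted) score being used.

```python
import math
from collections import Counter


def normalized_mutual_info(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization; hypothetical helper, not tHMM's API."""
    n = len(true_labels)
    if n == 0 or n != len(pred_labels):
        raise ValueError("label sequences must be non-empty and of equal length")

    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    count_joint = Counter(zip(true_labels, pred_labels))

    def entropy(counts):
        # Shannon entropy (in nats) of the empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true, h_pred = entropy(count_true), entropy(count_pred)

    # I(T; P) = sum over (t, p) of p(t, p) * log(p(t, p) / (p(t) * p(p)))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )

    denom = 0.5 * (h_true + h_pred)
    if denom == 0.0:
        return 1.0  # both labelings are constant, so the partitions are identical
    # Clamp tiny excursions outside [0, 1] caused by floating-point round-off.
    return min(1.0, max(0.0, mutual_info / denom))
```

A test built on this can assert that swapping the state labels (for example, predicting 1 wherever the truth is 0 and vice versa) still gives a score of 1, which is the property that makes NMI suitable for comparing hidden-state assignments.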

Explore various `generate()` functions

Comments from previous PR:

"Probably worth having multiple strategies for generating the tree, dependent upon the purpose. For example, in experiments you track all the cells for 0 <= t <= T, so a more closely matching strategy would be to generate cells such that all cells have an ending event within 0–T." -Meyer

"In the proposed while method, we wouldn't output a tree unless it was a certain length, whereas in the current way, the number of cells is just an upper bound that doesn't necessarily have to be reached in the creation of a tree. We can just write multiple generators as @aarmey suggested. I remember when we were making the generate() function you also had the idea of using a time-based generator, which could work for the idea @aarmey has above." -Shakthi

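To make the time-based idea concrete, here is a rough sketch of a generator that grows a lineage until an experiment end time rather than until a cell count is reached. Everything in it is hypothetical: the `Cell` class, `generate_by_time`, the exponential lifetime distribution, and the parameters `mean_lifetime` and `divide_prob` are illustration-only choices, not the package's `generate()` API. Cells still alive at `t_end` are kept but truncated and flagged as unobserved; dropping them instead is a one-line change.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    birth: float
    lifetime: float
    parent: Optional[int]       # index of the parent in the returned list
    observed_end: bool = True   # False if the cell is still alive at t_end

    @property
    def end(self):
        return self.birth + self.lifetime


def generate_by_time(t_end, mean_lifetime=10.0, divide_prob=0.8, seed=None):
    """Grow one lineage from a single founder cell over the window [0, t_end]."""
    rng = random.Random(seed)
    cells = []
    queue = deque([(0.0, None)])  # (birth time, parent index)
    while queue:
        birth, parent = queue.popleft()
        lifetime = rng.expovariate(1.0 / mean_lifetime)
        cell = Cell(birth, lifetime, parent)
        cells.append(cell)
        index = len(cells) - 1
        if cell.end > t_end:
            # The experiment stops before this cell's fate is seen: truncate it.
            cell.lifetime = t_end - birth
            cell.observed_end = False
            continue
        if rng.random() < divide_prob:
            # Division: two daughters are born when the parent's lifetime ends.
            queue.append((cell.end, index))
            queue.append((cell.end, index))
        # Otherwise the cell dies and contributes no daughters.
    return cells
```

A count-based generator bounds the number of cells, while this one bounds the observation time, so the number of cells becomes a random outcome of the simulated growth.
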
Move paper to build script pipeline

Probably best done piece-wise:

  • Bibliography (bibtex)
  • Methods
  • Introduction
  • Discussion
  • Results
  • Figure captions
  • Authorship, abstract, etc
  • Proofing

Remove `keepBern` parameter in tHMM.

keepBern is no longer needed because we can now just use not isUnfinished(). It is also worth investigating whether removeNaNs() is still needed.

Profile code

We should find where the fitting process is spending the most time. I guarantee there will be large optimizations we can make. Take a look at testprofile in the makefile, which uses cProfile to profile, then gprof2dot to visualize which functions take the most time.
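
For quick checks outside the Makefile target, a small wrapper around the standard-library `cProfile` and `pstats` modules is enough to rank functions by cumulative time. The helper `profile_call` and the stand-in workload `slow_sum` are hypothetical names used only for this sketch; in practice the profiled callable would be whatever runs the fitting process.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so expensive call chains appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def slow_sum(n):
    # Stand-in workload; replace with the call that fits the model.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(slow_sum, 200_000)
    print(report)
```

To feed the same data to gprof2dot, `profiler.dump_stats(path)` can be called inside the helper to write a stats file for later visualization.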

Keeping notes on the mathematics

https://www.overleaf.com/4613729765tqmsjgdpxrgg

Here is a LaTeX project I started on Overleaf (an Overleaf account is required), where we can keep all our notes on the mathematics and other written aspects of our project. This will also keep our notation consistent and make it easier to move the math to slides or figures later. I guess we can start with a review of HMMs and move on from there.

Implement cell cycle reporter emissions

The Heiser lab has a reporter for cells in G1 or G2/S. Cells can then either continue or die in either phase. For now, we'd like to evaluate the collected data on the one state model.

Paper (general writing and figure making progress)

https://1drv.ms/w/s!AncRvdvF6gtNgdJqtfqU68JJSkR1SQ OBSOLETE

https://1drv.ms/w/s!AncRvdvF6gtNgdJ7Y_TqAhLuHQA1CA USE THIS

Above is a link to an editable Word document (we moved away from Overleaf because of the limit on how many times a document can be shared). Some of the LaTeX didn't transfer well, so we have to fix it.

Here is a list of figures (sub-figures) that we decided should be implemented (see Slack and Meeting notes for more detail):

  • Graphical abstract of our method. Message: Summarize our method/paper.

  • Cells (images of cells, tracking images, segmentation, scale bars, resolutions, etc.) Message: Our experimental setup, and could be an example of the type of setup a user might want to use our model with.

  • Plot a synthesized lineage, plot lineage from wet-lab data. Message: Examples of possible inputs to our model. What would a user need to use our model. This figure may not be necessary.

  • Graphical model of our tHMM / graphical model of our transitions (state transition diagram).

  • Percent classified (some measure of accuracy) vs. divergence of cell phenotypes, i.e., how far apart the Bernoulli and Gompertz parameters must be to get accurate classification/state change. Utilize KL-divergence. Message: How far apart do our underlying distributions have to be for MLE (one-state model) to fail? For BW/Viterbi (>1-state model) to succeed?

  • Percent classified (some measure of accuracy) vs. Tree length (i.e. how does tree length affect accurate classification/state change) Message: How does the experiment time affect the accuracy of MLE? of BW/Viterbi?

  • Percent classified vs. Number of lineages (i.e. how do number of lineages affect inference) Message: How many lineages do I need to accurately obtain the state parameters for MLE (1 state model)? for BW/Viterbi (2 state model)? Averaging the fitted state parameters over several lineages should do better than just using 1 lineage of a cell type/state.

  • A heat-map of a single lineage (or more) showing how the state change is altered over time. Plot the gammas, i.e. the smoothed-probabilities, as a tree. This figure may not be necessary.
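
For the divergence axis in the KL-divergence figure above, the Bernoulli part has a closed form. The sketch below computes KL(Bern(p) || Bern(q)) in nats; the function name `kl_bernoulli` is hypothetical, and the Gompertz component is deliberately left out, so this covers only part of the distance between two states.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence KL(Bern(p) || Bern(q)) in nats; hypothetical helper."""
    if not (0.0 <= p <= 1.0):
        raise ValueError("p must lie in [0, 1]")
    if not (0.0 < q < 1.0):
        raise ValueError("q must lie strictly between 0 and 1")

    def term(a, b):
        # By convention 0 * log(0 / b) = 0.
        return 0.0 if a == 0.0 else a * math.log(a / b)

    return term(p, q) + term(1.0 - p, 1.0 - q)
```

Because KL divergence is asymmetric, the figure would need to fix a direction, or use a symmetrized version such as `kl_bernoulli(p, q) + kl_bernoulli(q, p)`, before plotting accuracy against it.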

If we have time:

  • If x% of cells are in one state and (100-x)% are in the other state, how many cells and/or lineages are required to classify them appropriately?

  • Explore Akaike Information Criterion (AIC) for multi-state models.
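
As a starting point for the AIC item, the criterion is AIC = 2k - 2 ln(L), where k is the number of free parameters and L is the maximized likelihood; lower is better. The parameter count below is an assumption for illustration: it counts K - 1 initial-state probabilities, K(K - 1) transition probabilities, and a fixed number of emission parameters per state. Both function names are hypothetical and not part of the package.

```python
def n_free_parameters(n_states, n_emission_params_per_state):
    """Free parameters of a K-state model under the counting assumed above."""
    initial = n_states - 1                    # initial probabilities sum to 1
    transition = n_states * (n_states - 1)    # each transition row sums to 1
    emission = n_states * n_emission_params_per_state
    return initial + transition + emission


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood
```

For example, assuming three emission parameters per state (one Bernoulli and two Gompertz, which is itself an assumption), `n_free_parameters(2, 3)` gives 9, so a two-state fit must improve the log-likelihood by more than 6 over the one-state fit (3 parameters) to achieve a lower AIC.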

Feel free to edit the above. If you make changes to the document, please use track changes or just add a summary of your edits here. You can also make edits to the Google Drive document and then we can move that here.

No longer need 'removeNaNs()'

`removeNaNs()` is no longer needed and should be removed from the code. Everything should be functional without it, given the new changes to the estimators. This is most likely the biggest change since the winter quarter of capstone.

NaN in gammas

image

I get this when trying to make populations with lineages of 10 cells or fewer.
