
meyer-lab / thmm


A general Python framework for using hidden Markov models on binary trees or cell lineage trees.

Home Page: https://asmlab.org

License: MIT License

Makefile 0.22% Python 99.49% Batchfile 0.29%

inference bioengineering cell-biology heterogeneity drug-response

thmm's Introduction

tHMM

tHMM is a Python 3 package for clustering, visualizing, and analyzing data in the form of lineage trees. This work is published in Communications Biology: https://www.nature.com/articles/s42003-022-04208-9

Install

$ git clone https://github.com/meyer-lab/tHMM.git

It takes a few seconds to clone the repository.

Software Requirements

This package is supported on macOS and Linux systems.

Python dependencies

tHMM requires `virtualenv`. All other required packages can then be installed by running `make venv`, which establishes a virtual environment. The Python packages to be installed are listed in `pyproject.toml` and can also be installed directly with `poetry install`.

Documentation

Please find the documentation at https://thmm.readthedocs.io.

License

This project is covered under the MIT License.


thmm's Issues

NIH DEBUT Competition - Due May 31st

- 6-page proposal
- 2-minute video
- Evidence of a working prototype
- Demonstrate that this is marketable (i.e., researchers, pharma, and clinicians have a need for this)

Track distribution parameters of a single state with one variable

Packaging these up into a list will make it easier to implement flexible emissions.

Test for `get_mutual_info` keeps failing

The normalized mutual information score keeps returning negative accuracies. The test has been removed for now so that work can continue before the abstract submission.

Originally posted by @aarmey in #93

Explore probabilistic programming options for Upward/Downward recursions.

See https://statmodeling.stat.columbia.edu/2017/02/07/hmms-stan-absolutely/.

Make sure everyone is up to speed with GitHub & command line

Clone repository onto Aretha accounts, obtain SSH keys, etc.

Employ max likelihood distribution fitting

Move testing to pytest

Nose is deprecated. Look at polyfc repo.

Checking root parents allows for unreasonable states

https://github.com/meyer-lab/lineage-growth/blob/fdd3673d9419e8dc5ef65e81cd736c7d1d357620/lineage/CellNode.py#L28

What if self.parent is None but self.gen != 1? Currently this returns False but it seems like this should raise an exception.
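One way to surface the inconsistent case is to raise instead of returning False. The sketch below is only an illustration: the pared-down `CellNode` class and the method name `isRootParent` are hypothetical stand-ins, and only the `parent` and `gen` attributes come from the issue text.

```python
class CellNode:
    """Hypothetical, pared-down stand-in for the lineage CellNode class."""

    def __init__(self, gen=1, parent=None):
        self.gen = gen
        self.parent = parent

    def isRootParent(self):
        """Return True for a root cell; raise if the node is in an inconsistent state."""
        if self.parent is None:
            if self.gen != 1:
                # A cell without a parent must be in the first generation.
                raise ValueError(
                    f"Cell has no parent but gen={self.gen}; "
                    "only generation 1 may lack a parent."
                )
            return True
        return False
```

With this behavior, a malformed tree fails loudly at the point of the check rather than being silently treated as a non-root cell.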

Multiprocessing

https://docs.python.org/2/library/multiprocessing.html

Use this.

Calculate relative probability of latent states for individual cell

Make `generateLineageWithTime()` handle different FOM options better.

The input should be a dictionary and based on what's in the dictionary, the function should operate differently. This would be a lot cleaner than having to have different arguments be None or implicitly excluded/included. This would also make input size checking easier.
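A minimal sketch of the dictionary-driven approach follows. The FOM codes (`"E"`, `"G"`) and the parameter names are assumptions chosen for illustration, not the actual arguments of `generateLineageWithTime()`.

```python
# Hypothetical table of the parameters each FOM option requires.
REQUIRED_KEYS = {
    "E": {"betaExp"},
    "G": {"cGom", "scaleGom"},
}


def check_fom_params(params):
    """Validate a parameter dictionary and return only the entries its FOM option uses."""
    fom = params.get("FOM")
    if fom not in REQUIRED_KEYS:
        raise ValueError(f"Unknown FOM option: {fom!r}")
    missing = REQUIRED_KEYS[fom] - set(params)
    if missing:
        raise ValueError(f"FOM {fom!r} is missing parameters: {sorted(missing)}")
    return {key: params[key] for key in REQUIRED_KEYS[fom]}
```

Input checking then lives in one place, and adding a new FOM option only means adding an entry to the table.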

Explore various `generate()` functions

Comments from previous PR:

"Probably worth having multiple strategies for generating the tree, dependent upon the purpose. For example, in experiments you track all the cells for 0 <= t <= T, so a more closely matching strategy would be to generate cells such that all cells have an ending event within 0–T." -Meyer

"In the proposed while method, we wouldn't output a tree unless it was a certain length, whereas in the current way, the number of cells is just an upper bound that doesn't necessarily have to be reached in the creation of a tree. We can just write multiple generators as @aarmey suggested. I remember when we were making the generate() function you also had the idea of using a time-based generator, which could work for the idea @aarmey has above." -Shakthi

Move build to inside virtualenv environment

This will help us ensure that we've marked all the packages we use, and pin versions.

Make pytest more verbose

Look through hmmlearn tests and implement the ones relevant, particularly for BW

Change verbose settings, printing in functions to logging

This will clean up the code considerably. See built-in package logging.
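As a rough sketch of what this change could look like with the standard-library `logging` module (the logger name `"lineage"` and the function `fit_step` are placeholders, not names from the code base):

```python
import logging

logger = logging.getLogger("lineage")  # hypothetical logger name


def fit_step(iteration, log_likelihood):
    """Report fitting progress through logging instead of print()."""
    logger.debug("iteration %d: log-likelihood = %.4f", iteration, log_likelihood)
    return log_likelihood


# Verbosity is chosen once by the caller instead of through a verbose flag per function.
logging.basicConfig(level=logging.INFO)
fit_step(1, -123.4567)          # debug message is filtered out at INFO level
logger.setLevel(logging.DEBUG)
fit_step(2, -120.1234)          # debug message is now emitted
```

Functions no longer need a `verbose` argument, since the level is set on the logger.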

Check paper formatting

Line numbers, 1.5 spacing, 12 pt font.

Accuracy implementation

Figure out an unbiased way to calculate accuracy.
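One candidate, offered here only as an assumption about what the issue is after: hidden-state labels are arbitrary, so a raw label comparison can penalize a fit that is correct up to relabeling. The sketch below scores the predicted states under the best relabeling. The function name and signature are hypothetical.

```python
from itertools import permutations


def permutation_accuracy(true_states, predicted_states, num_states):
    """Best fraction of matching labels over all relabelings of the predicted states."""
    if len(true_states) != len(predicted_states) or not true_states:
        raise ValueError("State sequences must be non-empty and of equal length.")
    best = 0.0
    for mapping in permutations(range(num_states)):
        # mapping[p] is the label that predicted state p is renamed to.
        matches = sum(t == mapping[p] for t, p in zip(true_states, predicted_states))
        best = max(best, matches / len(true_states))
    return best
```

Taking the maximum over relabelings is optimistic by construction, and the loop grows factorially with the number of states, so this is only practical for small state counts.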

Move paper to build script pipeline

Probably best done piece-wise:

Given a lineage tree and distribution information, calculate the likelihood of the tree and choose analysis of highest likelihood

Decide on file format and import strategy for lineage information

Employ expectation maximization for learning

Strange error when doing switchT - related to previous overflow maybe?

Find method for automated cell lineage tracing with nuclear marker

Given the distributions and lineage observations, calculate the max likelihood hidden states

Given distributions and lineage, calculate likelihood

Now taking into account transition probabilities
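For reference, a tree likelihood that includes transition probabilities can be sketched with an unnormalized upward pass. This is a generic illustration, not the package's implementation: the tree format (a dict from node to children), the `emission` callback, and the function name are all assumptions.

```python
def upward_likelihood(tree, root, pi, T, emission):
    """Likelihood of the observations on a tree under a hidden Markov tree model.

    tree: dict mapping a node to the list of its children (hypothetical format).
    pi: initial state probabilities for the root.
    T[k][j]: probability that a child is in state j given its parent is in state k.
    emission(node, k): probability of the node's observation given state k.
    """
    num_states = len(pi)

    def beta(node):
        # beta[k] = P(observations in the subtree rooted at node | node is in state k)
        child_betas = [beta(child) for child in tree.get(node, [])]
        result = []
        for k in range(num_states):
            value = emission(node, k)
            for cb in child_betas:
                # Children are conditionally independent given the parent's state.
                value *= sum(T[k][j] * cb[j] for j in range(num_states))
            result.append(value)
        return result

    root_beta = beta(root)
    return sum(pi[k] * root_beta[k] for k in range(num_states))
```

Because the probabilities are multiplied without normalization, this version will underflow on large lineages; a practical implementation would rescale at each node or work in log space.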

Remove `keepBern` parameter in tHMM.

keepBern is no longer needed because we can now just use not isUnfinished(). It is also worth investigating whether removeNaNs() is still needed.

Use tHMM model itself to build synthetic data

Current synthetic data is a hack.

Profile code

We should find where the fitting process is spending the most time. I guarantee there will be large optimizations we can make. Take a look at testprofile in the makefile, which uses cProfile to profile, then gprof2dot to visualize which functions take the most time.
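For quick checks outside the makefile target, the same kind of measurement can be sketched directly with the standard-library `cProfile` and `pstats` modules. The helper `profile_call` and the workload `slow_sum` are placeholders; in practice the profiled function would be a model fit.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report of the costliest calls)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def slow_sum(n):
    """Placeholder workload standing in for a model fit."""
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 100_000)
print(report)
```

Sorting by cumulative time puts the call paths that dominate the run at the top, which is usually the right place to start looking.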

Change `deathObserved` member variable of CellNode to `fateObserved`.

Change deathObserved member variable of CellNode to fateObserved.

Fix overflow issue in Gompertz estimation and likelihood calculation.

Keeping notes on the mathematics

https://www.overleaf.com/4613729765tqmsjgdpxrgg

Here is a LaTeX project I started on Overleaf (an Overleaf account is required) where we can keep all our notes on the mathematics and other written aspects of the project. This will also keep our notation consistent and make it easier to reuse the math in slides or figures later. I guess we can start with a review of HMMs and move on from there.

Calculate bulk expected growth in presence of heterogeneity

BMES Abstract (update due May 8th)

Add a .gitignore and remove most recent iPython notebook from master branch

Pretty simple

Calculate bulk expected growth in absence of heterogeneity

Implement cell cycle reporter emissions

The Heiser lab has a reporter for cells in G1 or G2/S. Cells can then either continue or die in either phase. For now, we'd like to evaluate the collected data on the one state model.

Paper (general writing and figure making progress)

https://1drv.ms/w/s!AncRvdvF6gtNgdJqtfqU68JJSkR1SQ OBSOLETE

https://1drv.ms/w/s!AncRvdvF6gtNgdJ7Y_TqAhLuHQA1CA USE THIS

Above is a link to an editable Word document (we moved away from Overleaf because it limits how many times a document can be shared). Some of the LaTeX did not transfer well, so we will have to fix it.

Here is a list of figures (sub-figures) that we decided should be implemented (see Slack and Meeting notes for more detail):

Graphical abstract of our method. Message: Summarize our method/paper.
Cells (images of cells, tracking images, segmentation, scale bars, resolutions, etc.) Message: Our experimental setup, and could be an example of the type of setup a user might want to use our model with.
Plot a synthesized lineage, plot lineage from wet-lab data. Message: Examples of possible inputs to our model. What would a user need to use our model. This figure may not be necessary.
Graphical Model of our tHMM/ Graphical model of our transitions (state transition diagram). See:
Percent classified (some measure of accuracy) vs. Divergence of cell phenotypes (i.e. show how different cells have to be (how far apart must the Bernoulli and Gompertz parameters be) to get accurate classification/state change). Utilize KL-divergence. Message: How far apart do our underlying distributions have to be for MLE (one state model) to fail? for BW/Viterbi (>1 state model) to succeed?
Percent classified (some measure of accuracy) vs. Tree length (i.e. how does tree length affect accurate classification/state change) Message: How does the experiment time affect the accuracy of MLE? of BW/Viterbi?
Percent classified vs. Number of lineages (i.e. how do number of lineages affect inference) Message: How many lineages do I need to accurately obtain the state parameters for MLE (1 state model)? for BW/Viterbi (2 state model)? Averaging the fitted state parameters over several lineages should do better than just using 1 lineage of a cell type/state.
A heat-map of a single lineage (or more) showing how the state change is altered over time. Plot the gammas, i.e. the smoothed-probabilities, as a tree. This figure may not be necessary.
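For the divergence axis in the accuracy-versus-divergence figure, the Bernoulli part has a closed-form KL divergence. The sketch below covers only that case and uses a hypothetical function name; the Gompertz part would need its own expression or a numerical estimate.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence D(Bern(p) || Bern(q)) in nats, for 0 <= p <= 1 and 0 < q < 1."""
    if not (0.0 <= p <= 1.0) or not (0.0 < q < 1.0):
        raise ValueError("Require 0 <= p <= 1 and 0 < q < 1.")
    total = 0.0
    # Terms with zero probability under p contribute nothing (0 * log 0 = 0).
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total
```

KL divergence is not symmetric, so the figure would need to fix a direction or use a symmetrized variant.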

If we have time:

If there are x% of cells in 1 state and (1-x)% in the other state, how many cells and/or lineages are required to appropriately classify this?
Explore Akaike Information Criterion (AIC) for multi-state models.
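As a starting point for the AIC comparison, the criterion itself is AIC = 2k - 2 ln(L), where k is the number of free parameters and L the maximized likelihood. The parameter-counting convention below is an assumption for illustration, as are the function names.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion, 2k - 2 ln(L); lower values indicate a better trade-off."""
    return 2 * num_params - 2 * log_likelihood


def hmm_param_count(num_states, params_per_state):
    """Free parameters of a K-state model under an assumed counting convention."""
    initial = num_states - 1                      # initial probabilities sum to 1
    transitions = num_states * (num_states - 1)   # each transition row sums to 1
    emissions = num_states * params_per_state     # e.g. 3 for one Bernoulli plus two Gompertz parameters
    return initial + transitions + emissions
```

For example, with three emission parameters per state, a one-state model has 3 free parameters and a two-state model has 9, so the two-state fit must improve the log-likelihood by more than 6 to achieve a lower AIC.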

Feel free to edit the above. If you make changes to the document, please use track changes or just add a summary of your edits here. You can also make edits to the Google Drive document and then we can move that here.
