lrgr / sigma

SigMa is a probabilistic model for the sequential dependencies of mutation signatures

License: MIT License


SigMa

This repository contains the source code for SigMa (Signature Markov model) and related experiments. SigMa is a probabilistic model of the sequential dependencies among mutation signatures.

Below, we provide an overview of the SigMa model from the corresponding paper. "The input data consists of (A) a set of predefined signatures that form an emission matrix E (here, for simplicity, represented over six mutation types), and (B) a sequence of mutation categories from a single sample and a distance threshold separating sky and cloud mutation segments. (C) The SigMa model has two components: (top) a multinomial mixture model (MMM) for isolated sky mutations and (bottom) an extension of a Hidden Markov Model (HMM) capturing sequential dependencies between close-by cloud mutations; all model parameters are learned from the input data in an unsupervised manner. (D) SigMa finds the most likely sequence of signatures that explains the observed mutations in sky and clouds."
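To make the sky/cloud distinction concrete, the following minimal Python sketch splits one chromosome's sorted mutation positions into isolated sky mutations and cloud segments using a distance threshold. It is an illustration only, not the code in this repository: the function name split_sky_and_clouds, the use of <= for the threshold comparison, and the rule that a cloud needs at least two mutations are all assumptions made here.

```python
from typing import List, Tuple


def split_sky_and_clouds(
    positions: List[int], threshold: int
) -> Tuple[List[int], List[List[int]]]:
    """Split sorted positions on one chromosome into sky and cloud indices.

    Hypothetical helper: consecutive mutations at most `threshold` bases
    apart are chained into one segment; segments with two or more
    mutations are treated as clouds and single mutations as sky.
    """
    sky: List[int] = []
    clouds: List[List[int]] = []
    segment: List[int] = []

    def flush() -> None:
        # Close the current segment and file it as cloud or sky.
        if len(segment) > 1:
            clouds.append(list(segment))
        else:
            sky.extend(segment)
        segment.clear()

    for i, pos in enumerate(positions):
        # A gap larger than the threshold ends the current segment.
        if segment and pos - positions[i - 1] > threshold:
            flush()
        segment.append(i)
    flush()
    return sky, clouds


# Toy positions (made up for illustration) and a 100 bp threshold.
positions = [100, 150, 180, 5000, 9000, 9050]
sky, clouds = split_sky_and_clouds(positions, threshold=100)
print(sky)     # [3]
print(clouds)  # [[0, 1, 2], [4, 5]]
```

In this toy input the mutation at position 5000 is far from both neighbors, so it is treated as sky, while the two tight groups become clouds.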

Setup

Dependencies

SigMa is written in Python 3. We recommend using Conda to manage dependencies, which you can do directly using the provided environment.yml file:

conda env create -f environment.yml
source activate sigma-env

On Windows, replace the last command with:

activate sigma-env

Usage

We use Snakemake to manage the workflow of running SigMa on hundreds of tumor samples.

Reproducing the experiments from the SigMa paper

First, download and preprocess the ICGC breast cancer whole-genomes and COSMIC mutation signatures. To do so, run:

cd data && snakemake all

Second, run SigMa and a multinomial mixture model (MMM) on each sample, and perform leave-one-out cross-validation (LOOCV):

snakemake all

This creates an output/ directory with two subdirectories, models/ and loocv/. models/ contains the SigMa models trained on each sample. loocv/ contains the LOOCV results for SigMa under different cloud thresholds.
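For intuition about the MMM baseline mentioned above, here is a minimal sketch of a multinomial mixture model fitted with expectation-maximization (EM). It is not the implementation in src/: it assumes the emission matrix E is held fixed and only the mixture weights over signatures are estimated, and the names fit_mmm_weights and mmm_log_likelihood, the toy matrix, and the toy mutation sequence are all hypothetical.

```python
import math
from typing import List, Sequence


def mmm_log_likelihood(
    E: Sequence[Sequence[float]], weights: Sequence[float], mutations: Sequence[int]
) -> float:
    """Log-likelihood of mutation categories under a mixture of signatures."""
    return sum(
        math.log(sum(w * row[x] for w, row in zip(weights, E)))
        for x in mutations
    )


def fit_mmm_weights(
    E: Sequence[Sequence[float]], mutations: Sequence[int], n_iter: int = 50
) -> List[float]:
    """Estimate mixture weights by EM, keeping the emission matrix E fixed."""
    k = len(E)
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(n_iter):
        counts = [0.0] * k
        for x in mutations:
            # E-step: posterior probability of each signature for this mutation.
            joint = [w * row[x] for w, row in zip(weights, E)]
            total = sum(joint)
            for s in range(k):
                counts[s] += joint[s] / total
        # M-step: new weights are the normalized expected counts.
        weights = [c / len(mutations) for c in counts]
    return weights


# Toy example: two signatures over three mutation types (values made up).
E = [
    [0.8, 0.1, 0.1],  # signature 0 mostly emits mutation type 0
    [0.1, 0.1, 0.8],  # signature 1 mostly emits mutation type 2
]
mutations = [0, 0, 0, 2, 0, 1]

uniform_ll = mmm_log_likelihood(E, [0.5, 0.5], mutations)
weights = fit_mmm_weights(E, mutations)
fitted_ll = mmm_log_likelihood(E, weights, mutations)
print(weights, uniform_ll, fitted_ll)
```

Because the MMM treats every mutation independently, it ignores the order of mutations, which is the sequential information the HMM component of SigMa is meant to capture for cloud mutations.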

Configuration

To run the entire SigMa workflow on different mutation signatures or data, see the Snakefile for configuration options.

To train SigMa or MMM on individual mutation sequences, use the src/train_and_predict.py script. To get a list of command-line arguments, run:

python src/train_and_predict.py -h

Support

Please report bugs and feature requests in the Issues tab of this GitHub repository.

For further questions, please email Max Leiserson and Itay Sason directly.

References

Xiaoqing Huang*, Itay Sason*, Damian Wojtowicz*, Yoo-Ah Kim, Mark Leiserson^, Teresa M Przytycka^, Roded Sharan^. Hidden Markov Models Lead to Higher Resolution Maps of Mutation Signature Activity in Cancer. Genome Medicine (2019) doi: 10.1186/s13073-019-0659-1.

* equal first author contribution ^ equal senior author contribution

sigma's People

Contributors

itaysason, mdml

sigma's Issues

Trying to run

Hi. I'm trying to run sigma but I'm still a little bit confused. For the complete workflow, how do I generate the input matrix?
Thanks

runtime and parallelization

Going through the example from the readme. When I run the second snakemake all command, it initiates a process which seems to be extremely long-running (3 of 7256 steps (0.04%) done). At the same time sigma only seems to use one core (?). Is there a way of speeding up the process, enabling multi-core, or even multi-CPU on a cluster?
