
algo-cancer / phiscs-bnb


A Fast Branch and Bound Algorithm for the Perfect Tumor Phylogeny Reconstruction Problem

Python 86.73% Scala 13.27%
single-cell tumor-heterogeneity tumor-evolution cancer-genomics phylogeny sequencing bioinformatics


PhISCS-BnB

PhISCS-BnB is a fast tool for reconstructing a perfect tumor phylogeny from single-cell data using a branch and bound algorithm.

PhISCS-BnB has been published in Bioinformatics (Proceedings of ISMB 2020) (doi:10.1093/bioinformatics/btaa464). If you find this code useful in your research, please consider citing it:

@article{azer2020phiscs,
  doi           = {10.1093/bioinformatics/btaa464},
  url           = {https://doi.org/10.1093/bioinformatics/btaa464},
  year          = 2020,
  month         = jul,
  publisher     = {Oxford University Press ({OUP})},
  volume        = {36},
  number        = {Supplement{\_}1},
  pages         = {i169--i176},
  author        = {Erfan {Sadeqi Azer} and Farid {Rashidi Mehrabadi} and Salem Maliki{\'{c}} and Xuan Cindy Li and Osnat Bartok and Kevin Litchfield and Ronen Levy and Yardena Samuels and Alejandro A Sch\"{a}ffer and E Michael Gertz and Chi-Ping Day and Eva P{\'{e}}rez-Guijarro and Kerrie Marie and Maxwell P Lee and Glenn Merlino and Funda Ergun and S Cenk Sahinalp},
  title         = {{{PhISCS}-{BnB}: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem}},
  journal       = {Bioinformatics}
}

Contents

  1. Installation
  2. Running
  3. Example
  4. Contact

Installation

PhISCS-BnB is written in Python 3. It is currently intended to run on POSIX-based systems (only Linux and macOS have been tested).

~$ git clone https://github.com/algo-cancer/PhISCS-BnB.git
~$ cd PhISCS-BnB
~$ pip install -r requirements.txt
~$ python main.py --help

Running

Input

Single-cell input is assumed to be a ternary, tab-delimited matrix with rows corresponding to single cells and columns corresponding to mutations. We assume that the file contains headers and that the matrix is ternary, with 0 denoting the absence and 1 denoting the presence of a mutation in a given cell, while ? denotes missing information about the presence or absence of a mutation in that cell (i.e., a missing entry). To simplify parsing, we also assume that the upper-left corner contains the string cellID/mutID.

Below is an example of a single-cell data matrix. Mutation and cell names can be arbitrary strings without tabs or spaces, but they must be unique.

cellID/mutID  mut0  mut1  mut2  mut3  mut4  mut5  mut6  mut7
cell0         0     0     ?     0     0     0     0     0
cell1         0     ?     1     0     0     0     1     1
cell2         0     0     1     0     0     0     1     1
cell3         1     1     0     0     0     0     0     0
cell4         0     0     1     0     0     0     0     0
cell5         1     0     0     0     0     0     0     0
cell6         0     0     1     0     0     0     1     1
cell7         0     0     1     0     0     0     0     0
cell8         ?     0     0     0     ?     0     ?     1
cell9         0     1     0     0     0     0     0     0
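To make the input rules above concrete, here is a minimal stdlib-only sketch of a reader for this format. It is not PhISCS-BnB's own parser; the function name read_sc_matrix is hypothetical, and encoding ? as 3 is an assumption borrowed from the "NAValue: 3" line in the example log further down.

```python
NA = 3  # assumed code for "?" (hypothetical; mirrors "NAValue: 3" in the example log)


def read_sc_matrix(text):
    """Parse a ternary, tab-delimited single-cell matrix (hypothetical helper).

    Returns (cells, mutations, rows) where rows[i][j] is 0, 1 or NA.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    if header[0] != "cellID/mutID":
        raise ValueError("upper-left corner must be 'cellID/mutID'")
    mutations = header[1:]
    codes = {"0": 0, "1": 1, "?": NA}
    cells, rows = [], []
    for ln in lines[1:]:
        fields = ln.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"row {fields[0]!r} has the wrong number of entries")
        bad = [v for v in fields[1:] if v not in codes]
        if bad:
            raise ValueError(f"row {fields[0]!r} has non-ternary values {bad}")
        cells.append(fields[0])
        rows.append([codes[v] for v in fields[1:]])
    # Cell and mutation names must be unique.
    if len(set(cells)) != len(cells) or len(set(mutations)) != len(mutations):
        raise ValueError("cell and mutation names must be unique")
    return cells, mutations, rows


example = "cellID/mutID\tmut0\tmut1\ncell0\t0\t?\ncell1\t1\t0\n"
cells, muts, rows = read_sc_matrix(example)
print(cells, muts, rows)  # ['cell0', 'cell1'] ['mut0', 'mut1'] [[0, 3], [1, 0]]
```

Note that the example matrix above is shown aligned with spaces for readability; the sketch expects real tab characters between fields, as the format requires.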

Output

The program writes its output to the OUT_DIR directory (set with -o or --outDir), which is created automatically if it does not exist.

The output matrix is also a tab-delimited file with the same format as the input matrix, except that eliminated mutations (columns) are excluded; so when mutation elimination is allowed, this matrix typically contains fewer columns than the input matrix. The output matrix is the genotype-corrected matrix, in which false positives and false negatives from the input are corrected and each missing entry is set to 0 or 1. If the input file is INPUT_MATRIX.ext, the output matrix is stored in OUT_DIR/INPUT_MATRIX.CFMatrix. For example:

 input file: data/ALL2.SC
output file: OUT_DIR/ALL2.CFMatrix
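We read the .CFMatrix extension as "conflict-free matrix"; the README does not spell this out, so treat that as an assumption. Under the infinite-sites model, a binary cells-by-mutations matrix admits a perfect phylogeny rooted at an all-zero germline exactly when no pair of columns contains all three row patterns (0,1), (1,0) and (1,1). The sketch below checks that condition on a 0/1 matrix given as a list of rows (header and cell names stripped). The name is_conflict_free is hypothetical and this is a simple pairwise check for illustration, not code from PhISCS-BnB.

```python
from itertools import combinations


def is_conflict_free(rows):
    """Return True if no pair of columns shows all of (0,1), (1,0) and (1,1).

    rows: list of equal-length lists of 0/1 values (cells x mutations).
    """
    if not rows:
        return True
    forbidden = {(0, 1), (1, 0), (1, 1)}
    n_cols = len(rows[0])
    for p, q in combinations(range(n_cols), 2):
        seen = {(row[p], row[q]) for row in rows}
        if forbidden <= seen:
            return False  # columns p and q cannot both sit on one rooted tree
    return True


# Mutation 1 is nested inside mutation 0: conflict-free.
print(is_conflict_free([[1, 1], [1, 0], [0, 0]]))  # True
# All three patterns present: no perfect phylogeny.
print(is_conflict_free([[1, 1], [1, 0], [0, 1]]))  # False
```

The check looks at every pair of columns once, so it costs O(m² · n) for n cells and m mutations, which is fine for validating an output file by hand.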

Parameters

Parameter  Description                            Default            Mandatory
-i         Path to single-cell data matrix file   -                  Yes
-o         Output directory                       current directory  No
-b         Bounding algorithm                     1                  No
-t         Draw output tree with Graphviz         -                  No
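When PhISCS-BnB is driven from another Python script, the documented flags can be assembled into an argument list. This is a minimal sketch: build_command is a hypothetical helper, it uses only the four flags listed in the table, and it assumes the command is run from the cloned PhISCS-BnB directory where main.py lives.

```python
def build_command(input_path, out_dir=None, bounding=None, draw_tree=False,
                  script="main.py"):
    """Assemble `python main.py ...` using only the documented flags
    (-i, -o, -b, -t). Hypothetical helper, not part of PhISCS-BnB."""
    cmd = ["python", script, "-i", input_path]  # -i is mandatory
    if out_dir is not None:
        cmd += ["-o", out_dir]  # defaults to the current directory if omitted
    if bounding is not None:
        cmd += ["-b", str(bounding)]  # bounding algorithm, default 1
    if draw_tree:
        cmd.append("-t")  # draw the output tree with Graphviz
    return cmd


cmd = build_command("example/data1.SC", out_dir="example", bounding=2, draw_tree=True)
print(" ".join(cmd))  # python main.py -i example/data1.SC -o example -b 2 -t
```

The resulting list can be passed to subprocess.run(cmd, check=True); replacing "python" with sys.executable pins the run to the current interpreter.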

Example

~$ python main.py -i example/data1.SC -o example -b 2 -t

[02/04 12:53:49] Size: (20, 20)
[02/04 12:53:49] NAValue: 3
[02/04 12:53:49] #Zeros: 226
[02/04 12:53:49] #Ones: 94
[02/04 12:53:49] #NAs: 80
[02/04 12:53:49] Time: 00:00:00
[02/04 12:53:49] #0->1: 10
[02/04 12:53:49] #1->0: 0
[02/04 12:53:49] #na->0: 60
[02/04 12:53:49] #na->1: 20
[02/04 12:53:49] isDone: True
[02/04 12:53:49] The output phylogenetic tree is in 'example' directory!
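The logged counts are internally consistent: #Zeros + #Ones + #NAs = 226 + 94 + 80 = 400 = 20 × 20 entries, and #na->0 + #na->1 = 60 + 20 = 80 = #NAs. The sketch below shows one way to recompute such counts from an input matrix and its corrected output. It is our reading of the log fields, not the tool's own code. The function name flip_counts is hypothetical, the NA code 3 follows the "NAValue: 3" line, and both matrices are assumed to have the same shape (no eliminated columns).

```python
NA = 3  # assumed "?" code, matching "NAValue: 3" in the log above


def flip_counts(before, after, na=NA):
    """Count entry types and changes between an input matrix and its
    corrected output (hypothetical helper; same shape assumed)."""
    counts = {"#Zeros": 0, "#Ones": 0, "#NAs": 0,
              "#0->1": 0, "#1->0": 0, "#na->0": 0, "#na->1": 0}
    for row_in, row_out in zip(before, after):
        for x, y in zip(row_in, row_out):
            if x == 0:
                counts["#Zeros"] += 1
                counts["#0->1"] += int(y == 1)  # false negative corrected
            elif x == 1:
                counts["#Ones"] += 1
                counts["#1->0"] += int(y == 0)  # false positive corrected
            elif x == na:
                counts["#NAs"] += 1
                counts["#na->0"] += int(y == 0)  # missing entry set to 0
                counts["#na->1"] += int(y == 1)  # missing entry set to 1
    return counts


before = [[0, NA], [1, 0]]
after = [[1, 0], [1, 0]]
print(flip_counts(before, after))
# {'#Zeros': 2, '#Ones': 1, '#NAs': 1, '#0->1': 1, '#1->0': 0, '#na->0': 1, '#na->1': 0}
```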

This is the clonal tree that has been created:

For each node, the number inside the brackets denotes its node ID, and the number inside the parentheses is the total number of mutations occurring on the path from the germline (root) to the node (i.e., the total number of mutations harbored by the node). Edge labels give the number of mutations occurring between a parent and its child. The complete list of mutations on each edge can be found in the data1.mutsAtEdges file, which contains:

[12]->[11]: mut7 mut8 mut9 mut10 mut11
[12]->[10]: mut0 mut2
[11]->[9]: mut15 mut16 mut17
[10]->[8]: mut1
[9]->[5]: mut18
[8]->[7]: mut3
[8]->[1]: mut12 mut14
[7]->[6]: mut5
[6]->[4]: mut4
[5]->[2]: mut19
[4]->[3]: mut6
[1]->[0]: mut13
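The parenthesized per-node totals can be recomputed from this file: each node's total is the number of mutations on its incoming edge plus its parent's total, and the root (germline) has none. A minimal sketch, assuming every line has the "[parent]->[child]: mut ..." form shown above; mutations_per_node is a hypothetical name, not a PhISCS-BnB function.

```python
import re

EDGE = re.compile(r"\[(\d+)\]->\[(\d+)\]:\s*(.*)")


def mutations_per_node(text):
    """Return {node_id: number of mutations on the root-to-node path}
    from mutsAtEdges-style lines (hypothetical helper)."""
    parent, on_edge = {}, {}
    for line in text.splitlines():
        m = EDGE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognized lines
        p, c = int(m.group(1)), int(m.group(2))
        parent[c] = p
        on_edge[c] = len(m.group(3).split())
    totals = {}

    def total(node):
        if node not in parent:  # root (germline) harbors no mutations
            return 0
        if node not in totals:
            totals[node] = on_edge[node] + total(parent[node])
        return totals[node]

    nodes = set(parent) | set(parent.values())
    return {node: total(node) for node in nodes}


edges = """[12]->[11]: mut7 mut8 mut9 mut10 mut11
[12]->[10]: mut0 mut2
[10]->[8]: mut1
[8]->[1]: mut12 mut14
[1]->[0]: mut13"""
print(mutations_per_node(edges)[0])  # 6
```

Run on the full file above, this gives, for example, 7 for node [3] and 10 for node [2], the deepest leaf.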

Contact

If you have any questions, please e-mail us at [email protected] or [email protected].
