gppf's Introduction

General Parsimony Phylogeny models from Frequencies

Requirements

To run gppf, the following are required:

  • Python 2.7.12 +
  • Gurobi 6.5.2 +

To reproduce the experiments described in the paper, the following are also required:

  • R 3.3.3 with packages:
    • ggplot2
    • grid
    • plyr
    • readr

For generating samples:

  • ms by Hudson

gppf

Start the program with:
gppf [-h] -m {perfect,persistent,dollo,caminsokal} -f FILE [-k K] -t TIME -c CLONES [-e]
Where:

-m(--model) {perfect,persistent,dollo,caminsokal} Required argument that specifies the phylogeny model.

-f(--file) FILE Specifies the path of the input file.

-k K Specifies the k-value of the selected model: Persistent(k), Dollo(k), or Camin-Sokal(k).

  • For a persistent model without restrictions, use -k full.
  • Do not specify k for the Perfect Phylogeny model. If specified, it is ignored.

-t(--time) TIME Specifies the maximum time allowed for the computation. Type -t 0 to not impose a limit.

-c(--clones) CLONES Specifies the number of clones allowed, expressed as a fraction of the input mutations, e.g. -c 0.8. The maximum number of clones actually used is computed as [clones] * #Mutations.

-e(--exp) Set this flag to output experimental-format results. See Experimental-format Output.

Example of commands:

  • ./gppf -m persistent -f data/simulated/n10_m20/21.sim -t 0 -c 0.8 -k 2 starts a Persistent(2) model without time limit and with a clone limit of 80%
  • ./gppf -m dollo -f data/simulated/n10_m20/21.sim -t 300 -c 1 -k 4 -e starts a Dollo(4) model with a time limit of 300 seconds (5 minutes), a clone limit of 100%, and experimental-format output.
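
The command lines above can also be assembled programmatically, for instance when scripting many runs. The following is a minimal sketch, not part of gppf: the helper names build_gppf_command and clone_limit are hypothetical, and only the flags documented above are used. Since the text does not say how gppf rounds [clones] * #Mutations, clone_limit returns the raw product.

```python
# Hypothetical helpers (not shipped with gppf) that mirror the documented flags.

MODELS = ("perfect", "persistent", "dollo", "caminsokal")


def clone_limit(fraction, num_mutations):
    """Return [clones] * #Mutations as described for -c.

    The rounding applied by gppf is not documented, so none is applied here.
    """
    return fraction * num_mutations


def build_gppf_command(model, input_file, time_limit, clones,
                       k=None, exp=False, executable="./gppf"):
    """Build an argument list suitable for subprocess.call()."""
    if model not in MODELS:
        raise ValueError("unknown model: %s" % model)
    cmd = [executable, "-m", model, "-f", input_file,
           "-t", str(time_limit), "-c", str(clones)]
    # k is ignored by the perfect phylogeny model, so it is omitted there.
    if k is not None and model != "perfect":
        cmd += ["-k", str(k)]
    if exp:
        cmd.append("-e")
    return cmd


if __name__ == "__main__":
    # Reproduces the first example command above.
    print(" ".join(build_gppf_command(
        "persistent", "data/simulated/n10_m20/21.sim", 0, 0.8, k=2)))
```

The returned list can be passed to subprocess.call() once gppf and Gurobi are installed.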

Input format

gppf accepts two different input formats:

  • The simulated data format, for a file with extension .sim (required), is a frequency matrix F of tab separated values. The first line is a dummy line of mutation names.

    Mut_0 Mut_0- Mut_1 Mut_1- Mut_2 Mut_2-
    0.4584 0.0000 0.0970 0.0000 0.1630 0.1630
    0.6222 0.2443 0.1420 0.2443 0.1073 0.1073
    0.6450 0.1400 0.1539 0.1400 0.1270 0.1270
    0.6110 0.0805 0.0838 0.0805 0.0794 0.0794
    0.6930 0.1108 0.0920 0.1108 0.0792 0.0792
  • The real data format is a tab-separated text file. The first line contains the sample names. The first column contains mutation IDs. Each consecutive pair of columns contains the read counts for the reference and mutated alleles.
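
To make the two layouts concrete, here is a minimal parsing sketch. It is not gppf's own reader: the function names parse_sim and parse_real are hypothetical, read counts are assumed to be integers, and the real-data header line is returned unparsed because its exact column layout is not described above.

```python
# Hypothetical readers for the two input layouts described above.


def parse_sim(text):
    """Parse the simulated format: a header of mutation names followed by
    tab-separated rows of frequencies (one row per sample)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    names = lines[0].split("\t")
    rows = [[float(v) for v in ln.split("\t")] for ln in lines[1:]]
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row length does not match header length")
    return names, rows


def parse_real(text):
    """Parse the real-data format: first column is the mutation ID, the
    remaining columns are (reference, mutated) read-count pairs."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")  # sample names, kept as-is
    counts = {}
    for ln in lines[1:]:
        fields = ln.split("\t")
        values = [int(v) for v in fields[1:]]  # assumes integer counts
        if len(values) % 2 != 0:
            raise ValueError("expected an even number of count columns")
        counts[fields[0]] = [(values[i], values[i + 1])
                             for i in range(0, len(values), 2)]
    return header, counts


if __name__ == "__main__":
    sim = ("Mut_0\tMut_0-\tMut_1\n"
           "0.4584\t0.0000\t0.0970\n"
           "0.6222\t0.2443\t0.1420\n")
    print(parse_sim(sim))
```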

Detailed-format Output

When not running in experimental mode, gppf writes detailed results to a file called res_INPUTNAME.txt, created in the folder from which gppf is run. The output file contains:

  • Elapsed time
  • Input name
  • Number of samples
  • Number of mutations
  • Clone limit
  • Total clones used
  • Model (k)
  • Total error
  • Solution accuracy
  • Clonal matrix
  • Usage matrix
  • Extended matrix
  • Error matrix
  • Inferred matrix F
  • Tree in DOT code

As an example, see the output for CLL077. To reproduce the CLL077 run described in the paper, use:

./gppf -m persistent -f data/real/cll077_deep.txt -t 0 -c 1 -k 2

Experimental-format Output

In the experiments, the program does not output the full information about the clonal matrix and the usage matrix, and does not print a tree of the reconstructed phylogeny. Instead, it prints information useful for testing. Specifically, it writes a CSV file with the following fields:
matrix_name, num_samples, num_mutations, mut_mod, clone_used, k, time, total_error, accuracy
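
Such a file can be loaded with Python's csv module. The sketch below is illustrative only: the function name read_results is hypothetical, the field names are taken from the list above, and because it is not stated whether the file starts with a header row, the reader accepts both cases. All values are returned as strings.

```python
import csv

# Field names as listed above for the experimental-format output.
FIELDS = ["matrix_name", "num_samples", "num_mutations", "mut_mod",
          "clone_used", "k", "time", "total_error", "accuracy"]


def read_results(text):
    """Return one dict per result row, skipping a header row if present."""
    reader = csv.DictReader(text.splitlines(), fieldnames=FIELDS,
                            skipinitialspace=True)
    return [row for row in reader if row["matrix_name"] != "matrix_name"]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    sample = "21.sim, 10, 20, dollo, 16, 2, 12.5, 0.03, 0.97\n"
    for row in read_results(sample):
        print(row["matrix_name"], row["k"], row["accuracy"])
```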

We can see as an example the Dollo(2) output of Exp.2

Replicating the experiments in the paper

To start the experiments described in the paper, run the bash scripts start_exp1.sh and start_exp2.sh. These experiments can easily be parallelized by splitting the for loops across different files or sessions. The bash scripts also recreate the plot and the table in the paper by running plot_from_csv.R and make_table.py. These two programs require the gppf output files to be in the root directory (the default).

Exporting the model in MPS format

It is also possible to export the ILP model to the standard MPS format using the --mps argument. With this flag, gpps outputs the model in MPS format, which can be fed to any other solver that supports it, such as CPLEX or Google OR-Tools.

Use on Single Cell Sequencing Data

We also provide another tool, called gpps, similar to gppf, that can be used to infer cancer progressions from single cell data. Unlike gppf, gpps employs a maximum likelihood search to find the best tree that explains the single cell input.

The tool can be run with the following arguments:

  -m {perfect,persistent,dollo}, --model {perfect,persistent,dollo}
  -f FILE, --file FILE  path of the input file.
  -k K                  k-value of the selected model. Eg: Dollo(k)
  -t TIME, --time TIME  maximum time allowed for the computation. Type 0 to
                        not impose a limit.
  -o OUTDIR, --outdir OUTDIR
                        output directory.
  -e, --exp             set -e to get experimental-format results.
  -b FALSEPOSITIVE, --falsepositive FALSEPOSITIVE
                        set -b False positive probability.
  -a FALSENEGATIVE, --falsenegative FALSENEGATIVE
                        set -a False negative probability.

Where -a and -b are respectively the false negative and false positive rates for the Single Cell Sequencing.
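
To illustrate how the two rates can enter a likelihood, the sketch below scores an inferred binary matrix against an observed one under a common per-entry error model. This is an assumption for illustration: the exact objective used by gpps is not spelled out here, the function name log_likelihood is hypothetical, and missing entries are not handled.

```python
import math


def log_likelihood(observed, inferred, alpha, beta):
    """Log-likelihood of observed 0/1 entries given inferred 0/1 entries.

    alpha: false negative rate (-a), P(observed 0 | true 1)
    beta:  false positive rate (-b), P(observed 1 | true 0)
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    # Keys are (observed value, true value).
    prob = {
        (0, 0): 1.0 - beta,
        (1, 0): beta,
        (0, 1): alpha,
        (1, 1): 1.0 - alpha,
    }
    total = 0.0
    for obs_row, true_row in zip(observed, inferred):
        for o, t in zip(obs_row, true_row):
            total += math.log(prob[(o, t)])
    return total


if __name__ == "__main__":
    observed = [[1, 0], [0, 1]]
    inferred = [[1, 1], [0, 1]]  # one entry explained as a false negative
    print(log_likelihood(observed, inferred, alpha=0.1, beta=0.01))
```

Under this model, a maximum likelihood search would prefer the inferred matrix (consistent with the chosen phylogeny model) that maximizes this score.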

gppf's People

Contributors

gdv, murraypatterson, ricardoanibalmatamorosaragon


gppf's Issues

gppl

prj_ppp@lxc_prj_ppp:~/cancer_evolutinary_algorithms/cancer_ea_experiments/softwares/gppf$ python gppl
  File "gppl", line 244
    for k in range(args.k):
                          ^
TabError: inconsistent use of tabs and spaces in indentation
