GithubHelp home page GithubHelp logo

conifer's People

Contributors

alexandrebouchard avatar jewellsean avatar miguelbiron avatar sohrabsa avatar zhaottcrystal avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

conifer's Issues

Head of conifer does not compile

Unless there is something wrong on my side, there appears to be missing a RunFacility file.

In the future, please test code using gradle test to avoid compilation issues.

TestSimplePhyloModel fails

TestSimplePhyloModel fails. Unclear when this got broken (many recent
changes seem to have been pushed without running the test suite).

Latest version of repository breaks compilation

Please run gradle build before pushing changes to repository

The error is:

Users/bouchard/Documents/workspace/conifer/src/main/java/conifer/PairProteinModel.java:14: package briefj.tomove does not exist
import briefj.tomove.Results;
                    ^
/Users/bouchard/Documents/workspace/conifer/src/main/java/conifer/PairProteinModel.java:52: cannot find symbol
symbol  : variable Results
location: class conifer.PairProteinModel
  private final PrintWriter treeWriter = BriefIO.output(Results.getFileInResultFolder("trees.newick"));
                                                        ^
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /Users/bouchard/Documents/workspace/conifer/src/main/java/conifer/io/newick/NewickParser.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
2 errors
:compileJava FAILED

Limitations of newick parser

For example, the following could not be parsed:

((A:1,(X:1,E:1):1):1,(C:1,B:2):1);

Various other problems (stack overflow when duplicate leaf names, bad error message too for empty leaves, etc)

JEBL Maven repository

Currently the jebl 2.0 jar file that is used for majority rule tree construction is downloaded upon building from a git repository. It would be more elegant and less code in the gradle build file if it had it's own repository (e.g., Maven).

Symlink warning

Running BlangSmallExample results in the following warning:

"Warning: command ln returned non-zero code (1). Output so far:
ln: results/latest: No such file or directory"

Here the project's main directory is under directories with whitespace in their names; that's what's causing the problem.

Possible Solutions

The buildCommand method in the binc package (line 323) that ultimately handles executing commands splits the input string (args) by white space regardless of it being escaped or not.

Two fixes that require minimal changes could be:

  1. Escape the input string path with backslashes and replace the split regex in binc by:
    (?<!\\\\)\\s+
    i.e., Don't match a whitespace that follows a backslash.
  2. Double-quote the path string and change split regex to:
    \\s(?=(?:(?:[^\"]*+\"){2})*+[^\"]*+$)
    i.e., Don't match a whitespace that is enclosed in double-quotes.
    The input string should be modified in the Results class​​​'s refreshSoftlink method (line 70).

Boolean index did not match indexed array

Hello,

I am experiencing the following problems.
Let's say I made a custom probes.txt containing 101 probes in a chromosome.

chr start stop name
1 45794966 45795119 MUTYH
1 45796176 45796239 MUTYH
1 45796842 45797016 MUTYH
1 45797080 45797238 MUTYH
1 45797321 45797531 MUTYH
....

Whe I run "conifer.py analyze ..", I got the following message.

conifer.py:157: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 101 but corresponding boolean dimension is 100

And when I run "conifer.py export ..", the resultant bed file contained only 100 rows and the last region in the probe.txt was omitted.

Could you tell me about how to resolve this problem?

Thanks

Duplicated move: RealVectorMHProposal

The class RealVectorMHProposal appears both in conifer.moves and blang.mcmc. Eventually, should be migrated to blang.mcmc. Note that the current implementation (Aug 26, 2014) of blang.mcmc.RealVectorMHProposal seems currently incorrect though.

DiscreteFactors with negative entries

Keeping Tree 1 the same and using the following for tree 2, will result in the error:
FES_4.fasta as multiple alignment file
FES.ape.4.nwk as starting phylogeny

Note
These data files are downloaded as part of the build process and could be found under the src/main/resources/conifer/sampleInput directory.

The Error

Exception in thread "main" java.lang.RuntimeException: DiscreteFactors should not have negative entries
    at bayonet.marginal.DiscreteFactorGraph$DiscreteUnaryFactor.<init>(DiscreteFactorGraph.java:703)
    at bayonet.marginal.DiscreteFactorGraph$DiscreteUnaryFactor.<init>(DiscreteFactorGraph.java:623)
    at bayonet.marginal.DiscreteFactorGraph.pointwiseProduct(DiscreteFactorGraph.java:380)
    at bayonet.marginal.DiscreteFactorGraph$1.pointwiseProduct(DiscreteFactorGraph.java:337)
    at conifer.moves.SPRMove.execute(SPRMove.java:105)
    at blang.mcmc.MoveSet.sweep(MoveSet.java:78)
    at blang.MCMCAlgorithm.run(MCMCAlgorithm.java:37)
    at blang.MCMCRunner.run(MCMCRunner.java:25)
    at conifer.HierarchicalPhyloModel.main(HierarchicalPhyloModel.java:110)

CODA figure overplotting

With the new changes to bayonet's management of the CODA plots, conifer can no longer produce readable plots when there are many hyperparameters.

I will commit a fix to this in bayonet that also adds additional mcmc diagnostics. If we decide later on that all of these diagnostics are overkill (in terms of HD space), we can easily limit the number of diagnostic plots.

Naming conventions

The following standard conventions should be followed:

  1. Class names should start with a capital letter (this should be corrected in pairProteinModel for example, into PairProteinModel.
  2. Constants (fields marked with the modifiers final static) should be in ALL_CAPS.
  3. Variable and methods should start with a small letter (this should be corrected in ExpFamMixture.Accordance45() for example).

Please use the refactoring options in eclipse to correct these.

ggplot-like coda errors (more)

It appears that the ggplot-like plots fail for small chains (<10)

Two different errors:
I. For chain length = 2

Error in exists(name, env) : argument "env" is missing, with no default
Calls: ggmcmc ... ggplot_build -> scales_add_missing -> find_global -> exists

II. For chain length = 6

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 1, 0
Calls: ggmcmc ... ggs_compare_partial -> rbind -> cbind -> cbind -> data.frame

Simple sol'n: change first plotting interval to >10., that is change this line

This is not a fatal error as it does not break compilation or successful running of the mcmc algo. I have asked this as a question in case there are unintended consequences (I've tested locally without issues).

Rounding problem in SPRMove

If SimplePhyloModel is run with AllBranchesScaling move excluded, i.e.,

runner.factory.excludeNodeMove(AllBranchesScaling.class);

it will crash with the following error:

Exception in thread "main" java.lang.RuntimeException: Max number of transitions exceeded 1000000
    at conifer.ctmc.EndPointSampler.sampleNTransitions(EndPointSampler.java:142)
    at conifer.ctmc.EndPointSampler.sample(EndPointSampler.java:58)
    at conifer.models.MultiCategorySubstitutionModel.samplePosteriorPaths(MultiCategorySubstitutionModel.java:130)
    at conifer.models.MultiCategorySubstitutionModel.samplePosteriorPaths(MultiCategorySubstitutionModel.java:84)
    at conifer.moves.PhyloHMCMove.execute(PhyloHMCMove.java:55)
    at blang.mcmc.MoveSet.sweep(MoveSet.java:78)
    at blang.MCMCAlgorithm.run(MCMCAlgorithm.java:37)
    at blang.MCMCRunner.run(MCMCRunner.java:25)
    at conifer.TestPhyloModel.main(TestPhyloModel.java:80)

The data files used may be found here:
Multiple Alignment File
The tree

Incompatibility with Java 11 (with a potential fix)

tl;dr: the issue can be fixed by

  • Updating the gradle wrapper to the one in the blangSDK repo
  • Updating the build.gradle script (attached at the bottom)
  • Renaming GLMExample.bl to GLMExample.bl.BAD

Steps to reproduce

git clone [email protected]:alexandrebouchard/conifer.git
cd conifer
./gradlew --stop

This triggers downloading gradle 4.2.1, which after unpacking fails with

FAILURE: Build failed with an exception.

* What went wrong:
Could not determine java version from '11.0.7'.

11.0.7 is my java version

$ java --version
openjdk 11.0.7 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-3ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-3ubuntu1, mixed mode, sharing)

Steps I've tried to fix it

1) Replacing the wrapper with the one in the blangSDK repo

I am able to run ./gradlew clean, which now fails with

FAILURE: Build completed with 2 failures.

1: Task failed with an exception.
-----------
* Where:
Build file '/home/mbiron/opt/conifer/build.gradle' line: 110

* What went wrong:
A problem occurred evaluating root project 'conifer'.
> No such property: archiveTask for class: org.gradle.api.internal.artifacts.dsl.LazyPublishArtifact

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
==============================================================================

2: Task failed with an exception.
-----------
* What went wrong:
A problem occurred configuring root project 'conifer'.
> Failed to notify project evaluation listener.
   > 'java.lang.String org.gradle.api.tasks.compile.CompileOptions.getBootClasspath()'
   > 'java.lang.String org.gradle.api.tasks.compile.CompileOptions.getBootClasspath()'

2) Replacing wrapper and mimicking build.gradle in blangSDK and inits repos

Now I can successfully run ./gradlew clean. However, ./gradlew installDist fails with

> Task :generateXtext
ERROR:Type mismatch: cannot convert from MultiCategorySubstitutionModel<ExpFamMixture> to SequenceAlignment (file:/home/mbiron/opt/conifer/src/main/java/conifer/GLMExample.bl line : 20 column : 5)
ERROR:Type mismatch: cannot convert from SequenceAlignment to MultiCategorySubstitutionModel<ExpFamMixture> (file:/home/mbiron/opt/conifer/src/main/java/conifer/GLMExample.bl line : 20 column : 24)
WARNING:The import 'conifer.io.TreeObservations' is never used. (file:/home/mbiron/opt/conifer/src/main/java/conifer/EvoGLM.bl line : 3 column : 1)
WARNING:The import 'conifer.moves.PhyloHMCMove' is never used. (file:/home/mbiron/opt/conifer/src/main/java/conifer/EvoGLM.bl line : 6 column : 1)

> Task :generateXtext FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':generateXtext'.
> Xtext validation failed, see build log for details.

It looks like GLMExample.bl is attempting to do something which has now been forbidden in Java 11.

3) Rename this file to GLMExample.bl.BAD

Now installDist successfully completes Task :generateXtext but fails at

> Task :startScripts FAILED

FAILURE: Build failed with an exception.

* What went wrong:
A problem was found with the configuration of task ':startScripts'.
> No value has been specified for property 'mainClassName'.

4) Turn off startScripts

Now it works!

$ ./gradlew installDist

> Task :generateXtext
WARNING:The import 'conifer.io.TreeObservations' is never used. (file:/home/mbiron/opt/conifer/src/main/java/conifer/EvoGLM.bl line : 3 column : 1)
WARNING:The import 'conifer.moves.PhyloHMCMove' is never used. (file:/home/mbiron/opt/conifer/src/main/java/conifer/EvoGLM.bl line : 6 column : 1)

> Task :compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

BUILD SUCCESSFUL in 30s
5 actionable tasks: 5 executed

Additional resources

You can find the working build.gradle here (I had to add .txt to it so that github allowed it).

Best way to fix the topology, and not the branch-length

To fix the topology but have the branch-length as latent variables, is the following correct?

Exclude the moves that propose a new topology, namely the following:

factory.excludeNodeMove((Class<? extends Move>) (Object) SingleNNI.class);
factory.excludeNodeMove(SPRMove.class);

Related to issue #22.

Clean SimplePhyloModel

It seems that SimplePhyloModel now contains a series of experiments that make it hard to reference it as the base model. Would it be helpful to move this functionality into another class?

File that should and should not go in the git repository

Only source code should go in the git repository (this include non-java source code such as JSON specification of models). To share data, the best solution is to commit scripts that download and preprocess the data.

A few small files could potentially be added so that the user can run the code right away, but only for this tutorial-type purpose. These files should not be in the root of the repository, which we should try to keep clean as much as possible.

Compatibility to 1.6 broken

Commit 53a1ef3 breaks compatibility with Java 1.6. The feature breaking compatibility (System.lineSeparator()) does not seem to be worth the break. I would suggest to wait that we decide that we need 1.8 before breaking compatibility with 1.6.

ExpFamMixture/rate matrix sampling

This question is in regards to a basic phylo model, found as a gist here.

It is basic in the sense that it is a striped down version of SimplePhyloModel (ie. does not exclude nodes etc), and I am using it to test the rate matrix sampling of ExpFamMixture.

My question is two part:

(I) Although a kimura1980 model is specified, all of the rate matrix entries are treated as parameters; whereas, K1980 is a one-parameter family of evolutionary models;
(II) The statio parameters are <0 for some of the states. How is this possible / what am I missing?

@sohrabsa : I thought you might have experience with this?

I assume that K1980 is only a template that the initialization is built off?

[Aside: to successfully run the gist with readable coda plots you need bayonet 2.3.5, but this has not yet been pushed to Alex's maven repo. As a workaround, build bayonet 2.3.5 and add this library in eclipse]

Best way to fix topology and branch-lengths

Is the following the correct way to stop a variable from being sampled?

To have both the topology and branch-length variables fixed, remove the priors over them, for instance in SimplePhyloModel.java, remove the treePrior and branchLengthHyperParameter factors by removing the following lines:

@DefineFactor
NonClockTreePrior<RateParameterization> treePrior = 
NonClockTreePrior
.on(likelihood.tree);

@DefineFactor
Exponential<Exponential.MeanParameterization> branchLengthHyperPrior = 
Exponential
.on(treePrior.branchDistributionParameters.rate)
.withMean(10.0);

Code duplication

Many blocks contain nearly duplicate versions of large chunks code, making the project harder to maintain.

For example:

  • ExpFamMixture.proteinGTR() vs ``ExpFamMixture.dnaGTR()`
  • ExpFamMixture.simpleProteinPairStateIndexer() vs dna/protein counterpart

Pull the common blocks into helper private function (static if possible).

the eigendecomposition of EJMLUtils class is not stable

When I use EJMLUtils.class to do eigendecomposition of the rate matrix when the dimension of the rate matrix is 20_20 but it is represented by 36 features, during mcmc iterations, we might encounter two types of errors: the eigendecomposition of the rate matrix represented by V_D*Vinverse is not close to the rate matrix itself. This problem relies on the conditional number of eigenvector matrix V. The second error is when calculating the stationary distribution, it can not find an eigenvalue close to 1. However, I find out the code works with the initial rate matrix when the weights for the 36 features are all 1. This rate matrix can be found in accordance.txt. It does not work during the mcmc iterations. (I will print the rate matrix that does not work later.)

To fix it, I use the eigendecomposition from Jama library. The new code that works is copied as below.

package conifer.ctmc;
import org.ejml.data.DenseMatrix64F;
import org.ejml.data.Eigenpair;
import org.ejml.factory.DecompositionFactory;
import org.ejml.factory.EigenDecomposition;
import org.ejml.ops.EigenOps;
import org.ejml.simple.SimpleMatrix;

import tutorialj.Tutorial;
import bayonet.distributions.Multinomial;
import bayonet.math.EJMLUtils;
import bayonet.math.NumericalUtils;
import bayonet.math.EJMLUtils.SimpleEigenDecomposition;


import java.io.*;
import java.util.*;

import org.jblas.ComplexDoubleMatrix;
import org.jblas.DoubleMatrix;
import org.jblas.MatrixFunctions;
import org.junit.Assert;
import org.junit.Test;

import Jama.EigenvalueDecomposition;
import Jama.Matrix;


// Added by Tingting
import org.jblas.Eigen;
/**
 * A continuous time Markov chain. The main functionalities consists
 * in computing a marginal transition probability and a stationary distibution
 * (see below).
 * 
 * This implementation is based on caching the eigendecomposition
 * of the provided rate matrix.
 * 
 * @author Alexandre Bouchard ([email protected])
 *
 */
public class EigenCTMC implements CTMC
{
  private static final double THRESHOLD = 1e-6;
  private final SimpleEigenDecomposition eigenDecomp; 
  private final double [] stationaryDistribution;
  private final double [][] rates;

  /**
   * Note: if the RateMatrix is changed in place,
   * these changes will not be mirrored by this class.
   * 
   * It should be recreated each time a likelihood 
   * calculation is performed.
   * @param rateMatrix
   */

 public EigenCTMC(double [][] rates)
  {
    RateMatrixUtils.checkValidRateMatrix(rates);
     this.eigenDecomp = SimpleEigenDecomposition(rates); 
     this.stationaryDistribution = computeStationary();
     this.rates = rates;
     }

 public static SimpleEigenDecomposition SimpleEigenDecomposition(double [][] rates)
 {
   Matrix ratesMatrix = new Matrix(rates);
   EigenvalueDecomposition ed = new EigenvalueDecomposition(ratesMatrix);
   Matrix V = ed.getV();
   Matrix D =ed.getD();
   Matrix Vinverse = V.inverse();
   Matrix resultMatrix = V.times(D).times(V.inverse());
   //check if result and rates are close enough
   SimpleMatrix trueMatrix = new SimpleMatrix(rates);
   SimpleMatrix calculatedMatrix = new SimpleMatrix(resultMatrix.getArray()) ;
   if(EJMLUtils.isClose(trueMatrix, calculatedMatrix, THRESHOLD))
   {
     return new  SimpleEigenDecomposition(V, D, Vinverse);
   }else{
     throw new RuntimeException();
     }
   }




 public static class SimpleEigenDecomposition
 {
   public final Matrix V, D, Vinverse, calculatedRateMtx;

   public SimpleEigenDecomposition(Matrix v, Matrix d,
       Matrix vinverse)
   {
     V = v;
     D = d;
     Vinverse = vinverse;
     calculatedRateMtx=V.times(D).times(V.inverse());
   }

 }




  /**
   * Compute and cache the stationary (called by the constructor)
   * @return
   */
  private double[] computeStationary()
  {
    Matrix marginal = new Matrix(marginalTransitionProbability(1.0));
    Matrix exponentiatedTranspose = marginal.transpose();

    double [] result = null;
    final int dim = marginal.getColumnDimension();
    EigenvalueDecomposition eigenDecomp = new EigenvalueDecomposition(exponentiatedTranspose);
    // Find an eigen value equal to one (up to numerical precision)
    // result will hold the corresponding eigenvalue
    double []  realEigenvalues=eigenDecomp.getRealEigenvalues();
    double []  imageEigenvalues=eigenDecomp.getImagEigenvalues();
    double [][] eigenVectorTranspose = eigenDecomp.getV().transpose().getArray();
    for (int i = 0; i < dim; i++)
      if (NumericalUtils.isClose(0.0, imageEigenvalues[i], NumericalUtils.THRESHOLD)&& 
          NumericalUtils.isClose(1.0, realEigenvalues[i], NumericalUtils.THRESHOLD))
      {
        result = eigenVectorTranspose[i];
      }
    if (result == null)
      throw new RuntimeException("Could not find an eigenvalue equal to one. Not a proper rate matrix?");
    double sum = NumericalUtils.getNormalization(result);
    if (sum < 0.0)
    {
      for (int i = 0; i < result.length; i++)
        {
        result[i] *= -1;
        }
      }
    Multinomial.normalize(result);
    return result;
  }

  /**
   * 
   * Fill in ``marginalTransitionProbability()``.
   * 
   * Use the diagonalization method covered in class, using 
   * the eigen-decomposition functionalities provided by EJML.
   */
  @Tutorial(showSource = false, showLink = true)
  public double [][] marginalTransitionProbability(double branchLength)
  {

    DoubleMatrix rateMtx = new DoubleMatrix(eigenDecomp.calculatedRateMtx.getArray());
    rateMtx.muli(branchLength);
    double [][] result = MatrixFunctions.expm(rateMtx).toArray2();
    NumericalUtils.checkIsTransitionMatrix(result);

    for (int row = 0; row < result.length; row++)
      {
        Multinomial.normalize(result[row]);
      }
    return result;
  }


  public double [] stationaryDistribution()
  {
    return stationaryDistribution;
  }

  @Override
  public double[][] getRateMatrix()
  {
    return rates;
  }


//  public static void main(String [] args)
//  {
//    SimpleRateMatrix tryMtx = RateMatrices.accordance();
//    SimpleMatrix trySimpleMtx =new SimpleMatrix(tryMtx.getRateMatrix());
//    RateMatrixUtils.checkValidRateMatrix(tryMtx.getRateMatrix());
//    EJMLUtils.simpleEigenDecomposition(trySimpleMtx);
//    System.out.println("finished");
 //   }

}

Bayonet build

Is it possible that the bayonet version dependency in gradle is not updated? For example, NumericalUtils.getNormalization does not appear in bayonet-2.0.3.jar but is called on line 57 of EigenCTMC.java.

I do, however, see this function in the corresponding github repo for bayonet.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.