djhwueng/bmhyb's Issues
Investigate QR or LU decomp over pinv() for inverting a matrix
According to Dave Swofford, performs MUCH better.
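As a rough illustration of the idea, here is a minimal base-R sketch that solves a linear system with LU (`solve`), QR (`qr.solve`) and, as an extra option for symmetric positive definite matrices, Cholesky, rather than forming an explicit (pseudo)inverse. The matrix `V` and vector `b` are made-up values, not BMhyb output, and this is not the package's code.

```r
# Toy symmetric positive definite matrix standing in for a VCV (made-up values).
V <- matrix(c(1.00, 0.00, 0.15,
              0.00, 1.00, 0.35,
              0.15, 0.35, 0.65), nrow = 3, byrow = TRUE)
b <- c(0.2, -0.1, 0.4)  # e.g. trait values minus their expected mean

# Solve V %*% x = b directly instead of computing pinv(V) %*% b.
x_lu <- solve(V, b)      # LU decomposition
x_qr <- qr.solve(V, b)   # QR decomposition

# Cholesky: V = t(R) %*% R with R upper triangular.
R <- chol(V)
x_chol <- backsolve(R, forwardsolve(t(R), b))

# Pieces needed for a multivariate normal log-likelihood.
quad_form <- sum(b * x_lu)            # t(b) %*% solve(V) %*% b
log_det   <- 2 * sum(log(diag(R)))    # log(det(V)) from the Cholesky factor
```

Note that `chol` stops with an error when the matrix is not positive definite, which can double as a cheap check.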
check.positive.definite=TRUE is causing errors
Redo sims
Make sure the sim code is right for m!=0.5 (some oddness with the order)
Reload the new BMhyb code to the cluster and run.
Fix wrong model
From Tony:
After considering a more general case (please see the attached figure), we can probably use the following code, if you agree:
V.modified[recipient.index, recipient.index] <-
(V.original[recipient.index, recipient.index] - sigma.sq * flow$time.from.root.recipient[flow.index]) + # this equals sigma.sq*t3 = sigma.sq*(t1+t2+t3) - sigma.sq*(t1+t2)
(flow$m[flow.index]^2 + (1 - flow$m[flow.index])^2) * (flow$time.from.root.recipient[flow.index]) + # this is m^2*var(A) + (1-m)^2*var(C) = m^2*(t1+t2) + (1-m)^2*(t1+t2)
2 * m * (1 - m) * V.original[recipient.index, donor.index] # this is 2*m*(1-m)*cov(A,C) = 2*m*(1-m)*t1, which is equal to 2*m*(1-m)*cov(X,Y)
vh
Convert m to gamma
The standard term in the field is gamma, not m, for fraction of inheritance from one ancestor.
Time from donor
I have been trying to test your function GetVModified on the example network you show in your preprint, but I ran into a problem.
I used the following function to create this network (with t1, t2 and t3 as in the preprint):
library(ape)    # read.tree
library(BMhyb)  # PlotNetwork, GetVModified
create_paper_network <- function(gamma, t1, t2, t3) {
  # Underlying tree ((R, Y), X); R is the hybrid taxon
  phy <- read.tree(text = paste0("((R:", t3, ",Y:", t3, "):", t1 + t2, ",X:", t1 + t2 + t3, ");"))
  # One gene flow event: X donates to R
  network <- list(phy = phy,
                  flow = data.frame(donor = "X",
                                    recipient = "R",
                                    gamma = gamma,
                                    time.from.root.donor = t1,
                                    time.from.root.recipient = t1 + t2))
  # Taxon names must be character, not factor
  network$flow$donor <- as.character(network$flow$donor)
  network$flow$recipient <- as.character(network$flow$recipient)
  return(network)
}
To plot an example:
gamma <- 0.5
t1 <- 0.3; t2 <- 0.4; t3 <- 0.3; # unit height
network <- create_paper_network(gamma, t1, t2, t3)
PlotNetwork(network$phy, network$flow)
axis(1, at = c(0, t1, t1+t2, t1+t2+t3), labels = c("0", "t1", "t1+t2", "t1+t2+t3"))
Is this network correct? I tried to copy the format of the output of your function SimulateNetwork, but I might have made a mistake.
Using this network, I had a problem computing the induced variance matrix using GetVModified:
sigma2 = 1
x <- c(sigma.sq = sigma2, mu = 0, SE = 0)
actual.params <- c("sigma.sq", "mu", "bt", "vh", "SE")
vcv_BMhyb <- GetVModified(x, network$phy, network$flow, actual.params)
This gave me the following result:
     R   Y    X
R 0.65 0.7 0.35
Y 0.70 1.0 0.00
X 0.35 0.0 1.00
There is a problem here with Cov[Y,R] and Cov[X,R]. Applying the formulas, I get:
Cov[X, R] = sigma^2 * gamma * t1 = 0.5*0.3 = 0.15 \neq 0.35
Cov[Y, R] = sigma^2 * (1-gamma) * (t1 + t2) = 0.35 \neq 0.7
I could not explain this discrepancy. Did I misuse your functions? Or are my computations wrong?
One point that is unclear to me is that I could find no reference to the parameter time.from.root.donor (t1) in the code of GetVModified, although it seems essential for computing this matrix (but maybe it is hidden in a call to another function, in which case I might have missed it).
Thank you for your help, and for your package !
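For reference, the expected matrix can be sketched by writing each tip as a weighted sum of independent Brownian increments, so that V = sigma^2 * W D W', with D holding the increment durations. The increment labels below are hypothetical, and the weights encode my reading of the preprint's model (R inherits a fraction gamma from the donor branch and 1 - gamma from the stem it shares with Y); this is not BMhyb code.

```r
gamma <- 0.5; sigma2 <- 1
t1 <- 0.3; t2 <- 0.4; t3 <- 0.3

# Durations of the independent Brownian increments (hypothetical labels).
len <- c(x_root  = t1,        # X lineage, root to donor departure at t1
         x_tip   = t2 + t3,   # X lineage, t1 to the tip
         donor   = t2,        # donor branch, t1 to the hybridization at t1 + t2
         ry_stem = t1 + t2,   # stem shared by R and Y, root to t1 + t2
         r_tip   = t3,        # terminal branch of R
         y_tip   = t3)        # terminal branch of Y

# Weight of each increment in each tip value (assumed model).
W <- rbind(R = c(gamma, 0, gamma, 1 - gamma, 1, 0),
           Y = c(0,     0, 0,     1,         0, 1),
           X = c(1,     1, 0,     0,         0, 0))
colnames(W) <- names(len)

V_expected <- sigma2 * W %*% diag(len) %*% t(W)
print(V_expected)
```

Under these assumptions, Var[R] = 0.65 agrees with GetVModified, while Cov[Y,R] = 0.35 and Cov[X,R] = 0.15 do not.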
Session infos:
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
[5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 LC_PAPER=fr_FR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BMhyb_1.5.1 ape_4.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 subplex_1.4-1 msm_1.6.4 mvtnorm_1.0-6
[5] lattice_0.20-35 tidyr_0.7.1 corpcor_1.6.9 prettyunits_1.0.2
[9] assertthat_0.2.0 digest_0.6.12 foreach_1.4.3 R6_2.2.2
[13] plyr_1.8.4 phytools_0.6-20 coda_0.19-1 httr_1.3.1
[17] ggplot2_2.2.1 progress_1.1.2 rlang_0.1.2.9000 uuid_0.1-2
[21] lazyeval_0.2.0 curl_2.8.1 data.table_1.10.4 taxize_0.9.0
[25] phangorn_2.2.0 Matrix_1.2-11 RNeXML_2.0.7 combinat_0.0-8
[29] splines_3.4.2 stringr_1.2.0 igraph_1.1.2 munsell_0.4.3
[33] compiler_3.4.2 numDeriv_2016.8-1 geiger_2.0.6 pkgconfig_2.0.1
[37] mnormt_1.5-5 tibble_1.3.4 gridExtra_2.2.1 TreeSim_2.3
[41] expm_0.999-2 quadprog_1.5-5 codetools_0.2-15 XML_3.98-1.9
[45] reshape_0.8.7 viridisLite_0.2.0 dplyr_0.7.2 MASS_7.3-47
[49] crul_0.3.8 grid_3.4.2 nlme_3.1-131 jsonlite_1.5
[53] gtable_0.2.0 magrittr_1.5 scales_0.5.0 stringi_1.1.5
[57] reshape2_1.4.2 viridis_0.4.0 bindrcpp_0.2 scatterplot3d_0.3-40
[61] phylobase_0.8.4 xml2_1.1.1 fastmatch_1.1-0 deSolve_1.20
[65] iterators_1.0.8 tools_3.4.2 rncl_0.8.2 ade4_1.7-8
[69] bold_0.5.0 glue_1.1.1 purrr_0.2.3 maps_3.2.0
[73] plotrix_3.6-6 parallel_3.4.2 survival_2.41-3 colorspace_1.3-2
[77] bindr_0.1 animation_2.5 clusterGeneration_1.3.4
Variance between hybrid descendants
Hi again, @bomeara and @djhwueng
This might be related to #13.
I tried a slightly more sophisticated network, in which a hybrid has several descendants, here R and Y.
## Underlying tree
gamma <- 0.5; t1 <- 0.3; t2 <- 0.4; t3 <- 0.3;
phy <- read.tree(text = paste0("((R:", t3, ",Y:", t3, "):", t1 + t2, ",X:", t1 + t2 + t3, ");"))
## Network
don_recp <- expand.grid(c("X"), c("Y", "R"))
network <- list(phy = phy,
flow = data.frame(donor = don_recp[,1],
recipient = don_recp[,2],
gamma = rep(gamma, 2),
time.from.root.donor = rep(t1, 2),
time.from.root.recipient = rep(t1, 2)))
network$flow$donor <- as.character(network$flow$donor)
network$flow$recipient <- as.character(network$flow$recipient)
## Plot
PlotNetwork(network$phy, network$flow)
axis(1, at = c(0, t1, t1+t2, t1+t2+t3), labels = c("0", "t1", "t1+t2", "t1+t2+t3"))
I tried to follow your format for the flow matrix, using your description here. Is this network correctly defined?
I then tried to compute the associated variance matrix.
> sigma2 = 1
> x <- c(sigma.sq = sigma2, mu = 0, SE = 0)
> actual.params <- c("sigma.sq", "mu", "bt", "vh", "SE")
> GetVModified(x, network$phy, network$flow, actual.params)
     R    Y    X
R 0.85 0.70 0.15
Y 0.70 0.85 0.15
X 0.15 0.15 1.00
In this matrix, if the network is correctly defined and my computations are right, I think that Cov[R,Y] is not correct. I think it should be:
Cov[Y,R] = sigma^2 * [(gamma^2 + (1-gamma)^2)*t1 + t2] = 0.55 \neq 0.70
What do you think? Did I make a mistake somewhere?
I did not dive very deep into your code, but from what I understood of your algorithm, you modify the (recipient, donor) pairs one by one (looping over the rows of your flow matrix), but never the (recipient1, recipient2) pairs that arise when a single event has several descendants, as is the case here.
In the example above, the function indeed gives Cov[R,Y] = 0.70, which looks like the unmodified covariance one would get from the underlying tree.
But it's possible I misunderstood something; please correct me if I'm wrong!
Thanks again !
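As an independent check of the formula above, here is a small Monte Carlo sketch in base R. It simulates the trait along the network as I understand the model (the stem of R and Y receives a fraction gamma from the X lineage at time t1) and compares the sample covariance with the formula. The helper `bm` and all variable names are hypothetical, not part of BMhyb.

```r
set.seed(1)
gamma <- 0.5; sigma2 <- 1
t1 <- 0.3; t2 <- 0.4; t3 <- 0.3
n <- 1e5  # number of simulated trait histories

# Brownian increment over a given duration, one draw per history.
bm <- function(duration) rnorm(n, mean = 0, sd = sqrt(sigma2 * duration))

x_at_t1    <- bm(t1)                                      # X lineage at time t1
stem_at_t1 <- bm(t1)                                      # (R, Y) stem at t1, before gene flow
hybrid     <- gamma * x_at_t1 + (1 - gamma) * stem_at_t1  # stem just after gene flow
split_ry   <- hybrid + bm(t2)                             # ancestor of R and Y at t1 + t2

tips <- cbind(R = split_ry + bm(t3),
              Y = split_ry + bm(t3),
              X = x_at_t1 + bm(t2 + t3))

V_sim <- cov(tips)
cov_formula <- sigma2 * ((gamma^2 + (1 - gamma)^2) * t1 + t2)

print(round(V_sim, 2))
print(cov_formula)
```

The sample value of Cov[R,Y] should land close to the formula value of 0.55, up to simulation noise.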
Remove metR from dependencies
Hi! I'm the maintainer of the metR package. I'm trying to publish an update to CRAN and your package, which is listed as a reverse dependency, is failing some tests.
Looking at your code, it seems that you used to import metR::geom_contour_fill(), but now those lines are commented out and thus the metR dependency is no longer needed. If this is the case (I might be mistaken), then I think you should remove metR from Imports. If you'd like, I can submit a pull request with the change.
Thanks!
ConvertPhyAndFlowToPhygraph failing
data("nicotiana")
p <- BMhyb:::ConvertPhyAndFlowToPhygraph(nicotiana$phy, nicotiana$flow)
Creates an object that looks right, but when plotting causes R to abort. Probably something to do with the numbering or ordering of the p$edge object.
plot(ConvertEvonetToIgraphWithNodeNumbers(p), vertex.shape="none")
works but the resulting object looks like the flow is all running the wrong way: out of taxon 1, for example.
Does this satisfy 3 point condition of Ho and Ané?
Lots of failures with generation
ape is having trouble with binding due to branch length issues
bad numerical issues
If the final VCV has any eigenvalues < 0 (or just tiny), it is not positive definite. Calculation of the likelihoods may have no relation to the truth. And this can happen for realistic values of our parameters. The spline smoothing is supposed to deal with this, but doesn't do it well. We can see if matrices are ultrametric (this does not mean the same as trees being ultrametric) but how do you get likelihood in a region that is biologically realistic but which isn't appropriate?
We've tried simulating, estimating a var() on the simulated tips to get the VCV, and then use that, but that often has issues as well. We implemented a pseudo-determinant: that also failed. We might do ABC, but it's embarrassing.
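A minimal sketch of the eigenvalue check described above, in base R. The function name, the relative tolerance and the example matrices are assumptions for illustration, not what BMhyb does internally.

```r
# TRUE if every eigenvalue of the symmetric matrix V is clearly above zero,
# judged relative to the largest eigenvalue (tolerance is an arbitrary choice).
is_positive_definite <- function(V, tol = 1e-8) {
  ev <- eigen(V, symmetric = TRUE, only.values = TRUE)$values
  min(ev) > tol * max(abs(ev))
}

V_ok <- diag(3)

# Made-up "VCV" whose pairwise covariances are mutually inconsistent.
V_bad <- matrix(c(1.0, 0.9, 0.1,
                  0.9, 1.0, 0.9,
                  0.1, 0.9, 1.0), nrow = 3, byrow = TRUE)

print(is_positive_definite(V_ok))
print(is_positive_definite(V_bad))
print(eigen(V_bad, symmetric = TRUE, only.values = TRUE)$values)
```

Using a relative rather than absolute tolerance keeps the check meaningful when sigma.sq rescales the whole matrix.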
Add ancestral state estimation
Especially at the parent of the hybrids, but throughout tree is fine.
Investigate traversal up network to modify VCV
Advice from Cécile: compute the VCV for every terminal AND internal node, then walk up the network from the root.
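One way to read this advice is the sketch below: nodes are visited parents first, and each node's row of the VCV is filled from its parents' rows. The `node` list structure, the function names and the example network (a three-taxon network with one hybrid, R) are all hypothetical and are not BMhyb's data format; a tree node has one parent with weight 1, a hybrid node has two parents with weights gamma and 1 - gamma.

```r
# Describe one node: its parents, the lengths of the incoming edges,
# and the inheritance weight of each incoming edge.
node <- function(id, parents = character(0), lengths = numeric(0),
                 weights = rep(1, length(parents))) {
  list(id = id, parents = parents, lengths = lengths, weights = weights)
}

# nodes must be ordered so that every parent comes before its children.
network_vcv <- function(nodes, sigma2 = 1) {
  ids <- vapply(nodes, function(nd) nd$id, character(1))
  V <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  for (i in seq_along(nodes)) {
    nd <- nodes[[i]]
    if (length(nd$parents) == 0) next  # root: fixed value, variance 0
    done <- ids[seq_len(i - 1)]
    w <- nd$weights
    # Covariance with earlier nodes: weighted average of the parents' rows.
    cov_prev <- as.vector(w %*% V[nd$parents, done, drop = FALSE])
    V[nd$id, done] <- cov_prev
    V[done, nd$id] <- cov_prev
    # Variance: parents' (co)variances plus the new increments on the incoming edges.
    Vpp <- V[nd$parents, nd$parents, drop = FALSE]
    V[nd$id, nd$id] <- as.numeric(w %*% Vpp %*% w) + sigma2 * sum(w^2 * nd$lengths)
  }
  V
}

gamma <- 0.5; t1 <- 0.3; t2 <- 0.4; t3 <- 0.3
nodes <- list(
  node("root"),
  node("a", "root", t1),                                   # X lineage at t1
  node("s", "root", t1 + t2),                              # stem of (R, Y) at t1 + t2
  node("h", c("a", "s"), c(t2, 0), c(gamma, 1 - gamma)),   # hybrid node
  node("R", "h", t3),
  node("Y", "s", t3),
  node("X", "a", t2 + t3))

V_all <- network_vcv(nodes)
tips <- c("R", "Y", "X")
print(V_all[tips, tips])
```

Because the hybrid node's row is computed before its descendants are visited, covariances between descendants of the same hybrid are inherited automatically.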
Input of enewick?
SNAQ, which is popular for inferring networks, exports enewick format (the one with a hybrid node appearing twice, not the one with multiple trees to represent a network). So perhaps accept this as an option.
Decompose network into all trees
Do this by messing with flow (setting to all possibilities of zero and one), then getting VCV from it (and can then go back to tree like in datelife). Calculate likelihood on each contained tree, weighting by the probability of that tree being seen (using the product of the gammas for the edges leading to that tree). See if the weighted sum of tree likelihoods is same as network likelihood in positive definite vcv case; if so, use for cases where not positive definite.
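The enumeration step could look roughly like this. It assumes one gamma per hybridization event (rows of the flow data frame that belong to the same event would first have to be collapsed), and the gamma values, the helper `log_mixture` and the per-tree log-likelihoods are made up for illustration. Whether the mixture matches the network likelihood is exactly what remains to be checked.

```r
# Hypothetical inheritance fractions, one per hybridization event.
gammas <- c(event1 = 0.3, event2 = 0.1)

# Every displayed tree keeps (1) or drops (0) the donor edge of each event.
choices <- expand.grid(lapply(gammas, function(g) c(0, 1)))

# Probability of each displayed tree: gamma if the donor edge is kept, else 1 - gamma.
weight <- apply(choices, 1, function(keep) {
  prod(ifelse(keep == 1, gammas, 1 - gammas))
})
print(cbind(choices, weight))

# Combine per-tree log-likelihoods into a weighted mixture on the log scale
# (log-sum-exp, to avoid underflow for very negative log-likelihoods).
log_mixture <- function(logliks, weights) {
  m <- max(logliks)
  m + log(sum(weights * exp(logliks - m)))
}

# Placeholder values; real ones would come from a BM likelihood on each tree.
tree_logliks <- c(-12.1, -10.4, -15.3, -11.0)
print(log_mixture(tree_logliks, weight))
```

With k events there are 2^k displayed trees, so this only scales to networks with a modest number of hybridizations.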
Several hybridization events
Hi again, @bomeara and @djhwueng
This might be related to #13 and #14.
I tried a network with several hybridization events:
gamma1 <- 0.5; gamma2 <- 0.5;
## Underlying tree
t1 <- 0.2; t2 <- 0.2; t3 <- 0.2; t4 <- 0.2; t5 <- 0.2;
phy <- read.tree(text = paste0("(((R:",t4+t5,",Y:",t4+t5,"):",t3,",X:",t3+t4+t5,"):",t1+t2,",Z:",t1+t2+t3+t4+t5,");"))
plot(phy)
## Network
don_recp <- rbind(expand.grid(c("Z"), c("Y", "R", "X")),
expand.grid(c("X"), c("R")))
network <- list(phy = phy,
flow = data.frame(donor = don_recp[,1],
recipient = don_recp[,2],
gamma = c(rep(gamma1, 3), gamma2),
time.from.root.donor = c(rep(t1, 3), t1+t2+t3+t4),
time.from.root.recipient = c(rep(t1, 3), t1+t2+t3+t4)))
network$flow$donor <- as.character(network$flow$donor)
network$flow$recipient <- as.character(network$flow$recipient)
## Plot
PlotNetwork(network$phy, network$flow)
axis(1, at = c(0, t1, t1+t2, t1+t2+t3, t1+t2+t3+t4, t1+t2+t3+t4+t5),
labels = c("0", "t1", "t1+t2", "t1+t2+t3", "t1+t2+t3+t4", "t1+t2+t3+t4+t5"))
This gives the following variance matrix:
> sigma2 = 1
> x <- c(sigma.sq = sigma2, mu = 0, SE = 0)
> actual.params <- c("sigma.sq", "mu", "bt", "vh", "SE")
> GetVModified(x, network$phy, network$flow, actual.params)
    R   Y   X   Z
R 0.8 0.6 0.6 0.1
Y 0.6 0.9 0.4 0.1
X 0.6 0.4 0.9 0.1
Z 0.1 0.1 0.1 1.0
I think that the variance of R is not consistent with the model of trait evolution. If my computations are correct, we should have:
Var[R] = (gamma2^2 + (1-gamma2)^2)*((gamma1^2 + (1-gamma1)^2)*t1+t2+t3+t4)
+ 2*gamma2*(1-gamma2)*((gamma1^2 + (1-gamma1)^2)*t1+t2) + t5 = 0.7 \neq 0.8
(Note that the covariances between R and Y and X might also have problems, see #14).
Browsing through the code, this might be linked to the fact that a new hybridization "erases" an older one in your algorithm. Indeed, all the computations are made using V.original, which does not take ancestral hybrids into account. Here, if there were only one hybridization (the second one), then we would have:
Var[R] = (gamma2^2 + (1-gamma2)^2)*(t1+t2+t3+t4) + 2*gamma2*(1-gamma2)*(t1+t2) + t5 = 0.8
which is the result given by GetVModified.
I think this is a separate problem from the other two, hence the new issue. Again, I'm sorry if I misused your functions or made mistakes; please correct me if I did.
Thanks !
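The two values of Var[R] discussed above can be reproduced step by step with the following sketch. It takes sigma^2 = 1 and uses hypothetical helper names; it evaluates the formulas from this issue and does not call BMhyb.

```r
gamma1 <- 0.5; gamma2 <- 0.5
t1 <- 0.2; t2 <- 0.2; t3 <- 0.2; t4 <- 0.2; t5 <- 0.2

# Variance multiplier for a hybrid with independent, equal-variance parents.
mix <- function(g) g^2 + (1 - g)^2

# First event (Z into the ancestor of X, Y and R at time t1).
var_h1 <- mix(gamma1) * t1

# Just before the second event (X into R at t1 + t2 + t3 + t4), both parents
# of R descend from the first hybrid and share history until t1 + t2.
var_parents <- var_h1 + t2 + t3 + t4
cov_parents <- var_h1 + t2

# Accounting for both events.
var_R_both <- mix(gamma2) * var_parents +
  2 * gamma2 * (1 - gamma2) * cov_parents + t5

# Ignoring the first event, i.e. working from the unmodified tree.
var_R_second_only <- mix(gamma2) * (t1 + t2 + t3 + t4) +
  2 * gamma2 * (1 - gamma2) * (t1 + t2) + t5

print(c(both = var_R_both, second_only = var_R_second_only))
```

The difference between the two comes entirely from replacing t1 with (gamma1^2 + (1 - gamma1)^2) * t1 in the ancestral part of the path.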