arborworkflows / arbor

aRbor, an R package with useful functions for Arbor workflows

R 100.00%

arbor's People

Contributors

eliotmiller julienneng jeffbaumes

arbor's Issues

aceArbor only supports 2-state characters

Implement phylogenetic signal function that can handle discrete or continuous traits.

legend is missing for ancestral state plot for discrete traits

We need to put the legend back on the tree for the ancestral state plot for discrete traits.
In R I usually use
colors<-rainbow(nlevels(trait))
tiplabels(pch=21, bg=colors[as.numeric(trait)], cex=2.5, adj=1)
nodelabels(pie = ans$lik.anc, piecol = colors, cex = 0.5)
legend(locator(),c(levels(trait)), fill=colors, cex=1, xpd=TRUE, horiz = FALSE)

Since locator() needs an interactive mouse click, for aRbor should this be
legend("topleft",c(levels(trait)), fill=colors, cex=1, xpd=TRUE, horiz = FALSE)

How to efficiently locate expert mode analyses

Phenomics users requested a better way to find an appropriate analysis than clicking the eyeball and scrolling through a long list of names.
For example, where is the tree+heatmap function? How do we tell a user where it is?

selective output for easy mode ancestral state continuous traits

Phenomics users requested a button to click that would remove the "wings" on the output plot for easy mode ancestral state continuous traits.

easy mode does not read header line

In both phylogenetic signal and ancestral state, the trait file parser does not read the header line.
This happens with both .csv and .tsv formats.
Instead, the columns are simply named Column 1, Column 2, ...
Subsequent calculations then fail because the length of the traits is different from the length of the tip labels.
Workaround: I left off the headers and just remembered what was in each column.

include some cospeciation stuff

make.treedata and column names

A weird error is returned if the data matrix does not have column names. Minimal example (uncomment the colnames line to fix):

library(phytools)  # pbtree
library(geiger)    # sim.char
library(aRbor)     # make.treedata

tree<-pbtree(n=100, scale=1)
mm<-cbind(c(1, -0.5), c(-0.5, 1))
z<-sim.char(tree, mm, model="BM")[,,1]

# colnames(z)<-c("x", "y")

td<-make.treedata(tree, z)

pull species list from sequence blast

enhance phylopcaArbor functionality

this has literally no flexibility as it stands.

"SYM" unlikely to work for aceArbor

I identified and (I think) fixed an issue with the ER model in aceArbor. I expect the same issue - relating to diversitree constraints - exists for the SYM model. Needs to be checked and fixed.

How to evaluate best fit model in Easy Mode (BM, OU, etc)?

Can we implement Easy Mode / Expert Mode code to evaluate what evolutionary model best fits a dataset, e.g., BM, OU, etc.?
This example is in R-phylo-wiki
Can we calculate multiple models and then provide the user with a likelihood ratio test or a comparison of AICc scores, enabling users to select the best-fit model?
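A minimal sketch of the AICc comparison asked about here, in base R. The log-likelihoods are made up for illustration and are not results from any dataset; the parameter counts (2 for BM, 3 for OU) are the usual ones for these models, and in practice both values would come from whatever fitting routine Arbor ends up calling.

```r
# AICc for a fitted model: lnL = log-likelihood, k = number of free
# parameters, n = sample size (here, the number of tips).
aicc <- function(lnL, k, n) {
  -2 * lnL + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

# Hypothetical log-likelihoods, made up for illustration only.
fits <- data.frame(
  model = c("BM", "OU"),
  lnL   = c(-100.0, -98.5),
  k     = c(2, 3),
  stringsAsFactors = FALSE
)
n_tips <- 50

fits$AICc  <- aicc(fits$lnL, fits$k, n_tips)
fits$delta <- fits$AICc - min(fits$AICc)  # difference from the best model
best <- fits$model[which.min(fits$AICc)]  # model with the lowest AICc
print(fits)
```

Reporting the delta column alongside the best model lets the user see how close the alternatives are, rather than only which one won.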

No Arbor Logo in EasyMode-AncestralStateReconstruction

The image link is broken for the arbor logo in EasyMode - Ancestral State Reconstruction

make.tree type requirement in arbor

Within the Arbor web interface, make.tree seems to ask users to choose whether the characters are discrete or continuous. It might be nicer for make.tree to be used on all characters, and for the functions that use its output to ask for discrete or continuous, as appropriate.

aceArbor crashes when a character state is 0

Standardize outputs for physigArbor

All the different types of tests should return consistent outputs.

Tree Metrics

Several users requested a simple tree metrics routine that would output the results of tests like:

Tree Status
is.binary       YES
is.ultrametric  NO
...

Basically, a large summary of descriptive output that is commonly needed for analyses.
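A rough sketch of what such a routine could look like, using checks that already exist in the ape package. The helper names tree_metrics and yes_no are hypothetical, and the set of metrics is only a starting point.

```r
library(ape)

# Hypothetical helper: turn a logical into the YES/NO style shown above.
yes_no <- function(x) if (is.na(x)) NA_character_ else if (x) "YES" else "NO"

# Hypothetical helper: collect common descriptive checks on a phylo object.
tree_metrics <- function(phy) {
  has_lengths <- !is.null(phy$edge.length)
  data.frame(
    metric = c("n.tips", "n.nodes", "is.rooted", "is.binary", "is.ultrametric"),
    value = c(
      as.character(Ntip(phy)),
      as.character(Nnode(phy)),
      yes_no(is.rooted(phy)),
      yes_no(is.binary(phy)),
      # is.ultrametric() needs branch lengths, so report NA without them
      yes_no(if (has_lengths) is.ultrametric(phy) else NA)
    ),
    stringsAsFactors = FALSE
  )
}

phy <- read.tree(text = "((a:1,b:1):1,(c:1,d:3):1);")
m <- tree_metrics(phy)
print(m)
```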

ExpertMode: Docs Pages

When using expert mode, users requested a small box that could be clicked to read a manual page when an analysis is selected.

create general na functions for treedata

I think we need three cases: single column (checks for NAs, removes from data and tree as needed); pairwise (removes any taxa not present in BOTH, for things like PGLS); and multivariate (removes any incomplete taxa, for things like phyloPCA).
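The three cases could be sketched in base R as below. This works on a plain data frame with taxa as row names and only returns the taxa to keep; pruning the tree to match is left out, and the function names are hypothetical rather than part of aRbor.

```r
# Single column: taxa with a value for one trait.
keep_single <- function(dat, col) {
  rownames(dat)[!is.na(dat[[col]])]
}

# Pairwise: taxa with values for BOTH traits (e.g. for PGLS).
keep_pairwise <- function(dat, col1, col2) {
  rownames(dat)[complete.cases(dat[, c(col1, col2)])]
}

# Multivariate: taxa with no missing values at all (e.g. for phyloPCA).
keep_multivariate <- function(dat) {
  rownames(dat)[complete.cases(dat)]
}

dat <- data.frame(
  x = c(1, NA, 3, 4),
  y = c(1, 2, NA, 4),
  z = c(NA, 2, 3, 4),
  row.names = c("a", "b", "c", "d")
)
keep_single(dat, "x")
keep_pairwise(dat, "x", "y")
keep_multivariate(dat)
```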

EasyMode - Ancestral State Reconstruction Tip Labels Need Color

There is no color in the circles at the tips of the EasyMode - Ancestral State Reconstruction plot. The circles at the tips are also very small compared to the circles at the nodes (which are nicely sized and have the correct color).

See the files in GoogleDrive, "files that failed at Biosphere2", KirrWithHeaderUnderscore.csv and KirrGrafen.phy

discrete character correlations

consistency with s3 generics

Travis throws an error about the treedata s3 functions:

checking S3 generic/method consistency ... WARNING
print:
function(x, ...)
print.treedata:
function(tdObject, ...)
reorder:
function(x, ...)
reorder.treedata:
function(tdObject, order, index.only, ...)
filter:
function(.data, ...)
filter.treedata:
function(tdObject, ...)
mutate:
function(.data, ...)
mutate.treedata:
function(tdObject, ...)
select:
function(.data, ...)
select.treedata:
function(tdObject, ...)
summarise:
function(.data, ...)
summarise.treedata:
function(tdObject, ...)

See section ‘Generic functions and methods’ of the ‘Writing R
Extensions’ manual.
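The warning goes away when each method's leading arguments use the same names as its generic: x for print and reorder, .data for the dplyr verbs. A small sketch with a stand-in class (demo_td is hypothetical and is not the treedata implementation):

```r
# Hypothetical minimal class, standing in for treedata.
new_demo_td <- function(phy, dat) {
  structure(list(phy = phy, dat = dat), class = "demo_td")
}

# The generic is print(x, ...), so the method's first argument is named x,
# not tdObject.
print.demo_td <- function(x, ...) {
  cat("demo_td with", nrow(x$dat), "taxa and", ncol(x$dat), "traits\n")
  invisible(x)
}

obj <- new_demo_td(phy = NULL, dat = data.frame(a = 1:3, b = 4:6))
out <- capture.output(print(obj))
```

Extra arguments such as order and index.only can stay, as long as they come after the arguments the generic defines.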

Input Table Preview does not refresh

When I edit an input table (.csv) and drop the new table into easyMode, the file name changes, and the new file is active. However, the input table preview does not change / refresh. It just displays what the previous file was.

select() inside a function doesn't work as expected

I cannot figure out how to use select.treedata() inside another function.

tree<-pbtree(n=100, scale=1)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-1:2
x<-sim.history(tree,Q)$states
y<-setNames(as.numeric(x),names(x))

ydf <- as.data.frame(y)
ytd<-make.treedata(tree, ydf)

#either of these work
select(ytd, 2)
select(ytd, y)

# now try in a function
foo<-function(ytd, columnSelect) {
    ynew<-select(ytd, columnSelect)
    return(ynew)
}

# now these don't work, and return what was (to me) a strange error
foo(ytd, 2)
foo(ytd, y)

# I think I get what's happening - the second argument is passed through to dplyr 'select' 
# as 'columnSelect' and the variable within the function is being ignored.
# I am not sure how to fix this.
# there are hacks online but this might be a real issue:
# http://stackoverflow.com/questions/22919448/passing-function-argument-to-dplyr-select
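With current dplyr (1.0.0 or later, not the 0.3 release this package was written against), one common way around this is all_of(), which tells select() to use the value stored in the variable. The sketch below uses a plain data frame; whether select.treedata passes this through unchanged is an assumption that would need checking.

```r
library(dplyr)

df <- data.frame(x = 1:3, y = c(0.1, 0.2, 0.3))

# columnSelect holds column names (character) or positions (numeric).
# all_of() makes select() use the value of the variable instead of
# looking for a column literally called "columnSelect".
foo <- function(dat, columnSelect) {
  select(dat, all_of(columnSelect))
}

by_name <- foo(df, "y")
by_pos  <- foo(df, 2)
```

Passing a bare column name, as in foo(ytd, y), would instead need the embrace operator, select(dat, {{ columnSelect }}).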

Character type detection

Would be nice to have the ability to "smartly" detect character type. E.g. users often send a vector of 1s and 0s, not a factor - but the data actually does represent a factor.
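One possible heuristic, sketched in base R. The function name guess_character_type and the max_states cutoff are assumptions for illustration; this is not the detectCharacterType function referenced elsewhere in these issues.

```r
# Hypothetical heuristic: factors, strings and logicals are discrete;
# numeric vectors are discrete only if every value is a whole number and
# there are few distinct values (max_states is an arbitrary cutoff).
guess_character_type <- function(x, max_states = 10) {
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    return("discrete")
  }
  vals <- x[!is.na(x)]
  whole <- all(vals == round(vals))
  if (whole && length(unique(vals)) <= max_states) {
    "discrete"
  } else {
    "continuous"
  }
}

guess_character_type(c(0, 1, 1, 0, NA))
guess_character_type(c(1.5, 2.3, 4.1))
```

A heuristic like this will misclassify small integer counts as discrete, so the user should still be able to override the guess.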

enhance bisseArbor functionality

Right now this function only fits bisse mk2 using ML. Better to:

deal with mkn
allow user to specify constraints or ER, SYM, ARD
compare trait independent and trait dependent models
run bayesian version

physig should take in treedata objects

physigArbor error

Zach and I tried with his data and got this:

physigArbor(tree, data$SpongeHost)
Error in physigArbor(tree, data$SpongeHost) :
could not find function "detectCharacterType"

Where can we find "detectCharacterType"?

aceArbor character type detection sucks

Just checks for a factor, which is not even what the code expects

summarize.treedata() crashes R

this kills R:

require(aRbor)
data(anolis)
td<-make.treedata(anolis$phy, anolis$dat, name_column=1)
summarize(td)

PGLS

auto-check for singleton nodes

Users continue to attempt to input trees with singleton nodes.
R can handle these by reading the tree with read.newick from the phytools package,
followed by collapse.singles,
then write.tree to get a plain Newick tree.

A check with is.binary.tree, followed by multi2di(), is a related issue.

auto-check for binary tree

the most common manipulation I need to do to get aRbor to read a tree is to make the tree binary.

Can we add a check when we read trees into any easy mode app:

is.binary.tree(phy) - if it returns FALSE, then do
phy<-multi2di(phy)

check again

is.binary.tree(phy) - if it returns TRUE, continue; if FALSE, need to print an error that Arbor could not generate a binary tree.
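The check described above could be wrapped up roughly as follows. The helper name ensure_binary is hypothetical, and the sketch uses is.binary(), the name current ape releases provide for this check.

```r
library(ape)

# Hypothetical helper following the steps in this issue.
ensure_binary <- function(phy) {
  if (!is.binary(phy)) {
    phy <- multi2di(phy)  # resolve multichotomies in random order
  }
  if (!is.binary(phy)) {
    stop("Arbor could not generate a binary tree")
  }
  phy
}

# A rooted tree with one three-way split.
phy <- read.tree(text = "((a,b,c),d);")
fixed <- ensure_binary(phy)
```

When the tree has branch lengths, multi2di() resolves multichotomies by adding zero-length branches, so a minimum edge length check may still be needed afterwards.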

birth-death models

continuousCorrelation fails when there are NAs

Install depends on dplyr 0.3, cannot install_github with dplyr 0.2 loaded

check for singleton nodes et al.

See the GoogleDrive folder "files that failed at Biosphere2"
These are from Jenna, a microbiologist in Jonathan's lab at UC-Davis.
The Kirr.csv file needed a header, so I fixed that.
The Kirr.phy file contains singleton nodes.
It draws the following error in EasyMode:
"Analysis failed. Error in read.tree(text = input) : The tree has apparently singleton node(s): cannot read tree file. Reading Newick file aborted at tree no. 1"

For EasyMode to be easy, instead of just giving this error that there is a singleton node, the software should just fix the singleton node for the user:

library(phytools)
phy<-read.newick(input)
phy<-collapse.singles(phy)
phy<-multi2di(phy)

(could also check minimum edge length)
if (min(phy$edge.length) == 0) { phy$edge.length = phy$edge.length + 0.00001 }

because we really want

no singleton nodes
no multichotomous branches
no zero length branches

To fix all this for Jenna, I ran in R

phy<-read.newick("Kirr.phy")
Read 1 item
phy<-collapse.singles(phy)
phy<-multi2di(phy)
min(phy$edge.length)
[1] 1e-06
write.tree(phy,"KirrCollapsed.phy")

Now the error in easy mode is

"Analysis failed. Error in make.treedata(tree, table) : No matching names found between data and tree"

This one is easier. The tree has underscores and the .csv does not. Change the .csv.
A good error message here, so easily noticed and fixed.

Now the error in easy mode is

"Analysis failed. Error in check.tree(tree) : 'tree' must be ultrametric"

To make the tree ultrametric, I ran

newphy<-compute.brlen(phy,method="Grafen")
write.tree(newphy,"KirrGrafen.phy")

Now easyMode works, but 3 other issues came up

no color in circles at tips; circles at nodes are much larger than circles at tips (the tip circles should just be larger)
when I drop in a new .csv file, the preview does not change, even though the file name changes.
broken arbor logo in EasyMode-Ancestral State Reconstruction

PIC correlation test

GeoSSE

add geosse capabilities

bisse

Provide an explanation box in EasyMode

Phenomics users requested an explanation box for each easy mode analysis.
We need to display the exact test that was completed (e.g. what would the equivalent command(s) in R be?).
We need a brief explanation text of what the test is doing.
We need to re-state clearly the model used in the test (e.g. ace with what parameters?).

sniff input files

Can expert and easy modes "sniff" files to figure out if they are:
nexus vs newick format trees
.csv or .tsv

Major issues include:
several programs reserve ".tre" for trees and ".phy" for phylip-format data files
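A content-based check avoids relying on extensions like ".tre" or ".phy". The sketch below, in base R, looks only at the first non-blank line; the function names are hypothetical and the rules are deliberately simple.

```r
# First non-blank line of a file, given its lines as a character vector.
first_line <- function(lines) {
  lines[nzchar(trimws(lines))][1]
}

# NEXUS files start with "#NEXUS"; a Newick tree starts with "(".
sniff_tree_format <- function(lines) {
  first <- trimws(first_line(lines))
  if (is.na(first)) {
    "unknown"
  } else if (grepl("^#NEXUS", first, ignore.case = TRUE)) {
    "nexus"
  } else if (grepl("^\\(", first)) {
    "newick"
  } else {
    "unknown"
  }
}

# Guess the delimiter by counting tabs and commas in the header line.
sniff_delimiter <- function(lines) {
  header <- first_line(lines)
  if (is.na(header)) return("unknown")
  count <- function(ch) {
    lengths(regmatches(header, gregexpr(ch, header, fixed = TRUE)))
  }
  n_tab <- count("\t")
  n_comma <- count(",")
  if (n_tab > n_comma) "tsv" else if (n_comma > 0) "csv" else "unknown"
}

sniff_tree_format(c("", "#NEXUS", "begin trees;"))
sniff_delimiter("species\ttrait1\ttrait2")
```

In use, the lines would come from readLines() on the uploaded file, for example with n = 20 to avoid reading large files in full.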

read nexus files

Arbor expert and easy modes need to be able to read nexus files to obtain a tree and the associated traits

model adequacy plots for fitDiscrete and fitContinuous

Histograms of simulations versus actual data

integration with iPlant

Users requested the ability to integrate with their iPlant accounts.
Hosting a version of aRbor at iPlant/atmosphere could allow:

user authentication based on existing iPlant users
file directory system so that existing iPlant users can pull their data into iPlant.

Need Bayesian ancestral state reconstruction

make.treedata should delete empty columns

We should check for and prune out empty columns and rows
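A base R sketch of that pruning step, applied to the data table before it is matched to the tree. The name drop_empty is hypothetical, and "empty" is assumed here to mean all values are NA or blank strings.

```r
# Hypothetical helper: drop columns, then rows, that contain only NA or
# blank strings.
drop_empty <- function(dat) {
  is_empty <- function(v) all(is.na(v) | trimws(as.character(v)) == "")
  keep_cols <- !vapply(dat, is_empty, logical(1))
  dat <- dat[, keep_cols, drop = FALSE]
  keep_rows <- !apply(dat, 1, is_empty)
  dat[keep_rows, , drop = FALSE]
}

dat <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),
  c = c("x", "", "z"),
  row.names = c("t1", "t2", "t3"),
  stringsAsFactors = FALSE
)
cleaned <- drop_empty(dat)
```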

functions not working in Arbor

Phylogenetic signal (Arbor) - csv=HeliContRare.csv (heliconia) Tree=HeliContRare.phy (make.treedata phy (2) (heliconia)
Error in eval(expr, envir, enclos) :
could not find function "detectCharacterType"

make.treedata - It seems that if there are NAs in the csv, the output of make.treedata doesn't work in some functions. Maybe the presence of some characters prevents the tree from being pruned, so when you select a column to run the analysis on, there is missing data and you get an error. If you remove all rows with NAs from the character table and then run make.treedata, it prunes the tree and you can use the output in analyses.