joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:

Home Page: http://joey711.github.io/phyloseq/

R 100.00%

phyloseq's Issues

Add documentation of subset_species(x, ...) and subset_samples(x, ...)

These new functions are very useful and quick. They add a lot to the basic trimming/subsetting arsenal in phyloseq. They should be a key example in the phyloseq_basics vignette.

Add table of accessor functions to vignette.

This shouldn't be hard to do. All accessors should be in the accessor-methods.r source file.

Add importer for RDP Pipeline

This is one of several importers that should be added

Add Importers for Mothur, pangea, RDP_pipeline, etc.

These are promised, and needed. It would help tremendously to have representative output from the pipelines and/or a formal description of their output. For some reason, these things often seem difficult to find.

Add importer for pyrotagger

This is one of several importers that should be added. Be sure to name the importer function clearly.

cca.phyloseq error - needs robustification

The following should have worked and performed (unconstrained) CA:

CA <- cca.phyloseq(x2)

But instead got the following error and no plot:
"
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "cca.phyloseq", for signature "otuSamTaxTree"
"

Add importer for phylOTU

This is one of several importers that should be added. Be sure to name the importer function clearly.

The phylOTU devel project can be found at https://github.com/sharpton/PhylOTU

Add build method for tre()

The other component data types have a build-method associated if their main argument is a raw data class (matrix or data.frame). This can be done with trees, provided the argument is a character (assume it is specifying a file path) or a "phylo" class tree (in which case tre() should convert it to "phylo4" and return it).

Add doc in basics vignette for additional importers besides QIIME

QIIME was the original OTU-clustering pipeline in mind, but phyloseq supports many more, or will soon. These need to be documented in the basics_vignette included in the package. This documentation should be among the first things users see, and the examples should be easy to find, so that the bar for using phyloseq on other datasets is lower.

Add citation to UniFrac function

This is for scholarly due-diligence, and also because it is required by Bioconductor:

http://bmf.colorado.edu/unifrac/about.psp

Lozupone, Hamady & Knight, "UniFrac - An Online Tool for Comparing Microbial Community Diversity in a Phylogenetic Context.", BMC Bioinformatics 2006, 7:371

Lozupone, Hamady, Kelley & Knight, "Quantitative and qualitative (beta) diversity measures lead to different insights into factors that structure microbial communities." Appl Environ Microbiol. 2007 Jan 12

Lozupone C, Knight R. "UniFrac: a new phylogenetic method for comparing microbial communities." Appl Environ Microbiol. 2005 Dec;71(12):8228-35.

Add quick alpha-diversity metrics summary

This is almost done for command-line return. Can even be incorporated into the std show() method for otuTables.

In addition, should add this to the exploratory methods for barplots - taxaplot()
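
As a rough illustration of what such a quick summary could return, here is a minimal base-R sketch. The helper name alpha_summary() and the toy counts are assumptions made for this example; this is not the phyloseq implementation, and the choice of indices (richness, Shannon, Gini-Simpson) is only one reasonable option.

```r
# Toy abundance matrix: samples in rows, species in columns (made-up counts).
counts <- matrix(
  c(10, 0, 5, 5,
     1, 1, 1, 1,
    20, 0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), paste0("sp", 1:4))
)

# Hypothetical helper (not a phyloseq function): per-sample alpha diversity.
alpha_summary <- function(x) {
  p <- x / rowSums(x)                     # relative abundances per sample
  plogp <- ifelse(p > 0, p * log(p), 0)   # treat 0 * log(0) as 0
  data.frame(
    richness = rowSums(x > 0),            # number of observed species
    shannon  = -rowSums(plogp),           # Shannon index (natural log)
    simpson  = 1 - rowSums(p^2)           # Gini-Simpson index
  )
}

alpha <- alpha_summary(counts)
print(round(alpha, 3))
```

A table like this is small enough to print from a show() method or to attach to a barplot as annotation.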

Add tools for visualizing species-network or sample-network

The species abundance table is the expected data source for this. Some code has been contributed to include some additional analysis and possible simulations for comparison and possibly testing. This needs testing, and revision to work with phyloseq framework.

Clean missing factors from variables in a sampleMap

Categorical variables stored in a data.frame are usually stored as factors. When the elements of a factor are subset, the associated levels of the factor stay the same, so bugs can arise in downstream methods as they attempt to handle levels of a variable that no longer exist.

sampleMap instantiation could automatically look for, and remove, levels for which there are no elements in the variable.
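
The stale-level problem and one possible cleanup can be reproduced with a plain data.frame. This is a minimal sketch using base R's droplevels(); the variable names are made up, and whether sampleMap instantiation should call droplevels() or something else is left open.

```r
# Toy sample variables with one categorical column (made-up data).
sample_vars <- data.frame(
  Diet = factor(c("high_fat", "low_fat", "control", "control")),
  row.names = c("s1", "s2", "s3", "s4")
)

# Subsetting removes the "control" samples but keeps the "control" level.
kept <- sample_vars[sample_vars$Diet != "control", , drop = FALSE]
print(levels(kept$Diet))

# droplevels() removes levels that no longer have any elements.
cleaned <- droplevels(kept)
print(levels(cleaned$Diet))
```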

Add legends to tree plots, ggplot2 style.

The implementation is pretty nice, but would be much stronger with a legend.

Document data(ex1)

There are currently no details describing example dataset ex1.

tipglom documentation needs update

It is missing a @usage section

It also could use some clarity in the argument descriptions.

Add check in (w)UniFrac that species match

This is a minor input check that will help clue a user that they haven't properly pruned their data prior to (w)UniFrac.

The user should be dealing with this by creating the complex combined object, since that really is a core aspect of this package. Simply creating the object will fix the problem, so it should suffice to add a test followed by a warning (or error?) message stating that the species components of the tree/table don't agree, and merging them with

phyloseq(...)

will fix the problem.

e.g.

wUniFrac(phyloseq(OTU, tree))

Also state this boldly in the (w)UniFrac documentation. Make it obvious.
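
The input check could look roughly like the sketch below. It works on plain character vectors of species names rather than on phyloseq objects, and check_species_match() is a hypothetical helper named only for this example; how the names are extracted from the tree and table is not shown.

```r
# Hypothetical helper, not part of phyloseq: compare species names from an
# abundance table with tip labels from a tree, and warn if they differ.
check_species_match <- function(table_species, tree_tips) {
  only_table <- setdiff(table_species, tree_tips)
  only_tree  <- setdiff(tree_tips, table_species)
  if (length(only_table) == 0 && length(only_tree) == 0) {
    return(TRUE)
  }
  warning(
    "Species in the table and tree do not agree (",
    length(only_table), " only in table, ",
    length(only_tree), " only in tree). ",
    "Combine the components with phyloseq(...) before computing UniFrac.",
    call. = FALSE
  )
  FALSE
}

table_species <- c("sp1", "sp2", "sp3")
tree_tips     <- c("sp1", "sp2", "sp3", "sp4")

# Capture the warning text instead of letting it print.
msg <- tryCatch(
  {
    check_species_match(table_species, tree_tips)
    NULL
  },
  warning = function(w) conditionMessage(w)
)
print(msg)
```

Swapping warning() for stop() would turn the check into a hard error, which is the open question raised above.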

merge_phyloseq() bug when merging heterogeneous objects

merging a sampleMap component and an otuTree object:

merge_phyloseq( myotutree, mysamplemap)

Returns the following error:
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic

As does phyloseq( myotutree, mysamplemap), although the latter shouldn't work with one complex object and one component.

This looks as though otuTree is not being splatted appropriately? Unclear what the bug is.

It does appear to be a bug, because simply subsetting each component, and then recombining with phyloseq() creates the expected otuSamTree object without errors. In other words, the following does work with the same data as above:

phyloseq(otuTable(myotutree), mysamplemap, tre(myotutree))

Add importer for AmpliconNoise/Perseus pipeline

This particular pipeline is described recently in BMC Bioinformatics:

http://www.biomedcentral.com/1471-2105/12/38

The projects themselves appear to be hosted on google-code:

http://code.google.com/p/ampliconnoise/

And example data is available at:

http://userweb.eng.gla.ac.uk/christopher.quince/Data/AmpliconNoise.html

However, example output is not provided, nor a formal description of the file formats returned. Would be nice to see this available somewhere. Any comments or suggestions much appreciated.

Add doc describing where to find relevant QIIME files

A default run of the QIIME pipeline will place the 3 or 4 desired output files in different directories.

The big phyloseq vignette (not the basics_vignette included in the package), includes a figure showing the directory structure and where to find the appropriate files.

Make a reference to this in the function documentation, and update the function names in the big vignette, and small one if it happens to mention this as well. readQiime() has been renamed to import_qiime( ).

Build Warning regarding phylobase import

This is some obscure namespace issue that may take some time to resolve. Would be nice if phylobase fixed this with an update. Unfortunately, it is difficult to fix their code to solve the problem if this stays an official dependency. Code is pretty entrenched with phylobase.

It might be possible to switch to a "depends" rather than "imports" dependency, which is not encouraged by Bioconductor, but may be justified if this warning is going to stall submission.

Add extension(s) for running parallelized (weighted) UniFrac

As implemented in R, both UniFrac and weighted UniFrac are very slow. However, both calculations are large sums that are extremely amenable to parallelization. Pre-release versions of phyloseq already included a version of this that worked for weighted-UniFrac, and there is no reason not to include a wrapper, and add the relevant parallel-R package to the "suggests" field of the Description file. A <require(pkg)> line should suffice.
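
To show the general shape of such a wrapper, here is a sketch that spreads pairwise sample comparisons over cores with mclapply() from the base "parallel" package. It is only an illustration of the pattern: the distance used is Bray-Curtis as a stand-in, not UniFrac, the choice of "parallel" is an assumption (the issue does not name a package), and the counts are made up.

```r
library(parallel)

# Toy abundance matrix: samples in rows, species in columns (made-up counts).
counts <- matrix(
  c(10, 0, 5, 5,
     1, 1, 1, 1,
    20, 0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), paste0("sp", 1:4))
)

# Stand-in distance (Bray-Curtis); a real wrapper would compute (w)UniFrac here.
pair_distance <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Every unordered pair of samples is an independent unit of work.
sample_pairs <- combn(rownames(counts), 2, simplify = FALSE)

# mclapply() relies on forking, which is unavailable on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

pair_values <- mclapply(
  sample_pairs,
  function(p) pair_distance(counts[p[1], ], counts[p[2], ]),
  mc.cores = n_cores
)

# Reassemble the per-pair results into a symmetric distance matrix.
dist_mat <- matrix(
  0, nrow(counts), nrow(counts),
  dimnames = list(rownames(counts), rownames(counts))
)
for (i in seq_along(sample_pairs)) {
  p <- sample_pairs[[i]]
  dist_mat[p[1], p[2]] <- pair_values[[i]]
  dist_mat[p[2], p[1]] <- pair_values[[i]]
}
sample_dist <- as.dist(dist_mat)
print(sample_dist)
```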

Add hypergeometric test

For testing the effect between sample groups of a taxonomic rank.

A way of accounting for different frequencies of certain Genera whilst testing for significance of certain Genera appearing more (or less) often in a particular group of samples.

Susan Holmes has contributed example code. This can be wrapped or extended. Needs some investigating.
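
For orientation, the kind of test described here can be written in base R with phyper() or, equivalently, a one-sided fisher.test(). This sketch is not the contributed code; the counts are invented, and it assumes presence/absence of a single genus across two sample groups.

```r
# Made-up presence/absence counts for one genus across 20 samples:
# 8 samples in group A, 12 in group B; genus detected in 9 samples, 7 in A.
tab <- matrix(
  c(7, 1,    # group A: detected, not detected
    2, 10),  # group B: detected, not detected
  nrow = 2,
  dimnames = list(genus = c("detected", "absent"), group = c("A", "B"))
)

# Probability of seeing 7 or more detections in group A by chance:
# draw 8 samples (group A) from 9 "detected" and 11 "absent" samples.
p_hyper <- phyper(7 - 1, m = 9, n = 11, k = 8, lower.tail = FALSE)

# The one-sided Fisher exact test gives the same tail probability.
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```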

Add latest devel-version of devtools to requirement for devel-phyloseq

The devel version of phyloseq can be installed easily with the install_github() function of the devtools package. However, there are some bugs in the current CRAN version of the devtools package that have been fixed in the latest devel version of devtools available from Hadley. Ironically, you need the CRAN version of devtools in order to go on to install the github version.

install.packages("devtools")
library(devtools)
install_github("devtools"); library("devtools")
install_github("phyloseq", "joey711")

Should do it (provided you installed the other dependencies for phyloseq itself).

The doc for as() coercion methods only describe otuTable/matrix

There is no documentation for other coercion methods. The method as() is general, and probably shouldn't have its doc overwritten by the extensions of as in phyloseq. It needs to be added instead.

Error in tipglom, traced to bug in mergespecies

The following example should work, except it throws an error at the tipglom step...

library("phyloseq")
data(phylocom)
otu <- otuTable(phylocom$sample, speciesAreRows=FALSE)
tree <- as(phylocom$phylo, "phylo4")
x1 <- phyloseq(otu, tree)
print(x1)
library("phylobase")
plot(tre(x1))
x2 <- tipglom(x1, speciationMinLength=2.1)
plot(tre(x2))

Throws the following error:
"
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
"

This has been traced to the internal mergespecies() call. Strangely, the first few mergespecies iterations work, but fail when the pair being merged is c("sp15", "sp16"). Yet, at this point in the partially tipglommed-object, a further mergespecies with c("sp17", "sp18") will still work. Something odd is occurring with that pair, and fixing it will probably fix an important and dangerous bug in mergespecies, which in turn affects many other functions/methods.

coercion of sampleMap class to data.frame returns sampleMap

It is a coercion method, so it should return the specified class, not the original class:

data(ex1)
class( as(sampleMap(ex1), "data.frame") )

[1] "sampleMap"
attr(,"package")
[1] "phyloseq"

Add support for easily performing DPCoA on a phyloseq object

Example code for accomplishing this has already been contributed. It needs to be tested and revised for working within the phyloseq framework.

Improve documentation of geneFilterSample

Right now the example is not even evaluated in the basics vignette. A more informative example needs to be provided, and its behavior tested.

t(otuTable) fails to toggle @speciesAreRows value

Once transposed, an otuTable object should have its @speciesAreRows slot toggled (it's a single, logical value). Without this in place, downstream tools will behave badly and assume that species are samples, etc.

Check out t() and figure out what is behind this, make it work again.

This had been tested thoroughly in early builds.
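
The intended behavior can be sketched with a toy S4 class. The class name toyOtuTable below is an assumption for this example only; it imitates an otuTable as a matrix with a logical speciesAreRows slot and is not the phyloseq class definition.

```r
library(methods)

# Toy stand-in for otuTable: a matrix plus an orientation flag.
setClass(
  "toyOtuTable",
  contains = "matrix",
  representation(speciesAreRows = "logical")
)

# Transpose the data part and toggle the orientation flag together,
# so downstream code never sees a mismatched flag.
setMethod("t", "toyOtuTable", function(x) {
  new(
    "toyOtuTable",
    t(as(x, "matrix")),
    speciesAreRows = !x@speciesAreRows
  )
})

m <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("sp1", "sp2"), c("a", "b", "c"))
)
otu <- new("toyOtuTable", m, speciesAreRows = TRUE)

flipped <- t(otu)
print(dim(flipped))
print(flipped@speciesAreRows)
```

Transposing twice should return the original orientation and flag, which makes a convenient regression test.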

Fix bug in mt (multtest wrapper)

The following should have generated a reasonable call to mt.minP, but instead threw an error:

mt(x2, "Diet")
Error in mt.checkclasslabel(classlabel, test) :
your setting of test is minP
the test needs to be a single character from c('t',f','blockf','pairt','wilcoxon','t.equalvar')

Need to identify and fix. Might be something missing in wrapper.

Add importer for PANGEA pipeline

This is one of several importers that should be added. Be sure to name the importer function clearly.

Change "phylo" to "phylo4" in inheritance diagram in basics vignette

The "phylo" label is leftover from when we used the "phylo" representation of a phylogenetic tree from "ape" package. Now it is always a "phylo4" tree from the "phylobase" package.

import_qiime(): Add GreenGenes and other alternative ref seq database options

import_qiime(): Add GreenGenes and other alternative ref seq database options. Greengenes in particular is very popular and should be supported alongside the RDP reference that QIIME uses by default.

For an example, a large jagged table of OTU-ID's and their associated taxonomic assignment is available at:

http://greengenes.lbl.gov/Download/OTUs/gg_otus_6oct2010/taxonomies/otu_id_to_greengenes.txt

Here is an example line from that file:

300253 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Oscillospira;s__

The only white space appears to be separating the OTU-ID from the taxonomy. The taxonomy is semicolon-delimited, with a three-character prefix indicating the taxonomic assignment.

Currently, this file appears to be properly read by import_qiime(), but the following things would improve the behavior:

(1) Prefixes should be used in filling the taxonomyTable to make sure assignments go in the correct column. This is useful to enforce consistency of taxonomic rank labels.

(2) The prefixes should be removed from the label after they are used. The rank is already stored as the column header.

(3) The GreenGenes taxonomy leaves a "N__" when no information is included for a particular rank. This should actually be an NA in the taxonomyTable in R. Otherwise there might be some unequal treatment of missing information.

IMPLEMENTATION:
Ideally, there is one additional option in import_qiime() that would be passed along to the internal OTU/tax importer. The default could remain the RDP file structure.
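
Points (1) to (3) can be sketched for a single line of the file as follows. The helper parse_greengenes() and the prefix-to-rank mapping (k = Kingdom through s = Species) are assumptions made for illustration; this is not the import_qiime() internals, and it treats an entry with an empty label as missing.

```r
# Assumed mapping from Greengenes one-letter prefixes to rank names.
rank_names <- c(
  k = "Kingdom", p = "Phylum", c = "Class", o = "Order",
  f = "Family", g = "Genus", s = "Species"
)

# Hypothetical helper: parse one "OTU-ID <whitespace> taxonomy" line.
parse_greengenes <- function(line) {
  fields  <- strsplit(line, "[[:space:]]+")[[1]]
  entries <- strsplit(fields[2], ";", fixed = TRUE)[[1]]

  prefix <- substr(entries, 1, 1)          # (1) prefix decides the column
  label  <- sub("^[a-z]__", "", entries)   # (2) strip the prefix from the label
  label[label == ""] <- NA_character_      # (3) empty assignment becomes NA

  # Start with NA for every rank, then fill the ranks that are present.
  out <- setNames(rep(NA_character_, length(rank_names)), rank_names)
  known <- prefix %in% names(rank_names)
  out[rank_names[prefix[known]]] <- label[known]

  list(otu_id = fields[1], taxonomy = out)
}

line <- paste0(
  "300253\t",
  "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;",
  "f__Ruminococcaceae;g__Oscillospira;s__"
)
parsed <- parse_greengenes(line)
print(parsed$taxonomy)
```

Because every parsed line yields the same named vector of ranks, the results can be stacked with rbind() into a rectangular table even when the input is jagged.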

Create a merge_samples() function

Function should take as argument a phyloseq object that is (or contains) a sampleMap, as well as a variate name within the sampleMap's data.frame that will be used to condense the sampleMap via a rowsum() call.

If the primary argument class also contains an otuTable (that is, "otuSam" and its children), then the otuTable should be similarly condensed. The easiest way to achieve this is probably to split the otuTable from the complex object, orient as sample-by-species matrix, perform the identical rowsum() operation as above, and then re-join the two, while noting the new orientation of the otuTable.

NOTE: This will be useful for the fisher.test wrapper for condensing abundance tables into smaller categorically-grouped tables for experimentally informative hypergeometric test.
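
The core rowsum() step for the abundance table might look like this sketch. It uses a plain matrix and data.frame with made-up values in place of otuTable and sampleMap objects; the sampleMap side of the merge (which only makes sense for numeric variables) is not shown.

```r
# Toy abundance matrix oriented sample-by-species (made-up counts).
counts <- matrix(
  1:12, nrow = 4, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3", "s4"), c("sp1", "sp2", "sp3"))
)

# Toy sample variables; "Diet" is the variate used for grouping.
sample_vars <- data.frame(
  Diet = c("A", "A", "B", "B"),
  row.names = rownames(counts)
)

# rowsum() adds up the rows that share a group label, one row per group.
merged <- rowsum(counts, group = sample_vars$Diet)
print(merged)
```

The result is still sample-by-species, with one row per level of the grouping variable, so the new orientation has to be recorded when the table is re-joined to the complex object.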

Add @usage tag and expression to most functions

The Usage field is extremely useful. For S4 methods this will be only in the generic header by default, and then added to specific methods if their usage departs from the generic.

Add wrapper for readQiime named import_qiime

This will pass all the same arguments to readQiime, but follows the naming scheme for the rest of the importer functions:

import_process_file( )

This is useful because >import_ in the R IDE will give a drop down of the available functions that "import" stuff. QIIME should be among the functions in that list.

This can probably be accomplished with an alias. You should check on this in case it is the simplest solution.

#'
#' @rdname readqiime-method
#'
import_qiime <- readQiime

Does mothur include support for sampleMap type files?

Does mothur include support for sampleMap type files? That is, variate data corresponding to each sample? If so, this should be added as well, for a more comprehensive import that would work really nicely with phyloseq. I'm not sure whether mothur supports this kind of analysis. This needs to be checked and, if so, added to the suite of mothur importers in phyloseq.

Add method extensions to subset for H.O. objects

There are some common subsetting tasks related to subsetting portions of a complicated experiment with many samples and nested structure in time/space/replicates. It will be very useful to have a subsetting feature that simplifies subsetting by a sampleMap variable (e.g. sequencing run, date, subject, or other categories), a set of taxonomic categories, etc. Which types of subsetting are allowed should depend on the object class.
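
The simplest case, subsetting by one sample variable while keeping the abundance table in sync, can be sketched in base R. The objects below are plain stand-ins with made-up values, not phyloseq classes, and the variable name Run is an assumption.

```r
# Toy sample variables and a species-by-sample abundance matrix (made-up).
sample_vars <- data.frame(
  Run     = factor(c("run1", "run1", "run2")),
  Subject = c("x", "y", "x"),
  row.names = c("s1", "s2", "s3")
)
counts <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("sp1", "sp2"), c("s1", "s2", "s3"))
)

# Pick samples by a sample variable, then apply the same selection to both
# components so their sample names keep matching.
keep <- rownames(sample_vars)[sample_vars$Run == "run1"]
sub_vars   <- droplevels(sample_vars[keep, , drop = FALSE])
sub_counts <- counts[, keep, drop = FALSE]

print(sub_vars)
print(sub_counts)
```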

Add importer for RDP multclassifier

http://rdp.cme.msu.edu/classifier/classifier.jsp

This is a potential alternative-branch / addition of the RDP pipeline, wherein raw sequences are preprocessed by RDP pipeline, but the classification/clustering is performed by the multclassifier on a user's local machine. The output will be different, and should include taxonomic classification data similar in nature to the output from pyrotagger.

I have not yet tested multclassifier to verify that its performance / output is appropriate for phyloseq. This should not be too difficult to do, and might add an extra feature on the RDP side. For the moment, only the RDP clust file appears to be appropriate for phyloseq import, and this can only create an otuTable (OTU abundance table), as the related data types are absent from the pipeline.

Additional example data sets.

Will take suggestions. So far:

The "human enterotype" dataset
Soils reproducibility dataset

Abridged vignette suitable for package build, inclusion

Need to include an abridged vignette. It must build quickly so as not to push our size or build-time limits. It must, however, go through the major features of the package. Might limit plots, make them very small, or set eval=FALSE for the difficult ones. Can take from large vignette available by link on front page.

subscript out of bound error during wUniFrac calculation

The following toy example using data from the Picante package should work quickly and without error:

library("phyloseq")
library("picante")  # provides the phylocom example data
data(phylocom)
tree <- phylocom$phylo
OTU <- phylocom$sample
ex3 <- phyloseq(otuTable(OTU, speciesAreRows=FALSE), tree)
wUniFrac(ex3)

Instead, the following error is received:
"
Error in eval(expr, envir, enclos) : subscript out of bounds
In addition: Warning message:
In asMethod(object) : trees with unknown order may be unsafe in ape
"

Add importer for Mothur otu output

This is one of several importers that should be added.

See:

http://www.mothur.org/wiki/OTU-based_approaches

http://www.mothur.org/wiki/Cluster

http://www.mothur.org/wiki/Group_file

http://www.mothur.org/wiki/List_file

reconcile_species() not pruning tree properly (or at all?)

This was originally detected as a bug in wUniFrac(), which is now closed, because it's actually a pruning issue here.

# define the example data, from picante package
library("picante")
data(phylocom)
tree <- phylocom$phylo
OTU <- otuTable(phylocom$sample, speciesAreRows=FALSE)
ex3 <- phyloseq(OTU, tree)

reconcile_species(ex3)
otuTree Object

<<< tree >>>
"phylo4"-class phylogenetic tree with
32 tips, and 31 internal nodes.
Tips: sp1 sp2 sp3 ...
Rooted.
<<< tree >>>

OTU Table [6 by 25]:
Samples: clump1, clump2a ... even, random
Species: sp1, sp10 ... sp8, sp9
sp1 sp10 sp11 sp12
clump1 1 0 0 0
clump2a 1 2 2 2
clump2b 1 0 0 0
...

Add abundance table simulation and/or resampling tools

It is a common question that arises in testing. The problem needs to be posed clearly. What features of the abundance table need to be considered?

Add distglom() function

distglom() should agglomerate taxa based on distances, closely analogous to the way tipglom() agglomerates based on patristic distances from the phylogenetic tree.

Some OTU-clustering applications produce a distance matrix between all reads (e.g. mothur), and this can be imported and then used to further condense the number of "different" taxa according to their distances.
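
One way the agglomeration could work is sketched below with base R's hclust() and cutree(), followed by rowsum() to merge abundances. The complete-linkage choice, the 0.1 cutoff, and the toy distances and counts are all assumptions for illustration; the issue does not specify a clustering method.

```r
taxa <- c("sp1", "sp2", "sp3", "sp4")

# Made-up pairwise distances: sp1/sp2 and sp3/sp4 are close, the rest are far.
dm <- matrix(0.5, 4, 4, dimnames = list(taxa, taxa))
diag(dm) <- 0
dm["sp1", "sp2"] <- dm["sp2", "sp1"] <- 0.02
dm["sp3", "sp4"] <- dm["sp4", "sp3"] <- 0.05

# Cluster taxa and cut the tree at a distance threshold.
clusters <- cutree(hclust(as.dist(dm), method = "complete"), h = 0.1)

# Toy species-by-sample abundance matrix (made-up counts).
counts <- matrix(
  c(5, 1,
    3, 0,
    0, 7,
    2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(taxa, c("a", "b"))
)

# Sum the abundances of taxa that fall in the same cluster.
merged <- rowsum(counts, group = clusters)
print(clusters)
print(merged)
```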

Modify ape dependency from "depends" to "imports"

The ape package now has a namespace (v2.8+, as of 2011-10-26). Modify the dependency accordingly:

(1) Change from "Depends:" field to "Imports:" in DESCRIPTION file.

(2) Search all explicit function calls ape:: and adjust header to have tag:
@import ape

(3) Do this in an experimental build, and see if the "phylo" class is still imported as well. There is no "exportClass" statement in the ape NAMESPACE file. It should be considered untested and appears to be a manually-written namespace.

Add importer for RDP Pipeline

This is one of several importers that should be added

Fix bug in calcplot

A perfectly legitimate otuSamTaxTree object, provided as the sole argument to calcplot, returns the following error:

calcplot(x2)
Error in sampleMap(object) :
error in evaluating the argument 'object' in selecting a method for function 'sampleMap': Error in get(all.vars(X)[1]) : object 'NA' not found

This bug needs to be identified and fixed. Seems to have begun occurring after a fix in cca.phyloseq. Probably not unrelated.
