joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:

Home Page: http://joey711.github.io/phyloseq/

R 100.00%

phyloseq's People

Contributors: audy, benjjneb, dtenenba, gjuggler, hpages, jfukuyama, jnpaulson, joey711, kayla-morrell, link-ny, michberr, mikemc, nate-d-olson, nturaga, sonali-bioc, vobencha

phyloseq's Issues

Add Importers for Mothur, pangea, RDP_pipeline, etc.

These importers have been promised and are needed. It would help tremendously to have representative output from each pipeline and/or a formal description of its output format. For some reason, these things often seem difficult to find.

cca.phyloseq error - needs robustification

The following should have worked and performed (unconstrained) CA:

CA <- cca.phyloseq(x2)

Instead, it returned the following error and no plot:
"
Error in function (classes, fdef, mtable) :
unable to find an inherited method for function "cca.phyloseq", for signature "otuSamTaxTree"
"

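The error above is the message R's S4 dispatch produces when a generic has no method for the argument's class or for any class it extends. A minimal sketch of that failure and one possible fix follows; toyOtuTable, toyOtuSamTaxTree and toyCA are hypothetical stand-ins, not phyloseq classes, and whether the real fix belongs in the method table or in the class hierarchy is left open.

```r
library(methods)

# Hypothetical stand-ins; the real phyloseq classes carry more slots.
setClass("toyOtuTable", slots = c(counts = "matrix"))
setClass("toyOtuSamTaxTree", slots = c(otu = "toyOtuTable", note = "character"))

setGeneric("toyCA", function(physeq, ...) standardGeneric("toyCA"))

# A method exists only for the simple component class.
setMethod("toyCA", "toyOtuTable", function(physeq, ...) "CA on a bare OTU table")

combo <- new("toyOtuSamTaxTree",
             otu = new("toyOtuTable", counts = matrix(1:4, nrow = 2)),
             note = "sample map, taxonomy and tree would live here")

# Fails: toyOtuSamTaxTree has no method and does not extend toyOtuTable.
res <- tryCatch(toyCA(combo), error = function(e) conditionMessage(e))

# One possible fix: a method for the combined class that extracts the
# component and re-dispatches on it.
setMethod("toyCA", "toyOtuSamTaxTree",
          function(physeq, ...) toyCA(physeq@otu, ...))
```

An alternative is declaring the combined class with `contains = "toyOtuTable"`, so it inherits the component's method directly.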
Add build method for tre()

The other component data types have a build method associated with them when their main argument is a raw data class (matrix or data.frame). The same can be done for trees, provided the argument is either a character string (assumed to specify a file path) or a "phylo"-class tree (in which case, tre() should convert it to "phylo4" and return it).

Add doc in basics vignette for additional importers besides QIIME

QIIME was the original OTU-clustering pipeline in mind, but phyloseq supports, or will soon support, many more. These need to be documented in the basics_vignette included in the package. This should be among the first things users see, so that they can easily find these examples when applying phyloseq to other datasets; it lowers the bar for using phyloseq.

Add citation to UniFrac function

This is for scholarly due-diligence, and also because it is required by Bioconductor:

http://bmf.colorado.edu/unifrac/about.psp

Lozupone, Hamady & Knight, "UniFrac - An Online Tool for Comparing Microbial Community Diversity in a Phylogenetic Context.", BMC Bioinformatics 2006, 7:371

Lozupone, Hamady, Kelley & Knight, "Quantitative and qualitative (beta) diversity measures lead to different insights into factors that structure microbial communities." Appl Environ Microbiol. 2007 Jan 12

Lozupone C, Knight R. "UniFrac: a new phylogenetic method for comparing microbial communities." Appl Environ Microbiol. 2005 Dec;71(12):8228-35.

Add quick alpha-diversity metrics summary

This is almost done for command-line output. It could even be incorporated into the standard show() method for otuTable objects.

In addition, it should be added to the exploratory barplot methods, e.g. taxaplot().
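A minimal sketch of such a summary, assuming a plain samples-by-species count matrix rather than an otuTable object. Which indices a "quick" summary should report is an assumption here; observed richness, Shannon and Gini-Simpson are used as common choices, and alpha_summary is a hypothetical name.

```r
# Quick alpha-diversity summary for one sample's abundance vector.
alpha_summary <- function(counts) {
  counts <- counts[counts > 0]          # drop absent species
  p <- counts / sum(counts)             # relative abundances
  c(Observed = length(counts),          # species richness
    Shannon  = -sum(p * log(p)),        # Shannon index (natural log)
    Simpson  = 1 - sum(p^2))            # Gini-Simpson index
}

# Apply per sample (rows = samples) to get one summary row per sample.
otu <- rbind(sampleA = c(5, 5, 5, 5, 0),
             sampleB = c(17, 1, 1, 1, 0))
alpha <- t(apply(otu, 1, alpha_summary))
print(round(alpha, 3))
```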

Add tools for visualizing species-network or sample-network

The species abundance table is the expected data source for this. Some contributed code adds further analysis, and possibly simulations for comparison and testing. This code needs testing, and revision to work within the phyloseq framework.
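One simple way to derive a sample network from the abundance table is sketched below. This is an assumed approach, not the contributed code: samples are linked when their Jaccard distance on presence/absence (what stats::dist(method = "binary") computes) is at most a threshold. sample_network and max_dist are hypothetical names.

```r
# Sample network: connect samples whose Jaccard distance is at most max_dist.
sample_network <- function(otu, max_dist = 0.4) {
  pa <- (otu > 0) * 1                            # presence/absence as 0/1
  d <- as.matrix(dist(pa, method = "binary"))   # Jaccard distance between rows
  adj <- d <= max_dist                           # logical adjacency matrix
  diag(adj) <- FALSE                             # no self-loops
  adj
}

otu <- rbind(s1 = c(3, 1, 0, 0),
             s2 = c(5, 2, 0, 0),
             s3 = c(0, 0, 4, 7))
adj <- sample_network(otu)
# For a species network, use sample_network(t(otu)) instead.
```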

Clean missing factors from variables in a sampleMap

Because categorical variables in a data.frame are usually stored as factors, AND because subsetting a factor keeps all of its original levels, bugs can arise in downstream methods as they attempt to handle levels of a variable that no longer have any elements.

sampleMap instantiation could automatically look for, and remove, levels for which there are no elements in the variable.
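The behavior described above, and the base-R droplevels() call that removes empty levels, can be sketched as follows; clean_sample_map is a hypothetical helper that a sampleMap constructor could call on its data.frame.

```r
# Subsetting keeps every original factor level, even unused ones.
sam <- data.frame(diet = factor(c("A", "B", "C")),
                  row.names = c("s1", "s2", "s3"))
sub <- sam[sam$diet != "C", , drop = FALSE]
levels(sub$diet)     # still includes "C"

# Hypothetical helper: drop levels with no remaining elements in every factor.
clean_sample_map <- function(df) droplevels(df)

clean <- clean_sample_map(sub)
```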

Add check in (w)UniFrac that species match

This is a minor input check that will help clue a user that they haven't properly pruned their data prior to (w)UniFrac.

The user should be handling this by creating the combined object, since that really is a core aspect of this package. Simply creating the object will fix the problem, so it should suffice to add a test followed by a warning (or error?) message stating that the species in the tree and table components don't agree, and that merging them with

phyloseq(...)

will fix the problem.

e.g.

wUniFrac(phyloseq(OTU, tree))

Also state this boldly in the (w)UniFrac documentation. Make it obvious.
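A minimal sketch of the proposed input check, assuming the species names and tree tip labels are available as character vectors; check_species_match is a hypothetical name, and whether it should warn or stop is the open question above (it warns here).

```r
# Hypothetical input check to run at the top of (w)UniFrac.
check_species_match <- function(otu_species, tree_tips) {
  only_otu  <- setdiff(otu_species, tree_tips)
  only_tree <- setdiff(tree_tips, otu_species)
  if (length(only_otu) > 0 || length(only_tree) > 0) {
    warning(sprintf(paste("Species in the OTU table and tree do not agree",
                          "(%d only in table, %d only in tree);",
                          "merge them first with phyloseq(...)."),
                    length(only_otu), length(only_tree)),
            call. = FALSE)
    return(invisible(FALSE))
  }
  invisible(TRUE)
}
```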

merge_phyloseq() bug when merging heterogeneous objects

merging a sampleMap component and an otuTree object:

merge_phyloseq( myotutree, mysamplemap)

Returns the following error:
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic

As does phyloseq( myotutree, mysamplemap), although the latter shouldn't work with one complex object and one component.

It looks as though the otuTree is not being splatted (split into its components) appropriately, but it is unclear what the bug is.

It does appear to be a bug, because simply subsetting each component, and then recombining with phyloseq() creates the expected otuSamTree object without errors. In other words, the following does work with the same data as above:

phyloseq(otuTable(myotutree), mysamplemap, tre(myotutree))

Add importer for AmpliconNoise/Perseus pipeline

This particular pipeline was recently described in BMC Bioinformatics:

http://www.biomedcentral.com/1471-2105/12/38

The projects themselves appear to be hosted on google-code:

http://code.google.com/p/ampliconnoise/

And example data is available at:

http://userweb.eng.gla.ac.uk/christopher.quince/Data/AmpliconNoise.html

However, example output is not provided, nor a formal description of the file formats returned. Would be nice to see this available somewhere. Any comments or suggestions much appreciated.

Add doc describing where to find relevant QIIME files

A default run of the QIIME pipeline will place the 3 or 4 desired output files in different directories.

The big phyloseq vignette (not the basics_vignette included in the package) includes a figure showing the directory structure and where to find the appropriate files.

Make a reference to this in the function documentation, and update the function names in the big vignette, and small one if it happens to mention this as well. readQiime() has been renamed to import_qiime( ).

Build Warning regarding phylobase import

This is some obscure namespace issue that may take some time to resolve. It would be nice if phylobase fixed this with an update. Unfortunately, it is difficult to work around the problem in our own code while phylobase stays an official dependency, and the code is pretty entrenched with phylobase.

It might be possible to switch to a "depends" rather than "imports" dependency, which is not encouraged by Bioconductor, but may be justified if this warning is going to stall submission.

Add extension(s) for running parallelized (weighted) UniFrac

As implemented in R, both UniFrac and weighted UniFrac are very slow. However, both calculations are large sums that are extremely amenable to parallelization. Pre-release versions of phyloseq already included a version of this that worked for weighted UniFrac, and there is no reason not to include a wrapper and add the relevant parallel-R package to the "Suggests" field of the DESCRIPTION file. A require(pkg) call should suffice.
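The parallel structure can be sketched as follows: every sample pair is computed independently, so the pairs can be farmed out with parallel::mclapply. To keep the sketch self-contained, Bray-Curtis stands in for the pairwise distance; it is NOT UniFrac, which also needs the tree. parallel_dist and bray_curtis are hypothetical names.

```r
library(parallel)

# Stand-in pairwise distance (Bray-Curtis). NOT UniFrac, which also needs
# the tree; it only illustrates how the pairwise work splits up.
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Compute all sample-pair distances independently, possibly in parallel.
parallel_dist <- function(otu, dist_fun = bray_curtis, cores = 1L) {
  n <- nrow(otu)
  pairs <- combn(n, 2, simplify = FALSE)       # every unordered sample pair
  # mclapply forks on Unix-alikes; on Windows only cores = 1 is supported.
  vals <- mclapply(pairs, function(p) dist_fun(otu[p[1], ], otu[p[2], ]),
                   mc.cores = cores)
  d <- matrix(0, n, n, dimnames = list(rownames(otu), rownames(otu)))
  for (i in seq_along(pairs)) {
    p <- pairs[[i]]
    d[p[1], p[2]] <- vals[[i]]
    d[p[2], p[1]] <- vals[[i]]
  }
  as.dist(d)
}

otu <- rbind(s1 = c(1, 0, 2), s2 = c(3, 1, 0), s3 = c(1, 0, 2))
d <- parallel_dist(otu)
```

Keeping the serial path (cores = 1) as the default means the package would work unchanged when no parallel back end is available.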

Add hypergeometric test

For testing differences in the representation of a taxonomic rank between sample groups.

A way of accounting for different frequencies of certain Genera whilst testing for significance of certain Genera appearing more (or less) often in a particular group of samples.

Susan Holmes has contributed example code. This can be wrapped or extended. Needs some investigating.
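A minimal sketch of the underlying test (not Susan Holmes's contributed code): conditioning on how many taxa in total belong to a genus accounts for its overall frequency, and the upper-tail hypergeometric probability then asks whether the genus is over-represented in a group. The counts below are hypothetical. The same p-value comes out of a one-sided Fisher exact test on the 2x2 table, which is why a fisher.test wrapper is a natural fit.

```r
# Hypothetical counts: 100 taxa overall, 20 of them in the genus of interest.
# 15 taxa are characteristic of the sample group; 8 of those are in the genus.
N <- 100; K <- 20; n <- 15; k <- 8

# Upper-tail hypergeometric p-value: P(X >= k) when drawing n taxa from N.
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Same test as a one-sided Fisher exact test on the 2x2 table.
tab <- matrix(c(k, K - k, n - k, N - K - (n - k)), nrow = 2,
              dimnames = list(group = c("in group", "not in group"),
                              taxon = c("genus", "other")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```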

Add latest devel-version of devtools to requirement for devel-phyloseq

The devel version of phyloseq can be installed easily with the install_github() function of the devtools package. However, there are some bugs in the current CRAN version of the devtools package that have been fixed in the latest devel version of devtools available from Hadley. Ironically, you need the CRAN version of devtools in order to go on to install the github version.

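install.packages("devtools")   # CRAN version, needed to bootstrap
library("devtools")
install_github("r-lib/devtools"); library("devtools")   # devel devtools (repo now under r-lib)
install_github("joey711/phyloseq")   # current install_github() expects "user/repo"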

Should do it (provided you installed the other dependencies for phyloseq itself).

Error in tipglom, traced to bug in mergespecies

The following example should work, except it throws an error at the tipglom step...

library("phyloseq")
data(phylocom)
otu <- otuTable(phylocom$sample, speciesAreRows=FALSE)
tree <- as(phylocom$phylo, "phylo4")
x1 <- phyloseq(otu, tree)
print(x1)
library("phylobase")
plot(tre(x1))
x2 <- tipglom(x1, speciationMinLength=2.1)
plot(tre(x2))

Throws the following error:
"
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
"

This has been traced to the internal mergespecies() call. Strangely, the first few mergespecies iterations work, but fail when the pair being merged is c("sp15", "sp16"). Yet, at this point in the partially tipglommed-object, a further mergespecies with c("sp17", "sp18") will still work. Something odd is occurring with that pair, and fixing it will probably fix an important and dangerous bug in mergespecies, which in turn affects many other functions/methods.

t(otuTable) fails to toggle @speciesAreRows value

Once transposed, an otuTable object should have its @speciesAreRows slot toggled (it's a single, logical value). Without this in place, downstream tools will behave badly and assume that species are samples, etc.

Check t() to figure out what is behind this, and make it work again.

This had been tested thoroughly in early builds.
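The intended behavior can be sketched with a toy S4 class; toyOtu is a hypothetical stand-in for otuTable, assumed to hold a count matrix plus the logical orientation slot. Flipping the flag inside the t() method keeps the slot and the matrix consistent.

```r
library(methods)

# Hypothetical stand-in for otuTable: a count matrix plus an orientation flag.
setClass("toyOtu", slots = c(counts = "matrix", speciesAreRows = "logical"))

# Transposing must flip the orientation flag along with the matrix.
setMethod("t", "toyOtu", function(x) {
  x@counts <- t(x@counts)
  x@speciesAreRows <- !x@speciesAreRows
  x
})

otu <- new("toyOtu", counts = matrix(1:6, nrow = 2), speciesAreRows = TRUE)
otu_t <- t(otu)
```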

Fix bug in mt (multtest wrapper)

The following should have generated a reasonable call to mt.minP, but instead threw an error:

mt(x2, "Diet")
Error in mt.checkclasslabel(classlabel, test) :
your setting of test is minP
the test needs to be a single character from c('t',f','blockf','pairt','wilcoxon','t.equalvar')

Need to identify and fix. Might be something missing in wrapper.

import_qiime(): Add GreenGenes and other alternative ref seq database options

Greengenes in particular is very popular and should be supported alongside the RDP reference that QIIME uses by default.

For example, a large, jagged table of OTU IDs and their associated taxonomic assignments is available at:

http://greengenes.lbl.gov/Download/OTUs/gg_otus_6oct2010/taxonomies/otu_id_to_greengenes.txt

Here is an example line from that file:

300253 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Oscillospira;s__

The only white space appears to be separating the OTU-ID from the taxonomy. The taxonomy is semicolon-delimited, with a three-character prefix indicating the taxonomic assignment.

Currently, this file appears to be read properly by import_qiime(), but the following changes would improve the behavior:

(1) Prefixes should be used in filling the taxonomyTable to make sure assignments go in the correct column. This is useful to enforce consistency of taxonomic rank labels.

(2) The prefixes should be removed from the label after they are used. The rank is already stored as the column header.

(3) The GreenGenes taxonomy leaves a bare prefix (written "N__" here; "s__" in the example line above) when no information is included for a particular rank. This should actually be an NA in the taxonomyTable in R; otherwise missing information might be treated unequally.

IMPLEMENTATION:
Ideally, there is one additional option in import_qiime() that would be passed along to the internal OTU/tax importer. The default could remain the RDP file structure.
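Items (1) to (3) can be sketched for a single line of the Greengenes file. The rank names (Kingdom through Species) and the parse_gg_line() helper are hypothetical; the column names import_qiime() actually uses for its taxonomyTable may differ. The OTU ID is split off at the first whitespace, as described above.

```r
# Hypothetical parser for one Greengenes taxonomy line: "<OTU-ID> <taxonomy>".
gg_ranks <- c(k = "Kingdom", p = "Phylum", c = "Class", o = "Order",
              f = "Family", g = "Genus", s = "Species")

parse_gg_line <- function(line, ranks = gg_ranks) {
  id  <- sub("[[:space:]].*$", "", line)              # text before first space
  tax <- sub("^[^[:space:]]+[[:space:]]+", "", line)  # text after it
  out <- setNames(rep(NA_character_, length(ranks)), ranks)
  for (field in trimws(strsplit(tax, ";", fixed = TRUE)[[1]])) {
    prefix <- sub("__.*$", "", field)                 # (1) rank from prefix
    label  <- sub("^[a-z]__", "", field)              # (2) strip the prefix
    if (prefix %in% names(ranks) && nzchar(label)) {
      out[[ranks[[prefix]]]] <- label                 # (3) empty stays NA
    }
  }
  list(id = id, taxonomy = out)
}

x <- parse_gg_line(
  "300253 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Oscillospira;s__")
```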

Create a merge_samples() function

Function should take as argument a phyloseq object that is (or contains) a sampleMap, as well as a variate name within the sampleMap's data.frame that will be used to condense the sampleMap via a rowsum() call.

If the primary argument class also contains an otuTable (that is, "otuSam" and its children), then the otuTable should be similarly condensed. The easiest way to achieve this is probably to split the otuTable from the complex object, orient as sample-by-species matrix, perform the identical rowsum() operation as above, and then re-join the two, while noting the new orientation of the otuTable.

NOTE: This will be useful for the fisher.test wrapper for condensing abundance tables into smaller categorically-grouped tables for experimentally informative hypergeometric test.
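The approach described above can be sketched on plain R objects: orient the OTU table as samples-by-species, then condense it and the sample data with the same rowsum() grouping. merge_samples_sketch is a hypothetical name; summing only the numeric sample variables is an assumption, since rowsum() cannot combine factor or character columns.

```r
# Hypothetical sketch of merge_samples() on plain R objects.
merge_samples_sketch <- function(otu, sam, group_var, species_are_rows = FALSE) {
  if (species_are_rows) otu <- t(otu)        # orient as samples-by-species
  otu <- otu[rownames(sam), , drop = FALSE]  # align samples with sample data
  grp <- sam[[group_var]]
  merged_otu <- rowsum(otu, group = grp)     # sum counts within each group
  num <- vapply(sam, is.numeric, logical(1))
  merged_sam <- rowsum(sam[, num, drop = FALSE], group = grp)
  # The result is samples-by-species, regardless of the input orientation.
  list(otu = merged_otu, sam = merged_sam)
}

otu <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), nrow = 4,
              dimnames = list(c("s1", "s2", "s3", "s4"), c("sp1", "sp2")))
sam <- data.frame(diet = c("A", "A", "B", "B"), reads = c(11, 22, 33, 44),
                  row.names = c("s1", "s2", "s3", "s4"))
res <- merge_samples_sketch(otu, sam, "diet")
```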

Add wrapper for readQiime named import_qiime

This will pass all the same arguments to readQiime, but follows the naming scheme for the rest of the importer functions:

import_process_file( )

This is useful because typing import_ in the R IDE gives a drop-down of the available functions that import data. QIIME should be among the functions in that list.

This can probably be accomplished with an alias. You should check on this in case it is the simplest solution.

#'
#'
#' @rdname readqiime-method
#'
import_qiime <- readQiime

Does mothur include support for sampleMap type files?

Does mothur include support for sampleMap-type files, that is, variate data corresponding to each sample? If so, this should be added as well, for a more comprehensive import that would work really nicely with phyloseq. I'm not sure whether mothur supports these kinds of analyses. This needs to be checked and, if so, added to the suite of mothur importers in phyloseq.

Add method extensions to subset for H.O. objects

There are some common subsetting tasks related to subsetting portions of a complicated experiment with many samples and nested structure in time/space/replicates. It will be very useful to have a subsetting feature that simplifies subsetting by a sampleMap variable (e.g. sequencing run, date, subject, or other categories), a set of taxonomic categories, etc. Which types of subsetting are allowed should depend on the object class.
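Subsetting by a sampleMap variable can be sketched on plain R objects: evaluate a condition against the sample data, then keep the OTU table rows in sync and drop now-empty factor levels. subset_samples_sketch is a hypothetical name, and the samples-by-species orientation is an assumption.

```r
# Hypothetical sketch: subset samples by an expression on sample variables,
# keeping the OTU table and sample data in sync.
subset_samples_sketch <- function(otu, sam, cond) {
  keep <- eval(substitute(cond), sam, parent.frame())  # evaluate in sam's columns
  keep[is.na(keep)] <- FALSE
  sam2 <- droplevels(sam[keep, , drop = FALSE])        # drop now-unused levels
  otu2 <- otu[rownames(sam2), , drop = FALSE]          # samples-by-species rows
  list(otu = otu2, sam = sam2)
}

otu <- matrix(1:6, nrow = 3,
              dimnames = list(c("s1", "s2", "s3"), c("sp1", "sp2")))
sam <- data.frame(run = factor(c("R1", "R2", "R1")),
                  row.names = c("s1", "s2", "s3"))
res <- subset_samples_sketch(otu, sam, run == "R1")
```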

Add importer for RDP multclassifier

http://rdp.cme.msu.edu/classifier/classifier.jsp

This is a potential alternative branch of, or addition to, the RDP pipeline, wherein raw sequences are preprocessed by the RDP pipeline, but the classification/clustering is performed by multclassifier on the user's local machine. The output will be different, and should include taxonomic classification data similar in nature to the output from pyrotagger.

I have not yet tested multclassifier to verify that its performance / output is appropriate for phyloseq. This should not be too difficult to do, and might add an extra feature on the RDP side. For the moment, only the RDP clust file appears to be appropriate for phyloseq import, and this can only create an otuTable (OTU abundance table), as the related data types are absent from the pipeline.

Abridged vignette suitable for package build, inclusion

Need to include an abridged vignette. It must build quickly so as not to push our size or build-time limits. It must, however, go through the major features of the package. We might limit plots, make them very small, or set eval=FALSE for the difficult ones. Content can be taken from the large vignette linked on the front page.

subscript out of bound error during wUniFrac calculation

The following toy example using data from the Picante package should work quickly and without error:

data(phylocom)
tree <- phylocom$phylo
OTU <- phylocom$sample
ex3 <- phyloseq(otuTable(OTU, speciesAreRows=FALSE), tree)
wUniFrac(ex3)

Instead, the following error is received:
"
Error in eval(expr, envir, enclos) : subscript out of bounds
In addition: Warning message:
In asMethod(object) : trees with unknown order may be unsafe in ape
"

reconcile_species() not pruning tree properly (or at all?)

This was originally reported as a bug in wUniFrac(); that issue is now closed, because it is actually a pruning issue here. In the output below, the tree still has 32 tips while the OTU table has only 25 species.

# Define the example data, from the picante package

data(phylocom)
tree <- phylocom$phylo
OTU <- otuTable(phylocom$sample, speciesAreRows=FALSE)
ex3 <- phyloseq(OTU, tree)

reconcile_species(ex3)
otuTree Object

<<< tree >>>
"phylo4"-class phylogenetic tree with
32 tips, and 31 internal nodes.
Tips: sp1 sp2 sp3 ...
Rooted.
<<< tree >>>

OTU Table [6 by 25]:
Samples: clump1, clump2a ... even, random
Species: sp1, sp10 ... sp8, sp9
sp1 sp10 sp11 sp12
clump1 1 0 0 0
clump2a 1 2 2 2
clump2b 1 0 0 0
...

Add distglom() function

distglom() should agglomerate taxa based on distances, closely analogous to the way tipglom() agglomerates based on patristic distances from the phylogenetic tree.

Some OTU-clustering applications produce a distance matrix between all reads (e.g. mothur), and this can be imported and then used to further condense the number of "different" taxa according to their distances.
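A minimal sketch of one way distglom() could work, assuming a "dist" object between taxa is already available. Average-linkage hierarchical clustering cut at height h is an assumed technique, not necessarily what tipglom() does on the tree; distglom_sketch is a hypothetical name. Counts of taxa in the same cluster are summed.

```r
# Hypothetical distglom() sketch: cluster taxa on a distance matrix and sum
# the counts of taxa that fall in the same cluster.
distglom_sketch <- function(otu, d, h, method = "average") {
  # otu: samples-by-taxa counts; d: "dist" object between the same taxa
  cl <- cutree(hclust(d, method = method), h = h)     # cluster id per taxon
  t(rowsum(t(otu), group = cl[colnames(otu)]))        # merge columns by cluster
}

taxa <- c("sp1", "sp2", "sp3")
d <- as.dist(matrix(c(0,    0.01, 0.5,
                      0.01, 0,    0.5,
                      0.5,  0.5,  0), nrow = 3,
                    dimnames = list(taxa, taxa)))
otu <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
              dimnames = list(c("s1", "s2"), taxa))
merged <- distglom_sketch(otu, d, h = 0.03)
```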

Modify ape dependency from "depends" to "imports"

The ape package now has a namespace (v2.8+, as of 2011-10-26). Modify the dependency accordingly:

(1) Change from "Depends:" field to "Imports:" in DESCRIPTION file.

(2) Search for all explicit ape:: function calls and adjust the header to include the tag:
@import ape

(3) Do this in an experimental build, and see if the "phylo" class is still imported as well. There is no "exportClass" statement in the ape NAMESPACE file. It should be considered untested and appears to be a manually-written namespace.

Fix bug in calcplot

A perfectly legitimate otuSamTaxTree object, provided as the sole argument to calcplot(), returns the following error:

calcplot(x2)
Error in sampleMap(object) :
error in evaluating the argument 'object' in selecting a method for function 'sampleMap': Error in get(all.vars(X)[1]) : object 'NA' not found

This bug needs to be identified and fixed. Seems to have begun occurring after a fix in cca.phyloseq. Probably not unrelated.
