russel88 / miceco

Various functions for analysis of microbial community data

License: GNU General Public License v3.0

R 100.00%
amplicon-sequencing 16s microbiome microbiome-analysis null-model microbial-communities microbial-ecology

miceco's Introduction

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

MicEco: Various functions for analysis of microbial community data

Installation

library(devtools)
install_github("Russel88/MicEco")

Citation

DOI

Phyloseq extensions

ps_prune

Prune taxa (ASVs, OTUs) from a phyloseq object based on their abundance and/or prevalence

ps_venn

Make Venn diagram of shared taxa (ASVs, OTUs) across sample groups from a phyloseq object. Overlap can be weighted by relative abundance

ps_euler

Make Euler diagram of shared taxa (ASVs, OTUs) across sample groups from a phyloseq object. Overlap can be weighted by relative abundance

ps_pheatmap

Make a pretty heatmap directly from a phyloseq object. Built-in agglomeration, filtering, ordering, scaling, transformation, and annotation.

rcurve

Rarefaction curve (theoretical and fast) from a phyloseq object. Output ready for plotting in ggplot2

ps_tax_clean

Clean tax_table such that NAs are replaced with names of the most specific known taxonomy prefixed with the rank.

Miscellaneous functions

clr

CLR transformation of community matrix, with multiplicative zero replacement
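
As a rough illustration of what a CLR transformation with multiplicative zero replacement involves, here is a minimal base R sketch. It is not the MicEco implementation: the function name `clr_sketch`, the replacement value `delta`, and the samples-as-rows layout are all assumptions made for this example.

```r
# Hypothetical helper for illustration only; MicEco's clr() may differ.
clr_sketch <- function(counts, delta = 1e-6) {
  # counts: numeric matrix with samples as rows and taxa as columns (assumed)
  props <- counts / rowSums(counts)
  clr_rows <- apply(props, 1, function(p) {
    zeros <- p == 0
    # Multiplicative replacement: zeros become delta and the non-zero
    # proportions are shrunk so that the row still sums to 1
    p[!zeros] <- p[!zeros] * (1 - delta * sum(zeros))
    p[zeros] <- delta
    # Centered log-ratio: log values minus their mean (log geometric mean)
    log(p) - mean(log(p))
  })
  t(clr_rows)  # apply() returns taxa x samples, so transpose back
}

toy <- matrix(c(10, 0, 5,
                 3, 2, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("s1", "s2"), c("otu1", "otu2", "otu3")))
clr_toy <- clr_sketch(toy)
rowSums(clr_toy)  # CLR values sum to (numerically) zero within each sample
```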

adonis_OmegaSq

Calculate the unbiased effect size estimation (partial) omega-squared for adonis (PERMANOVA) models. Note that the calculation is similar to a standard ANOVA and is not based on a theoretical foundation specifically for PERMANOVA.
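
For orientation, the standard ANOVA formula for partial omega-squared can be written as a short base R function. This is a sketch of the textbook formula, assuming the usual inputs (effect sum of squares, effect degrees of freedom, residual mean square, and total sample size); the function name is hypothetical and the code inside adonis_OmegaSq may be organised differently.

```r
# Textbook partial omega-squared from ANOVA-style quantities (illustrative).
partial_omega_sq <- function(ss_effect, df_effect, ms_error, n) {
  (ss_effect - df_effect * ms_error) /
    (ss_effect + (n - df_effect) * ms_error)
}

# Made-up numbers, purely to show the call
omega_example <- partial_omega_sq(ss_effect = 10, df_effect = 2,
                                  ms_error = 1, n = 20)
omega_example
```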

WdS.test

Wd* - robust distance-based multivariate analysis of variance (https://doi.org/10.1186/s40168-019-0659-9). This code is taken from https://github.com/alekseyenko/WdStar/. An alternative to PERMANOVA.

UniFrac.multi

With unrooted phylogenies UniFrac sets the root randomly on the tree. The position of the root affects the results. This function runs UniFrac multiple times in parallel, with different roots, and takes the average to smooth potential bias.

proportionality

Calculate proportionality on a phyloseq object or otu-table. Proposed by Lovell et al. 2015, "Proportionality: a valid alternative to correlation for relative data" (http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004075)
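
The proportionality statistic phi from Lovell et al. compares the variance of a log-ratio with the variance of one of its parts. The sketch below shows that ratio in base R for two log-transformed vectors (in practice, CLR-transformed values). The function name `phi_sketch` is hypothetical, and this is not necessarily how the MicEco function computes it.

```r
# phi = var(log(x / y)) / var(log(x)), written on already log-transformed input
phi_sketch <- function(log_x, log_y) {
  var(log_x - log_y) / var(log_x)
}

x <- c(1, 4, 2, 8, 3)
y <- 2 * x                               # y is exactly proportional to x
phi_prop <- phi_sketch(log(x), log(y))   # close to 0: proportional
phi_other <- phi_sketch(log(x), log(rev(x)))  # larger: not proportional
```

Values near zero indicate that the two components keep a constant ratio across samples.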

16S rRNA gene copy number analyses

community_rrna

Calculate the average 16S rRNA copy number of the OTUs in a community/sample
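
One simple reading of "average copy number of a community" is an abundance-weighted mean across OTUs. The base R sketch below uses that reading, which is an assumption and may not match what community_rrna does internally. The input layout (OTUs as rows, samples as columns, plus a data frame with `ID` and `Copy` columns) follows the description quoted in the issues further down; the numbers are made up.

```r
# Toy OTU table: OTUs as rows, samples as columns
otu <- matrix(c(10,  0,
                 5, 20,
                 0, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("otu1", "otu2", "otu3"),
                              c("sampleA", "sampleB")))

# Toy copy-number table with the two columns "ID" and "Copy"
copy <- data.frame(ID = c("otu1", "otu2", "otu3"), Copy = c(1, 4, 7))

# Align copy numbers to the OTU table rows
copies <- copy$Copy[match(rownames(otu), copy$ID)]

# Abundance-weighted mean copy number per sample (assumed definition)
avg_copy <- colSums(otu * copies) / colSums(otu)
avg_copy
```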

Please cite Schostag et al. 2019 ISMEJ if you use this function.

rarefy_rrna

Combines rarefaction with 16S rRNA copy number correction. It rarefies counts with a probability of the inverse 16S rRNA copy number, such that besides rarefying the read counts the otu-table will be corrected for the varying 16S rRNA copy numbers of the OTUs.
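
The idea of drawing reads with a probability proportional to the inverse copy number can be sketched for a single sample in base R. This is an illustration under assumptions, not the package code: `rarefy_rrna_sketch` is a hypothetical name, and the counts and copy numbers are invented.

```r
# Weighted rarefaction of one sample (illustrative sketch)
rarefy_rrna_sketch <- function(counts, copy, size) {
  # Expand counts into individual reads, labelled by taxon index
  reads <- rep(seq_along(counts), times = counts)
  # Each read gets a sampling weight of 1 / copy number of its taxon
  weights <- rep(1 / copy, times = counts)
  # Draw reads without replacement, favouring low-copy-number taxa
  picked <- sample(reads, size = size, replace = FALSE, prob = weights)
  tabulate(picked, nbins = length(counts))
}

set.seed(1)  # rarefaction is random, so set a seed for reproducibility
counts <- c(otu1 = 50, otu2 = 50, otu3 = 20)
copy <- c(1, 5, 2)
rarefied <- rarefy_rrna_sketch(counts, copy, size = 40)
rarefied
```

Because reads are drawn without replacement, no taxon can end up with more reads than it started with.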

Neutral model

neutral.fit

Fit neutral model developed by Sloan et al. (2006, Environ Microbiol 8(4):732-740) and implemented by Burns et al. (2015, ISME J 10(3):655-664).
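
For readers new to the model, the predicted occurrence frequency of a taxon is commonly written as the upper tail of a beta distribution. The sketch below assumes that form, with mean relative abundance `p`, migration rate `m`, reads per sample `N`, and a detection limit of `1/N`. The function name and the parameter values are illustrative; neutral.fit additionally estimates `m` from the data.

```r
# Predicted occurrence frequency under the neutral model (assumed beta form)
sloan_predicted_freq <- function(p, m, N) {
  d <- 1 / N  # detection limit: one read out of N
  pbeta(d, N * m * p, N * m * (1 - p), lower.tail = FALSE)
}

p <- c(0.0005, 0.005, 0.05)  # mean relative abundances of three taxa
pred <- sloan_predicted_freq(p, m = 0.1, N = 1000)
pred  # more abundant taxa are predicted to occur in more samples
```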

neutral.rand

Fit neutral model developed by Sloan et al. (2006, Environ Microbiol 8(4):732-740) and implemented by Burns et al. (2015, ISME J 10(3):655-664) several times on randomly picked samples and with 16S rRNA gene copy number corrected rarefaction (rarefy_rrna).

Beta diversity null models

ses.UniFrac

Standardized effect size of UniFrac, based on null models created with permatfull/permatswap from the vegan package, or simple shuffling of phylogenetic tree.
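
A standardized effect size compares an observed value with the distribution of values obtained under a null model. A minimal base R sketch of that calculation follows; the function name `ses_sketch` and the numbers are hypothetical, and generating the null values (the expensive part) is left out.

```r
# Standardized effect size: (observed - mean of null) / sd of null
ses_sketch <- function(observed, null_values) {
  (observed - mean(null_values)) / sd(null_values)
}

null_values <- c(0.40, 0.50, 0.60)  # e.g. distances from randomized data
z <- ses_sketch(0.70, null_values)
z
```

In the ses.* functions this is done for every pair of samples, which yields the z-matrix mentioned under ses.permtest.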

ses.comdist

Standardized effect size of MPD (mean pairwise distance) separating taxa in two communities, a measure of phylogenetic beta diversity (also called betaNRI and betaMPD). This is a combination of ses.mpd (Standardized effect size of MPD in single communities) and comdist (MPD between two communities) from the picante package.

ses.comdistnt

Standardized effect size of MNTD (mean nearest taxon distance) separating taxa in two communities, a measure of phylogenetic beta diversity (also called betaNTI and betaMNTD). This is a combination of ses.mntd (Standardized effect size of MNTD in single communities) and comdistnt (MNTD between two communities) from the picante package.

ses.comdist2

As ses.comdist, but null models are created with permatfull/permatswap from the vegan package

ses.comdistnt2

As ses.comdistnt, but null models are created with permatfull/permatswap from the vegan package

comdist.par

A parallel version of the comdist function from the picante package for significant speedup on large datasets

comdistnt.par

A parallel version of the comdistnt function from the picante package for significant speedup on large datasets

ses.mpd.par

A parallel version of the ses.mpd function from the picante package for significant speedup on large datasets

ses.mntd.par

A parallel version of the ses.mntd function from the picante package for significant speedup on large datasets

ses.permtest

Permutation test of z-matrix from ses.comdist, ses.comdist2, ses.comdistnt, ses.comdistnt2 and ses.UniFrac.

Copyright notice

rarefy_rrna: Some code is from vegan licensed under GPL-2 (https://github.com/vegandevs/vegan)

ses.mpd.par, ses.mntd.par, comdist.par, comdistnt.par, ses.comdist, ses.comdist2, ses.comdistnt and ses.comdistnt2: Some code is from picante licensed under GPL-2 (https://github.com/skembel/picante)

miceco's People

Contributors

jarioksa, russel88

miceco's Issues

ps_venn some advice (percent and label)?

ps_venn was easy to use.
I was wondering whether it is possible to move the labels of the circles onto their edges?
So rather than
image

it would look like
image

Also, can the percentages be shown only in the intersections between circles and not in every circle?
That is, to only have the % here (screenshot from the first figure I shared):
image

Here is my code:

ps_venn(ps.prev, 
        group = "treatment", 
        quantities = list(type=c("percent","counts"), font = 2),
        labels = list(cex = 1))

Number of reads to sample when using rarefy_rrna

Hello,

First of all, thank you for coding this package, which I use a lot to speed up ses.mpd calculations on a multi-core machine.

I have a question regarding the number of reads to sample when using rarefy_rrna. I think that choosing the number of reads is not as straightforward as with a simpler rarefaction tool. Choosing something like min(sample_sums(physeq)) as the number of reads does not seem completely correct, because the read counts are being corrected and estimated based on the fact that some organisms have multiple copies of the rRNA operon.

Do you have any suggestion or recommendation on how to choose the number?

Thank you,
Eduardo

request to add in export list of unique OTUs and shared OTU

Hi Russel88,
I like your Venn diagram display. However, I find it difficult to extract the unique and shared taxa through the console when I set plot to "False". Hence, I would love to see an add-on feature to export the list, or to convert it into a data frame. Thank you
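
Independently of ps_venn, the unique and shared taxa can be worked out from per-group taxa lists with base R. The sketch below assumes you already have a named list with the taxa detected in each group; the list `taxa_by_group` and its contents are invented for illustration.

```r
# Hypothetical input: taxa detected in each sample group
taxa_by_group <- list(
  A = c("otu1", "otu2", "otu3"),
  B = c("otu2", "otu3", "otu4"),
  C = c("otu3", "otu5")
)

all_taxa <- sort(unique(unlist(taxa_by_group)))

# Logical matrix: taxa as rows, groups as columns
membership <- sapply(taxa_by_group, function(taxa) all_taxa %in% taxa)
rownames(membership) <- all_taxa

# Label each taxon with the exact combination of groups it occurs in
region <- apply(membership, 1, function(present) {
  paste(names(taxa_by_group)[present], collapse = "__")
})

# One list element per Venn region, e.g. "A", "A__B", "A__B__C"
regions <- split(names(region), region)

# The same information as a data frame, ready for write.csv()
region_df <- data.frame(taxon = names(region), region = unname(region))
```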

Change order of grouping categories in Venn diagram

Hey,

Thanks for the fantastic package! Is there a way to change the order of grouping categories when plotting Venn diagrams with function ps_venn?

In the example below, I would like the category PUM.ANT between PUM and ANT, and PUM.BET between PUM and BET.

Rplot01

Cheers
Camille

Inconsistent results

Hello!

I'm using two different functions (ps_venn and plot(venn(list_core))) and I'm obtaining two different results. I don't understand why this is happening, as the dataset is the same for both diagrams. I think it has something to do with the parameters.

diagrama de Venn 1
venn2

For the Diagram 1, using eulerr, microbiome and microbiomeutilities packages, the script is:

for (n in situation_list) {

  ps.sub <- subset_samples(ps.rel, Situation == n)

  core_m <- core_members(ps.sub,
                         detection = 0.001,
                         prevalence = 0.50)
  print(paste0(n, length(core_m)))
  list_core[[n]] <- core_m

}

mycols <- c(Wild, Captive)
venn <- plot(venn(list_core),
fills = mycols)

venn

For the Diagram 2, using MicEco package, the script is:
venn2 <- ps_venn(ps3_dna,
"Situation",
fraction = 0.5,
weight = FALSE,
relative = TRUE,
plot = TRUE, #
)

venn2

Could you, please, help me understand that? Do you have any instructions on how to perform this analysis properly?

Thanks in advance!

The result of neutral.fit differs from other packages

Hi Russel,

Thank you for the package.

As the title suggests, the R squared value differs from those of other packages, such as iCAMP's snm and reltools' fit_sncm, when tested on the same dataset.

fit_sncm(t(data.frame(otu_table(OR16S_rare)))) # R2=0.4450697 m=0.1035867
fit_sncm(t(data.frame(otu_table(VA16S_rare)))) # R2=0.3859323 m=0.0897298

snm(t(data.frame(otu_table(OR16S_rare)))) # R2=0.4450697 m=0.1035854
snm(t(data.frame(otu_table(VA16S_rare)))) # R2=0.3859323 m=0.08973541

neutral.fit(t(data.frame(otu_table(OR16S_rare)))) # R2=0.7925008 m=0.1035908
neutral.fit(t(data.frame(otu_table(VA16S_rare)))) # R2=0.8073008 m=0.08970018

The R squared is about 0.3 to 0.4 more than the other two. I am new to modeling. Could you give me some insight on the differences?

Thank you,
Xiaoping

In ps_venn percentage does not add up to 100%

Hi MicEco :)

I am using ps_venn function, but the percentage of each group does not add up to 100%. I used this code before taking the number of samples into account:

ps_venn(
  ps.prop,
  group = "Ecosites", type = "Pond"
)

And this is the diagram:

Venn_no_fraction

And then I added the fraction:

fractions <- c(Dugout = 0.2, Upland = 0.2, Lowland = 0.6)

ps_venn(
ps.prop,
group = "Ecosites", type = "Pond", relative = TRUE, quantities = list(type = c("percent", "counts")), fraction = fractions
)

And got this:

Venn_with_fraction

But the sum of the unique and common percentages does not add up. Any idea what went wrong?

Thank you,
Z.

Citation

Hi Russel

How do I cite the MicEco package in my paper?

Hesham

Using community_rrna with Silva reference database for taxonomy

Hi,

I'm trying to use the community_rrna function but I have trained my taxa from the Silva database because Greengenes hasn't been updated. The function says I can use an OTU-table with taxa as rows and OTU names as rownames and a dataframe with two variables: "ID" is the OTU id matched by rownames in x and "Copy" is the copy number. However, that doesn't include any sample information. Shouldn't the OTU table have OTUs as rownames and samples as column names and the frequency of each OTU in each sample?

If this doesn't work, can you post the fasta file you used from Greengenes to assign taxonomy in your phyloseq?

Thanks!

label and percent/count font size and space in between?

Hi team
Is there any way to leave space between the label and the percent/count, so that I can increase the font size of each without them overlapping?
Thank you
M
small size
image
large size
image

My code, in case you would like to have a look:
image

picante error in ses.comdistnt2

I've attached my distance matrix, strata vector, and phylogenetic distance matrix.

I use the following code to load and run them:

library(ape)
library(picante)
library(MicEco)
library(doSNOW)

tsv.data <- read.delim("../otu_data/clustered_sequences/test_abundances.txt", row.names=1)
phydf <- read.delim("../otu_data/clustered_sequences/test_distmat.txt", row.names=1)
strata_df <- read.delim("../otu_data/clustered_sequences/test_strata.txt", row.names=1)
strat_vec <- unname(unlist(strata_df[,'CollectionAgency']))
strata_ <- as.integer(as.factor(strat_vec)[drop = TRUE])
phydist = as.matrix(phydf)
mntd_scores <- ses.comdistnt2(tsv.data, phydist, method = "quasiswap", strata =strata_ , abundance.weighted = TRUE, runs = 5, cores=1)

The error I get is:

Error in match.comm.dist(comm, dis) : 
  Community data set lacks taxa (column) names, these are required to match distance matrix and community data

I do not get this error when strata = NULL

test_abundances.txt
test_distmat.txt
test_strata.txt

Error in do.call(c, singles[-x]) : second argument must be a list

Hello!

I am having this error message (Error in do.call(c, singles[-x]) : second argument must be a list) when trying to print the list using ps_venn. Here follows the code:

venn_list <- ps_venn(ps3_dna.t,
"Situation",
fraction = 0.9,
weight = FALSE,
relative = TRUE,
plot = FALSE,
)

venn_list

I am using an agglomerated (genus level) phyloseq object.

Any advice on that?

Thanks in advance!

Supported groups

Hi Russel!

I am using ps_venn, which I find super useful. My only question is this: I am trying to draw a diagram for 10 variables, but I get an error stating that the maximum number of groups is 5. Is there a workaround for this? I understand the final plot would look really messy, but I would like to see how it turns out.

Thanks!!

Area-proportional Venn diagram?

Hi team
Thanks for making things easier to create a Venn diagram directly on phyloseq object.
Area-proportional Venn diagram (also called a Venn diagram by area) is preferable because of grasping the idea of shared and unique quickly.
This is a suggestion to implement to your wonderful package, if possible?
Cheers
M

ps_venn and ps_euler -- obtain list of shared and not-shared OTUs?

Hi, I'm loving the ps_venn and ps_euler commands. They make great images and are super easy to use, but is there a way to also output the list of shared and not-shared ASVs or OTUs? I'd like to be able to cross-reference the data to some other things I'm doing. I've looked over the readme files, but I'm not sure savvy with code so I couldn't find an apparent way to easily do this.

Thanks!
Alicia Reigel

Error in rep(rrna.rev, times = x[i, ]) : invalid 'times' argument

Hi, this is my command and output:

rarefy_rrna.matrix(EQM_spec_Z1_TaxID_sorted_by_Abundance_descending, 1000, copy.database = "v13.5", seed = NULL,
                   trim = FALSE)

Remember to set seed! Now set to 1562117525.63036
Error in rep(rrna.rev, times = x[i, ]) : invalid 'times' argument

I'm not sure about the usage of 'times' here.

Installation on R (4.0.4) - issue

Hi,

I'm trying to install the MicEco package using the command lines in R (version 4.0.4):

install.packages("remotes")
remotes::install_github("Russel88/MicEco")

But I got the following error message: "Error: Failed to install 'MicEco' from GitHub:
Git does not seem to be installed on your system"

Do you have any idea what the problem could be?

Thank you for your time!

ps_venn

I'm trying to produce a Venn diagram of ASV overlap between two samples. I have my phyloseq object with my sample_data(ps). I'm trying to group by a categorical variable in my sample data (either spec or environ) and keep getting this error:

ps_venn(stdc2, group = 'Type', type = 'counts')
Error in aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...) :
no rows to aggregate

After checking my phyloseq object, it looks normal:
phyloseq-class experiment-level object
otu_table() OTU Table: [ 87079 taxa and 62 samples ]
sample_data() Sample Data: [ 62 samples by 7 sample variables ]
tax_table() Taxonomy Table: [ 87079 taxa by 7 taxonomic ranks ]

Note: my sample_names(ps) are only numbers. Is this error because my sample names are not the grouping variable?

Font size in ps_venn

Hey,

Thanks for the fantastic package, and function ps_venn.

I have been unable to change the font size of the numbers and/or labels. Could you please help?

Best wishes.

fraction setting and OTUs in venn plot

Hello,

Thank you for the great tool. I am preparing some Venn diagrams based on different fractions (0.3, 0.5 and 0.7). I expected that OTUs present at fraction 0.7 would also be present at the lower fraction settings (0.3 and 0.5), but that is not always the case.

Am I missing something?

Thank you,
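
One possible explanation can be sketched with toy numbers in base R. The sketch assumes that `fraction` is the minimum proportion of samples within a group in which a taxon must be present to count for that group; that reading is an assumption, and the prevalence values below are invented. Under it, a taxon never disappears when the threshold is lowered, but it can move to a different region of the diagram.

```r
# Hypothetical prevalence (proportion of samples) of two taxa in two groups
prevalence <- rbind(
  otu1 = c(A = 0.8, B = 0.4),
  otu2 = c(A = 0.9, B = 0.9)
)

# Assign each taxon to the region formed by the groups where it passes
venn_region <- function(prevalence, fraction) {
  present <- prevalence >= fraction
  apply(present, 1, function(x) paste(colnames(present)[x], collapse = "__"))
}

region_03 <- venn_region(prevalence, 0.3)  # otu1 passes in both A and B
region_07 <- venn_region(prevalence, 0.7)  # otu1 passes only in A
```

Here otu1 is unique to A at 0.7 but shared between A and B at 0.3, so the "unique to A" region at 0.7 contains a taxon that the same region at 0.3 does not.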

ps_venn ran fine the first time and then get an error; ps_euler runs fine.

Hello

I get this error: Error in drawVennDiagram(data = x, small = small, showSetLogicLabel = showSetLogicLabel, : gplots.drawVennDiagram: This internal function is used wrongly. Please call the function 'venn' with the same arguments, instead. What could this mean? ps_euler works fine and I am able to plot.

Thanks in advance.

Surprising gRsqr calculation in neutral.fit

Hello,

I am using the neutral.fit function and I was surprised by the gRsqr results I was getting: they did not support the visual impression I had of the fit. I tried to calculate the generalized R squared using the formula provided in Burns et al. (2015, ISME J 10(3):655-664) and I got very different results that matched what I expected.

Burns et al. calculate R2 with: R2 = 1 - SSerr/SStotal
with SSerr the sum of squares of residuals and SStotal the total sum of squares

In neutral.fit, gRsqr is calculated with: R2 = 1 - exp(-as.numeric(logLik(m.mle))/length(p))
with p the number of observations and logLik(m.mle) the log likelihood of the model predicted by mle2

Do you have an insight on the difference between the two calculations?

Thanks,
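
For reference, the sum-of-squares version quoted above (R2 = 1 - SSerr/SStotal) takes only a few lines of base R. The sketch uses invented observed and predicted occurrence frequencies and a hypothetical function name; it only illustrates that formula and says nothing about which definition is preferable.

```r
# R2 = 1 - SSerr / SStotal, as quoted from Burns et al. in this issue
r_squared <- function(observed, predicted) {
  ss_err <- sum((observed - predicted)^2)         # residual sum of squares
  ss_total <- sum((observed - mean(observed))^2)  # total sum of squares
  1 - ss_err / ss_total
}

observed  <- c(0.1, 0.4, 0.7, 0.9)  # made-up occurrence frequencies
predicted <- c(0.2, 0.4, 0.6, 0.9)  # made-up model predictions
r2 <- r_squared(observed, predicted)
r2
```

Since the likelihood-based gRsqr and this residual-based R2 measure different things, their values need not agree on the same fit.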

Installation failed

Hi,

I'm trying to install the MicEco package using R Studio (version "Ghost Orchid" Release (077589bc, 2021-09-20) for macOS) and this command:

githubinstall("MicEco")

(I have installed Git ahead)

But I got the following error message :
Downloading GitHub repo Russel88/MicEco@master
Error: Failed to install 'MicEco' from GitHub:
Command failed (1)
In addition: Warning message:
In system(full, intern = TRUE, ignore.stderr = quiet) : running command ''/usr/bin/git' ls-remote https://git.bioconductor.org/packages/phyloseq RELEASE_3_14 2>/dev/null' had status 1

Do you have any idea how I can solve the problem?
Thank you!

ps_venn

when I run my code for ps_venn

ps_venn (ps.nonc.nocyano, SampleType,
fraction = 0.1,
weight = FALSE,
type = "percent",
relative = TRUE,
plot = TRUE)

SampleType is my variable.
Here is what I get when I run sampledata:
Sample Data: [357 samples by 2 sample variables]:
SampleType Sample2
1001SH CFH 1001SH
1001WH CFH 1001WH
1001WS CS 1001WS

I got this error
Error in paste("value ~ Var1 +", group) : object 'SampleType' not found

It also happens with other functions, such as ps_euler and ps_pheatmap.

Showing fewer decimal places in the relative abundance venn diagram

Hi there,
First off, this package is great, thanks for making it. I was wondering, though, if there is a way to show fewer decimal places in the relative abundance Venn diagrams. Currently it shows 7 decimal places, which is a little overwhelming to look at.
Let me know if this is possible,
Thanks!

ps_venn and ps_euler give different results

Hi Russel, I have been playing with your package and really enjoy it!

Based on the same dataset, the Venn and Euler diagrams look different and I don't understand why. For example in the Euler plot, what happened to the 23 OTUs shared between all forest types in the Venn plot? Am I missing something here?

Thanks!

ps_venn(
  data,
  group = "ForestType",
  quantities = list(type=c("counts")),
  plot = TRUE
  )

image

ps_euler(
  data,
  group = "ForestType",
  quantities = list(type=c("counts")),
  plot = TRUE
  )

image

Plot ASV counts and relative abundance

Hello,

Thank you for this useful package!
Would there be a way to combine both the ASV/OTU count and the relative abundance they represent using weight = T in the plotted Venn diagram ?

Have a great day,

Simon
