jessie-hou / microeco

This project forked from chiliubio/microeco

An R package for data analysis in microbial community ecology

License: GNU General Public License v3.0

microeco

Background

In microbial community ecology, the development of high-throughput sequencing techniques has increased the amount and complexity of data, which makes community data analysis and management a challenge. Many R packages have been created for community data analysis in microbial ecology, such as phyloseq, microbiomeSeq, ampvis2, mare and microbiome. However, it is still difficult to perform data mining quickly and efficiently. To address this, we created the R package microeco.

Main Features

  • R6 class to store and analyze data; fast, flexible and modular
  • Plotting the taxonomic abundance
  • Venn diagram
  • Alpha diversity
  • Beta diversity
  • Differential abundance analysis
  • Indicator species analysis
  • Environmental data analysis
  • Network analysis
  • Null model analysis
  • Functional analysis
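To show what storing data in an R6 class means in practice, here is a minimal sketch. The class `toy_table` and its `tidy()` method are hypothetical and are not microeco's `microtable`; they only illustrate that an R6 object keeps data as fields and that its methods change the object in place. This is why calls such as `dataset$tidy_dataset()` later in this README are not reassigned.

```r
library(R6)

# Hypothetical toy class (NOT microeco's microtable): it only illustrates how an
# R6 object stores data as fields and updates them in place through its methods.
toy_table <- R6Class("toy_table",
  public = list(
    otu_table = NULL,     # taxa x samples abundance matrix
    sample_table = NULL,  # sample metadata, one row per sample
    initialize = function(otu_table, sample_table) {
      self$otu_table <- otu_table
      self$sample_table <- sample_table
    },
    # Keep only the samples found in both tables (loosely in the spirit of tidy_dataset)
    tidy = function() {
      shared <- intersect(colnames(self$otu_table), rownames(self$sample_table))
      self$otu_table <- self$otu_table[, shared, drop = FALSE]
      self$sample_table <- self$sample_table[shared, , drop = FALSE]
      invisible(self)
    }
  )
)

otu <- matrix(1:6, nrow = 2,
              dimnames = list(c("OTU1", "OTU2"), c("S1", "S2", "S3")))
samples <- data.frame(group = c("a", "b"), row.names = c("S1", "S3"))
obj <- toy_table$new(otu_table = otu, sample_table = samples)
obj$tidy()              # no reassignment: obj itself is modified
colnames(obj$otu_table) # "S1" "S3"
```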

Installing R/RStudio

If you do not already have R/RStudio installed, do the following.

  1. Install R
  2. Install RStudio
  3. On Windows, also install Rtools

Add R and Rtools to the PATH environment variable, for example your_directory\R-3.6.3\bin\x64, your_directory\Rtools\bin and your_directory\Rtools\mingw_64\bin.
In RStudio, open Tools > Global Options > Packages and select an appropriate mirror as the Primary CRAN repository.
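Whether the PATH settings took effect can be checked from inside R. The sketch below is a small hypothetical helper, `on_path()`, built on base R's `Sys.which()`, which returns an empty string for commands it cannot find. The command names are assumptions: `make` and `gcc` are typically shipped with Rtools on Windows, and `Rscript` comes with R.

```r
# Hypothetical helper: report which command-line tools R can find on the PATH.
# Sys.which() returns "" for a command that is not found.
on_path <- function(commands) {
  found <- Sys.which(commands)
  setNames(nzchar(found), commands)
}

# "make" and "gcc" usually come with Rtools on Windows; "Rscript" comes with R
on_path(c("Rscript", "make", "gcc"))
```

The same call works for the tools in the later python and julia parts, for example `on_path(c("python", "pip", "julia"))`. Restart RStudio after editing PATH, since a running session keeps the old value.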

Install microeco

Install microeco directly from GitHub.

# If devtools package is not installed, first install it
install.packages("devtools")
# then install microeco
devtools::install_github("ChiLiubio/microeco")

If the installation fails because of a poor internet connection, first download the package archive (microeco-master.zip), then install it locally.

devtools::install_local("microeco-master.zip")

Use

See the detailed package tutorial (https://chiliubio.github.io/microeco/) and the help documentation. To run all the code in the tutorial, you need to install some additional packages; see the Notes section below.

Notes

important packages

To keep microeco simple to install and start using, it depends on only a few packages, which are installed automatically and are broadly useful in data analysis. These packages include R6, ape, vegan, rlang, data.table, magrittr, dplyr, tibble, reshape2, scales, grid, ggplot2, RColorBrewer, Rcpp, RcppArmadillo and RcppEigen. As a result, you may encounter an error like the following when you use a class or function that requires an additional package:

library(microeco)
data(sample_info)
data(otu_table)
data(taxonomy_table)
data(phylo_tree)
dataset <- microtable$new(sample_table = sample_info, otu_table = otu_table, tax_table = taxonomy_table, phylo_tree = phylo_tree)
dataset$tidy_dataset()
dataset$cal_betadiv(unifrac = TRUE)
Error in loadNamespace(name) : there is no package called ‘GUniFrac’ ...

There are two solutions:

  1. Install the missing package when you encounter such an error; this is easy to do in RStudio.

  2. Install the packages in advance. We recommend this if you are interested in many of the methods in microeco. The following table lists the additional packages required by specific functions.

Package | Where | Description
GUniFrac | cal_betadiv | UniFrac distance matrix
picante | cal_alphadiv | Faith's phylogenetic alpha diversity
agricolae | cal_diff(method = "anova") | multiple comparisons
ggpubr | plot_alpha | some plotting functions
ggdendro | plot_clustering | plotting the clustering dendrogram
MASS | trans_diff$new(method = "lefse", …) | linear discriminant analysis
randomForest | trans_diff$new(method = "rf", …) | random forest analysis
ggrepel | trans_rda | reducing text overlap in the plot
pheatmap | plot_corr(pheatmap = TRUE) | correlation heatmap with clustering dendrogram
igraph | trans_network class | network-related operations
rgexf | save_network | saving the network in gexf format
VGAM | trans_corr class | generating Dirichlet random variates in SparCC
RJSONIO | trans_func | dependency of the biom package
ggalluvial | plot_bar(use_alluvium = TRUE) | alluvial plot

To install all or some of these packages, you can use the following code:

# Select the packages of interest
packages <- c("GUniFrac", "picante", "agricolae", "ggpubr", "ggdendro", "MASS", "randomForest", 
	"ggrepel", "pheatmap", "igraph", "rgexf", "VGAM", "RJSONIO", "ggalluvial")
# Find the packages that are not installed yet
missing_packages <- packages[!vapply(packages, requireNamespace, logical(1), quietly = TRUE)]
# Install only the missing packages from CRAN
if (length(missing_packages) > 0) {
	install.packages(missing_packages, dependencies = TRUE)
}
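If you only need a few methods, a lookup built from the package table above can report which optional packages one analysis step still lacks. This is a sketch under stated assumptions: the list `optional_deps`, the helper `missing_deps()` and the short keys `lefse` and `rf` (for the two `trans_diff$new` methods) are hypothetical and are not part of microeco.

```r
# Hypothetical lookup transcribed from the package table above:
# names are microeco functions/classes (or trans_diff methods), values are packages.
optional_deps <- list(
  cal_betadiv     = "GUniFrac",
  cal_alphadiv    = "picante",
  cal_diff        = "agricolae",
  plot_alpha      = "ggpubr",
  plot_clustering = "ggdendro",
  lefse           = "MASS",
  rf              = "randomForest",
  trans_rda       = "ggrepel",
  plot_corr       = "pheatmap",
  trans_network   = "igraph",
  save_network    = "rgexf",
  trans_corr      = "VGAM",
  trans_func      = "RJSONIO",
  plot_bar        = "ggalluvial"
)

# Return the packages needed by the given steps that are not installed yet
missing_deps <- function(steps) {
  pkgs <- unlist(optional_deps[steps], use.names = FALSE)
  if (is.null(pkgs)) return(character(0))  # unknown step names
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Example: install only what beta diversity and network analysis need
to_install <- missing_deps(c("cal_betadiv", "trans_network"))
if (length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
```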

WGCNA

For correlation-based networks with a large number of species, the correlation function in WGCNA is much faster than the cor function in base R. WGCNA depends on several Bioconductor packages, including GO.db, impute and preprocessCore. To install WGCNA, first install GO.db (https://bioconductor.org/packages/release/data/annotation/html/GO.db.html), impute (http://www.bioconductor.org/packages/release/bioc/html/impute.html) and preprocessCore with the following code.

# First check and install the BiocManager package
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
# install the Bioconductor dependencies GO.db, impute and preprocessCore
BiocManager::install("GO.db")
BiocManager::install("impute")
BiocManager::install("preprocessCore")
# then install WGCNA from CRAN
install.packages("WGCNA", dependencies = TRUE)
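The core idea of a correlation-based network can be sketched in a few lines of base R: OTUs are nodes, and an edge is kept when the absolute correlation between two OTUs' abundances across samples reaches a cutoff. This is a simplified illustration, not microeco's trans_network implementation; the helper `cor_edges()` and the 0.6 cutoff are hypothetical choices. The all-pairs `cor()` call is the step whose cost grows with the number of species, and it is where WGCNA's faster `cor` would be swapped in (check the WGCNA documentation for the methods it accelerates).

```r
# Hypothetical helper: edge list of a correlation network.
# otu: numeric matrix, rows = OTUs, columns = samples.
# The 0.6 cutoff is an arbitrary illustration, not a microeco default.
cor_edges <- function(otu, cutoff = 0.6, method = "spearman") {
  # correlate OTUs (columns of the transposed matrix) across samples
  r <- stats::cor(t(otu), method = method)
  # keep each OTU pair once (upper triangle, no diagonal) above the cutoff
  keep <- which(abs(r) >= cutoff & upper.tri(r), arr.ind = TRUE)
  data.frame(
    from = rownames(r)[keep[, "row"]],
    to   = rownames(r)[keep[, "col"]],
    rho  = r[keep],
    stringsAsFactors = FALSE
  )
}

otu <- rbind(A = c(1, 2, 3, 4, 5), B = c(2, 4, 6, 8, 10),
             C = c(5, 4, 3, 2, 1), D = c(3, 1, 5, 2, 4))
edges <- cor_edges(otu)
```

Here D is only weakly correlated with the other OTUs (|rho| = 0.3), so `edges` keeps the three pairs among A, B and C.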

ggtree

Plotting the cladogram from the LEfSe result requires the ggtree package from Bioconductor (https://bioconductor.org/packages/release/bioc/html/ggtree.html).

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("ggtree")

chorddiag

The R package chorddiag is used for the chord plot in the network analysis and can be installed from GitHub (https://github.com/mattflor/chorddiag), for example with devtools::install_github("mattflor/chorddiag").

Tax4Fun

Tax4Fun is an R package used for the prediction of functional potential of microbial communities.

  1. Install the Tax4Fun package and its dependencies (biom and qiimer) from the source archives bundled with microeco:
install.packages(system.file("extdata", "biom_0.3.12.tar.gz", package="microeco"), repos = NULL, type = "source")
install.packages(system.file("extdata", "qiimer_0.9.4.tar.gz", package="microeco"), repos = NULL, type = "source")
install.packages(system.file("extdata", "Tax4Fun_0.3.1.tar.gz", package="microeco"), repos = NULL, type = "source")
  2. Download the SILVA123 reference data from http://tax4fun.gobics.de/, unzip SILVA123.zip and move it to a location you can remember.

python

Predicting the functional potential related to biogeochemical cycles requires python 2.7 and several packages.

  1. Download python 2.7 from https://www.python.org/downloads/release
  2. On Windows, add python to the PATH environment variable manually, for example your_directory_path\python and your_directory_path\python\Scripts
  3. Open a terminal (cmd or PowerShell on Windows) and run:
pip install numpy
pip install argparse

If the installation is too slow or fails, use -i to select an appropriate mirror. For example, in China you can use:

pip install numpy -i https://pypi.douban.com/simple/
pip install argparse -i https://pypi.douban.com/simple/

FlashWeave

FlashWeave is a julia package used for network analysis. It predicts ecological interactions among microbes from large-scale compositional abundance data (i.e. OTU tables constructed from sequencing data) through statistical co-occurrence.

  1. Download and install julia from https://julialang.org/downloads/
  2. Add julia to the PATH environment variable, for example your_directory_path\Julia\bin
  3. Open a terminal (cmd or PowerShell on Windows), start julia and install FlashWeave following the instructions at https://github.com/meringlab/FlashWeave.jl

Gephi

Gephi can be used to open the saved network file, e.g. network.gexf in the tutorial. You can download Gephi and learn how to use it at https://gephi.org/users/download/

plotting

All plots in the package are built with ggplot2. The plotting functions provide some parameters for changing the plot. To customize an output plot further, assign it to a variable and modify it with the usual ggplot2 grammar. You can also modify the function or class directly and then reload it.
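Because the plots are ggplot2 objects, customizing one works like customizing any ggplot. In the sketch below, `p` is a stand-in built from made-up toy data rather than a real microeco plot; in practice it would be the object returned by a microeco plotting method.

```r
library(ggplot2)

# Stand-in for a plot returned by a microeco plotting method;
# the data below are made up purely for illustration.
p <- ggplot(data.frame(group = c("A", "B", "C"), abundance = c(0.42, 0.31, 0.27)),
            aes(x = group, y = abundance)) +
  geom_col()

# Modify the stored plot with ordinary ggplot2 grammar
p2 <- p +
  theme_bw() +
  labs(x = "Group", y = "Relative abundance") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Save it like any other ggplot object, e.g.
# ggsave("abundance.png", p2, width = 5, height = 4)
```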

read your files

In this part, we show how to construct a microtable object from the raw OTU table file produced by QIIME.

# use the raw data files stored inside the package
otu_file_path <- system.file("extdata", "otu_table_raw.txt", package="microeco")
# the example sample table is in csv format
sample_file_path <- system.file("extdata", "sample_info.csv", package="microeco")
# phylogenetic tree
phylo_file_path <- system.file("extdata", "rep_phylo.tre", package="microeco")
# load microeco, ape (tree functions) and qiimer; if qiimer is not installed, see the Tax4Fun part to install it
library(microeco)
library(ape)
library(qiimer)
# read and parse otu_table_raw.txt; this file does not have the first commented line, so we use commented = FALSE
otu_raw_table <- read_qiime_otu_table(otu_file_path, commented = FALSE)
# obtain the otu table data.frame
otu_table_1 <- as.data.frame(otu_raw_table[[3]])
colnames(otu_table_1) <- unlist(otu_raw_table[[1]])
# obtain the taxonomic table data.frame
taxonomy_table_1 <- as.data.frame(split_assignments(unlist(otu_raw_table[[4]])))
# read the sample metadata table as a data.frame
sample_info <- read.csv(sample_file_path, row.names = 1, stringsAsFactors = FALSE)
# read the phylogenetic tree
phylo_tree <- read.tree(phylo_file_path)
# if the tree is unrooted, multi2di resolves its basal multichotomy,
# after which ape treats the tree as rooted (the root position is arbitrary)
if(!is.rooted(phylo_tree)){
	phylo_tree <- multi2di(phylo_tree)
}
# clean the taxonomic table; this step is very important
taxonomy_table_1 <- tidy_taxonomy(taxonomy_table_1)
# create a microtable object
dataset <- microtable$new(sample_table = sample_info, otu_table = otu_table_1, tax_table = taxonomy_table_1, phylo_tree = phylo_tree)
# for other operations, see the tutorial (https://chiliubio.github.io/microeco/) and the help documentation
# the class documentation includes links to its methods; to view the microtable class, run:
?microtable
# to see the tidy_dataset method of microtable, click the link or run:
?tidy_dataset
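If qiimer cannot be installed, a classic QIIME OTU table can also be parsed with base R. The sketch below is a hypothetical helper, `read_qiime_classic()`, not qiimer's parser. It assumes a tab-separated layout with a header row starting with "#OTU ID", one OTU per row, count columns in the middle and a last column of taxonomy strings such as "k__Bacteria; p__Proteobacteria; ...". The seven rank names are also an assumption. Check these against your own file.

```r
# Hypothetical base-R reader for a classic QIIME OTU table (assumed layout:
# "#OTU ID<TAB>sample1<TAB>...<TAB>taxonomy", taxonomy in the last column).
read_qiime_classic <- function(path,
                               ranks = c("Kingdom", "Phylum", "Class", "Order",
                                         "Family", "Genus", "Species")) {
  # comment.char = "" keeps the "#OTU ID" header line from being skipped
  raw <- read.delim(path, check.names = FALSE, comment.char = "",
                    stringsAsFactors = FALSE)
  otu_ids <- raw[[1]]
  tax_col <- ncol(raw)
  # count columns sit between the OTU ID column and the taxonomy column
  counts <- raw[, 2:(tax_col - 1), drop = FALSE]
  rownames(counts) <- otu_ids
  # split each taxonomy string on ";" and trim the surrounding spaces
  parts <- lapply(strsplit(raw[[tax_col]], ";"), trimws)
  tax <- t(vapply(parts, function(p) {
    length(p) <- length(ranks)  # pad shorter assignments with NA
    p
  }, character(length(ranks))))
  tax <- as.data.frame(tax, stringsAsFactors = FALSE)
  colnames(tax) <- ranks
  rownames(tax) <- otu_ids
  list(otu_table = counts, tax_table = tax)
}
```

The two returned data frames would then go through `tidy_taxonomy` and `microtable$new` in the same way as `otu_table_1` and `taxonomy_table_1` in the code above.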

QQ

If you have problems or suggestions, feel free to join the QQ group for discussion.
QQ group: 207510995

Acknowledgement

  • R6: the main class system in this package.
  • LEfSe python script: the main LEfSe code is translated from the LEfSe python script.
  • phyloseq: the data structure of the microtable class in microeco is inspired by the phyloseq-class in the phyloseq package.
  • microbiomeSeq: the method that calculates node roles from within- and among-module connectivity is modified from the microbiomeSeq package.
  • SpiecEasi: the method that calculates SparCC is modified from the SpiecEasi package.
  • microbiomeMarker: the method that plots the LEfSe cladogram is modified from the microbiomeMarker package.
