multihierarchical_nevadotoluca's Introduction

Dispersal limitations and long-term persistence drive differentiation from haplotypes to communities within a tropical sky-island: evidence from community metabarcoding

This repository contains the bioinformatics pipeline used to process the metabarcoding data of the project Dispersal limitations and long-term persistence drive differentiation from haplotypes to communities within a tropical sky-island: evidence from community metabarcoding. We focus on a single tropical sky-island, Nevado de Toluca in the Trans-Mexican Volcanic Belt, where we sampled whole communities of arthropods from eight orders with a comparable design at spatial scales ranging from 50 m to 20 km, using 840 pitfall traps and whole-community metabarcoding. These samples were then used to build arthropod metabarcoding libraries with the COI marker.

Primers and overhang adapter:

COI expected size: 418 bp

B_F 5' CCIGAYATRGCITTYCCICG 3' (Shokralla et al., 2015)

Fol-degen-R 5' TANACYTCNGGRTGNCCRAARAAYCA 3' (Yu et al., 2012)

The overhang adapter sequence must be added to the locus‐specific primer for the region to be targeted. The Illumina overhang adapter sequences to be added to locus‐specific sequences are:

Forward overhang: 5' TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-[locus-specific sequence]

Reverse overhang: 5' GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-[locus-specific sequence]
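As a minimal sketch of the rule above (overhang first, locus-specific primer second, both written 5' to 3'), the full oligos can be assembled in Python. The sequences are copied from this README; the function and variable names are hypothetical and are not part of the repository.

```python
# Illumina overhang adapters and locus-specific primers, copied from the README text.
FORWARD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REVERSE_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"
B_F = "CCIGAYATRGCITTYCCICG"                 # forward primer (Shokralla et al., 2015)
FOL_DEGEN_R = "TANACYTCNGGRTGNCCRAARAAYCA"   # reverse primer (Yu et al., 2012)


def add_overhang(overhang: str, locus_primer: str) -> str:
    """Return the full oligo, 5' to 3': overhang followed by the locus-specific primer."""
    return overhang + locus_primer


forward_oligo = add_overhang(FORWARD_OVERHANG, B_F)
reverse_oligo = add_overhang(REVERSE_OVERHANG, FOL_DEGEN_R)

print("Forward:", forward_oligo, len(forward_oligo), "nt")
print("Reverse:", reverse_oligo, len(reverse_oligo), "nt")
```

Degenerate bases (Y, R, N) and inosine (I) are kept as written; the sketch only concatenates strings and does not expand them.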

Repository organization

Contains data and scripts for the sections "Bioinformatics processing to identify OTUs at different thresholds of genetic similarity", "Community diversity and composition" and "Similarity distance-decay and landscape connectivity" of the manuscript.

The bin directory contains the scripts used in this pipeline. All scripts are run with bin as the working directory. The data directory is not included in this repository, but the data are available on Dryad.

Libraries were sequenced on one lane of Illumina MiSeq (2x300 bp) at the Cornell Institute of Biotechnology, Cornell University, USA.

/bin/

The scripts in /bin should be run in the order in which they are numbered. R functions used by some of these scripts are not numbered and have the extension .R. HTML notebooks are provided for some of the analyses in R.

Scripts content:

  • 0.0install_software.txt Not actually a script: this file lists the packages required by the 0.1Processing_BioSoup.sh and 0.2Processing_Soup_metaMATE.sh scripts, following Arribas et al. (2020). The packages are fastqc, fastx-toolkit, trimmomatic-0.36, pairfq-0.17, usearch-9.2, and usearch-10.
  • 0.1Processing_BioSoup.sh Paired-end reads of samples were quality filtered following procedures described by Arribas et al. (2020). Briefly, processing included quality checking, primer removal, pair merging, quality filtering, denoising, and clustering each library independently. Size COI = 418 bp.
  • 0.2Processing_Soup_metaMATE.sh Paired-end reads of samples were quality filtered following procedures described by Arribas et al. (2020). Briefly, processing included quality checking, primer removal, pair merging, quality filtering, denoising, and clustering each library independently. Size COI = 416-420 bp.
  • 0.3Steps_afterProcessing.txt Not actually a script. Contains the steps run on each library: the Unix commands, the BLAST search for MEGAN, and tree visualisation in FigTree.
  • 0.4Searching_Stop_Codons.txt Not actually a script. Here, each ASV dataset was aligned in Geneious to search for stop codons.
  • 0.5to_get_ASV_tables.sh Script that generates a community table with read counts (haplotype abundance) of each retained ASV for the eight orders by matching the ASVs against the complete collection of reads.
  • 0.6to_get_UPGMAtree_GMYC_MH_lineages_All.r This script sources all the scripts of the first step, which obtain lineages at different clustering levels for each of the orders. Steps 1 and 2: UPGMA tree and GMYC analyses for each lineage. Step 3: apply NODE.MIN to obtain multiple trees for each lineage (e.g. we went to "Arachnida/SpeciesDelimitation/2to apply NODE.MIN_Get_trees_multiple_each_lineage_ArachnidaStep2" and ran each script for a different order).
  • 1.PrepareRastersNT.r Script to reclassify the altitude, slope, and vegetation type rasters to the desired values and to create a flat landscape.
  • 2.PlotRaster.R Plots each resistance surface along with the sampling points.
  • 3.to_get_conservative_threhold_on_original_all.r This script applies a conservative threshold to the original ASV table of each arthropod order. It is important to set the correct number of columns and rows, because these change between groups. In this step, the groups can be joined or processed separately, as is done here.
  • 4.to_get_Diversity_All.r This script sources all the diversity scripts, using the 8 arthropod orders at multi-hierarchical levels.
  • 5.Plot_Diversity_All.r This script plots "Community diversity and composition", "global richness by sites", "Beta diversity", "Non-Metric Multidimensional scaling (NMDS) ordinations of community similarity", and "Accumulation Curves".
  • 6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r This script sources all the scripts used to estimate "Distance decay at the multi-hierarchical levels" and isolation by resistance.
  • 7.Plot_DistanceDecay_IBD_IBR This script plots "Distance decay" and isolation by resistance for Figure 5, Figure 6 and Figure S6.
  • 8.Circuitscape_Tutorial.md Not actually a script. Contains the settings used to run Circuitscape for each of the rasters.

These scripts use the data in genetic, spatial and meta.

/genetic

Contains genetic data in and data out for each order

Genetic data in corresponds to 0.1Processing_BioSoup.sh output using the subset of each order.

Genetic data out corresponds to the output of 4.to_get_Diversity_All.r using the subset of 8 orders, and to the output of 6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r using the subset of the Diptera and Collembola orders.

/spatial

Contains spatial data as follows:

  • /Elevation contains .asc rasters for the Nevado de Toluca. NevTol_Alt.tif is the original dataset; the rest are the result of reclassifying it (output of the 1.PrepareRastersNT script) for each of the altitudinal resistance surfaces.

  • /Slope contains .tif rasters for the Nevado de Toluca slope. NevTol_Pen.tif is the original dataset; the rest are the result of reclassifying it (output of the 1.PrepareRastersNT script) for each of the slope resistance surfaces.

  • /VegetationType contains .tif rasters for the Nevado de Toluca vegetation types. nevado_f.tif is the original dataset; the rest are the result of reclassifying it (output of the 1.PrepareRastersNT script) for each of the vegetation type resistance surfaces.

  • surveyed_mountainNevadoToluca contains the sampling points (.csv) used in 2.PlotRasters.R

  • IBDistanceMatrix contains the geomatrix of the focal points (*.txt) used to run 6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r.

  • IBResistanceFlatMatrix contains the flat landscape output (*.txt) from Circuitscape used to run 6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r.

  • IBResistanceSlopeMatrix contains the slope output (*.txt) from Circuitscape used to run 6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r.

  • IBResistanceVegTypeMatrix contains the vegetation type output (*.txt) from Circuitscape used to run 6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r.

  • IBResitanceElevationMatrix contains the elevation output (*.txt) from Circuitscape used to run 6.to_get_BetaDiversity_DistanceDecay_IBD_IBR.r.

  • Circuitscape contains the focal points (*_FocalPoint.txt) used to run Circuitscape and the output (/out).

/meta

ConservationForestNevadoToluca.csv contains metadata for each of the samples sequenced on one MiSeq lane. The column names refer to:

  • id: sample number ID
  • code: ID of the sequencing run
  • label_metabarcoding: sample name of each library, e.g. CON_NTO_TLC_31TCONS1: CON = Conservation (treatment), NTO = Nevado de Toluca (mountain), TLC = Tlacotepec (locality), 31TCONS1 = ID of the sample in the sequencing run.
  • locality: Locality of the sampling
  • key_locality: Abbreviation of the sampling location
  • municipality: Municipality of the sampling
  • state: State of the sampling
  • natural_protected_area: Natural Protected Area of the sampling
  • key_natural_protected_area: Abbreviation of the Natural Protected Area
  • latitude: Latitude of the sampling
  • longitude: Longitude of the sampling
  • samplig_altitude: Altitude of each sampling point, in metres above sea level.
  • treatment: Name of the treatment
  • key_treatment: Abbreviation of the treatment
  • season: Name of season in the sampling
  • forest_type: Name of the tree species in the forest.
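The structure of label_metabarcoding described above can be unpacked with a short Python sketch. It assumes every label has exactly four underscore-separated fields, as in the single example CON_NTO_TLC_31TCONS1; that assumption and all names below are hypothetical and should be checked against the real metadata file.

```python
from typing import NamedTuple


class SampleLabel(NamedTuple):
    """Fields of a label_metabarcoding value, in the order given in the README."""
    treatment_key: str   # e.g. CON = Conservation
    mountain_key: str    # e.g. NTO = Nevado de Toluca
    locality_key: str    # e.g. TLC = Tlacotepec
    sample_run_id: str   # e.g. 31TCONS1


def parse_label(label: str) -> SampleLabel:
    """Split a label on underscores; assumes exactly four fields (an assumption)."""
    parts = label.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields, got {len(parts)}: {label!r}")
    return SampleLabel(*parts)


example = parse_label("CON_NTO_TLC_31TCONS1")
print(example)
```

Raising an error on an unexpected field count makes labels that do not follow the documented pattern visible early, instead of silently shifting the columns.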

END

multihierarchical_nevadotoluca's People

Contributors

aliciamstt, nancy-galvez

multihierarchical_nevadotoluca's Issues

Run glm analysis with altitude C raster

Discussing the results of Altitude B raster, we think it would be a good idea to re-run the analysis with an additional raster assuming no dispersal above and below a given altitude.

Basically we want something like the "grey ring" of Altitude B, but where high conductance will be allowed (black instead of grey).

For this:

  • Use a conductance matrix where:

We gave maximum levels of conductance (1) from the minimal (3010) to the maximum altitudes (3386) of our sampling. Altitudes outside of our sampling area but where Abies can still be found were assumed to allow conductance to a lesser degree (.2). Altitudes outside our sampling and Abies range were allowed no conductance (0).

4000,4457,0
3386,4000,0.2
3010,3386,1
2000,3010,0.2
0,2000,0

  • Reclassify the raster using the previous matrix
  • Run Circuitscape
  • Run glm test
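The reclassification step listed above can be illustrated with a small Python sketch (the repository itself prepares rasters in R, in 1.PrepareRastersNT.r). The rows are read as from, to, conductance. The issue does not say how boundary values are handled, so the sketch assumes intervals that are open on the left and closed on the right; the function names and the toy grid are hypothetical.

```python
from typing import List, Optional, Tuple

# (from, to, conductance) rows copied from the matrix in the issue.
RECLASS_ROWS: List[Tuple[float, float, float]] = [
    (4000, 4457, 0.0),
    (3386, 4000, 0.2),
    (3010, 3386, 1.0),
    (2000, 3010, 0.2),
    (0, 2000, 0.0),
]


def conductance_for(altitude: float) -> Optional[float]:
    """Return the conductance for one altitude, or None if no row covers it.

    Assumption: each row covers the interval (from, to], i.e. the upper
    bound is included and the lower bound is not.
    """
    for lower, upper, conductance in RECLASS_ROWS:
        if lower < altitude <= upper:
            return conductance
    return None


def reclassify(grid: List[List[float]]) -> List[List[Optional[float]]]:
    """Apply conductance_for to every cell of a 2D grid of altitudes."""
    return [[conductance_for(cell) for cell in row] for row in grid]


# Toy altitude grid, not real data.
toy_grid = [[1500, 2500, 3010], [3011, 3386, 4200]]
print(reclassify(toy_grid))
```

Under this boundary assumption a cell at exactly 3010 (the minimum sampled altitude) receives 0.2, not 1, so the interval convention is worth checking against the tool actually used for the reclassification.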

Double check that conductances are correctly assigned to colors in the raster plots

In the script 2.PlotRasters.R I assigned a grey color depending on the conductance, as follows:

0 = white
.1 ="grey90"
.2 = "grey80"
.3 = "grey70"
.5 = "grey50"
.7 = "grey30"
.9 = "grey10"
1 = "black"

We have to manually double check that I did this correctly for each of the surfaces: first, by checking against the values of Table S1, and second, by checking that the code matches each value in the arguments col= and breaks=. Notice that in breaks, every value other than 0 and 1 should be set .01 less than the corresponding conductance value. The number of breaks must equal n + 1, where n is the number of conductance values.

Example of correct code:

# altitude A conductances 0, .1, .2, .3, 1
library(raster)  # provides raster() and the plot method for raster objects
x <- raster("../spatial/Elevation/rcl_Alt_A.asc")
plot(x, col=c("white", "grey90", "grey80", "grey70", "black"),
     breaks=c(0, .09, .19, .29, .99, 1), # add breakpoints so colors correspond to conductances
     legend=FALSE, xaxt='n', yaxt='n')
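To make the manual check less error-prone, the rule above can be written as a short Python sketch that derives the colors and breaks from a list of conductances. It assumes each surface includes both 0 and 1, as in the altitude A example; the function name is hypothetical and the grey mapping is copied from this issue.

```python
from typing import Dict, List, Sequence, Tuple

# Grey level assigned to each conductance, copied from the issue text.
GREY_FOR_CONDUCTANCE: Dict[float, str] = {
    0.0: "white", 0.1: "grey90", 0.2: "grey80", 0.3: "grey70",
    0.5: "grey50", 0.7: "grey30", 0.9: "grey10", 1.0: "black",
}


def colors_and_breaks(conductances: Sequence[float]) -> Tuple[List[str], List[float]]:
    """Return (colors, breaks) for the conductance values of one surface.

    Assumption: the surface includes conductances 0 and 1. Breaks start at 0,
    then use (value - 0.01) for every value above 0, and end at 1, which
    gives n + 1 breaks for n conductance values.
    """
    values = sorted(set(conductances))
    if not values or values[0] != 0 or values[-1] != 1:
        raise ValueError("this sketch assumes the surface includes conductances 0 and 1")
    colors = [GREY_FOR_CONDUCTANCE[v] for v in values]
    breaks = [0.0] + [round(v - 0.01, 2) for v in values[1:]] + [1.0]
    return colors, breaks


colors, breaks = colors_and_breaks([0, 0.1, 0.2, 0.3, 1])
print(colors)
print(breaks)
```

For the altitude A conductances this reproduces the col= and breaks= values of the R example, so the output can be compared by eye with the arguments used for each surface in 2.PlotRasters.R.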

Hypothesis J is missing from script 1.PrepareRastersNT.r

When making the plot for Fig S5 I realized that the raster vegetation J is mentioned in Table S1 and its results appear in Tables S2 and S3, so you must have built it. However, it is not in the script 1.PrepareRastersNT.r used to prepare the rasters, and thus it is not available to plot for Figure S5.

To do:

  • Add code to create Vegetation J raster in 1.PrepareRastersNT.r

  • Include Vegetation J raster to plot in script 2.PlotRasters.R

How can I download arthropod COI data from BOLD or NCBI to the cluster, to use as BLAST reference sequences?

I need arthropod COI sequences downloaded from BOLD or NCBI to the cluster. I will use these sequences to BLAST my metabarcoding libraries against.

From the terminal I used the following command:
wget http://v4.boldsystems.org/index.php/API_Public/sequence?marker=COI-5

However, it only downloads about 3 thousand sequences.

When I go to the page https://www.boldsystems.org/index.php/Public_SearchTerms and search for "Arthropoda" in the public sequence data, I download about 4 GB of sequences.

Can anyone point me in the right direction?
