jungbluth / edna2obis

This project forked from aomlomics/edna2obis


Code to convert eDNA metabarcoding data to Darwin Core for OBIS

License: MIT License

Languages: Shell 0.19%, Python 1.72%, Jupyter Notebook 98.09%


edna2obis workflow

Introduction

Rationale:

DNA-derived data are increasingly used to document taxon occurrences. To ensure these data are useful to the broadest possible community, GBIF published a guide entitled "Publishing DNA-derived data through biodiversity data platforms." The guide is supported by the DNA derived data extension for Darwin Core, which incorporates MIxS terms into the Darwin Core standard.

This use case draws on both the guide and the extension to develop a workflow for incorporating a DNA derived data extension file into a Darwin Core archive.
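In a Darwin Core archive with an occurrence core, each row of the DNA derived data extension file refers back to a core record through a shared identifier. The following is a minimal pandas sketch of that relationship, assuming occurrenceID is the linking key. The records, sequences, and the check_extension_links helper are hypothetical illustrations, not part of edna2obis.

```python
import pandas as pd

# Hypothetical occurrence core: one row per taxon detected in a sample.
occurrence = pd.DataFrame({
    "occurrenceID": ["S1_asv1", "S1_asv2", "S2_asv1"],
    "eventID": ["S1", "S1", "S2"],
    "scientificName": ["Prochlorococcus", "Calanus", "Prochlorococcus"],
    "organismQuantity": [120, 15, 80],
    "organismQuantityType": "DNA sequence reads",  # scalar broadcast to every row
})

# Hypothetical DNA derived data extension rows, keyed by the same occurrenceID
# so that each extension row points back to exactly one core record.
dna_extension = pd.DataFrame({
    "occurrenceID": ["S1_asv1", "S1_asv2", "S2_asv1"],
    "DNA_sequence": ["TACGGAGGGT", "GCTAGTCGCA", "TACGGAGGGT"],
    "target_gene": ["16S rRNA", "18S rRNA", "16S rRNA"],
})

def check_extension_links(core, extension, key="occurrenceID"):
    """Hypothetical helper: raise if core keys repeat or an extension row has no core record."""
    if core[key].duplicated().any():
        raise ValueError(f"{key} must be unique in the core file")
    orphans = set(extension[key]) - set(core[key])
    if orphans:
        raise ValueError(f"extension rows without a core record: {sorted(orphans)}")

check_extension_links(occurrence, dna_extension)
```

A check like this can catch mismatched identifiers before the two tables are written out as occurrence.csv and dna_extension.csv.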

Project abstract:

Seawater was collected on board the NOAA ship Ronald H. Brown as part of the fourth Gulf of Mexico Ecosystems and Carbon Cycle (GOMECC-4) cruise from September 13 to October 21, 2021. Sampling for GOMECC-4 occurred along 16 coastal-offshore transects across the entire Gulf of Mexico and an additional line at 27°N latitude in the Atlantic Ocean. We also collected eDNA samples near Padre Island National Seashore (U.S. National Park Service), a barrier island located off the coast of south Texas. Vertical CTD sampling was employed at each site to measure discrete chemical, physical, and biological properties. Water sampling for DNA filtration was conducted at 54 sites and three depths per site (surface, deep chlorophyll maximum, and near bottom) to capture horizontal and vertical gradients of bacterial, protistan, and metazoan diversity across the Gulf. The resulting ASVs, their assigned taxonomy, and the metadata associated with their collection are the input data for the OBIS conversion scripts presented here.
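The ASV tables pair each sequence with its taxonomy and per-sample read counts, while each OBIS occurrence describes one taxon detected in one sample. The following is a minimal sketch of that reshaping, assuming a wide table with one read-count column per sample. The column names (featureid, sequence, taxonomy, GOMECC4_S1, GOMECC4_S2), the asv_to_occurrences function, and the occurrenceID scheme are hypothetical and may not match the real gomecc-16S-asv.tsv layout. Read counts are recorded with organismQuantityType and sampleSizeUnit set to "DNA sequence reads", a convention described in the GBIF guide.

```python
import pandas as pd

# Hypothetical wide ASV table: one row per ASV, one read-count column per sample.
asv = pd.DataFrame({
    "featureid": ["asv1", "asv2", "asv3"],
    "sequence": ["TACGGAGGGT", "GCTAGTCGCA", "AGCTTAGCCT"],
    "taxonomy": ["Bacteria;Cyanobacteria", "Eukaryota;Arthropoda", "Bacteria;Proteobacteria"],
    "GOMECC4_S1": [120, 15, 0],
    "GOMECC4_S2": [80, 0, 40],
})

def asv_to_occurrences(table, id_cols=("featureid", "sequence", "taxonomy")):
    """Melt a wide ASV table into one row per (sample, ASV) with nonzero reads."""
    id_cols = list(id_cols)
    long = table.melt(id_vars=id_cols, var_name="eventID", value_name="organismQuantity")
    # An ASV with zero reads in a sample was not detected there, so it is not an occurrence.
    long = long[long["organismQuantity"] > 0].copy()
    # Total reads per sample (sum of each sample column) becomes sampleSizeValue.
    totals = table.drop(columns=id_cols).sum()
    long["sampleSizeValue"] = long["eventID"].map(totals)
    long["organismQuantityType"] = "DNA sequence reads"
    long["sampleSizeUnit"] = "DNA sequence reads"
    # Hypothetical identifier scheme: sample name + ASV id.
    long["occurrenceID"] = long["eventID"] + "_" + long["featureid"]
    return long.reset_index(drop=True)

occurrences = asv_to_occurrences(asv)
```

With real data, the table would be read with pd.read_csv(path, sep="\t") and id_cols adjusted to the file's actual non-sample columns.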

Published data

NOAA Omics MIMARKS-based metadata template

This code was developed to convert data recorded in a custom Google Sheet metadata template created by NOAA Omics at AOML. To use the sheet for your own data, copy the Google Sheet to your Google Drive. Note that we have not tested the data validation functionality when the Google Sheet is downloaded and used as an Excel file.

AOML_MIMARKS.survey.water.6.0 v1.0.2

Requirements

  • Python 3
  • Python 3 standard library:
    • os
  • External packages:
    • Bio.Entrez (from Biopython)
    • numpy
    • pandas
    • openpyxl
    • pyworms
    • multiprocess
  • Custom modules:
    • WoRMS_matching (src/WoRMS_matching.py)
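Before running the notebook, it can help to confirm that the external packages are importable. The following is a minimal sketch using importlib.util.find_spec from the standard library. The module names are the top-level import names assumed for each listed package (Biopython imports as Bio), and missing_modules is a hypothetical helper. WoRMS_matching is not checked because it is imported from src/ rather than installed.

```python
import importlib.util

# Top-level import names for the external packages listed above.
REQUIRED_MODULES = ["Bio", "numpy", "pandas", "openpyxl", "pyworms", "multiprocess"]

def missing_modules(names):
    """Return the names from `names` that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Any missing packages can typically be installed from PyPI, for example with pip install biopython numpy pandas openpyxl pyworms multiprocess.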

Repo structure

.
+-- README.md                   :Description of this repository
+-- LICENSE                     :Repository license
+-- .gitignore                  :Files and directories to be ignored by git
|
+-- raw
|   +-- gomecc-16S-asv.tsv                 :Source data containing 16S ASV sequences, taxon matches, and number of reads
|   +-- gomecc-18S-asv.tsv                 :Source data containing 18S ASV sequences, taxon matches, and number of reads
|   +-- gomecc4_AOML_MIMARKS.survey.water.6.0.xlsx     :Source data containing metadata about samples, DNA preparation, and analysis
|   +-- gomecc_AOML2DwC standards.xlsx    :Data dictionary file mapping metadata terms to Darwin Core
|
+-- src
|   +-- edna2obis_conversion_code.ipynb  :Darwin Core mapping Jupyter Notebook
|   +-- WoRMS_matching.py                :Functions for querying the World Register of Marine Species
|   +-- edna2obis_conversion_code.md     :Markdown version of the conversion code, for viewing
|
+-- processed
|   +-- occurrence.csv          :Occurrence file, generated by edna2obis_conversion_code
|   +-- dna_extension.csv       :DNA Derived Data Extension file, generated by edna2obis_conversion_code
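The data dictionary (gomecc_AOML2DwC standards.xlsx) maps metadata terms to Darwin Core. The following is a minimal pandas sketch of applying such a mapping, assuming the dictionary has been reduced to two columns of source and target terms. The column names template_term and dwc_term, the example terms and values, and apply_term_mapping are hypothetical and do not reflect the real file's layout.

```python
import pandas as pd

# Hypothetical data dictionary excerpt: template term -> Darwin Core term.
data_dictionary = pd.DataFrame({
    "template_term": ["samp_name", "collection_date", "depth"],
    "dwc_term": ["eventID", "eventDate", "minimumDepthInMeters"],
})

# Hypothetical sample metadata, including one column ("notes") with no mapping.
metadata = pd.DataFrame({
    "samp_name": ["GOMECC4_S1", "GOMECC4_S2"],
    "collection_date": ["2021-09-14", "2021-09-15"],
    "depth": [5.0, 60.0],
    "notes": ["", ""],
})

def apply_term_mapping(df, dictionary, src="template_term", dst="dwc_term"):
    """Keep only mapped columns and rename them to their Darwin Core terms."""
    mapping = dict(zip(dictionary[src], dictionary[dst]))
    mapped_cols = [col for col in df.columns if col in mapping]
    return df[mapped_cols].rename(columns=mapping)

dwc = apply_term_mapping(metadata, data_dictionary)
```

Dropping unmapped columns keeps the output limited to Darwin Core terms; a real conversion may instead need to route some of them to the DNA derived data extension.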
