adamklie / predictmee

Predicting missing metadata with recurrent neural network (RNN)-based entity extraction



predictMEE - Predicting Missing Metadata with Entity Extraction

Requirements

Conda or Miniconda installation

The predictMEE model and analysis workflow require a variety of packages to be installed before running the code. The easiest way to install all the necessary packages is to install Anaconda3 or Miniconda3, both of which provide the conda package manager.

Data

The script that automates downloading and preprocessing the SRA attribute-value pairs is still a work in progress. For now, you can find the data that was used here.

Word Embedding Model

The downloadable word2vec model can be found here.

Installation

Substituting your GitHub username below, you can clone this repo to the current directory with

git clone https://<username>@github.com/aklie/predictMEE.git

Configuring environment

Then, install the required packages with the following commands:

cd predictMEE/config
conda env create -f deep_nlp_cpu.yml  # Create the environment from the YAML file
conda activate deep_nlp_cpu  # Activate the environment
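
Before running any notebook, it can help to confirm that the environment is actually active. The following is a minimal sketch, not part of the repository: it relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation, and the helper names `active_conda_env` and `is_expected_env` are hypothetical.

```python
import os

# Environment name taken from the `conda activate` command above.
EXPECTED_ENV = "deep_nlp_cpu"


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    env = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return env.get("CONDA_DEFAULT_ENV") or None


def is_expected_env(environ=None, expected=EXPECTED_ENV):
    """Return True if the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected
```

Calling `is_expected_env()` at the top of a notebook returns `False` if the kernel was started outside the `deep_nlp_cpu` environment.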

If you plan to reproduce the full analysis, you will need to mirror the file structure shown below.

├── bin
│   ├── dataLandscapeSRA.ipynb
│   ├── downloadData.ipynb
│   ├── evaluateModel.ipynb
│   ├── evaluatePrediction.ipynb
│   ├── generateTestSet.ipynb
│   ├── mergeAttributes.ipynb
│   ├── predictMetadata.ipynb
│   └── trainModels.ipynb
├── config
│   └── deep_nlp_cpu.yml
├── data
│   ├── allSRS_05_15_2018.pickle
│   ├── BioSampleAttributes.pickle
│   ├── BioSampleAttributes.xml
│   ├── sra_dump.pickle
│   └── wikipedia-pubmed-and-PMC-w2v
├── doc
│   ├── figures
│   ├── submission
│   └── tables
├── models
├── README.md
└── results
    ├── embedding
    ├── prediction
    ├── training
    └── validation
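
One way to set up this layout is to create the empty directories up front. The sketch below is an illustration under assumptions, not a script shipped with the repository: the directory list is copied from the tree above, the helper names `make_layout` and `missing_dirs` are hypothetical, and the data files, notebooks, and models still have to be obtained separately.

```python
from pathlib import Path

# Directories copied from the tree above. Files (notebooks, pickles,
# the word2vec model) are intentionally not created here.
DIRECTORIES = [
    "bin",
    "config",
    "data",
    "doc/figures",
    "doc/submission",
    "doc/tables",
    "models",
    "results/embedding",
    "results/prediction",
    "results/training",
    "results/validation",
]


def make_layout(root):
    """Create every expected directory under `root`; existing ones are left alone."""
    root = Path(root)
    created = []
    for rel in DIRECTORIES:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def missing_dirs(root):
    """Return the relative paths from DIRECTORIES that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in DIRECTORIES if not (root / rel).is_dir()]
```

For example, `make_layout("predictMEE")` fills in any missing directories in a fresh clone, and `missing_dirs("predictMEE")` reports which ones are still absent.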

Running notebooks

Some notebooks depend on data and output produced by other notebooks. To run the analysis as it was completed for the paper cited below, run the notebooks in the following order.

  1. downloadData.ipynb
  2. dataLandscapeSRA.ipynb
  3. mergeAttributes.ipynb
  4. generateTestSet.ipynb
  5. trainModels.ipynb
  6. evaluateModel.ipynb
  7. predictMetadata.ipynb
  8. evaluatePrediction.ipynb
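
The order above can also be scripted. The following is a minimal sketch, assuming the notebooks live in `bin/` as in the file structure and that `jupyter nbconvert` is available in the active environment. The helper names `build_commands` and `run_all` are hypothetical, and the notebooks themselves may need path adjustments before they run unattended.

```python
import subprocess
from pathlib import Path

# Execution order as listed above.
NOTEBOOK_ORDER = [
    "downloadData.ipynb",
    "dataLandscapeSRA.ipynb",
    "mergeAttributes.ipynb",
    "generateTestSet.ipynb",
    "trainModels.ipynb",
    "evaluateModel.ipynb",
    "predictMetadata.ipynb",
    "evaluatePrediction.ipynb",
]


def build_commands(bin_dir="bin"):
    """Build one `jupyter nbconvert` command per notebook, in execution order."""
    return [
        [
            "jupyter", "nbconvert",
            "--to", "notebook",
            "--execute",   # run every cell
            "--inplace",   # write outputs back into the same file
            str(Path(bin_dir) / name),
        ]
        for name in NOTEBOOK_ORDER
    ]


def run_all(bin_dir="bin", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(bin_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing notebook,
            # so later notebooks never run on incomplete inputs.
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands. Pass `dry_run=False` to execute them, keeping in mind that the training notebook in particular may take a long time.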

Citation

Klie A, Tsui BY, Mollah S, Skola D, Dow M, Hsu C-N, et al. Increasing metadata coverage of SRA BioSample entries using deep learning-based named entity recognition. Database. 2021;2021. doi:10.1093/database/baab021
