Deep-GDAE

Gene-Disease Association Extraction

Deep-GDAE combines the strengths of a Convolutional Neural Network (CNN) and an attention-based Bidirectional Long Short-Term Memory (BiLSTM) network to classify gene-disease associations.
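The README does not spell out the layer details, so the following is only a minimal plain-Python sketch of the attention idea: per-token hidden states (such as a BiLSTM would emit) are scored, normalised with a softmax, and summed into one sentence vector. The dot-product scoring, the `context` vector, and the function name `attention_pool` are assumptions made for illustration, not the model's actual formulation.

```python
import math

def attention_pool(hidden_states, context):
    """Collapse a sequence of hidden-state vectors into one vector.

    hidden_states: list of equal-length float lists, one per token.
    context: float list of the same length (learned in a real model, fixed here).
    """
    # Score each token by its dot product with the context vector (assumed scoring).
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    # Numerically stable softmax over the token scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden states gives the sentence representation.
    dim = len(context)
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, pooled

# Toy hidden states for a three-token sentence.
states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.3]]
weights, sentence_vector = attention_pool(states, context=[1.0, 1.0])
```

In a full model the pooled vector would feed a classification layer; here it only shows how attention weights concentrate on the highest-scoring token.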

Deep-GDAE Corpus

Along with the benchmark dataset, we have generated a Gene-Disease Association corpus using DisGeNET (a database of GDAs) and PubTator (to retrieve biomedical texts). Using PubTator, we find all PMIDs whose text contains at least one gene name and one disease name. The sentences are then passed through three filtering steps to produce the false instances. Samples of the true class are extracted from DisGeNET, considering only curated associations. The Deep-GDAE Corpus contains 8000 sentences (4000 True Association samples and 4000 False Association samples) covering 1904 unique diseases and 3635 unique genes.
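As a rough illustration of the first part of this pipeline, the sketch below keeps sentences that mention at least one gene and one disease and flags each co-mentioned pair that appears in a curated set. The dictionary layout, the function names, and the placeholder identifiers (`GENE_A`, `DISEASE_X`) are hypothetical. The three filtering steps that produce the false class are not described here, so they are deliberately left out: an unflagged pair is only a candidate, not a false instance.

```python
def candidate_pairs(sentence):
    """Yield every (gene, disease) pair co-mentioned in one annotated sentence."""
    for gene in sentence["genes"]:
        for disease in sentence["diseases"]:
            yield gene, disease

def label_sentences(sentences, curated_pairs):
    """Keep sentences with at least one gene and one disease; flag curated pairs."""
    instances = []
    for sentence in sentences:
        if not sentence["genes"] or not sentence["diseases"]:
            continue  # needs both entity types to form a candidate pair
        for gene, disease in candidate_pairs(sentence):
            instances.append({
                "text": sentence["text"],
                "gene": gene,
                "disease": disease,
                "curated": (gene, disease) in curated_pairs,
            })
    return instances

# Toy input with placeholder entity names (not real annotations).
sentences = [
    {"text": "GENE_A is linked to DISEASE_X.", "genes": ["GENE_A"], "diseases": ["DISEASE_X"]},
    {"text": "GENE_B was measured in DISEASE_X patients.", "genes": ["GENE_B"], "diseases": ["DISEASE_X"]},
    {"text": "GENE_A is expressed in liver.", "genes": ["GENE_A"], "diseases": []},
]
curated = {("GENE_A", "DISEASE_X")}
instances = label_sentences(sentences, curated)
```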

Execution

1. Pre-trained word embedding models

Download one of the following pre-trained word embedding files:

Add the path of the downloaded file to the preProcess notebooks (replace 'wefile' with your own path).

2. Run the preProcess notebooks to generate the pickle files required for training the model.

3. Run one of the benchmark dataset notebooks listed below to verify the performance.

  • utils.ipynb contains the required methods which are called by other notebooks
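For step 1, the format of the embedding files is not stated here. Assuming a plain-text layout with one `word v1 v2 ...` entry per line (binary word2vec files would need a different loader), reading the file could look roughly like this. The function name `load_text_embeddings` is hypothetical, and an in-memory file stands in for the real path that 'wefile' would hold.

```python
import io

def load_text_embeddings(handle):
    """Parse 'word v1 v2 ...' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 3:
            continue  # skip blank lines and an optional 'count dim' header
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# With a real download this would be open(<your path>, encoding="utf-8").
sample_file = io.StringIO("2 3\ngene 0.1 0.2 0.3\ndisease 0.4 0.5 0.6\n")
embeddings = load_text_embeddings(sample_file)
```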

1. [BeFree]

  • preProcess.ipynb Reads the dataset, creates the primitive features (including word and position embeddings), and saves the file required for training as a pickle file.

  • BeFree-3class.ipynb Evaluation on the Genetic Association Database (GAD). GAD is an archive of human genetic association studies of complex diseases and disorders.

  • BeFree-2class_EUADR.ipynb Evaluation on the EU-ADR dataset, which contains annotations on drugs, diseases, genes and proteins, and the associations between them. Here we focus on gene-disease associations.
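The preprocessing described for preProcess.ipynb can be sketched as follows. This is an assumed, simplified version: it builds integer word ids and the signed distance of each token to the gene and disease mentions (the usual inputs to word and position embedding layers), then round-trips the result through `pickle`. The helper names, the toy vocabulary, and the placeholder tokens are all hypothetical.

```python
import pickle

def position_features(tokens, gene_index, disease_index):
    """Return each token's signed distance to the gene and disease mentions."""
    gene_dist = [i - gene_index for i in range(len(tokens))]
    disease_dist = [i - disease_index for i in range(len(tokens))]
    return gene_dist, disease_dist

def encode(tokens, vocab):
    """Map tokens to integer ids, using 0 for out-of-vocabulary words."""
    return [vocab.get(tok.lower(), 0) for tok in tokens]

tokens = ["GENE_A", "is", "associated", "with", "DISEASE_X"]
vocab = {"is": 1, "associated": 2, "with": 3}  # toy vocabulary
gene_dist, disease_dist = position_features(tokens, gene_index=0, disease_index=4)
sample = {
    "word_ids": encode(tokens, vocab),
    "gene_dist": gene_dist,
    "disease_dist": disease_dist,
}

# Serialise the features the way a training notebook could later reload them.
payload = pickle.dumps([sample])
restored = pickle.loads(payload)
```

Writing to disk instead would use `pickle.dump` with a file opened in binary mode.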

2. [SNPPhenA corpus] A corpus for extracting ranked associations of single-nucleotide polymorphisms and phenotypes from literature.

  • SNP.ipynb Results of running Deep-GDAE on the SNPPhenA corpus, which was developed to extract ranked associations of SNPs and phenotypes from GWA studies.

  • SNP-Transfer Learning.ipynb We selected the SNP-phenotype dataset for transferring knowledge from the gene-disease domain. The rich features transferred from the base model can help train the new model on SNP-phenotype sequences.
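The transfer step is not detailed here. One common way to reuse a base model, shown below as a plain-Python sketch, is to copy the feature-layer weights into the new model and leave the task-specific output layer at its own initialisation. The layer names (`conv`, `bilstm`, `output`), the flat weight lists, and the function `transfer_weights` are hypothetical stand-ins, not the notebook's actual code.

```python
import copy

def transfer_weights(base_weights, target_weights, skip=("output",)):
    """Copy matching layers from base into a copy of target, except skipped ones."""
    merged = copy.deepcopy(target_weights)
    transferred = []
    for name, values in base_weights.items():
        if name in skip or name not in merged:
            continue
        if len(values) != len(merged[name]):
            continue  # shapes differ, keep the target's own initialisation
        merged[name] = copy.deepcopy(values)
        transferred.append(name)
    return merged, transferred

# Toy weights: the base model has 2 output units, the new task has 3.
base = {"conv": [0.5, -0.2], "bilstm": [0.1, 0.3, 0.7], "output": [0.9, 0.1]}
target = {"conv": [0.0, 0.0], "bilstm": [0.0, 0.0, 0.0], "output": [0.0, 0.0, 0.0]}
merged, transferred = transfer_weights(base, target)
```

After the copy, the new model would be trained on the SNP-phenotype data, either fine-tuning all layers or keeping the transferred ones fixed.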

Contributors

esmaeilnourani
