
kilicogluh / lbd-covid


Drug repurposing for COVID-19 using literature-based discovery

Python 0.76% Jupyter Notebook 85.02% Shell 0.15% Roff 13.43% Awk 0.63%

lbd-covid's Introduction


This repository contains the source code for the following publication:

Zhang, R., Hristovski, D., Schutte, D., Kastrin, A., Fiszman, M., & Kilicoglu, H. (2021). Drug repurposing for COVID-19 via knowledge graph completion. Journal of Biomedical Informatics, 115, 103696. https://doi.org/10.1016/j.jbi.2021.103696

Prerequisites

  • Python 3.6 with packages lxml, numpy, and pandas
  • Perl 5 with module Text::NSP
  • AWK

Directory Structure

  • ./data directory contains input files
  • ./preprocessing directory contains scripts for preparing data
  • ./filtering directory contains scripts for filtering predications with BERT
  • ./models directory contains scripts for knowledge graph completion
  • ./predictions directory contains output files from graph completion models

Usage

  1. Download and set up SemMedDB
  2. Create ./data directory in project's root folder
  3. Prepare sub_rel_obj_pyear_edat_pmid_sent_id_sent.tsv.gz file and place it into the ./data/SemMedDB directory
  4. Download SemRepped CORD-19 dataset and extract files into ./data/cord-19 directory
  5. Prepare SemMedDB and CORD-19 data using the ./preprocessing/run.sh file
  6. Run Python notebooks in the ./filtering directory
  7. Run Python notebooks in the ./models directory
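Before running the preprocessing in step 5, it can help to sanity-check the file prepared in step 3. The following is a minimal sketch, not part of the repository: the column names are guessed from the file name alone, and it assumes the file is tab-separated with no header row. `read_predications` and `read_predications_gz` are hypothetical helpers, and the sample row is made up.

```python
import csv
import gzip
import io

# Assumed column order, inferred only from the file name
# sub_rel_obj_pyear_edat_pmid_sent_id_sent.tsv.gz (not confirmed by the repository).
COLUMNS = ["sub", "rel", "obj", "pyear", "edat", "pmid", "sent_id", "sent"]


def read_predications(stream, limit=None):
    """Yield one dict per tab-separated line, keyed by the assumed column names."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader):
        if limit is not None and i >= limit:
            break
        if len(row) != len(COLUMNS):
            raise ValueError(
                f"line {i + 1}: expected {len(COLUMNS)} fields, got {len(row)}"
            )
        yield dict(zip(COLUMNS, row))


def read_predications_gz(path, limit=None):
    """Same as read_predications, but reads from a gzip-compressed file on disk."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from read_predications(handle, limit)


# Made-up row, for illustration only.
sample = io.StringIO(
    "C0000001\tTREATS\tC0000002\t2020\t2020-05-01\t123\t456\tA made-up sentence.\n"
)
first = next(read_predications(sample))
print(first["sub"], first["rel"], first["obj"])
```

For the real file, `read_predications_gz("./data/SemMedDB/sub_rel_obj_pyear_edat_pmid_sent_id_sent.tsv.gz", limit=5)` would return the first five rows, provided the assumptions about the layout hold.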

Contact

Halil Kilicoglu (halil (at) illinois.edu)

lbd-covid's People

Contributors

akastrin, daltonsumn, kilicogluh


lbd-covid's Issues

Unable to access count.pl and statistics.pl in preprocessing/run.sh

https://github.com/kilicogluh/lbd-covid/tree/master/preprocessing

For preprocessing, the repository asks you to run "run.sh", which calls two Perl scripts, count.pl and statistics.pl, before computing concept_degree. The compute_score script then takes the concept degree and stats as input to calculate the score. I was wondering whether these scripts are missing or were left out for other reasons. It would also be helpful if you could provide example files for the extracted sub_rel_obj, stats, count, and concept degree from the dataset that you used.

KG Download

Hi, could you please provide the download link of your KG?

Filtering out non-informative concepts and semantic relations UNI-LJ

Filter out non-informative concepts and semantic relations. We already have two filtering scripts (UNI-LJ and UMN). This issue is for the UNI-LJ group. At some stage, we will have to merge them or select one of them.

The filtering script should be flexible: the filtering constraints should be specified in an external JSON file so that they can be modified without changing the main filtering script.
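One way to keep the constraints outside the script is sketched below. This is only an illustration under stated assumptions: the JSON keys (`exclude_cuis`, `exclude_relations`) and the function names are hypothetical, the CUIs are placeholders, and neither group's actual filtering script is reproduced here.

```python
import json

# Hypothetical constraint file content; in practice this would be read from
# an external .json file so it can change without touching the script.
EXAMPLE_CONFIG = """
{
  "exclude_cuis": ["C0000001"],
  "exclude_relations": ["PROCESS_OF"]
}
"""


def load_constraints(text):
    """Parse the JSON constraints and turn the lists into sets for fast lookup."""
    raw = json.loads(text)
    return {
        "exclude_cuis": set(raw.get("exclude_cuis", [])),
        "exclude_relations": set(raw.get("exclude_relations", [])),
    }


def keep_relation(subject_cui, relation, object_cui, constraints):
    """Return True if the (subject, relation, object) triple passes every constraint."""
    if relation in constraints["exclude_relations"]:
        return False
    excluded = constraints["exclude_cuis"]
    if subject_cui in excluded or object_cui in excluded:
        return False
    return True


constraints = load_constraints(EXAMPLE_CONFIG)
triples = [
    ("C0000001", "TREATS", "C0000002"),      # dropped: subject is an excluded CUI
    ("C0000003", "PROCESS_OF", "C0000004"),  # dropped: excluded relation
    ("C0000003", "TREATS", "C0000004"),      # kept
]
kept = [t for t in triples if keep_relation(*t, constraints)]
print(kept)
```

With this layout, adding a concept to the exclusion list means editing only the JSON file; the filtering logic stays unchanged.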

Further filtering of SemMedDB43 arguments (concepts)

In addition to filtering according to the GENERIC concepts table in SemMedDB43, we can further filter out concepts (arguments of semantic relations) that we believe are still too general. Below are two lists, both in the same format. The counts (frequencies) correspond to the degrees of the nodes (the sum of in-degree and out-degree).

Please have a look at these files and suggest which concepts (arguments) we should eliminate by adding them to the GENERIC concepts list. Of course, all the semantic relations in which they appear should be filtered out, too. @kilicogluh, @daltonsUMN, @akastrin, @DimitarH

The files are TAB delimited with these fields:

  1. CUI
  2. Concept name
  3. Semantic type abbreviation(s); if there is more than one, they are delimited with ";"
  4. Number of semantic relation instances in which the concept occurs as an argument
  5. Number of (aggregated) semantic relations in which the concept occurs as an argument
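The five fields above can be read with a few lines of Python. The sketch below is illustrative only: it assumes the files have no header row, `ConceptRow` and `parse_concept_rows` are hypothetical names, the sample rows are made up, and the threshold of 100 is an arbitrary value chosen for the example.

```python
import csv
import io
from typing import NamedTuple, Tuple


class ConceptRow(NamedTuple):
    cui: str                         # field 1
    name: str                        # field 2
    semantic_types: Tuple[str, ...]  # field 3, split on ";"
    instance_count: int              # field 4: semantic relation instances
    relation_count: int              # field 5: aggregated semantic relations


def parse_concept_rows(stream):
    """Yield one ConceptRow per tab-delimited line (assumes no header row)."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for cui, name, semtypes, instances, relations in reader:
        yield ConceptRow(
            cui, name, tuple(semtypes.split(";")), int(instances), int(relations)
        )


# Made-up rows, for illustration only.
sample = io.StringIO(
    "C0000001\tExample concept A\taapp;gngm\t1200\t300\n"
    "C0000002\tExample concept B\tdsyn\t50\t20\n"
)
rows = list(parse_concept_rows(sample))

# Arbitrary threshold: list high-degree concepts as candidates for manual review.
candidates = [r.cui for r in rows if r.relation_count >= 100]
print(candidates)
```

A listing like `candidates` is only a starting point for review; deciding which concepts are too general still requires a manual look at the names and semantic types.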

Links to the files:

  • Ordered by descending semantic relation occurrence. download link

  • Ordered by descending semantic relation instance occurrence count. download link
