Drug repurposing for COVID-19 using literature-based discovery

This repository contains the source code for the following publication:

Zhang, R., Hristovski, D., Schutte, D., Kastrin, A., Fiszman, M., & Kilicoglu, H. (2021). Drug repurposing for COVID-19 via knowledge graph completion. Journal of Biomedical Informatics, 115, 103696. https://doi.org/10.1016/j.jbi.2021.103696

Prerequisites

Python 3.6 with packages lxml, numpy, and pandas
Perl 5 with module Text::NSP
AWK

Directory Structure

./data directory contains input files
./preprocessing directory contains scripts for preparing data
./filtering directory contains scripts for filtering predications with BERT
./models directory contains scripts for knowledge graph completion
./predictions directory contains output files from graph completion models

Usage

Download and set up SemMedDB
Create a ./data directory in the project's root folder
Prepare the sub_rel_obj_pyear_edat_pmid_sent_id_sent.tsv.gz file and place it in the ./data/SemMedDB directory
Download the SemRepped CORD-19 dataset and extract the files into the ./data/cord-19 directory
Prepare the SemMedDB and CORD-19 data by running ./preprocessing/run.sh
Run the Python notebooks in the ./filtering directory
Run the Python notebooks in the ./models directory
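
Before running the preprocessing step, it can help to confirm that the predication file from the third step is readable. The sketch below is not part of the repository. It assumes the file is tab-separated with no header row, and that its columns follow the order suggested by the file name (sub, rel, obj, pyear, edat, pmid, sent_id, sent). Both points are assumptions to check against your own export, and the function name read_predications is hypothetical.

```python
import csv
import gzip
from itertools import islice

# Column order guessed from the file name; an assumption, not confirmed by the repository.
ASSUMED_COLUMNS = ["sub", "rel", "obj", "pyear", "edat", "pmid", "sent_id", "sent"]


def read_predications(path, limit=None):
    """Yield one dict per row of a gzipped, tab-separated predication file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in islice(reader, limit):
            if len(row) != len(ASSUMED_COLUMNS):
                raise ValueError(
                    f"expected {len(ASSUMED_COLUMNS)} fields, got {len(row)}"
                )
            yield dict(zip(ASSUMED_COLUMNS, row))
```

For example, iterating over read_predications("./data/SemMedDB/sub_rel_obj_pyear_edat_pmid_sent_id_sent.tsv.gz", limit=5) and printing row["sub"], row["rel"], and row["obj"] gives a quick look at the first five triples. A ValueError at this point suggests that the assumed column layout does not match the file.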

Contact

Halil Kilicoglu (halil (at) illinois.edu)

lbd-covid's Issues

Unable to access count.pl and statistics.pl in preprocessing/run.sh

https://github.com/kilicogluh/lbd-covid/tree/master/preprocessing

For preprocessing, the instructions ask you to run "run.sh", which calls two Perl scripts, count.pl and statistics.pl, before finding concept_degree. The compute_score script then takes the concept degree and stats as input to calculate the score. I was wondering whether these scripts are missing or were left out for other reasons. It would also be helpful if you could provide example files for the extracted sub_rel_obj, stats, count, and concept degree from the dataset that you used.

KG Download

Hi, could you please provide the download link of your KG?

Filtering out non-informative concepts and semantic relations UNI-LJ

Filter out non-informative concepts and semantic relations. We already have two filtering scripts (UNI-LJ and UMN). This issue is for the UNI-LJ group. At some stage, we will have to merge the two scripts or select one of them.

The filtering script should read its filtering constraints from an external JSON file, so that the constraints can be modified without changing the main filtering script.
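
One way such a JSON-driven filter could look is sketched below. This is an illustration only, not the UNI-LJ or UMN script: the JSON keys excluded_cuis and excluded_relations, the function names, and the representation of a relation as a (subject, predicate, object) tuple are all hypothetical.

```python
import json


def load_constraints(path):
    """Load filtering constraints from a JSON file into sets for fast lookup."""
    with open(path, encoding="utf-8") as handle:
        raw = json.load(handle)
    # Key names are hypothetical; a missing key means "exclude nothing".
    return {
        "excluded_cuis": set(raw.get("excluded_cuis", [])),
        "excluded_relations": set(raw.get("excluded_relations", [])),
    }


def keep_relation(relation, constraints):
    """Return True if a (subject, predicate, object) triple passes every constraint."""
    subject, predicate, obj = relation
    if predicate in constraints["excluded_relations"]:
        return False
    # Drop the relation if either argument is an excluded concept.
    return not ({subject, obj} & constraints["excluded_cuis"])
```

With this layout, changing which concepts or predicates are dropped only requires editing the JSON file, for example {"excluded_cuis": ["C0000001"], "excluded_relations": ["PROCESS_OF"]}, where both values are placeholders.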

Where is sub_rel_obj_pyear_edat_pmid_sent_id_sent.tsv.gz?

Can a link be provided from where we can download this file?

Further filtering of SemMedDB43 arguments (concepts)

In addition to filtering according to the GENERIC concepts table in SemMedDB43, we can further filter out concepts (arguments of semantic relations) that we believe are still too general. Below are two lists in the same format. The counts (frequencies) correspond to the degrees of the nodes (the sum of in-degree and out-degree).
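
As a rough illustration of the degree counts mentioned here, the sketch below sums in-degree and out-degree per concept from a list of (subject, predicate, object) triples. It is not the script that produced the lists; the function name node_degrees and the triple representation are hypothetical.

```python
from collections import Counter


def node_degrees(relations):
    """Return in-degree plus out-degree for every concept in an iterable of triples."""
    degrees = Counter()
    for subject, _predicate, obj in relations:
        degrees[subject] += 1  # outgoing edge from the subject
        degrees[obj] += 1      # incoming edge to the object
    return degrees
```

Passing every extracted instance counts repeated triples each time they occur. Passing set(relations) counts each distinct triple once, which may correspond to the aggregated count, assuming "aggregated" means distinct subject-predicate-object triples.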

Please have a look at these files and suggest which concepts (arguments) we should eliminate by adding them to the GENERIC concepts list. Of course, all the semantic relations in which they appear should be filtered out, too. @kilicogluh, @daltonsUMN, @akastrin, @DimitarH

The files are TAB delimited with these fields:

CUI
Concept name
Semantic type abbreviation(s); if there is more than one, they are delimited with ";"
Number of semantic relation instances in which the concept occurs as an argument
Number of (aggregated) semantic relations in which the concept occurs as an argument

Links to the files:

Ordered by descending semantic relation occurrence. download link
Ordered by descending semantic relation instance occurrence count. download link
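
The five-field layout described above could be read along the following lines. This is a minimal sketch that assumes the files have no header row and that both count fields are plain integers. The names ConceptRow and parse_concept_rows are hypothetical.

```python
import csv
from typing import NamedTuple


class ConceptRow(NamedTuple):
    cui: str
    name: str
    semantic_types: list  # abbreviations split on ";"
    instance_count: int   # semantic relation instances with this concept as an argument
    relation_count: int   # aggregated semantic relations with this concept as an argument


def parse_concept_rows(lines):
    """Parse tab-delimited lines with the five fields in the order listed above."""
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        cui, name, types, instances, relations = fields
        yield ConceptRow(cui, name, types.split(";"), int(instances), int(relations))
```

The function accepts any iterable of text lines, so an open file handle can be passed directly. A line with a different number of fields raises a ValueError, which makes a layout mismatch visible early.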

statistic.pl and count.pl files are missing

Where can I find the statistic.pl and count.pl files?
