CoronaWhy.org is a global volunteer organization dedicated to driving actionable insights into significant world issues using industry-leading data science, artificial intelligence and knowledge sharing. CoronaWhy was founded during the 2020 COVID-19 crisis, following a White House call to help extract valuable data from more than 50,000 coronavirus-related scholarly articles, dating back decades. Currently at over 1000 volunteers, CoronaWhy is composed of data scientists, doctors, epidemiologists, students, and various subject matter experts on everything from technology and engineering to communications and program management.
Read about our creations before you start.
- Task-Risk helps identify risk factors that can increase the chance of being infected, or affect the severity or survival outcome of the infection
- Task-Ties explores transmission, incubation and environmental stability
- Named Entity Recognition across the entire corpus of CORD-19 papers with full text
- Match Clinical Trials allows exploration of the results from the COVID-19 International Clinical Trials dataset
- COVID-19 Literature Visualization helps explore the data behind the AI-powered literature review
More detailed information about every dashboard is published on Kaggle.
Download the COVID-19 Open Research Dataset Challenge (CORD-19) dataset from Kaggle:
bash ./download_dataset.sh
Start Jupyter by executing:
docker-compose up
The Jupyter notebook runs on port 8888. Test the CORD-19 pipeline by running these commands:
docker cp ./tests covid-19-infrastructure_jupyter_1:/home/jovyan/
docker exec -it covid-19-infrastructure_jupyter_1 /bin/bash
pip install googletrans
cd tests
python ./cord-processing.py
This should produce v12* files in the same folder. The file v12_sentences.json contains all extracted entities at the sentence level, corresponding to the CoronaWhy Elasticsearch collection.
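To get a first look at the output, a small inspection helper can be useful. The sketch below is only an illustration: the layout of v12_sentences.json is not documented here, so the helper (a hypothetical `summarize_json`, not part of the pipeline) assumes the file is a single JSON document and reports only its top-level type, its size, and how often each key occurs.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_json(path):
    """Return a small structural summary of a JSON file.

    Assumption: the file holds one JSON document (not JSON Lines).
    Nothing about the actual field names is assumed.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)

    # Count keys of the top-level object, or of each object in a top-level list.
    key_counts = Counter()
    if isinstance(data, dict):
        key_counts.update(data.keys())
    elif isinstance(data, list):
        for item in data:
            if isinstance(item, dict):
                key_counts.update(item.keys())

    return {
        "type": type(data).__name__,
        "length": len(data) if isinstance(data, (dict, list)) else None,
        "keys": dict(key_counts),
    }
```

Inside the container, calling `summarize_json("v12_sentences.json")` from the tests folder would show which fields the extracted sentences actually carry before you write any further processing code.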
Follow all updates from our YouTube and CoronaWhy GitHub.
How to access Elasticsearch and Dataverse, notebook
CoronaWhy Elasticsearch Tutorial notebook
How to Create Knowledge Graph, notebook
Dataverse Colab Connect, notebook
GitHub dataset sync with Dataverse, notebook
You can connect your notebooks to a number of the services listed below. All services from CoronaWhy Labs have experimental status. Join the fight against COVID-19 if you want to help us!
Dataverse is deployed at https://datasets.coronawhy.org as a data service. Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others.
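As a rough sketch of how a notebook might query this service, the snippet below builds a request for the standard Dataverse Search API (`/api/search` with the `q`, `type` and `per_page` parameters). It assumes that the CoronaWhy instance exposes this API without an API token, which is not confirmed here; the helper names `build_search_url` and `search_datasets` are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from this README.
BASE_URL = "https://datasets.coronawhy.org"


def build_search_url(query, item_type="dataset", per_page=10, base_url=BASE_URL):
    """Build a Dataverse Search API URL for the given query string."""
    params = urlencode({"q": query, "type": item_type, "per_page": per_page})
    return f"{base_url}/api/search?{params}"


def search_datasets(query, **kwargs):
    """Send the search request and return the list of matching items.

    Assumption: the service is reachable and allows anonymous searches.
    This function performs a network call.
    """
    with urlopen(build_search_url(query, **kwargs), timeout=30) as response:
        payload = json.load(response)
    return payload["data"]["items"]
```

For example, `search_datasets("CORD-19")` would return the first ten matching dataset records if anonymous search is enabled.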
CoronaWhy Elasticsearch has sentence-level CORD-19 indexes and is available at CoronaWhy Search.
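A full-text lookup against a sentence-level index could be sketched as follows. The request uses the standard Elasticsearch `_search` endpoint with a `match` query. The base URL, index name and field name are not given in this README, so all three are passed in as parameters and must be replaced with real values; the helper `build_search_request` is hypothetical.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, field, text, size=5):
    """Prepare (but do not send) an Elasticsearch match query.

    base_url, index and field are placeholders supplied by the caller;
    none of them is taken from the CoronaWhy documentation.
    """
    body = {"size": size, "query": {"match": {field: text}}}
    return Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the query. The hits, if any, are found under `hits.hits` in the JSON response.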
Available indexes:
The MongoDB service is deployed on mongodb.coronawhy.org and is available from CoronaWhy Labs virtual machines. Please contact our administrators if you want to use it.
Our Hypothesis annotation service runs on hypothesis.labs.coronawhy.org and allows manual annotation of CORD-19 papers. Please try our Hypothesis Demo if you're interested.
We provide Virtuoso as a service with a public SPARQL endpoint. It offers an HTTP-based query service that operates on entity relationship types (relations) represented as RDF sentence collections, using the SPARQL query language. https://virtuoso.openlinksw.com
You can run a simple SPARQL query to get an overview of the triples in the CoronaWhy Knowledge Graph.
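One possible form of such an overview query is shown below: it selects ten arbitrary triples and prepares an HTTP GET request following the SPARQL protocol (`query` parameter, JSON results requested through the `Accept` header). The endpoint URL is not stated in this README, so it is a parameter here; the helper `build_sparql_request` is only a sketch.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Generic overview query: any ten triples from the default graph.
OVERVIEW_QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
"""


def build_sparql_request(endpoint, query=OVERVIEW_QUERY):
    """Prepare (but do not send) a SPARQL protocol GET request.

    endpoint is a placeholder for the public SPARQL endpoint URL,
    which the caller has to supply.
    """
    url = f"{endpoint}?{urlencode({'query': query.strip()})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Sending the request with `urllib.request.urlopen` and decoding the body with `json.load` gives the rows under `results.bindings`, as defined by the SPARQL JSON results format.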
Kibana is deployed at https://kibana.labs.coronawhy.org as a community service connected to CoronaWhy Elasticsearch. It lets you visualize Elasticsearch data and navigate the Elastic Stack, so you can do anything from tracking query load to understanding the way requests flow through your apps. https://www.elastic.co/kibana
BEL Commons 3.0 is available as a service at https://bel.labs.coronawhy.org
It is an environment for curating, validating, and exploring knowledge assemblies encoded in Biological Expression Language (BEL) to support elucidating disease-specific, mechanistic insight.
You can watch the introduction video and read the Corona BEL Tutorial if you want to know more.
INDRA will be deployed as a service at https://labs.coronawhy.org/indra (in development).
INDRA (Integrated Network and Dynamical Reasoning Assembler) generates executable models of pathway dynamics from natural language (using the TRIPS and REACH parsers), and from BioPAX and BEL sources (including the Pathway Commons database and NDEx).
Geoparser is available as a service at https://geoparser.labs.coronawhy.org
The Geoparser is a software tool that can process information from any type of file, extract geographic coordinates, and visualize locations on a map. Users who are interested in seeing a geographical representation of information or data can choose to search for locations using the Geoparser, through a search index or by uploading files from their computer. https://github.com/nasa-jpl-memex/GeoParser
Tabula allows you to extract data from PDF files into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. We deployed it as a CoronaWhy service available to all community members. More information is on the Tabula website.
We use Teamchatviz to explore how communication works in our distributed team and to learn how communication shapes culture in the CoronaWhy community. https://moovel.github.io/teamchatviz/
We are working on deploying the Neo4j graph database.
I’m an AI researcher and here’s how I fight corona by Artur Kiulian
Exploration of Document Clustering with SPECTER Embeddings by Brandon Eychaner
COVID-19 Research Papers Geolocation by Ishan Sharma