covid-19-net / covid-19-community

Community effort to build a Neo4j Knowledge Graph (KG) that links heterogeneous data about COVID-19

License: MIT License

Languages: Shell 0.03%, Jupyter Notebook 99.97%

Topics: graphs4good, covid-19, coronavirus, knowledge-graph, jupyter, binder, neo4j

covid-19-community's Introduction

COVID-19-Community

This project is a community effort to build a Neo4j Knowledge Graph (KG) that integrates heterogeneous biomedical and environmental datasets to help researchers analyze the interplay between host, pathogen, the environment, and COVID-19.

Knowledge Graph Schema

Location Subgraph: This subgraph represents the geographic hierarchy from the world down to the city level (cities with population > 1000), as well as the PostalCode (US ZIP) and US Census Tract levels. Each geographic node also carries a Location label (not shown), which simplifies finding locations without specifying a level in the geographic hierarchy.
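Because every geographic node has the Location label, a single query can find a place by name without knowing its level in the hierarchy. A minimal sketch (the name property is used the same way in the Cypher examples below):

MATCH (l:Location {name: 'San Diego'})
RETURN labels(l), l.name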

Epidemiology Subgraph: This subgraph represents COVID-19 cases, including information about viral strains and the pathogen and host organisms. Cases and Strains are linked to the locations where they were reported and found, respectively.

Biology Subgraph: This subgraph represents organisms, genomes, chromosomes, genes, variants, proteins, protein structures, protein domains, protein families, and pathogen-host protein-protein interactions, as well as links to publications.

Population Characteristics Subgraph: This subgraph represents data from the American Community Survey 2018 5-year estimates. Selected population characteristics that may be risk factors for COVID-19 infections have been included. These data are currently available at three geographic levels: US Counties (Admin2), US ZIP Codes (PostalCode), and US Census Tract (Tract).

Note: this KG is a work in progress and changes frequently.

Browse the Knowledge Graph with the Neo4j Browser

The Knowledge Graph is updated daily approximately between 07:00 and 09:00 UTC.

View of Neo4j Browser showing the result of a query about interactions of the Spike glycoprotein with human host proteins and related publications in PubMedCentral.

You can browse the Knowledge Graph here (click the launch button and follow the instructions below).

Neo4j Browser

Run a Full-text Query

Full-text queries enable a wide range of search options, including exact phrase queries, fuzzy and wildcard queries, range queries, regular expression queries, and boolean expressions (see the FulltextQuery tutorial notebook).

The KG can be searched using the following full-text indices:

  • bioentities: Organism, Genome, Chromosome, Gene, GeneName, Protein, ProteinName, ProteinDomain, ProteinFamily, Structure, Chain, Outbreak, Strain, Variant, Publication
  • bioids: keyword (exact) query for bioentity identifiers (e.g., id, taxonomyId, accession, proId, genomeAccession, doi, variantType, variantConsequence)
  • sequences: full-text and regular expression query for protein sequences
  • locations: UNRegion, UNSubRegion, UNIntermediateRegion, Country, Admin1, Admin2, USRegion, USDivision, City, PostalCode, Tract, CruiseShip
  • geoids: keyword (exact) query for geographic identifiers (e.g., ZIP codes, FIPS codes, country ISO codes)
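The keyword indices can be used to look up nodes by exact identifiers. A minimal sketch using the bioids index (P0DTC2 is the UniProt accession of the SARS-CoV-2 Spike glycoprotein shown in the example below):

CALL db.index.fulltext.queryNodes("bioids", "P0DTC2") YIELD node, score
RETURN node.name, labels(node), score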

Full-text queries have the following format:

CALL db.index.fulltext.queryNodes('<type of entity>', '<text query>') YIELD node, score

The queries return the node and score for each match (higher scores indicate closer matches).
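Standard Lucene syntax can be combined in a single query string: a trailing ~ makes a term fuzzy, * is a wildcard, and AND/OR form boolean expressions. A minimal sketch against the bioentities index:

CALL db.index.fulltext.queryNodes("bioentities", "spik~ OR glyco*") YIELD node, score
RETURN node.name, labels(node), score
LIMIT 10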

Example full-text query on the bioentities index for proteins whose name contains the word spike

Query: (copy and paste into Neo4j browser)

CALL db.index.fulltext.queryNodes("bioentities", "spike") YIELD node
WHERE 'Protein' IN labels(node) // only return nodes with the label Protein
RETURN node

Result:

The full-text query matches Spike proteins from several coronaviruses. The SARS-CoV-2 Spike glycoprotein (uniprot:P0DTC2) is highlighted in the center with its four cleavage products: Spike glycoprotein without signal peptide (uniprot.chain:PRO_0000449646), Spike protein S1 (uniprot.chain:PRO_0000449647), Spike protein S2 (uniprot.chain:PRO_0000449648), and Spike protein S2' (uniprot.chain:PRO_0000449649), linked by CLEAVED_BY relationships.
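The cleavage products can also be retrieved directly through the CLEAVED_BY relationship. A minimal sketch (the relationship is matched without a direction, and accession is one of the identifier properties listed for the bioids index):

MATCH (p:Protein {accession: 'P0DTC2'})-[:CLEAVED_BY]-(c:Chain)
RETURN c.name, c.proId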

Example full-text query: find spike proteins - tabular results

The following query returns the names of the matched bioentities and the labels of the nodes (e.g., Protein, ProteinName) sorted by the match score in descending order.

Query:

CALL db.index.fulltext.queryNodes("bioentities", "spike") YIELD node, score
RETURN node.name, labels(node), score

Result:

Run a Cypher Query

Specific Nodes and Relationships in the KG can be searched using the Cypher query language.

Example Cypher query: find viral strains collected in Houston

Query: (limited to 10 hits)

MATCH (s:Strain)-[:FOUND_IN]->(l:Location{name: 'Houston'}) RETURN s, l LIMIT 10

Result:

This subgraph shows viral strains (green) of the SARS-CoV-2 virus carried by human hosts in Houston (organisms in gray). The strains have several variants (e.g., mutations) (red) in common. Details of the highlighted variant are shown at the bottom. This variant is a missense mutation in the S gene (S:c.1841gAt>gGt): the base "A" (Adenine) found in the Wuhan-Hu-1 reference genome NC_045512 was mutated to a "G" (Guanine) at position 23403, changing the encoded amino acid in the Spike glycoprotein (QHD43416) from "D" (Aspartic acid) to "G" (Glycine) at position 614 (QHD43416.1:p.614D>G).
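To gauge how widespread this D614G variant is, the strains linked to it can be counted. A minimal sketch, assuming the proteinVariant property holds the protein-level change (the Strain-Variant relationship is matched without specifying its type):

MATCH (s:Strain)--(v:Variant)
WHERE v.proteinVariant = 'QHD43416.1:p.614D>G'
RETURN v.proteinVariant AS variant, count(s) AS strains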

Example Cypher query: aggregate cumulative COVID-19 case numbers at the US state (Admin1) level

Query:

MATCH (o:Outbreak{id: "COVID-19"})<-[:RELATED_TO]-(c:Cases{date: date("2020-08-31"), source: 'JHU'})-[:REPORTED_IN]->(a:Admin2)-[:IN]->(a1:Admin1)
RETURN a1.name as state, sum(c.cases) as cases, sum(c.deaths) as deaths
ORDER BY cases DESC;

Result:

Note: some cases in the COVID-19 Data Repository by Johns Hopkins University cannot be mapped to a county or state location (e.g., correctional facilities, missing location data). Therefore, this query underreports the actual number of cases.

Query the Knowledge Graph in Jupyter Notebook

Cypher queries can be run in Jupyter Notebooks to enable reproducible data analyses and visualizations.

You can run the following Jupyter Notebooks in your web browser:

NOTE: Authentication is now required to launch Binder! Sign in to GitHub from your browser, then click the launch binder badge below to launch Jupyter Lab.

Pangeo Binder is unsupported and may be slow or unavailable. Launching Jupyter Lab may take a few minutes.

Binder

Once Jupyter Lab launches, navigate to the notebooks/queries and notebooks/analyses directories and run the following notebooks:

Notebook Description
FulltextQuery Runs example fulltext queries
CaseCounts Runs example queries for case counts
Locations Runs example queries for locations
Demographics Runs example queries for demographics data from the American Community Survey
SocialCharacteristics Runs example queries for social characteristics from the American Community Survey
EconomicCharacteristics Runs example queries for economic characteristics from the American Community Survey
Housing Runs example queries for housing characteristics from the American Community Survey
Bioentities Runs example queries for bioentities
EmergingStrains Analyzes emerging SARS-CoV-2 strains
EmergingStrainsInLiterature Analyzes emerging SARS-CoV-2 strains based on mentions in the literature
StrainB.1.1.7 Analyzes the B.1.1.7 strain
AnalyzeVariantsSpikeGlycoprotein Analyzes SARS-CoV-2 Spike glycoprotein variants
Coronavirus3DStructures Inventories coronavirus 3D protein structures
GraphVisualization Demonstrates graph visualization with Cytoscape
MapMutationsTo3D Maps mutations from SARS-CoV-2 strains to 3D structures
RiskFactorsByStateCounty Explores risk factors for COVID-19 for counties in US states
RiskFactorsSanDiegoCounty Explores risk factors for COVID-19 for San Diego County
CovidRatesByStates Explores COVID-19 confirmed case and death rates for states in a selected country
... add examples here ...

Run Jupyter Notebook Examples locally

To run the example notebooks on your laptop or desktop computer, follow the steps below.


Prerequisites: Miniconda3 (lightweight, preferred) or Anaconda3, and Mamba

  • Install Miniconda3
  • Update an existing miniconda3 installation: conda update conda
  • Install Mamba: conda install mamba -n base -c conda-forge
  • Install Git (if not installed): conda install git -n base -c anaconda

1. Clone this Git repository

git clone https://github.com/covid-19-net/covid-19-community.git
cd covid-19-community

2. Create a conda environment

The file environment.yml specifies the Python version and all packages required by the tutorial.

mamba env create -f environment.yml

Activate the conda environment

conda activate covid-19-community

3. Launch Jupyter Lab

jupyter lab

Navigate to the notebooks/queries directory to run the example query notebooks, or to the notebooks/analyses directory to run the analyses.

4. Deactivate the conda environment

When you are finished with your analysis, deactivate the conda environment or close the terminal window.

conda deactivate

To remove the conda environment, run conda env remove -n covid-19-community

Run Jupyter Notebook Examples on SDSC Expanse

To launch Jupyter Lab on Expanse, use the galyleo script. Specify your XSEDE account number with the --account option.

  1. Clone this Git repository

git clone https://github.com/covid-19-net/covid-19-community.git

  2. Start an interactive session with the galyleo script

    This script will generate a URL for your Jupyter Lab session.

galyleo launch --account <account_number> --partition shared --cpus 8 --memory 16 --time-limit 01:00:00 --conda-env covid-19-community --conda-yml "${HOME}/covid-19-community/environment.yml" --mamba

  3. Launch Jupyter Lab

    Open a new tab in your web browser and paste the Jupyter Lab URL. You should see the Satellite Reverse Proxy Service page launch in your browser. Wait until Jupyter Lab launches; this may take a few minutes.

  4. End the interactive session

    From the Jupyter Lab File menu, choose Shut Down to terminate the session.

Data Download, Preparation, and Integration

The COVID-19-Net Knowledge Graph is created from publicly available resources, including databases, files, and web services. A reproducible workflow, defined in this repository, runs a daily update of the knowledge graph. The Jupyter notebooks listed in the table below download, clean, standardize, and integrate data in the form of .csv files for ingestion into the Knowledge Graph. The prepared data files are saved in the NEO4J_HOME/import directory, and cached intermediate files are saved in the NEO4J_HOME/import/cache directory. These notebooks are run daily at 07:00 UTC in batch using Papermill with the update script to download the latest data and update the Knowledge Graph.
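The ingestion step itself is driven by Cypher. A minimal sketch of how one of the prepared files could be loaded (the file name corresponds to the 00e-GeoNamesCountry notebook below; the id and name columns are illustrative, not the project's actual load script):

LOAD CSV WITH HEADERS FROM 'file:///00e-GeoNamesCountry.csv' AS row
MERGE (c:Country {id: row.id})
SET c.name = row.name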

Notebook Description
00b-NCBITaxonomy Downloads the NCBI taxonomy for a subset of organisms
00b-PANGOLineage Downloads the PANGO lineage designations for SARS-CoV-2
00e-GeoNamesCountry Downloads country information from GeoNames.org
00f-GeoNamesAdmin1 Downloads first administrative divisions (State, Province, Municipality) information from GeoNames.org
00g-GeoNamesAdmin2 Downloads second administrative divisions (Counties in the US) information from GeoNames.org
00h-GeoNamesCity Downloads city information (population > 1000) from GeoNames.org
00i-USCensusRegionDivisionState2017 Downloads US regions, divisions, and assigns state FIPS codes from the US Census Bureau
00j-USCensusCountyCity2017 Downloads US County FIPS codes from the US Census Bureau
00k-UNRegion Downloads UN geographic regions, subregions, and intermediate region information from United Nations
00m-USHUDCrosswalk Downloads mappings of US Census tracts to US Postal Service ZIP codes and US Counties
00n-GeoNamesData Downloads longitude, latitude, elevation, and population data from GeoNames.org
00o-GeoNamesPostalCode Downloads US zip code, place name, latitude, longitude data from GeoNames.org
01a-UniProtGene Downloads chromosome and gene information from UniProt
01a-UniProtProtein Downloads protein information from UniProt
01b-NCBIGeneProtein Downloads gene and protein information from NCBI
01c-CNCBStrain Downloads SARS-CoV-2 viral strain metadata from CNCB (China National Center for Bioinformation)
01c-CNCBVariation Downloads variant data from CNCB (China National Center for Bioinformation)
01d-Nextstrain Downloads the SARS-CoV-2 strain metadata from Nextstrain
01e-ProteinProteinInteraction Downloads SARS-CoV-2 - human protein interaction data from IntAct
01f-PDBStructure Downloads 3D protein structures from the Protein Data Bank
01g-PfamDomain Downloads mappings between PDB protein chains and Pfam domains
01h-CORDLineages Maps publications and preprints in the CORD-19 data set to PANGO lineages
01h-PublicationLink Downloads mappings between datasets and publications indexed by PubMed Central (PMC) and Preprints (PPR) and PubMed (PM)
02a-JHUCases Downloads cumulative confirmed cases and deaths from the COVID-19 Data Repository by Johns Hopkins University
02a-JHUCasesLocation Standardizes location data for the COVID-19 Data Repository by Johns Hopkins University
02c-SDHHSACases Downloads cumulative confirmed COVID-19 cases from the County of San Diego, Health and Human Services Agency
03a-USCensusDP02Education Downloads social characteristics (DP02) from the American Community Survey 5-Year Data 2018
03a-USCensusDP02Computers Downloads social characteristics (DP02) from the American Community Survey 5-Year Data 2018
03a-USCensusDP03Commuting Downloads economic characteristics (DP03) from the American Community Survey 5-Year Data 2018
03a-USCensusDP03Employment Downloads economic characteristics (DP03) from the American Community Survey 5-Year Data 2018
03a-USCensusDP03HealthInsurance Downloads economic characteristics (DP03) from the American Community Survey 5-Year Data 2018
03a-USCensusDP03Income Downloads economic characteristics (DP03) from the American Community Survey 5-Year Data 2018
03a-USCensusDP03Occupation Downloads economic characteristics (DP03) from the American Community Survey 5-Year Data 2018
03a-USCensusDP03Poverty Downloads economic characteristics (DP03) from the American Community Survey 5-Year Data 2018
03a-USCensusDP04 Downloads housing characteristics (DP04) from the American Community Survey 5-Year Data 2018
03a-USCensusDP05 Downloads demographic data estimates (DP05) from the American Community Survey 5-Year Data 2018
... Future notebooks that add new data to the knowledge graph

How to run the Data Download and Preparation steps locally

Note: the following steps have been implemented for macOS and Linux only.

Several data sources have changed or have become unavailable. Some of the preparation notebooks may not work.


Prerequisites: Miniconda3 (light-weight, preferred) or Anaconda3 and Mamba

  • Install Miniconda3
  • Install Mamba: conda install mamba -n base -c conda-forge

1. Fork this project

A fork is a copy of a repository in your GitHub account. Forking a repository allows you to freely experiment with changes without affecting the original project.

In the top-right corner of this GitHub page, click Fork.

Then, download all materials to your laptop by cloning your copy of the repository, where your-user-name is your GitHub user name. To clone the repository from a Terminal window or the Anaconda prompt (Windows), run:

git clone https://github.com/your-user-name/covid-19-community.git
cd covid-19-community

2. Create a conda environment

The file environment-prep.yml specifies the Python version and all packages required for the data preparation steps.

mamba env create -f environment-prep.yml

Activate the conda environment

conda activate covid-19-community-prep

3. Install Neo4j Desktop

Download Neo4j

Then, launch the Neo4j Browser, create an empty database, set the password to "neo4jbinder", and close the database.

4. Set Environment Variable

Add the environment variable NEO4J_HOME with the path to the Neo4j database installation to your .bash_profile file, e.g.

export NEO4J_HOME="/Users/username/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-.../installation-4.0.3"

Add the environment variable NEO4J_IMPORT with the path to the Neo4j database import directory to your .bash_profile file, e.g.

export NEO4J_IMPORT="/Users/username/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-.../installation-4.0.3/import"

5. Run Data Download Notebooks

Start Jupyter Lab.

jupyter lab

Navigate to the notebooks/dataprep directory and run all notebooks in alphabetical order to download, clean, standardize, and save the data in the NEO4J_HOME/import directory for ingestion into the Neo4j database.

6. Upload Data into a Local Neo4j Database

After all data files have been created in step 5, run notebooks/local/2-CreateKGLocal.ipynb to import the data into your local Neo4j database. Make sure the Neo4j Browser is closed before running the database import!

7. Browse local KG in Neo4j Browser

After step 6 has completed, start the database in the Neo4j Browser to interactively explore the KG or run local queries.
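A quick way to verify the import is a label census over the whole graph, a minimal sketch:

MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC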

How can you contribute?

  • File an issue to discuss your idea so we can coordinate efforts
  • Help with specific issues
  • Suggest publicly accessible data sets
  • Add Jupyter Notebooks with data analyses, maps, and visualizations
  • Report bugs or issues

Citation

Peter W. Rose, David Valentine, Ilya Zaslavsky, COVID-19-Net: Integrating Health, Pathogen and Environmental Data into a Knowledge Graph for Case Tracking, Analysis, and Forecasting. Available online: https://github.com/covid-19-net/covid-19-community (2020).

Please also cite the data providers.

Data Providers

The schema below shows how data sources are integrated into the nodes of the Knowledge Graph.

Acknowledgements

Neo4j provided technical support and organized the community development: "GraphHackers, Let’s Unite to Help Save the World — Graphs4Good 2020".

Students of the UCSD Spatial Data Science course DSC-198: EXPLORING COVID-19 PANDEMIC WITH DATA SCIENCE

Contributors: Kaushik Ganapathy, Braden Riggs, Eric Yu

Alexander Din, U.S. Department of Housing and Urban Development, for help with HUD Crosswalk Files.

Project KONQUER team members at UC San Diego and UTHealth at Houston.

Project Pangeo hosts a Binder instance used to launch Jupyter Notebooks on the web. Pangeo is supported, in part, by the National Science Foundation (NSF) and the National Aeronautics and Space Administration (NASA). Google provided compute credits on Google Compute Engine.

Funding

Development of this prototype is in part supported by the National Science Foundation under Award Numbers:

NSF Convergence Accelerator Phase I (RAISE): Knowledge Open Network Queries for Research (KONQUER) (1937136)

NSF RAPID: COVID-19-Net: Integrating Health, Pathogen and Environmental Data into a Knowledge Graph for Case Tracking, Analysis, and Forecasting (2028411)


covid-19-community's Issues

Binder: fails to build ipycytoscape

Failed to build ipycytoscape
Pip subprocess error:
Running command git clone -q https://github.com/pwrose/ipycytoscape.git /tmp/pip-req-build-_8vpzekr
ERROR: Command errored out with exit status 1:
command: /srv/conda/envs/notebook/bin/python /srv/conda/envs/notebook/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmpdkkmg4qe
cwd: /tmp/pip-req-build-_8vpzekr
Complete output (122 lines):
running bdist_wheel
running jsdeps
Installing build dependencies with npm. This may take a while...

npm install

added 1246 packages, and audited 1313 packages in 44s

21 vulnerabilities (13 low, 5 moderate, 3 high)

To address issues that do not require attention, run:
npm audit fix

To address all issues (including breaking changes), run:
npm audit fix --force

Run npm audit for details.
npm notice
npm notice New minor version of npm available! 7.0.8 -> 7.11.0
npm notice Changelog: https://github.com/npm/cli/releases/tag/v7.11.0
npm notice Run npm install -g npm@7.11.0 to update!
npm notice

npm run build

[email protected] build
npm run build:lib && npm run build:all

[email protected] build:lib
tsc

[email protected] build:all
npm run build:labextension && npm run build:nbextension

[email protected] build:labextension
npm run clean:labextension && jupyter labextension build .

[email protected] clean:labextension
rimraf ipycytoscape/labextension

Traceback (most recent call last):
File "/tmp/pip-build-env-lgjks042/overlay/bin/jupyter-labextension", line 5, in <module>
from jupyterlab.labextensions import main
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyterlab/__init__.py", line 7, in <module>
from .labapp import LabApp
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyterlab/labapp.py", line 14, in <module>
from jupyter_server._version import version_info as jpserver_version_info
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_server/__init__.py", line 15, in <module>
from ._version import version_info, __version__
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_server/_version.py", line 5, in <module>
from jupyter_packaging import get_version_info
ImportError: cannot import name 'get_version_info' from 'jupyter_packaging' (/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/__init__.py)
npm ERR! code 1
npm ERR! path /tmp/pip-req-build-_8vpzekr
npm ERR! command failed
npm ERR! command sh -c npm run clean:labextension && jupyter labextension build .

npm ERR! A complete log of this run can be found in:
npm ERR! /home/jovyan/.npm/_logs/2021-04-23T07_57_47_862Z-debug.log
npm ERR! code 1
npm ERR! path /tmp/pip-req-build-_8vpzekr
npm ERR! command failed
npm ERR! command sh -c npm run build:labextension && npm run build:nbextension

npm ERR! A complete log of this run can be found in:
npm ERR! /home/jovyan/.npm/_logs/2021-04-23T07_57_47_889Z-debug.log
npm notice
npm notice New minor version of npm available! 7.0.8 -> 7.11.0
npm notice Changelog: https://github.com/npm/cli/releases/tag/v7.11.0
npm notice Run npm install -g npm@7.11.0 to update!
npm notice
npm ERR! code 1
npm ERR! path /tmp/pip-req-build-_8vpzekr
npm ERR! command failed
npm ERR! command sh -c npm run build:lib && npm run build:all

npm ERR! A complete log of this run can be found in:
npm ERR! /home/jovyan/.npm/_logs/2021-04-23T07_57_47_919Z-debug.log
/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/setuptools/dist.py:645: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
% (opt, underscore_opt))
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in <module>
main()
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 205, in build_wheel
metadata_directory)
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 222, in build_wheel
wheel_directory, config_settings)
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 207, in _build_with_temp_dir
self.run_setup()
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 150, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 123, in <module>
setup(**setup_args)
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/srv/conda/envs/notebook/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/srv/conda/envs/notebook/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/srv/conda/envs/notebook/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/setupbase.py", line 503, in run
[self.run_command(cmd) for cmd in cmds]
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/setupbase.py", line 503, in <listcomp>
[self.run_command(cmd) for cmd in cmds]
File "/srv/conda/envs/notebook/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/srv/conda/envs/notebook/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/setupbase.py", line 274, in run
c.run()
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/setupbase.py", line 381, in run
run(npm_cmd + ['run', build_cmd], cwd=node_package)
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/setupbase.py", line 225, in run
return subprocess.check_call(cmd, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/srv/conda/envs/notebook/bin/npm', 'run', 'build']' returned non-zero exit status 1.

ERROR: Failed building wheel for ipycytoscape
ERROR: Could not build wheels for ipycytoscape which use PEP 517 and cannot be installed directly

CondaEnvException: Pip failed

4.0 pyhd8ed1ab_0 conda-forge/noarch 36 KB
libgcc-ng 9.2.0 h24d8f2e_2 installed
libgcc-ng 9.3.0 h2828fa1_19 conda-forge/linux-64 8 MB
libgomp 9.2.0 h24d8f2e_2 installed
libgomp 9.3.0 h2828fa1_19 conda-forge/linux-64 376 KB
libstdcxx-ng 9.2.0 hdf63c60_2 installed
libstdcxx-ng 9.3.0 h6de172a_19 conda-forge/linux-64 4 MB
openssl 1.1.1g h516909a_0 installed
openssl 1.1.1k h7f98852_0 conda-forge/linux-64 2 MB
sqlite 3.32.3 hcee41ef_1 installed
sqlite 3.35.5 h74cdb3f_0 conda-forge/linux-64 1 MB
tornado 6.0.4 py37h8f50634_1 installed
tornado 6.1 py37h5e8e339_1 conda-forge/linux-64 646 KB

Summary:

Install: 177 packages
Upgrade: 9 packages

Total download: 544 MB

──────────────────────────────────────────────────────────────────────────────────────

time: 293.282
Removing intermediate container d26330eb3b1f
The command '/bin/sh -c TIMEFORMAT='time: %3R' bash -c 'time mamba env update -p ${NB_PYTHON_PREFIX} -f "binder/environment.yml" && time mamba clean --all -f -y && mamba list -p ${NB_PYTHON_PREFIX} '' returned a non-zero code: 1

How to run these notebooks

I can run the Cypher queries with Neo4j; however, I am confused about how to use the notebooks. Is my assumption correct that we first need to have the graph ready by running all the Cypher queries, and then connect the notebooks to the Neo4j source to run them?

Database update failed

Fri Apr 9 14:00:02 UTC 2021
The transaction has been terminated. Retry your operation in a new transaction, and you should see a successful result. The transaction has been terminated, so no more locks can be acquired. This can occur because the transaction ran longer than the configured transaction timeout, or because a human operator manually terminated the transaction, or because the database is shutting down. ForsetiClient[7]

Files in Geolink are obsolete

It looks like these files have been marked as obsolete; is 10-GeoLink still needed?

df4 = pd.read_csv(NEO4J_IMPORT / '02b-CDSCases.csv', dtype='str', usecols=['origLocation'])
df5 = pd.read_csv(NEO4J_IMPORT / '02d-GOBMXCasesAdmin1.csv', dtype='str', usecols=['origLocation'])
df6 = pd.read_csv(NEO4J_IMPORT / '02d-GOBMXCasesAdmin2.csv', dtype='str', usecols=['origLocation'])

01g-PfamDomainPDB.ipynb fails

The data format of the ftp://ftp.ebi.ac.uk/pub/databases/Pfam/mappings/pdb_pfam_mapping.txt file has changed with its latest update.

Excel xlsx format not supported in xlrd >= 2.0

Several notebooks that load Excel files in the xlsx format don't work anymore:

~/miniconda3/envs/covid-19-community/lib/python3.7/site-packages/pandas/io/excel/_base.py in __init__(self, path_or_buffer, engine, storage_options)
1079 if xlrd_version >= "2":
1080     raise ValueError(
-> 1081     f"Your version of xlrd is {xlrd_version}. In xlrd >= 2.0, "
1082     f"only the xls format is supported. Install openpyxl instead."
1083 )

ValueError: Your version of xlrd is 2.0.1. In xlrd >= 2.0, only the xls format is supported. Install openpyxl instead.

Out of memory error in 01c-CNCBVariant.ipynb

MemoryError Traceback (most recent call last)
in <module>
1 unique_var = variations[['id', 'variantType', 'start', 'end', 'ref', 'alt', 'variantConsequence',
2 'proteinVariant', 'geneVariant', 'distance', 'proteinPosition', 'proteinAccession',
----> 3 'taxonomyId', 'referenceGenome']].copy()

Not able to generate the dataset completely

In the README, for the data preparation step, should the files inside dataprep be run numerically or alphanumerically? When I tried to run them numerically, i.e., starting with 00b-NCBITaxonomy.ipynb, an error occurred. Also, when I ran them alphanumerically, 5 files reported errors because they could not find the required files in the cache folder.

Starting neo4j on Jupyter notebook (Windows)

In 3-ExampleQueries, the code for starting Neo4j from the Jupyter Notebook does not appear to work on my machine (a Windows 10 PC). However, it seems this can be resolved by adjusting some of the code.

What worked for me:
Instead of !"$NEO4J_HOME"/bin/neo4j start, switch the direction of the slashes to !"$NEO4J_HOME"\bin\neo4j start.
The command !"$NEO4J_HOME"\bin\neo4j start may also result in an error "Service neo4j not found". This can be resolved by running !"$NEO4J_HOME"\bin\neo4j uninstall-service, and then !"$NEO4J_HOME"\bin\neo4j install-service.

I am unsure whether the issues I encountered are Windows-specific or particular to my own machine, but the changes above allow me to use the notebook as intended.

I can contribute to modeling Events (Covid Test, Information, Death) related to a Person

I can contribute to modeling Events (Covid Test, Information, Death) related to a Person.

Related to that, I would like to collaborate on modeling relationships between people and on modeling cell phone tracking and location.

Furthermore, I could contribute to modeling the tracking of people's locations based on the use of credit cards in shops, ATMs, etc.

Thanks in advance for your consideration.

Mario Íñiguez

01c-CNCBStrain.ipynb fails with key error

CNCB seems to have fixed the issue with the `Host column name; the notebook needs to be updated to use the new column name.

KeyError Traceback (most recent call last)
in
1 # assign taxonomy id to host
----> 2 df['host'] = df['`Host'].str.strip()

Exception in 10a-GeoLink.ipynb

Exception encountered at "In [116]":

ValueError Traceback (most recent call last)
in <module>
1 df[['geoName1','geoName2']] = df[['geoName1','geoName2']].apply(lambda x: x if x[0] != '' else [x[1],x[0]], axis=1)
----> 2 df[['geoName2','geoName3']] = df[['geoName2','geoName3']].apply(lambda x: x if x[0] != '' else [x[1],x[0]], axis=1)

No processed cncb csv files

As a result, the CNCBVariant notebook is broken, with all cells reporting that parsing failed; in fact, there are no CSV files.

Download of CNCB variant annotation fails

*** Downlading CNCB Strain Data ***
Downloading: ftp://download.big.ac.cn/GVM/Coronavirus/gff3/a
--2021-04-06 01:30:07-- ftp://download.big.ac.cn/GVM/Coronavirus/gff3/a
=> ‘/var/lib/neo4j/import/cache/raw/cncb/a/.listing’
Resolving download.big.ac.cn (download.big.ac.cn)... 124.16.164.229
Connecting to download.big.ac.cn (download.big.ac.cn)|124.16.164.229|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /GVM/Coronavirus/gff3 ...
No such directory ‘GVM/Coronavirus/gff3’.

re. Question on communication between Covid-19-related projects using Neo4j

Dear All,

I am writing to briefly introduce the COVID-19 Disease Map community project: https://covid.pages.uni.lu/, which includes manually curated, domain-expert-approved information on COVID-19 disease mechanisms. We also have a Neo4j-dedicated component, and we are interested in establishing communication between COVID-19-Net and the COVID-19 Disease Map project. I look forward to learning your opinion on this.

Thank you for your time.

Best regards,
Irina Balaur

01c-CNCBStrain.csv is not available anymore

It seems that 01c-CNCBStrain.csv is gone and 01c-CNCBStrainPre.csv is the only file available. The Id column also doesn't exist anymore. This is breaking a few notebooks and Cypher queries.
