covid-19-net / covid-19-community

118.0 9.0 77.0 38.58 MB

Community effort to build a Neo4j Knowledge Graph (KG) that links heterogeneous data about COVID-19

License: MIT License

Shell 0.03% Jupyter Notebook 99.97%

graphs4good covid-19 coronavirus knowledge-graph jupyter binder neo4j

covid-19-community's Issues

Launch Binder - service unavailable

The binder.pangeo.io service has been shut down due to crypto mining.

Use uniprot.chain prefix for cleavage products

https://registry.identifiers.org/registry/uniprot.chain#!

Download for MX COVID-19 confirmed cases and deaths fails

The URLs for downloading the Mexican confirmed cases and deaths return a 404 error (see: https://github.com/covid-19-net/covid-19-community/blob/master/notebooks/dataprep/02d-GOBMXCases.ipynb).

We need to find out if there is a different way to download these data, e.g., from here

Metadata Update

Hello,
Your dataset was added to CoronaWhy (https://www.coronawhy.org/) Data Lake on Dataverse as a piece of common COVID-19 data https://datasets.coronawhy.org/dataset.xhtml?persistentId=doi:10.5072/FK2/D1Q9MF
Would you be willing to help with the maintenance of your dataset in Dataverse, e.g. adding the relevant metadata and keeping the dataset up-to-date? That will help to make the dataset findable and accessible for the medical science community.

Neo4j startup in Jupyter Lab

We start the Neo4j database in Jupyter Lab. How can we check that the database is up and running before we start executing Cypher queries with py2neo?

See: https://github.com/covid-19-net/covid-19-community/blob/master/notebooks/3-ExampleQueries.ipynb
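
One possible approach, offered as a minimal sketch rather than the project's actual solution, is to poll the database port until it accepts TCP connections. The function name `wait_for_port` and the defaults (localhost, Bolt port 7687, 60 second timeout) are assumptions for illustration. An open port only shows that the server is listening, so a trivial query such as `RETURN 1` through py2neo may still be a sensible follow-up check.

```python
import socket
import time


def wait_for_port(host="localhost", port=7687, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    host, port, timeout and interval are illustrative defaults, not values
    taken from the notebooks.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In a notebook, the call `wait_for_port()` could be placed between the cell that starts Neo4j and the first cell that runs a query.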

Time series representation of outbreak data

Explore options to represent time series data about COVID-19 outbreak in the Neo4j graph.

The data are here:
https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

See for example:
https://community.neo4j.com/t/how-to-model-timeseries-data-in-property-graph/8713
https://github.com/graphaware/neo4j-timetree
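
Whichever graph model is chosen, the wide CSV (one column per date) would probably need to be reshaped into one record per location and date first. The sketch below shows that step only. It assumes the file has the identifier columns `Province/State`, `Country/Region`, `Lat`, `Long` followed by date columns in `m/d/yy` form; the function name `wide_to_long` and the output keys are hypothetical.

```python
import csv
import io
from datetime import datetime

# Assumed identifier columns; every other column is treated as a date.
ID_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def wide_to_long(csv_text):
    """Turn one row per location into one record per (location, date)."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column in ID_COLUMNS or value == "":
                continue
            # Convert e.g. "1/22/20" to the ISO date "2020-01-22".
            date = datetime.strptime(column, "%m/%d/%y").date().isoformat()
            records.append({
                "province": row["Province/State"],
                "country": row["Country/Region"],
                "date": date,
                "confirmed": int(value),
            })
    return records
```

Each resulting record could then become a node or a dated relationship, depending on which of the modeling options above is adopted.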

Download of CNCB variant annotation fails

*** Downlading CNCB Strain Data ***
Downloading: ftp://download.big.ac.cn/GVM/Coronavirus/gff3/a
--2021-04-06 01:30:07-- ftp://download.big.ac.cn/GVM/Coronavirus/gff3/a
=> ‘/var/lib/neo4j/import/cache/raw/cncb/a/.listing’
Resolving download.big.ac.cn (download.big.ac.cn)... 124.16.164.229
Connecting to download.big.ac.cn (download.big.ac.cn)|124.16.164.229|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /GVM/Coronavirus/gff3 ...
No such directory ‘GVM/Coronavirus/gff3’.

Download of SDHHSA data is failing

01f-PDBStructure.ipynb: PDBj web service fails

The notebook 01f-PDBStructure.ipynb fails because the PDBj web service is broken.

The web service URL https://pdbj.org/rest/mine2_sql does not work anymore. We need to find a replacement. I emailed PDBj to see if there is a new service we can use.

Update pangolin software version in 01c-CNCBStrain.ipynb

The new version is: 3.0.3

Downloading SARS-CoV-2 Variation Data fails in dataprep/01c-CNCBVariant.ipynb notebook

It throws 'error_perm: 550 Failed to change directory' while downloading and caching data files with variant information. Can you please upload the data itself to the repository if possible?

How to run these notebooks

I ran the Cypher queries with Neo4j, but I am confused about how to use the notebooks. Is my assumption correct that we first need to build the graph by running all the Cypher queries, and then connect the notebooks to the Neo4j database to run them?

Exception in 10a-GeoLink.ipynb

Exception encountered at "In [116]":

ValueError Traceback (most recent call last)
in
1 df[['geoName1','geoName2']] = df[['geoName1','geoName2']].apply(lambda x: x if x[0] != '' else [x[1],x[0]], axis=1)
----> 2 df[['geoName2','geoName3']] = df[['geoName2','geoName3']].apply(lambda x: x if x[0] != '' else [x[1],x[0]], axis=1)

Not able to generate the dataset completely

In the README, for the data preparation step, should the files inside dataprep be run in numerical or alphanumerical order? When I tried to run them numerically, i.e., 00b-NCBITaxonomy.ipynb, an error occurred. Also, when I ran them alphanumerically, 5 files reported errors because they could not find the required files in the cache folder.

Files in Geolink are obsolete

It looks like these have been marked as obsolete. Is 10-GeoLink still needed?

df4 = pd.read_csv(NEO4J_IMPORT / '02b-CDSCases.csv', dtype='str', usecols=['origLocation'])
df5 = pd.read_csv(NEO4J_IMPORT / '02d-GOBMXCasesAdmin1.csv', dtype='str', usecols=['origLocation'])
df6 = pd.read_csv(NEO4J_IMPORT / '02d-GOBMXCasesAdmin2.csv', dtype='str', usecols=['origLocation'])

USCensus zip level API returns an additional column: state

Use log scale for choropleth

Use a log scale to display the number of strains per country in:
https://github.com/covid-19-net/covid-19-community/blob/master/notebooks/analyses/StrainB.1.1.7.ipynb
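
A minimal sketch of the transformation, independent of whichever plotting library the notebook uses: take the base-10 logarithm of each count before passing it to the color scale. The function name `log_scale` and the mapping shape (country to count) are assumptions; countries with a count of zero are dropped because the logarithm is undefined there.

```python
import math


def log_scale(counts):
    """Map {country: count} to {country: log10(count)}.

    Non-positive counts are skipped because log10 is undefined for them.
    """
    return {
        country: math.log10(count)
        for country, count in counts.items()
        if count > 0
    }
```

The legend would then need tick labels converted back to raw counts (10, 100, 1000, ...) so that readers do not have to interpret logarithms.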

Neo4j graph visualization or plugin for Jupyter Lab

A Jupyter Lab plugin is urgently needed to visualize and interactively explore the Neo4j KG.

Here is a list of JS libraries:
https://neo4j.com/developer/tools-graph-visualization/

https://ipython-cypher.readthedocs.io/en/latest/

https://nicolewhite.github.io/neo4j-jupyter/hello-world.html

re. Question on communication between Covid-19-related projects using Neo4j

Dear All,

I am writing to briefly introduce the COVID-19 Disease Map community project: https://covid.pages.uni.lu/, which includes manually curated, domain-expert-approved information on COVID-19 disease mechanisms. We also have a Neo4j-dedicated component, and we are interested in establishing communication between the COVID-19-Net and COVID-19 Disease Map projects. I look forward to hearing your opinion on this.

Thank you for your time.

Best regards,
Irina Balaur

Binder: fails to build ipycytoscape

Failed to build ipycytoscape
Pip subprocess error:
Running command git clone -q https://github.com/pwrose/ipycytoscape.git /tmp/pip-req-build-_8vpzekr
ERROR: Command errored out with exit status 1:
command: /srv/conda/envs/notebook/bin/python /srv/conda/envs/notebook/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmpdkkmg4qe
cwd: /tmp/pip-req-build-_8vpzekr
Complete output (122 lines):
running bdist_wheel
running jsdeps
Installing build dependencies with npm. This may take a while...

npm install

added 1246 packages, and audited 1313 packages in 44s

21 vulnerabilities (13 low, 5 moderate, 3 high)

To address issues that do not require attention, run:
npm audit fix

To address all issues (including breaking changes), run:
npm audit fix --force

Run npm audit for details.
npm notice
npm notice New minor version of npm available! 7.0.8 -> 7.11.0
npm notice Changelog: https://github.com/npm/cli/releases/tag/v7.11.0
npm notice Run npm install -g [email protected] to update!
npm notice

npm run build

[email protected] build
npm run build:lib && npm run build:all

[email protected] build:lib
tsc

[email protected] build:all
npm run build:labextension && npm run build:nbextension

[email protected] build:labextension
npm run clean:labextension && jupyter labextension build .

[email protected] clean:labextension
rimraf ipycytoscape/labextension

Traceback (most recent call last):
File "/tmp/pip-build-env-lgjks042/overlay/bin/jupyter-labextension", line 5, in
from jupyterlab.labextensions import main
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyterlab/init.py", line 7, in
from .labapp import LabApp
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyterlab/labapp.py", line 14, in
from jupyter_server._version import version_info as jpserver_version_info
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_server/init.py", line 15, in
from ._version import version_info, version
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_server/_version.py", line 5, in
from jupyter_packaging import get_version_info
ImportError: cannot import name 'get_version_info' from 'jupyter_packaging' (/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/init.py)
npm ERR! code 1
npm ERR! path /tmp/pip-req-build-_8vpzekr
npm ERR! command failed
npm ERR! command sh -c npm run clean:labextension && jupyter labextension build .

npm ERR! A complete log of this run can be found in:
npm ERR! /home/jovyan/.npm/_logs/2021-04-23T07_57_47_862Z-debug.log
npm ERR! code 1
npm ERR! path /tmp/pip-req-build-_8vpzekr
npm ERR! command failed
npm ERR! command sh -c npm run build:labextension && npm run build:nbextension

npm ERR! A complete log of this run can be found in:
npm ERR! /home/jovyan/.npm/_logs/2021-04-23T07_57_47_889Z-debug.log
npm notice
npm notice New minor version of npm available! 7.0.8 -> 7.11.0
npm notice Changelog: https://github.com/npm/cli/releases/tag/v7.11.0
npm notice Run npm install -g [email protected] to update!
npm notice
npm ERR! code 1
npm ERR! path /tmp/pip-req-build-_8vpzekr
npm ERR! command failed
npm ERR! command sh -c npm run build:lib && npm run build:all

npm ERR! A complete log of this run can be found in:
npm ERR! /home/jovyan/.npm/_logs/2021-04-23T07_57_47_919Z-debug.log
/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/setuptools/dist.py:645: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
% (opt, underscore_opt))
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in
main()
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
json_out['return_val'] = hook(hook_input['kwargs'])
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 205, in build_wheel
metadata_directory)
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 222, in build_wheel
wheel_directory, config_settings)
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 207, in _build_with_temp_dir
self.run_setup()
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/setuptools/build_meta.py", line 150, in run_setup
exec(compile(code, file, 'exec'), locals())
File "setup.py", line 123, in
setup(setup_args)
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/setuptools/init.py", line 153, in setup
return distutils.core.setup(attrs)
File "/srv/conda/envs/notebook/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/srv/conda/envs/notebook/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/srv/conda/envs/notebook/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/setupbase.py", line 503, in run
[self.run_command(cmd) for cmd in cmds]
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/setupbase.py", line 503, in
[self.run_command(cmd) for cmd in cmds]
File "/srv/conda/envs/notebook/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/srv/conda/envs/notebook/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/setupbase.py", line 274, in run
c.run()
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/setupbase.py", line 381, in run
run(npm_cmd + ['run', build_cmd], cwd=node_package)
File "/tmp/pip-build-env-lgjks042/overlay/lib/python3.7/site-packages/jupyter_packaging/setupbase.py", line 225, in run
return subprocess.check_call(cmd, kwargs)
File "/srv/conda/envs/notebook/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/srv/conda/envs/notebook/bin/npm', 'run', 'build']' returned non-zero exit status 1.

ERROR: Failed building wheel for ipycytoscape
ERROR: Could not build wheels for ipycytoscape which use PEP 517 and cannot be installed directly

CondaEnvException: Pip failed

4.0 pyhd8ed1ab_0 conda-forge/noarch 36 KB
libgcc-ng 9.2.0 h24d8f2e_2 installed
libgcc-ng 9.3.0 h2828fa1_19 conda-forge/linux-64 8 MB
libgomp 9.2.0 h24d8f2e_2 installed
libgomp 9.3.0 h2828fa1_19 conda-forge/linux-64 376 KB
libstdcxx-ng 9.2.0 hdf63c60_2 installed
libstdcxx-ng 9.3.0 h6de172a_19 conda-forge/linux-64 4 MB
openssl 1.1.1g h516909a_0 installed
openssl 1.1.1k h7f98852_0 conda-forge/linux-64 2 MB
sqlite 3.32.3 hcee41ef_1 installed
sqlite 3.35.5 h74cdb3f_0 conda-forge/linux-64 1 MB
tornado 6.0.4 py37h8f50634_1 installed
tornado 6.1 py37h5e8e339_1 conda-forge/linux-64 646 KB

Summary:

Install: 177 packages
Upgrade: 9 packages

Total download: 544 MB

──────────────────────────────────────────────────────────────────────────────────────

time: 293.282
Removing intermediate container d26330eb3b1f
The command '/bin/sh -c TIMEFORMAT='time: %3R' bash -c 'time mamba env update -p ${NB_PYTHON_PREFIX} -f "binder/environment.yml" && time mamba clean --all -f -y && mamba list -p ${NB_PYTHON_PREFIX} '' returned a non-zero code: 1

/reference_data/SpecialLocations.csv does not exist

While running the JHUCasesLocations notebook, I get an error that /reference_data/SpecialLocations.csv does not exist. Please help.

Corona Data Scraper is defunct

Corona Data Scraper stopped updating around October 2020.

Improve legend in Coronavirus3DStructures.ipynb

The plot "Cummulative {pathogen} structures by release date" should use a unique color for each type of protein.

notebooks/analyses/Coronavirus3DStructures.ipynb

Add "feline" to OrganismDirectory.csv

Map "feline" to taxonomy:9685

Can not reach the neo4j browser (http://132.249.238.185:7474/)

Automate batch upload of node and relationship files

Batch upload all node and relationship files in the /data directory:

See: https://github.com/covid-19-net/covid-19-community/blob/master/notebooks/2-CreateGraph.ipynb

COVID-19-Net KG: cannot connect with browser

installation of py2neo 4.3.0 doesn't work on Windows 10

COVID-19 Case Counts from Gobierno de Mexico - URLs are defunct

The URLs that were used to get COVID-19 case counts are not available anymore.

No processed cncb csv files

As a result, the CNCBVariant notebook is broken: every file is reported as a parsing failure, when in fact there are no CSV files.

Launch Binder: Error during build: list index out of range

Waiting for build to start...
Picked Git content provider.
Cloning into '/tmp/repo2dockerwxqq17u1'...
HEAD is now at e75b839 Merge pull request #106 from pwrose/master
Error during build: list index out of range

Posted issue on Binder Gitter channel.

Starting neo4j on Jupyter notebook (Windows)

In 3-ExampleQueries, the code for starting Neo4J from the Jupyter Notebook does not appear to work on my machine (Windows 10 PC). However, it seems like this can be resolved by adjusting some of the code.

What worked for me:
Instead of !"$NEO4J_HOME"/bin/neo4j start, switch the direction of the slashes to !"$NEO4J_HOME"\bin\neo4j start.
The command !"$NEO4J_HOME"\bin\neo4j start may also result in an error "Service neo4j not found". This can be resolved by running !"$NEO4J_HOME"\bin\neo4j uninstall-service, and then !"$NEO4J_HOME"\bin\neo4j install-service.

I am unsure whether the issues I encountered are Windows-specific or specific to my own machine, but the above are the changes that allow me to use the notebook as intended.
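
One way to avoid hard-coding the slash direction is to build the path with `pathlib`, which uses the separator of the current platform. This is a sketch under assumptions: the helper name `neo4j_command` is hypothetical, and it assumes the launcher is `bin/neo4j` on Unix-like systems and `bin\neo4j.bat` on Windows. It does not address the "Service neo4j not found" error, which still needs the uninstall-service and install-service steps described above.

```python
import os
from pathlib import Path


def neo4j_command(neo4j_home, action):
    """Build a platform-appropriate command list for the Neo4j launcher.

    Assumes the launcher lives in <neo4j_home>/bin and is named neo4j.bat
    on Windows and neo4j elsewhere.
    """
    executable = "neo4j.bat" if os.name == "nt" else "neo4j"
    return [str(Path(neo4j_home) / "bin" / executable), action]


# Possible use in a notebook cell (requires NEO4J_HOME to be set):
#   import subprocess
#   subprocess.run(neo4j_command(os.environ["NEO4J_HOME"], "start"), check=True)
```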

I can contribute to model Events (Covid Test, Information, Death) related to a Person

I can contribute to model Events (Covid Test, Information, Death) related to a Person.

Related to that, I would like to collaborate on modeling relationships between people, as well as cell phone tracking and location.

Furthermore, I could help model the tracking of people's locations based on the use of credit cards in shops, ATMs, etc.

Thanks in advance for your consideration.

Mario Íñiguez

01c-CNCBStrain.csv is not available anymore

It seems like 01c-CNCBStrain.csv is gone and 01c-CNCBStrainPre.csv is the only file available. The Id column also doesn't exist anymore. This is breaking a few notebooks and Cypher queries.

Neo4j database shutdown

In the following notebook https://github.com/covid-19-net/covid-19-community/blob/master/notebooks/3-ExampleQueries.ipynb we stop the Neo4j database at the end.

Often, the shutdown fails.

Does anyone know what causes this issue and how to shut down cleanly?

Out of memory error in 01c-CNCBVariant.ipynb

MemoryError Traceback (most recent call last)
in
1 unique_var = variations[['id', 'variantType', 'start', 'end', 'ref', 'alt', 'variantConsequence',
2 'proteinVariant', 'geneVariant', 'distance', 'proteinPosition', 'proteinAccession',
----> 3 'taxonomyId', 'referenceGenome']].copy()

01c-CNCBStrain.ipynb fails with key error

CNCB seems to have fixed the issue with the `Host column name. We need to update the column name used in the notebook accordingly.

KeyError Traceback (most recent call last)
in
1 # assign taxonomy id to host
----> 2 df['host'] = df['`Host'].str.strip()
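
To make the notebook tolerant of this kind of upstream renaming, the column could be looked up from a list of known spellings instead of a single hard-coded name. The sketch below is an illustration only: the helper `find_column` is hypothetical, and the candidate names `` `Host `` and `Host` are assumed from the error above rather than checked against the current CNCB file.

```python
def find_column(columns, candidates):
    """Return the first candidate name that is present in columns.

    Raises KeyError listing both sets of names if none of them match.
    """
    for name in candidates:
        if name in columns:
            return name
    raise KeyError(f"none of {list(candidates)} found in {list(columns)}")


# Possible use with the DataFrame from the notebook:
#   host_column = find_column(df.columns, ["`Host", "Host"])
#   df['host'] = df[host_column].str.strip()
```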

01g-PfamDomainPDB.ipynb fails

The data format of the ftp://ftp.ebi.ac.uk/pub/databases/Pfam/mappings/pdb_pfam_mapping.txt file has changed since they updated it.

Remove notebooks that use GOBMX data

GOBMX data are not available anymore.

Remove redundant Spike glycoprotein entries

The Spike glycoprotein is included twice in KG. This is due to inconsistencies in the IntAct database.

Excel xlsx format not supported in xlrd >= 2.0

Several notebooks that load Excel files in the xlsx format don't work anymore:

~/miniconda3/envs/covid-19-community/lib/python3.7/site-packages/pandas/io/excel/_base.py in __init__(self, path_or_buffer, engine, storage_options)
1079 if xlrd_version >= "2":
1080 raise ValueError(
-> 1081 f"Your version of xlrd is {xlrd_version}. In xlrd >= 2.0, "
1082 f"only the xls format is supported. Install openpyxl instead."
1083 )

ValueError: Your version of xlrd is 2.0.1. In xlrd >= 2.0, only the xls format is supported. Install openpyxl instead.
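
Following the hint in the error message, one likely fix is to install openpyxl and pass it to pandas through the `engine` argument of `pd.read_excel`. The helper below is a sketch that picks the engine from the file extension; the name `excel_engine` is hypothetical, and the affected notebooks may only need the xlsx branch.

```python
from pathlib import Path


def excel_engine(path):
    """Choose a pandas Excel engine based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".xlsx":
        return "openpyxl"  # xlrd >= 2.0 no longer reads xlsx files
    if suffix == ".xls":
        return "xlrd"
    raise ValueError(f"unsupported Excel extension: {suffix!r}")


# Possible use in a notebook (requires pandas and openpyxl to be installed):
#   df = pd.read_excel(path, engine=excel_engine(path))
```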

Daily update process has stopped due to out of memory error

We need to refactor the code that processes variants to re-instate the update process.

VM that hosts Neo4j DB is down

Database update failed

Fri Apr 9 14:00:02 UTC 2021
The transaction has been terminated. Retry your operation in a new transaction,
and you should see a successful result. The transaction has been terminated, so
no more locks can be acquired. This can occur because the transaction ran longer
than the configured transaction timeout, or because a human operator manually
terminated the transaction, or because the database is shutting down.
ForsetiClient[7]
