elswob / neo4j-build-pipeline
A Snakemake and Docker based pipeline to create a Neo4j graph.
License: MIT License
Profiling ot-drug-target with 10 threads, limiting to 100000 random rows
workflow/scripts/utils/pandas-profiling.sh: line 15: shuf: command not found
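`shuf` is part of GNU coreutils and is missing on some systems (macOS, for instance, does not ship it by default). One possible workaround is to do the random row sampling in Python instead of the shell. The sketch below uses reservoir sampling to pick up to `k` rows from a stream without loading it all into memory. The function name `sample_rows` and its parameters are hypothetical and not part of the pipeline.

```python
import random
from typing import Iterable, List, Optional


def sample_rows(rows: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k rows chosen uniformly at random (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    # Hypothetical input: 1000 generated rows standing in for a data file.
    rows = [f"row-{n}" for n in range(1000)]
    picked = sample_rows(rows, 10, seed=42)
    print(len(picked))
```

Passing an open file handle as `rows` would sample lines from a file in a single pass. Header handling is left out here and would need to match whatever the profiling script expects.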
Snakemake expand needs to find both nodes and rels in the config, e.g.
https://github.com/elswob/neo4j-build-pipeline/blob/main/workflow/Snakefile#L233
Using pandas to_csv and read_csv as a core part of the build process means that lists of floats, e.g. vectors, can't be included, as they get converted to strings when written and are read back as strings. Could switch to a different format, e.g. pickle.
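The effect can be illustrated without pandas: anything stored in a CSV cell comes back as text, so a list has to be encoded on write and decoded on read. The sketch below is one possible alternative to switching formats, assuming JSON is an acceptable encoding for the vector column. The helper names `encode_vector`, `decode_vector` and `round_trip` are hypothetical; with pandas, a decoder like this could be supplied through the `converters` argument of `read_csv`.

```python
import csv
import io
import json
from typing import List


def encode_vector(vec: List[float]) -> str:
    """Serialise a vector to a JSON string so it fits in a single CSV cell."""
    return json.dumps(vec)


def decode_vector(cell: str) -> List[float]:
    """Parse a CSV cell written by encode_vector back into a list of floats."""
    return [float(x) for x in json.loads(cell)]


def round_trip(vec: List[float]) -> str:
    """Write one row holding the vector to in-memory CSV and read the cell back."""
    buf = io.StringIO()
    csv.writer(buf).writerow(["n1", encode_vector(vec)])
    buf.seek(0)
    return next(csv.reader(buf))[1]


if __name__ == "__main__":
    cell = round_trip([0.1, 0.2, 0.3])
    print(type(cell).__name__)  # the cell comes back as text, not a list
    print(decode_vector(cell))  # explicit decoding restores the floats
```

Compared with pickle, this keeps the intermediate files human-readable, at the cost of an explicit decode step wherever the column is read.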
Certain special characters in the password cause problems, e.g. &
Exception in thread "Thread-17" java.lang.RuntimeException: org.neo4j.internal.batchimport.cache.idmapping.string.DuplicateInputIdException: Id 'CAL' is defined more than once in group 'Drug-ID'
    at org.neo4j.internal.batchimport.staging.AbstractStep.issuePanic(AbstractStep.java:148)
    at org.neo4j.internal.batchimport.staging.AbstractStep.issuePanic(AbstractStep.java:140)
    at org.neo4j.internal.batchimport.staging.LonelyProcessingStep.lambda$receive$0(LonelyProcessingStep.java:59)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.neo4j.internal.batchimport.cache.idmapping.string.DuplicateInputIdException: Id 'CAL' is defined more than once in group 'Drug-ID'
    at org.neo4j.internal.batchimport.input.BadCollector$NodesProblemReporter.exception(BadCollector.java:280)
    at org.neo4j.internal.batchimport.input.BadCollector.collect(BadCollector.java:170)
    at org.neo4j.internal.batchimport.input.BadCollector.collectDuplicateNode(BadCollector.java:137)
    at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.detectDuplicateInputIds(EncodingIdMapper.java:616)
    at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.buildCollisionInfo(EncodingIdMapper.java:532)
    at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.prepare(EncodingIdMapper.java:247)
    at org.neo4j.internal.batchimport.IdMapperPreparationStep.process(IdMapperPreparationStep.java:54)
    at org.neo4j.internal.batchimport.staging.LonelyProcessingStep.lambda$receive$0(LonelyProcessingStep.java:53)
    ... 1 more
Duplicate input ids that would otherwise clash can be put into separate id space.
Caused by:Id 'CAL' is defined more than once in group 'Drug-ID'
WARNING Import failed. The store files in /data/databases/neo4j are left as they are, although they are likely in an unusable state. Starting a database on these store files will likely fail or observe inconsistent records so start at your own risk or delete the store manually
org.neo4j.internal.batchimport.cache.idmapping.string.DuplicateInputIdException: Id 'CAL' is defined more than once in group 'Drug-ID'
    at org.neo4j.internal.batchimport.input.BadCollector$NodesProblemReporter.exception(BadCollector.java:280)
    at org.neo4j.internal.batchimport.input.BadCollector.collect(BadCollector.java:170)
    at org.neo4j.internal.batchimport.input.BadCollector.collectDuplicateNode(BadCollector.java:137)
    at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.detectDuplicateInputIds(EncodingIdMapper.java:616)
    at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.buildCollisionInfo(EncodingIdMapper.java:532)
    at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.prepare(EncodingIdMapper.java:247)
    at org.neo4j.internal.batchimport.IdMapperPreparationStep.process(IdMapperPreparationStep.java:54)
    at org.neo4j.internal.batchimport.staging.LonelyProcessingStep.lambda$receive$0(LonelyProcessingStep.java:53)
    at java.base/java.lang.Thread.run(Thread.java:834)
[Thu Oct 29 16:12:01 2020]
Error in rule create_graph:
    jobid: 1
    output: test/results/logs/master_import.log
    log: test/results/logs/create_graph.log (check log file(s) for error message)
    shell:
        echo 'Starting database...'
        #force load of .env file if it exists to avoid docker issues with cached variables
        if [ -f .env ]; then export $(cat .env | sed 's/#.*//g' | xargs); fi
        #create neo4j directories if not already done
        echo 'Creating Neo4j graph directories'
        python -m workflow.scripts.graph_build.create_neo4j > test/results/logs/create_graph.log
        #create container
        docker-compose up -d --no-recreate > test/results/logs/create_graph.log
        echo 'removing old database...'
        docker exec --user neo4j neo4j-pipeline-demo-graph sh -c 'rm -rf /var/lib/neo4j/data/databases/neo4j' > test/results/logs/create_graph.log
        docker exec --user neo4j neo4j-pipeline-demo-graph sh -c 'rm -f /var/lib/neo4j/data/transactions/neo4j/*' > test/results/logs/create_graph.log
        echo 'running import...'
        SECONDS=0
        docker exec --user neo4j neo4j-pipeline-demo-graph sh /var/lib/neo4j/import/master_import.sh > test/results/logs/master_import.log
        duration=$SECONDS
        echo "Import took $(($duration / 60)) minutes and $(($duration % 60)) seconds."
        echo 'stopping container neo4j-pipeline-demo-graph...'
        docker stop neo4j-pipeline-demo-graph
        echo 'starting container neo4j-pipeline-demo-graph...'
        docker start neo4j-pipeline-demo-graph
        echo 'waiting a bit...'
        sleep 30
        echo 'adding contraints and extra bits...'
        docker exec --user neo4j neo4j-pipeline-demo-graph sh /var/lib/neo4j/import/master_constraints.sh > test/results/logs/master_constraints.log
        echo 'waiting a bit for indexes to populate...'
        sleep 30
        echo 'checking import report...'
        python -m workflow.scripts.graph_build.import-report-check test/neo4j/0.0.1/logs/import.report > test/results/logs/master_import.log
        echo 'Neo4j browser available here: localhost:27474/browser'
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job create_graph since they might be corrupted:
test/results/logs/master_import.log
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
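The import fails because the same ID ('CAL') appears more than once in one ID group. One way to catch this earlier would be to check each node file for repeated IDs before the import runs. The following is a minimal sketch under assumptions: the column names `id` and `name`, the sample rows, and the helper `find_duplicate_ids` are all hypothetical and do not come from the pipeline.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Mapping


def find_duplicate_ids(rows: Iterable[Mapping[str, str]], id_column: str) -> Dict[str, int]:
    """Return each ID that occurs more than once, mapped to its occurrence count."""
    counts = Counter(row[id_column] for row in rows)
    return {node_id: n for node_id, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical node data; a real check would read the generated node file.
    sample = "id,name\nCAL,first entry\nABC,other entry\nCAL,second entry\n"
    duplicates = find_duplicate_ids(csv.DictReader(io.StringIO(sample)), "id")
    for node_id, n in duplicates.items():
        print(f"Id '{node_id}' is defined {n} times")
```

Running a check like this per node file, and failing the build with the list of offending IDs, would surface the problem before the database store is touched.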