
elswob / neo4j-build-pipeline


A Snakemake and Docker based pipeline to create a Neo4j graph.

License: MIT License

Languages: Python 95.92%, Shell 3.30%, Dockerfile 0.78%
Topics: docker, neo4j, python, snakemake

neo4j-build-pipeline's People

Contributors

elswob

neo4j-build-pipeline's Issues

No check for shuf

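The profiling step calls shuf without first checking that it is installed, so it fails on systems where shuf is missing:

    Profiling ot-drug-target with 10 threads, limiting to 100000 random rows
    workflow/scripts/utils/pandas-profiling.sh: line 15: shuf: command not found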
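A minimal sketch of the missing check, assuming the row sampling were done from Python rather than inside workflow/scripts/utils/pandas-profiling.sh. The function names `sample_csv` and `reservoir_sample` are hypothetical and not part of the pipeline. It uses `shutil.which` to look for `shuf` (part of GNU coreutils, which may not be present on every system) and falls back to single-pass reservoir sampling (Algorithm R) when the binary is absent.

```python
import random
import shutil
import subprocess
from typing import Iterable, List


def reservoir_sample(lines: Iterable, n: int, seed: int = 0) -> List:
    """Algorithm R: a uniform random sample of up to n items in one pass."""
    rng = random.Random(seed)
    sample: List = []
    for i, line in enumerate(lines):
        if i < n:
            sample.append(line)
        else:
            # Item i (0-based) replaces a sampled item with probability n / (i + 1)
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < n:
                sample[j] = line
    return sample


def sample_csv(path: str, n: int, seed: int = 0) -> List[str]:
    """Return the header line plus up to n randomly chosen data lines.

    Hypothetical helper: uses shuf when it is on PATH, otherwise a pure
    Python fallback, so a missing binary no longer aborts profiling.
    """
    with open(path) as fh:
        header = fh.readline()
        if shutil.which("shuf"):
            # Pass the remaining lines to shuf on stdin (this reads the body
            # into memory); -n caps the number of output lines.
            result = subprocess.run(
                ["shuf", "-n", str(n)],
                input=fh.read(),
                capture_output=True,
                text=True,
                check=True,
            )
            rows = result.stdout.splitlines(keepends=True)
        else:
            # shuf not installed: sample the remaining lines in Python
            rows = reservoir_sample(fh, n, seed)
    return [header] + rows
```

Both paths keep the header as the first line. Only the fallback is seeded; the `shuf` path is not reproducible unless a random source is supplied to it.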

pandas to_csv and vectors

Because pandas to_csv and read_csv are a core part of the build process, columns that hold lists of floats (e.g. vectors) cannot survive the round trip: they are written out as strings and read back as strings. One option is to switch to a different intermediate format, e.g. pickle.
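A small sketch that reproduces the problem and shows two possible workarounds. It assumes pandas is installed, and the helper names `csv_roundtrip` and `pickle_roundtrip` are hypothetical. The first workaround keeps CSV and parses the cells back with the real `read_csv` `converters` parameter plus `ast.literal_eval`. The second is the pickle option mentioned in the issue, using `DataFrame.to_pickle` and `pd.read_pickle`.

```python
import ast
import os
import tempfile

import pandas as pd


def csv_roundtrip(df, converters=None):
    """Write df to CSV and read it back, as an intermediate build step might."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nodes.csv")
        df.to_csv(path, index=False)
        # converters maps a column name to a function applied to each raw string cell
        return pd.read_csv(path, converters=converters)


def pickle_roundtrip(df):
    """The same round trip via pickle, which stores the Python objects themselves."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nodes.pkl")
        df.to_pickle(path)
        return pd.read_pickle(path)


df = pd.DataFrame({"id": ["a", "b"], "vec": [[0.1, 0.2], [0.3, 0.4]]})

plain = csv_roundtrip(df)
print(type(plain.loc[0, "vec"]), repr(plain.loc[0, "vec"]))  # <class 'str'> '[0.1, 0.2]'

parsed = csv_roundtrip(df, converters={"vec": ast.literal_eval})
print(parsed.loc[0, "vec"])  # [0.1, 0.2]

restored = pickle_roundtrip(df)
print(restored.loc[1, "vec"])  # [0.3, 0.4]
```

A design note: pickle files are Python-specific and should only be loaded from trusted sources. Because the final neo4j-admin import still reads CSV, pickle would most likely replace only the intermediate files. Vectors would still need to be serialised into the import format at the last step.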

Building a graph after the files have already been created causes a Neo4j import error

    Exception in thread "Thread-17" java.lang.RuntimeException: org.neo4j.internal.batchimport.cache.idmapping.string.DuplicateInputIdException: Id 'CAL' is defined more than once in group 'Drug-ID'
        at org.neo4j.internal.batchimport.staging.AbstractStep.issuePanic(AbstractStep.java:148)
        at org.neo4j.internal.batchimport.staging.AbstractStep.issuePanic(AbstractStep.java:140)
        at org.neo4j.internal.batchimport.staging.LonelyProcessingStep.lambda$receive$0(LonelyProcessingStep.java:59)
        at java.base/java.lang.Thread.run(Thread.java:834)
    Caused by: org.neo4j.internal.batchimport.cache.idmapping.string.DuplicateInputIdException: Id 'CAL' is defined more than once in group 'Drug-ID'
        at org.neo4j.internal.batchimport.input.BadCollector$NodesProblemReporter.exception(BadCollector.java:280)
        at org.neo4j.internal.batchimport.input.BadCollector.collect(BadCollector.java:170)
        at org.neo4j.internal.batchimport.input.BadCollector.collectDuplicateNode(BadCollector.java:137)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.detectDuplicateInputIds(EncodingIdMapper.java:616)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.buildCollisionInfo(EncodingIdMapper.java:532)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.prepare(EncodingIdMapper.java:247)
        at org.neo4j.internal.batchimport.IdMapperPreparationStep.process(IdMapperPreparationStep.java:54)
        at org.neo4j.internal.batchimport.staging.LonelyProcessingStep.lambda$receive$0(LonelyProcessingStep.java:53)
        ... 1 more
    Duplicate input ids that would otherwise clash can be put into separate id space.
    Caused by:Id 'CAL' is defined more than once in group 'Drug-ID'
    WARNING Import failed. The store files in /data/databases/neo4j are left as they are, although they are likely in an unusable state. Starting a database on these store files will likely fail or observe inconsistent records so start at your own risk or delete the store manually
    org.neo4j.internal.batchimport.cache.idmapping.string.DuplicateInputIdException: Id 'CAL' is defined more than once in group 'Drug-ID'
        at org.neo4j.internal.batchimport.input.BadCollector$NodesProblemReporter.exception(BadCollector.java:280)
        at org.neo4j.internal.batchimport.input.BadCollector.collect(BadCollector.java:170)
        at org.neo4j.internal.batchimport.input.BadCollector.collectDuplicateNode(BadCollector.java:137)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.detectDuplicateInputIds(EncodingIdMapper.java:616)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.buildCollisionInfo(EncodingIdMapper.java:532)
        at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.prepare(EncodingIdMapper.java:247)
        at org.neo4j.internal.batchimport.IdMapperPreparationStep.process(IdMapperPreparationStep.java:54)
        at org.neo4j.internal.batchimport.staging.LonelyProcessingStep.lambda$receive$0(LonelyProcessingStep.java:53)
        at java.base/java.lang.Thread.run(Thread.java:834)
    [Thu Oct 29 16:12:01 2020]
    Error in rule create_graph:
        jobid: 1
        output: test/results/logs/master_import.log
        log: test/results/logs/create_graph.log (check log file(s) for error message)
        shell:
            echo 'Starting database...'
            #force load of .env file if it exists to avoid docker issues with cached variables
            if [ -f .env ]; then export $(cat .env | sed 's/#.*//g' | xargs); fi
            #create neo4j directories if not already done
            echo 'Creating Neo4j graph directories'
            python -m workflow.scripts.graph_build.create_neo4j > test/results/logs/create_graph.log
            #create container
            docker-compose up -d --no-recreate > test/results/logs/create_graph.log
            echo 'removing old database...'
            docker exec --user neo4j neo4j-pipeline-demo-graph sh -c 'rm -rf /var/lib/neo4j/data/databases/neo4j' > test/results/logs/create_graph.log
            docker exec --user neo4j neo4j-pipeline-demo-graph sh -c 'rm -f /var/lib/neo4j/data/transactions/neo4j/*' > test/results/logs/create_graph.log
            echo 'running import...'
            SECONDS=0
            docker exec --user neo4j neo4j-pipeline-demo-graph sh /var/lib/neo4j/import/master_import.sh > test/results/logs/master_import.log
            duration=$SECONDS
            echo "Import took $(($duration / 60)) minutes and $(($duration % 60)) seconds."
            echo 'stopping container neo4j-pipeline-demo-graph...'
            docker stop neo4j-pipeline-demo-graph
            echo 'starting container neo4j-pipeline-demo-graph...'
            docker start neo4j-pipeline-demo-graph
            echo 'waiting a bit...'
            sleep 30
            echo 'adding contraints and extra bits...'
            docker exec --user neo4j neo4j-pipeline-demo-graph sh /var/lib/neo4j/import/master_constraints.sh > test/results/logs/master_constraints.log
            echo 'waiting a bit for indexes to populate...'
            sleep 30
            echo 'checking import report...'
            python -m workflow.scripts.graph_build.import-report-check test/neo4j/0.0.1/logs/import.report > test/results/logs/master_import.log
            echo 'Neo4j browser available here: localhost:27474/browser'
            (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    Removing output files of failed job create_graph since they might be corrupted:
    test/results/logs/master_import.log
    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
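The import aborts because the ID 'CAL' appears more than once in the 'Drug-ID' ID space. That is what would happen if node files from an earlier build were picked up alongside the new ones. A pre-import check could surface such clashes before neo4j-admin runs. The sketch below is a hypothetical helper, `find_duplicate_ids`, and is not part of the pipeline. It assumes each node CSV is comma-delimited, carries its header on the first line (no separate header file), and marks its ID column in the neo4j-admin import style, e.g. `id:ID(Drug-ID)`. Relationship files (`:START_ID` / `:END_ID`) are skipped.

```python
import csv
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches an import-header ID column such as "id:ID(Drug-ID)" or ":ID(Drug-ID)";
# a column without a group, e.g. "id:ID", is treated as the global ID space.
ID_COL = re.compile(r"^(?P<name>[^:]*):ID(?:\((?P<group>[^)]*)\))?$")


def find_duplicate_ids(paths: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Map ID space -> {id: [files it appears in]} for IDs seen more than once."""
    seen: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for path in paths:
        with open(path, newline="") as fh:
            reader = csv.reader(fh)
            header = next(reader, [])
            col, group = None, None
            for i, name in enumerate(header):
                match = ID_COL.match(name)
                if match:
                    col, group = i, match.group("group") or "global"
                    break
            if col is None:
                continue  # relationship file or no ID column: nothing to check
            for row in reader:
                seen[group][row[col]].append(path)
    # Keep only IDs that occur more than once (within or across files)
    duplicates: Dict[str, Dict[str, List[str]]] = {}
    for group, ids in seen.items():
        clashes = {i: files for i, files in ids.items() if len(files) > 1}
        if clashes:
            duplicates[group] = clashes
    return duplicates
```

Running this over the files that master_import.sh passes to the importer, and failing the Snakemake rule when the result is non-empty, would report the offending files instead of leaving a half-built store behind.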
