vmoody373 / odb2graphml

This project forked from lukeasrodgers/odb2graphml

Convert OrientDB's JSON export file to GraphML

License: MIT License

odb2graphml's Introduction

This quickly-hacked-together node package converts the JSON exported by OrientDB into GraphML.

The goal is for the generated GraphML to be compatible with importing into Neo4j, via neo4j-shell-tools.

Installation and usage

npm install -g odb2graphml
odb2graphml --help

By default, this tool will remove any edges that don't have corresponding vertices/nodes. Neo4j's import tool will fail if it encounters edges that reference absent nodes. If, for some reason, you want to keep these edges, pass the -k or --keep option to the tool. This scenario (missing nodes) is likely to occur when doing a non-locking export of an OrientDB database.
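
The pruning rule described above can be sketched roughly as follows. This is an illustration only, not the package's actual code: `pruneEdges` is a hypothetical helper, and the `out` and `in` field names are an assumption about how an exported edge refers to its two vertices.

```javascript
'use strict';

// Hypothetical helper, not the package's actual implementation.
// Assumes each edge is a plain object whose `out` and `in` fields hold
// the record IDs of its source and target vertices.
function pruneEdges(vertexIds, edges, keep) {
  const kept = [];
  let pruned = 0;
  for (const edge of edges) {
    const dangling = !vertexIds.has(edge.out) || !vertexIds.has(edge.in);
    if (dangling && !keep) {
      pruned += 1; // drop it: the importer would fail on this edge
      continue;
    }
    kept.push(edge);
  }
  return { kept: kept, pruned: pruned };
}

const vertexIds = new Set(['#9:0', '#9:1']);
const edges = [
  { out: '#9:0', in: '#9:1' },
  { out: '#9:1', in: '#9:7' }, // '#9:7' is missing from the export
];

const result = pruneEdges(vertexIds, edges, false);
console.log(`Kept ${result.kept.length} edges, pruned ${result.pruned}.`);
```

Passing `true` as the third argument mirrors the effect of -k/--keep: dangling edges are left in place.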

Example

NB: using time is obviously optional.

$ time odb2graphml -i inputfile.json -v Vertex1Name,Vertex2Name -e Edge1Name,Edge2Name,Edge3Name
Success! Converted 4587271 edges and 192799 vertices. Pruned 2 edges. Written to out.graphml

real    12m58.002s
user    12m12.411s
sys     0m9.932s

$ neo4j-shell
Unable to find any JVMs matching version "1.7".
Welcome to the Neo4j Shell! Enter 'help' for a list of commands
NOTE: Remote Neo4j graph database service 'shell' at port 1337

neo4j-sh (?)$ import-graphml -c -t -i /Users/luke/tmp/out.graphml
GraphML-Import file /Users/luke/tmp/out.graphml rel-type RELATED_TO batch-size 40000 use disk-cache true
commit after 400000 row(s)  0. 12%: nodes = 192799 rels = 207200 properties = 664494 time 25784 ms total 25784 ms
commit after 800000 row(s)  1. 20%: nodes = 192799 rels = 607200 properties = 664494 time 10948 ms total 36732 ms
commit after 1200000 row(s)  2. 28%: nodes = 192799 rels = 1007200 properties = 664494 time 12850 ms total 49582 ms
commit after 1600000 row(s)  3. 36%: nodes = 192799 rels = 1407200 properties = 664494 time 10467 ms total 60049 ms
commit after 2000000 row(s)  4. 44%: nodes = 192799 rels = 1807200 properties = 664494 time 12106 ms total 72155 ms
commit after 2400000 row(s)  5. 52%: nodes = 192799 rels = 2207200 properties = 664494 time 11500 ms total 83655 ms
commit after 2800000 row(s)  6. 60%: nodes = 192799 rels = 2607200 properties = 664494 time 12628 ms total 96283 ms
commit after 3200000 row(s)  7. 68%: nodes = 192799 rels = 3007200 properties = 664494 time 14121 ms total 110404 ms
commit after 3600000 row(s)  8. 76%: nodes = 192799 rels = 3407200 properties = 664494 time 12484 ms total 122888 ms
commit after 4000000 row(s)  9. 84%: nodes = 192799 rels = 3807200 properties = 664494 time 12150 ms total 135038 ms
commit after 4400000 row(s)  10. 92%: nodes = 192799 rels = 4207200 properties = 664494 time 13538 ms total 148576 ms
finish after 4780070 row(s)  11. 99%: nodes = 192799 rels = 4587271 properties = 664494 time 12829 ms total 161405 ms
GraphML import created 4780070 entities.
neo4j-sh (?)$

Requirements

The code uses some ES6 features, such as template strings, so you will need a version of Node.js that supports them: v4.0.0 or later.
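
To give a feel for the kind of ES6 involved, here is a minimal sketch of building a GraphML `<node>` element with template strings. The helper names `escapeXml` and `nodeToGraphml` are hypothetical, and the element layout is plain GraphML chosen for illustration; the package's real output may differ.

```javascript
'use strict';

// Escape the five characters that are special in XML text and attributes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: render one vertex as a GraphML <node> element,
// with one <data> child per property.
function nodeToGraphml(id, props) {
  const data = Object.keys(props)
    .map(key => `<data key="${escapeXml(key)}">${escapeXml(props[key])}</data>`)
    .join('');
  return `<node id="${escapeXml(id)}">${data}</node>`;
}

console.log(nodeToGraphml('#9:0', { name: 'Tom & Jerry' }));
// prints <node id="#9:0"><data key="name">Tom &amp; Jerry</data></node>
```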

Notes

The code should mostly work in its current version, though there are almost certainly bugs and edge cases I've missed.

It uses oboe.js for streaming JSON parsing, the idea being that some export files may be very large and we don't want to load them all into memory at once. Hence, it should be able to handle files of (more or less) arbitrarily large size.

Other notes:

  • To ensure that all nodes are written to the GraphML output before any edges (which Neo4j's import tool requires), we make two passes through the input file. We could avoid the second pass by using temporary files (also hacky) or in-memory write streams (which would undermine the benefit of using streams in the first place). The two-pass approach is about 2x slower but seems the least hacky, though more advanced knowledge of Node.js streams might provide a better solution.
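
The two-pass idea can be sketched as below. This is a simplified illustration, not the package's code: `convert` and `openRecords` are hypothetical names, an in-memory array stands in for the export file, and the `@class`, `@rid`, `out` and `in` fields are assumptions about the shape of an exported record. The real tool streams records with oboe.js instead of iterating an array.

```javascript
'use strict';

// Hypothetical two-pass driver. `openRecords` must return a fresh iterable
// on every call, standing in for re-opening and re-streaming the input file.
function convert(openRecords, vertexClasses, edgeClasses, write) {
  // Pass 1: emit every vertex and skip everything else.
  for (const record of openRecords()) {
    if (vertexClasses.has(record['@class'])) {
      write(`node ${record['@rid']}`);
    }
  }
  // Pass 2: read the input again from the start and emit the edges.
  for (const record of openRecords()) {
    if (edgeClasses.has(record['@class'])) {
      write(`edge ${record.out} -> ${record.in}`);
    }
  }
}

// In-memory stand-in for the export, with an edge listed before a vertex.
const records = [
  { '@rid': '#9:0', '@class': 'Person' },
  { '@rid': '#11:0', '@class': 'Knows', out: '#9:0', in: '#9:1' },
  { '@rid': '#9:1', '@class': 'Person' },
];

function* openRecords() {
  yield* records;
}

const lines = [];
convert(openRecords, new Set(['Person']), new Set(['Knows']), line => lines.push(line));
console.log(lines.join('\n'));
```

Only one record is held at a time in each pass, so memory use stays flat regardless of input size; the price is reading the input twice.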

Contributors

lukeasrodgers
