database_Matrix

Setup of experiment

I use 2 docker containers one with postgresql and one with neo4j.

For neo4j:

First start docker

docker run \
-d --name neo4j \
--publish=7474:7474 \
--publish=7687:7687 \
--env NEO4J_AUTH=neo4j/class \
--env=NEO4J_dbms_memory_pagecache_size4G \
--env=NEO4J_dbms_memory_heap_initial__size4G \
--env=NEO4J_dbms_memory_heap_max__size4G \
neo4j

Copy data docker cp "path\to\nodes\social_network_nodes.csv" neo4j:/nodes.csv docker cp "path\to\edges\social_network_edges.csv" neo4j:/edges.csv
docker exec -it neo4j bash
neo4j stop
rm -rf data/databases/graph.db
Import data

neo4j-admin import \
    --nodes:Users nodes.csv \
    --relationships:ENDORSES edges.csv \
    --ignore-missing-nodes=true \
    --ignore-duplicate-nodes=true \
    --id-type=INTEGER

go to neo4j.conf and change all heap sizes to 8G or more
neo4j start
go to http://localhost:7474/browser/ and run create INDEX ON :Users(id)

For postgresql

First start docker

docker run -p 5432:5432 -d --name psql postgres:alpine

Copy data docker cp "path\to\nodes\social_network_nodes.csv" neo4j:/nodes.csv docker cp "path\to\edges\social_network_edges.csv" neo4j:/edges.csv
docker exec -it psql bash -c "psql -U postgres"
create table:

create table t_user(id int primary key,name varchar(100), job varchar(100), birthday date);
copy t_user(id,name,job,birthday) from 'nodes.csv' DELIMITER ',' CSV HEADER;

create table t_edges(source_node_id int references t_user(id),target_node_id int references t_user(id));
copy t_edges(source_node_id,target_node_id) from '/edges.csv' DELIMITER ',' CSV HEADER;

After this is done you should have 2 containers with databases with data in them Now run the main.java file and it should start printing lots of numbers.

Results in ms

depth	mean SQL	Median SQL	Mean Neo4j	Median Neo4j
depth 1	1066	850	1334	455
depth 2	1912	1696	516	414
depth 3	4039	3706	2294	635

Depth 4 and 5 takes way too long (several hours) to complete all 20 random nodes so those results will not be shown

Depth 1	Depth 2	Depth 3	Depth 4+5
			to be continued

Conclusion and discussion

With a small resultset (around depth 3 and down) neo4j is marginally faster, while after this it falls dratiscally in efficiency While SQL increases more or less linear with the amount of data handled. I might have done something horribly wrong in my queries or code, but neo4j is generally much more inconsistent in the time spent on the queries. From the pictures can it can be seen how neo4j's times swing much more than the SQL times.

As it is rare to go to this high depth. I would recomend a neo4j database as it is generally faster in the lower depths, plus the queries are much simpler to write compared to SQL. If features such as shortest path between nodes or something similar is needed then neo4j is highly recomended.

kasperpagh / database_matrix Goto Github PK

database_matrix's Introduction

database_Matrix

Setup of experiment

Results in ms

Conclusion and discussion

database_matrix's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs