ibm / janusgraph-utils

Develop a graph database app using JanusGraph

Home Page: https://developer.ibm.com/patterns/develop-graph-database-app-using-janusgraph/

License: Apache License 2.0

Groovy 12.94% Shell 2.96% Java 83.27% Python 0.82%
janusgraph groovy graphson datamodel schema janusgraph-instance java ibmcode

janusgraph-utils's Introduction


Develop a graph database app using JanusGraph

This Code Pattern contains sample data and code for running a Twitter-like application in JanusGraph. The utility code illustrates how to use OLTP APIs to define a schema, ingest data, and query the graph. Developers can use or modify the code to build and operate their custom graph applications, or create similar Java and Groovy files to interact with JanusGraph.

When the reader has completed this Code Pattern, they will understand how to:

  • Generate a synthetic graph dataset
  • Load a graph schema from JSON
  • Import graph data in CSV files into a JanusGraph database
  • Query and update graph data using the console and REST API
  • Set up and configure a distributed JanusGraph system

Flow

Prerequisites: Install and configure JanusGraph, Cassandra, ElasticSearch, janusgraph-utils

  1. The user generates Twitter sample schema and data using JanusGraph utilities
  2. The user loads schema and imports data in backend servers using JanusGraph utilities
  3. The user makes search and update requests in a REST/custom client
  4. The client app sends the REST requests to JanusGraph server
  5. The JanusGraph server interacts with backend to process and return graph data

Included components

  • Apache Cassandra: An open source, scalable, high availability database.
  • JanusGraph: A highly scalable graph database optimized for storing and querying large graphs. JanusGraph v0.1.1 was used for this code pattern development and test.

Featured technologies

  • Databases: Repository for storing and managing collections of data.
  • Java: A secure, object-oriented programming language for creating applications.

Watch the Video

Steps

Run locally

  1. Install prerequisites
  2. Clone the repo
  3. Generate the graph sample
  4. Load schema and import data
  5. Run interactive remote queries

1. Install prerequisites

NOTE: These prerequisites can be installed on one server. The instructions are written for Cassandra 3.10 and ElasticSearch 5.3.0 on Linux. Newer versions should work, but might not have been tested. The folder structures on Mac can be different. Check the official Cassandra and ElasticSearch documentation for details.

Install Cassandra 3.10 on the storage server. Make the following changes in /etc/cassandra/cassandra.yaml and restart Cassandra.

start_rpc: true
rpc_address: 0.0.0.0
rpc_port: 9160
broadcast_rpc_address: x.x.x.x (your storage server ip)

Install ElasticSearch 5.3.0 on the index server. Make the following changes in /etc/elasticsearch/elasticsearch.yml and restart ElasticSearch.

network.host: x.x.x.x (your index server ip)

Install JanusGraph on the graph server:

  • Install java (1.8), maven (3.3.9, newer should work), git (2.7.5, newer should work)
  • Run git clone https://github.com/JanusGraph/janusgraph.git
  • Run the following commands in the janusgraph folder:
git checkout 4609b6731a01116e96e554140b37ad589f0ae0ca
mvn clean install -DskipTests=true
cp conf/janusgraph-cassandra-es.properties conf/janusgraph-cql-es.properties
  • Make the following changes in conf/janusgraph-cql-es.properties:
storage.backend=cql
storage.hostname=x.x.x.x (your storage server ip)
index.search.hostname=x.x.x.x (your index server ip)

Install a REST client, such as RESTClient add-on for Firefox, on the client machine.

2. Clone the repo

Clone the janusgraph-utils repository on the graph server and run mvn package.

git clone https://github.com/IBM/janusgraph-utils.git
cd janusgraph-utils/
mvn package

3. Generate the graph sample

Run this command in the janusgraph-utils folder to generate data into the /tmp folder.

./run.sh gencsv csv-conf/twitter-like-w-date.json /tmp

Modify the generated user file under /tmp so the sample queries will return data.

sed -i -e '2s/.*/1,Indiana Jones/' /tmp/User.csv

4. Load schema and import data

A graph schema can be loaded from either the Gremlin console or a Java utility. See doc/users_guide.md for details. Alternatively, run a single command in the janusgraph-utils folder to load the schema and import the data.

export JANUSGRAPH_HOME=~/janusgraph
./run.sh import ~/janusgraph/conf/janusgraph-cql-es.properties /tmp /tmp/schema.json /tmp/datamapper.json

5. Run interactive remote queries

Configure JanusGraph server by running these commands:

cd ~/janusgraph/conf/gremlin-server
cp ~/janusgraph-utils/samples/date-helper.groovy ../../scripts
cp ../janusgraph-cql-es.properties janusgraph-cql-es-server.properties
cp gremlin-server.yaml rest-gremlin-server.yaml

Add this line to janusgraph-cql-es-server.properties:

gremlin.graph=org.janusgraph.core.JanusGraphFactory

Change the following four lines in rest-gremlin-server.yaml:

host: x.x.x.x (your server ip)
channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer
graphs: { graph: conf/gremlin-server/janusgraph-cql-es-server.properties}
scriptEngines: { gremlin-groovy: { scripts: [scripts/empty-sample.groovy,scripts/date-helper.groovy]}}

Start JanusGraph server:

cd ~/janusgraph; ./bin/gremlin-server.sh ./conf/gremlin-server/rest-gremlin-server.yaml

Now you can query and update graph data using REST. For example, send REST requests using RESTClient in the browser with the following:

Method: POST
URL: http://x.x.x.x:8182
Body: {"gremlin":"query_to_run"}

You can find sample search and insert queries in samples/twitter-like-queries.txt.
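The same REST body can be built and sent programmatically. A minimal Python sketch, assuming a Gremlin server reachable over HTTP (the host, port, and query below are illustrative placeholders, not values from this repo):

```python
import json
from urllib import request

def gremlin_request(query, host="127.0.0.1", port=8182):
    """Build (but do not send) a POST request carrying a Gremlin query in the JSON body."""
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return request.Request(
        "http://%s:%d" % (host, port),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = gremlin_request("g.V().count()")
# To actually send it against a running server: request.urlopen(req)
```

This mirrors what RESTClient does: an HTTP POST whose body is a JSON object with a single "gremlin" key.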

Sample output

Sample output for "Find Indiana Jones' tweets that his followers retweeted"

Links

Learn more

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ

janusgraph-utils's People

Contributors

chinhuang007, dolph, imgbotapp, ljbennett62, markstur, sarthakghosh16, scottdangelo, stevemart, tedhtchang, yhwang, zoellner


janusgraph-utils's Issues

import failed inside docker-janusgraph image

winnie got this exception when doing an import inside the docker-janusgraph image:
root@perf-rs-jn7q4-nnjcr:/home/janusgraph/janusgraph-utils# export JANUSGRAPH_HOME=/home/janusgraph/janusgraph ;./run.sh import /home/janusgraph/janusgraph/conf/janusgraph-cql-es.properties /tmp/ /tmp/schema.json /tmp/datamapper.json
JanusGraph lib path is set to /home/janusgraph/janusgraph/lib
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/janusgraph/janusgraph/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/janusgraph/janusgraph/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
23:12:00.375 [main] DEBUG o.a.c.c.PropertiesConfiguration - FileName set to janusgraph-cql-es.properties
23:12:00.388 [main] DEBUG o.a.c.c.PropertiesConfiguration - Base path set to /home/janusgraph/janusgraph/conf
23:12:00.396 [main] DEBUG o.a.c.c.ConfigurationUtils - ConfigurationUtils.locate(): base is /home/janusgraph/janusgraph/conf, name is janusgraph-cql-es.properties
23:12:00.396 [main] DEBUG o.a.c.c.DefaultFileSystem - Could not locate file janusgraph-cql-es.properties at /home/janusgraph/janusgraph/conf: no protocol: /home/janusgraph/janusgraph/conf
23:12:00.397 [main] DEBUG o.a.c.c.ConfigurationUtils - Loading configuration from the path /home/janusgraph/janusgraph/conf/janusgraph-cql-es.properties
23:12:00.844 [main] DEBUG c.d.driver.core.SystemProperties - com.datastax.driver.NEW_NODE_DELAY_SECONDS is undefined, using default value 1
23:12:00.850 [main] DEBUG c.d.driver.core.SystemProperties - com.datastax.driver.NOTIF_LOCK_TIMEOUT_SECONDS is undefined, using default value 60
23:12:00.870 [main] DEBUG c.d.driver.core.SystemProperties - com.datastax.driver.USE_NATIVE_CLOCK is undefined, using default value true
23:12:01.206 [main] INFO c.datastax.driver.core.ClockFactory - Using native clock to generate timestamps.
23:12:01.218 [main] DEBUG c.d.driver.core.SystemProperties - com.datastax.driver.NON_BLOCKING_EXECUTOR_SIZE is undefined, using default value 4
23:12:01.340 [main] DEBUG com.datastax.driver.core.Cluster - Starting new cluster with contact points [/10.45.0.138:9042]
23:12:01.365 [main] DEBUG i.n.u.i.l.InternalLoggerFactory - Using SLF4J as the default logging framework
23:12:01.375 [main] DEBUG i.n.util.internal.PlatformDependent0 - java.nio.Buffer.address: available
23:12:01.375 [main] DEBUG i.n.util.internal.PlatformDependent0 - sun.misc.Unsafe.theUnsafe: available
23:12:01.376 [main] DEBUG i.n.util.internal.PlatformDependent0 - sun.misc.Unsafe.copyMemory: available
23:12:01.376 [main] DEBUG i.n.util.internal.PlatformDependent0 - java.nio.Bits.unaligned: true
23:12:01.378 [main] DEBUG i.n.util.internal.PlatformDependent - Java version: 8
23:12:01.378 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.noUnsafe: false
23:12:01.379 [main] DEBUG i.n.util.internal.PlatformDependent - sun.misc.Unsafe: available
23:12:01.379 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.noJavassist: false
23:12:01.455 [main] DEBUG i.n.util.internal.PlatformDependent - Javassist: available
23:12:01.456 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.tmpdir: /tmp (java.io.tmpdir)
23:12:01.456 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.bitMode: 64 (sun.arch.data.model)
23:12:01.456 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.noPreferDirect: false
23:12:01.467 [main] DEBUG c.d.driver.core.SystemProperties - com.datastax.driver.FORCE_NIO is undefined, using default value false
23:12:01.472 [main] DEBUG i.n.u.internal.NativeLibraryLoader - -Dio.netty.tmpdir: /tmp (java.io.tmpdir)
23:12:01.472 [main] DEBUG i.n.u.internal.NativeLibraryLoader - -Dio.netty.netty.workdir: /tmp (io.netty.tmpdir)
23:12:01.477 [main] DEBUG io.netty.util.NetUtil - Loopback interface: lo (lo, 0:0:0:0:0:0:0:1%lo)
23:12:01.479 [main] DEBUG io.netty.util.NetUtil - /proc/sys/net/core/somaxconn: 128
23:12:01.494 [main] INFO com.datastax.driver.core.NettyUtil - Found Netty's native epoll transport in the classpath, using it
23:12:01.519 [main] DEBUG i.n.c.MultithreadEventLoopGroup - -Dio.netty.eventLoopThreads: 8
23:12:01.541 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetectionLevel: simple
23:12:01.549 [main] DEBUG c.d.driver.core.SystemProperties - com.datastax.driver.EXTENDED_PEER_CHECK is undefined, using default value true
23:12:01.660 [main] DEBUG com.datastax.driver.core.Host.STATES - [/10.45.0.138:9042] preparing to open 1 new connections, total = 1
23:12:01.664 [main] DEBUG c.d.driver.core.SystemProperties - com.datastax.driver.DISABLE_COALESCING is undefined, using default value false
23:12:01.672 [main] DEBUG i.n.u.i.JavassistTypeParameterMatcherGenerator - Generated: io.netty.util.internal.matchers.com.datastax.driver.core.Message$ResponseMatcher
23:12:01.700 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numHeapArenas: 8
23:12:01.700 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numDirectArenas: 8
23:12:01.701 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.pageSize: 8192
23:12:01.701 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxOrder: 11
23:12:01.701 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.chunkSize: 16777216
23:12:01.701 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.tinyCacheSize: 512
23:12:01.701 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.smallCacheSize: 256
23:12:01.701 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.normalCacheSize: 64
23:12:01.701 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxCachedBufferCapacity: 32768
23:12:01.701 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.cacheTrimInterval: 8192
Exception in thread "main" java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cql.CQLStoreManager
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:69)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:477)
at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:409)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.(GraphDatabaseConfiguration.java:1353)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:107)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:75)
at com.ibm.janusgraph.utils.importer.BatchImport.main(BatchImport.java:33)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:58)
... 6 more
Caused by: java.lang.NoSuchMethodError: io.netty.util.internal.PlatformDependent.newLongCounter()Lio/netty/util/internal/LongCounter;
at io.netty.buffer.PoolArena.(PoolArena.java:65)
at io.netty.buffer.PoolArena$HeapArena.(PoolArena.java:638)
at io.netty.buffer.PooledByteBufAllocator.(PooledByteBufAllocator.java:179)
at io.netty.buffer.PooledByteBufAllocator.(PooledByteBufAllocator.java:153)
at io.netty.buffer.PooledByteBufAllocator.(PooledByteBufAllocator.java:145)
at io.netty.buffer.PooledByteBufAllocator.(PooledByteBufAllocator.java:128)
at com.datastax.driver.core.NettyOptions.afterBootstrapInitialized(NettyOptions.java:144)
at com.datastax.driver.core.Connection$Factory.newBootstrap(Connection.java:890)
at com.datastax.driver.core.Connection$Factory.access$100(Connection.java:740)
at com.datastax.driver.core.Connection.initAsync(Connection.java:138)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:796)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:253)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:201)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1483)
at com.datastax.driver.core.Cluster.init(Cluster.java:159)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:330)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:305)
at com.datastax.driver.core.Cluster.connect(Cluster.java:247)
at org.janusgraph.diskstorage.cql.CQLStoreManager.initializeSession(CQLStoreManager.java:262)
at org.janusgraph.diskstorage.cql.CQLStoreManager.(CQLStoreManager.java:146)
... 11 more
The workaround is to load netty-all first, by putting it at the front of the classpath inside run.sh:
java -cp $JANUSGRAPH_HOME/lib/netty-all-4.0.28.Final.jar:"$CP":"${utilityJar}" com.ibm.janusgraph.utils.importer.BatchImport "$@"

include some example performance metrics

It would be good to give some flavor of what kind of baseline performance a user could expect from using these utils. For example, this mailing list post reports they are getting 1000 vertices/minute during their load (not using this utility, 20M nodes, 50M edges). It would be fantastic if you could respond on that thread, then point to this project, and show the baseline performance numbers (which should be much better than 1000 vertices/minute). I'm pretty sure their bulkloading code is more of the problem than the backend configuration.

Something like: Using the pre-packaged JanusGraph distribution on an Intel i5 with 4 cores, 8 GB of RAM, these are the graph loading performance results (# vertices, # edges, # properties, total time, vertices/minute, edges/minute)
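Reporting such numbers is simple arithmetic once counts and wall-clock time are recorded; a trivial sketch (the example figures are made up for illustration, not measured results):

```python
def throughput(count, seconds):
    """Items loaded per minute, for reporting vertices/minute and edges/minute."""
    return count * 60.0 / seconds

# hypothetical example: 20M vertices loaded in 2 hours
rate = throughput(20_000_000, 2 * 3600)
```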

java.text.ParseException: Unparseable date:

I generated the demo data with this command:
./run.sh gencsv csv-conf/twitter-like-w-date2.json /tmp/json
But this error happens:

java.text.ParseException: Unparseable date: "10-十一月-1995"
	at java.text.DateFormat.parse(DateFormat.java:366) ~[na:1.8.0_65]
	at com.ibm.janusgraph.utils.importer.util.BatchHelper.convertDate(BatchHelper.java:65) ~[janusgraph-utils-0.0.1-SNAPSHOT.jar:na]
	at com.ibm.janusgraph.utils.importer.util.BatchHelper.convertPropertyValue(BatchHelper.java:74) ~[janusgraph-utils-0.0.1-SNAPSHOT.jar:na]
	at com.ibm.janusgraph.utils.importer.edge.EdgeLoaderWorker.acceptRecord(EdgeLoaderWorker.java:119) [janusgraph-utils-0.0.1-SNAPSHOT.jar:na]
	at com.ibm.janusgraph.utils.importer.edge.EdgeLoaderWorker.access$000(EdgeLoaderWorker.java:38) [janusgraph-utils-0.0.1-SNAPSHOT.jar:na]
	at com.ibm.janusgraph.utils.importer.edge.EdgeLoaderWorker$1.accept(EdgeLoaderWorker.java:156) [janusgraph-utils-0.0.1-SNAPSHOT.jar:na]
	at com.ibm.janusgraph.utils.importer.edge.EdgeLoaderWorker$1.accept(EdgeLoaderWorker.java:152) [janusgraph-utils-0.0.1-SNAPSHOT.jar:na]
	at java.util.ArrayList$Itr.forEachRemaining(ArrayList.java:891) [na:1.8.0_65]
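The failure looks like a locale issue: the generated month name (十一月, "November" in Chinese) cannot be parsed by a date format expecting English month abbreviations. A rough Python analogue of what a locale-sensitive parser like the one in BatchHelper runs into (the format string here is an assumption for illustration):

```python
from datetime import datetime

def convert_date(text, fmt="%d-%b-%Y"):
    """Parse a day-month-year string; %b matches month names in the current locale."""
    return datetime.strptime(text, fmt)

parsed = convert_date("10-Nov-1995")      # parses under an English/C locale
try:
    convert_date("10-十一月-1995")         # localized month name
    raised = False
except ValueError:
    raised = True                          # parse fails, like the ParseException above
```

The usual fix on the Java side is to pin an explicit locale (e.g. passing Locale.ENGLISH to SimpleDateFormat) or to generate dates in a locale-neutral format.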

Update vertex not supported

Hi,
Currently, the utils project doesn't support updates to a vertex.
I was wondering if we could create a new set of classes, called something like VertexUpdaterWorker, that fetches a vertex that already exists in the graph DB and then calls v.property(propName, convertedValue) to update it?

For example, with the current code, I can ingest the following data using the DataLoader.loadVertex() utility.
input.csv contains:
cust_id, is_active
1347, TRUE
1348, FALSE

The datamapper.json contains:

{
"vertexMap": {
"input.csv": {
"[VertexLabel]": "customer",
"is_active": "is_active",
"cust_id": "node_id"
}
},
"edgeMap": {}
}
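Mapping mistakes of this kind (unknown CSV columns, malformed JSON) can be caught before an import by validating the mapper against the CSV header. A hedged Python sketch; the file name and the [VertexLabel] directive convention come from the example above, everything else is illustrative:

```python
import csv
import io
import json

def check_vertex_map(mapper_json, csv_text, filename):
    """Return mapper columns that do not exist in the CSV header."""
    mapper = json.loads(mapper_json)
    header = {c.strip() for c in next(csv.reader(io.StringIO(csv_text)))}
    mapping = mapper["vertexMap"][filename]
    # Keys wrapped in [brackets] are directives, not CSV columns
    return [k for k in mapping if not k.startswith("[") and k not in header]

mapper = ('{"vertexMap": {"input.csv": {"[VertexLabel]": "customer", '
          '"is_active": "is_active", "cust_id": "node_id"}}, "edgeMap": {}}')
missing = check_vertex_map(mapper, "cust_id, is_active\n1347, TRUE\n", "input.csv")
```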

In order to update the above nodes, I propose we have a couple of new fields in the datamapper.json that signify which field should be searched for in the graph DB and what field it maps to on the input CSV.
For example:
{
"vertexMap": {
"update.csv": {
"[VertexLabel]": "customer",<===== this signifies the vertex type that should be updated
"[SearchGraph]": "node_id", <==== this signifies which vertex should be updated
"[SearchCsv]": "cust_id", <======= this signifies node_id is mapped to cust_id in the CSV
"is_active": "is_active" <========= this is same as before
}
},
"edgeMap": {}
}

where update.csv contains:
cust_id, is_active
1347, FALSE
1348, TRUE

We can then create a new acceptRecord that does the below:

JanusGraphVertex v;
// Find the vertex to be updated
try {
    v = (JanusGraphVertex) graphTransaction.traversal().V().hasLabel(vertexLabel)
            .has(searchGraphLabel, record.get(searchCsvLabel)).next();
} catch (Exception e) {
    return;
}

try {
    // set the properties of the vertex
    for (String column : record.keySet()) {
        // Find the value, property to be updated
        String value = record.get(column);
        // If value is "" or it is a vertex label then skip it
        if (value == null || value.length() == 0 || column.equals(vertexLabelFieldName))
            continue;

        String propName = (String) getPropertiesMap().get(column);
        if (propName == null) {
            continue;
        }

        // Convert the value from String to the property's data type, e.g. date & time
        Object convertedValue = BatchHelper.convertPropertyValue(value,
                graphTransaction.getPropertyKey(propName).dataType());

        // Write property and value to vertex
        v.property(propName, convertedValue);
    }
} catch (Exception e) {
    return;
}

how to use the vertex property (for example userid ) as the node_id ?

for example

csv-conf/user.json

{
  "VertexTypes": [
    {
      "name": "user",
      "columns": {
        "username": {"dataType":"String","composit":true},
        "userId": {"dataType":"Integer","composit":true}

      },
      "row": 100
    }
  ],
  "EdgeTypes": [
    {
      "name": "friends",
      "columns": {
        "friends-p1": {"dataType":"Long","composit":true}
      },
      "relations": [
        {"left": "user", "right": "user", "row": 1000 }
      ]
    }
  ]
}

then the generator outputs the following:

head -n10 user.csv

node_id,username,userId
1,TlEOjVQlfP,82148167
2,sJkwhQNHnA,38310209
3,EdfxWvPErO,50166099
4,aTnbqEVCiT,76187948
5,nLTXRfrEGi,69962497
6,pKbhjVzllM,23196452
7,tkhdUUCopo,17549806
8,NoDEVxvqee,21428717
9,HYkUPStqkR,59599883

head -n10 user_friends_user_edges.csv

Left,Right,friends-p1
80,45,13808503
98,60,93985131
36,24,61345939
34,63,2160659
26,9,87308187
48,83,55073058

I want to use the userId property as the node_id, so it's much easier to update the node.
Thanks for any response!
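Pending generator support, one workaround is to post-process the generated CSV so node_id takes its value from the userId column. A hedged Python sketch; the column names follow the example above, and note that any edge CSVs referencing the old Left/Right ids would need the same remapping:

```python
import csv
import io

def use_property_as_id(csv_text, id_column="userId"):
    """Rewrite a generated vertex CSV so node_id takes its value from id_column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    out = io.StringIO()
    fields = ["node_id"] + [c for c in rows[0] if c not in ("node_id", id_column)]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        row["node_id"] = row.pop(id_column)  # substitute the property for the id
        writer.writerow(row)
    return out.getvalue()

sample = "node_id,username,userId\n1,TlEOjVQlfP,82148167\n"
rewritten = use_property_as_id(sample)
```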

Build janusgraph-util with particular Janusgraph commit number

Currently janusgraph-utils is built with JanusGraph v0.1.1, with a workaround to run against a different JanusGraph build. Ideally, we should build janusgraph-utils against a desired JanusGraph commit.
The maven-scm-plugin is able to download a particular commit, but we still need to figure out how to build JanusGraph first and then build janusgraph-utils:

     <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-scm-plugin</artifactId>
        <version>1.9.5</version>
        <configuration>
          <connectionType>connection</connectionType>
          <scmVersion>170f424852525d8cb7fd40754c5b8a3f8a68a3f6</scmVersion>
          <scmVersionType>revision</scmVersionType>
          <checkoutDirectory>${project.build.directory}/janusgraph</checkoutDirectory>
        </configuration>
     </plugin> 

Support current cassandra and elasticsearch

The README specifically asks for installs of older versions. It should probably say "or newer", but one way or another it should at least work with current versions instead of requiring old installs.

Current cassandra is 3.11
Current elasticsearch is 5.6.0

Hitting the JanusGraphException when loading vertexes

We are seeing the following exception when running janusgraph-utils to import vertices:
14:10:22,084 INFO VertexLoaderWorker:122 - Starting new thread ff35bb84-73e7-4482-b7e5-5eb93c15485e
14:10:24,314 ERROR StandardJanusGraph:740 - Could not commit transaction [20774] due to storage exception in commit
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:95)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:143)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.commit(CacheTransaction.java:200)
at org.janusgraph.diskstorage.BackendTransaction.commitStorage(BackendTransaction.java:134)
at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:737)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1374)
at com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker.acceptRecord(VertexLoaderWorker.java:109)
at com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker.access$000(VertexLoaderWorker.java:35)
at com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker$1.accept(VertexLoaderWorker.java:130)
at com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker$1.accept(VertexLoaderWorker.java:126)
at java.util.ArrayList$Itr.forEachRemaining(ArrayList.java:899)
at com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker.run(VertexLoaderWorker.java:126)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.janusgraph.diskstorage.PermanentBackendException: Permanent failure in storage backend
at org.janusgraph.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.convertException(CassandraThriftKeyColumnValueStore.java:263)
at org.janusgraph.diskstorage.cassandra.thrift.CassandraThriftStoreManager.mutateMany(CassandraThriftStoreManager.java:315)
at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStoreManager.mutateMany(ExpectedValueCheckingStoreManager.java:79)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:98)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:95)
at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
... 17 more
Caused by: TimedOutException(acknowledged_by:0, acknowledged_by_batchlog:true)
at org.apache.cassandra.thrift.Cassandra$atomic_batch_mutate_result$atomic_batch_mutate_resultStandardScheme.read(Cassandra.java:29624)
at org.apache.cassandra.thrift.Cassandra$atomic_batch_mutate_result$atomic_batch_mutate_resultStandardScheme.read(Cassandra.java:29592)
at org.apache.cassandra.thrift.Cassandra$atomic_batch_mutate_result.read(Cassandra.java:29526)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_atomic_batch_mutate(Cassandra.java:1108)
at org.apache.cassandra.thrift.Cassandra$Client.atomic_batch_mutate(Cassandra.java:1094)
at org.janusgraph.diskstorage.cassandra.thrift.CassandraThriftStoreManager.mutateMany(CassandraThriftStoreManager.java:310)
... 22 more

Any help on debugging this would be greatly appreciated. I have tried searching for this exception but without much luck. Please note we are running JanusGraph 0.2.0, with Cassandra and Elasticsearch running locally alongside JanusGraph.

Add option to generate self-referencing relation

A self-referencing relation means a node can reference itself. Sometimes we don't want that behavior:
e.g. in a Twitter-like model a User should not follow themselves.
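A possible generator-side guard, sketched in Python (the real generator is Java; all names here are illustrative): resample the right endpoint until it differs from the left.

```python
import random

def random_edge(num_vertices, allow_self_loop=False, rng=random):
    """Pick a (left, right) vertex pair, optionally forbidding self-references."""
    left = rng.randint(1, num_vertices)
    right = rng.randint(1, num_vertices)
    while not allow_self_loop and right == left:
        right = rng.randint(1, num_vertices)  # resample until distinct
    return left, right

edges = [random_edge(10) for _ in range(1000)]
```

An allow_self_loop-style flag in the edge config JSON would let users opt in or out per edge type.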

sed -i error with some versions of sed

My sed (on a Mac) expects -i to be followed by an optional extension. So the current sed command fails like this:

Marks-MacBook-Pro:janusgraph-utils markstur$ sed -i '2s/.*/1,Indiana Jones/' /tmp/User.csv
sed: 1: "/tmp/User.csv": invalid command code U

It works with no extension by using -e to terminate -i and start the expression. So I think this fix would be a more universal syntax (works on Mac):

sed -i -e '2s/.*/1,Indiana Jones/' /tmp/User.csv

If there are any issues w/ that, then I'd guess using > new-output-file instead of -i might be safest.
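For anyone who hits further platform differences, a fully portable alternative to the sed invocation is a small script; this sketch replaces line 2 of the file the same way the sed expression does (the demo file contents are made up):

```python
import os
import tempfile

def replace_line(path, line_no, new_text):
    """Replace a 1-based line in a file, mirroring sed '2s/.*/.../' in place."""
    with open(path) as f:
        lines = f.read().splitlines()
    lines[line_no - 1] = new_text
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# demo on a throwaway file
path = os.path.join(tempfile.mkdtemp(), "User.csv")
with open(path, "w") as f:
    f.write("node_id,name\n1,SomeUser\n3,Other\n")
replace_line(path, 2, "1,Indiana Jones")
with open(path) as f:
    content = f.read()
```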

documentations on schema.json and datamapper.json

It would be good to explain some of the key fields in both the schema.json and datamapper.json files. For example, in datamapper.json, are Left/Right in the edgeMap section specific to janusgraph-utils, or should they be consistent with the column names in the actual CSV files? In schema.json, how do fields such as partition and composite relate to the configurations in JanusGraph? With some brief descriptions, the learning process would be much easier.

JanusGraphVertex added remains in the Graph DB if property already exist

Description

The aim is to remove the ghost vertex that is created without all of its desired properties when the code errors out.

The snippet below calls the function where the vertex is added with its properties:

[Screenshot 2019-03-13 at 6:50:18 PM]

Calling acceptRecord(record) throws an error when a unique property already exists and the code tries to add the same property (key, value) again.

     Eg: Emp1 has properties Age: 40, EID: 123, Phone: xxxx
     EID is a unique property
     Emp2 has Age: 41, EID: 123, Phone: xxyx

Since EID already exists, the code below throws an error, but the vertex from JanusGraphVertex v = graphTransaction.addVertex(vertexLabel); has already been added. It never gets deleted and leaves a ghost vertex.

[Screenshot 2019-03-13 at 6:50:06 PM]

Would changing the block to try/catch work, using graphTransaction.rollback() or v.remove() in the catch block to delete the vertex?
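The rollback idea can be sketched generically: create the vertex, and on any property failure undo the whole unit of work so no half-built vertex survives. This is a toy Python model of the pattern, not the JanusGraph API; all names are illustrative:

```python
class FakeTransaction:
    """Toy stand-in for a graph transaction: tracks added vertices, supports rollback."""
    def __init__(self):
        self.vertices = []
    def add_vertex(self, label):
        v = {"label": label, "props": {}}
        self.vertices.append(v)
        return v
    def rollback(self):
        self.vertices.clear()  # discard everything done in this transaction

def accept_record(tx, label, props, unique_key, existing_keys):
    v = tx.add_vertex(label)
    try:
        for k, val in props.items():
            if k == unique_key and val in existing_keys:
                raise ValueError("duplicate unique property: %s=%s" % (k, val))
            v["props"][k] = val
    except ValueError:
        tx.rollback()  # no ghost vertex is left behind
        return None
    return v

tx = FakeTransaction()
ok = accept_record(tx, "Emp", {"EID": 123, "Age": 40}, "EID", set())
bad = accept_record(tx, "Emp", {"EID": 123, "Age": 41}, "EID", {123})
```

Note that a real rollback discards the entire transaction, so work that should survive needs to be committed before, or redone after, the failed record.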

Rename GraphSON schema document

The users guide refers to the schema document as a "GraphSON schema document". I suggest avoiding the term GraphSON here. GraphSON on its own is not a standard outside of Apache TinkerPop, and the idea of a "schema document" isn't one that is common to TinkerPop. Not all TinkerPop implementations can consume a "GraphSON schema document".

I would call it something more generic, like a "graph schema definition", and describe it as a JSON representation of the JanusGraph schema.

Documentation Improvement

Excellent Readme!
I would additionally recommend specifying the version(s) of JanusGraph, Gremlin, Java, Maven, git, etc. that are required (or recommended) within the README.

IMPORTER.md obsolete?

The IMPORTER.md file looks unfinished and probably obsolete. Is it replaced by users_guide.md and run.sh? It should be removed if it's not needed somewhere.

org.apache.http.concurrent.FutureCallback Class Not Found

With HBase as the backend store and ES as the backend index server, this error occurs.

Fix with the following POM additions:

  <properties>
    <apache.httpcomponents.httpcore.version>4.4.4</apache.httpcomponents.httpcore.version>
    <apache.httpcomponents.httpclient.version>4.5.2</apache.httpcomponents.httpclient.version>
  </properties>

    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpcore</artifactId>
      <version>${apache.httpcomponents.httpcore.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>${apache.httpcomponents.httpclient.version}</version>
    </dependency>

error to batchImport data

13:49:10 INFO com.ibm.janusgraph.utils.importer.dataloader.DataFileLoader - Loading /Users/bianzexin/Downloads/tmp/Tweet.csv
13:49:10 INFO com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker - Starting new thread def5562c-bfa6-44fd-bdb1-b57e796e92fc
13:49:11 ERROR com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker - Error in acceptRecord:: Vertex:: Tweet :: java.lang.NullPointerException
13:49:11 INFO com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker - acceptRecord::Vertex Removed
13:49:12 ERROR com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker - Error in acceptRecord:: Vertex:: Tweet :: java.lang.NullPointerException
13:49:12 INFO com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker - acceptRecord::Vertex Removed
13:49:12 ERROR com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker - Error in acceptRecord:: Vertex:: Tweet :: java.lang.NullPointerException
13:49:12 INFO com.ibm.janusgraph.utils.importer.vertex.VertexLoaderWorker - acceptRecord::Vertex Removed

if (!v.properties(propName).hasNext()) {
    // TODO Convert properties between data types. e.g. Date
    Object convertedValue = BatchHelper.convertPropertyValue(value,
            graphTransaction.getPropertyKey(propName).dataType());
    v.property(propName, convertedValue);
}

graphTransaction.getPropertyKey(propName) returns null.

Problem about query to run!

[Image: query that succeeded]
The code above succeeded!
[Image: query that failed]
But this one failed! I have created a composite index with the property key id_number and can successfully use the index in the Gremlin shell.
Help me, thank you!

HELLO I

[Image] I try to change my phoneIndex to DISABLED, but the index state is always INSTALLED.
[Image] Help me please, how do I remove the index? -.-

Could not instantiate implementation

When I run "./run.sh import /conf/janusgraph-cassandra-es.properties /tmp /tmp/schema.json /tmp/datamapper.json", I get this error: Exception in thread "main" java.lang.IllegalArgumentException: Could not instantiate implementation: org.janusgraph.diskstorage.cassandra.thrift.CassandraThriftStoreManager.
But in the Gremlin console I can instantiate the implementation successfully. Why?
