hgraphdb's Introduction

HGraphDB - HBase as a TinkerPop Graph Database

HGraphDB is a client layer for using HBase as a graph database. It is an implementation of the Apache TinkerPop 3 interfaces.

Note: For HBase 1.x, use HGraphDB 2.2.2. For HBase 2.x, use HGraphDB 3.0.0.

Installing

Releases of HGraphDB are deployed to Maven Central.

<dependency>
    <groupId>io.hgraphdb</groupId>
    <artifactId>hgraphdb</artifactId>
    <version>3.2.0</version>
</dependency>

Setup

To initialize HGraphDB, create an HBaseGraphConfiguration instance, and then use a static factory method to create an HBaseGraph instance.

Configuration cfg = new HBaseGraphConfiguration()
    .setInstanceType(InstanceType.DISTRIBUTED)
    .setGraphNamespace("mygraph")
    .setCreateTables(true)
    .setRegionCount(numRegionServers)
    .set("hbase.zookeeper.quorum", "127.0.0.1")
    .set("zookeeper.znode.parent", "/hbase-unsecure");
HBaseGraph graph = (HBaseGraph) GraphFactory.open(cfg);

As you can see above, HBase-specific configuration parameters can be passed directly. These will be used when obtaining an HBase connection.

The resulting graph can be used like any other TinkerPop graph instance.

Vertex v1 = graph.addVertex(T.id, 1L, T.label, "person", "name", "John");
Vertex v2 = graph.addVertex(T.id, 2L, T.label, "person", "name", "Sally");
v1.addEdge("knows", v2, T.id, "edge1", "since", LocalDate.now());

A few things to note from the above example:

  • HGraphDB accepts user-supplied IDs, for both vertices and edges.
  • The following types can be used for both IDs and property values:
    • boolean
    • String
    • numbers (byte, short, int, long, float, double)
    • java.math.BigDecimal
    • java.time.LocalDate
    • java.time.LocalTime
    • java.time.LocalDateTime
    • java.time.Duration
    • java.util.UUID
    • byte arrays
    • Enum instances
    • Kryo-serializable instances
    • Java-serializable instances

Using Indices

Two types of indices are supported by HGraphDB:

  • Vertices can be indexed by label and property.
  • Edges can be indexed by label and property, specific to a vertex.

An index is created as follows:

graph.createIndex(ElementType.VERTEX, "person", "name");
...
graph.createIndex(ElementType.EDGE, "knows", "since");

The above commands should be run before the relevant data is populated. To create an index after data has been populated, first create the index with the following parameters:

graph.createIndex(ElementType.VERTEX, "person", "name", false, /* populate */ true, /* async */ true);

Then run a MapReduce job using the hbase command:

hbase io.hgraphdb.mapreduce.index.PopulateIndex \
    -t vertex -l person -p name -op /tmp -ca gremlin.hbase.namespace=mygraph

Once an index is created and data has been populated, it can be used as follows:

// get persons named John
Iterator<Vertex> it = graph.verticesByLabel("person", "name", "John");
...
// get persons first known by John between 2007-01-01 (inclusive) and 2008-01-01 (exclusive)
Iterator<Edge> it = johnV.edges(Direction.OUT, "knows", "since", 
    LocalDate.parse("2007-01-01"), LocalDate.parse("2008-01-01"));

Note that the indices support range queries, where the start of the range is inclusive and the end of the range is exclusive.

An index can also be specified as a unique index. For a vertex index, this means only one vertex can have a particular property name-value for the given vertex label. For an edge index, this means only one edge of a specific vertex can have a particular property name-value for a given edge label.

graph.createIndex(ElementType.VERTEX, "person", "name", /* unique */ true);

To drop an index, invoke a MapReduce job using the hbase command:

hbase io.hgraphdb.mapreduce.index.DropIndex \
    -t vertex -l person -p name -op /tmp -ca gremlin.hbase.namespace=mygraph

Pagination

Once an index is defined, results can be paginated. HGraphDB supports keyset pagination, for both vertex and edge indices.

// get first page of persons (note that null is passed as start key)
final int pageSize = 20;
Iterator<Vertex> it = graph.verticesWithLimit("person", "name", null, pageSize);
...
// get next page using start key of last person from previous page
it = graph.verticesWithLimit("person", "name", "John", pageSize + 1);
...
// get first page of persons most recently known by John
Iterator<Edge> it = johnV.edgesWithLimit(Direction.OUT, "knows", "since", 
    null, pageSize, /* reversed */ true);

Also note that indices can be paginated in descending order by passing reversed as true.

Schema Management

By default HGraphDB does not use a schema. Schema management can be enabled by calling HBaseGraphConfiguration.useSchema(true). Once schema management is enabled, the schema for vertex and edge labels can be defined.
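
For example, a configuration with schema management enabled might look like the following (a sketch; it assumes useSchema chains like the other configuration setters):

Configuration cfg = new HBaseGraphConfiguration()
    .setInstanceType(InstanceType.DISTRIBUTED)
    .setGraphNamespace("mygraph")
    .useSchema(true);
HBaseGraph graph = (HBaseGraph) GraphFactory.open(cfg);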

graph.createLabel(ElementType.VERTEX, "author", /* id */ ValueType.STRING, "age", ValueType.INT);
graph.createLabel(ElementType.VERTEX, "book", /* id */ ValueType.STRING, "publisher", ValueType.STRING);
graph.createLabel(ElementType.EDGE, "writes", /* id */ ValueType.STRING, "since", ValueType.DATE);   

Edge labels must be explicitly connected to vertex labels before edges are added to the graph.

graph.connectLabels("author", "writes", "book"); 

Additional properties can be added to labels at a later time; otherwise labels cannot be changed.

graph.updateLabel(ElementType.VERTEX, "author", "height", ValueType.DOUBLE);

Whenever vertices or edges are added to the graph, they will first be validated against the schema.
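
For example, given the labels defined above, a property value of the wrong type is rejected when the element is added (a minimal sketch; the exact exception class thrown on a violation may vary, so a generic RuntimeException is caught here):

// "age" was declared as ValueType.INT for the "author" label
graph.addVertex(T.id, "kierkegaard", T.label, "author", "age", 42);
try {
    // a String value for "age" violates the schema and is rejected
    graph.addVertex(T.id, "hegel", T.label, "author", "age", "forty-two");
} catch (RuntimeException e) {
    // handle the schema validation failure
}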

Counters

One unique feature of HGraphDB is support for counters. The use of counters requires that schema management is enabled.

graph.createLabel(ElementType.VERTEX, "author", ValueType.STRING, "bookCount", ValueType.COUNTER);

HBaseVertex v = (HBaseVertex) graph.addVertex(T.id, "Kierkegaard", T.label, "author");
v.incrementProperty("bookCount", 1L);

One caveat is that indices on counters are not supported.

Counters can be used by clients to materialize the number of edges on a node, for example, which will be more efficient than retrieving all the edges in order to obtain the count. In this case, whenever an edge is added or removed, the client would either increment or decrement the corresponding counter.

Counter updates are atomic as they make use of the underlying support for counters in HBase.
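
For example, a client might maintain an outgoing-edge count like this (a sketch; it assumes a "knowsCount" property of type ValueType.COUNTER on a "person" label, with the "person" and "knows" labels created and connected as described under Schema Management, and that incrementProperty accepts a negative delta, as HBase increments do):

HBaseVertex john = (HBaseVertex) graph.addVertex(T.id, "john", T.label, "person");
HBaseVertex sally = (HBaseVertex) graph.addVertex(T.id, "sally", T.label, "person");

// an edge is added, so increment the materialized count
Edge knows = john.addEdge("knows", sally);
john.incrementProperty("knowsCount", 1L);

// the edge is later removed, so decrement the count
knows.remove();
john.incrementProperty("knowsCount", -1L);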

Graph Analytics with Giraph

HGraphDB provides integration with Apache Giraph by providing two input formats, HBaseVertexInputFormat and HBaseEdgeInputFormat, that can be used to read from the vertices table and the edges table, respectively. HGraphDB also provides two abstract output formats, HBaseVertexOutputFormat and HBaseEdgeOutputFormat, that can be used to modify the graph after a Giraph computation.

Finally, HGraphDB provides a testing utility, InternalHBaseVertexRunner, that is similar to InternalVertexRunner in Giraph, and that can be used to run Giraph computations using a local Zookeeper instance running in another thread.

See this blog post for more details on using Giraph with HGraphDB.

Graph Analytics with Spark GraphFrames

Apache Spark GraphFrames can be used to analyze graphs stored in HGraphDB. First the vertices and edges need to be wrapped with Spark DataFrames using the Spark-on-HBase Connector and a custom SHCDataType. Once the vertex and edge DataFrames are available, obtaining a GraphFrame is as simple as the following:

val g = GraphFrame(verticesDataFrame, edgesDataFrame)

See this blog post for more details on using Spark GraphFrames with HGraphDB.

Graph Analytics with Flink Gelly

HGraphDB provides support for analyzing graphs with Apache Flink Gelly. First the vertices and edges need to be wrapped with Flink DataSets by importing graph data with instances of HBaseVertexInputFormat and HBaseEdgeInputFormat. After obtaining the DataSets, a Gelly graph can be created as follows:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
Graph gelly = Graph.fromTupleDataSet(vertices, edges, env);
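
The vertex and edge DataSets above might be obtained along the following lines (a sketch; it assumes input-format constructors that take the graph configuration and a property name, and the tuple types shown here depend on your ID and property types):

// hconf is the same style of configuration used to open the graph
HBaseGraphConfiguration hconf = new HBaseGraphConfiguration()
    .setInstanceType(InstanceType.DISTRIBUTED)
    .setGraphNamespace("mygraph");
DataSet<Tuple2<Long, String>> vertices = env.createInput(
    new HBaseVertexInputFormat<>(hconf, "name"),
    TypeInformation.of(new TypeHint<Tuple2<Long, String>>() {}));
DataSet<Tuple3<Long, Long, String>> edges = env.createInput(
    new HBaseEdgeInputFormat<>(hconf, "since"),
    TypeInformation.of(new TypeHint<Tuple3<Long, Long, String>>() {}));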

See this blog post for more details on using Flink Gelly with HGraphDB.

Support for Google Cloud Bigtable

HGraphDB can be used with Google Cloud Bigtable. Since Bigtable does not support namespaces, we set the name of the graph as the table prefix below.

Configuration cfg = new HBaseGraphConfiguration()
    .setInstanceType(InstanceType.BIGTABLE)
    .setGraphTablePrefix("mygraph")
    .setCreateTables(true)
    .set("hbase.client.connection.impl", "com.google.cloud.bigtable.hbase2_x.BigtableConnection")
    .set("google.bigtable.instance.id", "my-instance-id")
    .set("google.bigtable.project.id", "my-project-id");
HBaseGraph graph = (HBaseGraph) GraphFactory.open(cfg);

Using the Gremlin Console

One benefit of having a TinkerPop layer on top of HBase is that a number of graph-related tools become available, all part of the TinkerPop ecosystem. These tools include the Gremlin DSL and the Gremlin console. To use HGraphDB in the Gremlin console, run the following commands:

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> :install org.apache.hbase hbase-client 2.2.1
gremlin> :install org.apache.hbase hbase-common 2.2.1
gremlin> :install org.apache.hadoop hadoop-common 2.7.4
gremlin> :install io.hgraphdb hgraphdb 3.0.0
gremlin> :plugin use io.hgraphdb

Then restart the Gremlin console and run the following:

gremlin> graph = HBaseGraph.open("mygraph", "127.0.0.1", "/hbase-unsecure")

Performance Tuning

Caching

HGraphDB provides two kinds of caches, global caches and relationship caches. Global caches contain both vertices and edges. Relationship caches are specific to a vertex and cache the edges that are incident to the vertex. Both caches can be controlled through HBaseGraphConfiguration by specifying a maximum size for each type of cache as well as a TTL for elements after they have been accessed via the cache. Specifying a maximum size of 0 will disable caching.
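
For example, caching might be tuned as follows (a sketch; setElementCacheMaxSize and setRelationshipCacheMaxSize are the maximum-size setters on HBaseGraphConfiguration, and the corresponding TTL setters are not shown here):

HBaseGraphConfiguration cfg = new HBaseGraphConfiguration()
    .setInstanceType(InstanceType.DISTRIBUTED)
    .setGraphNamespace("mygraph");
cfg.setElementCacheMaxSize(0);       // disable the global (element) cache
cfg.setRelationshipCacheMaxSize(0);  // disable the per-vertex relationship cache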

Lazy Loading

By default, vertices and edges are eagerly loaded. In some failure conditions, it may be possible for indices to point to vertices or edges which have been deleted. By eagerly loading graph elements, stale data can be filtered out and removed before it reaches the client. However, this incurs a slight performance penalty. As an alternative, lazy loading can be enabled. This can be done by calling HBaseGraphConfiguration.setLazyLoading(true). However, if there are stale indices in the graph, the client will need to handle the exception that is thrown when an attempt is made to access a non-existent vertex or edge.
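
A minimal sketch of what handling such a stale index might look like with lazy loading enabled (it reuses the cfg configuration object from the Setup section; HBaseGraphNotFoundException is the exception reported when a missing element is accessed):

cfg.setLazyLoading(true);
HBaseGraph graph = (HBaseGraph) GraphFactory.open(cfg);
try {
    Vertex v = graph.vertex("some-id");
    System.out.println(v.property("name"));
} catch (HBaseGraphNotFoundException e) {
    // a stale index entry pointed at a vertex that no longer exists
}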

Bulk Loading

HGraphDB also provides an HBaseBulkLoader class for more performant loading of vertices and edges. The bulk loader will not attempt to check if elements with the same ID already exist when adding new elements.
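
A minimal sketch of using the bulk loader (it assumes addVertex returns the created vertex so it can be passed to addEdge, and that close flushes any pending mutations):

HBaseBulkLoader loader = new HBaseBulkLoader(graph);
Vertex john = loader.addVertex(T.id, 1L, T.label, "person", "name", "John");
Vertex sally = loader.addVertex(T.id, 2L, T.label, "person", "name", "Sally");
loader.addEdge(john, sally, "knows", "since", LocalDate.now());
loader.close();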

Implementation Notes

HGraphDB uses a tall table schema. The schema is created in the namespace specified to the HBaseGraphConfiguration. The tables look as follows:

Vertex Table

Row Key | Column: label | Column: createdAt | Column: [property1 key] | Column: [property2 key] | ...
[vertex ID] | [label value] | [createdAt value] | [property1 value] | [property2 value] | ...

Edge Table

Row Key | Column: label | Column: fromVertex | Column: toVertex | Column: createdAt | Column: [property1 key] | Column: [property2 key] | ...
[edge ID] | [label value] | [fromVertex ID] | [toVertex ID] | [createdAt value] | [property1 value] | [property2 value] | ...

Vertex Index Table

Row Key | Column: createdAt | Column: vertexID
[vertex label, isUnique, property key, property value, vertex ID (if not unique)] | [createdAt value] | [vertex ID (if unique)]

Edge Index Table

Row Key | Column: createdAt | Column: vertexID | Column: edgeID
[vertex1 ID, direction, isUnique, property key, edge label, property value, vertex2 ID (if not unique), edge ID (if not unique)] | [createdAt value] | [vertex2 ID (if unique)] | [edge ID (if unique)]

Index Metadata Table

Row Key | Column: createdAt | Column: isUnique | Column: state
[label, property key, element type] | [createdAt value] | [isUnique value] | [state value]

Note that in the index tables, if the index is a unique index, then the indexed IDs are stored in the column values; otherwise they are stored in the row key.

If schema management is enabled, two additional tables are used:

Label Metadata Table

Row Key | Column: id | Column: createdAt | Column: [property1 key] | Column: [property2 key] | ...
[label, element type] | [id type] | [createdAt value] | [property1 type] | [property2 type] | ...

Label Connections Table

Row Key | Column: createdAt
[from vertex label, edge label, to vertex label] | [createdAt value]

HGraphDB was designed to support the features mentioned here.

Future Enhancements

Possible future enhancements include MapReduce jobs for the following:

  • Cleaning up stale indices.

hgraphdb's People

Contributors

19kka, dependabot-preview[bot], dependabot[bot], mbrukman, rayokota, sleep-null, zacrosenbauer-fdx

hgraphdb's Issues

Read vertices

Hi,
I followed your example and added two vertices and an edge via the following piece of code:

        HBaseGraphConfiguration cfg = new HBaseGraphConfiguration()
                .setInstanceType(InstanceType.DISTRIBUTED)
                .setGraphNamespace("fg2")
                .setCreateTables(false)
                .setGraphTablePrefix("myprefix")
                .setRegionCount(3);
        HBaseGraph graph = (HBaseGraph) GraphFactory.open(cfg);
        Vertex v1 = graph.addVertex(T.id, 1, T.label, "person", "name", "John");
        Vertex v2 = graph.addVertex(T.id, 2, T.label, "person", "name", "Sally");

but when trying to read a vertex with the following command:

  graph.vertex(1).property("person") 

HGraphDB said that there is no such vertex:

  Exception in thread "main" io.hgraphdb.HBaseGraphNotFoundException: Vertex does not exist: 1
   at io.hgraphdb.readers.VertexReader.load(VertexReader.java:32)
   at io.hgraphdb.readers.VertexReader.load(VertexReader.java:15)
   at io.hgraphdb.models.ElementModel.load(ElementModel.java:48)
   at io.hgraphdb.HBaseElement.load(HBaseElement.java:126)
   at io.hgraphdb.HBaseGraph.vertex(HBaseGraph.java:267)
   at testHgraphDb.main(testHgraphDb.java:24)

I checked the tables via the shell; the tables are created and the mentioned vertices and edge have been added. I was wondering if you could tell me what I should do?
-Thanks
Ronald

Insert data with spark, some data can not be queried out

Version: 0.4.10

I used Spark to import 3 billion vertices and 2 billion edges, but when querying I found that some vertices cannot be retrieved, which is very strange.

Set vertexes = graph.traversal().V(id).hasLabel(vertexLabel).out(edgeLabel).toSet();

val cfg = new HBaseGraphConfiguration()
  .setInstanceType(HBaseGraphConfiguration.InstanceType.DISTRIBUTED)
  .setGraphNamespace("test")
  .setCreateTables(false)
  .setRegionCount(30)
  .setBulkLoaderSkipWAL(true)
  .setLazyLoading(true)

spark.read.orc(Edges)
  .foreachPartition(iter => {
    val (connection, graph) = createGraphConn(graphConfig)
    val loader = new HBaseBulkLoader(graph)
    val updateTime = getUpdateTime
    try {
      while (iter.hasNext) {
        val line = iter.next()
        val columns = line.mkString(sep).split(sep, -1)
        val src = new HBaseVertex(graph, DefaultValueUtil.toJavaLong(columns(0)))
        val dst = new HBaseVertex(graph, DefaultValueUtil.toJavaLong(columns(1)))
        loader.addEdge(src, dst, edgeLabel,
          T.id, Math.abs(LongHashFunction.farmUo().hashChars(src + "_" + dst)).toString)
      }
      loader.close()
    } catch {
      case e: Exception => e.printStackTrace()
    } finally {
      connection.close()
      graph.close()
    }
  })

I populated my data using the HBaseBulkLoader you recommended, but I still can't access my data through the Gremlin Console.

Hi rayokota, thanks for the help.

I have populated my data using the HBaseBulkLoader you recommended, but I still can't access my data through the Gremlin Console.

Here is the code of loading data:

  1. Preparing data via a MapReduce job:
    hadoop jar /usr/hdp/2.5.3.0-37/hbase/lib/hbase-server-1.1.2.2.5.3.0-37.jar importtsv -Dhbase.zookeeper.quorum=172.16.**.** -Dzookeeper.znode.parent=/hbase-unsecure -Dimporttsv.bulk.output=hdfs:///graph_data/result -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,f:~l,f:name bulkloader:vertices hdfs:///graph_data/test_node.txt

2. Completing the data load:

./bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs:///graph_data/result bulkloader:vertices

When the loading is completed, I can see the data in HBase (screenshot not reproduced here).

But when I access the data using the Gremlin Console:

graph=HBaseGraph.open("bulkloader","172.16.*.*", "/hbase-unsecure")
g=graph.traversal()
g.V()

Here comes the error (shown in a screenshot that is not reproduced here):

Could you help me with this problem? I don't know how to solve it.

Besides, I have seen that you also support Giraph and Flink Gelly for graph analytics. Can we use Giraph and Flink Gelly to import data?

Thank you very much!

Why does the program not stop automatically after the code has finished executing?

When I use the HBaseGraph interface to query vertices, the query finishes and the graph.close() method is invoked, but the program does not stop automatically. I have to stop it manually.


HBaseGraphConfiguration config = new HBaseGraphConfiguration()
                .setGraphNamespace("mygraph")
                .setCreateTables(true)
                .set("hbase.zookeeper.quorum", "panda3,panda4,panda5")
                .set("zookeeper.znode.parent", "/hbase_lions")
                .setInstanceType(HBaseGraphConfiguration.InstanceType.DISTRIBUTED);

        config.setElementCacheMaxSize(0);
        config.setRelationshipCacheMaxSize(0);
        config.setLazyLoading(true);
        config.setStaleIndexExpiryMs(0);

        HBaseGraph graph = (HBaseGraph) GraphFactory.open(config);
        // get persons named John
        Iterator<Vertex> it = graph.verticesByLabel("person", "name", "John");
        while (it.hasNext()){
            System.out.println(it.next());
        }
        graph.close();

Show HGraphDB graph in Gephi using Gremlin Console

Hi,
I want to view an HGraphDB graph in Gephi based on this link.
I load the graph according to this issue. But when I create a fat jar of HGraphDB with most dependencies and add it to my CLASSPATH, Gephi cannot be connected from the Gremlin Console, whereas without adding it to the CLASSPATH the console connects to Gephi without any problem.

Do you have any idea?

Error:

plugin activated: tinkerpop.server
plugin activated: tinkerpop.gephi
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin> :plugin use io.hgraphdb
==>io.hgraphdb activated
gremlin> :remote connect tinkerpop.gephi
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/fariba/AppSource/Tinkerpop/apache-tinkerpop-gremlin-server-3.3.1/RequiredJars/hgraphdb-1.0.4-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/fariba/AppSource/Tinkerpop/apache-tinkerpop-gremlin-console-3.3.1/lib/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
INSTANCE
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.(SSLConnectionSocketFactory.java:144)
at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:955)
at org.apache.http.impl.client.HttpClients.createDefault(HttpClients.java:58)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
at org.apache.tinkerpop.gremlin.console.jsr223.GephiRemoteAcceptor.(GephiRemoteAcceptor.groovy:62)
at org.apache.tinkerpop.gremlin.console.jsr223.GephiGremlinPlugin$GephiConsoleCustomizer.getRemoteAcceptor(GephiGremlinPlugin.java:39)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
at org.apache.tinkerpop.gremlin.console.PluggedIn.remoteAcceptor(PluggedIn.groovy:82)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
at org.apache.tinkerpop.gremlin.console.commands.RemoteCommand$_closure1.doCall(RemoteCommand.groovy:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
at org.codehaus.groovy.tools.shell.ComplexCommandSupport.executeFunction(ComplexCommandSupport.groovy:83)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
at org.codehaus.groovy.tools.shell.ComplexCommandSupport.execute(ComplexCommandSupport.groovy:70)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
at org.codehaus.groovy.tools.shell.Shell.execute(Shell.groovy:104)
at org.codehaus.groovy.tools.shell.Groovysh.super$2$execute(Groovysh.groovy)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
at org.codehaus.groovy.tools.shell.Groovysh.executeCommand(Groovysh.groovy:260)
at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:159)
at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:72)
at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)
at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:124)
at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:132)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:152)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:83)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
at org.apache.tinkerpop.gremlin.console.Console.(Console.groovy:146)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:232)
at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:453)

Data traversal query delay

I found a delay of a few hours in traversal queries after using HBaseBulkLoader to insert 1 billion vertices and 5 billion edges. Why is that? Is there any way to solve it?

Can JanusGraph be a final product of this project?

JanusGraph calls itself a database; however, I think it is more like a database abstraction.

JanusGraph can use HBase as the backend storage. It has robust interfaces for HBase, TinkerPop, and much else. So could this project be substituted by JanusGraph?

why edgeindex has CREATED_AT

Hi,
Why do we need CREATED_AT? What is its role?
Thanks for your enthusiastic reply!

[PS: I deal with hot spots when creating vertex indices by using a HexStringSplit style; it is a brute-force method.]

VertexIndexModel:

public byte[] serializeForRead(String label, boolean isUnique, String key, Object value) {
    PositionedByteRange buffer = new DynamicPositionedMutableByteRange(4096);
    buffer.put((byte) '0');
    byte[] prefix = serializePrefix(label, isUnique, key);
    ValueUtils.serializeWithSalt(buffer, prefix);
    if (value != null) {
        ValueUtils.serialize(buffer, value);
    }
    buffer.setLength(buffer.getPosition());
    buffer.setPosition(0);
    byte[] bytes = new byte[buffer.getRemaining()];
    buffer.get(bytes);
    byte[] firstByte = getMd5FirstByte(bytes);
    bytes[0] = firstByte[0];
    bytes[1] = firstByte[1];
    return bytes;
}

public byte[] serializeForWrite(Vertex vertex, boolean isUnique, String key) {
    PositionedByteRange buffer = new DynamicPositionedMutableByteRange(4096);
    buffer.put((byte) '0'); // fill to 00~FF
    byte[] prefix = serializePrefix(vertex.label(), isUnique, key);
    ValueUtils.serializeWithSalt(buffer, prefix);
    ValueUtils.serialize(buffer, vertex.value(key));

    //get the front bytes
    PositionedByteRange bufferPrefix = buffer.deepCopy();
    bufferPrefix.setLength(bufferPrefix.getPosition());
    bufferPrefix.setPosition(0);
    byte[] bytesPrefix = new byte[bufferPrefix.getRemaining()];
    bufferPrefix.get(bytesPrefix);

    if (!isUnique) {
        ValueUtils.serialize(buffer, vertex.id());
    }
    buffer.setLength(buffer.getPosition());
    buffer.setPosition(0);
    byte[] bytes = new byte[buffer.getRemaining()];
    buffer.get(bytes);

    //replace the front two byte to hex string byte
    byte[] firstByte = getMd5FirstByte(bytesPrefix);
    bytes[0] = firstByte[0];
    bytes[1] = firstByte[1];
    return bytes;
}

/**
* get the first two bytes based on md5 of prefix+value
* @param bytes
* @return
*/
private static byte[] getMd5FirstByte(byte[] bytes){

    byte[] byteContent = new byte[2];
    byteContent[0] = '0';
    byteContent[1] = '0';
    try {
        String rowkey = new String(bytes,"gbk");
        String rowkeyMd5 = getMD5(rowkey);
        if(StringUtils.isNotEmpty(rowkeyMd5)){
            byte[] rowkeyMd5Bytes = rowkeyMd5.getBytes("gbk");
            if(rowkeyMd5Bytes.length > 0){
                byteContent[0] = rowkeyMd5Bytes[0];
                byteContent[1] = rowkeyMd5Bytes[1];
            }
        }
        return byteContent;
    }catch (Exception e){
        byteContent[0] = 'f';
        byteContent[1] = 'f';
        return byteContent;
    }
}

/**
 * get md5 by the prefix+value
 * @param str 
 * @return 
 */
public static String getMD5(String str) {
    try {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(str.getBytes("gbk"));
        return new BigInteger(1, md.digest()).toString(16);
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}

When I have 20 regions, the distribution is even.
If I use the API in the future to insert and search data, is there a problem with this method?

JDK version limit

HGraphDB is based on JDK 1.8, but at present many of the servers are still on JDK 1.7 and will not be easily upgraded in the short term. Are there any ways to run HGraphDB on JDK 1.7?

Read HGraphDB edges based on their labels

Hi,
I'm trying to create a Gelly graph from HGraphDB. I followed your example and everything is OK!
But I want to get edges based on their labels instead of their properties.

For example, I need to read only the edges which have the label "port", with all their properties. Any ideas?

Graph Traversal

Hi,

I'm having some issues with graph traversal as well. Basically what I'm trying to do is:

GraphTraversal<Vertex, Vertex> inspiredFrom = hBaseGraph.traversal().V(source1Vertex)
    .outE("inspiredFrom").inV().inE("inspiredFrom").outV().value();
GraphTraversal result = inspiredFrom.valueMap("distance");
result.next();

And I get this error:
java.lang.ClassCastException: io.hgraphdb.HBaseVertex cannot be cast to org.apache.tinkerpop.gremlin.structure.Property

at org.apache.tinkerpop.gremlin.process.traversal.step.map.PropertyValueStep.map(PropertyValueStep.java:40)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.MapStep.processNextStart(MapStep.java:37)

Is there another way I could cycle through the results?

Thanks!

Best way to only add a vertex if not present?

Hi There,

If I'm using custom IDs for my vertices, what is the prescribed way to add a vertex with a particular ID that may already exist?

Doing something like this

Vertex vertex;
try {
  vertex = graph.vertex("my-id");
} catch (HBaseGraphNotFoundException e) {
  vertex = graph.addVertex(T.id, "my-id");
}

doesn't seem great, but the other methods like findVertex only seem to consult the cache instead of the DB?

Would using HBaseBulkLoader that doesn't check if it exists be a way around this? Is this a bit of a hack?

Table auto creation and deletion for label_metadata and label_connections tables

In HBaseGraphUtils, createTables and dropTables are missing tables label_metadata and label_connections. In case one uses useSchema=true, and tries to add schema enforcements with graph.createLabel calls, it will fail since the 2 tables do not get created.

HBaseGraphUtils line 112 should contain:
createTable(config, admin, Constants.LABEL_METADATA, HConstants.FOREVER);
createTable(config, admin, Constants.LABEL_CONNECTIONS, HConstants.FOREVER);

and line 154:
dropTable(config, admin, Constants.LABEL_METADATA);
dropTable(config, admin, Constants.LABEL_CONNECTIONS);

Apache S2Graph

Hello! I'm excited to hear about your development work on an Apache TinkerPop graph database based on Apache HBase. I read through your blog post too.

Have you tried reaching out to the Apache S2Graph community to contribute to their effort? For example, they have already begun work on the TinkerPop implementation, see S2GRAPH-72 and S2Graph-80. I didn't see a JIRA ticket opened for vertex indices, but you could certainly open one and contribute that feature as well.

duplicated vertexes and edges

To improve performance, I used HBaseBulkLoader.addVertex()/addEdge() to add vertices and edges, yet I found many duplicated vertices and edges. How can I keep vertex/edge IDs unique?

How to improve the efficiency of traversal query?

When using the vertex index and edge index in HGraphDB to query the adjacent vertices of a vertex, the more adjacent vertices there are, the slower the query becomes. Are there any parameter settings that can improve the speed of querying adjacent vertices?

Update edges tables

I want to update the edges table and update edge weights when duplicated edges (those which already exist) are encountered. What's the best way to do that in HGraphDB?

-Thanks in advance
Ronald

How to query a vertex if I do not know what the label is

Version: 2.0+

I have added a vertex index on the field "comm_id". In order to avoid hot spots, I use a lot of labels in the vertex index table. How can I query on "comm_id" if I do not know what the label is?

// add index
graph.createIndex(ElementType.VERTEX, "1label", "comm_id");
graph.createIndex(ElementType.VERTEX, "2label", "comm_id");
......
graph.createIndex(ElementType.VERTEX, "xlabel", "comm_id");

// add vertex
loader.addVertex(T.id, 1000L, T.label, "1label", "comm_id", 100)
loader.addVertex(T.id, 1000L, T.label, "2label", "comm_id", 200)
......
loader.addVertex(T.id, 1000L, T.label, "xlabel", "comm_id", 300)

Using the Gremlin Console to analyse data in HGraphDB, the input g.V("006281338731").out() has no result

I have inserted my graph data (Spark DataFrame format) into HBase by following this documentation.

Then I use HGraphDB in the Gremlin console, and run the following:
graph=HBaseGraph.open("test3","10.9.11.1", "/hbase")
g=graph.traversal()
g.E()

Then I can get the edges:
==>e[006281338731:15980203][006281338731-contacts->15980203]
==>e[00821065463:15754630][00821065463-contacts->15754630]
==>e[00827048583:18941000][00827048583-contacts->18941000]
................

when I input:
g.V("006281338731")
get the vertex:
==>v[006281338731]

but when I input:
g.V("006281338731").out()
there is no result

Obviously, there exists an edge [006281338731-contacts->15980203],
so I don't understand why the input g.V("006281338731").out() doesn't return the edge.

Can you help me with this problem? Is there anything wrong with how I inserted data into HBase?

here is the process of inserting data:

1. Define catalog

def vertexCatalog =
s"""{
"table":{"namespace":"testGraph", "name":"vertices","tableCoder":"org.apache.spark.sql.execution.datasources.hbase.types.HGraphDB", "version":"2.0"},
"rowkey":"key",
"columns":{
"id":{"cf":"rowkey", "col":"key", "type":"string"},
"label":{"cf":"f", "col":"~l", "type":"string"},
"name":{"cf":"f", "col":"name", "type":"string"}}}""".stripMargin

def edgeCatalog =
s"""{
"table":{"namespace":"testGraph", "name":"edges","tableCoder":"org.apache.spark.sql.execution.datasources.hbase.types.HGraphDB", "version":"2.0"},
"rowkey":"key",
"columns":{
"id":{"cf":"rowkey", "col":"key", "type":"string"},
"relationship":{"cf":"f", "col":"~l", "type":"string"},
"src":{"cf":"f", "col":"~f", "type":"string"},
"dst":{"cf":"f", "col":"~t", "type":"string"},
"create_time":{"cf":"f", "col":"create_time", "type":"string"}}}""".stripMargin

2. Generate the DataFrame
case class Vertex(id: String, name: String, label: String)
case class Edge(id: String, src: String, create_time: String, dst: String, relationship: String)

val DataDF=sc.textFile("hdfs:///node.txt").map(_.split(",")).map(p => Vertex(p(0),p(1),p(2))).toDF()

3. Insert the DataFrame into HBase
DataDF.write.options(Map(HBaseTableCatalog.tableCatalog -> vertexCatalog, HBaseTableCatalog.newTable -> "5")).format("org.apache.spark.sql.execution.datasources.hbase").save()

Creating too many threads to HBase

I created a connection from Flink to HGraphDB and load graph data to be processed by Flink Gelly algorithms. But I faced a strange problem when loading data and running the algorithms: memory consumption keeps going up, even after jobs complete. I monitored the heap and realized that there are many connection threads to HGraphDB, and they even increase over time. Here is a sample thread name:

"hconnection-0x342a5ef0-shared--pool1-t1353"

All new threads are suspended. I make the connection as follows:

HBaseGraphConfiguration hconf = HGraphDBConfiguration.createHGraphDBConfig(props);
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple3<String, String, Integer>> edges = env.createInput(
    new HBaseEdgeInputFormat<>(hconf, edgeProperty),
    TypeInformation.of(new TypeHint<Tuple3<String, String, Integer>>() {}));

Do you have any idea why it happens?
Thanks
-Ronald

Gremlin console from docker

Hello. I'm trying to run the HGraphDB Gremlin console on Docker. Here is the problem:

d.kataev:graph$ docker run -it jbmusso/gremlin-console:latest
Dec 29, 2017 12:03:53 AM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>
gremlin> :install org.apache.hbase hbase-client 1.2.0
==>Loaded: [org.apache.hbase, hbase-client, 1.2.0]
gremlin> :install org.apache.hbase hbase-common 1.2.0
==>Loaded: [org.apache.hbase, hbase-common, 1.2.0]
gremlin> :install org.apache.hadoop hadoop-common 2.5.1
==>Loaded: [org.apache.hadoop, hadoop-common, 2.5.1]
gremlin> :install io.hgraphdb hgraphdb 1.0.2
==>Loaded: [io.hgraphdb, hgraphdb, 1.0.2]
gremlin> :plugin use io.hgraphdb
==>io.hgraphdb could not be found - use ':plugin list' to see available plugins
gremlin>

The speed of HBaseBulkLoader.addVertex() becomes slow as the number of vertices increases

I am trying to import a graph containing ten million vertices into the database. To improve performance, I used HBaseBulkLoader.addVertex() to add the vertices. At first the speed is very fast, but as the number of vertices increases, the speed becomes too slow.
I am a novice at using HBase and HGraphDB. I am confused as to why it becomes slow, and what can I do to optimize it?

Handle Exception if an edge does not have a property

Hi,
When I get edges based on one property in Flink, all edges must have this property; if any edge does not have that property, an exception will be thrown.
Is it possible to handle this exception and just return the edges which have this property, ignoring the other edges?
I used this command:

env.createInput(new HBaseEdgeInputFormat<>(hconf, property), typeHint)

mapreduce index populate

When I run a MapReduce job using the hbase command:
hbase io.hgraphdb.mapreduce.index.PopulateIndex
-t vertex -l person -p name -op /tmp -ca gremlin.hbase.namespace=testgraph
Error: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.hbase.KeyValue, received org.apache.hadoop.hbase.client.Result

How to increase the performance of inserting edges in different processes?

In general, vertex and edge data are separated into different files, so they are inserted in different processes.
First it is necessary to query the source and destination vertices, and then add the edge.
Querying a vertex may take more than 100 ms, which greatly affects the efficiency of insertion.
If the amount of edge data is particularly large, the insertion time will be particularly long.
How can the performance of inserting edges be improved in this case?

HBaseBulkLoader loader = new HBaseBulkLoader(graph);
Vertex src = graph.vertex((long)1);
Vertex dst = graph.vertex((long)2);
loader.addEdge(src, dst, "edge", "P3", "V3"); 

Add Travis CI support

It would be good to add support for Travis CI on the repo, which should be pretty easy given that it's a Java project using Maven. The default script should just work; the docs can be found on this page.

Issue accessing HGraphDB from the Gremlin console

Followed steps from https://github.com/rayokota/hgraphdb#setup

Configuration cfg = new HBaseGraphConfiguration()
                .setInstanceType(HBaseGraphConfiguration.InstanceType.DISTRIBUTED)
                .setGraphNamespace("mygraph")
                .setCreateTables(true)
                .setRegionCount(3)
                .set("hbase.zookeeper.quorum", "127.0.0.1")
                .set("zookeeper.znode.parent", "/hbase");
        HBaseGraph graph = (HBaseGraph) GraphFactory.open(cfg);

and inserted data into Hbase tables

mygraph:edgeIndices
mygraph:edges
mygraph:indexMetadata
mygraph:vertexIndices
mygraph:vertices

Cannot connect the Gremlin console to the HGraphDB back-end:
Gremlin console version: 3.3.2

Any help is appreciated.

Log

plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: io.hgraphdb
plugin activated: tinkerpop.tinkergraph
gremlin> :install org.apache.hbase hbase-client 1.4.1
==>a module with the name hbase-client is already installed
gremlin> :install org.apache.hbase hbase-common 1.4.1
==>a module with the name hbase-common is already installed
gremlin> :install org.apache.hadoop hadoop-common 2.5.1
==>a module with the name hadoop-common is already installed
gremlin> :install io.hgraphdb hgraphdb 1.0.9
==>a module with the name hgraphdb is already installed
gremlin> :plugin use io.hgraphdb
==>io.hgraphdb activated
gremlin> graph = HBaseGraph.open("mygraph", "127.0.0.1", "/hbase")
WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN  org.apache.hadoop.metrics2.impl.MetricsConfig  - Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
java.io.IOException: java.lang.reflect.InvocationTargetException
Type ':help' or ':h' for help.
Display stack trace? [yN]

io.hgraphdb.HBaseGraphException: java.io.IOException: java.lang.reflect.InvocationTargetException
	at io.hgraphdb.HBaseGraphUtils.getConnection(HBaseGraphUtils.java:67)
	at io.hgraphdb.HBaseGraph.<init>(HBaseGraph.java:127)
	at io.hgraphdb.HBaseGraph.<init>(HBaseGraph.java:118)
	at io.hgraphdb.HBaseGraph.open(HBaseGraph.java:102)
	at io.hgraphdb.HBaseGraph$open.call(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:144)
	at groovysh_evaluate.run(groovysh_evaluate:3)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:71)
	at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:196)
	at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.super$3$execute(GremlinGroovysh.groovy)
	at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:145)
	at org.apache.tinkerpop.gremlin.console.GremlinGroovysh.execute(GremlinGroovysh.groovy:72)
	at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:122)
	at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:95)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:145)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:165)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:130)
	at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:59)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:145)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:165)
	at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:89)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:146)
	at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:236)
	at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:453)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
	at io.hgraphdb.HBaseGraphUtils.getConnection(HBaseGraphUtils.java:63)
	... 49 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
	... 52 more
Caused by: java.lang.IncompatibleClassChangeError: Class org.apache.hadoop.hbase.protobuf.generated.ClusterIdProtos$ClusterId$Builder does not implement the requested interface com.google.protobuf.Message$Builder
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.mergeFrom(ProtobufUtil.java:3470)
	at org.apache.hadoop.hbase.ClusterId.parseFrom(ClusterId.java:69)
	at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:75)
	at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:945)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:721)
	... 57 more

Will a Gremlin traversal make use of created indexes?

Hi Robert,

For an index created like:

graph.createIndex(ElementType.VERTEX, "test", "key");

Will a gremlin traversal such as:

graph.traversal().V().has("test", "key", "value");

use this index? Or is it only used when doing something like:

graph.verticesByLabel("test", "key", "value");

why getStartKey(regionCount) is Integer.MAX_VALUE / regionCount

Hi @rayokota, I have a question: why is getStartKey(regionCount) Integer.MAX_VALUE / regionCount rather than 0?

admin.createTable(tableDescriptor, getStartKey(regionCount), getEndKey(regionCount), regionCount);

private static byte[] getStartKey(int regionCount) {
    return Bytes.toBytes((Integer.MAX_VALUE / regionCount));
}

private static byte[] getEndKey(int regionCount) {
    return Bytes.toBytes((Integer.MAX_VALUE / regionCount * (regionCount - 1)));
}

using Spark GraphFrames

How should your fork of the Spark-on-HBase Connector be compiled to a jar file?
Alternatively, if it's still compatible, how should the HGraphDB.scala serde be included in the current Spark-on-HBase Connector file available on the Maven repo?

Any release notes?

We use HGraphDB in a production environment; could you write some release notes?

GraphFactory could not instantiate this Graph implementation?

hi, @rayokota
With version 0.4.1, the connection to HBase is normal when hbase.zookeeper.quorum is set to one IP address, but fails when hbase.zookeeper.quorum is set to multiple IP addresses.

HBaseGraphConfiguration cfg = new HBaseGraphConfiguration()
    .setInstanceType(HBaseGraphConfiguration.InstanceType.DISTRIBUTED)
    .setGraphNamespace("testgraph")
    .setCreateTables(true)
    .set("hbase.zookeeper.quorum", "192.168.178.91:2181,192.168.178.92:2181,192.168.178.93:2181")
    .set("zookeeper.znode.parent", "/hbase_lions");

HBaseGraph graph = (HBaseGraph) GraphFactory.open(cfg);

errors:

2016-12-08 10:25:29 INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=[192.168.178.91:2181, 192.168.178.92:2181, 192.168.178.93:2181] sessionTimeout=180000 watcher=hconnection-0x12d3a4e90x0, quorum=[192.168.178.91:2181, 192.168.178.92:2181, 192.168.178.93:2181], baseZNode=/hbase_lions
Exception in thread "main" java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class io.hgraphdb.HBaseGraph]
	at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:82)
	at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:70)
	at com.jd.risk.hbase.demo.HGraphDBDemo.main(HGraphDBDemo.java:23)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:78)
	... 7 more
Caused by: io.hgraphdb.HBaseGraphException: java.io.IOException: java.lang.reflect.InvocationTargetException
	at io.hgraphdb.HBaseGraphUtils.getConnection(HBaseGraphUtils.java:62)
	at io.hgraphdb.HBaseGraph.<init>(HBaseGraph.java:156)
	at io.hgraphdb.HBaseGraph.<init>(HBaseGraph.java:135)
	at io.hgraphdb.HBaseGraph.open(HBaseGraph.java:123)
	... 12 more
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
	at io.hgraphdb.HBaseGraphUtils.getConnection(HBaseGraphUtils.java:59)
	... 15 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
	... 18 more
Caused by: java.lang.NumberFormatException: For input string: "2181]"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at org.apache.zookeeper.client.ConnectStringParser.<init>(ConnectStringParser.java:72)
	at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:443)
	at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:221)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:541)
	at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
	at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:879)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:635)
	... 23 more

[Question] Spark DataFrame Bulkload to HgraphDB

This is more of a question than an issue.

Is there any example that we can refer to for bulk loading data from a Spark DataFrame/RDD to HGraphDB?

Please redirect me to an appropriate user group or Gitter if they exist.

HGraphDB connection: failed to connect to HBase

Hi,
I've set up HBase in a cluster and wanted to connect to it using the Java API via HGraphDB. But I faced the following error:

"2018-01-13 17:26:16 INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=server3.asha.fdi:2181 sessionTimeout=180000 watcher=hconnection-0xd706f190x0, quorum=server3.asha.fdi:2181, baseZNode=/hbase
2018-01-13 17:26:16 INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server 192.168.7.73/192.168.7.73:2181. Will not attempt to authenticate using SASL (unknown error)
2018-01-13 17:26:16 INFO  org.apache.zookeeper.ClientCnxn - Socket connection established to 192.168.7.73/192.168.7.73:2181, initiating session
2018-01-13 17:26:16 INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server 192.168.7.73/192.168

.7.73:2181, sessionid = 0x160efc25c360008, negotiated timeout = 90000
2018-01-13 17:27:05 INFO  org.apache.hadoop.hbase.client.RpcRetryingCaller - Call exception, tries=10, retries=31, started=48617 ms ago, cancelled=false, msg="

I followed the instructions in the HBase manual line by line, and I can create tables and insert data via the shell without any problem. But with the Java API, no luck. I attached my configuration file. Thanks in advance.

-Best
Ronald
conf.xml.zip

How to set vertex or edge TTL?

In my department, HBase table permissions are controlled and tables cannot be easily added or deleted. In this case, if I want to periodically clear the edge or vertex data, I cannot use 'truncate table'; I can only set a TTL on each piece of data. So I modified the code and added a line of code in the function "public Iterator constructInsertions()" of:

io.hgraphdb.mutators.EdgeIndexWriter
io.hgraphdb.mutators.EdgeWriter

put.setTTL(3600*24*7);

Is there any other better way?

Using the graph.allEdges() function throws an error when there is no data in the edges table.

There is no data in the "edges" table, and in the process of querying edge data an error occurs.

Iterator<Edge> it = graph.allEdges();
while (it.hasNext()) {
    Edge e = it.next();
    System.out.println("edge id:" + e.id());
}
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Dec 28 09:54:26 CST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60306: row '' on table 'jrdm:edges' at region=jrdm:edges,,1482764104465.beb99461af011509f2f262c9de7fa0d1., hostname=mjq-hbase-athene-11150.hadoop.jd.local,16020,1474545988318, seqNum=1547721

	at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:97)
	at org.apache.tinkerpop.gremlin.util.iterator.MultiIterator.hasNext(MultiIterator.java:48)
	at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$5.hasNext(IteratorUtils.java:325)
	at com.jd.risk.QueryHbase.queryAllEdges(QueryHbase.java:120)
	at com.jd.risk.QueryHbase.main(QueryHbase.java:59)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Dec 28 09:54:26 CST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60306: row '' on table 'jrdm:edges' at region=jrdm:edges,,1482764104465.beb99461af011509f2f262c9de7fa0d1., hostname=mjq-hbase-athene-11150.hadoop.jd.local,16020,1474545988318, seqNum=1547721

	at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:207)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
	at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
	at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
	at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
	at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:94)
	... 4 more
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60306: row '' on table 'jrdm:edges' at region=jrdm:edges,,1482764104465.beb99461af011509f2f262c9de7fa0d1., hostname=xxxxxx.hadoop.jd.local,16020,1474545988318, seqNum=1547721
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
	at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Call to xxxxxx.hadoop.jd.local/172.28.111.50:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=11, waitTime=60001, operationTimeout=60000 expired.
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:291)
	at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331)

Hot spots in vertex index table

Hi, the vertex index table has 20 regions, and the pre-splitting strategy is HexStringSplit. The rowkey design uses lowercase characters in lexicographic order. However, the rowkey always has a fixed prefix of "4", so HBase develops hot spots. Querying the data works normally.

private val VERTEX_LABEL_MAP = immutable.HashMap[Int, String](
    0 -> "0VertexLabel",
    1 -> "1VertexLabel",
    2 -> "2VertexLabel",
    3 -> "3VertexLabel",
    4 -> "4VertexLabel",
    5 -> "5VertexLabel",
    6 -> "6VertexLabel",
    7 -> "7VertexLabel",
    8 -> "8VertexLabel",
    9 -> "9VertexLabel",
    10 -> "aVertexLabel",
    11 -> "bVertexLabel",
    12 -> "cVertexLabel",
    13 -> "dVertexLabel",
    14 -> "eVertexLabel",
    15 -> "fVertexLabel",
    16 -> "0aVertexLabel",
    17 -> "1bVertexLabel",
    18 -> "3cVertexLabel",
    19 -> "4dVertexLabel"
  )
private val VERTEX_LABEL_MAP_LEN = VERTEX_LABEL_MAP.size

// add index
for (i <- 0 until VERTEX_LABEL_MAP_LEN) {
    val vertexRegion = VERTEX_LABEL_MAP.getOrElse(i, "0VertexLabel")
    graph.createIndex(ElementType.VERTEX, vertexRegion, "name", true)
}

// insert vertex
val testName = "test"
val labelIndex = Math.abs(LongHashFunction.farmUo().hashChars(testName)) % VERTEX_LABEL_MAP_LEN
val labelRegion = VERTEX_LABEL_MAP.getOrElse(labelIndex.toInt, "0VertexLabel")

loader.addVertex(T.id, vertexId, T.label, labelRegion, "name", testName)

Query efficiency is still very low even after adding indexes for vertices and edges

Query efficiency is very low: it takes more than 30 seconds to query only a dozen vertices. However, inserting and querying with the native HBase API is very fast.

HBaseGraphConfiguration cfg = new HBaseGraphConfiguration()
                .setInstanceType(HBaseGraphConfiguration.InstanceType.DISTRIBUTED)
                .setGraphNamespace("testgraph")
                .setCreateTables(true)
                .set("hbase.zookeeper.quorum", "192.168.178.91,192.168.178.92,192.168.178.93")
                .set("zookeeper.znode.parent", "/hbase_lions");

        cfg.setElementCacheMaxSize(0);
        cfg.setRelationshipCacheMaxSize(0);
        cfg.setLazyLoading(true);
        cfg.setStaleIndexExpiryMs(0);

        HBaseGraph graph = (HBaseGraph) GraphFactory.open(cfg);
        graph.createIndex(ElementType.VERTEX, "person", "name");
        graph.createIndex(ElementType.EDGE, "knows", "since");

        Vertex v1 = graph.addVertex(T.id, 1, T.label, "person",
                "name", "test_1",
                "age", 18);
        Vertex v2 = graph.addVertex(T.id, 2, T.label, "person",
                "name", "test_2",
                "age", 19);
        Vertex v3 = graph.addVertex(T.id, 3, T.label, "person",
                "name", "test_3",
                "age", 20);
        v1.addEdge("knows", v2, T.id, "edge1", "since", LocalDate.now());
        v1.addEdge("knows", v3, T.id, "edge2", "since", LocalDate.now());

        List<Vertex> vertexes = graph.traversal().V()
                .hasLabel("person")
                .has("name", "test_1")
                .both("knows")
                .toList();
        for (Vertex v: vertexes) {
            System.out.println("id==" + v.id());
            System.out.println("label==" + v.label());
            System.out.println("name" + v.value("name"));
        }
        graph.close();
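One way to narrow this down, meant to run right before the graph.close() call in the snippet above and reusing its setup and graph instance: time the index-backed lookup on its own, separately from the Gremlin traversal, to see whether the vertex index lookup or the edge expansion is the slow part. This is only a diagnostic sketch; the label and property names match the snippet above.

        long start = System.currentTimeMillis();
        // Index-backed lookup by label and property (the "person"/"name" index created above).
        Iterator<Vertex> people = graph.verticesByLabel("person", "name", "test_1");
        while (people.hasNext()) {
            System.out.println("indexed vertex id==" + people.next().id());
        }
        System.out.println("index lookup took " + (System.currentTimeMillis() - start) + " ms");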

Setting Table name

Hi,

How can I set the table name, while writing or reading, in the HGraphDB configuration? I couldn't find any relevant property.

-thanks
Ronald

Question on graph traversal

Hi,
First of all, thanks for writing this tool. I am in the process of evaluating it for a project that involves some basic graph processing.
I have a question about one call, HBaseGraph#allVertices(Object fromId, int limit). I am trying to use it to start from an existing vertex and get all other vertices connected to it. Is this the intended usage of this call, or should I be doing this in a different way? Basically I'm looking for a depth-first or breadth-first traversal of the graph and thought this would be a simpler way of doing it. If not, could you recommend the easiest way to do this?
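For reference, a fixed-depth neighborhood expansion can also be expressed as a standard TinkerPop traversal instead of going through allVertices; a minimal sketch, assuming an already-opened HBaseGraph named graph, a starting vertex id of 1L, and a depth of 3 (all placeholders):

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import java.util.List;

GraphTraversalSource g = graph.traversal();
// Expand from vertex 1L up to 3 hops in either direction; simplePath() avoids cycles
// and the final dedup() collapses vertices reachable along multiple paths.
List<Vertex> neighborhood = g.V(1L)
        .repeat(__.both().simplePath())
        .emit()
        .times(3)
        .dedup()
        .toList();
neighborhood.forEach(v -> System.out.println(v.id()));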

Batch delete in HBaseBulkLoader

A small suggestion: most business services do incremental updates, so a batch delete operation would be very useful in HBaseBulkLoader.
How can a vertex and its vertex index be deleted using HBaseBulkLoader?

How to parallelize hbase ops?

First of all, thanks for this great work!
We were using JanusGraph to create and query our graph database.
A short time ago, we came across your project as a solution to our needs, because we only use HBase for storage. We have updated some of your code to meet our user requirements, and that all went well.
But now we are struggling with the traversal speed of the application. When we query a vertex's neighboring edges and vertices, we first scan HBase (the edge indices) with the provided vertex id (we need to parallelize those scan operations). After getting edge and vertex IDs from those scans, we have to fetch the corresponding edge/vertex objects with a number of "get"s from HBase (which we also need to parallelize). JanusGraph has a parameter, "storage.backend.parallel-ops", which parallelizes backend operations using a configurable core size, etc.
Does your project have a parallelization property like Janus, or do we have to implement it ourselves? If it is the latter, what would you suggest as a starting point?
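If it turns out no such property exists, one generic client-side starting point is to fan the id lookups out over a thread pool; a rough sketch, where the pool size is the caller's choice, graph.vertices(id) is the standard TinkerPop lookup by id, and the assumption that a single HBaseGraph instance can be shared across threads should be verified first.

import io.hgraphdb.HBaseGraph;
import org.apache.tinkerpop.gremlin.structure.Vertex;

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.stream.Collectors;

public final class ParallelFetchSketch {
    // Submit all lookups first so the underlying HBase gets are issued concurrently, then join.
    public static List<Vertex> fetchVertices(HBaseGraph graph, List<Object> ids, ExecutorService pool) {
        List<CompletableFuture<Vertex>> futures = ids.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> graph.vertices(id).next(), pool))
                .collect(Collectors.toList());
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}

A fixed-size pool (for example Executors.newFixedThreadPool(n)) sized to the HBase client's connection capacity would be a reasonable first try.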

Insert array list as edge property in HgraphDB

Hi,
I insert an array list as an edge property in the [(String,String),(String,String),...] format,
but when the size of this list increases, I get this error:

java.lang.ArrayIndexOutOfBoundsException: 4096

Is there any limitation?

load hgraphdb into gremlin server

Hi,

I'm currently trying to load HGraphDB into the Gremlin Server. I have successfully loaded it into the Gremlin Console and created the relevant tables in HBase, but I'm surely doing something wrong on the server side.

The error I get is:

[WARN] GraphManager - Graph [graph] configured at [conf/tinkergraph-empty.properties] could not be instantiated and will not be available in Gremlin Server. GraphFactory message: GraphFactory could not find [io.hgraphdb] - Ensure that the jar is in the classpath
java.lang.RuntimeException: GraphFactory could not find [io.hgraphdb] - Ensure that the jar is in the classpath
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:63)
at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:104)
at org.apache.tinkerpop.gremlin.server.GraphManager.lambda$new$0(GraphManager.java:55)
at java.util.LinkedHashMap$LinkedEntrySet.forEach(LinkedHashMap.java:671)
at org.apache.tinkerpop.gremlin.server.GraphManager.<init>(GraphManager.java:53)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:83)
at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:110)
at org.apache.tinkerpop.gremlin.server.GremlinServer.main(GremlinServer.java:344)

Before executing the Gremlin Server I'm doing:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export CLASSPATH=$HADOOP_CONF_DIR:/opt/gremlin-console323/ext/hgraphdb/lib:/opt/gremlin-console323/ext/hgraphdb/lib/hgraphdb-0.4.14.jar
export HADOOP_GREMLIN_LIBS=/opt/gremlin-console-323/ext/hadoop-gremlin/lib:/opt/gremlin-console323/ext/hgraphdb/lib

and the config part (which is surely wrong) looks like:

gremlin-server-modern.yaml
...
  - tinkerpop.tinkergraph
  - tinkerpop.hadoop
  - io.hgraphdb
  - tinkerpop.gephi
...

tinkergraph-empty.properties

#gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
#gremlin.tinkergraph.vertexIdManager=LONG
gremlin.graph=io.hgraphdb
storage.backend=hgraphdb
storage.hostname=localhost

Any pointers?

Thanks!
T
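For comparison, a hedged sketch of what the graph properties file might need to contain, assuming (as with the Gremlin Console) that gremlin.graph must name the fully-qualified Graph implementation class rather than a package, and reusing the gremlin.hbase.namespace property used elsewhere in this project; the namespace and quorum values are placeholders:

gremlin.graph=io.hgraphdb.HBaseGraph
gremlin.hbase.namespace=mygraph
hbase.zookeeper.quorum=localhost

The Gremlin Server's yaml graphs entry would then point at this properties file instead of conf/tinkergraph-empty.properties.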

Why is the edgeIndices record automatically deleted when the related vertex does not exist?

My scenario:
1. When I executed the code, 5 tables (vertices, edges, vertexIndices, edgeIndices, indexMetadata) were created in HBase.
2. I then deleted a vertex record from the table xxx:vertices by hand through the HBase shell (not through the HGraphDB API), and later executed some related commands in the Gremlin Console (e.g. g.V(xxx).out(), ones involving the deleted vertex).
3. In the end, I found that the records in the table xxx:edgeIndices related to the deleted vertex were missing, but the records in the table xxx:edges had not changed.
So, where is the code that removes those records behind the scenes?
Thanks very much!
