Home Page: http://graphframes.github.io/graphframes

License: Apache License 2.0

GraphFrames: DataFrame-based Graphs

This is a package for DataFrame-based graphs on top of Apache Spark. Users can write highly expressive queries by leveraging the DataFrame API, combined with a new API for motif finding. The user also benefits from DataFrame performance optimizations within the Spark SQL engine.
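To illustrate the motif-finding semantics only (this is not the GraphFrames API itself, which runs on Spark DataFrames): a pattern such as "(a)-[e]->(b); (b)-[e2]->(a)" matches pairs of vertices connected by edges in both directions. A minimal pure-Python sketch of that matching, over a small hypothetical edge list:

```python
# Hypothetical edge list (src, dst); in GraphFrames this would be an
# edges DataFrame with "src" and "dst" columns.
edges = [("a", "b"), ("b", "a"), ("b", "c")]

def find_bidirectional(edges):
    """Return (src, dst) pairs for which the reverse edge also exists,
    i.e. matches of the motif (a)-[e]->(b); (b)-[e2]->(a)."""
    edge_set = set(edges)
    return [(src, dst) for (src, dst) in edges if (dst, src) in edge_set]

print(find_bidirectional(edges))  # [('a', 'b'), ('b', 'a')]
```

In GraphFrames the same idea is expressed declaratively and executed by the Spark SQL engine, so it scales beyond in-memory edge lists.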

You can find the user guide and API docs at https://graphframes.github.io/graphframes.

Building and running unit tests

To compile this project, run build/sbt assembly from the project home directory. This will also run the Scala unit tests.

To run the Python unit tests, run the run-tests.sh script from the python/ directory. You will need to set SPARK_HOME to your local Spark installation directory.
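The steps above can be summarized as follows (the SPARK_HOME path is an example; adjust it to your local installation):

```shell
# From the project home directory: build the assembly JAR.
# This also runs the Scala unit tests.
build/sbt assembly

# Run the Python unit tests. SPARK_HOME must point at a local
# Spark installation directory.
export SPARK_HOME=/path/to/spark
cd python
./run-tests.sh
```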

Spark version compatibility

This project is compatible with Spark 2.4+. However, significant speed improvements have been made to DataFrames in more recent versions of Spark, so you may see speedups from using the latest Spark version.
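When adding GraphFrames to a Spark session, the package coordinate must match the Spark and Scala versions of your installation; mismatched binaries typically surface at runtime as java.lang.NoSuchMethodError. A hedged example invocation (the version string is illustrative; check the releases page for the artifact matching your setup):

```shell
# Illustrative only: pick the artifact built for your Spark/Scala version.
# "0.8.2-spark3.0-s_2.12" means GraphFrames 0.8.2 built for Spark 3.0
# and Scala 2.12.
pyspark --packages graphframes:graphframes:0.8.2-spark3.0-s_2.12
```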

Contributing

GraphFrames is a collaborative effort among UC Berkeley, MIT, and Databricks. We welcome open source contributions as well!

Releases:

See the release notes.


graphframes's Issues

Py4JJavaError: An error occurred while calling o57.find.

I am using graphframes:graphframes:0.1.0-spark1.6 from the PySpark interface with a current master build of Spark. I get the following error when trying to use g.find and other functions, as shown in the example notebook at: http://go.databricks.com/hubfs/notebooks/3-GraphFrames-User-Guide-python.html

In [16]: motifs = g.find("(a)-[e]->(b); (b)-[e2]->(a)")
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-16-ac1d920bb1a7> in <module>()
----> 1 motifs = g.find("(a)-[e]->(b); (b)-[e2]->(a)")

/content/tmp/spark-e344f1b3-a40f-488a-9cef-57049b7b3a04/userFiles-54d9528e-17fa-4a53-907e-dc4eca1da328/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in find(self, pattern)

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    833         answer = self.gateway_client.send_command(command)
    834         return_value = get_return_value(
--> 835             answer, self.gateway_client, self.target_id, self.name)
    836 
    837         for temp_arg in temp_args:

/content/SOFTWARE/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    308                 raise Py4JJavaError(
    309                     "An error occurred while calling {0}{1}{2}.\n".
--> 310                     format(target_id, ".", name), value)
    311             else:
    312                 raise Py4JError(

Py4JJavaError: An error occurred while calling o57.find.
: java.lang.NoSuchMethodError: scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
        at org.graphframes.GraphFrame.findSimple(GraphFrame.scala:370)
        at org.graphframes.GraphFrame.find(GraphFrame.scala:263)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:290)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)

In [17]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:paths = g.find("(a)-[e]->(b)")\
:  .filter("e.relationship = 'follow'")\
:  .filter("a.age < b.age")
:# The `paths` variable contains the vertex information, which we can extract:
:e2 = paths.select("e.src", "e.dst", "e.relationship")
:
:# In Spark 1.5+, the user may simplify the previous call to:
:# val e2 = paths.select("e.*")
:
:# Construct the subgraph
:g2 = GraphFrame(g.vertices, e2)
:--
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-17-7411759ff966> in <module>()
----> 1 paths = g.find("(a)-[e]->(b)")  .filter("e.relationship = 'follow'")  .filter("a.age < b.age")
      2 # The `paths` variable contains the vertex information, which we can extract:
      3 e2 = paths.select("e.src", "e.dst", "e.relationship")
      4 
      5 # In Spark 1.5+, the user may simplify the previous call to:

/content/tmp/spark-e344f1b3-a40f-488a-9cef-57049b7b3a04/userFiles-54d9528e-17fa-4a53-907e-dc4eca1da328/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in find(self, pattern)

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    833         answer = self.gateway_client.send_command(command)
    834         return_value = get_return_value(
--> 835             answer, self.gateway_client, self.target_id, self.name)
    836 
    837         for temp_arg in temp_args:

/content/SOFTWARE/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    308                 raise Py4JJavaError(
    309                     "An error occurred while calling {0}{1}{2}.\n".
--> 310                     format(target_id, ".", name), value)
    311             else:
    312                 raise Py4JError(

Py4JJavaError: An error occurred while calling o57.find.
: java.lang.NoSuchMethodError: scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
        at org.graphframes.GraphFrame.findSimple(GraphFrame.scala:370)
        at org.graphframes.GraphFrame.find(GraphFrame.scala:263)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:290)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)

Also, when using the bfs, connectedComponents, labelPropagation, triangleCount, shortestPaths, pageRank, and stronglyConnectedComponents functions, I get the following errors about methods not being found.

In [18]: paths = g.bfs("name = 'Esther'", "age < 32")
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-18-90070d1d699a> in <module>()
----> 1 paths = g.bfs("name = 'Esther'", "age < 32")

/content/tmp/spark-e344f1b3-a40f-488a-9cef-57049b7b3a04/userFiles-54d9528e-17fa-4a53-907e-dc4eca1da328/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in bfs(self, fromExpr, toExpr, edgeFilter, maxPathLength)

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    833         answer = self.gateway_client.send_command(command)
    834         return_value = get_return_value(
--> 835             answer, self.gateway_client, self.target_id, self.name)
    836 
    837         for temp_arg in temp_args:

/content/SOFTWARE/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    308                 raise Py4JJavaError(
    309                     "An error occurred while calling {0}{1}{2}.\n".
--> 310                     format(target_id, ".", name), value)
    311             else:
    312                 raise Py4JError(

Py4JJavaError: An error occurred while calling o101.run.
: java.lang.NoSuchMethodError: scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
        at org.graphframes.GraphFrame.findSimple(GraphFrame.scala:370)
        at org.graphframes.GraphFrame.find(GraphFrame.scala:263)
        at org.graphframes.lib.BFS$.org$graphframes$lib$BFS$$run(BFS.scala:159)
        at org.graphframes.lib.BFS.run(BFS.scala:126)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:290)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


In [19]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:filteredPaths = g.bfs(
:  fromExpr = "name = 'Esther'",
:  toExpr = "age < 32",
:  edgeFilter = "relationship != 'friend'",
:  maxPathLength = 3)
:--
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-19-9a217d29ca2a> in <module>()
      3   toExpr = "age < 32",
      4   edgeFilter = "relationship != 'friend'",
----> 5   maxPathLength = 3)

/content/tmp/spark-e344f1b3-a40f-488a-9cef-57049b7b3a04/userFiles-54d9528e-17fa-4a53-907e-dc4eca1da328/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in bfs(self, fromExpr, toExpr, edgeFilter, maxPathLength)

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    833         answer = self.gateway_client.send_command(command)
    834         return_value = get_return_value(
--> 835             answer, self.gateway_client, self.target_id, self.name)
    836 
    837         for temp_arg in temp_args:

/content/SOFTWARE/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    308                 raise Py4JJavaError(
    309                     "An error occurred while calling {0}{1}{2}.\n".
--> 310                     format(target_id, ".", name), value)
    311             else:
    312                 raise Py4JError(

Py4JJavaError: An error occurred while calling o122.run.
: java.lang.NoSuchMethodError: scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
        at org.graphframes.GraphFrame.findSimple(GraphFrame.scala:370)
        at org.graphframes.GraphFrame.find(GraphFrame.scala:263)
        at org.graphframes.lib.BFS$.org$graphframes$lib$BFS$$run(BFS.scala:159)
        at org.graphframes.lib.BFS.run(BFS.scala:126)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:290)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


In [20]: result = g.connectedComponents()
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-20-7eb76cabdc93> in <module>()
----> 1 result = g.connectedComponents()

/content/tmp/spark-e344f1b3-a40f-488a-9cef-57049b7b3a04/userFiles-54d9528e-17fa-4a53-907e-dc4eca1da328/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in connectedComponents(self)

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    833         answer = self.gateway_client.send_command(command)
    834         return_value = get_return_value(
--> 835             answer, self.gateway_client, self.target_id, self.name)
    836 
    837         for temp_arg in temp_args:

/content/SOFTWARE/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    308                 raise Py4JJavaError(
    309                     "An error occurred while calling {0}{1}{2}.\n".
--> 310                     format(target_id, ".", name), value)
    311             else:
    312                 raise Py4JError(

Py4JJavaError: An error occurred while calling o141.run.
: java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrame.map(Lscala/Function1;Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/RDD;
        at org.graphframes.GraphFrame.toGraphX(GraphFrame.scala:136)
        at org.graphframes.GraphFrame.cachedGraphX$lzycompute(GraphFrame.scala:438)
        at org.graphframes.GraphFrame.cachedGraphX(GraphFrame.scala:438)
        at org.graphframes.GraphFrame.cachedTopologyGraphX$lzycompute(GraphFrame.scala:432)
        at org.graphframes.GraphFrame.cachedTopologyGraphX(GraphFrame.scala:431)
        at org.graphframes.lib.ConnectedComponents$.run(ConnectedComponents.scala:50)
        at org.graphframes.lib.ConnectedComponents.run(ConnectedComponents.scala:43)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:290)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


In [21]: g.vertices
Out[21]: DataFrame[id: string, name: string, age: bigint]

In [22]: g.vertices.show()
+---+-------+---+
| id|   name|age|
+---+-------+---+
|  a|  Alice| 34|
|  b|    Bob| 36|
|  c|Charlie| 30|
|  d|  David| 29|
|  e| Esther| 32|
|  f|  Fanny| 36|
|  g|  Gabby| 60|
+---+-------+---+


In [23]: result = g.stronglyConnectedComponents(maxIter=10)
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-23-9cbad8f66c11> in <module>()
----> 1 result = g.stronglyConnectedComponents(maxIter=10)

/content/tmp/spark-e344f1b3-a40f-488a-9cef-57049b7b3a04/userFiles-54d9528e-17fa-4a53-907e-dc4eca1da328/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in stronglyConnectedComponents(self, maxIter)

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    833         answer = self.gateway_client.send_command(command)
    834         return_value = get_return_value(
--> 835             answer, self.gateway_client, self.target_id, self.name)
    836 
    837         for temp_arg in temp_args:

/content/SOFTWARE/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    308                 raise Py4JJavaError(
    309                     "An error occurred while calling {0}{1}{2}.\n".
--> 310                     format(target_id, ".", name), value)
    311             else:
    312                 raise Py4JError(

Py4JJavaError: An error occurred while calling o163.run.
: java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrame.map(Lscala/Function1;Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/RDD;
        at org.graphframes.GraphFrame.toGraphX(GraphFrame.scala:136)
        at org.graphframes.GraphFrame.cachedGraphX$lzycompute(GraphFrame.scala:438)
        at org.graphframes.GraphFrame.cachedGraphX(GraphFrame.scala:438)
        at org.graphframes.GraphFrame.cachedTopologyGraphX$lzycompute(GraphFrame.scala:432)
        at org.graphframes.GraphFrame.cachedTopologyGraphX(GraphFrame.scala:431)
        at org.graphframes.lib.StronglyConnectedComponents$.org$graphframes$lib$StronglyConnectedComponents$$run(StronglyConnectedComponents.scala:52)
        at org.graphframes.lib.StronglyConnectedComponents.run(StronglyConnectedComponents.scala:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:290)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


In [24]: result = g.labelPropagation(maxIter=5)
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-24-d841648f0e34> in <module>()
----> 1 result = g.labelPropagation(maxIter=5)

/content/tmp/spark-e344f1b3-a40f-488a-9cef-57049b7b3a04/userFiles-54d9528e-17fa-4a53-907e-dc4eca1da328/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in labelPropagation(self, maxIter)

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    833         answer = self.gateway_client.send_command(command)
    834         return_value = get_return_value(
--> 835             answer, self.gateway_client, self.target_id, self.name)
    836 
    837         for temp_arg in temp_args:

/content/SOFTWARE/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    308                 raise Py4JJavaError(
    309                     "An error occurred while calling {0}{1}{2}.\n".
--> 310                     format(target_id, ".", name), value)
    311             else:
    312                 raise Py4JError(

Py4JJavaError: An error occurred while calling o185.run.
: java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrame.map(Lscala/Function1;Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/RDD;
        at org.graphframes.GraphFrame.toGraphX(GraphFrame.scala:136)
        at org.graphframes.GraphFrame.cachedGraphX$lzycompute(GraphFrame.scala:438)
        at org.graphframes.GraphFrame.cachedGraphX(GraphFrame.scala:438)
        at org.graphframes.GraphFrame.cachedTopologyGraphX$lzycompute(GraphFrame.scala:432)
        at org.graphframes.GraphFrame.cachedTopologyGraphX(GraphFrame.scala:431)
        at org.graphframes.lib.LabelPropagation$.org$graphframes$lib$LabelPropagation$$run(LabelPropagation.scala:63)
        at org.graphframes.lib.LabelPropagation.run(LabelPropagation.scala:54)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:290)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


In [25]: results = g.pageRank(resetProbability=0.15, tol=0.01)
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-25-7ba4099c0dbc> in <module>()
----> 1 results = g.pageRank(resetProbability=0.15, tol=0.01)

/content/tmp/spark-e344f1b3-a40f-488a-9cef-57049b7b3a04/userFiles-54d9528e-17fa-4a53-907e-dc4eca1da328/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in pageRank(self, resetProbability, sourceId, maxIter, tol)

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    833         answer = self.gateway_client.send_command(command)
    834         return_value = get_return_value(
--> 835             answer, self.gateway_client, self.target_id, self.name)
    836 
    837         for temp_arg in temp_args:

/content/SOFTWARE/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    308                 raise Py4JJavaError(
    309                     "An error occurred while calling {0}{1}{2}.\n".
--> 310                     format(target_id, ".", name), value)
    311             else:
    312                 raise Py4JError(

Py4JJavaError: An error occurred while calling o208.run.
: java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrame.map(Lscala/Function1;Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/RDD;
        at org.graphframes.GraphFrame.toGraphX(GraphFrame.scala:136)
        at org.graphframes.GraphFrame.cachedGraphX$lzycompute(GraphFrame.scala:438)
        at org.graphframes.GraphFrame.cachedGraphX(GraphFrame.scala:438)
        at org.graphframes.GraphFrame.cachedTopologyGraphX$lzycompute(GraphFrame.scala:432)
        at org.graphframes.GraphFrame.cachedTopologyGraphX(GraphFrame.scala:431)
        at org.graphframes.lib.PageRank$.runUntilConvergence(PageRank.scala:153)
        at org.graphframes.lib.PageRank.run(PageRank.scala:102)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:290)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


In [26]: results = g.shortestPaths(landmarks=["a", "d"])
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-26-05cc91bf2d89> in <module>()
----> 1 results = g.shortestPaths(landmarks=["a", "d"])

/content/tmp/spark-e344f1b3-a40f-488a-9cef-57049b7b3a04/userFiles-54d9528e-17fa-4a53-907e-dc4eca1da328/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in shortestPaths(self, landmarks)

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    833         answer = self.gateway_client.send_command(command)
    834         return_value = get_return_value(
--> 835             answer, self.gateway_client, self.target_id, self.name)
    836 
    837         for temp_arg in temp_args:

/content/SOFTWARE/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    308                 raise Py4JJavaError(
    309                     "An error occurred while calling {0}{1}{2}.\n".
--> 310                     format(target_id, ".", name), value)
    311             else:
    312                 raise Py4JError(

Py4JJavaError: An error occurred while calling o231.run.
: java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrame.map(Lscala/Function1;Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/RDD;
        at org.graphframes.GraphFrame.toGraphX(GraphFrame.scala:136)
        at org.graphframes.GraphFrame.cachedGraphX$lzycompute(GraphFrame.scala:438)
        at org.graphframes.GraphFrame.cachedGraphX(GraphFrame.scala:438)
        at org.graphframes.GraphFrame.cachedTopologyGraphX$lzycompute(GraphFrame.scala:432)
        at org.graphframes.GraphFrame.cachedTopologyGraphX(GraphFrame.scala:431)
        at org.graphframes.lib.ShortestPaths$.org$graphframes$lib$ShortestPaths$$run(ShortestPaths.scala:69)
        at org.graphframes.lib.ShortestPaths.run(ShortestPaths.scala:59)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:290)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


In [27]: results = g.triangleCount()
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-27-8e965378aa62> in <module>()
----> 1 results = g.triangleCount()

/content/tmp/spark-e344f1b3-a40f-488a-9cef-57049b7b3a04/userFiles-54d9528e-17fa-4a53-907e-dc4eca1da328/graphframes_graphframes-0.1.0-spark1.6.jar/graphframes/graphframe.pyc in triangleCount(self)

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    833         answer = self.gateway_client.send_command(command)
    834         return_value = get_return_value(
--> 835             answer, self.gateway_client, self.target_id, self.name)
    836 
    837         for temp_arg in temp_args:

/content/SOFTWARE/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/content/SOFTWARE/spark/python/lib/py4j-0.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    308                 raise Py4JJavaError(
    309                     "An error occurred while calling {0}{1}{2}.\n".
--> 310                     format(target_id, ".", name), value)
    311             else:
    312                 raise Py4JError(

Py4JJavaError: An error occurred while calling o252.run.
: java.lang.NoSuchMethodError: scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
        at org.graphframes.GraphFrame.findSimple(GraphFrame.scala:370)
        at org.graphframes.GraphFrame.find(GraphFrame.scala:263)
        at org.graphframes.lib.TriangleCount$.org$graphframes$lib$TriangleCount$$run(TriangleCount.scala:58)
        at org.graphframes.lib.TriangleCount.run(TriangleCount.scala:39)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:290)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)


Copy general graph docs from GraphX

This issue is for copying text from GraphX's user guide to GraphFrames' user guide for graphs in general. There is a separate issue for copying text related to specific algorithms.

Create a simple Java example in the documentation

The documentation currently only provides Scala and Python examples, but I can't find any Java examples online.

For instance these two links work fine:
http://go.databricks.com/hubfs/notebooks/3-GraphFrames-User-Guide-scala.html
http://go.databricks.com/hubfs/notebooks/3-GraphFrames-User-Guide-python.html

But what I assume should be the Java examples gives a 403 Forbidden response:
http://go.databricks.com/hubfs/notebooks/3-GraphFrames-User-Guide-java.html

When I attempt to create a GraphFrame instance in Java the compiler tells me the constructor has only 'private access':
DataFrame v = sqlContext.createDataFrame(list1, schema1);
DataFrame e = sqlContext.createDataFrame(list2, schema2);
GraphFrame g = new GraphFrame(v, e);

invalid dependency when converting from Graphx

scala version 2.11.7
spark-2.0.0-bin-hadoop2.7
graphframes-0.2.0-spark2.0-s_2.11.jar

$ ./bin/spark-shell --master local[4] --jars /Downloads/graphframes-0.2.0-spark2.0-s_2.11.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/09/14 09:29:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/14 09:29:38 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.0.1.26:4040
Spark context available as 'sc' (master = local[4], app id = local-1473870577895).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.graphframes._
import org.graphframes._

scala> import org.apache.spark.graphx._
import org.apache.spark.graphx._

scala> import org.apache.spark.graphx.util.GraphGenerators
import org.apache.spark.graphx.util.GraphGenerators

scala> import org.apache.spark.rdd.RDD
import org.apache.spark.rdd.RDD

scala> val myVertices = sc.makeRDD(Array((1L, "Ann"),(2L, "Bill"),(3L, "Charles"),(4L, "Diane"),(5L, "Went to gym this morning")))
myVertices: org.apache.spark.rdd.RDD[(Long, String)] = ParallelCollectionRDD[0] at makeRDD at <console>:32

scala> val myEdges = sc.makeRDD(Array(Edge(1L, 2L, "is-friends-with"),Edge(2L, 3L, "is-friends-with"),Edge(3L, 4L, "is-friends-with"),Edge(4L, 5L, "Likes-status"),Edge(3L, 5L, "Wrote-status")))
myEdges: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[String]] = ParallelCollectionRDD[1] at makeRDD at <console>:32

scala> val myGraph = Graph(myVertices, myEdges)
myGraph: org.apache.spark.graphx.Graph[String,String] = org.apache.spark.graphx.impl.GraphImpl@38f3dbbf

scala> val gf = GraphFrame.fromGraphX(myGraph)
error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access term typesafe in package com,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.
error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access term scalalogging in value com.typesafe,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.typesafe.
error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access type LazyLogging in value com.slf4j,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.slf4j.

Graph Partitioning

How can a graph be partitioned in GraphFrames, similar to the partitionBy feature in GraphX? Can we use the DataFrame repartition feature in 1.6 to provide graph partitioning in GraphFrames?

wrap power iteration clustering

MLlib implements power iteration clustering. We can add a wrapper in GraphFrames. For example, g.powerIterationClustering.k(10).maxIter(5).run() returns a vertex DataFrame with cluster assignments. Note that we fixed a bug in PIC recently. So we might need to copy the implementation from Spark master before the next Spark release, as we did for PageRank.

Will GraphFrames become part of Spark?

DataFrames are becoming increasingly central to Spark, so it raises the question: Will GraphFrames become part of the main Spark project, alongside GraphX, or will it continue as a separate library?

I think a line in the README commenting on this would be helpful.

(another?) invalid dependency

Environment

  • scala 2.11.8
  • spark-core 2.0.0
  • spark-sql 2.11
  • graphframes 0.2.0-spark2.0-s_2.11

In addition to the logging dependency error in #109, we're getting

Error:scalac: missing or invalid dependency detected while loading class file 'GraphFrame.class'.
Could not access type DataFrame in package org.apache.spark.sql.package,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'GraphFrame.class' was compiled against an incompatible version of org.apache.spark.sql.package.

Any ideas? It appears that our DataFrame class is in org.apache.spark.sql, not org.apache.spark.sql.package (which doesn't appear to exist).

Copy algorithm docs from GraphX

Current state: We copied the Scala docstrings from GraphX. We did not copy text from the user guide.

This issue is for copying text from the user guide for the standard library of graph algorithms.

Api for graph operations is inconsistent with PageRank

The PageRank algorithm returns a graph, while the other similar graph operations (degrees, connected components, etc.) all return a DataFrame with two columns: the vertices and the corresponding statistic for each vertex (i.e., its component, degree, etc.).

It seems odd that similar algorithms don't have a consistent return type. Is there a reason for this behavior?

SVD++ should support source DataFrame of other column types

It seems GraphFrames SVD++ fails if edge columns are not exactly of types "long", "long", "double" (a scala.MatchError in pattern matching on this line).

Perhaps we could support other variations, such as:

  • "long", "long", "float"
  • "int", "int", "float"
  • "int", "int", "double"

Clean up SVDPlusPlus API

We should make SVDPlusPlus easier to use. This means improving the API:

  • parameter names
  • returned model parameters
  • making predictions with the learned model

We should also improve the documentation.

Subgraph selection helper method

It would be nice to have a helper method for selecting subgraphs. I'm imagining something using the stdlib API:

def selectVertexSubgraph(expr): GraphFrame
def selectEdgeSubgraph(expr): GraphFrame
def selectSubgraph(vertexExpr, edgeExpr): GraphFrame

where expr is a Column or String for filtering.

These methods should ensure consistency between the resulting vertex and edge DataFrames. E.g., if a subset of vertices is selected, then any edges connected to a dropped vertex should be dropped.

When selecting subsets of edges, it might also be nice to have options for choosing what to do with the vertices: Drop any vertices not connected to a selected edge, or keep all vertices?
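The intended semantics can be sketched in plain Python (dicts standing in for DataFrame rows; `select_vertex_subgraph` is the hypothetical helper proposed above, not an existing API):

```python
def select_vertex_subgraph(vertices, edges, keep):
    """Keep vertices matching `keep`; drop edges touching a dropped vertex."""
    kept = [v for v in vertices if keep(v)]
    ids = {v["id"] for v in kept}
    consistent_edges = [e for e in edges if e["src"] in ids and e["dst"] in ids]
    return kept, consistent_edges

vertices = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
edges = [{"src": "a", "dst": "b"}, {"src": "b", "dst": "c"}]

# Dropping vertex "c" must also drop the edge b -> c.
vs, es = select_vertex_subgraph(vertices, edges, lambda v: v["id"] != "c")
```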

Scala documentation out of date

The Scala example seems a little out-of-date, as it calls the 'numIter()' method on PageRank, when in fact that method seems to be now called 'maxIter()', i.e. change this:

val results = g.pageRank.resetProbability(0.01).numIter(20).run()

...to this:

val results = g.pageRank.resetProbability(0.01).maxIter(20).run()

Also, when running this example by cutting and pasting into 'spark-shell', I needed to manually import the GraphFrame class before instantiating 'g', i.e.:

import org.graphframes.GraphFrame

Get all neighbors

Is there any method similar to degrees(), accessible from the Python API, that returns not only the number of neighboring edges but also their ids?

Thank you!


Number of Connected Components

Hello,

how large can a GraphFrame be in order to calculate its connected components? I need to scale to thousands of edges on very powerful machines.

Thanks

SLF4J Logging Error

Hi,
I tried creating a GraphFrame, but it gave me this NoClassDefFoundError:

java.lang.NoClassDefFoundError: com/typesafe/scalalogging/slf4j/LazyLogging

I am using GraphFrames 0.2.0 for Spark 2.0 and Scala 2.11. Any ideas what could be causing this? Thanks!

release graphframes for scala 2.11

Currently graphframes is only released for Scala 2.10

It would be cool if, like for the other Spark components, there could also be a release for Scala 2.11.

Thanks a lot.

some problems when using graphframe API find()

When I use the GraphFrames find API, I run into a problem. For example, given a GraphFrame g, var motif = g.find("(a)-[e1]->(b); (a)-[e2]->(d); (c)-[e3]->(b); (c)-[e4]->(d)") returns a DataFrame as a result. But in the result, vertex a and vertex c may be the same vertex. If I want all differently named vertices to be distinct vertices, is there any method I can use other than motif = motif.filter("XXX")?
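Short of a built-in option, distinctness can be enforced by filtering out rows in which two differently named vertices resolve to the same vertex. A plain-Python sketch of that post-filter (the rows are hypothetical stand-ins for the motif result; on a real DataFrame this would be filters such as a.id != c.id for each pair of names):

```python
rows = [
    {"a": "v1", "b": "v2", "c": "v1", "d": "v3"},  # a and c are the same vertex
    {"a": "v1", "b": "v2", "c": "v4", "d": "v3"},  # all four are distinct
]

def all_distinct(row, names=("a", "b", "c", "d")):
    # Keep the row only if every named vertex binds to a different id.
    ids = [row[n] for n in names]
    return len(ids) == len(set(ids))

distinct_rows = [r for r in rows if all_distinct(r)]
```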

Consistency checks upon construction

When GraphFrames are constructed, we do not check that the vertices DataFrame contains all vertices from the edges DataFrame. We should do that somehow (ideally lazily).

This will likely be a problem for subgraph selection.

I'll create a separate issue for better subgraph selection methods.
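Such a check amounts to verifying that every edge endpoint appears among the vertex ids. A plain-Python sketch of the invariant (sets standing in for the two DataFrames; a lazy DataFrame version would be an anti-join):

```python
vertex_ids = {"a", "b", "c"}
edges = [("a", "b"), ("b", "c"), ("c", "x")]  # "x" has no matching vertex row

# Endpoints referenced by edges but missing from the vertices DataFrame.
dangling = {v for src, dst in edges for v in (src, dst)} - vertex_ids
```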

Graphframes => Apache Spark 2.0 Compatible

I'm playing around with getting this set up. It seems like there are some pretty significant code changes required, not just removing .map calls on DataFrames.

Questions:
[ ] What framework should we use for logging internally since org.apache.spark.Logging no longer exists?

Tasks:
[ ] Fix LogInfo
[ ] Fix Logging
[ ] Fix .map calls
[ ] Fix callUDF calls.

How to use graphframes in a Jupyter notebook by referencing graphframes.jar

I'd like to use it locally in a Jupyter notebook. I've downloaded graphframes.jar and created a PYSPARK_SUBMIT_ARGS variable that references the jar.
The import from graphframes import * works, but it fails on the call g = GraphFrame(v, e) with:
Py4JJavaError: An error occurred while calling o57.loadClass.
: java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI

Operating system: Windows
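For comparison, a common way to wire this up is to let Spark resolve the package (including its Python half) rather than pointing at a bare jar; the version coordinate below is illustrative, not prescriptive:

```shell
export PYSPARK_SUBMIT_ARGS="--packages graphframes:graphframes:0.2.0-spark2.0-s_2.11 pyspark-shell"
jupyter notebook
```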

Heterogeneous vertices?

val v = sqlContext.createDataFrame(List(
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30)
)).toDF("id", "name", "age")

What if I wanted to say, for example, that these people work at a company? Say we didn't care about the "age" of the company, just who the employees are.

So how would this data be added as a Vertex, or is it even possible?

("companyA", "Foobar, Inc.")

The edge is straightforward

("a", "companyA", "works_at")

To clarify, this isn't possible because the company has no "age", so the schema can't be applied:

val v = sqlContext.createDataFrame(List(
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("companyA", "FooBar, Inc.")
)).toDF("id", "name", "age")

SLF4J error

I am running Scala 2.10.4 with Spark 1.5.0-cdh5.5.2 and am getting the following error when running a GraphFrames job:

> val g = GraphFrame(v, e)
error: bad symbolic reference. A signature in Logging.class refers to type LazyLogging
in package com.typesafe.scalalogging.slf4j which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling Logging.class.

I am starting my spark-shell with the following command:

spark-shell --jars /data/spark-jars/scalalogging-slf4j_2.10-1.1.0.jar,/data/spark-jars/graphframes-0.2.0-spark1.5-s_2.10.jar

I have tried different versions of scalalogging, but nothing seems to work.

Thanks for the help.

log4j.properties file ignored

When GraphFrames jar/package is loaded, Spark will ignore log4j.properties in SPARK_CONF_DIR. Perhaps src/main/resources/log4j.properties is overriding it?

@thunterdb seems to be working on a solution in #55.

Sort columns from motif finding

Motif finding outputs columns in an arbitrary order, but we should sort the columns to match the order of vertices and edges specified in the motif. That way, if a user writes "(a)-[e]->(b); (b)-[e2]->(c)", then the output columns will be ordered as expected: a, e, b, e2, c.
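The desired ordering can be recovered from the motif string itself: take element names in order of first appearance. A plain-Python sketch (a hypothetical helper for illustration, not the GraphFrames implementation):

```python
import re

def motif_column_order(motif):
    # Names appear as (a) for vertices and [e] for edges; keep first-seen order.
    names = re.findall(r"[(\[]([A-Za-z_]\w*)[)\]]", motif)
    seen, order = set(), []
    for n in names:
        if n not in seen:
            seen.add(n)
            order.append(n)
    return order

motif_column_order("(a)-[e]->(b); (b)-[e2]->(c)")  # ['a', 'e', 'b', 'e2', 'c']
```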

connectedComponents() raises lots of warnings that say "block locks were not released by TID = ..."

Trying to run a simple connectedComponents() analysis on an example dataset, even the one from the quick start, yields a flurry of warnings (several dozen?) like this:

16/10/17 22:45:40 WARN Executor: 1 block locks were not released by TID = 358:
[rdd_95_5]
16/10/17 22:45:40 WARN Executor: 1 block locks were not released by TID = 353:
[rdd_95_0]
16/10/17 22:45:40 WARN Executor: 1 block locks were not released by TID = 359:
[rdd_95_6]
...

And this is for a graph with literally 3-4 vertices and edges.

Is this an issue? Would it cause performance issues at scale? (Here's a related question on Stack Overflow.)

I'm running Python 2.7, Spark 2.0.1, and GraphFrames 0.2.

Clarify special columns in API docs

We should make it clear what happens when a vertex or edge DataFrame contains a column with a special name. E.g., what happens when you call PageRank on a graph whose vertices already have a "pagerank" column? We currently throw an error, but we should be more explicit about this, both in runtime checks in the code and in the API docs.
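The runtime side could be a simple guard that rejects a clashing output column before any work is done. A plain-Python sketch of the idea (the column list stands in for the vertex schema, with "pagerank" as the example reserved name):

```python
def check_output_column(columns, output_col):
    # Fail fast if the algorithm's output column already exists on the vertices.
    if output_col in columns:
        raise ValueError(
            "column '%s' already exists on the vertices; "
            "rename or drop it before running" % output_col
        )

check_output_column(["id", "name"], "pagerank")  # fine: no clash
```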

More scalable connected components implementation

There have been many reports of the connected components algorithm in GraphX and GraphFrames not scaling. @mengxr has a prototype of a better algorithm. This issue is for tracking adding it to GraphFrames master.

Subtasks:

  • Implementation: #119
  • Improved unit tests: #121
  • GraphX legacy support: #122
  • Python API: #123
  • (optional) checkpoint interval param: #124
  • handle skewness in assigning long IDs

How to use is with python3?

When I run this import in pyspark with Python 3.5.1, it errors:

from graphframes import *
ZipImportError: can't find module 'graphframes'

but it succeeds in pyspark with Python 2.7.1.

Motif resulting in empty DataFrame

Hi,
I am trying to use the motif functionality within GraphFrames. I tried a bunch of different motifs, including those in the user guide, on a variety of different GraphFrames, but to no avail. I even tried the simple motif "(a)-[e]->(b)", but the resulting DataFrame was empty. If I understand this correctly, it should return all a, e, and b such that a has an edge e to b. This is the command I used.

g.find("(a)-[e]->(b)")

Thanks!
