rzs840707 / neo4j-graph-algorithms

This project is forked from neo4j-contrib/neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j

Home Page: https://neo4j.com/docs/graph-algorithms/current/

License: GNU General Public License v3.0

Java 100.00%

This library provides efficiently implemented, parallel versions of common graph algorithms for Neo4j 3.x, exposed as Cypher procedures.

You can find the documentation here: https://neo4j.com/docs/graph-algorithms/current/

Algorithms

Graph algorithms are used to compute metrics for graphs, nodes, or relationships.

They can provide insights on relevant entities in the graph (centralities, ranking), or inherent structures like communities (community-detection, graph-partitioning, clustering).

Many graph algorithms are iterative and repeatedly traverse the graph using random walks, breadth-first or depth-first searches, or pattern matching.

Due to the exponential growth of possible paths with increasing distance, many of the approaches also have high algorithmic complexity.

Fortunately, optimized algorithms exist that utilize certain structures of the graph, memoize already explored parts, and parallelize operations. Whenever possible, we’ve applied these optimizations.

Centralities

These algorithms determine the importance of distinct nodes in a network.

Community detection

These algorithms evaluate how a group is clustered or partitioned, as well as its tendency to strengthen or break apart.

Path finding

These algorithms help find the shortest path or evaluate the availability and quality of routes.

Similarity

These algorithms help calculate the similarity of nodes.

Preprocessing

These are utility functions and procedures that transform data for use further along the data pipeline.

All of these procedures work either on the whole graph or on a subgraph filtered by label and relationship-type. You can also filter and project the graph with Cypher queries.

Feedback

We’d love your feedback, so please try out these algorithms and let us know how well they work for your use case. Please also tell us about anything that is missing from the installation instructions or documentation.

Please raise GitHub issues for anything you encounter or join the neo4j-users Slack group and ask in the #neo4j-graph-algorithm channel.

Installation

Download graph-algorithms-algo-[version].jar from the matching release and copy it into your $NEO4J_HOME/plugins directory.

Because the algorithms use the lower-level Kernel API to read from and write to Neo4j, you must also explicitly allow them in the configuration for security reasons:

  1. Add the following to your $NEO4J_HOME/conf/neo4j.conf file:

    dbms.security.procedures.unrestricted=algo.*

  2. Restart Neo4j.

  3. To see a list of all the algorithms, run the following query:

    CALL algo.list()

You can also see the full list in the documentation.

Usage

These algorithms are exposed as Neo4j procedures. They can be called directly from Cypher in your Neo4j Browser, from cypher-shell, or from your client code.

For most algorithms there are two procedures:

  • algo.<name> - this procedure writes results back to the graph as node-properties, and reports statistics.

  • algo.<name>.stream - this procedure returns a stream of data. For example, node-ids and computed values.

    For large graphs, the streaming procedure might return millions, or even billions of results. In this case it may be more convenient to store the results of the algorithm, and then use them with later queries.

We can project the graph we want to run algorithms on with either label and relationship-type projection or Cypher projection.

+----------+label/rel type projection +-----------+
|  Neo4j   +------------------------->| Projected |  Execute algorithm
| stored   |    cypher projection     |   graph   |<-------------------
|  graph   +------------------------->|           |
+----------+                          +-----------+

The projected graph model is separate from Neo4j’s stored graph model. This enables fast caching of the graph topology, which contains only the relevant nodes, relationships and weights. The projected graph model does not support multiple relationships between a single pair of nodes. During projection, the directed case allows only one relationship per direction (in, out) between a pair of nodes, while the undirected case (direction BOTH) allows two.
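The one-relationship-per-direction rule for the directed case can be pictured with a small Python sketch. It is only an illustration of the idea, not the library's implementation: the function name `project_directed` is hypothetical, and keeping the first relationship seen is an assumption, since the text above does not say which of several parallel relationships survives or how their weights are treated.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]  # (source node id, target node id)


def project_directed(relationships: Iterable[Edge]) -> List[Edge]:
    """Keep at most one relationship per ordered (source, target) pair."""
    seen = set()
    projected = []
    for source, target in relationships:
        if (source, target) in seen:
            continue  # parallel relationship in the same direction is dropped
        seen.add((source, target))
        projected.append((source, target))
    return projected


# Two parallel 1->2 relationships collapse into one; 2->1 is the opposite
# direction and is therefore kept.
stored = [(1, 2), (1, 2), (2, 1), (2, 3)]
projected = project_directed(stored)
print(projected)  # [(1, 2), (2, 1), (2, 3)]
```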

Label and relationship-type projection

We can project the subgraph we want to run the algorithm on by using the label parameter to describe nodes, and relationship-type to describe relationships.

The general call syntax is:

CALL algo.<name>('NodeLabel', 'RelationshipType', {config})

For example, PageRank on DBpedia (11M nodes, 116M relationships):

CALL algo.pageRank('Page','Link',{iterations:5, dampingFactor:0.85, write: true, writeProperty:'pagerank'});
// YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, write, writeProperty

CALL algo.pageRank.stream('Page','Link',{iterations:5, dampingFactor:0.85})
YIELD node, score
RETURN node.title, score
ORDER BY score DESC LIMIT 10;
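When these procedures are called from client code, the label, relationship type and config map can be passed as query parameters instead of being spliced into the query text. The sketch below only builds the query string and parameter dictionary. The helper name `build_algo_call` and the parameter names `label`, `relType` and `config` are assumptions made for illustration, and how the pair is sent to Neo4j depends on the driver you use.

```python
import re
from typing import Any, Dict, Optional, Tuple

# Procedure names cannot be passed as parameters, so validate them strictly.
_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]*(\.[A-Za-z][A-Za-z0-9]*)*$")


def build_algo_call(
    name: str,
    label: Optional[str],
    rel_type: Optional[str],
    config: Optional[Dict[str, Any]] = None,
    stream: bool = False,
) -> Tuple[str, Dict[str, Any]]:
    """Return a parameterized CALL statement and its parameters."""
    if not _NAME.match(name):
        raise ValueError(f"invalid algorithm name: {name!r}")
    # algo.<name> writes results back; algo.<name>.stream returns rows.
    procedure = f"algo.{name}.stream" if stream else f"algo.{name}"
    query = f"CALL {procedure}($label, $relType, $config)"
    params = {"label": label, "relType": rel_type, "config": dict(config or {})}
    return query, params


query, params = build_algo_call(
    "pageRank", "Page", "Link", {"iterations": 5, "dampingFactor": 0.85}, stream=True
)
print(query)  # CALL algo.pageRank.stream($label, $relType, $config)
```

Validating the name with a strict pattern keeps arbitrary text out of the query, since the procedure name is the one part that has to be interpolated.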

Huge graph projection

The default label and relationship-type projection is limited to 2 billion nodes and 2 billion relationships. If the projected graph is larger than this, we need to use a huge graph projection, which is enabled by setting graph:'huge' in the config.

The general call syntax is:

CALL algo.<name>('NodeLabel', 'RelationshipType', {graph: 'huge'})

For example, PageRank on DBpedia:

CALL algo.pageRank('Page','Link',{iterations:5, dampingFactor:0.85, writeProperty:'pagerank',graph:'huge'})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, writeProperty;

Cypher projection

If label and relationship-type projection is not selective enough to describe our subgraph to run the algorithm on, we can use Cypher statements to project subsets of our graph. Use a node-statement instead of the label parameter and a relationship-statement instead of the relationship-type, and use graph:'cypher' in the config.

Relationships described in the relationship-statement are only projected if both their source and target nodes are described in the node-statement. All other relationships are ignored.
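The effect of this rule can be sketched in a few lines of Python. This is a conceptual model only: `project_cypher` is a hypothetical name, and the two arguments stand in for the rows returned by the node-statement and the relationship-statement.

```python
from typing import Iterable, List, Tuple


def project_cypher(
    node_ids: Iterable[int],
    relationship_rows: Iterable[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Keep only relationships whose source and target were both returned
    by the node-statement."""
    nodes = set(node_ids)
    return [
        (source, target)
        for source, target in relationship_rows
        if source in nodes and target in nodes
    ]


# Node 4 is not returned by the node-statement, so 3->4 and 4->1 are ignored.
kept = project_cypher([1, 2, 3], [(1, 2), (2, 3), (3, 4), (4, 1)])
print(kept)  # [(1, 2), (2, 3)]
```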

We can also return a property value or weight (according to our config) in addition to the ids from these statements.

Cypher projection lets us describe the subgraph we want to analyse more expressively, but projecting the graph may take longer as the Cypher queries become more complex.

The general call syntax is:

CALL algo.<name>(
  'MATCH (n) RETURN id(n) AS id',
  "MATCH (n)-->(m) RETURN id(n) AS source, id(m) AS target",
  {graph: "cypher"})

For example, PageRank on DBpedia:

CALL algo.pageRank(
'MATCH (p:Page) RETURN id(p) as id',
'MATCH (p1:Page)-[:Link]->(p2:Page) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', iterations:5, write: true});

Cypher projection can also be used to project a virtual (non-stored) graph. Here is an example of how to project an undirected graph of people who visited the same web page and run the Louvain community detection algorithm on it, using the number of common visited web pages between pairs of people as relationship weight:

CALL algo.louvain(
'MATCH (p:Person) RETURN id(p) as id',
'MATCH (p1:Person)-[:Visit]->(:Page)<-[:Visit]-(p2:Person)
RETURN id(p1) as source, id(p2) as target, count(*) as weight',
{graph:'cypher', iterations:5, write: true});
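To make the weight in this virtual graph concrete, the following Python sketch computes the same kind of rows as the relationship-statement: one row per ordered pair of people, weighted by the number of pages both visited. It assumes at most one Visit relationship per person and page, and the function name `co_visit_weights` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, Set, Tuple


def co_visit_weights(visits: Iterable[Tuple[str, str]]) -> Counter:
    """Count common pages for every ordered pair of distinct people."""
    visitors_by_page: Dict[str, Set[str]] = {}
    for person, page in set(visits):
        visitors_by_page.setdefault(page, set()).add(person)

    weights: Counter = Counter()
    for visitors in visitors_by_page.values():
        # The pattern matches each pair in both orders, so (p1, p2) and
        # (p2, p1) each get their own row, like source/target in the query.
        for p1, p2 in permutations(sorted(visitors), 2):
            weights[(p1, p2)] += 1
    return weights


visits = [
    ("alice", "home"), ("bob", "home"),
    ("alice", "docs"), ("bob", "docs"), ("carol", "docs"),
]
weights = co_visit_weights(visits)
print(weights[("alice", "bob")])    # 2
print(weights[("carol", "alice")])  # 1
```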

The detailed call syntax, all parameters, and the possible return values for each algorithm are listed in the project’s documentation.

Graph loading

Because loading large graphs into the algorithm data structures can take some time, you can pre-load a graph and later refer to it by name from several graph algorithms. Once it is no longer needed, remove it from memory to free the resources it uses:

// Load graph
CALL algo.graph.load('my-graph','Label','REL_TYPE',{graph:'heavy',..other config...})
  YIELD name, graph, direction, undirected, sorted, nodes, loadMillis, alreadyLoaded,
        nodeWeight, relationshipWeight, nodeProperty, loadNodes, loadRelationships;

// Info on loaded graph
CALL algo.graph.info('my-graph')
  YIELD name, type, exists, removed, nodes;

// Use graph
CALL algo.pageRank(null,null,{graph:'my-graph',...})


// Remove graph
CALL algo.graph.remove('my-graph')
  YIELD name, type, exists, removed, nodes;

Building locally

Currently aiming at Neo4j 3.x (with a branch per version):

git clone https://github.com/neo4j-contrib/neo4j-graph-algorithms
cd neo4j-graph-algorithms
git checkout 3.3
mvn clean install
cp algo/target/graph-algorithms-*.jar $NEO4J_HOME/plugins/
$NEO4J_HOME/bin/neo4j restart
