timlrx / graph-benchmarks

License: MIT License

Python 51.74% Shell 27.10% Julia 21.16%

graph-benchmarks's Introduction

Benchmark of popular graph / network packages

A comparison of 6 different packages:

NetworkX, v2.4, Python 3.8
graph-tool, v2.31, Python 3.8
igraph, v0.8.2, Python 3.8
NetworKit, v6.1.0, Python 3.8
SNAP, v5.0.0, Python 3.7
LightGraphs, v2.0-dev, Julia 1.4

For a more detailed description of the process and results, please refer to the following blog post.

Results

The benchmark was run using Google's Compute n1-standard-16 instance (16vCPU Haswell 2.3GHz, 60 GB memory).

Each algorithm was run 100 times on the Amazon and Google datasets and 10 times on the Pokec dataset, with the exception of NetworkX.

The median run time is shown in the table below. Because profiling techniques and code implementations differ between packages, the timings may not be directly comparable. Please refer to the respective code bases for implementation details.
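As a rough illustration of what "median run time over N repetitions" means, the sketch below times a callable repeatedly and reports the median. It is not the repository's profiling code; `median_runtime` and `toy_workload` are hypothetical names, and the workload is a stand-in for a real graph algorithm call.

```python
import statistics
import time


def median_runtime(func, repetitions):
    """Return the median wall-clock time in seconds of `func` over `repetitions` calls."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def toy_workload():
    # Hypothetical stand-in for a graph algorithm such as PageRank or BFS.
    return sum(i * i for i in range(10_000))


elapsed = median_runtime(toy_workload, repetitions=100)
print(f"median: {elapsed:.6f} s")
```

The median is less sensitive than the mean to occasional slow runs caused by caching or scheduling noise, which is one reason it is a common choice for benchmark summaries.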

Setup

Setup and installation instructions can be found in setup.md.

Data

Datasets are downloaded from https://snap.stanford.edu/data/ and stored in the data folder. Amazon refers to amazon0302, google to web-Google, and pokec to soc-Pokec. A download_data.sh script is provided in the data folder to automate the download and pre-processing of the SNAP datasets.

Code

The profiling code is located in the code folder. A particular benchmark can be run using the helper bash script run_profiler.sh [profiling code] [dataset path] [number of repetitions] [output path]. For example, to replicate the igraph benchmark on the amazon dataset with 100 repetitions, run:

run_profiler.sh code/igraph_profile.py data/amazon0302.txt 100 output/igraph_amazon.txt
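If you want to script several runs, one option is to assemble the argument list in Python first. The sketch below only builds and prints the command for the igraph example above; `profiler_command` is a hypothetical helper, and invoking the script through `bash` from the repository root is an assumption.

```python
import shlex


def profiler_command(profile_script, dataset_path, repetitions, output_path):
    """Build the argument list for run_profiler.sh in the order the README gives."""
    return [
        "bash",
        "run_profiler.sh",
        profile_script,
        dataset_path,
        str(repetitions),
        output_path,
    ]


cmd = profiler_command(
    "code/igraph_profile.py",
    "data/amazon0302.txt",
    100,
    "output/igraph_amazon.txt",
)
print(shlex.join(cmd))
# To actually execute it (assumption: current directory is the repository root):
# import subprocess; subprocess.run(cmd, check=True)
```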

graph-benchmarks's Issues

NetworKit shortest path benchmark

Hi, I've noticed that the NetworKit shortest-path benchmark is executed with

distance.BFS(g, node_index).run()

However, with this API NetworKit also stores all shortest paths from node_index to every other node, which adds significant memory and time overhead. This behavior is a bit counterintuitive and should be better documented: one does not expect BFS to store all shortest paths by default.
Since the other tools only compute the shortest distances, for a fair comparison you should run this:

distance.BFS(g, node_index, storePaths=False).run()
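To see why tracking paths costs more than tracking distances, here is a pure-Python sketch of the two variants. It makes no claim about NetworKit's internals; the function names and the toy graph are hypothetical, and the point is only that the second variant keeps an extra predecessor list per node.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source`; nothing else is stored."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def bfs_with_predecessors(adj, source):
    """Same traversal, but also records every predecessor lying on a shortest path."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # A second shortest path reaches v through u.
                preds[v].append(u)
    return dist, preds


# Toy 4-cycle: node 3 is reachable from node 0 via 1 or via 2.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
distances_only = bfs_distances(adj, 0)
distances, predecessors = bfs_with_predecessors(adj, 0)
print(distances_only)
print(predecessors)
```

Both variants return the same distances, so a benchmark that only needs distances pays for the predecessor bookkeeping without using it.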

Graph construction performance

Hi, I read your blog post about benchmarking graph network packages; nice work. Have you run any tests on the performance of building a graph node by node, or do you know of any? Thanks.

Any thoughts on memory consumption?

I know you said that memory consumption is out of scope for your study, but I am curious about your intuition on this. I am looking for the most memory-efficient package. My raw list of edges (in numpy) takes 16 GB, but creating a networkx instance from the edge list requires more than 64 GB of memory :(
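A rough way to see where the extra memory goes: NetworkX stores a graph as nested Python dictionaries, whereas an edge list is two integers per edge. The sketch below compares the two layouts on a toy ring graph using only the standard library. The sizes are CPython-specific lower bounds (integer keys and other per-object overheads are not counted), and the nested dict only mimics the layout rather than using NetworkX itself.

```python
import sys
from array import array

# Toy ring graph with 1000 undirected edges.
edges = [(i, (i + 1) % 1000) for i in range(1000)]

# Flat storage: two 64-bit integers per edge.
flat = array("q", (node for edge in edges for node in edge))
flat_bytes = flat.itemsize * len(flat)

# Nested-dict adjacency: node -> neighbour -> attribute dict.
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = {}
    adj.setdefault(v, {})[u] = {}

nested_bytes = sys.getsizeof(adj) + sum(
    sys.getsizeof(nbrs) + sum(sys.getsizeof(attrs) for attrs in nbrs.values())
    for nbrs in adj.values()
)

print(f"flat edge list: {flat_bytes} bytes")
print(f"nested dicts:   {nested_bytes} bytes (lower bound)")
print(f"ratio:          {nested_bytes / flat_bytes:.1f}x")
```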

Could you include Weighted Betweenness Centrality?

I am wondering if we can infer the differences in performance while calculating Weighted Betweenness Centrality from the Shortest Path results you show.

If one algorithm is faster on the shortest path, does this mean it is faster also on betweenness?
Does the shortest path algorithm consider arc weights?

It would be great if you could include betweenness (in the version that considers arc weights) in the next benchmarks!
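One caveat when extrapolating: unweighted and weighted shortest paths are different problems, typically solved by BFS and by Dijkstra's algorithm respectively, so timings for one need not carry over to the other. The pure-Python sketch below shows the two giving different answers on a toy graph; the function names and the graph are hypothetical, and nothing here reflects how any benchmarked package is implemented.

```python
import heapq
from collections import deque


def hop_distances(adj, source):
    """Unweighted shortest paths via BFS (edge weights ignored)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def weighted_distances(adj, source):
    """Weighted shortest paths via Dijkstra (non-negative weights assumed)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy graph: the direct edge a-c is one hop but expensive.
adj = {
    "a": {"b": 1, "c": 10},
    "b": {"a": 1, "c": 1},
    "c": {"a": 10, "b": 1},
}
hops = hop_distances(adj, "a")
costs = weighted_distances(adj, "a")
print(hops["c"], costs["c"])
```

Here the hop distance from a to c is 1, while the weighted distance is 2 via b, so a betweenness computation built on one notion of distance can rank nodes differently from one built on the other.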

Use pagerank_scipy instead of pagerank [networkx]?

In terms of raw performance, networkx.pagerank_scipy can be 4-5X faster than networkx.pagerank. Timings for the google.txt file on my local machine:

In [4]: %%timeit
   ...: nx.pagerank(G, alpha=0.85, tol=1e-3, max_iter=10000000)
   ...:
40.1 s ± 5.19 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: %%timeit
   ...: nx.pagerank_scipy(G, alpha=0.85, tol=1e-3, max_iter=10000000)
   ...:
8.89 s ± 48.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

LightGraphs version in the benchmark

Hi @timlrx, thanks for the very interesting work. I am trying to run the benchmarks following your instructions but am having trouble running lightgraphs.jl. I am using the LightGraphs master branch as suggested in the file (and the master branch of graph-benchmarks), but its implementations of functions like ShortestPaths and Centrality seem to differ from the ones lightgraphs.jl expects. For example, I found LightGraphs.ShortestPaths as LightGraphs.Experimental.ShortestPaths, but could not find ThreadedBFS or Centrality. Could you suggest which version I should use to run the lightgraphs.jl benchmark? Thanks!