This project forked from xtra-computing/thundergp

Fast Graph Processing Framework for HLS-based FPGAs

License: Apache License 2.0

ThunderGP: Fast Graph Processing for HLS-based FPGAs

What's new?

ThunderGP is featured at Xilinx Apps and Libraries

Introduction

ThunderGP enables data scientists to enjoy the performance of FPGA-based graph processing without compromising programmability. To the best of our knowledge, and based on our experiments, this is the fastest graph processing framework on HLS-based FPGAs.

Two aspects make ThunderGP deliver superior performance. On the one hand, ThunderGP embraces an improved execution flow that better exploits the pipeline parallelism of the FPGA and reduces the amount of data accessed in global memory. On the other hand, the memory accesses are highly optimized to fully utilize the memory bandwidth of the hardware platforms.

ThunderGP can run on both Xilinx and Intel platforms:

  • Check out the implementation on the Intel platform.

  • On Xilinx multi-SLR based FPGAs, it runs at 250 MHz, and the performance can be up to 5300 MTEPS (million traversed edges per second), a 2x speedup over the state of the art.
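As a quick illustration of the MTEPS unit used above, the sketch below computes throughput from an edge count and an elapsed time. The function and parameter names are hypothetical and are not part of ThunderGP; the only thing taken from the text is the definition "million traversed edges per second".

```cpp
#include <cassert>
#include <cstdint>

// Throughput in MTEPS: traversed edges divided by elapsed time,
// expressed in millions of edges per second.
// `traversed_edges` and `elapsed_seconds` are hypothetical names for this sketch.
double mteps(std::uint64_t traversed_edges, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);  // a non-positive duration has no meaningful rate
    return static_cast<double>(traversed_edges) / elapsed_seconds / 1e6;
}
```

For example, `mteps(2000000, 2.0)` evaluates to 1.0: two million edges traversed in two seconds is one million edges per second.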

Prerequisites

  • gcc 4.8 or above
  • Tools:
    • SDAccel 2018.3 Design Suite
    • SDAccel 2019.2 Design Suite
  • Evaluated platforms from Xilinx:
    • Xilinx Virtex UltraScale+ FPGA VCU1525 Acceleration Development Kit (SDAccel 2018.3)
    • Alveo U200 Data Center Accelerator Card (SDAccel 2019.2)
    • Alveo U250 Data Center Accelerator Card (SDAccel 2019.2)

Run the code

Currently, ThunderGP supports four built-in graph analytic algorithms, namely PR, SpMV, BFS and SSSP.
To build a given application, pass the argument app=[the wanted algorithm] to the make command.
The table below is a quick reference for this argument.

| Argument | Accelerated algorithm |
| --- | --- |
| app=pr | PageRank (PR) |
| app=spmv | Sparse Matrix-vector Multiplication (SpMV) |
| app=bfs | Breadth First Search (BFS) |
| app=sssp | Single Source Shortest Path (SSSP) |

Here is an example of building and running the PR algorithm.

$ cd ./
$ make cleanall
$ make app=pr all -j # make the host execution program and FPGA execution program for pagerank application. It takes time.
$ ./host [bitfile] [graph name] #e.g., ./host_graph_fpga _x/link/int/graph_fpga.hw.xilinx_vcu1525_xdma_201830_1.xclbin wiki-talk

More details: Compiling ThunderGP

Results (performance)

Throughput (MTEPS) of different graph processing algorithms over datasets on VCU1525 platform.

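| Algo. | rmat-21-32 | rmat-24-16 | web-google | wiki-talk | pokec | live-journal | twitter-2010 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PR | 4,274 | 3,797 | 2,502 | 3,138 | 3,790 | 2,860 | 2,438 |
| SpMV | 4,759 | 4,396 | 2,018 | 3,043 | 3,871 | 3,133 | 2,561 |
| BFS | 5,395 | 4,619 | 2,431 | 3,775 | 4,072 | 3,490 | 3,004 |
| SSSP | 3,895 | 3,446 | 1,817 | 2,954 | 3,090 | 2,700 | 2,273 |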

Throughput (MTEPS) of different graph processing algorithms over datasets on U200 platform.

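| Algo. | rmat-21-32 | rmat-24-16 | web-google | wiki-talk | pokec | live-journal | twitter-2010 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PR | 4,151 | 3,689 | 3,019 | 2,352 | 3,670 | 2,734 | 2,319 |
| SpMV | 4,548 | 4,159 | 2,826 | 1,820 | 3,650 | 2,931 | 2,375 |
| BFS | 5,226 | 4,437 | 3,614 | 2,247 | 3,883 | 3,336 | 2,849 |
| SSSP | 3,630 | 3,218 | 2,706 | 1,620 | 2,837 | 2,476 | 2,054 |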

APIs (programmability)

Benefiting from the high-level abstraction of HLS, our APIs natively support C/C++.
ThunderGP provides three levels of API for implementation or further exploration. The L1 and L2 APIs are for building the accelerators, and the L3 APIs are for the host program.

Framework Overview

The Adopted Computation Model

The Gather-Apply-Scatter (GAS) model is widely used as the computation model of FPGA-based graph processing frameworks because it extends to a wide range of graph processing algorithms. ThunderGP adopts a simplified version of the GAS model, following the work On-the-fly-data-shuffling-for-OpenCL-based-FPGAs. This model updates vertex properties by propagating values from source vertices to destination vertices. The input to the model is an unordered set of directed edges of the graph; an undirected edge can be represented by a pair of directed edges.
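The input representation described above can be sketched as follows. This is a minimal illustration, not ThunderGP code: the `Edge` struct, its 32-bit field widths, and the `to_directed` helper are assumptions made for the example. The field layout follows the <src, dst, weight> edge format used in the Scatter stage.

```cpp
#include <cstdint>
#include <vector>

// One directed edge in <src, dst, weight> form.
// The 32-bit field widths are an assumption for this sketch.
struct Edge {
    std::uint32_t src;
    std::uint32_t dst;
    std::uint32_t weight;
};

// Hypothetical helper: represent each undirected edge {u, v} as the
// pair of directed edges u -> v and v -> u, both with the same weight.
std::vector<Edge> to_directed(const std::vector<Edge>& undirected) {
    std::vector<Edge> directed;
    directed.reserve(undirected.size() * 2);
    for (const Edge& e : undirected) {
        directed.push_back(Edge{e.src, e.dst, e.weight});  // u -> v
        directed.push_back(Edge{e.dst, e.src, e.weight});  // v -> u
    }
    return directed;
}
```

Because the model treats its input as an unordered set, the order in which the two directed edges are emitted does not matter.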

[Figure: pseudocode of the simplified GAS model; the line numbers cited below refer to this listing]

The process per iteration mainly contains three stages: Scatter, Gather, and Apply.

  • In the Scatter stage (shown in lines 2 to 6), for each input edge with the format <src, dst, weight>, an update tuple is generated for the destination vertex of the edge. The update tuple has the format <dst, value>, where dst is the destination vertex of the edge and value is computed from the vertex properties and the edge weight.
  • In the Gather stage (shown in lines 7 to 9), all the update tuples generated in the Scatter stage are accumulated to update the destination vertices.
  • The final Apply stage (shown in lines 10 to 12) executes an apply function on all the vertices of the graph.
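The three stages above can be sketched in software for one algorithm. The example below uses SSSP as an illustration: Scatter proposes `dist[src] + weight`, Gather accumulates with `min`, and Apply keeps the smaller of the old and gathered distance. This is a plain C++ reference model under those assumptions, not ThunderGP's HLS kernel code; all type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Edge   { std::uint32_t src, dst, weight; };  // <src, dst, weight>
struct Update { std::uint32_t dst, value; };        // <dst, value>

// Marker for "not reached yet" (an assumption of this sketch).
constexpr std::uint32_t kInf = std::numeric_limits<std::uint32_t>::max();

// One Scatter-Gather-Apply iteration of SSSP over an unordered edge list.
// Assumes every src/dst is a valid index into `dist`.
// Returns true if any vertex distance changed.
bool sssp_iteration(const std::vector<Edge>& edges,
                    std::vector<std::uint32_t>& dist) {
    // Scatter: generate one update tuple per edge.
    std::vector<Update> updates;
    updates.reserve(edges.size());
    for (const Edge& e : edges) {
        // Add in 64 bits and saturate, so unreached sources stay at kInf.
        std::uint64_t sum = static_cast<std::uint64_t>(dist[e.src]) + e.weight;
        std::uint32_t value = sum >= kInf ? kInf : static_cast<std::uint32_t>(sum);
        updates.push_back(Update{e.dst, value});
    }

    // Gather: accumulate all updates per destination vertex (min for SSSP).
    std::vector<std::uint32_t> acc(dist.size(), kInf);
    for (const Update& u : updates) {
        acc[u.dst] = std::min(acc[u.dst], u.value);
    }

    // Apply: run the apply function on every vertex of the graph.
    bool changed = false;
    for (std::size_t v = 0; v < dist.size(); ++v) {
        if (acc[v] < dist[v]) {
            dist[v] = acc[v];
            changed = true;
        }
    }
    return changed;
}
```

A caller would initialise `dist` to `kInf` everywhere except the source vertex (distance 0) and repeat `sssp_iteration` until it returns false. Other algorithms would swap in different scatter, accumulate, and apply functions while keeping the same three-stage structure.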

The Execution Flow of ThunderGP

[Figure: overview of the ThunderGP execution flow]

As shown in the above diagram, the edges of one partition are streamed into the Scatter stage. For each edge, the property of the source vertex is fetched from global memory through the prefetching and cache modules; at the same time, the property of the edge (its weight) is streamed in from global memory. These two values go through an algorithm-specific function that returns an update for the property of the destination vertex. At the end of the Scatter stage, this update value is combined with the destination of the edge to form an update tuple.

The update tuples are streamed into the Shuffle stage, which dispatches them to the corresponding Gather processing engines (PEs). The Gather PEs accumulate the update values in local on-chip memory, which caches the properties of the destination vertices. After all the edges in the partition have been processed, the cached data in the Gather PEs is aggregated to global memory. Finally, the Apply stage calls an algorithm-specific function to update all the vertices for the next iteration.
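The Shuffle and Gather steps of this flow can be modelled in software as below. This is only a sketch under stated assumptions: routing a tuple to PE `dst % num_pes`, using summation as the accumulate function, and all names (`shuffle_updates`, `gather_and_write_back`, `global_acc`) are choices made for the illustration and are not taken from the ThunderGP sources. On the FPGA the PEs work concurrently on streams; here they are simply visited in a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Update { std::uint32_t dst, value; };  // <dst, value> update tuple

// Shuffle: dispatch every update tuple to the lane of one Gather PE.
// Modulo routing is an assumption of this sketch.
std::vector<std::vector<Update>> shuffle_updates(
        const std::vector<Update>& updates, std::size_t num_pes) {
    assert(num_pes > 0);
    std::vector<std::vector<Update>> lanes(num_pes);
    for (const Update& u : updates) {
        lanes[u.dst % num_pes].push_back(u);
    }
    return lanes;
}

// Gather: each PE accumulates its tuples in a local buffer (standing in for
// on-chip memory), then aggregates the buffer into global memory.
// Assumes every dst is a valid index into `global_acc`.
void gather_and_write_back(const std::vector<std::vector<Update>>& lanes,
                           std::vector<std::uint32_t>& global_acc) {
    const std::size_t num_pes = lanes.size();
    for (std::size_t pe = 0; pe < num_pes; ++pe) {
        // PE `pe` owns vertices v with v % num_pes == pe, stored at slot v / num_pes.
        std::vector<std::uint32_t> local(
            (global_acc.size() + num_pes - 1) / num_pes, 0);
        for (const Update& u : lanes[pe]) {
            local[u.dst / num_pes] += u.value;  // accumulate locally
        }
        for (std::size_t slot = 0; slot < local.size(); ++slot) {
            const std::size_t v = slot * num_pes + pe;
            if (v < global_acc.size()) {
                global_acc[v] += local[slot];  // aggregate to global memory
            }
        }
    }
}
```

Giving each PE a disjoint set of destination vertices means no two PEs ever update the same local entry, which is one way to let them accumulate independently.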

Future Work

  • Application wrappers for high-level platforms (Spark, etc.).
  • Hardware-accelerated query engine.
  • Cycle-accurate software simulation for verifying dynamic modules (cache, etc.) and tuning channel depth.
  • Optimization for large-scale graphs (distributed processing or an HBM-based memory hierarchy).

Related publications

Related systems

Key members

Acknowledgement

Contributors

soldierchen, hongshitan, xaccnus, bingshenghe
