GithubHelp home page GithubHelp logo

0xbadcoffe / hpcg_fpga Goto Github PK

View Code? Open in Web Editor NEW

This project forked from xilinx/hpcg_fpga

0.0 0.0 0.0 10.03 MB

License: BSD 3-Clause "New" or "Revised" License

Makefile 3.82% C++ 80.05% C 0.64% Python 10.14% Shell 1.68% MATLAB 0.07% CSS 1.25% PHP 1.70% HTML 0.66%

hpcg_fpga's Introduction

High Performance Conjugate Gradient Benchmark (HPCG) - FPGA

FPGA port from Xilinx Research Labs, forked from original HPCG Revision: 3.1 Date: March 28, 2019

Introduction

HPCG is a software package that performs a fixed number of multigrid preconditioned (using a symmetric Gauss-Seidel smoother) conjugate gradient (PCG) iterations using double precision (64 bit) floating point values.

The HPCG rating is is a weighted GFLOP/s (billion floating operations per second) value that is composed of the operations performed in the PCG iteration phase over the time taken. The overhead time of problem construction and any modifications to improve performance are divided by 500 iterations (the amortization weight) and added to the runtime.

Integer arrays have global and local scope (global indices are unique across the entire distributed memory system, local indices are unique within a memory image). Integer data for global/local indices have three modes:

  • 32/32 - global and local integers are 32-bit
  • 64/32 - global integers are 64-bit, local are 32-bit
  • 64/64 - global and local are 64-bit.

These various modes are required in order to address sufficiently big problems if the range of indexing goes above 2^31 (roughly 2.1B), or to conserve storage costs if the range of indexing is less than 2^31.

The HPCG software package requires the availibility on your system of an implementation of the Message Passing Interface (MPI) if enabling the MPI build of HPCG, and a compiler that supports OpenMP syntax. An implementation compliant with MPI version 1.1 is sufficient.

FPGA Implementation

The FPGA Implementation of HPCG focuses on implementing all the different kernels inside the benchmark in our design: Dot Product Multiplication (DP), Sparse Matrix Vector Multiplication (SPMV), Symmetric Gauss-Seidel Smoother (SYMGS) and Vector Vector Co-efficient Multiplication (WAXPBY).

Building for the FPGA

The FPGA implementation of HPCG has been developed keeping the same calculations as the original software version of HPCG implementing specific optimizations for multiple FPGA execution.
All the source code of the optimized version of HPCG is located in the src-fpga directory.
All the code has been designed to target the Alveo U280 FPGA, and can be run using one or multiple boards.
Execution of the benchmark requires XRT 2020.1, also Vitis 2020.1 is necessary to build both the kernels and the host executable.
To build kernels simply navigate to the src-fpga directory and choose the desired precision for example:
cd src-fpga/double
From there type:
make all TARGET=hw The Makefile looks for the Xilinx Alveo U280 platform (DSA version 2019.3) in the default installation path (/opt/xilinx/platforms).
If the path to the DSA is different, the user needs to specify the path when building the kernels in the following way:
PLATFORM=/path_to_platform/xilinx_u280_xdma_201920_3.xpfm
To build the host executable go in the home directory and choose which precision you used to build the binaries so that the host files are build accordingly, for example:
make arch=FPGA_DOUBLE.
This will generate the host executable that will target double precision operations.
Inside the src-fpga/ directory multiple versions of the benchmark are present.
We implemented the Benchmark using different degrees of precision to test the performance of the FPGA.
If one desires to build for a lower precision than the one used normally in the Benchmark both host executable and FPGA build need to be adjusted accordingly.
To build for a lower precision simply go in the directory inside src-fpga and build for the FPGA in the same way as before:
cd src-fpga/half
make all TARGET=hw
This will build the kernels using half precision operations.
To build the host executable accordingly simply go into the home directory of the repository and type: make arch=FPGA_HALF

Executing on the FPGA

To execute the benchmark position yourself on the home directory of the repository and type:
mpirun -n 1 /bin/xhpcg 64 64 64
The syntax of the host executable is the following:
mpirun -n <number_of_mpi_nodes> <path_to_hpcg_executable> <dim_x> <dim_y> <dim_z> To execute the benchmark on multiple nodes use a config file, example configfiles are stored in the configFile directory.
IP addresses of the other host machines need to be specified inside the configFile for proper execution.

Results

Performance obtained using the Alveo U280, and the Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz with various dataset sizes and different data precision.

Problem Size Double GFlop/s Single GFlop/s Half GFlop/s
16^3 106.1 215.4 409.4
32^3 107.1 215.7 418.0
64^3 103.9 213.5 430.9
128^3 107.2 217.9 422.3
256^3 108.3 211.3 426.6

Performance comparison of HPCG across different platforms, using the same dataset size with different data precision.

Platform Problem Size Double GFlop/s
Intel Xeon 6234 256^3 1.03
NVIDIA Tesla K40 256^3 33.0
NVIDIA Tesla M40 256^3 41.5
NVIDIA Tesla P100 256^3 100.0
NVIDIA Tesla V100 256^3 145.9
Alveo U280 256^3 108.3

Installation

See the file INSTALL in this directory.

Valid Runs

HPCG can be run in just a few minutes from start to finish. However, official runs must be at least 1800 seconds (30 minutes) as reported in the output file. The Quick Path option is an exception for machines that are in production mode prior to broad availability of an optimized version of HPCG 3.0 for a given platform. In this situation (which should be confirmed by sending a note to the HPCG Benchmark owners) the Quick Path option can be invoked by setting the run time parameter equal to 0 (zero).

A valid run must also execute a problem size that is large enough so that data arrays accessed in the CG iteration loop do not fit in the cache of the device in a way that would be unrealistic in a real application setting. Presently this restriction means that the problem size should be large enough to occupy a significant fraction of main memory, at least 1/4 of the total.

Future memory system architectures may require restatement of the specific memory size requirements. But the guiding principle will always be that the problem size should reflect what would be reasonable for a real sparse iterative solver.

hpcg_fpga's People

Contributors

maherou avatar luszczek avatar lwisteria avatar mneilly avatar hpc-ken avatar christopheredsall avatar huanghua1994 avatar jspark1105 avatar albertozeni avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.