mr-brillianter / fpga_dpu

This project forked from markcxli/fpga_dpu

This project implements YOLO v3 on a Xilinx FPGA with a DPU.

Makefile 0.26% Python 0.87% C 92.39% Shell 0.16% Cuda 5.82% C++ 0.50%

fpga_dpu's Introduction

Hardware Acceleration for Machine Learning

An image identifier implemented on an FPGA to achieve fast, real-time multiple-object detection.

TEAM MEMBERS: Zuxiong Tan, Samyak Jain, Chenxi Li

Project Goals:

  1. Find a state-of-the-art multiple-object detection model
  2. Measure its inference performance on a GPU
  3. Deploy the model on an FPGA DPU and achieve real-time performance
  4. Measure its inference performance on the FPGA
  5. Compare the two sets of results
  • Make a roofline plot
  • Calculate memory bandwidths for the DL program on GPU and FPGA
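The roofline and memory-bandwidth goals above can be sketched as follows. This is a minimal illustration, not the project's actual analysis script: the function names and the device numbers (`PEAK_GFLOPS`, `BANDWIDTH_GB_S`) are hypothetical placeholders, and real values would come from the GPU and FPGA datasheets and from profiling.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, intensity_flop_per_byte):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, bandwidth_gb_s * intensity_flop_per_byte)


def achieved_bandwidth_gb_s(bytes_moved, seconds):
    """Average memory bandwidth (GB/s) for a run that moved `bytes_moved` bytes."""
    return bytes_moved / seconds / 1e9


# Hypothetical device, for illustration only (not measured values).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GB_S = 100.0

# Arithmetic intensity at which the two roofs meet (FLOP per byte).
ridge_point = PEAK_GFLOPS / BANDWIDTH_GB_S

for intensity in (1.0, ridge_point, 100.0):
    bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, intensity)
    regime = "memory-bound" if intensity < ridge_point else "compute-bound"
    print(f"intensity={intensity:6.1f} FLOP/B  bound={bound:7.1f} GFLOP/s  ({regime})")
```

Plotting `attainable_gflops` over a range of intensities on log-log axes gives the roofline; each measured kernel is then a single point at its own intensity and achieved GFLOP/s.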

What is DPU

  • The Xilinx® Deep Learning Processor Unit (DPU) is a programmable engine optimized for convolutional neural networks. The unit includes a high performance scheduler module, a hybrid computing array module, an instruction fetch unit module, and a global memory pool module. The DPU uses a specialized instruction set, which allows for the efficient implementation of many convolutional neural networks. Some examples of convolutional neural networks which have been deployed include VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, FPN, and many others.
  • The DPU IP can be implemented in the programmable logic (PL) of the selected Zynq®-7000 SoC or Zynq UltraScale+™ MPSoC devices with direct connections to the processing system (PS). The DPU requires instructions to implement a neural network and accessible memory locations for input images as well as temporary and output data. A program running on the application processing unit (APU) is also required to service interrupts and coordinate data transfers. https://www.xilinx.com/products/design-tools/ai-inference/ai-developer-hub.html#edge

image

DPU Development Flow (Using DNNDK)

  • The DPU requires a device driver which is included in the Xilinx Deep Neural Network Development Kit (DNNDK) toolchain.
  • The DNNDK User Guide (UG1327) describes how to use the DPU with the DNNDK tools. The basic development flow is shown in the following figure. First, use Vivado to generate the bitstream. Then, download the bitstream to the target board and install the DPU driver. For instructions on how to install the DPU driver and dependent libraries, refer to the DNNDK User Guide (UG1327): https://www.xilinx.com/support/documentation/user_guides/ug1327-dnndk-user-guide.pdf

image

Similar Products:

  1. NVIDIA Deep Learning Accelerator (NVDLA):
  • This is a free and open architecture that promotes a standard way to design deep learning inference accelerators. NVDLA is scalable, highly configurable, and designed to simplify integration and portability. The hardware supports a wide range of IoT devices.
  • NVDLA overview: http://nvdla.org
  2. Google's Tensor Processing Unit (TPU):

Sprint 1

  • Manage to run YOLO on GPU
  • Compare YOLO's performance on GPU and on CPU
  • Get FPGA

Sprint 2 (Lots of work on reverse-engineering darknet YOLO)

  • Refactor the YOLO code we got from https://pjreddie.com/darknet/yolo/
  • Rewrite YOLO with the DNNDK API
  • Look into different methods to run the given C code on an FPGA
    • Use the OpenCL framework to run the code on an Intel FPGA; this can be done with the Intel FPGA SDK for OpenCL
    • Convert the code into HDL to run on a Xilinx FPGA
      • Implement the DPU in Vivado and run some simulation tests

Results from Sprint 1

Time taken to detect objects in a single image

  • Prediction on BU SCC GPU: 0.925530 seconds.
  • Prediction on CPU (single core, Intel Core i5): 19.457083 seconds.
  • GPU Spec:
    • Tesla P100 PCIe 16GB
    • Width: 64 bits
    • Clock: 33MHz
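The speedup implied by these two Sprint 1 timings is a single division. The sketch below works it out using only the measured values quoted above; the variable names are chosen here for illustration.

```python
# Single-image prediction times reported for Sprint 1 (seconds).
cpu_seconds = 19.457083  # Intel Core i5, single core
gpu_seconds = 0.925530   # BU SCC GPU (Tesla P100 PCIe 16GB)

# Speedup = slower time / faster time.
speedup = cpu_seconds / gpu_seconds

# Throughput in images per second is the reciprocal of the latency.
gpu_images_per_second = 1.0 / gpu_seconds
cpu_images_per_second = 1.0 / cpu_seconds

print(f"GPU speedup over CPU: {speedup:.1f}x")
print(f"GPU throughput: {gpu_images_per_second:.2f} images/s")
print(f"CPU throughput: {cpu_images_per_second:.3f} images/s")
```

For these inputs the GPU comes out roughly 21 times faster than the single CPU core, at a little over one image per second.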

Sprint 3

  • Achieve object detection using an FPGA-based hardware accelerator
  • Compare the performance and power efficiency of FPGA, GPU and CPU

System Diagram

image

The graph above shows the system diagram of the design, which uses the YOLOv2 model with Darknet-19. In this design the CPU serves as the co-processor and the FPGA accelerates the computation. The acceleration card is a Xilinx Alveo U200 used with Xilinx ML Suite, and development was done on AWS (Amazon Web Services).

Performance

image

According to the graph, the GPU runs 15.5 times faster than the CPU, and the FPGA runs 4.9 times faster than the CPU.

Power efficiency

image

Power efficiency = speed / power. By this measure, the GPU is 5.89 times better than the CPU, and the FPGA is 52.6 times better than the CPU.
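Because power efficiency is defined as speed divided by power, the reported ratios also imply how much power each device drew relative to the CPU. The sketch below works that out; it assumes the speed ratios in the Performance section (15.5x and 4.9x) are the ones that went into the efficiency figures, which the text does not state explicitly. The helper name `implied_power_ratio` is hypothetical.

```python
def implied_power_ratio(speedup_vs_cpu, efficiency_gain_vs_cpu):
    """Relative power draw implied by efficiency = speed / power.

    efficiency_gain = speedup / power_ratio, so power_ratio = speedup / efficiency_gain.
    """
    return speedup_vs_cpu / efficiency_gain_vs_cpu


# Ratios reported above, all relative to the CPU.
gpu_power_ratio = implied_power_ratio(15.5, 5.89)
fpga_power_ratio = implied_power_ratio(4.9, 52.6)

# Efficiency of the FPGA relative to the GPU.
fpga_vs_gpu_efficiency = 52.6 / 5.89

print(f"GPU power relative to CPU:  {gpu_power_ratio:.2f}x")
print(f"FPGA power relative to CPU: {fpga_power_ratio:.3f}x")
print(f"FPGA efficiency relative to GPU: {fpga_vs_gpu_efficiency:.2f}x")
```

Under that assumption the GPU draws about 2.6 times the CPU's power, the FPGA draws about 9% of it, and the FPGA is roughly 8.9 times more power-efficient than the GPU.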

User Stories:

  • Navigation for Robots
  • Surveillance
  • Self-driving cars (using the YOLOv2 algorithm)

Poster

image
