uwsampa / cse548-labs

A repository containing homework labs for CSE548

License: MIT License

Makefile 0.43% Tcl 68.89% C 1.99% C++ 10.79% Jupyter Notebook 12.53% Python 5.38%

cse548-labs's People

wanganran hwpeng jnclayiii nus-comparch kshitijzutshi eddieburning maheshguptav yuginhc joshuaebenezer diamondwhile seunghwancho saadmahboob danielm322 qiyangjie acapone13 thulasiramvarma abhishekpathak128

cse548-labs's Issues

[Part 1] Part `xc7k160tfbg484-2` (chapter 2) needed to complete the linked tutorial

I think (though I may have been confused about this, since it was the beginning of the assignment) that the installation instructions should have us install an additional part in order to follow the recommended tutorial (chapters 6 and 7 of the Xilinx guide).

When I tried to follow along with chapters 6 or 7, the project wouldn't run. The error messages mention a "part" not being set, which is confusing when you're just starting out. Googling the error message turns up a bunch of forum posts telling people to run commands in a "Tcl console," which is also confusing if you're in the GUI and don't know what Tcl is.

What I eventually realized is that when you open the tutorial files, you need to specify which part (i.e., the specific FPGA device, which I had been loosely thinking of as the "FPGA board") the code is built for. (I guess it's analogous to picking x86 or ARM as your compiler target when compiling "normal" code.)

This is covered in the Xilinx guide in chapter 2 (specifically, page 15).

However, the part the guide picks (`xc7k160tfbg484-2`) isn't one that we have installed (I think). So I picked a part that seemed very similar to the recommended one. It's different enough, though, that when you run through chapters 6 and 7, all of the numbers and diagrams differ from what the guide shows. This makes following along and doing the analysis tricky, especially if everything is totally foreign and you're still trying to get your bearings.
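For anyone trying to choose a "very similar" part, it may help that 7-series part names encode the family, device, package/pin count, and speed grade. Below is a rough Python sketch of that naming convention; the regex and the `decode_part` helper are my own assumptions (not a Xilinx tool or API) and won't match every part name, e.g. low-power "-1L" grades.

```python
import re

# Family letter after "xc7" -> product family (assumed mapping for 7-series parts).
FAMILIES = {
    "a": "Artix-7",
    "k": "Kintex-7",
    "s": "Spartan-7",
    "v": "Virtex-7",
    "z": "Zynq-7000",
}

# Simplified pattern: xc7<family><size>[t]<package letters><pins>-<speed grade>.
# Assumption: this does not cover every Xilinx part name.
PART_RE = re.compile(r"^xc7([aksvz])(\d+t?)([a-z]+)(\d+)-(\d)$")


def decode_part(part: str) -> dict:
    """Split a 7-series part name into family, device, package, and speed grade."""
    m = PART_RE.match(part.lower())
    if m is None:
        raise ValueError(f"unrecognized part name: {part}")
    family, size, package, pins, speed = m.groups()
    return {
        "family": FAMILIES[family],
        "device": f"xc7{family}{size}",
        "package": f"{package}{pins}",
        "speed_grade": int(speed),
    }


if __name__ == "__main__":
    for name in ("xc7k160tfbg484-2", "xc7z020clg484-1"):
        print(name, decode_part(name))
```

Run on the two part names that come up in these issues, it shows that the tutorial's `xc7k160tfbg484-2` is a Kintex-7 device with speed grade 2, while `xc7z020clg484-1` is a Zynq-7000 device with speed grade 1, so timing and resource numbers from one shouldn't be expected to match the other exactly.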

So, if that part can be installed for free, it might be good for future iterations of the lab to include it in the installation instructions and point students to chapter 2 to set it up.

(Apologies if I'm crazy here / feel free to shame me in the comments.)

[Part 2] Additional change to `tcl/hls.tcl` needed

To test the fixed-point code, in addition to changing `set src_dir`, we also need to change

`add_files $src_dir/mmult_float.cpp`

to

`add_files $src_dir/mmult_fixed.cpp`

[Part 1 A] Initial (unmodified) report different than expected

After going into `zynq/hls/mmult_float` and running `vivado_hls -f hls.tcl` without changing anything, the numbers in my report differ from the expected ones given in the assignment. I ran it multiple times. It does seem to be using the correct device (`xc7z020clg484-1`). Given that we're doing optimizations, does this difference matter?

From my naive investigation, it looks like the L3 inner loop has an iteration latency of 11 cycles instead of 10, which bumps up the overall latency by about 10%.

Here's what I get:

Latency

expected:

+--------+--------+--------+--------+---------+
|     Latency     |     Interval    | Pipeline|
|   min  |   max  |   min  |   max  |   Type  |
+--------+--------+--------+--------+---------+
|  209851|  209851|  209852|  209852|   none  |
+--------+--------+--------+--------+---------+

mine (about 10% slower):

+--------+--------+--------+--------+---------+
|     Latency     |     Interval    | Pipeline|
|   min  |   max  |   min  |   max  |   Type  |
+--------+--------+--------+--------+---------+
|  230331|  230331|  230332|  230332|   none  |
+--------+--------+--------+--------+---------+

Utilization

expected:

+-----------------+---------+-------+--------+-------+
|       Name      | BRAM_18K| DSP48E|   FF   |  LUT  |
+-----------------+---------+-------+--------+-------+
|DSP              |        -|      -|       -|      -|
|Expression       |        -|      -|       0|    308|
|FIFO             |        -|      -|       -|      -|
|Instance         |        0|      5|     384|    751|
|Memory           |       16|      -|       0|      0|
|Multiplexer      |        -|      -|       -|    381|
|Register         |        -|      -|     714|      -|
+-----------------+---------+-------+--------+-------+
|Total            |       16|      5|    1098|   1440|
+-----------------+---------+-------+--------+-------+
|Available        |      280|    220|  106400|  53200|
+-----------------+---------+-------+--------+-------+
|Utilization (%)  |        5|      2|       1|      2|
+-----------------+---------+-------+--------+-------+

mine (FF/LUT higher):

+-----------------+---------+-------+--------+-------+
|       Name      | BRAM_18K| DSP48E|   FF   |  LUT  |
+-----------------+---------+-------+--------+-------+
|DSP              |        -|      -|       -|      -|
|Expression       |        -|      -|       0|    537|
|FIFO             |        -|      -|       -|      -|
|Instance         |        0|      5|     384|    751|
|Memory           |       16|      -|       0|      0|
|Multiplexer      |        -|      -|       -|    558|
|Register         |        -|      -|     779|      -|
+-----------------+---------+-------+--------+-------+
|Total            |       16|      5|    1163|   1846|
+-----------------+---------+-------+--------+-------+
|Available        |      280|    220|  106400|  53200|
+-----------------+---------+-------+--------+-------+
|Utilization (%)  |        5|      2|       1|      3|
+-----------------+---------+-------+--------+-------+
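The Total and Utilization rows can be reproduced from the per-category rows. A small Python sketch (values copied from the two tables above, with "-" read as 0) shows that the percentages appear to be rounded down (1440/53200 ≈ 2.7% is reported as 2) and that the extra FF/LUT usage comes entirely from the Expression, Multiplexer, and Register rows, while the Instance and Memory rows are unchanged. The helper names are illustrative only.

```python
# Per-category rows copied from the two utilization reports,
# as (BRAM_18K, DSP48E, FF, LUT); "-" entries are read as 0.
# The DSP and FIFO rows are omitted because they are empty in both reports.
EXPECTED = {
    "Expression": (0, 0, 0, 308),
    "Instance": (0, 5, 384, 751),
    "Memory": (16, 0, 0, 0),
    "Multiplexer": (0, 0, 0, 381),
    "Register": (0, 0, 714, 0),
}
MINE = {
    "Expression": (0, 0, 0, 537),
    "Instance": (0, 5, 384, 751),
    "Memory": (16, 0, 0, 0),
    "Multiplexer": (0, 0, 0, 558),
    "Register": (0, 0, 779, 0),
}
# "Available" row from the reports.
AVAILABLE = (280, 220, 106400, 53200)


def totals(rows):
    """Column-wise sum of the per-category rows (the "Total" row)."""
    return tuple(sum(col) for col in zip(*rows.values()))


def utilization(total):
    """Integer percentages, rounded down (assumed to match the report)."""
    return tuple(100 * used // avail for used, avail in zip(total, AVAILABLE))


def row_diff(a, b):
    """Per-category change from report a to report b."""
    return {name: tuple(y - x for x, y in zip(a[name], b[name])) for name in a}


if __name__ == "__main__":
    for label, rows in (("expected", EXPECTED), ("mine", MINE)):
        t = totals(rows)
        print(label, t, utilization(t))
    print(row_diff(EXPECTED, MINE))
```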

Loop performance

expected:

+--------------+--------+--------+----------+-----------+-----------+------+----------+
|              |     Latency     | Iteration|  Initiation Interval  | Trip |          |
|   Loop Name  |   min  |   max  |  Latency |  achieved |   target  | Count| Pipelined|
+--------------+--------+--------+----------+-----------+-----------+------+----------+
|- LOAD_OFF_1  |      10|      10|         2|          -|          -|     5|    no    |
|- LOAD_W_1    |    2580|    2580|       258|          -|          -|    10|    no    |
| + LOAD_W_2   |     256|     256|         2|          -|          -|   128|    no    |
|- LOAD_I_1    |    2064|    2064|       258|          -|          -|     8|    no    |
| + LOAD_I_2   |     256|     256|         2|          -|          -|   128|    no    |
|- L1          |  205056|  205056|     25632|          -|          -|     8|    no    |
| + L2         |   25630|   25630|      2563|          -|          -|    10|    no    |
|  ++ L3       |    2560|    2560|        10|          -|          -|   256|    no    |
|- STORE_O_1   |     136|     136|        17|          -|          -|     8|    no    |
| + STORE_O_2  |      15|      15|         3|          -|          -|     5|    no    |
+--------------+--------+--------+----------+-----------+-----------+------+----------+

mine (L1/L2/L3 are slower; might this all stem from L3 having an iteration latency of 11 instead of 10?):

+--------------+--------+--------+----------+-----------+-----------+------+----------+
|              |     Latency     | Iteration|  Initiation Interval  | Trip |          |
|   Loop Name  |   min  |   max  |  Latency |  achieved |   target  | Count| Pipelined|
+--------------+--------+--------+----------+-----------+-----------+------+----------+
|- LOAD_OFF_1  |      10|      10|         2|          -|          -|     5|    no    |
|- LOAD_W_1    |    2580|    2580|       258|          -|          -|    10|    no    |
| + LOAD_W_2   |     256|     256|         2|          -|          -|   128|    no    |
|- LOAD_I_1    |    2064|    2064|       258|          -|          -|     8|    no    |
| + LOAD_I_2   |     256|     256|         2|          -|          -|   128|    no    |
|- L1          |  225536|  225536|     28192|          -|          -|     8|    no    |
| + L2         |   28190|   28190|      2819|          -|          -|    10|    no    |
|  ++ L3       |    2816|    2816|        11|          -|          -|   256|    no    |
|- STORE_O_1   |     136|     136|        17|          -|          -|     8|    no    |
| + STORE_O_2  |      15|      15|         3|          -|          -|     5|    no    |
+--------------+--------+--------+----------+-----------+-----------+------+----------+
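The two loop tables are consistent with that guess. The Python sketch below recomputes the nested loop latencies from the L3 iteration latency alone. The trip counts and per-iteration overheads (+3 cycles per L2 iteration, +2 per L1 iteration) are read off the tables, and the 5 leftover cycles in the top-level latency are inferred from the totals; none of these constants come from tool documentation, and `nested_latency` and `total_latency` are just illustrative helpers.

```python
def nested_latency(l3_iter: int) -> dict:
    """Rebuild the L1/L2/L3 latencies from the L3 iteration latency.

    Trip counts and per-iteration overheads are read off the loop tables;
    they are inferred from the reports, not documented by the tool.
    """
    l3 = 256 * l3_iter   # L3: 256 trips
    l2 = 10 * (l3 + 3)   # L2: 10 trips, each iteration = L3 latency + 3
    l1 = 8 * (l2 + 2)    # L1: 8 trips, each iteration = L2 latency + 2
    return {"L3": l3, "L2": l2, "L1": l1}


# Top-level loops other than L1 are identical in both reports.
OTHER_LOOPS = 10 + 2580 + 2064 + 136  # LOAD_OFF_1, LOAD_W_1, LOAD_I_1, STORE_O_1
OVERHEAD = 5  # remaining top-level cycles, inferred from the reported totals


def total_latency(l3_iter: int) -> int:
    return OTHER_LOOPS + nested_latency(l3_iter)["L1"] + OVERHEAD


expected = total_latency(10)
mine = total_latency(11)
print(expected, mine, mine - expected, f"{(mine - expected) / expected:.1%}")
```

With an L3 iteration latency of 10 it reproduces the expected total of 209851 cycles, and with 11 it reproduces 230331. The difference, 20480 = 8 × 10 × 256 cycles, is exactly one extra cycle for each execution of the L3 body, so the whole ~10% slowdown appears to come from that single extra cycle in L3.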

`--num-examples` command-line argument in `mnist.py` is ignored

It should probably be removed, since you'll presumably always want to train on all of the training data.
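If the flag were kept instead of removed, it would have to actually be applied to the training set. Here is a minimal sketch, assuming the training images and labels are sliceable sequences (lists or NumPy arrays); `parse_args` and `select_examples` are hypothetical helpers, not functions from `mnist.py`. Note that argparse stores `--num-examples` as `args.num_examples` and does not warn when a parsed option is never read, which makes an ignored flag easy to miss.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI reusing the flag name from the issue; the real
    # mnist.py may define its other arguments differently.
    parser = argparse.ArgumentParser(description="Train on MNIST (sketch)")
    parser.add_argument(
        "--num-examples",
        type=int,
        default=None,
        help="train on only the first N examples (default: all)",
    )
    return parser.parse_args(argv)


def select_examples(images, labels, num_examples):
    """Keep the first num_examples items, or everything if it is None."""
    if num_examples is None:
        return images, labels
    if num_examples <= 0:
        raise ValueError("--num-examples must be positive")
    return images[:num_examples], labels[:num_examples]


if __name__ == "__main__":
    args = parse_args(["--num-examples", "3"])
    x, y = select_examples(list(range(10)), list(range(10)), args.num_examples)
    print(x, y)
```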

[Part 2] Typo in paragraph 3: `tcl/hts.tcl` should be `tcl/hls.tcl`
