uwsampa / cse548-labs
A repository containing homework labs for CSE548
License: MIT License
I think (and I could have been confused about this because it was the beginning of the assignment) that the installation instructions should have us install an additional part in order to follow the recommended tutorial (chapters 6 and 7 of the Xilinx guide).
When attempting to follow along with chapters 6 or 7, the project won't run. The error messages reference a "part" not being set, which is confusing when you're just starting out. Googling the error message turns up a bunch of forum posts telling people to run commands in a "tcl console," which is also confusing (if you're in the GUI and don't know what "tcl" is).
What I ended up realizing is that when you open up the tutorial files, you need to specify which part (which also I think is called an "IP" = "intellectual property" ... and I just think of as "FPGA board") you're going to use as a target to build the code. (I guess it's analogous to picking your compiler target between x86 and ARM when compiling "normal" code.)
This is covered in the Xilinx guide in chapter 2 (specifically, page 15).
However, the part they end up picking (xc7k160tfbg484-2) isn't one that we have installed (I think). So, instead, I picked a part that seemed to be very similar to the one they recommend. However, it's different enough that when you run through chapters 6 and 7, all of the numbers and diagrams are off from what they show in the guide. This makes following along and doing the analysis a bit tricky (especially if everything is totally foreign and you're still trying to get your bearings).
So, if installing that part for free is possible, it might be good for future iterations of the lab to have that part installed and point students to chapter 2 to get it set up.
(Apologies if I'm crazy here / feel free to shame me in the comments.)
To test the fixed-point code, in addition to changing set src_dir, we also need to change
add_files $src_dir/mmult_float.cpp
to
add_files $src_dir/mmult_fixed.cpp
After going into zynq/hls/mmult_float and running vivado_hls -f hls.tcl, the numbers in my report, without changing anything, are different from expected (what's given in the assignment). I ran it multiple times. It does seem to be using the correct device (xc7z020clg484-1). Given we're doing optimizations, does this difference matter?
It looks (from my naive investigation) like the L3 inner loop has an iteration latency of 11 instead of 10, which bumps up the overall latency by about 10%.
Here's what I get:
expected:
+--------+--------+--------+--------+---------+
|     Latency     |     Interval    | Pipeline|
|   min  |   max  |   min  |   max  |   Type  |
+--------+--------+--------+--------+---------+
|  209851|  209851|  209852|  209852|   none  |
+--------+--------+--------+--------+---------+
mine (about 10% slower):
+--------+--------+--------+--------+---------+
|     Latency     |     Interval    | Pipeline|
|   min  |   max  |   min  |   max  |   Type  |
+--------+--------+--------+--------+---------+
|  230331|  230331|  230332|  230332|   none  |
+--------+--------+--------+--------+---------+
expected:
+-----------------+---------+-------+--------+-------+
|       Name      | BRAM_18K| DSP48E|   FF   |  LUT  |
+-----------------+---------+-------+--------+-------+
|DSP              |        -|      -|       -|      -|
|Expression       |        -|      -|       0|    308|
|FIFO             |        -|      -|       -|      -|
|Instance         |        0|      5|     384|    751|
|Memory           |       16|      -|       0|      0|
|Multiplexer      |        -|      -|       -|    381|
|Register         |        -|      -|     714|      -|
+-----------------+---------+-------+--------+-------+
|Total            |       16|      5|    1098|   1440|
+-----------------+---------+-------+--------+-------+
|Available        |      280|    220|  106400|  53200|
+-----------------+---------+-------+--------+-------+
|Utilization (%)  |        5|      2|       1|      2|
+-----------------+---------+-------+--------+-------+
mine (FF/LUT higher):
+-----------------+---------+-------+--------+-------+
|       Name      | BRAM_18K| DSP48E|   FF   |  LUT  |
+-----------------+---------+-------+--------+-------+
|DSP              |        -|      -|       -|      -|
|Expression       |        -|      -|       0|    537|
|FIFO             |        -|      -|       -|      -|
|Instance         |        0|      5|     384|    751|
|Memory           |       16|      -|       0|      0|
|Multiplexer      |        -|      -|       -|    558|
|Register         |        -|      -|     779|      -|
+-----------------+---------+-------+--------+-------+
|Total            |       16|      5|    1163|   1846|
+-----------------+---------+-------+--------+-------+
|Available        |      280|    220|  106400|  53200|
+-----------------+---------+-------+--------+-------+
|Utilization (%)  |        5|      2|       1|      3|
+-----------------+---------+-------+--------+-------+
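To see where the extra FFs and LUTs come from, the per-category rows of the two utilization tables can be summed and compared. This is only a sketch, not something from the assignment: the struct and function names are made up for illustration, and the numbers are copied by hand from the tables above (rows that are all 0 or "-" are left out).

```cpp
// Sketch only: FF/LUT rows copied from the two utilization tables above.
// Utilization, total_ff and total_lut are hypothetical names for illustration.
struct Utilization {
    int expression_lut;
    int instance_ff;
    int instance_lut;
    int multiplexer_lut;
    int register_ff;
};

constexpr Utilization expected_report = {308, 384, 751, 381, 714};
constexpr Utilization my_report       = {537, 384, 751, 558, 779};

// FF total = Instance + Register (Expression and Memory contribute 0 FFs).
constexpr int total_ff(const Utilization& u) {
    return u.instance_ff + u.register_ff;
}

// LUT total = Expression + Instance + Multiplexer (Memory contributes 0 LUTs).
constexpr int total_lut(const Utilization& u) {
    return u.expression_lut + u.instance_lut + u.multiplexer_lut;
}

// The sums reproduce the "Total" rows of both reports.
static_assert(total_ff(expected_report) == 1098, "expected FF total");
static_assert(total_lut(expected_report) == 1440, "expected LUT total");
static_assert(total_ff(my_report) == 1163, "my FF total");
static_assert(total_lut(my_report) == 1846, "my LUT total");

// The Instance row is identical in both reports.
static_assert(my_report.instance_ff == expected_report.instance_ff, "same Instance FF");
static_assert(my_report.instance_lut == expected_report.instance_lut, "same Instance LUT");
```

So the 406 extra LUTs split into 229 under Expression and 177 under Multiplexer, and the 65 extra FFs are all under Register. The Instance, Memory, BRAM_18K, and DSP48E numbers are the same in both reports.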
expected:
+--------------+--------+--------+----------+-----------+-----------+------+----------+
|              |     Latency     | Iteration|  Initiation Interval  | Trip |          |
|   Loop Name  |   min  |   max  |  Latency |  achieved |   target  | Count| Pipelined|
+--------------+--------+--------+----------+-----------+-----------+------+----------+
|- LOAD_OFF_1  |      10|      10|         2|          -|          -|     5|    no    |
|- LOAD_W_1    |    2580|    2580|       258|          -|          -|    10|    no    |
| + LOAD_W_2   |     256|     256|         2|          -|          -|   128|    no    |
|- LOAD_I_1    |    2064|    2064|       258|          -|          -|     8|    no    |
| + LOAD_I_2   |     256|     256|         2|          -|          -|   128|    no    |
|- L1          |  205056|  205056|     25632|          -|          -|     8|    no    |
| + L2         |   25630|   25630|      2563|          -|          -|    10|    no    |
|  ++ L3       |    2560|    2560|        10|          -|          -|   256|    no    |
|- STORE_O_1   |     136|     136|        17|          -|          -|     8|    no    |
| + STORE_O_2  |      15|      15|         3|          -|          -|     5|    no    |
+--------------+--------+--------+----------+-----------+-----------+------+----------+
mine (L1/L2/L3 are slower; this might all stem from L3 having an iteration latency of 11 instead of 10?):
+--------------+--------+--------+----------+-----------+-----------+------+----------+
|              |     Latency     | Iteration|  Initiation Interval  | Trip |          |
|   Loop Name  |   min  |   max  |  Latency |  achieved |   target  | Count| Pipelined|
+--------------+--------+--------+----------+-----------+-----------+------+----------+
|- LOAD_OFF_1  |      10|      10|         2|          -|          -|     5|    no    |
|- LOAD_W_1    |    2580|    2580|       258|          -|          -|    10|    no    |
| + LOAD_W_2   |     256|     256|         2|          -|          -|   128|    no    |
|- LOAD_I_1    |    2064|    2064|       258|          -|          -|     8|    no    |
| + LOAD_I_2   |     256|     256|         2|          -|          -|   128|    no    |
|- L1          |  225536|  225536|     28192|          -|          -|     8|    no    |
| + L2         |   28190|   28190|      2819|          -|          -|    10|    no    |
|  ++ L3       |    2816|    2816|        11|          -|          -|   256|    no    |
|- STORE_O_1   |     136|     136|        17|          -|          -|     8|    no    |
| + STORE_O_2  |      15|      15|         3|          -|          -|     5|    no    |
+--------------+--------+--------+----------+-----------+-----------+------+----------+
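The L3 hypothesis can be checked against the two loop tables with a bit of arithmetic. The sketch below is not from the assignment. It assumes, as the reported numbers suggest, that a non-pipelined loop's latency is its trip count times its iteration latency, and that each L2 and L1 iteration adds a fixed 3 and 2 cycles on top of the inner loop. Those overheads are read off the tables, not taken from any documentation, and the function names are made up for illustration.

```cpp
// Sketch only: rolls the L3 iteration latency up through L2 and L1 using the
// trip counts (256, 10, 8) and per-iteration overheads (3, 2) read off the
// loop tables above. The overheads are an inference from those numbers.

// L3: 256 iterations, each taking l3_iter cycles.
constexpr long long l3_latency(long long l3_iter) {
    return 256 * l3_iter;
}

// L2: 10 iterations, each one full L3 run plus 3 cycles of overhead.
constexpr long long l2_latency(long long l3_iter) {
    return 10 * (l3_latency(l3_iter) + 3);
}

// L1: 8 iterations, each one full L2 run plus 2 cycles of overhead.
constexpr long long l1_latency(long long l3_iter) {
    return 8 * (l2_latency(l3_iter) + 2);
}

// Expected report: L3 iteration latency of 10.
static_assert(l1_latency(10) == 205056, "matches the expected L1 latency");
// My report: L3 iteration latency of 11.
static_assert(l1_latency(11) == 225536, "matches my L1 latency");
// The L1 difference equals the difference in total latency between reports.
static_assert(l1_latency(11) - l1_latency(10) == 230331 - 209851,
              "L3 accounts for the whole gap");
```

Under those assumptions the whole 20480-cycle gap (about 9.8% of 209851) is one extra cycle in each of the 8 * 10 * 256 L3 iterations. The load and store loops are identical in both reports.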
It seems like it probably should be removed (it seems like you'll always want to train on all of the training data).