arc-research-lab / charm
CHARM: Composing Heterogeneous Accelerators on Versal ACAP Architecture
License: MIT License
Thank you for open-sourcing the CHARM project. This is an excellent GEMM accelerator work with outstanding performance. While reading the CHARM paper and code, we came up with a few questions and would really appreciate it if you could answer them.
Firstly, some questions about the paper:
"... so that a tile of LHS with size (X x A x TI) x (Y x B x TK) can be reused on-chip for (Z x TJ) times": why is it (Z x TJ) times and not (Z x C x TJ) times?
In the "1st Step: Workload Assignment" portion of section 5.4, "... mapping an application with n kernels to num accs suffers..." mentions the number of accelerators for the first time. Is this a user-provided input parameter, or an output of CDSE?
Then, as for the code on GitHub:
In the input.cfg file, KRL_TYPE can only be 0 or 1. However, src_gen/AIE_ArrGen/gen_graph.sh, line 89, and src_gen/Kernel_Gen/gen_grah.sh, line 73, check whether kernel_type == "int32". If the kernel type can only be either 1 or 0, why is it necessary to check whether it equals int32?
[...] mm_graph_x3_type1.h, mm_graph_x3_type0.h, mm_graph_x3_col.h? They are not included in the top function or anywhere else.
[...] input.cfg? How can different accs be generated for different MMs? How can accs be generated for non-MM functions? For example, in example/BERT there are files for different sizes, like mm_graph_large.h, mm_graph_small.h, and dma_large.h, but among the sources in src_gen there is no sh file that generates a file ending with _large.h or _small.h. It would be great if you could shed some light on these parts.
Thank you very much in advance and looking forward to your reply!
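To make the first paper question concrete, the sketch below simply evaluates the quoted LHS tile size and the two candidate reuse counts for some made-up parameter values. The function names and all numbers are hypothetical and are not taken from the paper or the CHARM sources; the code only restates the arithmetic in the question and does not settle which count the paper intends.

```python
def lhs_tile_shape(X, A, TI, Y, B, TK):
    """Shape of the on-chip LHS tile as quoted: (X*A*TI) x (Y*B*TK)."""
    return (X * A * TI, Y * B * TK)


def reuse_candidates(Z, C, TJ):
    """The two reuse counts being compared in the question."""
    return {"Z*TJ": Z * TJ, "Z*C*TJ": Z * C * TJ}


# Hypothetical values, chosen only to illustrate the difference.
tile = lhs_tile_shape(X=2, A=3, TI=32, Y=2, B=4, TK=32)
counts = reuse_candidates(Z=2, C=4, TJ=32)

print("LHS tile shape:", tile)
print("Reuse candidates:", counts)
```

With these values the two candidates differ by exactly a factor of C, which is the gap the question asks about.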
Dear ARC Lab,
Thanks for open-sourcing the code and for the nice CHARM project. I tried replicating the pipeline view in the DAC'23 paper. Could you please describe how to configure parameters such as A_BRO and C_BRO to generate a 1*4 AIE array? Moreover, could you please tell me how to generate aiesim data corresponding to the customized AIE array?
Thanks!
Dear Author,
I am currently practicing small-scale matrix multiplication using a single AIE tile. I have tried mm_kernel0.cc for various data types in the src path of your project (as mm_kernel0 includes two inputs and one output, theoretically it should be the basic unit capable of performing matrix multiplication).
I created separate projects for each data type of mm_kernel0.cc to conduct AIE simulation. Moreover, I generated the test data and golden data according to the required data type and size using Python code.
According to my AIE simulator results, the simulation data for the int16, int32, and fp32 data types of mm_kernel0.cc are all correct. However, the int8 data type alone does not yield the correct simulation results, which I find puzzling. For convenience, I have attached my test project for the int8 type. In this project, the input matrices used for testing are two identical 01 diagonal matrices. I transposed them for AIE input (as you know, AIE matrix multiplication requires column-wise storage of matrices). You can directly see the two transposed input matrices, along with the golden result and the AIE output I saved, in the data/ directory. You'll notice that these two results are completely different. If you have time, you can recompile the simulation to get my results by running python int8_test_data_gen.py && make all. You can also check other input matrices by modifying the Python code.
I don't think there is any issue with my testing process, as I can obtain correct output results for the other three data types using the same process, with only the int8 output being incorrect. I would like to know if there are any additional conditions to be aware of to obtain the correct results for the int8 type of mm_kernel0, or if there is something missing or incorrect in my process?
Thank you for your guidance.
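For readers who want to reproduce this kind of test, here is a minimal sketch of generating 0/1 diagonal inputs, a golden product, and column-major streams in plain Python. It is not the attached int8_test_data_gen.py; the matrix size, the helper names, and the choice of ones on the whole diagonal are assumptions made for illustration. One point it shows: a diagonal matrix is symmetric, so its row-major and column-major layouts are identical, and a non-symmetric input may be a more sensitive check of the storage order.

```python
def diagonal_01(n):
    """n x n matrix with ones on the main diagonal (assumed test pattern)."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def matmul(a, b):
    """Reference integer matrix product used as the golden result."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def column_major(m):
    """Flatten a matrix column by column."""
    return [m[r][c] for c in range(len(m[0])) for r in range(len(m))]


def row_major(m):
    """Flatten a matrix row by row."""
    return [v for row in m for v in row]


def fits_int8(values):
    """True if every value is representable as a signed 8-bit integer."""
    return all(-128 <= v <= 127 for v in values)


n = 8  # hypothetical size, not the kernel's actual tile size
lhs = diagonal_01(n)
rhs = diagonal_01(n)
golden = matmul(lhs, rhs)

# Symmetric inputs look the same in both layouts ...
same_layout = column_major(lhs) == row_major(lhs)
# ... while a small non-symmetric matrix does not.
asym = [[r * 3 + c for c in range(3)] for r in range(3)]
asym_differs = column_major(asym) != row_major(asym)

print(same_layout, asym_differs, fits_int8(column_major(golden)))
```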
Hi,
I am facing a problem when running the command python project_setup.py. I get the error:
ERROR: [v++ 60-602] Source file does not exist: /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin
I'm wondering if this has something to do with the Vitis 2022.2 version I'm using, or with a misconfiguration somewhere.
The complete log is as follows:
`mkdir -p ./build_dir.hw.xilinx_vck190_base_202220_1
v++ -l -t hw --platform /s3/Xilinx_Vitis_2022.2/Vitis/2022.2/base_platforms/xilinx_vck190_base_202220_1/xilinx_vck190_base_202220_1.xpfm --save-temps --optimize 2 --hls.jobs 8 --config ./conn.cfg --clock.defaultFreqHz 220000000 --temp_dir ./build_dir.hw.xilinx_vck190_base_202220_1 --vivado.synth.jobs 8 --vivado.impl.jobs 8 -o'build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin' _x.hw.xilinx_vck190_base_202220_1/dma.xo libadf.a | tee ./build_dir.hw.xilinx_vck190_base_202220_1/hpc_xclbin.log
Option Map File Used: '/s3/Xilinx_Vitis_2022.2/Vitis/2022.2/data/vitis/vpp/optMap.xml'
****** v++ v2022.2 (64-bit)
**** SW Build 3671529 on 2022-10-13-17:52:11
** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
objcopy: warning: /home/[myusername]/xxxx/CHARM/prj_try/.Xil/v++-37415-[myusername]/hw.o: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
objcopy: warning: /home/[myusername]/xxxx/CHARM/prj_try/.Xil/v++-37415-[myusername]/hw.o: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
objcopy: warning: /home/[myusername]/xxxx/CHARM/prj_try/.Xil/v++-37415-[myusername]/sw.o: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
objcopy: warning: /home/[myusername]/xxxx/CHARM/prj_try/.Xil/v++-37415-[myusername]/sw.o: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
INFO: [v++ 60-1306] Additional information associated with this v++ link can be found at:
Reports: /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/reports/link
Log files: /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/logs/link
Running Dispatch Server on port: 37909
INFO: [v++ 60-1548] Creating build summary session with primary output /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin.link_summary, at Fri Jul 21 16:59:57 2023
INFO: [v++ 60-1315] Creating rulecheck session with output '/home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/reports/link/v++_link_hw_guidance.html', at Fri Jul 21 16:59:57 2023
INFO: [v++ 60-895] Target platform: /s3/Xilinx_Vitis_2022.2/Vitis/2022.2/base_platforms/xilinx_vck190_base_202220_1/xilinx_vck190_base_202220_1.xpfm
INFO: [v++ 60-1578] This platform contains Xilinx Shell Archive '/s3/Xilinx_Vitis_2022.2/Vitis/2022.2/base_platforms/xilinx_vck190_base_202220_1/hw/hw.xsa'
ERROR: [v++ 82-4223] Output file type of .xsa is required. A different output file type has been specified: build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin
INFO: [v++ 60-1653] Closing dispatch client.
v++ -p -t hw -f /s3/Xilinx_Vitis_2022.2/Vitis/2022.2/base_platforms/xilinx_vck190_base_202220_1/xilinx_vck190_base_202220_1.xpfm
--package.out_dir ./package.hw
--package.rootfs /opt/xilinx-versal-common-v2022.2/rootfs.ext4
--package.kernel_image /opt/xilinx-versal-common-v2022.2/Image
--package.boot_mode=sd
--package.image_format=ext4
--package.defer_aie_run
--package.sd_file hostexe
libadf.a ./build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin -o mm_hw.xclbin
Option Map File Used: '/s3/Xilinx_Vitis_2022.2/Vitis/2022.2/data/vitis/vpp/optMap.xml'
****** v++ v2022.2 (64-bit)
**** SW Build 3671529 on 2022-10-13-17:52:11
** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.
ERROR: [v++ 60-602] Source file does not exist: /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin
INFO: [v++ 60-1662] Stopping dispatch session having empty uuid.
INFO: [v++ 60-1653] Closing dispatch client.
make: *** [Makefile:185: package_hw] Error 1`
Thanks
Yiou
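In the log above, the 82-4223 error from the link step (v++ -l) appears before the 60-602 error from the package step (v++ -p), so the missing hw.xclbin looks like a consequence of the failed link rather than a separate problem. The sketch below lists v++ errors in the order they occur so the first one stands out. The helper names are hypothetical, and the two sample lines are copied from the log in this thread.

```python
import re

# Matches lines such as: ERROR: [v++ 82-4223] <message>
VPP_ERROR = re.compile(r"^ERROR: \[v\+\+ (\d+-\d+)\] (.*)$", re.MULTILINE)


def vpp_errors(log_text):
    """Return (code, message) pairs for every v++ error, in log order."""
    return [(m.group(1), m.group(2)) for m in VPP_ERROR.finditer(log_text)]


def first_error(log_text):
    """Return the earliest v++ error, or None if the log has none."""
    errors = vpp_errors(log_text)
    return errors[0] if errors else None


SAMPLE_LOG = """\
INFO: [v++ 60-895] Target platform: ...
ERROR: [v++ 82-4223] Output file type of .xsa is required. A different output file type has been specified: build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin
INFO: [v++ 60-1653] Closing dispatch client.
ERROR: [v++ 60-602] Source file does not exist: /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin
"""

errors = vpp_errors(SAMPLE_LOG)
root = first_error(SAMPLE_LOG)
print("first error code:", root[0])
```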
Hi,
In the paper, there are evaluations on non-GEMM operations such as softmax, layernorm, etc. However, I think the current release does not include them. I am wondering if you have an implementation or any references.
Hi,
I saw that there is a Bandwidth Profiler before the DSE in Fig. 6 of the CHARM paper, and I want to reproduce it on my VCK5000, since it has 4 DRAM banks.
But in cdse.py the bandwidth part is fixed. How did you profile the bandwidth of the FPGA?
# One-Time Profling of DDR Bandwidth
BW_L_S=(12*DDR_BANK)*freq_rate
BW_R_S=(12*DDR_BANK)*freq_rate
BW_O_S=(8.5*DDR_BANK)*freq_rate
BW_L_DR=(8*DDR_BANK)*freq_rate
BW_R_DL=(8*DDR_BANK)*freq_rate
BW_L_DO=(8*DDR_BANK)*freq_rate
BW_R_DO=(8*DDR_BANK)*freq_rate
BW_O_D=(6*DDR_BANK)*freq_rate
BW_L_T=(7*DDR_BANK)*freq_rate
BW_R_T=(7*DDR_BANK)*freq_rate
BW_O_T=(7*DDR_BANK)*freq_rate
Thank you.
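As a starting point for experimenting with a different bank count, the constants quoted above can be collected into one table that is scaled by DDR_BANK and freq_rate. This is only a restatement of the excerpt, not the authors' profiler: the function name is hypothetical, the per-bank coefficients are copied from the snippet, and whether those coefficients carry over to a VCK5000 is exactly the open question.

```python
# Per-bank coefficients copied from the cdse.py excerpt quoted above.
PER_BANK = {
    "BW_L_S": 12, "BW_R_S": 12, "BW_O_S": 8.5,
    "BW_L_DR": 8, "BW_R_DL": 8, "BW_L_DO": 8, "BW_R_DO": 8, "BW_O_D": 6,
    "BW_L_T": 7, "BW_R_T": 7, "BW_O_T": 7,
}


def bandwidth_table(ddr_bank, freq_rate):
    """Scale every per-bank coefficient by bank count and frequency ratio."""
    return {name: coeff * ddr_bank * freq_rate
            for name, coeff in PER_BANK.items()}


# Hypothetical settings: 4 banks, unchanged frequency ratio.
table = bandwidth_table(ddr_bank=4, freq_rate=1.0)
for name, value in table.items():
    print(name, value)
```

Replacing the entries of PER_BANK with values measured on the target board would keep the rest of the formula unchanged.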
Dear ARC Lab,
Thanks for open-sourcing the code and for the nice work. I tried replicating the flow as per the instructions and got the error below when executing make all PLATFORM=${PATH} EDGE_COMMON_SW_PATH=${PATH} SYSROOT_PATH={PATH}
Is there any quick hint or resolution that you are aware of?
Regards,
Rajesh
Total Number of unique Switch FIFOs: 0
Running AIE Post-Map Finalizer.
Post-Map Finalizer succeeded.
Ordered merge post process: Adding BD config broadcast nets
Running AIE Post-Map Finalizer.
Post-Map Finalizer succeeded.
Check AIE-ROUTER has run: 12 errors
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_0 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_4 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_0_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_4_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_SWITCH_S_SOUTH_CH0_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_SWITCH_S_SOUTH_CH4_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_PL_TO_AIE_0 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_PL_TO_AIE_4 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_PL_TO_AIE_0_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_PL_TO_AIE_4_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_SWITCH_S_SOUTH_CH0_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_SWITCH_S_SOUTH_CH4_PIN exceeds limit of 100.
HDRTPinDelayHelperV10:: Clearing pin delay helper
Releasing pin delay helper for floorplan 0x6661ea70
HDRTPinDelayHelperV10:: Releasing Pin Dly Helper
NodeGraph Released.
DeviceData Released.
ERROR:MathEngineUDMRouter:###UDM Router Did NOT Finish Successfully
/tools/Xilinx/Vitis/2022.2/aietools/bin/aieir_be: line 98: kill: (-871611) - No such process
Compilation Failed
INFO: [aiecompiler 77-5805] Run completed. Additional information can be found in:
Guidance: ./Work/reports/guidance.html
INFO: [aiecompiler 77-5806] Use the vitis_analyzer tool to visualize and navigate the relevant reports. Run the following command.
vitis_analyzer ./Work/mm_top.aiecompile_summary
/tools/Xilinx/Vitis/2022.2/aietools/bin/aiecompiler: line 83: kill: (-871297) - No such process
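All twelve AIE-ROUTER-3 errors in the log above fall on two interface tiles, AIE_INTF_B3_CORE_X24Y0 and AIE_INTF_B0_CORE_X25Y0. The sketch below groups such errors by tile so that this is easy to see in longer logs. The function name is hypothetical and the regular expression is written only against the message format shown in this thread; it makes no claim about the cause of the over-subscription.

```python
import re
from collections import defaultdict

# Matches: ERROR: [AIE-ROUTER-3] Demand of 200 at node TILE/PORT exceeds limit of 100.
ROUTER_ERROR = re.compile(
    r"ERROR: \[AIE-ROUTER-3\] Demand of (\d+) at node (\S+?)/(\S+) "
    r"exceeds limit of (\d+)\."
)


def overloaded_tiles(log_text):
    """Map each tile name to a list of (port, demand, limit) tuples."""
    tiles = defaultdict(list)
    for match in ROUTER_ERROR.finditer(log_text):
        demand, tile, port, limit = match.groups()
        tiles[tile].append((port, int(demand), int(limit)))
    return dict(tiles)


# Three lines copied from the log in this thread.
SAMPLE_LOG = """\
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_0 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_4 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_PL_TO_AIE_0 exceeds limit of 100.
"""

result = overloaded_tiles(SAMPLE_LOG)
for tile, ports in result.items():
    print(tile, [port for port, _, _ in ports])
```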
Hi @JinmingZhuang
Great work! I work in machine learning, but I'm a beginner with FPGAs and am learning about machine learning implementations on FPGAs. Your work seems to be the latest and a good starting point. FPGA boards, and the VCK190 in particular, are not easily available to everyone. For learning purposes, could you also make a tutorial or give some pointers for beginners like me on how to run your code, or how to use it in a Xilinx simulator on Linux/Ubuntu? I'm also not sure if there is any simulator like that. Please let me know.
Thank you!
Srikanth