
arc-research-lab / charm


CHARM: Composing Heterogeneous Accelerators on Versal ACAP Architecture

License: MIT License

Makefile 3.05% C++ 93.80% C 1.04% Shell 1.43% Python 0.68%
deeplearning fpga heterogeneous-computing design-space-exploration high-level-synthesis versalacap electronic-design-automation domain-specific-architecture acap versal

charm's Issues

Some questions about the paper and code of CHARM

Thank you for open-sourcing the CHARM project. This is an excellent GEMM accelerator work with outstanding performance. While reading the CHARM paper and code, we came up with a few questions and would really appreciate it if you could answer them.

Firstly, some questions about the paper:

  1. The last paragraph of section 4.2 says "... so that a tile of LHS with size (X x A x TI) x (Y x B x TK) can be reused on-chip for (Z x TJ) times". Why is it (Z x TJ) times rather than (Z x C x TJ) times?
  2. The "1st Step: Workload Assignment" portion of section 5.4 says "... mapping an application with n kernels to num accs suffers...". This is the first mention of the number of accelerators. Is it a user-provided input parameter or an output of CDSE?
  3. Also in the 1st step, could you explain in more detail how the time complexity is reduced to C(n-1, num-1)? Where does this expression come from?
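
For context on question 3: one common way a count of C(n-1, num-1) arises is when the n kernels are kept in a fixed order and each of the num accelerators receives a contiguous, non-empty group, so a mapping is fully determined by choosing num-1 cut points among the n-1 gaps between adjacent kernels. Whether this is the restriction CDSE actually applies is an assumption here, not something the quoted paper text confirms. A minimal Python sketch of that counting argument (the function name and the values of n and num are illustrative):

```python
from itertools import combinations
from math import comb


def contiguous_partitions(n, num):
    """Yield every split of kernels 0..n-1 into `num` non-empty contiguous groups.

    Assumption: kernels keep their order and each group maps to one accelerator.
    """
    kernels = list(range(n))
    # Pick num-1 cut positions among the n-1 gaps between adjacent kernels.
    for cuts in combinations(range(1, n), num - 1):
        bounds = (0, *cuts, n)
        yield [kernels[a:b] for a, b in zip(bounds, bounds[1:])]


n, num = 5, 3  # hypothetical sizes, chosen only for illustration
parts = list(contiguous_partitions(n, num))

# The enumeration matches the closed form C(n-1, num-1).
assert len(parts) == comb(n - 1, num - 1)
for groups in parts:
    print(groups)
```

Under this assumption the search space grows as a binomial coefficient rather than as num^n, which is the count for assigning each kernel to any accelerator independently.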

Next, some questions about the code on GitHub:

  1. In the input.cfg file, KRL_TYPE can only be 0 or 1. However, src_gen/AIE_ArrGen/gen_graph.sh (line 89) and src_gen/Kernel_Gen/gen_grah.sh (line 73) check whether kernel_type == "int32". If the kernel type can only be 0 or 1, why is it compared against int32?
  2. After running code generation, what is the purpose of these three files: mm_graph_x3_type1.h, mm_graph_x3_type0.h, and mm_graph_x3_col.h? They are not included in the top function or anywhere else.
  3. There appears to be no data set for testing. Could you provide one for a demo?
  4. Our platform is the VCK5000. However, when we compile the project by following the instructions, we get some errors. Are more specific modifications or instructions needed to run the project on the VCK5000?
  5. After searching the CHARM repo, we only found the CACG (code generation) part of the CHARM framework, but not the design search parts (e.g., CDSE, CDAC and CRTS). Could you please point us to the source code for these parts? Additionally, how do we obtain the parameters in input.cfg? How do we generate different accs for different MMs? How do we generate accs for non-MM functions? For example, example/BERT contains files for different sizes, such as mm_graph_large.h, mm_graph_small.h and dma_large.h, but none of the sh files in src_gen generates a file ending with _large.h or _small.h. It would be great if you could shed some light on these parts.

Thank you very much in advance and looking forward to your reply!

About pipeline view in paper DAC'23

Dear ARC Lab,
Thanks for releasing the code as open source and for the nice CHARM project. I tried replicating the pipeline view in the DAC'23 paper. Could you please describe how to configure parameters such as A_BRO and C_BRO to generate a 1*4 AIE array? Moreover, could you please tell me how to generate the aiesim data corresponding to the customized AIE array?
Thanks !

wrong output of aie simulation of int8 mm_kernel0

Dear Author,

I am currently practicing small-scale matrix multiplication using a single AIE tile. I have tried mm_kernel0.cc for various data types in the src path of your project (as mm_kernel0 includes two inputs and one output, theoretically it should be the basic unit capable of performing matrix multiplication).

I created separate projects for each data type of mm_kernel0.cc to conduct AIE Simulation. Moreover, I generated the test data and Golden data according to the required data type and size by using Python code.

According to my AIE simulator results, the simulation data for the int16, int32, and fp32 versions of mm_kernel0.cc are all correct. However, the int8 version alone does not yield the correct simulation results, which I find puzzling. For convenience, I have attached my test project for the int8 type. In this project, the input matrices used for testing are two identical 01 diagonal matrices. I transposed them for AIE input (as you know, AIE matrix multiplication requires column-wise storage of matrices). You can see the two transposed input matrices, the golden result, and the AIE output I saved in the data/ directory. You'll notice that the golden result and the AIE output are completely different. If you have time, you can recompile the simulation and reproduce my results by running python int8_test_data_gen.py && make all. You can also test other input matrices by modifying the Python code.

I don't think there is any issue with my testing process, as I can obtain correct output results for the other three data types using the same process, with only the int8 output being incorrect. I would like to know if there are any additional conditions to be aware of to obtain the correct results for int8 type mm_kernel0, or if there is something missing or incorrect in my process?

Thank you for your guidance.

mm_int8.zip
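
The test setup described above (identical diagonal inputs, column-wise storage, and a golden result) can be sketched in plain Python as follows. This is not the attached int8_test_data_gen.py; the matrix size N, the helper names, and the simple column-major flattening are assumptions made for illustration, and the layout that the int8 mm_kernel0 actually expects may differ.

```python
N = 4  # hypothetical tile size; the real kernel dimensions are not stated here


def identity(n):
    """n x n matrix with ones on the diagonal and zeros elsewhere."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def matmul(a, b):
    """Reference (golden) matrix product computed with plain integers."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][c] for k in range(inner)) for c in range(cols)]
            for row in a]


def to_column_major(m):
    """Flatten column by column (same as a row-major flatten of the transpose)."""
    rows, cols = len(m), len(m[0])
    return [m[r][c] for c in range(cols) for r in range(rows)]


def fits_int8(values):
    """True if every value is representable as a signed 8-bit integer."""
    return all(-128 <= v <= 127 for v in values)


lhs = identity(N)
rhs = identity(N)
golden = matmul(lhs, rhs)

lhs_stream = to_column_major(lhs)
rhs_stream = to_column_major(rhs)
golden_stream = to_column_major(golden)

assert fits_int8(lhs_stream + rhs_stream + golden_stream)
print(golden_stream)
```

With identity inputs the golden result is again the identity, so any mismatch in the simulator output points at the data layout or the kernel rather than at the arithmetic.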

ERROR: [v++ 60-602] Source file does not exist: hw.xclbin

Hi,

I am facing a problem when running command
python project_setup.py
I got the error:
ERROR: [v++ 60-602] Source file does not exist: /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin
I'm wondering whether this has something to do with the Vitis 2022.2 version I'm using, or with a misconfiguration somewhere.

The complete log is as follows:

`mkdir -p ./build_dir.hw.xilinx_vck190_base_202220_1
v++ -l -t hw --platform /s3/Xilinx_Vitis_2022.2/Vitis/2022.2/base_platforms/xilinx_vck190_base_202220_1/xilinx_vck190_base_202220_1.xpfm --save-temps --optimize 2 --hls.jobs 8 --config ./conn.cfg --clock.defaultFreqHz 220000000 --temp_dir ./build_dir.hw.xilinx_vck190_base_202220_1 --vivado.synth.jobs 8 --vivado.impl.jobs 8 -o'build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin' _x.hw.xilinx_vck190_base_202220_1/dma.xo libadf.a | tee ./build_dir.hw.xilinx_vck190_base_202220_1/hpc_xclbin.log
Option Map File Used: '/s3/Xilinx_Vitis_2022.2/Vitis/2022.2/data/vitis/vpp/optMap.xml'

****** v++ v2022.2 (64-bit)
**** SW Build 3671529 on 2022-10-13-17:52:11
** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.

objcopy: warning: /home/[myusername]/xxxx/CHARM/prj_try/.Xil/v++-37415-[myusername]/hw.o: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
objcopy: warning: /home/[myusername]/xxxx/CHARM/prj_try/.Xil/v++-37415-[myusername]/hw.o: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
objcopy: warning: /home/[myusername]/xxxx/CHARM/prj_try/.Xil/v++-37415-[myusername]/sw.o: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
objcopy: warning: /home/[myusername]/xxxx/CHARM/prj_try/.Xil/v++-37415-[myusername]/sw.o: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
INFO: [v++ 60-1306] Additional information associated with this v++ link can be found at:
Reports: /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/reports/link
Log files: /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/logs/link
Running Dispatch Server on port: 37909
INFO: [v++ 60-1548] Creating build summary session with primary output /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin.link_summary, at Fri Jul 21 16:59:57 2023
INFO: [v++ 60-1315] Creating rulecheck session with output '/home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/reports/link/v++_link_hw_guidance.html', at Fri Jul 21 16:59:57 2023
INFO: [v++ 60-895] Target platform: /s3/Xilinx_Vitis_2022.2/Vitis/2022.2/base_platforms/xilinx_vck190_base_202220_1/xilinx_vck190_base_202220_1.xpfm
INFO: [v++ 60-1578] This platform contains Xilinx Shell Archive '/s3/Xilinx_Vitis_2022.2/Vitis/2022.2/base_platforms/xilinx_vck190_base_202220_1/hw/hw.xsa'
ERROR: [v++ 82-4223] Output file type of .xsa is required. A different output file type has been specified: build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin
INFO: [v++ 60-1653] Closing dispatch client.
v++ -p -t hw -f /s3/Xilinx_Vitis_2022.2/Vitis/2022.2/base_platforms/xilinx_vck190_base_202220_1/xilinx_vck190_base_202220_1.xpfm
--package.out_dir ./package.hw
--package.rootfs /opt/xilinx-versal-common-v2022.2/rootfs.ext4
--package.kernel_image /opt/xilinx-versal-common-v2022.2/Image
--package.boot_mode=sd
--package.image_format=ext4
--package.defer_aie_run
--package.sd_file hostexe
libadf.a ./build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin -o mm_hw.xclbin
Option Map File Used: '/s3/Xilinx_Vitis_2022.2/Vitis/2022.2/data/vitis/vpp/optMap.xml'

****** v++ v2022.2 (64-bit)
**** SW Build 3671529 on 2022-10-13-17:52:11
** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.

ERROR: [v++ 60-602] Source file does not exist: /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin
INFO: [v++ 60-1662] Stopping dispatch session having empty uuid.
INFO: [v++ 60-1653] Closing dispatch client.
make: *** [Makefile:185: package_hw] Error 1`
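
In the log above, the 60-602 error from the package step appears after an earlier 82-4223 error from the link step, so listing the ERROR lines in order can help locate the first failure. A minimal Python sketch, assuming the log is available as text (the sample below copies the two ERROR lines from the log; the function name is illustrative):

```python
import re

# Matches lines such as: ERROR: [v++ 60-602] Source file does not exist: ...
ERROR_LINE = re.compile(r"^ERROR: \[([^\]]+)\] (.*)$", re.MULTILINE)


def list_errors(log_text):
    """Return (message_id, message) pairs in the order they appear in the log."""
    return ERROR_LINE.findall(log_text)


sample_log = """\
INFO: [v++ 60-895] Target platform: /s3/Xilinx_Vitis_2022.2/Vitis/2022.2/base_platforms/xilinx_vck190_base_202220_1/xilinx_vck190_base_202220_1.xpfm
ERROR: [v++ 82-4223] Output file type of .xsa is required. A different output file type has been specified: build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin
INFO: [v++ 60-1653] Closing dispatch client.
ERROR: [v++ 60-602] Source file does not exist: /home/[myusername]/xxxx/CHARM/prj_try/build_dir.hw.xilinx_vck190_base_202220_1/hw.xclbin
"""

errors = list_errors(sample_log)
first_id, first_message = errors[0]
print(f"first error: {first_id}: {first_message}")
```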

Thanks
Yiou

non-GEMM operations

Hi,

In the paper, there are evaluations on non-GEMM operations such as softmax, layernorm, etc. However, I think the current release does not include them. I am wondering if you have an implementation or any references.

About MM size in paper and demo

Could you please describe the input sizes of the original models, especially for ViT and BERT?
Also, how do you organize the matrices to form the MxK and KxN matrices shown in Table 5?

How to profile DDR bandwidth?

Hi,
I have seen that there is a Bandwidth Profiler before the DSE in Fig. 6 of the CHARM paper.
I want to reproduce it on my VCK5000, since it has 4 DRAM banks.
However, in cdse.py the bandwidth values are fixed. How did you profile the bandwidth of the FPGA?

# One-Time Profling of DDR Bandwidth
    BW_L_S=(12*DDR_BANK)*freq_rate
    BW_R_S=(12*DDR_BANK)*freq_rate
    BW_O_S=(8.5*DDR_BANK)*freq_rate

    BW_L_DR=(8*DDR_BANK)*freq_rate
    BW_R_DL=(8*DDR_BANK)*freq_rate

    BW_L_DO=(8*DDR_BANK)*freq_rate
    BW_R_DO=(8*DDR_BANK)*freq_rate
    BW_O_D=(6*DDR_BANK)*freq_rate

    BW_L_T=(7*DDR_BANK)*freq_rate
    BW_R_T=(7*DDR_BANK)*freq_rate
    BW_O_T=(7*DDR_BANK)*freq_rate

Thank you.
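
To make the structure of the excerpt above explicit: every value is a fixed per-bank coefficient multiplied by DDR_BANK and freq_rate. The Python sketch below collects those coefficients in a table so that measured values could be substituted for a different board. The function name and the example arguments (4 banks, freq_rate of 1.0) are hypothetical, the units are not stated in the excerpt, and how the original coefficients were measured is not covered here.

```python
# Per-bank coefficients copied from the cdse.py excerpt above.
COEFFS = {
    "BW_L_S": 12, "BW_R_S": 12, "BW_O_S": 8.5,
    "BW_L_DR": 8, "BW_R_DL": 8,
    "BW_L_DO": 8, "BW_R_DO": 8, "BW_O_D": 6,
    "BW_L_T": 7, "BW_R_T": 7, "BW_O_T": 7,
}


def bandwidth_table(ddr_bank, freq_rate, coeffs=COEFFS):
    """Scale each per-bank coefficient the same way the excerpt does."""
    return {name: c * ddr_bank * freq_rate for name, c in coeffs.items()}


# Hypothetical example: a board with 4 DDR banks and freq_rate = 1.0.
table = bandwidth_table(ddr_bank=4, freq_rate=1.0)
for name, value in table.items():
    print(f"{name} = {value}")
```

Passing a different `coeffs` dictionary is one way to plug in numbers profiled on another platform without editing the formulas themselves.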

[AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_0 exceeds

Dear ARC Lab,
Thanks for releasing the code as open source and for the nice work. I tried replicating the flow as per the instructions and got the following error when executing make all PLATFORM=${PATH} EDGE_COMMON_SW_PATH=${PATH} SYSROOT_PATH={PATH}
Just checking if there is any quick hint/resolution that you are aware of?

Regards,
Rajesh

Total Number of unique Switch FIFOs: 0
Running AIE Post-Map Finalizer.
Post-Map Finalizer succeeded.
Ordered merge post process: Adding BD config broadcast nets
Running AIE Post-Map Finalizer.
Post-Map Finalizer succeeded.
Check AIE-ROUTER has run: 12 errors
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_0 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_4 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_0_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_4_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_SWITCH_S_SOUTH_CH0_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_SWITCH_S_SOUTH_CH4_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_PL_TO_AIE_0 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_PL_TO_AIE_4 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_PL_TO_AIE_0_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_PL_TO_AIE_4_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_SWITCH_S_SOUTH_CH0_PIN exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_SWITCH_S_SOUTH_CH4_PIN exceeds limit of 100.
HDRTPinDelayHelperV10:: Clearing pin delay helper
Releasing pin delay helper for floorplan 0x6661ea70
HDRTPinDelayHelperV10:: Releasing Pin Dly Helper
NodeGraph Released.
DeviceData Released.
ERROR:MathEngineUDMRouter:###UDM Router Did NOT Finish Successfully
/tools/Xilinx/Vitis/2022.2/aietools/bin/aieir_be: line 98: kill: (-871611) - No such process
Compilation Failed
INFO: [aiecompiler 77-5805] Run completed. Additional information can be found in:
Guidance: ./Work/reports/guidance.html

INFO: [aiecompiler 77-5806] Use the vitis_analyzer tool to visualize and navigate the relevant reports. Run the following command.
vitis_analyzer ./Work/mm_top.aiecompile_summary
/tools/Xilinx/Vitis/2022.2/aietools/bin/aiecompiler: line 83: kill: (-871297) - No such process
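
The twelve AIE-ROUTER-3 messages above all report a demand of 200 against a limit of 100, and they fall on two interface tiles. A short Python sketch that groups such messages by tile makes this easier to see in longer logs. The sample holds three lines copied from the log; the function name is illustrative.

```python
import re
from collections import defaultdict

ROUTER_ERROR = re.compile(
    r"ERROR: \[AIE-ROUTER-3\] Demand of (\d+) at node (\w+)/(\w+) "
    r"exceeds limit of (\d+)\."
)


def overloaded_nodes(log_text):
    """Map each tile name to the list of its nodes whose demand exceeds the limit."""
    by_tile = defaultdict(list)
    for demand, tile, node, limit in ROUTER_ERROR.findall(log_text):
        if int(demand) > int(limit):
            by_tile[tile].append(node)
    return dict(by_tile)


sample_log = """\
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_0 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B3_CORE_X24Y0/AIE_PL_TO_AIE_4 exceeds limit of 100.
ERROR: [AIE-ROUTER-3] Demand of 200 at node AIE_INTF_B0_CORE_X25Y0/AIE_PL_TO_AIE_0 exceeds limit of 100.
"""

tiles = overloaded_nodes(sample_log)
for tile, nodes in tiles.items():
    print(tile, nodes)
```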

VCK190 Simulator?

Hi @JinmingZhuang
Great work! I work in machine learning, but I am a beginner with FPGAs and am learning about implementing machine learning on them. Your work seems recent and looks like a good starting point. FPGA boards, and the VCK190 in particular, are not easily available to everyone. For learning purposes, could you make a tutorial or give some pointers for beginners like me on how to run your code in a Xilinx simulator on Linux/Ubuntu? I am also not sure whether such a simulator exists. Please let me know.

Thank you!
Srikanth
