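Reference: https://www.bilibili.com/video/BV17J41197rt
HLS data types
Include the header below. It implements arbitrary-precision fixed-point numbers and, as the code in this article relies on, also makes the arbitrary-precision integer type ap_int available.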
#include "ap_fixed.h"
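Circuit designs often need a data type with a non-standard width, for example a 10-bit integer, which is written ap_int<10>.
The table below lists standard C data types on the left and the HLS types of the same width on the right. The 64-bit row uses long long because long is only 32 bits wide on Windows.
C data type | HLS data type |
---|---|
char | ap_int<8> |
short | ap_int<16> |
int | ap_int<32> |
long long | ap_int<64> |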
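To make the meaning of a 10-bit signed type concrete, here is a minimal sketch in plain C++ that emulates how a value wraps when it is stored in an N-bit signed integer. It does not use the Xilinx headers, and `wrap_to_signed` is a hypothetical helper written for this illustration, not part of the HLS library. It assumes the default wrap-around behaviour of ap_int.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reduce `value` to a signed two's-complement
// integer that is `bits` wide, wrapping on overflow.
std::int64_t wrap_to_signed(std::int64_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 63);
    const std::uint64_t modulus = std::uint64_t{1} << bits;  // 2^bits
    // Keep only the low `bits` bits of the value.
    const std::uint64_t low = static_cast<std::uint64_t>(value) & (modulus - 1);
    const std::uint64_t sign_bit = modulus >> 1;
    if (low & sign_bit) {
        // Top bit set: the stored pattern represents a negative number.
        return static_cast<std::int64_t>(low) - static_cast<std::int64_t>(modulus);
    }
    return static_cast<std::int64_t>(low);
}
```

With 10 bits the representable range is -512 to 511, so `wrap_to_signed(511, 10)` stays 511 while `wrap_to_signed(512, 10)` wraps to -512.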
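Creating a new HLS project
Create a new project.
Name the project, choose where to store it, and click Next.
Add the design files and specify the top-level function. A brand-new project has no design files yet, so leave everything empty and click Next.
Add the C simulation files. If there are none, skip this step and click Next.
Do not set the clock period (in ns) too small, otherwise synthesis may fail. Uncertainty can be left empty, which is the default.
In the FPGA part selection I chose the PYNQ-Z2 board. Vivado does not ship with this board, so the board files have to be downloaded and copied into the designated folder (see "Manually adding a PYNQ board in Vivado").
Click Finish to complete the project. The source folder holds the design files and the testbench folder holds the test files. See the matrix multiplication example below.
The clock period can also be changed later.
Right-click the solution and choose "Solution Settings", click the item marked 2 in the figure below, and then change the clock period at the item marked 3.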
Matrix multiplication example
1. Create the project
File structure
2. Write the code
matrix_mul.h
#ifndef __MATRIX_MUL__
#define __MATRIX_MUL__
#include "ap_fixed.h"
void matrix_mul(ap_int<8> A[4][4],ap_int<8> B[4][4],ap_int<16> C[4][4]);
#endif
matrix_mul.cpp
#include "matrix_mul.h"
void matrix_mul(ap_int<8> A[4][4],ap_int<8> B[4][4],ap_int<16> C[4][4])
{
    for(int i=0;i<4;i++)
    {
        for(int j=0;j<4;j++)// fix one row, then loop over the columns
        {
            C[i][j]=0;
            for(int k=0;k<4;k++)
            {
                C[i][j]=C[i][j]+A[i][k]*B[k][j];
            }
        }
    }
}
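A quick check on the result width may be useful here. The sketch below, in plain C++ without the Xilinx headers, works out the worst-case value of one output element when both operands are 8-bit signed, and how many signed bits that value needs. The helper names are hypothetical. The point is only that the extreme case (every operand equal to -128) gives 4 x 16384 = 65536, which exceeds the 16-bit signed maximum of 32767. The testbench in this article uses inputs from 0 to 15, whose largest result is 506, so the example itself is not affected.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest signed two's-complement width that holds v.
int signed_bits_needed(std::int64_t v) {
    int bits = 1;
    // A signed N-bit integer covers [-2^(N-1), 2^(N-1) - 1].
    while (v < -(std::int64_t{1} << (bits - 1)) ||
           v > (std::int64_t{1} << (bits - 1)) - 1) {
        ++bits;
        assert(bits < 63);  // guard against shifting past the type width
    }
    return bits;
}

// Worst-case sum of four products of 8-bit signed operands:
// (-128) * (-128) is the largest single product.
std::int64_t worst_case_dot4() {
    const std::int64_t operand_min = -128;
    return 4 * (operand_min * operand_min);
}
```

If inputs can span the full 8-bit range, a wider accumulator type would avoid wrap-around. For the small test values used here, ap_int<16> is sufficient.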
Testbench file main.cpp
#include "matrix_mul.h"
#include <iostream>
int main()
{
    ap_int<8> A[4][4];ap_int<8> B[4][4];ap_int<16> C[4][4];
    for(int i=0;i<4;i++)
    {
        for(int j=0;j<4;j++)
        {
            A[i][j]=i*4+j;
            B[i][j]=A[i][j];
        }
    }
    matrix_mul(A,B,C);
    for(int i=0;i<4;i++)
        for(int j=0;j<4;j++)
            std::cout<<"C["<<i<<","<<j<<"]"<<C[i][j]<<std::endl;
    return 0;
}
3. Build
Press Ctrl+B to build. If an .exe file is produced, the build succeeded.
Build output:
22:02:50 **** Incremental Build of configuration Debug for project matrix ****
make all
Building file: D:/xxxx/hls/matrix/matrix/main.cpp
Invoking: GCC C++ Compiler
g++ -DAESL_TB -D__llvm__ -D__llvm__ -ID:/xxxx/hls/matrix -ID:/Xilinx201901/Vivado/2019.1/win64/tools/systemc/include -ID:/Xilinx201901/Vivado/2019.1/win64/tools/auto_cc/include -ID:/Xilinx201901/Vivado/2019.1/include/etc -ID:/Xilinx201901/Vivado/2019.1/include -ID:/Xilinx201901/Vivado/2019.1/include/ap_sysc -O0 -g3 -Wall -c -fmessage-length=0 -MMD -MP -MF"testbench/main.d" -MT"testbench/main.o" -o "testbench/main.o" "D:/10GraduationProject/hls/matrix/matrix/main.cpp"
Finished building: D:/xxxx/hls/matrix/matrix/main.cpp
Building target: a.exe
Invoking: GCC C++ Linker
g++ -LD:/Xilinx201901/Vivado/2019.1/win64/tools/systemc/lib -o "a.exe" ./testbench/main.o ./source/matrix_mul.o
Finished building target: a.exe
22:02:53 Build Finished (took 3s.81ms)
4. Run C Simulation
Click Run C Simulation.
Click OK.
Simulation output:
Starting C synthesis ...
Starting C simulation ...
D:/Xilinx201901/Vivado/2019.1/bin/vivado_hls.bat D:/10GraduationProject/hls/matrix/matrix/solution1/csim.tcl
INFO: [HLS 200-10] Running 'D:/Xilinx201901/Vivado/2019.1/bin/unwrapped/win64.o/vivado_hls.exe'
INFO: [HLS 200-10] For user 'summer' on host 'desktop-dck1feh' (Windows NT_amd64 version 6.2) on Thu Mar 11 21:58:01 +0800 2021
INFO: [HLS 200-10] In directory 'D:/xxxx/hls/matrix'
Sourcing Tcl script 'D:/xxxx/hls/matrix/matrix/solution1/csim.tcl'
INFO: [HLS 200-10] Opening project 'D:/xxxx/hls/matrix/matrix'.
INFO: [HLS 200-10] Opening solution 'D:/xxxx/hls/matrix/matrix/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 30ns.
INFO: [HLS 200-10] Setting target device to 'xc7z020-clg400-1'
INFO: [SIM 211-2] *************** CSIM start ***************
INFO: [SIM 211-4] CSIM will launch GCC as the compiler.
Compiling ../../../main.cpp in debug mode
Compiling ../../../matrix_mul.cpp in debug mode
Generating csim.exe
In file included from D:/Xilinx201901/Vivado/2019.1/include/floating_point_v7_0_bitacc_cmodel.h:143:0,
from D:/Xilinx201901/Vivado/2019.1/include/hls_fpo.h:186,
from D:/Xilinx201901/Vivado/2019.1/include/hls_half.h:44,
from D:/Xilinx201901/Vivado/2019.1/include/etc/ap_private.h:90,
from D:/Xilinx201901/Vivado/2019.1/include/ap_common.h:641,
from D:/Xilinx201901/Vivado/2019.1/include/ap_fixed.h:54,
from ../../../matrix_mul.h:2,
from ../../../main.cpp:1:
D:/Xilinx201901/Vivado/2019.1/include/gmp.h:62:0: warning: "__GMP_LIBGMP_DLL" redefined
#define __GMP_LIBGMP_DLL 0
In file included from D:/Xilinx201901/Vivado/2019.1/include/hls_fpo.h:186:0,
from D:/Xilinx201901/Vivado/2019.1/include/hls_half.h:44,
from D:/Xilinx201901/Vivado/2019.1/include/etc/ap_private.h:90,
from D:/Xilinx201901/Vivado/2019.1/include/ap_common.h:641,
from D:/Xilinx201901/Vivado/2019.1/include/ap_fixed.h:54,
from ../../../matrix_mul.h:2,
from ../../../main.cpp:1:
D:/Xilinx201901/Vivado/2019.1/include/floating_point_v7_0_bitacc_cmodel.h:135:0: note: this is the location of the previous definition
#define __GMP_LIBGMP_DLL 1
In file included from D:/Xilinx201901/Vivado/2019.1/include/floating_point_v7_0_bitacc_cmodel.h:143:0,
from D:/Xilinx201901/Vivado/2019.1/include/hls_fpo.h:186,
from D:/Xilinx201901/Vivado/2019.1/include/hls_half.h:44,
from D:/Xilinx201901/Vivado/2019.1/include/etc/ap_private.h:90,
from D:/Xilinx201901/Vivado/2019.1/include/ap_common.h:641,
from D:/Xilinx201901/Vivado/2019.1/include/ap_fixed.h:54,
from ../../../matrix_mul.h:2,
from ../../../matrix_mul.cpp:1:
D:/Xilinx201901/Vivado/2019.1/include/gmp.h:62:0: warning: "__GMP_LIBGMP_DLL" redefined
#define __GMP_LIBGMP_DLL 0
In file included from D:/Xilinx201901/Vivado/2019.1/include/hls_fpo.h:186:0,
from D:/Xilinx201901/Vivado/2019.1/include/hls_half.h:44,
from D:/Xilinx201901/Vivado/2019.1/include/etc/ap_private.h:90,
from D:/Xilinx201901/Vivado/2019.1/include/ap_common.h:641,
from D:/Xilinx201901/Vivado/2019.1/include/ap_fixed.h:54,
from ../../../matrix_mul.h:2,
from ../../../matrix_mul.cpp:1:
D:/Xilinx201901/Vivado/2019.1/include/floating_point_v7_0_bitacc_cmodel.h:135:0: note: this is the location of the previous definition
#define __GMP_LIBGMP_DLL 1
C[0,0]56
C[0,1]62
C[0,2]68
C[0,3]74
C[1,0]152
C[1,1]174
C[1,2]196
C[1,3]218
C[2,0]248
C[2,1]286
C[2,2]324
C[2,3]362
C[3,0]344
C[3,1]398
C[3,2]452
C[3,3]506
INFO: [SIM 211-1] CSim done with 0 errors.
INFO: [SIM 211-3] *************** CSIM finish ***************
Finished C simulation.
The output matches the expected result.
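One way to obtain the expected values independently is a plain-int reference model, sketched below. It uses the same input pattern as the testbench (A[i][j] = i*4 + j, B = A) but ordinary int arrays, so it builds with any C++ compiler and without the HLS headers. The function names are hypothetical and chosen for this illustration.

```cpp
#include <cassert>

const int N = 4;

// Same input pattern as the testbench: A[i][j] = i*4 + j, and B equals A.
void fill_inputs(int A[N][N], int B[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            A[i][j] = i * 4 + j;
            B[i][j] = A[i][j];
        }
    }
}

// Reference multiply with plain int arithmetic (no bit-width limits).
void reference_matrix_mul(int A[N][N], int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++) {
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}

// Compare a few reference values with the simulation log above.
void check_against_log() {
    int A[N][N], B[N][N], C[N][N];
    fill_inputs(A, B);
    reference_matrix_mul(A, B, C);
    assert(C[0][0] == 56);
    assert(C[1][2] == 196);
    assert(C[3][3] == 506);
}
```

For example, C[0][0] = 0*0 + 1*4 + 2*8 + 3*12 = 56, which agrees with the first line of the simulation output.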
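5. C Synthesis
Synthesis converts the C/C++ module into RTL (the log below shows both VHDL and Verilog being generated), which describes the corresponding hardware circuit.
Click C Synthesis.
The first run reports an error; click OK.
The cause is that no top-level function has been specified. If matrix_mul.cpp contained many functions, the tool would have no way of knowing which one is the top level. The top-level function must be specified even when there is only one function.
Selecting the top-level function
Select the project (item 1, matrix), right-click, and choose item 2.
Then select item 1 in the figure below and choose the top-level function at item 2.
There is only one function here, so choose matrix_mul (matrix_mul.cpp) and click OK.
Click C Synthesis again.
Synthesis output: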
Starting C synthesis ...
Starting C synthesis ...
D:/Xilinx201901/Vivado/2019.1/bin/vivado_hls.bat D:/xxxx/hls/matrix/matrix/solution1/csynth.tcl
INFO: [HLS 200-10] Running 'D:/Xilinx201901/Vivado/2019.1/bin/unwrapped/win64.o/vivado_hls.exe'
INFO: [HLS 200-10] For user 'summer' on host 'desktop-dck1feh' (Windows NT_amd64 version 6.2) on Thu Mar 11 22:40:08 +0800 2021
INFO: [HLS 200-10] In directory 'D:/xxxx/hls/matrix'
Sourcing Tcl script 'D:/10GraduationProject/hls/matrix/matrix/solution1/csynth.tcl'
INFO: [HLS 200-10] Opening project 'D:/xxxx/hls/matrix/matrix'.
INFO: [HLS 200-10] Adding design file 'matrix/matrix_mul.cpp' to the project
INFO: [HLS 200-10] Adding design file 'matrix/matrix_mul.h' to the project
INFO: [HLS 200-10] Adding test bench file 'matrix/main.cpp' to the project
INFO: [HLS 200-10] Opening solution 'D:/xxxx/hls/matrix/matrix/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 30ns.
INFO: [HLS 200-10] Setting target device to 'xc7z020-clg400-1'
INFO: [SYN 201-201] Setting up clock 'default' with a period of 30ns.
INFO: [SCHED 204-61] Option 'relax_ii_for_timing' is enabled, will increase II to preserve clock frequency constraints.
INFO: [HLS 200-10] Analyzing design file 'matrix/matrix_mul.cpp' ...
INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:00:02 ; elapsed = 00:00:25 . Memory (MB): peak = 178.863 ; gain = 87.129
INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:00:02 ; elapsed = 00:00:25 . Memory (MB): peak = 178.863 ; gain = 87.129
INFO: [HLS 200-10] Starting code transformations ...
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:00:02 ; elapsed = 00:00:26 . Memory (MB): peak = 178.863 ; gain = 87.129
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 00:00:02 ; elapsed = 00:00:26 . Memory (MB): peak = 178.863 ; gain = 87.129
INFO: [HLS 200-111] Finished Pre-synthesis Time (s): cpu = 00:00:03 ; elapsed = 00:00:26 . Memory (MB): peak = 178.863 ; gain = 87.129
INFO: [HLS 200-111] Finished Architecture Synthesis Time (s): cpu = 00:00:03 ; elapsed = 00:00:26 . Memory (MB): peak = 178.863 ; gain = 87.129
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'matrix_mul' ...
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-42] -- Implementing module 'matrix_mul'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111] Elapsed time: 26.189 seconds; current allocated memory: 100.559 MB.
INFO: [HLS 200-434] Only 0 loops out of a total 3 loops have been pipelined in this design.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111] Elapsed time: 0.106 seconds; current allocated memory: 100.851 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'matrix_mul'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/A_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/B_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'matrix_mul/C_V' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on function 'matrix_mul' to 'ap_ctrl_hs'.
INFO: [SYN 201-210] Renamed object name 'matrix_mul_mac_muladd_8s_8s_16ns_16_1_1' to 'matrix_mul_mac_mubkb' due to the length limit 20
INFO: [RTGEN 206-100] Generating core module 'matrix_mul_mac_mubkb': 1 instance(s).
INFO: [RTGEN 206-100] Finished creating RTL model for 'matrix_mul'.
INFO: [HLS 200-111] Elapsed time: 0.132 seconds; current allocated memory: 101.159 MB.
INFO: [HLS 200-111] Finished generating all RTL models Time (s): cpu = 00:00:04 ; elapsed = 00:00:28 . Memory (MB): peak = 178.863 ; gain = 87.129
INFO: [VHDL 208-304] Generating VHDL RTL for matrix_mul.
INFO: [VLOG 209-307] Generating Verilog RTL for matrix_mul.
INFO: [HLS 200-112] Total elapsed time: 27.841 seconds; peak allocated memory: 101.159 MB.
Finished C synthesis.
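Analysis of the synthesis results
Latency is the number of clock cycles one matrix multiplication takes from start to finish. Interval is the number of clock cycles between the start of one matrix multiplication and the point at which the next one can start.
Without pipelining, the two are effectively the same, because the next matrix multiplication can begin only after the current one has finished.
Resource utilization: this design uses very few resources.
Port information: the arrays A, B, and C default to the ap_memory interface, meaning the data is expected to sit in a memory outside the block. ap_clk and ap_rst are the clock and reset, and ap_start, ap_done, ap_idle, and ap_ready are the block-level control (handshake) signals.
The presence of both A_V_ce0 and A_V_ce1 shows that A is accessed through a dual-port memory interface.
6. Optimization
The design performs 4x4x4 = 64 multiplications in total, but the figure below shows that each multiplication takes close to three clock cycles, so the design needs to be optimized.
See https://blog.csdn.net/weixin_40162095/article/details/114695910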
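As a rough idea of what a first optimization step might look like, the sketch below adds a pipeline directive to the middle loop. This is an assumption made for illustration and is not taken from the linked article. `#pragma HLS PIPELINE` is a Vivado HLS directive that an ordinary C++ compiler ignores (possibly with a warning), and pipelining a loop asks the tool to unroll the loops nested inside it. The types `in_t` and `out_t` are stand-ins for ap_int<8> and ap_int<16> so that the code builds without the Xilinx headers. Whether the requested II=1 is actually reached depends on the number of memory ports available for A and B, which is not checked here.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins (assumption) so the sketch builds without the Xilinx headers;
// in the HLS project these would be ap_int<8> and ap_int<16>.
typedef std::int8_t  in_t;
typedef std::int16_t out_t;

void matrix_mul_pipelined(in_t A[4][4], in_t B[4][4], out_t C[4][4]) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
#pragma HLS PIPELINE II=1
            // Local accumulator avoids a read-modify-write on C in every step.
            int acc = 0;
            for (int k = 0; k < 4; k++) {
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = static_cast<out_t>(acc);
        }
    }
}

// Host-side check only; this function is not part of the synthesized design.
void self_test() {
    in_t A[4][4], B[4][4];
    out_t C[4][4];
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            A[i][j] = static_cast<in_t>(i * 4 + j);
            B[i][j] = A[i][j];
        }
    }
    matrix_mul_pipelined(A, B, C);
    assert(C[0][0] == 56);
    assert(C[3][3] == 506);
}
```

After a change like this, the same testbench can be rerun to confirm the results are unchanged, and the synthesis report compared against the unpipelined latency.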