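Design and implementation of configurable CNN accelerator based on SoC

Zhang Liguo; Yang Hongguang; Jin Mei; Shen Qian

Article abstract

Zhang Liguo, Yang Hongguang, Jin Mei, Shen Qian. Design and implementation of configurable CNN accelerator based on SoC [J]. 高技术通讯 (Chinese High Technology Letters), 2024, 34(7): 744-754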

Design and implementation of configurable CNN accelerator based on SoC

DOI: 10.3772/j.issn.1002-0470.2024.07.008

Keywords: convolutional neural network (CNN); field programmable gate array (FPGA); CNN accelerator; configurable; heterogeneous acceleration

Author	Affiliation
Zhang Liguo	(School of Electrical Engineering, Yanshan University, Qinhuangdao 066004)
Yang Hongguang
Jin Mei
Shen Qian

Abstract:

Existing convolutional neural network (CNN) accelerator designs can typically be deployed only on a single field programmable gate array (FPGA) platform and do not carry over when the hardware platform is upgraded. To address this, a configurable CNN accelerator based on a system on chip (SoC) is designed. The accelerator has two characteristics. (1) In the circuit design, the data bit width, the intermediate buffer size, and the parallelism of the multiply-accumulate (MAC) array are exposed as optional configuration parameters; adjusting resource usage through these parameters lets the accelerator fit different FPGA hardware. (2) A dynamic data reuse strategy is proposed: the total parameter volume transferred under each reuse method is compared, and the reuse method is selected dynamically, which reduces the time spent waiting for data transfers and improves MAC array utilization. The scheme was tested on the ZCU104 board. With a data bit width of 8, a MAC array parallelism of 1024, and the core computation module running at 180 MHz, the convolution array reaches a peak throughput of 180 GOPs at a power consumption of 3.75 W, for an energy efficiency of 47.97 GOPs·W⁻¹. For the VGG16 network, the average MAC array utilization across the convolutional layers reaches 84.37%.
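The abstract does not state which reuse methods are compared or how the transferred volume is counted, so the following is only a minimal sketch of the general idea under assumed definitions. It models two hypothetical modes, `weight_reuse` (hold a tile of weights on chip and stream the whole input feature map past it) and `input_reuse` (hold a tile of the input and stream all weights past it), and picks whichever moves fewer bytes for a given layer. The names `AcceleratorConfig`, `ConvLayer`, `transfer_bytes`, and `choose_reuse`, the 256 KiB buffer size, and the traffic model itself (output write-back and tile halos are ignored) are illustrative assumptions, not the authors' design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceleratorConfig:
    """Hypothetical build-time parameters, mirroring the three the abstract lists."""
    data_bits: int = 8               # data bit width
    buffer_bytes: int = 256 * 1024   # intermediate buffer size (assumed value)
    mac_parallelism: int = 1024      # multipliers working in parallel


@dataclass(frozen=True)
class ConvLayer:
    in_h: int
    in_w: int
    in_ch: int
    out_ch: int
    kernel: int = 3


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def transfer_bytes(layer: ConvLayer, cfg: AcceleratorConfig) -> dict:
    """Estimate off-chip traffic for each assumed reuse mode (simplified model)."""
    elem = ceil_div(cfg.data_bits, 8)  # bytes per element
    input_bytes = layer.in_h * layer.in_w * layer.in_ch * elem
    weight_bytes = layer.kernel ** 2 * layer.in_ch * layer.out_ch * elem
    # Number of tiles needed when the held operand does not fit in the buffer.
    weight_tiles = ceil_div(weight_bytes, cfg.buffer_bytes)
    input_tiles = ceil_div(input_bytes, cfg.buffer_bytes)
    return {
        # Weights are read once; the input is re-streamed for every weight tile.
        "weight_reuse": weight_bytes + weight_tiles * input_bytes,
        # The input is read once; weights are re-streamed for every input tile.
        "input_reuse": input_bytes + input_tiles * weight_bytes,
    }


def choose_reuse(layer: ConvLayer, cfg: AcceleratorConfig) -> str:
    """Pick the mode with the smaller estimated traffic for this layer."""
    costs = transfer_bytes(layer, cfg)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    cfg = AcceleratorConfig()
    # Two VGG16 convolution layers: an early one (large feature map, few
    # weights) and a late one (small feature map, many weights).
    conv1_2 = ConvLayer(in_h=224, in_w=224, in_ch=64, out_ch=64)
    conv5_1 = ConvLayer(in_h=14, in_w=14, in_ch=512, out_ch=512)
    for name, layer in [("conv1_2", conv1_2), ("conv5_1", conv5_1)]:
        print(name, choose_reuse(layer, cfg), transfer_bytes(layer, cfg))
```

Under these assumptions the early layer favors `weight_reuse` and the late layer favors `input_reuse`, which illustrates why a per-layer choice can move less data than one fixed reuse scheme. Changing `data_bits` or `buffer_bytes` in `AcceleratorConfig` shifts the crossover point, loosely mirroring how the configurable parameters and the reuse decision would interact.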
