Relation network inference optimization method based on software and hardware co-acceleration

Zhang Zhichao; Wang Jian; Zhang Longbing; Xiao Junhua

Article abstract

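Zhang Zhichao, Wang Jian, Zhang Longbing, Xiao Junhua. Relation network inference optimization method based on software and hardware co-acceleration [J]. Chinese High Technology Letters, 2022, 32(4): 327-336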



Keywords: relation network, software and hardware co-acceleration, convolutional neural network, heterogeneous multi-core




Abstract:

To address the low energy efficiency of relation network inference on graphics processing unit (GPU) platforms in data centers, this paper proposes a relation network optimization method based on software and hardware co-acceleration. The method splits relation network inference across heterogeneous devices: a feature pool for the support set is extracted on the GPU, and inference is carried out on a field-programmable gate array (FPGA). This yields energy-efficient computation while keeping inference accuracy the same as on the GPU platform. High-level synthesis (HLS) is used to optimize the computation of the floating-point convolutional neural network, improving the energy efficiency of relation network processing. A heterogeneous multi-core design with multiple computing units meets FPGA timing closure while raising on-chip throughput. A relation network inference unit is implemented on an FPGA platform. The accelerator built for the Omniglot dataset consumes 15.867 W and achieves a speedup of 1.4x to 17.2x over the GPU; the accelerator built for the miniImageNet dataset consumes 12.359 W and achieves a speedup of 1.5x to 3.4x over the GPU. Compared with similar FPGA accelerators for floating-point convolutional neural networks, the proposed method achieves the best computational efficiency. The experimental results show that the method effectively exploits the advantages of software and hardware co-computing and of FPGA reconfigurable computing, reduces the coupling between software and hardware co-development, and improves the computational efficiency of relation network inference while maintaining its accuracy.
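The GPU/FPGA split described in the abstract relies on a property of relation networks: support-set features do not depend on the query, so they can be computed once and cached as a feature pool, leaving only the query embedding and relation scoring on the per-query path. The following is a minimal pure-Python sketch of that data flow under stated assumptions; it is not the authors' implementation. The names `embed`, `build_feature_pool`, `relation_module`, and `classify` are hypothetical. The toy embedding and the distance-based relation score are stand-ins for the trained CNN modules, and averaging K-shot features per class is an assumed pooling choice.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def embed(x: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the CNN embedding module (toy feature map)."""
    # Each input value followed by its square; a trained CNN would go here.
    return list(x) + [v * v for v in x]


def build_feature_pool(support_set: Dict[str, List[Sequence[float]]]) -> Dict[str, Vector]:
    """Offline step (GPU side in the paper's split): one pooled feature per class.

    Assumption: K-shot embeddings are averaged element-wise per class.
    """
    pool: Dict[str, Vector] = {}
    for label, samples in support_set.items():
        feats = [embed(s) for s in samples]
        pool[label] = [sum(col) / len(feats) for col in zip(*feats)]
    return pool


def relation_module(pair: Vector) -> float:
    """Hypothetical stand-in for the relation module.

    Splits the concatenated [support | query] vector in half and maps their
    squared distance to a score in (0, 1]; identical halves score 1.0.
    """
    half = len(pair) // 2
    support, query = pair[:half], pair[half:]
    dist = sum((a - b) ** 2 for a, b in zip(support, query))
    return 1.0 / (1.0 + dist)


def classify(query: Sequence[float], pool: Dict[str, Vector]) -> str:
    """Online step (FPGA side in the paper's split).

    Embeds the query once, then scores it against every cached class feature.
    """
    q = embed(query)
    scores = {label: relation_module(feat + q) for label, feat in pool.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    support = {
        "a": [[0.0, 0.1], [0.1, 0.0]],
        "b": [[1.0, 0.9], [0.9, 1.0]],
    }
    pool = build_feature_pool(support)  # computed once, reused for every query
    print(classify([0.05, 0.05], pool))  # a
    print(classify([0.95, 1.0], pool))   # b
```

Because `build_feature_pool` runs once per support set, the per-query cost is a single `embed` call plus one `relation_module` call per class, which is the part a dedicated accelerator would need to handle at query rate.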
