Article Abstract
Liu Dongyue (刘东岳)*, Jiang Lin**, Wang Mei* **, Li Yuancheng***, Hao Juan****. [J]. High Technology Letters, 2026, 32(1): 39-48
Adaptive implementation of multi-branch convolution with fusion coefficients based on reconfigurable array
  
DOI: 10.3772/j.issn.1006-6748.2026.01.005
Keywords: reconfigurable array processor, structural re-parameterization, model compression, fusion coefficients, edge-side inference acceleration, hardware-software co-optimization
Affiliations:
* School of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710699, P. R. China
** School of Information Science and Technology, Northwest University, Xi’an 710069, P. R. China
*** College of Artificial Intelligence and Computer Science, Xi’an University of Science and Technology, Xi’an 710699, P. R. China
**** College of Communication and Information Technology, Xi’an University of Science and Technology, Xi’an 710699, P. R. China
Abstract:
Reconfigurable array architectures have become an important hardware platform for edge-side deployment of convolutional neural networks owing to their high parallelism and flexible programmability. However, traditional multi-branch convolutional networks suffer from computational redundancy, high memory access overhead, and inefficient branch fusion. This paper therefore proposes an adaptive multi-branch convolution (AMBC) module that integrates software-hardware co-optimization. During training, learnable fusion coefficients are introduced to enable adaptive fusion of multi-scale features; in the inference phase, the multiple branches and their normalization parameters are merged, together with the fusion coefficients, into a single 3×3 convolutional kernel through operator fusion. On the SIREA-288 reconfigurable platform, compared with the unoptimized multi-branch network, AMBC reduces external memory accesses by 47.91% and inference latency by 47.20%, achieving a 1.90× speedup. The approach maximizes utilization of the reconfigurable logic while minimizing both reconfiguration and data-movement overheads in edge inference.
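To illustrate the inference-time fusion described above, the following is a minimal PyTorch-style sketch of how several convolution branches, their batch-normalization statistics, and learnable fusion coefficients can be folded into one 3×3 kernel. The branch layout (3×3 and 1×1 convolutions), the helper names fold_bn and fuse_branches, and the per-branch scalar coefficients are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def fold_bn(weight, gamma, beta, mean, var, eps=1e-5):
    # Fold inference-time BatchNorm into the preceding convolution:
    # w' = w * gamma / sqrt(var + eps),  b' = beta - gamma * mean / sqrt(var + eps)
    std = torch.sqrt(var + eps)
    scale = gamma / std                           # per-output-channel scale
    return weight * scale.reshape(-1, 1, 1, 1), beta - mean * scale

def fuse_branches(branches, alphas):
    # branches: list of (weight, gamma, beta, mean, var) tuples, one per conv+BN branch
    # alphas:   learned fusion coefficients, one scalar per branch (assumed form)
    total_w, total_b = 0.0, 0.0
    for (w, g, b, mu, var), a in zip(branches, alphas):
        fw, fb = fold_bn(w, g, b, mu, var)
        if fw.shape[-1] == 1:                     # zero-pad a 1x1 kernel to 3x3
            fw = F.pad(fw, [1, 1, 1, 1])
        total_w = total_w + a * fw                # coefficient weights the branch
        total_b = total_b + a * fb
    return total_w, total_b                       # single 3x3 kernel and bias

Because convolution and inference-time batch normalization are both affine, the weighted sum of branch outputs equals one convolution with the summed kernel, so the fused result can be applied as F.conv2d(x, total_w, total_b, padding=1) with no change to the network's output.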