Article Abstract
Liu Dongyue (刘东岳)*, Jiang Lin**, Wang Mei* **, Li Yuancheng***, Hao Juan****. [J]. High Technology Letters, 2026, 32(1): 39-48
Adaptive implementation of multi-branch convolution with fusion coefficients based on reconfigurable array
  
DOI: 10.3772/j.issn.1006-6748.2026.01.005
Keywords: reconfigurable array processor, structural re-parameterization, model compression, fusion coefficients, edge-side inference acceleration, hardware-software co-optimization
Affiliations:
* School of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710699, P. R. China
** School of Information Science and Technology, Northwest University, Xi’an 710069, P. R. China
*** College of Artificial Intelligence and Computer Science, Xi’an University of Science and Technology, Xi’an 710699, P. R. China
**** College of Communication and Information Technology, Xi’an University of Science and Technology, Xi’an 710699, P. R. China
Abstract:
Reconfigurable array architectures have become an important hardware platform for edge-side deployment of convolutional neural networks owing to their high parallelism and flexible programmability. However, traditional multi-branch convolutional networks suffer from computational redundancy, high memory access overhead, and inefficient branch fusion. This paper therefore proposes an adaptive multi-branch convolution (AMBC) module that integrates software-hardware co-optimization. During training, learnable fusion coefficients are introduced to enable adaptive fusion of multi-scale features; in the inference phase, the multiple branches and their normalization parameters are merged, together with the fusion coefficients, into a single 3×3 convolutional kernel through operator fusion. On the SIREA-288 reconfigurable platform, compared with the unoptimized multi-branch network, AMBC reduces external memory accesses by 47.91% and inference latency by 47.20%, achieving a 1.90× speedup. The approach maximizes utilization of the reconfigurable logic while minimizing both reconfiguration and data-movement overheads in edge inference.
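To illustrate the inference-time fusion described above, the following is a minimal PyTorch-style sketch of how several convolution branches, their batch-normalization statistics, and learnable fusion coefficients can be folded into one 3×3 kernel. The branch layout (3×3 and 1×1 convolutions), the helper names fold_bn and fuse_branches, and the per-branch scalar coefficients are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def fold_bn(weight, gamma, beta, mean, var, eps=1e-5):
    # Fold inference-time BatchNorm into the preceding convolution:
    # w' = w * gamma / sqrt(var + eps),  b' = beta - gamma * mean / sqrt(var + eps)
    std = torch.sqrt(var + eps)
    scale = gamma / std                           # per-output-channel scale
    return weight * scale.reshape(-1, 1, 1, 1), beta - mean * scale

def fuse_branches(branches, alphas):
    # branches: list of (weight, gamma, beta, mean, var) tuples, one per conv+BN branch
    # alphas:   learned fusion coefficients, one scalar per branch (assumed form)
    total_w, total_b = 0.0, 0.0
    for (w, g, b, mu, var), a in zip(branches, alphas):
        fw, fb = fold_bn(w, g, b, mu, var)
        if fw.shape[-1] == 1:                     # zero-pad a 1x1 kernel to 3x3
            fw = F.pad(fw, [1, 1, 1, 1])
        total_w = total_w + a * fw                # coefficient weights the branch
        total_b = total_b + a * fb
    return total_w, total_b                       # single 3x3 kernel and bias

Because convolution and inference-time batch normalization are both affine, the weighted sum of branch outputs equals one convolution with the summed kernel, so the fused result can be applied as F.conv2d(x, total_w, total_b, padding=1) with no change to the network's output.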