Liu Dongyue (刘东岳)*, Jiang Lin**, Wang Mei* **, Li Yuancheng***, Hao Juan****. [J]. 高技术通讯(英文), 2026, 32(1): 39-48
Adaptive implementation of multi-branch convolution with fusion coefficients based on reconfigurable array
DOI: 10.3772/j.issn.1006-6748.2026.01.005
Keywords: reconfigurable array processor, structural re-parameterization, model compression, fusion coefficients, edge-side inference acceleration, hardware-software co-optimization
Liu Dongyue (刘东岳)*, Jiang Lin**, Wang Mei* **, Li Yuancheng***, Hao Juan****

(* School of Electrical and Control Engineering, Xi’an University of Science and Technology, Xi’an 710699, P. R. China)
(** School of Information Science and Technology, Northwest University, Xi’an 710069, P. R. China)
(*** College of Artificial Intelligence and Computer Science, Xi’an University of Science and Technology, Xi’an 710699, P. R. China)
(**** College of Communication and Information Technology, Xi’an University of Science and Technology, Xi’an 710699, P. R. China)
Abstract:
Reconfigurable array architectures have become an important hardware platform for edge-side deployment of convolutional neural networks owing to their high parallelism and flexible programmability. However, traditional multi-branch convolutional networks suffer from computational redundancy, high memory-access overhead, and inefficient branch fusion. This paper therefore proposes an adaptive multi-branch convolution module (AMBC) that integrates software-hardware co-optimization. During training, learnable fusion coefficients are introduced to enable adaptive fusion of multi-scale features; in the inference phase, the multiple branches and their normalization parameters are merged, together with the fusion coefficients, into a single 3×3 convolutional kernel through operator fusion. On the SIREA-288 reconfigurable platform, compared with the unoptimized multi-branch network, AMBC reduces external memory accesses by 47.91% and inference latency by 47.20%, achieving a 1.90× speedup. The approach maximizes utilization of the reconfigurable logic while minimizing both reconfiguration and data-movement overheads in edge inference.
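The operator-fusion step described above rests on two linear identities: a BatchNorm following a convolution can be folded into that convolution's kernel and bias, and a coefficient-weighted sum of branch outputs equals a single convolution whose kernel is the coefficient-weighted sum of the (3×3-aligned) branch kernels. The abstract does not give AMBC's exact formulation, so the NumPy sketch below only illustrates these general re-parameterization mechanics; the two-branch setup, the helper names, and the coefficient values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def conv2d(x, w, b):
    """Naive 3x3 convolution with padding 1.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,)."""
    c_out = w.shape[0]
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    y = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                y[o, i, j] = np.sum(xp[:, i:i + 3, j:j + 3] * w[o]) + b[o]
    return y

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm y = gamma*(conv(x)-mean)/sqrt(var+eps)+beta
    into the preceding convolution's kernel and bias."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None, None], (b - mean) * scale + beta

def pad_1x1_to_3x3(w):
    """Embed a 1x1 kernel at the centre of a zero 3x3 kernel."""
    out = np.zeros((w.shape[0], w.shape[1], 3, 3))
    out[:, :, 1, 1] = w[:, :, 0, 0]
    return out

def reparameterize(branches, alphas):
    """Merge BN-folded 3x3 branches, weighted by fusion
    coefficients, into one 3x3 kernel and bias."""
    W = sum(a * wb[0] for a, wb in zip(alphas, branches))
    b = sum(a * wb[1] for a, wb in zip(alphas, branches))
    return W, b

# Demo: a 3x3+BN branch and a 1x1 branch, merged with
# (hypothetical) fusion coefficients 0.7 and 0.3.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 5))
w3 = rng.standard_normal((4, 2, 3, 3)); b3 = rng.standard_normal(4)
w1 = rng.standard_normal((4, 2, 1, 1)); b1 = rng.standard_normal(4)
gamma = rng.standard_normal(4); beta = rng.standard_normal(4)
mean = rng.standard_normal(4); var = rng.uniform(0.5, 2.0, 4)

w3f, b3f = fuse_conv_bn(w3, b3, gamma, beta, mean, var)
alphas = [0.7, 0.3]
Wf, bf = reparameterize([(w3f, b3f), (pad_1x1_to_3x3(w1), b1)], alphas)

# Multi-branch output: alpha1 * BN(conv3x3(x)) + alpha2 * conv1x1(x)
y_multi = (alphas[0] * (gamma[:, None, None]
           * (conv2d(x, w3, b3) - mean[:, None, None])
           / np.sqrt(var[:, None, None] + 1e-5) + beta[:, None, None])
           + alphas[1] * conv2d(x, pad_1x1_to_3x3(w1), b1))
y_fused = conv2d(x, Wf, bf)
assert np.allclose(y_multi, y_fused)
```

Because convolution is linear in its kernel and bias, the merge is exact: the single fused 3×3 convolution reproduces the multi-branch output while eliminating the per-branch computation and memory traffic at inference time, which is the mechanism behind the reported memory-access and latency reductions.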
|
|
|
|