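SHAN Rui (山 蕊), LI Xiaoshuo, GAO Xu, HUO Ziqing. Design and implementation of dual-mode configurable memory architecture for CNN accelerator [J]. High Technology Letters, 2024, 30(2): 211-220 |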
|
Design and implementation of dual-mode configurable memory architecture for CNN accelerator |
|
DOI: 10.3772/j.issn.1006-6748.2024.02.012 |
Chinese keywords: |
English keywords: distributed memory structure, neural network accelerator, reconfigurable array processor, configurable memory structure |
Funding: |
Author Name | Affiliation |
SHAN Rui (山 蕊) | (School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, P. R. China) |
LI Xiaoshuo | |
GAO Xu | |
HUO Ziqing | |
|
Chinese abstract: |
|
English abstract: |
With the rapid development of deep learning algorithms, their computational complexity and functional diversity are increasing quickly. Under the traditional von Neumann architecture, however, the gap between high computational density and insufficient memory bandwidth keeps widening. An analysis of the algorithmic characteristics of convolutional neural networks (CNN) shows that the memory access patterns of convolution (CONV) and fully connected (FC) operations differ considerably. Based on this observation, a dual-mode reconfigurable distributed memory architecture for CNN accelerators is designed. It can be configured in Bank mode or first input first output (FIFO) mode to meet the access needs of different operations. In addition, a programmable memory control unit is designed; it controls the dual-mode configurable distributed memory architecture through customized memory access instructions and reduces data access delay. The proposed architecture is verified and tested through parallel implementations of several CNN algorithms. The experimental results show that the peak bandwidth reaches 13.44 GB·s⁻¹ at an operating frequency of 120 MHz, which is 1.40, 1.12, 2.80 and 4.70 times the peak bandwidth of the existing works used for comparison. |
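To illustrate the idea of a memory block that can be switched between Bank mode and FIFO mode, the following is a minimal behavioral sketch in Python. It is not the architecture from the paper: the class `DualModeMemory`, the `Mode` enum, the choice to clear contents on reconfiguration, and the helper `bytes_per_cycle` are all hypothetical names and assumptions introduced here for illustration. The helper only divides a bandwidth by a clock frequency, assuming 1 GB = 10^9 bytes.

```python
from collections import deque
from enum import Enum
from typing import Optional


class Mode(Enum):
    BANK = "bank"  # address-indexed (random) access
    FIFO = "fifo"  # first input, first output streaming access


class DualModeMemory:
    """Toy model of one memory block configurable as a bank or a FIFO.

    Hypothetical illustration only; not the design described in the paper.
    """

    def __init__(self, depth: int, mode: Mode = Mode.BANK) -> None:
        self.depth = depth
        self.configure(mode)

    def configure(self, mode: Mode) -> None:
        # Assumption: switching modes clears the stored data.
        self.mode = mode
        self._cells = [0] * self.depth
        self._queue = deque()

    def write(self, data: int, addr: Optional[int] = None) -> None:
        if self.mode is Mode.BANK:
            if addr is None:
                raise ValueError("Bank mode requires an address")
            self._cells[addr] = data
        else:
            if len(self._queue) >= self.depth:
                raise OverflowError("FIFO is full")
            self._queue.append(data)

    def read(self, addr: Optional[int] = None) -> int:
        if self.mode is Mode.BANK:
            if addr is None:
                raise ValueError("Bank mode requires an address")
            return self._cells[addr]
        if not self._queue:
            raise IndexError("FIFO is empty")
        return self._queue.popleft()


def bytes_per_cycle(bandwidth_bytes_per_s: float, clock_hz: float) -> float:
    """Average bytes moved per clock cycle at a given bandwidth and clock."""
    return bandwidth_bytes_per_s / clock_hz
```

In Bank mode the caller supplies an address for every access; in FIFO mode data comes back in arrival order without an address. Under the decimal-gigabyte assumption, `bytes_per_cycle(13.44e9, 120e6)` relates the reported peak bandwidth to the reported operating frequency as a number of bytes per cycle.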