Design of a clustered data-driven array processor for computer vision

Shan Rui（山蕊）*; Deng Junyong*; Jiang Lin**; Zhu Yun*; Wu Haoyue*; He Feilong*

Article abstract

Shan Rui (山蕊)*, Deng Junyong*, Jiang Lin**, Zhu Yun*, Wu Haoyue*, He Feilong*. Design of a clustered data-driven array processor for computer vision [J]. High Technology Letters (English edition), 2020, 26(4): 424-434

DOI: 10.3772/j.issn.1006-6748.2020.04.010

Chinese keywords:

English keywords: array processor, data-driven, adjacent interconnection, distributed memory, computer vision (CV)

Funding:

Author Name	Affiliation
Shan Rui（山蕊）*	(*School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)
Deng Junyong*	(*School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)
Jiang Lin**	(**Integrated Circuit Laboratory, Xi’an University of Science and Technology, Xi’an 710054, P.R.China)
Zhu Yun*	(*School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)
Wu Haoyue*	(*School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)
He Feilong*	(*School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China)

Chinese abstract:

English abstract:

Computer vision (CV) is widely expected to be the next major emerging application area, and many heterogeneous architectures for CV have appeared. However, a heterogeneous architecture must transfer large amounts of data between its different structures, and the resulting long data transfer delay becomes the main factor limiting the processing speed of CV applications. To reduce data transfer delay and accelerate CV applications, a clustered data-driven array processor is proposed. A three-stage pipelined processing element is designed, which supports a two-buffer data flow interface and 8-bit, 16-bit, and 32-bit sub-word parallel computation. To accelerate transcendental function computation, a four-way shared pipelined transcendental function accelerator is designed, based on a Y-intercept adjusted piecewise linear segment algorithm. A distributed shared memory structure based on unified addressing is also employed. To verify the efficiency of the architecture, several image processing algorithms are implemented on it. The proposed architecture has also been implemented on a Xilinx ZC706 development board, and the same circuitry has been synthesized in SMIC 130 nm CMOS technology. The circuit runs at 100 MHz and occupies an area of 26.58 mm².
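As a rough illustration of what 8-bit, 16-bit, and 32-bit sub-word parallel computation means, the sketch below adds several narrow lanes packed into one 32-bit word so that carries never cross a lane boundary. It is a software model under stated assumptions, not the processing element's actual datapath: the names `lane_masks` and `subword_add` are hypothetical, and wrap-around (modular) addition per lane is assumed because the abstract does not specify overflow behaviour.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit datapath word


def lane_masks(lane_bits):
    """Return (high, low): masks for the top bit and the remaining bits of every lane."""
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16, or 32")
    high = 0
    # Set the most significant bit of each lane (bit 7, 15, ... for 8-bit lanes).
    for shift in range(lane_bits - 1, 32, lane_bits):
        high |= 1 << shift
    return high, MASK32 & ~high


def subword_add(a, b, lane_bits):
    """Add packed lanes of a and b independently, wrapping within each lane (assumed)."""
    high, low = lane_masks(lane_bits)
    # Adding with the top bit of each lane cleared cannot carry into the next lane.
    partial = (a & low) + (b & low)
    # Fold the top bits back in with XOR, which discards the carry out of each lane.
    return (partial ^ ((a ^ b) & high)) & MASK32


if __name__ == "__main__":
    # Same operands, different lane widths: only the carry boundaries change.
    print(hex(subword_add(0x000001FF, 0x00000001, 8)))
    print(hex(subword_add(0x000001FF, 0x00000001, 16)))
```

With 8-bit lanes the lowest lane wraps from 0xFF to 0x00 without disturbing its neighbour, whereas with 16-bit lanes the same operands carry into bit 8; selecting the lane width is the only difference between the modes.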
