Duan Bo (段勃), Wang Wendi, Tan Guangming, Meng Dan. [J]. High Technology Letters (English edition), 2014, 20(4): 333-345

Single-particle 3D reconstruction on specialized stream architecture and comparison with GPGPUs

DOI: 10.3772/j.issn.1006-6748.2014.04.001
Keywords: stream architecture, general purpose graphic processing unit (GPGPU), field programmable gate array (FPGA), cryo-EM
Authors: Duan Bo (段勃), Wang Wendi, Tan Guangming, Meng Dan (affiliations not listed)

Abstract:
The wide adoption of medical image processing and the resulting data deluge call for faster and more efficient systems. Recent advances in heterogeneous architectures have revived research on FPGA-based and GPGPU-based accelerator design. This paper quantitatively analyzes the workload, computational intensity and memory performance of EMAN, a single-particle 3D reconstruction application. It first parallelizes EMAN on CUDA GPGPU architectures, decoupling the memory operations from the computing flow and orchestrating the thread-data mapping to reduce the overhead of off-chip memory operations. It then explores an FPGA-based accelerator design that offloads computing-intensive kernels to dedicated hardware modules. A customized memory subsystem is also designed to facilitate the decoupling and optimization of computing-dominated data access patterns. The proposed accelerator designs are evaluated against a parallelized program on a 4-core CPU. The CUDA version on a GTX480 achieves a speedup of about 6 times, and the stream architecture implemented on a Xilinx Virtex LX330 FPGA achieves a speedup of 2.54 times. Measured in terms of power efficiency, the FPGA-based accelerator outperforms the 4-core CPU and the GTX480 by 7.3 times and 3.4 times, respectively.
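
As a rough reading of the reported figures, the speedups and power-efficiency ratios can be combined to estimate the implied power ratios. This sketch assumes that power efficiency means performance per watt, which the abstract does not define; the symbols $S$ (speedup over the 4-core CPU), $\eta$ (power efficiency) and $P$ (power draw) are introduced here for illustration only, and the results are derived estimates, not values stated by the authors.

```latex
% Assumption: \eta = performance / power, so for platforms A and B
% \eta_A / \eta_B = (S_A / S_B) \cdot (P_B / P_A).
\begin{align*}
\frac{\eta_{\mathrm{FPGA}}}{\eta_{\mathrm{CPU}}}
  &= S_{\mathrm{FPGA}} \cdot \frac{P_{\mathrm{CPU}}}{P_{\mathrm{FPGA}}} = 7.3
  &&\Rightarrow\quad
  \frac{P_{\mathrm{CPU}}}{P_{\mathrm{FPGA}}} = \frac{7.3}{2.54} \approx 2.9 \\[4pt]
\frac{\eta_{\mathrm{FPGA}}}{\eta_{\mathrm{GPU}}}
  &= \frac{S_{\mathrm{FPGA}}}{S_{\mathrm{GPU}}} \cdot \frac{P_{\mathrm{GPU}}}{P_{\mathrm{FPGA}}} = 3.4
  &&\Rightarrow\quad
  \frac{P_{\mathrm{GPU}}}{P_{\mathrm{FPGA}}} \approx \frac{3.4 \times 6}{2.54} \approx 8.0
\end{align*}
```

Under that assumption, the FPGA design would draw roughly a third of the CPU's power and about an eighth of the GTX480's, which is how a lower raw speedup can still yield the better power efficiency.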