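Wu Jingya, Lu Wenyan, Yan Guihai, Li Xiaowei. FPGA-based accelerator by software-hardware co-design for multi-table Hash join [J]. Chinese High Technology Letters, 2023, 33(11): 1123-1135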
FPGA-based accelerator by software-hardware co-design for multi-table Hash join
DOI: 10.3772/j.issn.1002-0470.2023.11.001
Keywords: field programmable gate array (FPGA); multi-table join; Hash join; software-hardware co-design
Funding:
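Author | Affiliation
Wu Jingya | State Key Lab of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190
Lu Wenyan |
Yan Guihai |
Li Xiaowei |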
Abstract:
Multi-table join is hard to accelerate in hardware. On one hand, a multi-table join request involves an indefinite number of tables and varied join patterns, and this flexibility conflicts with the fixed behavior of hardware. On the other hand, the intermediate results grow as more tables are joined, so managing and maintaining the data structures demands higher hardware overhead. To support flexible and efficient multi-table join, this paper proposes a software-hardware co-optimization method. In software, multi-table join is abstracted into two computation modes, forward and reverse, which together support different join patterns. In hardware, memory access and computation are optimized jointly: a regular hardware Hash table structure is designed to improve memory access bandwidth, and a homogeneous dedicated computing engine that supports both forward and reverse computation is configured with multiple data channels and an instruction control system for efficient parallel execution, improving the computing efficiency of multi-table Hash join. Experimental results show that, compared with executing the join on a central processing unit (CPU), a single computing engine improves performance by 9.2 to 11.0 times. With multi-way parallelism, an 8-way parallel multi-table Hash join engine makes full use of the board's off-chip DDR memory bandwidth and achieves a speedup of more than 71.1 times over the CPU.
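For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of a multi-table Hash join written as a chain of conventional build and probe phases. It is not the accelerator's design and does not model the forward and reverse computation modes or the hardware Hash table described in the abstract. The function names (`build_hash_table`, `probe`, `multi_table_hash_join`) and the sample tables are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_hash_table(rows, key):
    """Build phase: group the rows of one table by their join-key value."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(rows, table, key):
    """Probe phase: emit one merged row for every matching pair."""
    out = []
    for row in rows:
        # .get() avoids inserting empty buckets for keys with no match.
        for match in table.get(row[key], []):
            out.append({**match, **row})
    return out


def multi_table_hash_join(first, steps):
    """Join `first` with each (rows, key) pair in `steps`, left to right.

    The intermediate result of one join is probed against the hash table
    built from the next table, so it can grow as tables are added.
    """
    result = first
    for rows, key in steps:
        result = probe(result, build_hash_table(rows, key), key)
    return result


# Hypothetical sample data: three small tables joined on two different keys.
orders = [
    {"order_id": 1, "cust_id": 10, "item_id": 100},
    {"order_id": 2, "cust_id": 11, "item_id": 101},
    {"order_id": 3, "cust_id": 10, "item_id": 101},
]
customers = [{"cust_id": 10, "name": "A"}, {"cust_id": 11, "name": "B"}]
items = [{"item_id": 100, "label": "x"}, {"item_id": 101, "label": "y"}]

joined = multi_table_hash_join(
    orders, [(customers, "cust_id"), (items, "item_id")]
)
```

In this sketch the number of tables and the key used at each step are runtime parameters, which is the kind of flexibility the abstract describes as difficult to map onto fixed hardware behavior.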