基于块坐标下降法的外存异步图计算系统

赵程; 张志斌; 郭嘉丰; 刘丁玮

文章摘要

赵程,张志斌,郭嘉丰,刘丁玮.基于块坐标下降法的外存异步图计算系统[J].高技术通讯(中文),2022,32(8):825~835

基于块坐标下降法的外存异步图计算系统

Block coordinate descent based asynchronous out-of-core graph computing system

DOI：10.3772/j.issn.1002-0470.2022.08.005

中文关键词: 图计算系统；外存；异步并行计算；块坐标下降(BCD)；图划分

英文关键词: graph computing system, out-of-core, asynchronous parallel computing, block coordinate descent (BCD), graph partition

基金项目:

作者	单位
赵程	（中国科学院计算技术研究所网络数据科学与技术重点实验室北京 100190）（中国科学院大学北京 100049）
张志斌	（中国科学院计算技术研究所网络数据科学与技术重点实验室北京 100190）（中国科学院大学北京 100049）
郭嘉丰	（中国科学院计算技术研究所网络数据科学与技术重点实验室北京 100190）（中国科学院大学北京 100049）
刘丁玮	（中国科学院计算技术研究所网络数据科学与技术重点实验室北京 100190）（中国科学院大学北京 100049）

摘要点击次数: 4720

全文下载次数: 3559

中文摘要:

现有外存图计算系统中，外存I/O带宽不足成为性能瓶颈。使用整体并行模型将导致冗余计算，使用异步并行模型则将引入额外的优先级计算开销和负载不均衡。本文提出了基于块坐标下降法(BCD)的外存异步图计算系统(BCDG)，设计了一次选择多轮优先的调度策略，降低了优先级计算的平均开销；设计了基于优先级的块预取策略，解决了优先选择会破坏顺序执行流水线的问题；设计了计算调度分离的划分策略，实现了均衡地按边计算和按点调度。实验结果表明，相较于目前最先进的外存图计算系统GridGraph和Lumos，所提系统平均性能分别提升10.30倍与8.72倍。整体计算过程中，中央处理器（CPU）等待外存I/O的时间仅占10%~30%。

英文摘要:

For existing out-of-core graph computing systems, insufficient external memory I/O bandwidth is a performance bottleneck. Using a bulk synchronous parallel model will lead to redundant computation, while using an asynchronous parallel mode introduces additional priority computation overhead and load imbalance. In this paper, a block coordinate descent (BCD) based asynchronous out-of-core graph computing system (BCDG) is proposed. A scheduling strategy named one-selection multi-round priority is designed to reduce the average overhead of priority computation; a priority-based block prefetching strategy is designed to solve the problem that priority selection destroys the sequential execution pipeline; a partitioning strategy to separate computation and scheduling is designed to achieve balanced edge-by-edge computation and ensure vertex-by-vertex scheduling. The experimental results show that compared with two state-of-the-art out-of-core graph computing systems, GridGraph and Lumos, the proposed system outperforms them by 10.30 times and 8.72 times on average respectively. During the overall computation, the central processing unit (CPU) spends only 10%-30% of its time waiting for disk I/O.

查看全文查看/发表评论下载PDF阅读器

关闭