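Article Abstract
ZHOU Shuxin* ** ***, ZHANG Jianqi****, WANG Huandong****, ZHANG Longbing* ** ***. Memory controller-side prefetching based on intra-row locality [J]. Chinese High Technology Letters (Chinese edition), 2024, 34(3): 248-255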
Memory controller-side prefetching based on intra-row locality
DOI: 10.3772/j.issn.1002-0470.2024.03.003
Keywords: memory controller; prefetching; locality
Funding: (none listed)
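Authors and affiliations:
ZHOU Shuxin (周叔欣)* ** ***
ZHANG Jianqi (张见齐)****
WANG Huandong (王焕东)****
ZHANG Longbing (章隆兵)* ** ***
* State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
** Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
*** University of Chinese Academy of Sciences, Beijing 100049
**** Loongson Technology Corporation Limited, Beijing 100190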
Abstract:
      This paper proposes a memory controller-side prefetching method based on intra-row locality. A bitmap records the state of each data block in a row. Each row is divided into regions, and the access locality of each region is quantified; the locality of a region determines how aggressively it is prefetched. In regions with low locality, only the blocks in the region that have not yet been accessed are prefetched; in regions with high locality, cross-region prefetching is applied as well. The region size is adjusted dynamically to follow changes in the degree of locality. The method is implemented and evaluated on the Loongson 3A6000 processor using memory-intensive applications from SPEC CPU2006. The results show that it improves instructions per cycle (IPC) by 6.51% on average, with a maximum single-thread IPC gain of 46.80% (bwaves) and a maximum dual-core, four-thread IPC gain of 26.22% (lbm).
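The policy outlined in the abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design: the row size (128 blocks), the region size (8 blocks), the locality metric (fraction of demand-accessed blocks in a region), the 0.5 threshold, and all names such as `RowTracker` and `on_access` are assumptions made for illustration. Dynamic region resizing is not modeled here.

```python
from dataclasses import dataclass

BLOCKS_PER_ROW = 128   # assumption: 8 KiB DRAM row split into 64 B blocks
HIGH_LOCALITY = 0.5    # assumption: threshold separating low and high locality


@dataclass
class RowTracker:
    """Hypothetical per-row state kept by the memory controller."""
    region_blocks: int = 8   # assumption: blocks per region (fixed in this sketch)
    accessed: int = 0        # bitmap of demand-accessed blocks
    fetched: int = 0         # bitmap of blocks already fetched (demand or prefetch)

    def region_of(self, block: int) -> range:
        """Return the block indices of the region that contains `block`."""
        start = (block // self.region_blocks) * self.region_blocks
        return range(start, min(start + self.region_blocks, BLOCKS_PER_ROW))

    def locality(self, region: range) -> float:
        """Assumed metric: fraction of the region touched by demand accesses."""
        hits = sum((self.accessed >> b) & 1 for b in region)
        return hits / len(region)

    def on_access(self, block: int) -> list[int]:
        """Record a demand access and return the blocks to prefetch."""
        if not 0 <= block < BLOCKS_PER_ROW:
            raise ValueError("block index outside the row")
        self.accessed |= 1 << block
        self.fetched |= 1 << block

        region = self.region_of(block)
        candidates = list(region)            # always cover the current region
        if self.locality(region) >= HIGH_LOCALITY and region.stop < BLOCKS_PER_ROW:
            # High locality: also reach into the next region of the same row.
            candidates += list(self.region_of(region.stop))

        picks = [b for b in candidates if not (self.fetched >> b) & 1]
        for b in picks:                      # avoid issuing the same prefetch twice
            self.fetched |= 1 << b
        return picks
```

Under these assumptions, the first access to block 0 prefetches blocks 1 to 7 of its region; once four of the eight blocks in that region have been demand-accessed, the next access additionally prefetches blocks 8 to 15 from the neighbouring region. Keeping separate `accessed` and `fetched` bitmaps is a design choice of this sketch so that prefetched blocks do not inflate the locality estimate.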