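Article abstract
Ding Feng, Li Xi. A high-throughput memory management unit for irregular memory access in deep learning [J]. Chinese High Technology Letters (Chinese edition), 2024, 34(7): 714-725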
A high-throughput memory management unit for irregular memory access in deep learning
HTMMU: a memory management unit for irregular memory access in deep learning
  
DOI: 10.3772/j.issn.1002-0470.2024.07.005
Keywords: memory management unit (MMU); address translation; irregular memory access; deep learning; high-throughput
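Authors and affiliations
Ding Feng (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026)
Li Xi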
Abstract:
      The growing diversity and complexity of artificial intelligence applications give their models irregular memory access patterns, characterized by bursts of concentrated requests to sparse addresses. This poses a challenge for deploying intelligent applications on mobile devices with strictly limited memory resources. Under such access patterns, address translation in the memory management unit (MMU) of existing architectures suffers from low throughput and long latency, making the MMU a bottleneck on the system's memory access path. To address this problem, this paper proposes a novel high-throughput MMU architecture (HTMMU). HTMMU uses multi-stream parallelism, stronger filtering of redundant requests, and a more reasonable allocation of limited on-chip storage to translate irregular accesses with high throughput and low latency, improving system memory access efficiency. Experimental results show that, when handling the bursty, sparse memory accesses in artificial intelligence algorithms, HTMMU achieves an average speedup of 2.43 times over current mainstream MMU designs and reduces the average access latency to 34.1% of the baseline (a 65.9% reduction), while keeping the additional area overhead within 3.0%.
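The abstract names the filtering of redundant requests as one of its techniques but does not describe the mechanism. As a rough illustration of the general idea only, and not the HTMMU design itself, the sketch below groups a burst of translation requests by virtual page so that each distinct page is walked once. The 4 KiB page size, the flat dictionary standing in for a page table, and the function names are all assumptions made for this example.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def coalesce_translation_requests(virtual_addresses):
    """Group a burst of virtual addresses by virtual page number (VPN).

    Returns an OrderedDict mapping each distinct VPN to the indices of the
    requests waiting on it, so one walk can serve all of them.
    """
    pending = OrderedDict()
    for index, vaddr in enumerate(virtual_addresses):
        pending.setdefault(vaddr >> PAGE_SHIFT, []).append(index)
    return pending


def translate_burst(virtual_addresses, page_table):
    """Translate a burst, consulting the toy page table once per distinct VPN."""
    pending = coalesce_translation_requests(virtual_addresses)
    physical = [None] * len(virtual_addresses)
    walks = 0
    for vpn, waiters in pending.items():
        frame = page_table[vpn]  # stands in for a multi-level page-table walk
        walks += 1
        for index in waiters:
            offset = virtual_addresses[index] & PAGE_MASK
            physical[index] = (frame << PAGE_SHIFT) | offset
    return physical, walks


if __name__ == "__main__":
    toy_page_table = {0x1: 0xA, 0x2: 0xB}  # hypothetical VPN -> frame mapping
    burst = [0x1000, 0x1004, 0x2008, 0x1FFC]
    translated, walk_count = translate_burst(burst, toy_page_table)
    print([hex(p) for p in translated], "walks:", walk_count)
```

In this toy burst, four requests touch only two pages, so two lookups suffice. A hardware MMU would do this filtering with fixed-size structures rather than an unbounded dictionary.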