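| Lan Yanzhi* **, Li Xinyu* **, Niu Gen***, Zeng Lu***, Zhang Fuxin* **. Optimization method for dynamic translation based on customized intermediate representation and vectorized assembly[J]. 高技术通讯 (High Technology Letters, Chinese edition), 2026, 36(4): 354-363 |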
| Optimization method for dynamic translation based on customized intermediate representation and vectorized assembly |
| DOI: 10.3772/j.issn.1002-0470.2026.04.003 |
| Keywords: binary translation; translation time; intermediate representation; vectorization; LoongArch |
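| Author | Affiliation |
| Lan Yanzhi (兰彦志)* ** | * State Key Laboratory of Processors (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190; ** University of Chinese Academy of Sciences, Beijing 100049; *** Loongson Technology Co., Ltd., Beijing 100190 |
| Li Xinyu (李欣宇)* ** | |
| Niu Gen (牛根)*** | |
| Zeng Lu (曾露)*** | |
| Zhang Fuxin (张福新)* ** | |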
| Abstract: |
| Binary translation is a key technique for cross-ISA software compatibility and is widely used in software migration, hardware emulation, and debugging and verification. Prior research has focused mainly on improving the quality of translated code and has paid comparatively little attention to the performance of the translation process itself. However, as just-in-time compilation becomes more prevalent and guest program code keeps growing in size, the translation process has become an increasingly significant performance bottleneck. To address this, the paper analyzes the translation characteristics of existing binary translation systems and designs a customized intermediate representation (IR) that significantly speeds up translation. In addition, for the assembly stage of translation, it proposes a vectorization-based assembly optimization method that further reduces translation time. Tests on a MIPS (microprocessor without interlocked pipeline stages) platform show that, with the customized IR combined with vectorized assembly optimization, translation speed reaches 4.3 times that of QEMU (quick emulator) on the SPEC CPU 2000 benchmark, and overall translation speed reaches 4.1 times that of QEMU on the Octane 2.0 benchmark, validating the effectiveness of the proposed methods. |