LIAN Jiacheng* ** ***, HAO Yifan**, ZHANG Xishan** ***, ZHI Tian**, SUN Guangzhong*. Lite-IJformer: a lightweight method for long-sequence Transformers [J]. High Technology Letters (Chinese edition), 2025, 35(2): 167-174
Lite-IJformer: a lightweight method for long-sequence Transformers
|
DOI: 10.3772/j.issn.1002-0470.2025.02.006
Keywords: Transformer, self-attention, linearization method, dimension reduction
Authors: LIAN Jiacheng* ** ***, HAO Yifan**, ZHANG Xishan** ***, ZHI Tian**, SUN Guangzhong*
Affiliations:
*School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026
**State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
***Cambricon Technologies Corporation Limited, Beijing 100191
|
Abstract:
To address the high computational complexity of Transformers on long input sequences, this paper proposes a lightweight method called Lite-IJformer. Its core idea consists of two steps: (1) linearize self-attention, reducing the computational complexity of the Transformer from quadratic to linear in the input length; (2) apply dimension reduction to the KV matrix multiplication based on low-rank matrix factorization theory, further shrinking the computation. Experiments on the Long Range Arena (LRA) benchmark show that, for input lengths of 1000-2000, linearization reduces the amount of self-attention computation by a factor of 13-26 and speeds up Transformer inference by 4.75-5.72 times with no loss of accuracy. After dimension reduction, the self-attention computation is further reduced by 17.0% and model inference is accelerated by another 1.17 times, with an accuracy loss within 0.5%.
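The two steps summarized above (linearizing self-attention so the n x n attention matrix is never formed, then reducing the rank of the KV product) can be illustrated with a short sketch. The NumPy snippet below is a minimal illustration under assumed choices: the elu-based feature map, the random rank-r projection, and the function name linear_attention_lowrank are hypothetical stand-ins, not the paper's actual Lite-IJformer implementation.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive feature map commonly used in linearized attention.
    # Assumed here for illustration; the paper may use a different kernel.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_lowrank(Q, K, V, r=32, seed=0):
    """Linearized self-attention with a low-rank reduction of the KV product.

    Q, K, V: (n, d) arrays for a single attention head.
    r      : assumed projection rank for the dimension-reduction step.
    Softmax attention costs O(n^2 d); this rewrite costs O(n d r).
    """
    n, d = Q.shape
    rng = np.random.default_rng(seed)

    # Step 1: linearization -- replace softmax(Q K^T) V with
    # phi(Q) (phi(K)^T V), so only d x d (or r x d) intermediates appear.
    phi_Q, phi_K = feature_map(Q), feature_map(K)

    # Step 2: dimension reduction -- project the key features to rank r
    # before forming the KV product (assumed random projection).
    P = rng.standard_normal((d, r)) / np.sqrt(r)
    KV = (phi_K @ P).T @ V                      # (r, d) instead of (d, d)

    # Row-wise normalizer, kept in the original feature space.
    Z = 1.0 / (phi_Q @ phi_K.sum(axis=0) + 1e-6)
    return ((phi_Q @ P) @ KV) * Z[:, None]      # (n, d) output

# Example: sequence length 2000, head dimension 64.
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((2000, 64)) for _ in range(3))
out = linear_attention_lowrank(Q, K, V, r=32)
print(out.shape)  # (2000, 64)
```

The kernel rewrite avoids the n x n attention matrix entirely, which corresponds to the linearization speedup reported in the abstract; the rank-r projection then shrinks the remaining KV multiplication, which is the kind of additional saving the dimension-reduction step targets.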