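| Li Chenhao, Li Lin, Qiu Qiang, Zhang Zhibin, Guo Jiafeng, Cheng Xueqi. Inference time constrained structured pruning [J]. High Technology Letters (高技术通讯, Chinese edition), 2026, 36(4): 331-339 |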
| Inference time constrained structured pruning (推理时间约束的结构化剪枝) |
| DOI: 10.3772/j.issn.1002-0470.2026.04.001 |
| Keywords: model pruning; model compression; performance models; time constraints |
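| Author | Affiliation |
| Li Chenhao (李晨昊) | Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049 |
| Li Lin (李琳) | |
| Qiu Qiang (邱强) | |
| Zhang Zhibin (张志斌) | |
| Guo Jiafeng (郭嘉丰) | |
| Cheng Xueqi (程学旗) | |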
| Abstract: |
| Structured pruning compresses and accelerates neural networks by removing groups of weights in a structured manner. Most existing pruning methods target a predefined sparsity level, i.e., pruning a fixed proportion of weights, rather than directly optimizing inference latency. However, the relationship between sparsity and inference time is highly nonlinear, making sparsity-targeted pruning methods unsuitable for deployment scenarios with explicit latency constraints. To address this issue, we propose a novel structured pruning method, termed inference time constrained pruning (ITCP), which automatically searches for a pruning scheme that satisfies a desired inference-time budget while minimizing accuracy degradation. Specifically, ITCP formulates latency-constrained pruning as a constrained optimization problem, where the objective is to maximize a performance score under a given inference-time constraint, and solves it efficiently using dynamic programming. In addition, a performance model is developed to rapidly estimate the inference time of models at different sparsity levels. Experimental results on CIFAR-10, CIFAR-100, and ImageNet demonstrate that, under the same acceleration requirements, ITCP consistently achieves higher accuracy than baseline pruning strategies. |
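The abstract does not give the exact formulation, so the following is only a minimal sketch of how a latency-constrained selection could be solved with dynamic programming. It assumes a multiple-choice knapsack setup: each layer offers a few candidate sparsity settings, each with an estimated latency (which a performance model would supply; hard-coded here) and a retained performance score. The function name `select_pruning_plan`, the integer latency units, and the sample numbers are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate for one layer: (estimated latency in integer units, retained score).
Option = Tuple[int, float]


def select_pruning_plan(
    layers: List[List[Option]], budget: int
) -> Optional[Tuple[float, List[int]]]:
    """Choose one option per layer, maximizing total score with total latency <= budget.

    Returns (best_score, option_index_per_layer), or None if no plan fits the budget.
    This is a generic multiple-choice knapsack DP, not necessarily ITCP's algorithm.
    """
    neg_inf = float("-inf")
    # best[t] = highest score over the layers seen so far with total latency exactly t.
    best = [neg_inf] * (budget + 1)
    best[0] = 0.0
    # choices[i][t] = (option index, previous latency) used to reach latency t at layer i.
    choices = []
    for options in layers:
        nxt = [neg_inf] * (budget + 1)
        pick = [None] * (budget + 1)
        for t in range(budget + 1):
            if best[t] == neg_inf:
                continue  # latency t is unreachable with the previous layers
            for k, (latency, score) in enumerate(options):
                nt = t + latency
                if nt <= budget and best[t] + score > nxt[nt]:
                    nxt[nt] = best[t] + score
                    pick[nt] = (k, t)
        best = nxt
        choices.append(pick)

    t_best = max(range(budget + 1), key=lambda t: best[t])
    if best[t_best] == neg_inf:
        return None  # even the most aggressive options exceed the budget

    # Walk backwards through the recorded choices to recover the plan.
    plan = []
    t = t_best
    for pick in reversed(choices):
        k, t = pick[t]
        plan.append(k)
    plan.reverse()
    return best[t_best], plan


# Illustrative (made-up) options for three layers, ordered from least to most pruned.
layers = [
    [(4, 1.0), (3, 0.8), (2, 0.5)],
    [(6, 2.0), (4, 1.5), (2, 0.7)],
    [(5, 1.2), (3, 1.0), (1, 0.3)],
]
print(select_pruning_plan(layers, budget=10))
```

The table has one entry per integer latency value, so the running time grows with the number of layers, the number of options per layer, and the budget. In practice, measured latencies would need to be discretized to a suitable resolution before such a table could be used.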