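| Ma Lina* ** ***, Yan Long*, Cao Huawei*, Liang Yan* **, Ye Xiaochun*, Fan Dongrui* **. Research on high-throughput video analysis optimization for edge heterogeneous computing power [J]. 高技术通讯 (Chinese edition), 2026, 36(1): 53-66 |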
| Research on high-throughput video analysis optimization for edge heterogeneous computing power (面向边缘异构算力的高通量视频分析优化研究) |
| High-throughput video processing based on heterogeneous computing nodes |
| |
| DOI: 10.3772/j.issn.1002-0470.2026.01.005 |
| Keywords: high-throughput computing; video processing; edge computing; heterogeneous hardware; decoding acceleration; inference acceleration |
| Funding: |
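| Author | Affiliation |
| Ma Lina* ** *** | (*State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (**University of Chinese Academy of Sciences, Beijing 100049) (***Beijing Ruixin High-Throughput Technology Co., Ltd. (北京睿芯高通量科技有限公司), Beijing 100090) |
| Yan Long* | |
| Cao Huawei* | |
| Liang Yan* ** | |
| Ye Xiaochun* | |
| Fan Dongrui* ** | |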
|
| Abstract: |
| In the era of big data, video accounts for more than 82% of data traffic, making it big data in the truest sense. Extracting valuable information from video quickly and effectively is therefore essential for supporting video-driven information service systems. To increase concurrent video processing capacity and reduce bandwidth costs, current video analysis systems are usually deployed in edge computing centers close to the data source and rely on edge platforms that integrate heterogeneous hardware. However, existing work fails to fully exploit the capability of heterogeneous acceleration chips. To address this problem, this paper proposes a high-throughput video analysis method for heterogeneous hardware acceleration devices. By combining a decoding optimization strategy with a multi-issue asynchronous execution strategy, the method makes full use of heterogeneous chip resources, achieving a 1.49-fold decoding speedup and a 1.44-fold inference speedup on a single chip. The proposed optimizations also preserve good linear scalability: on a compute-constrained edge heterogeneous platform consisting of 12 decoding chips and 18 inference chips, the method achieves a 17.71-fold decoding speedup, a 25.52-fold inference speedup, and a 33.22-fold speedup of the end-to-end video content analysis pipeline. |
|
|
|