A dynamic coding and scheduling system of MPTCP based on deep reinforcement learning

Liao Binbin (廖彬彬)* **; Zhang Guangxing (张广兴)*; Diao Zulong (刁祖龙)*; Xie Gaogang (谢高岗)***

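Article abstract

Liao Binbin* **, Zhang Guangxing*, Diao Zulong*, Xie Gaogang***. A dynamic coding and scheduling system of MPTCP based on deep reinforcement learning [J]. 高技术通讯 (Chinese edition), 2022, 32(7): 727-736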

A dynamic coding and scheduling system of MPTCP based on deep reinforcement learning

DOI：10.3772/j.issn.1002-0470.2022.07.007

Keywords: multipath transmission control protocol (MPTCP); dynamic forward error correction coding; packet scheduling; deep reinforcement learning

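Author	Affiliation
Liao Binbin* **	(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100190) (**Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190)
Zhang Guangxing*	(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100190) (**Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190)
Diao Zulong*	(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100190) (**Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190)
Xie Gaogang***	(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100190) (**Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190)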

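Chinese abstract (translated):

By using multiple network interfaces on a multihomed device, the multipath transmission control protocol (MPTCP) aggregates throughput across physical links and restores connectivity when a single path fails, which greatly improves the network quality of service (QoS) over traditional single-path TCP. However, when any one of MPTCP's TCP subflows hits a performance bottleneck such as severe delay jitter, network congestion, or packet loss, these high-delay or high-loss subflows block data transmission on the other subflows, and overall MPTCP performance falls far below expectations. Existing studies show that a packet scheduler or an encoder can effectively mitigate the negative effects of this network heterogeneity. In a dynamic, changing heterogeneous network environment, however, designing an efficient and adaptive packet scheduler and coding algorithm becomes especially important. Building on existing ideas of MPTCP dynamic forward error correction coding and ratio-based packet allocation, this paper proposes a dynamic multipath encoding scheduler (DMES) that uses a deep reinforcement neural network. Using a Transformer neural network and a deep reinforcement learning agent, DMES observes the network state space formed by the dynamic TCP subflows in the current MPTCP environment and, based on the real-time state of the multiple subflows, performs a gradient search for the best action set so as to maximize the overall MPTCP performance defined in the reward function. Experimental results show that, compared with current state-of-the-art solutions, DMES adapts better to dynamic network environments. In particular, under high packet loss and with many subflows, DMES reduces the receiver-side out-of-order queue size (OQS) caused by network heterogeneity by up to 24.6%, and raises goodput by 18.3% while lowering MPTCP application delay by about 12.2%.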

English abstract:

By using multiple network interfaces on multihomed devices, the multipath transmission control protocol (MPTCP) can achieve throughput aggregation across physical links and connectivity recovery from single-path failures, and greatly improves the network quality of service (QoS) of the traditional single-path transmission control protocol (TCP). However, when any one of MPTCP's multiple TCP subflows suffers a serious performance bottleneck such as delay jitter, network congestion or packet loss, these high-delay or high-loss subflows block the data transmission of the other subflows, which makes the overall performance of MPTCP far lower than expected. Existing studies have shown that using a packet scheduler or an encoder can effectively alleviate the negative effects caused by such network heterogeneity. However, in a dynamic and changeable heterogeneous network environment, designing an efficient and adaptive packet scheduler and coding algorithm becomes especially important. Based on the existing MPTCP dynamic coding and splitting-ratio ideas, a dynamic multipath encoding scheduler (DMES) is proposed that uses deep reinforcement learning. Using a Transformer network and a deep reinforcement learning agent, DMES observes the state space of the current MPTCP network environment and searches for the best action, so as to maximize the overall performance of MPTCP. The experimental results show that, compared with the most advanced solutions, DMES adapts better to the dynamic and changeable network environment. Especially in the case of high packet loss and multiple subflows, DMES can reduce the out-of-order queue size (OQS) by up to 24.6%, and increase the goodput by 18.3% while reducing the application delay by about 12.2%.
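The abstract names two ingredients of the scheduler's decision: packets are split across subflows by a ratio, and the agent is scored by a reward that reflects overall MPTCP performance (the evaluation reports OQS, goodput, and application delay). The Python sketch below illustrates those two ideas only. It is not the authors' implementation: the function names, the largest-remainder rounding, the linear form of the reward, and the weight values are all assumptions made for illustration, and the FEC coding and the learning agent itself are omitted.

```python
from typing import List, Sequence


def split_by_ratio(n_packets: int, ratios: Sequence[float]) -> List[int]:
    """Distribute n_packets across subflows in proportion to ratios.

    Hypothetical helper: uses largest-remainder rounding so the
    per-subflow counts always sum to n_packets.
    """
    total = sum(ratios)
    if n_packets < 0 or total <= 0 or any(r < 0 for r in ratios):
        raise ValueError("need n_packets >= 0 and non-negative ratios with a positive sum")
    exact = [n_packets * r / total for r in ratios]
    counts = [int(x) for x in exact]  # floor, since all values are non-negative
    leftover = n_packets - sum(counts)
    # Hand the remaining packets to the subflows with the largest fractional parts.
    by_fraction = sorted(range(len(ratios)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


def reward(goodput_mbps: float, app_delay_ms: float, oqs_packets: float,
           w_goodput: float = 1.0, w_delay: float = 0.01, w_oqs: float = 0.01) -> float:
    """Assumed scalar reward: favour goodput, penalise delay and OQS.

    The weights are placeholders, not values from the paper.
    """
    return w_goodput * goodput_mbps - w_delay * app_delay_ms - w_oqs * oqs_packets


if __name__ == "__main__":
    # Split 10 packets over three subflows with equal ratios.
    print(split_by_ratio(10, [1, 1, 1]))
    # Score one observation of goodput, delay, and receiver OQS.
    print(reward(goodput_mbps=10.0, app_delay_ms=100.0, oqs_packets=50.0))
```

In a reinforcement learning loop, an agent would choose the ratios (and a coding rate) as its action and receive a score of this kind as feedback. The exact state, action, and reward definitions used by DMES are given in the full paper, not in this abstract.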
