Formation control without collision in uncertain environment based on deep reinforcement learning

Yu Xinyi (禹鑫燚); Du Danfeng (杜丹枫); Ou Linlin (欧林林)

Article Abstract

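Yu Xinyi, Du Danfeng, Ou Linlin. Formation control without collision in uncertain environment based on deep reinforcement learning [J]. Chinese High Technology Letters (高技术通讯, Chinese edition), 2022, 32(8): 836-844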

Formation control without collision in uncertain environment based on deep reinforcement learning

DOI: 10.3772/j.issn.1002-0470.2022.08.006

Keywords: deep reinforcement learning, collision avoidance, formation control, multi-agent, neural network

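Author	Affiliation
Yu Xinyi	College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023
Du Danfeng	College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023
Ou Linlin	College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023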

Abstract:

Multi-agent formation control with obstacle avoidance aims to keep the agents in formation while they avoid obstacles. To handle the randomness and uncertainty of complex environments, this paper proposes a deep reinforcement learning method for formation and obstacle-avoidance control in uncertain environments. First, a value evaluation network is designed to increase the experience gathered for special actions during formation, such as touching an obstacle or reaching the desired position, so that the agents learn the rules of the environment faster. Second, the action selection strategy, which is based on the greedy strategy, is modified to improve the agents' learning efficiency. Third, a sample storage space is designed to raise both sample utilization and model training efficiency, and a multi-step learning algorithm is incorporated in the decision-making stage to make the value estimates more accurate. Finally, the proposed method is compared experimentally with other algorithms. The simulation results show that the method enables multiple agents to avoid obstacles while maintaining formation, and that it effectively improves the agents' learning efficiency.
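The abstract names two standard building blocks, greedy-based action selection and multi-step learning, without giving their details. The sketch below illustrates only the textbook versions of both: plain epsilon-greedy selection and the n-step bootstrapped return. It is not the authors' modified strategy or network design. The function names `epsilon_greedy` and `n_step_target`, and all parameter names, are assumptions introduced here for illustration.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Textbook epsilon-greedy (hypothetical helper, not the paper's improved variant).

    With probability epsilon pick a uniformly random action index,
    otherwise pick the index with the highest estimated value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def n_step_target(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Generic n-step return (hypothetical helper).

    Computes r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value,
    where bootstrap_value is the value estimate of the state reached after n steps.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


if __name__ == "__main__":
    rng = random.Random(0)
    # epsilon = 0 disables exploration, so the greedy action (index 1) is chosen.
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng))
    # Three rewards of 1.0, a bootstrap value of 10.0, and a discount of 0.5.
    print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

Using several real rewards before bootstrapping reduces the target's reliance on a possibly inaccurate value estimate, which is the usual motivation for multi-step learning.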
