Research on hole-finding strategy of charging gun based on SAC deep reinforcement learning algorithm

Xu Jianming; Chen Fu; Dong Jianwei

Article Abstract

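Xu Jianming, Chen Fu, Dong Jianwei. Research on hole-finding strategy of charging gun based on SAC deep reinforcement learning algorithm[J]. Chinese High Technology Letters, 2023, 33(1): 63-71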

DOI: 10.3772/j.issn.1002-0470.2023.01.006

Keywords: robot hole searching; deep reinforcement learning; soft actor-critic (SAC) algorithm; neural network; force control

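Author	Affiliation
Xu Jianming	College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023
Chen Fu	College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023
Dong Jianwei	College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023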

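Chinese abstract (translated):

To address the hole-finding operation in automated robotic charging tasks, this work studies a robot hole-finding strategy based on the soft actor-critic (SAC) deep reinforcement learning algorithm. A policy controller is designed on the actor-critic framework; it takes the gun-head pose and contact-force information as input and outputs actions in the XY plane of the gun-head end coordinate frame. The controller contains five neural networks: one actor network, two target critic networks, and two critic networks. The actor network outputs the hole-finding action; the target critic networks output the value estimate of the hole-finding action in the next hole-finding state; and the critic networks output the value estimate of the hole-finding action in the current state. Following the double-Q trick, the smaller of the two target critic outputs is used to update the critic networks, and the smaller of the two critic outputs is used to update the actor network, which together train the policy controller. A force/position hybrid control structure converts the XY-plane displacement action output by the actor network into a desired translational velocity; this is combined with the desired velocity output by a Z-axis force-tracking admittance controller to form the robot's desired velocity, which guides the charging gun during hole finding. Simulations and experiments verify the effectiveness of the proposed method.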
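The double-Q update described above can be sketched numerically. This is a minimal sketch that assumes the standard SAC formulation: an entropy temperature `alpha`, a discount `gamma`, and Polyak-averaged target critics, none of which the abstract specifies. The function names are hypothetical, and plain NumPy arrays stand in for the outputs of the five networks, so the block shows only the loss arithmetic, not network training.

```python
import numpy as np


def critic_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Soft Bellman target shared by both critics.

    q1_next, q2_next: target-critic values Q'_1(s', a') and Q'_2(s', a'), with a' ~ pi(.|s')
    logp_next: log pi(a'|s') evaluated by the actor at the next state
    """
    min_q_next = np.minimum(q1_next, q2_next)         # double-Q trick: smaller target-critic value
    soft_value = min_q_next - alpha * logp_next        # entropy bonus of standard SAC (assumed)
    return reward + gamma * (1.0 - done) * soft_value  # no bootstrapping past terminal steps


def critic_loss(q_pred, target):
    # Mean squared TD error for one critic; the target is treated as a constant.
    return np.mean((q_pred - target) ** 2)


def actor_loss(q1_new, q2_new, logp_new, alpha=0.2):
    # Actor minimises alpha * log pi(a|s) - min(Q1, Q2)(s, a) for fresh actions a ~ pi(.|s).
    return np.mean(alpha * logp_new - np.minimum(q1_new, q2_new))


def soft_update(target_params, online_params, tau=0.005):
    # Assumed Polyak averaging of target-critic weights: theta' <- tau * theta + (1 - tau) * theta'.
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


# Toy batch of two transitions (illustrative values, not from the paper).
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
y = critic_target(reward, done, np.array([2.0, 5.0]), np.array([3.0, 4.0]),
                  np.array([-1.0, -1.0]), gamma=0.9, alpha=0.1)
print(y)
```

Both critics regress toward the same target `y`, which is built from the smaller target-critic value, while the actor is scored against the smaller of the two current critics; taking the minimum in both places is what counters the overestimation bias a single critic tends to develop.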

English abstract:

Aiming at the hole-finding operation in robotic automatic charging tasks, this paper studies a robot hole-finding strategy based on the soft actor-critic (SAC) deep reinforcement learning algorithm. Built on the actor-critic framework, the strategy takes the pose and contact-force information of the gun head as input and outputs motion in the XY plane of the end gun-head coordinate frame. The strategy controller has five neural networks: an actor network, two target critic networks, and two critic networks. The actor network outputs the searching action, the target critic networks output the value evaluation of the searching action in the next state, and the critic networks output the value evaluation of the searching action in the current state. Based on the double-Q trick, the smaller of the output values of the two target critic networks and the smaller of the output values of the two critic networks are used to update the critic networks and the actor network, respectively, thereby training the strategy controller. Using a force/position hybrid control structure, the XY-plane displacement output by the actor network is converted into a desired translational velocity, which is combined with the desired velocity output by a Z-axis force-tracking admittance controller to guide the charging gun to find the hole. The effectiveness of the proposed method is verified by simulation and experiment.
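The force/position hybrid structure in the abstract can be sketched as follows. This is a minimal sketch under assumptions the abstract does not state: the actor's XY output is read as a displacement per control period `dt`; the Z-axis admittance law is taken as a mass-damper M·dv/dt + B·v = F_d − F_z with no stiffness term; +Z of the gun-head frame is assumed to point toward the socket, with F_z > 0 when pressing. All gains, limits, and names (`ZAdmittance`, `xy_action_to_velocity`, `desired_velocity`) are hypothetical.

```python
import numpy as np


class ZAdmittance:
    """Hypothetical Z-axis force-tracking admittance law: M * dv/dt + B * v = F_d - F_z.

    Assumes +Z of the gun-head frame points toward the socket and F_z > 0 when pressing.
    """

    def __init__(self, mass=4.0, damping=200.0, f_desired=10.0, dt=0.01):
        self.m, self.b, self.f_d, self.dt = mass, damping, f_desired, dt
        self.v = 0.0  # current desired Z velocity (m/s)

    def step(self, f_z):
        # Force error accelerates the virtual mass; damping limits speed (explicit Euler).
        acc = (self.f_d - f_z - self.b * self.v) / self.m
        self.v += acc * self.dt
        return self.v


def xy_action_to_velocity(action_xy, dt=0.01, v_max=0.02):
    # Treat the actor's XY output as a displacement per control period and spread it over dt.
    v = np.asarray(action_xy, dtype=float) / dt
    return np.clip(v, -v_max, v_max)  # assumed speed limit for safety


def desired_velocity(action_xy, f_z, admittance, R_base_tool=None, dt=0.01):
    """Combine the XY policy velocity with the Z admittance velocity (gun-head frame)."""
    v_tool = np.zeros(3)
    v_tool[:2] = xy_action_to_velocity(action_xy, dt)
    v_tool[2] = admittance.step(f_z)
    if R_base_tool is None:
        return v_tool            # keep the command in the gun-head frame
    return R_base_tool @ v_tool  # rotate into the robot base frame


adm = ZAdmittance()
v_cmd = desired_velocity([1e-4, -1e-4], f_z=0.0, admittance=adm)
print(v_cmd)
```

With no contact force, the Z command settles at v_z = F_d / B (0.05 m/s with these gains). Once the gun head presses on the socket face and F_z reaches F_d, v_z decays to zero, so the Z axis regulates contact force while the XY axes follow the policy. The gains are chosen so that dt·B/M = 0.5, which keeps the explicit-Euler update stable.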
