Conceptual reinforcement learning for language-assisted tasks

Peng Shaohui* ** ***; Hu Xing*; Zhi Tian*

Abstract

Peng Shaohui* ** ***, Hu Xing*, Zhi Tian*. Conceptual reinforcement learning for language-assisted tasks[J]. Chinese High Technology Letters, 2024, 34(6): 555-566

Conceptual reinforcement learning for language-assisted tasks

DOI: 10.3772/j.issn.1002-0470.2024.06.001

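Chinese keywords: deep reinforcement learning (DRL); language-assisted reinforcement learning tasks; text games; representation learning; mutual information optimization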

English keywords: deep reinforcement learning (DRL), language-assisted reinforcement learning task, text game, representation learning, mutual information

Funding:

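Author	Affiliation
Peng Shaohui* *	(State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100049) (**Cambricon Technologies Co., Ltd., Beijing 100080)
Hu Xing*
Zhi Tian*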

Chinese abstract:

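Language-assisted reinforcement learning tasks can promote the generalization of reinforcement learning policies; the key problem is to automatically learn a general representation of observations and language descriptions. Existing methods tend to learn the joint representation implicitly, which inevitably introduces spurious correlations from the training set and in turn harms the policy's generalization and training efficiency. To address this problem, this paper proposes the conceptual reinforcement learning framework (CRL). CRL draws on conceptualization, the cognitive process of extracting similarities from entities to form abstract representations, and uses an attention-based concept encoder together with restrictive loss functions to explicitly learn compact, abstract conceptual representations that serve as the input to the reinforcement learning policy. CRL is validated on commonly used language-conditioned tasks and text-game tasks. The results show that conceptual representations substantially improve the policy's training efficiency (by up to 70%) and generalization performance (by up to 30%), and effectively improve the policy's interpretability.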

English abstract:

Language-assisted tasks are proposed to facilitate the generalization ability of reinforcement learning policies. The key question is how to learn a general representation across different scenarios. Existing studies often learn the joint representation implicitly, which may include spurious correlations and consequently compromise the policy's generalization performance and training efficiency. To address this issue, a conceptual reinforcement learning framework (CRL) is proposed. It is motivated by the way human cognition extracts similarities from numerous instances to form conceptual abstractions, and it incorporates a multi-level attention encoder and restrictive loss functions to learn a compact and invariant conceptual representation for the policy. Evaluated on challenging language-assisted tasks, CRL significantly improves the policy's training efficiency (by up to 70%) and generalization ability (by up to 30%). The conceptual representation also shows better interpretability than other representations.
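The abstract names an attention-based concept encoder but gives no architectural details. Purely as an illustration of the general idea, the sketch below pools several entity embeddings into a single vector with standard scaled dot-product attention, so that entities similar to a language-derived query dominate the result. The function names, the toy 2-dimensional embeddings, and the use of a single attention level are all assumptions made for this example; this is not the CRL encoder or its loss functions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def concept_attention(query, entity_embeddings):
    """Pool entity embeddings into one vector (hypothetical helper).

    `query` stands in for an embedding of the language description and
    `entity_embeddings` for embeddings of observed entities; both are
    assumptions of this sketch, not quantities defined in the paper.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, e) / scale for e in entity_embeddings])
    dim = len(entity_embeddings[0])
    pooled = [
        sum(w * e[i] for w, e in zip(weights, entity_embeddings))
        for i in range(dim)
    ]
    return pooled, weights


# Toy data: entities 0 and 2 share an embedding aligned with the query.
query = [1.0, 0.0]
entities = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
pooled, weights = concept_attention(query, entities)
```

In this toy case the two entities aligned with the query receive equal, larger weights than the unrelated one, so the pooled vector is dominated by their shared direction. Inspecting such weights is one simple way an attention-based representation can be examined, though how CRL actually achieves its reported interpretability is not described in the abstract.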
