Research on classification of natural resources policy text based on deep learning

Hu Rongbo (胡容波)* ** ***; Guo Cheng (郭诚)* ***; Wang Jinhao (王锦浩)* ***; Fang Jinyun (方金云)*

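Article abstract

Hu Rongbo* ** ***, Guo Cheng* ***, Wang Jinhao* ***, Fang Jinyun*. Research on classification of natural resources policy text based on deep learning [J]. 高技术通讯 (Chinese High Technology Letters), 2023, 33(7): 692-703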

Research on classification of natural resources policy text based on deep learning

DOI: 10.3772/j.issn.1002-0470.2023.07.003

Keywords: policy text; text classification; deep learning; natural resources; delayed decision; dataset construction

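Author	Affiliation
Hu Rongbo* ** ***	(*Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (**Information Center of the Ministry of Natural Resources, Beijing 100036) (***University of Chinese Academy of Sciences, Beijing 100190)
Guo Cheng* ***
Wang Jinhao* ***
Fang Jinyun*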

Abstract:

Policy text classification is a composite technology that draws on natural language processing (NLP), machine learning, and policy analysis, with important applications in policy management, policy research, and information services. First, to address the scarcity of public resources in the policy text domain, a semi-automatic method that combines domain knowledge with NLP to build policy text classification datasets is proposed, and a sentence-level natural resources policy text classification dataset is constructed with it. Second, exploiting the characteristics of policy texts themselves, a deep-learning-based policy text classification method that adaptively enhances the input with title information is proposed and applied as an extension to existing mainstream deep learning models. Finally, experiments on the natural resources policy text classification dataset show that, after applying this method, the accuracy of five commonly used deep learning classification models improves by more than 3% and their macro-average F1 score improves by more than 5%.
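The two metrics reported in the abstract, accuracy and macro-average F1, can behave quite differently on imbalanced classes, which may be why both are given. The following is a minimal sketch of how the two are commonly computed, not the authors' evaluation code; the label names (`land`, `mineral`, `marine`) and the sample predictions are hypothetical and do not come from the paper's dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
y_true = ["land", "land", "land", "land", "mineral", "marine"]
y_pred = ["land", "land", "land", "land", "land", "marine"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, macro-F1={f1:.3f}")
```

In this toy case a single error on a rare class leaves accuracy high while pulling macro-average F1 down sharply, because the missed class contributes an F1 of 0 to the unweighted mean.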
