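Article Abstract
Li Xiangdong, Shi Jian, Sun Qianru, He Chaocheng. Automatic Classification Research of Similar Categories Based on BERT-MLDFA: Take E271 and E712.51 in CLC as an Example [J]. Digital Library Forum, 2022, (2): 18-25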
Automatic Classification Research of Similar Categories Based on BERT-MLDFA: Take E271 and E712.51 in CLC as an Example
Submitted: 2022-02-03
DOI:10.3772/j.issn.1673-2286.2022.02.003
Keywords: Chinese Library Classification; Deep Learning; BERT; Automatic Classification
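Funding: This research was supported by the Wuhan University Youth Research Center research project "Modeling and Simulation of the 'Involution' Mechanism among University Students" (No. 20210407).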
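Authors and affiliations
Li Xiangdong: School of Information Management, Wuhan University; Center for E-Commerce Research and Development, Wuhan University
Shi Jian: School of Information Management, Wuhan University
Sun Qianru: School of Information Management, Wuhan University
He Chaocheng: School of Information Management, Wuhan University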
Abstract:
      This paper explores how deep learning can improve the classification of content-similar categories in the Chinese Library Classification (CLC), which are highly correlated and hard to tell apart. It proposes the BERT-MLDFA model, which dynamically fuses the parameters of different BERT layers through a multi-level attention mechanism and is further pretrained on the task dataset. Automatic classification experiments then use E271 and E712.51 in the CLC as typical content-similar categories. The results show that the proposed method reaches a Macro_F1 of 0.987, which is 2.4% higher than that of classical machine learning methods. The method captures subtle semantic differences between texts of similar categories, applies well to the CLC and to other content-similar category classification tasks, and generalizes well.
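The abstract does not spell out how the layers are fused, so the following is only a minimal sketch of one common reading of attention-based layer fusion, not the authors' implementation. It assumes each encoder layer contributes one vector (for example a sentence-level representation), scores each vector against a query vector, and returns the softmax-weighted sum. The names `fuse_layers`, `layer_vectors`, and `query` are hypothetical, and in a real model the query would be a learned parameter rather than a fixed list.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_vectors, query):
    """Attention-weighted sum of per-layer vectors.

    layer_vectors: list of L vectors of equal length d, one per encoder
                   layer (hypothetical input, not taken from the paper).
    query:         length-d vector standing in for a learnable parameter.
    Returns (fused_vector, attention_weights).
    """
    # Dot-product score between the query and each layer's vector.
    scores = [sum(q * h for q, h in zip(query, vec)) for vec in layer_vectors]
    weights = softmax(scores)
    dim = len(layer_vectors[0])
    # Weighted sum across layers, computed dimension by dimension.
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, layer_vectors))
        for i in range(dim)
    ]
    return fused, weights


# Toy example: three "layers" with two-dimensional vectors.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused_uniform, weights_uniform = fuse_layers(layers, [0.0, 0.0])
fused_biased, weights_biased = fuse_layers(layers, [10.0, 0.0])
```

With an all-zero query every layer receives the same weight, so the fused vector is the plain average of the layers. A query aligned with the first dimension shifts nearly all the weight onto the layers that are large in that dimension, which is the sense in which the fusion is "dynamic".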