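Article Abstract
Fu Zhu, Qiu Changchang, Liu Peng. Problem Entity Recognition of Chinese Academic Paper Combining Semi-Supervised Learning and Rule Correction[J]. Digital Library Forum, 2024, 20(12): 56-65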
Problem Entity Recognition of Chinese Academic Paper Combining Semi-Supervised Learning and Rule Correction
Submitted: 2024-07-10
DOI: 10.3772/j.issn.16732286.2024.12.007
Keywords: Problem Entity Recognition; Conditional Random Field; Semi-Supervised Learning; Rule Correction; Small-Scale Data
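Funding: This research was supported by the National Social Science Fund of China project “Research on a Scenario-Based Intelligent Knowledge Service Framework for AI4S” (No. 24CTQ029).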
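Author  Affiliation
Fu Zhu  School of Economics and Management, Jiangsu University of Science and Technology
Qiu Changchang  School of Economics and Management, Jiangsu University of Science and Technology
Liu Peng  School of Economics and Management, Jiangsu University of Science and Technology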
Abstract:
      To quickly locate and identify the research problems addressed in academic papers, this study proposes a problem entity recognition method for Chinese academic papers that combines semi-supervised learning with rule correction. First, using a conditional random field (CRF) model as the base framework, it constructs supervised features such as part of speech and indicator words, and unsupervised features such as similarity and importance. It then compares recognition performance under different feature combinations and corrects the recognition results with domain linguistic rules. Finally, an empirical study is carried out on the “sharing economy” and “ship construction” topic domains. The proposed method outperforms mainstream deep learning models and pre-trained models such as large language models, reaching F1 scores of 85.82% and 86.38% on the two domain corpora, respectively, and its advantage widens further on the 1/2 and 1/4 datasets. These results indicate that the method recognizes problem entities in Chinese academic papers well on small-scale labeled datasets across domains, showing good effectiveness and robustness.
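The pipeline in the abstract (per-token features for a CRF, then rule-based correction of the predicted labels) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the indicator word list, the part-of-speech tags, the importance threshold, the `PROB` label, and both correction rules are hypothetical, since the abstract does not specify them. The similarity feature is omitted for brevity. The feature dictionaries follow the dict-per-token shape that common CRF toolkits accept.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); the tag set is assumed

# Hypothetical indicator words that often signal a research problem.
INDICATOR_WORDS = {"问题", "难题", "挑战"}


def token_features(sent: List[Token], i: int,
                   importance: Dict[str, float]) -> Dict[str, object]:
    """Build one feature dict for token i, mixing both feature families."""
    word, pos = sent[i]
    return {
        "word": word,
        "pos": pos,                                # supervised: part of speech
        "is_indicator": word in INDICATOR_WORDS,   # supervised: indicator word
        # Unsupervised: importance score (e.g. from a term-weighting step),
        # bucketed with an assumed threshold of 0.5.
        "importance_high": importance.get(word, 0.0) >= 0.5,
        # Context: does an indicator word appear within the next two tokens?
        "indicator_ahead": any(w in INDICATOR_WORDS
                               for w, _ in sent[i + 1:i + 3]),
    }


def correct_with_rules(sent: List[Token], tags: List[str]) -> List[str]:
    """Apply two toy correction rules to BIO labels predicted by a model."""
    fixed = list(tags)
    # Rule 1: an entity cannot start with I-; promote an orphan I- to B-.
    for i, tag in enumerate(fixed):
        if tag == "I-PROB" and (i == 0 or fixed[i - 1] == "O"):
            fixed[i] = "B-PROB"
    # Rule 2: trim particles ("u") and punctuation ("w") from entity ends.
    # Scanning right to left lets consecutive trailing tokens be trimmed.
    for i in range(len(fixed) - 1, -1, -1):
        is_end = fixed[i] != "O" and (i == len(fixed) - 1 or fixed[i + 1] == "O")
        if is_end and sent[i][1] in {"u", "w"}:
            fixed[i] = "O"
    return fixed
```

In a full system, the dictionaries from `token_features` would be passed to a CRF trainer, and `correct_with_rules` would run on the model's predicted label sequence for each sentence.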