龙艺璇, 安源, 王东晋, 翟夏普, 伊惠芳. Research on Railway Field Topic Discovery Based on Improved LDA Model [J]. 数字图书馆论坛 (Digital Library Forum), 2022, (2): 26-32 |
Research on Railway Field Topic Discovery Based on Improved LDA Model |
Submitted: 2022-01-20 |
DOI:10.3772/j.issn.1673-2286.2022.02.004 |
Keywords: Topic Discovery; Railway Field; Semantic Enhancement; LDA Topic Model |
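Funding: This research was supported by the China Academy of Railway Sciences Corporation Limited research and development project "Research on a Railway Scientific Research Knowledge Graph and Intelligent Knowledge Service System" (2020YJ147). |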
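Author | Affiliation |
龙艺璇 | Institute of Scientific and Technical Information, China Academy of Railway Sciences |
安源 | Institute of Scientific and Technical Information, China Academy of Railway Sciences |
王东晋 | Institute of Scientific and Technical Information, China Academy of Railway Sciences |
翟夏普 | Institute of Scientific and Technical Information, China Academy of Railway Sciences |
伊惠芳 | National Science Library, Chinese Academy of Sciences |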
Abstract views: 1305 |
Full-text downloads: 1113 |
Abstract: |
In the era of big data, researchers in the railway field struggle to quickly identify the main research directions and to follow international research trends and hotspots. Efficiently mining the main content of the massive body of railway scientific and technical literature has therefore become an urgent problem. LDA is the mainstream method for topic discovery, but its usefulness is limited for railway research literature, which is dominated by multi-word phrases. This paper proposes a semantically enhanced improvement to the LDA model with two parts. First, before the topic model is built, preprocessing extracts noun phrases, verb phrases, nouns, and verbs from the corpus. Second, after the topic model is built, the TextRank and PMI algorithms are combined to obtain keyword chunks, and the ranked keyword chunks replace the topic words in the LDA topic recognition results, further enriching the semantics of each topic. An empirical study is then conducted on the railway "traction power supply system". The results show that the improved LDA model helps improve the interpretability and recognizability of topic discovery results in the railway field, and that it can provide effective methodological support for knowledge services in railway scientific research management. |
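The post-processing step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how TextRank and PMI are fused, so everything here is an assumption for illustration: adjacent word pairs whose PMI clears a threshold become candidate chunks, chunks are ranked by the summed TextRank scores of their words, and each topic word is replaced by the best-ranked chunk containing it. The function names (`bigram_pmi`, `textrank`, `keyword_chunks`, `enrich_topic`), the threshold, and the toy corpus are hypothetical, and the LDA step itself is omitted (topic words are assumed to be given).

```python
import math
from collections import Counter, defaultdict


def bigram_pmi(docs):
    """PMI of adjacent word pairs: log2(p(x, y) / (p(x) * p(y))).

    Unigram and bigram probabilities are normalized by their own totals.
    """
    unigrams = Counter(w for doc in docs for w in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (x, y): math.log2((c / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))
        for (x, y), c in bigrams.items()
    }


def textrank(docs, window=2, damping=0.85, iters=50):
    """Word scores from PageRank-style iteration on an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for doc in docs:
        for i, w in enumerate(doc):
            for v in doc[i + 1 : i + window]:
                if v != w:
                    neighbors[w].add(v)
                    neighbors[v].add(w)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return scores


def keyword_chunks(docs, min_pmi=1.0):
    """Assumed fusion rule: PMI filters candidate pairs, TextRank ranks them."""
    pmi = bigram_pmi(docs)
    rank = textrank(docs)
    chunks = {
        f"{x} {y}": rank.get(x, 0.0) + rank.get(y, 0.0)
        for (x, y), score in pmi.items()
        if score >= min_pmi
    }
    return sorted(chunks, key=chunks.get, reverse=True)


def enrich_topic(topic_words, chunks):
    """Replace each topic word with the best-ranked chunk containing it, if any."""
    enriched = []
    for word in topic_words:
        match = next((c for c in chunks if word in c.split()), word)
        if match not in enriched:
            enriched.append(match)
    return enriched


# Toy corpus (hypothetical), already reduced to nouns and verbs.
docs = [
    ["traction", "power", "supply", "system", "fault"],
    ["traction", "power", "supply", "design"],
    ["fault", "detection", "system"],
]
print(enrich_topic(["power", "fault"], keyword_chunks(docs)))
```

Ranking chunks by summed word scores favors phrases built from central terms; a real pipeline might instead weight PMI and TextRank jointly or allow chunks longer than two words.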