Improved TextRank Keyword Extraction Method Based on Multivariate Feature Weighting

Yu Bengong; Zhang Hongmei; Cao Yumeng

Article Abstract

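Yu Bengong, Zhang Hongmei, Cao Yumeng. Improved TextRank Keyword Extraction Method Based on Multivariate Feature Weighting[J]. Digital Library Forum, 2020, (3): 41-50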

Improved TextRank Keyword Extraction Method Based on Multivariate Features Weighted

Submission date: 2020-02-28

DOI: 10.3772/j.issn.1673-2286.2020.03.006

Keywords: Keyword Extraction; TextRank; Word2vec; Multivariate Feature Weighting

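Funding: This research was supported by the National Natural Science Foundation of China project "Research on Product R&D Knowledge Integration and Service Mechanisms Based on Manufacturing Big Data" (Grant No. 71671057).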

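Author	Affiliation
Yu Bengong	School of Management, Hefei University of Technology
Zhang Hongmei	School of Management, Hefei University of Technology
Cao Yumeng	School of Management, Hefei University of Technology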

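Abstract:

Existing keyword extraction methods consider word features from either the document set or the single document, and seldom examine how the combined features of a word across both the single document and the document set affect keyword extraction. This paper therefore proposes a keyword extraction method based on multivariate feature weighting. The method uses a Word2vec model to extract the semantic-relationship features of words in the document set and the importance features of words within a single document, then combines them by linear weighting into a comprehensive influence score for each word. This score is used to improve the probability transition matrix of the TextRank model; after iterative computation, the top-ranked words are selected as the keywords of the document. Experimental results show that jointly considering word influence from both the single document and the document set effectively improves keyword extraction.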
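The abstract does not give the exact feature definitions, weights, or transition formula, so the Python below is only a minimal sketch of the general idea under stated assumptions. Candidate words are linked by co-occurrence in a sliding window. Each word's influence is a linear mix (hypothetical weight `alpha`) of a semantic feature, assumed here to be the mean rescaled cosine similarity of its embedding to the other candidate words, and an importance feature, assumed here to be normalised term frequency. The transition probability from a word to a neighbour is made proportional to the neighbour's influence. The `vectors` dict is assumed to hold embeddings, for example exported from a Word2vec model trained on the document set. All function names and toy data are hypothetical, and the iteration uses the normalised `(1 - d) / n` PageRank form.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cooccurrence_graph(tokens, window=3):
    """Undirected word graph: link two distinct words that appear together
    inside a sliding window of `window` consecutive tokens."""
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    return neighbors


def influence(tokens, vectors, alpha=0.5):
    """Hypothetical combined influence: alpha * semantic + (1 - alpha) * importance."""
    vocab = sorted(set(tokens))
    tf = Counter(tokens)
    max_tf = max(tf.values())
    infl = {}
    for w in vocab:
        # Semantic feature (assumed): mean similarity to the other candidate words,
        # with cosine rescaled from [-1, 1] to [0, 1] so all weights stay non-negative.
        others = [o for o in vocab if o != w and o in vectors] if w in vectors else []
        sem = (sum((1 + cosine(vectors[w], vectors[o])) / 2 for o in others) / len(others)
               if others else 0.0)
        # Importance feature (assumed): term frequency normalised by the maximum frequency.
        imp = tf[w] / max_tf
        infl[w] = alpha * sem + (1 - alpha) * imp
    return infl


def weighted_textrank(tokens, vectors, alpha=0.5, d=0.85, window=3, iters=100, tol=1e-6):
    """TextRank whose transition probabilities are biased by word influence."""
    if not tokens:
        return {}
    nodes = sorted(set(tokens))
    nbrs = cooccurrence_graph(tokens, window)
    infl = influence(tokens, vectors, alpha)
    n = len(nodes)
    # Transition matrix: from word u, move to neighbour v with probability
    # proportional to v's influence (instead of uniformly, as in plain TextRank).
    trans = {}
    for u in nodes:
        out = nbrs.get(u, set())
        total = sum(infl[v] for v in out)
        trans[u] = {v: infl[v] / total for v in out} if total > 0 else {}
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - d) / n for u in nodes}
        for u in nodes:
            for v, p in trans[u].items():
                new[v] += d * p * score[u]
        delta = sum(abs(new[u] - score[u]) for u in nodes)
        score = new
        if delta < tol:  # stop once scores have converged
            break
    return score


def top_keywords(tokens, vectors, k=3, **kwargs):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    score = weighted_textrank(tokens, vectors, **kwargs)
    ranked = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy, already-segmented document with stop words removed (hypothetical data).
    doc = ["keyword", "extraction", "textrank", "graph",
           "keyword", "textrank", "word2vec", "keyword"]
    # Toy 2-d "embeddings" standing in for vectors from a Word2vec model trained on the corpus.
    vecs = {"keyword": [1.0, 0.0], "textrank": [0.9, 0.1], "graph": [0.0, 1.0],
            "extraction": [0.7, 0.3], "word2vec": [0.5, 0.5]}
    print(top_keywords(doc, vecs, k=3))
```

Setting `alpha=0` reduces the influence to the in-document feature alone, and giving every word the same influence recovers the uniform transitions of unweighted TextRank. Biasing each transition toward the destination word's influence is one plausible reading of "improving the probability transition matrix"; the paper itself may weight edges differently.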
