WANG Xin. A Country Topic Indexing Method Research Based on Neural Network [J]. Digital Library Forum (数字图书馆论坛), 2019, (7): 39-47 |
基于神经网络的文献主题国别标引方法研究 |
A Country Topic Indexing Method Research Based on Neural Network |
Submitted: 2019-05-21 |
DOI:10.3772/j.issn.1673-2286.2019.07.006 |
Keywords: Knowledge Organization; Subject Indexing; Deep Learning; Deep Convolutional Neural Networks |
|
Abstract: |
To address country topic indexing for massive document collections, and to explore how deep learning can be applied to knowledge organization in the era of "Internet + Big Data", this paper proposes a country topic indexing method based on a deep convolutional neural network. After examining the feasibility of recasting country topic indexing as a multi-label classification task, the method first uses natural language processing to vectorize the full text of each document, then uses pre-trained word embeddings to transform the document vector into a tensor rich in semantic relationships between words. A deep convolutional neural network then learns and automatically extracts text features layer by layer, from words to sentences, paragraphs, and the whole document, producing a tensor rich in full-text semantics. Finally, a fully connected layer learns the classification weights and outputs a probability for each country, which yields automatic country topic indexing. Experimental results show that the method achieves the expected effect, with highly accurate classification performance and good generalization ability, and that it provides a valuable reference for applying deep learning algorithms in the field of knowledge organization. |
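The four steps named in the abstract (vectorize, embed, stacked convolution, fully connected output) can be sketched as a single forward pass. This is a minimal illustration and not the author's implementation: the label set, layer sizes, whitespace tokenizer, fixed input length, and 0.5 decision threshold are all assumptions, and random weights stand in for both the pre-trained word embeddings and the trained network parameters. An independent sigmoid per country is used because that is a common choice for multi-label classification; the abstract does not state which output activation the paper uses.

```python
import math
import random

# Every name, size, and label below is an illustrative assumption.
COUNTRIES = ["China", "Japan", "Germany"]   # hypothetical label set
EMBED_DIM, FILTERS, KERNEL, MAX_LEN = 8, 6, 3, 16
rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random weights standing in for pre-trained or trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def vectorize(text, vocab):
    """Step 1: map tokens to integer ids, then pad/truncate to MAX_LEN (0 = unknown or padding)."""
    ids = [vocab.get(tok, 0) for tok in text.lower().split()][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))

def embed(ids, table):
    """Step 2: look up one embedding row per token id."""
    return [table[i] for i in ids]

def conv1d_relu(seq, kernels):
    """Step 3a: 1D convolution over the token axis, followed by ReLU."""
    out = []
    for t in range(len(seq) - KERNEL + 1):
        window = [x for vec in seq[t:t + KERNEL] for x in vec]
        out.append([max(0.0, sum(w * x for w, x in zip(k, window))) for k in kernels])
    return out

def max_pool(seq, size=2):
    """Step 3b: max pooling, so the next layer covers a wider span of text."""
    return [[max(col) for col in zip(*seq[i:i + size])]
            for i in range(0, len(seq) - size + 1, size)]

def global_max(seq):
    """Collapse the sequence axis into one document-level feature vector."""
    return [max(col) for col in zip(*seq)]

def dense_sigmoid(features, weights, biases):
    """Step 4: fully connected layer with an independent sigmoid per country."""
    return [1.0 / (1.0 + math.exp(-(sum(w * f for w, f in zip(row, features)) + b)))
            for row, b in zip(weights, biases)]

def predict(text, vocab, params, threshold=0.5):
    """Run the full pipeline and return (per-country probabilities, assigned labels)."""
    seq = embed(vectorize(text, vocab), params["embedding"])
    seq = max_pool(conv1d_relu(seq, params["conv1"]))
    seq = conv1d_relu(seq, params["conv2"])
    probs = dense_sigmoid(global_max(seq), params["dense_w"], params["dense_b"])
    labels = [c for c, p in zip(COUNTRIES, probs) if p >= threshold]
    return probs, labels

# Toy vocabulary and untrained parameters, for demonstration only.
vocab = {w: i + 1 for i, w in enumerate("trade talks between china and japan".split())}
params = {
    "embedding": rand_matrix(len(vocab) + 1, EMBED_DIM),
    "conv1": rand_matrix(FILTERS, KERNEL * EMBED_DIM),
    "conv2": rand_matrix(FILTERS, KERNEL * FILTERS),
    "dense_w": rand_matrix(len(COUNTRIES), FILTERS),
    "dense_b": [0.0] * len(COUNTRIES),
}

probs, labels = predict("Trade talks between China and Japan", vocab, params)
print(dict(zip(COUNTRIES, probs)), labels)
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how a document can receive zero, one, or several country labels once each output is thresholded independently.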
|
|
|