A Method for Normalizing User Search Terms Based on Maximum Matching and the Vector Space Model

He Wei (何伟); Chang Chun (常春)

Article Abstract

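He Wei, Chang Chun. A Method for Normalizing User Search Terms Based on Maximum Matching and the Vector Space Model [J]. Digital Library Forum (数字图书馆论坛), 2016, (7): 34-39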

An Approach for Normalizing Retrieval Word Based on Maximum Matching and Vector Space Model

DOI：

Keywords (Chinese): 最大匹配; 向量空间模型; 规范化; 叙词表

Keywords (English): Maximum Matching; VSM; Normalization; Thesaurus

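Funding: This research was supported by the China Postdoctoral Science Foundation project "Research on an Intelligent Retrieval Model Based on Thesaurus Semantic Relations" (No. 2014M550791).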

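Author	Affiliation
He Wei	Institute of Scientific and Technical Information of China
Chang Chun	Institute of Scientific and Technical Information of China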

Abstract views: 3345

Full-text downloads: 2192

Abstract (translated from the Chinese):

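User search terms expressed as free-text words may cause a retrieval system to return too many or too few results. Prior studies indicate that using terms from a thesaurus as search terms can improve the precision and recall of web retrieval systems. On this basis, this paper proposes a method for normalizing user search terms based on maximum matching and the vector space model, which normalizes terms in both form and meaning. First, maximum matching is used to normalize the user search term by word form; then term vectors are constructed for the user search term and for the form-normalized terms, their semantic similarity is computed, and normalization by meaning yields the final normalized term. Experimental results show that the proposed method performs well: most of the results returned for the user search term can also be retrieved with the normalized term, and when the search term is a single word, precision exceeds 90%.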
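The word-form step can be illustrated with a minimal sketch of forward maximum matching. The abstract does not say which variant (forward, backward, or bidirectional) the authors use, so the forward direction, the function name `forward_maximum_match`, and the toy thesaurus below are all assumptions made for illustration.

```python
def forward_maximum_match(query, thesaurus, max_len=None):
    """Greedy longest-match scan of `query` against a set of thesaurus terms.

    Hypothetical helper: the abstract only names "maximum matching", so the
    forward direction and the skip-one-character fallback are assumptions.
    """
    if max_len is None:
        # Longest thesaurus term bounds the matching window.
        max_len = max((len(term) for term in thesaurus), default=1)
    matches = []
    i = 0
    while i < len(query):
        # Try the longest window first and shrink until a thesaurus term fits.
        for size in range(min(max_len, len(query) - i), 0, -1):
            piece = query[i:i + size]
            if piece in thesaurus:
                matches.append(piece)
                i += size
                break
        else:
            # No thesaurus term starts at this position; move on one character.
            i += 1
    return matches


# Toy thesaurus (illustrative only): "vector space model", "vector",
# "space", "model", "maximum matching".
thesaurus = {"向量空间模型", "向量", "空间", "模型", "最大匹配"}

# Query: "retrieval based on the vector space model".
candidates = forward_maximum_match("基于向量空间模型的检索", thesaurus)
```

Because the longest window is tried first, the scan prefers the full term 向量空间模型 over its shorter components 向量, 空间, and 模型, which is the behaviour that makes the matching "maximum".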

English abstract:

Using free terms as retrieval words may return too many or too few results. Existing research shows that using normalized terms from controlled vocabularies can improve the recall and precision of a retrieval system. In this paper, we propose a new approach to normalizing retrieval words based on the maximum matching algorithm and the vector space model, which handles retrieval words in terms of both morphology and semantics. The method first uses maximum matching to normalize the retrieval word morphologically and obtain candidate words, then constructs vectors for each candidate word and for the retrieval word to compute their semantic similarity, and finally selects the most similar candidate word as the normalized form of the retrieval word. The experimental results show that the proposed method achieves promising results, with precision above 90% when the retrieval word is a single word.
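The semantic step, selecting the candidate whose vector is most similar to that of the retrieval word, might look like the sketch below. The abstract does not describe how the vectors are built or which similarity measure is used, so the sparse count vectors, the cosine measure, and the names `cosine` and `pick_normalized_term` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        # An empty vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def pick_normalized_term(query_vec, candidate_vecs):
    """Return the candidate term whose vector is closest to the query vector.

    Hypothetical helper: cosine similarity is an assumed measure, not one
    confirmed by the abstract.
    """
    return max(candidate_vecs,
               key=lambda term: cosine(query_vec, candidate_vecs[term]))


# Toy vectors (illustrative only): counts of words co-occurring with each term.
query_vec = Counter({"retrieval": 3, "index": 2, "library": 1})
candidate_vecs = {
    "information retrieval": Counter({"retrieval": 4, "index": 1, "query": 2}),
    "data mining": Counter({"cluster": 3, "pattern": 2, "index": 1}),
}

best = pick_normalized_term(query_vec, candidate_vecs)
```

With these toy counts the first candidate shares two heavily weighted dimensions with the query and the second shares only one lightly weighted dimension, so the first is selected.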
