Article Abstract
Qi Yue, Shi Lu. Semantic Duplicate Checking Strategy for Book Acquisition [J]. Digital Library Forum, 2019, (11): 61-66
Semantic Duplicate Checking Strategy for Book Acquisition
Submitted: 2019-10-13
DOI:10.3772/j.issn.1673-2286.2019.11.008
Keywords: acquisition duplicate checking; text similarity; semantic analysis; evaluation index system
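Funding: This research was supported by the 2019 planning project of the Chongqing Education Science 13th Five-Year Plan, "Research on the Construction of an Ecological Smart Teaching Platform for Fragmented Learning" (No. 2019-GX-306).
Author affiliations
Qi Yue, Southwest University Library
Shi Lu, R&D Department, Nokia Shanghai Bell Co., Ltd.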
Abstract:
      Existing library acquisition duplicate-checking systems can only detect duplicate ISBNs and titles. As book publishing grows increasingly homogeneous, books with similar content but different ISBNs are difficult to catch. To address this problem, this paper builds a duplicate-checking strategy based on natural language processing. First, subject terms, content summaries, and tables of contents are selected as indicators of a book's content features and modeled; Word2Vec and WMD (Word Mover's Distance) are used to compute the semantic similarity of feature texts of differing lengths. Next, the AHP (Analytic Hierarchy Process) method is used to calculate the weight coefficients of the feature similarities, yielding a composite evaluation index of book similarity. Finally, data from the Southwest University Library are used as the experimental subject to verify the feasibility of the strategy.
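The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes the three per-feature similarities (subject terms, content summary, table of contents) have already been computed by the Word2Vec/WMD stage and scaled to [0, 1]. The pairwise judgment matrix, the similarity values, and the function names are hypothetical, and the row geometric mean is used here as a common approximation of AHP priority weights; the paper's actual matrix and calculation method are not given in the abstract.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method.

    pairwise[i][j] states how much more important criterion i is than
    criterion j; the matrix is expected to be reciprocal.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def composite_similarity(similarities, weights):
    """Weighted sum of per-feature similarities."""
    return sum(s * w for s, w in zip(similarities, weights))


# Hypothetical judgments, in the order: subject terms, summary, table of contents.
PAIRWISE = [
    [1.0, 1 / 2, 1 / 3],
    [2.0, 1.0, 1 / 2],
    [3.0, 2.0, 1.0],
]
WEIGHTS = ahp_weights(PAIRWISE)

# Hypothetical per-feature similarities for one pair of books.
FEATURE_SIMS = [0.9, 0.7, 0.6]
SCORE = composite_similarity(FEATURE_SIMS, WEIGHTS)

print("weights:", [round(w, 3) for w in WEIGHTS])
print("composite similarity:", round(SCORE, 3))
```

Because the weights sum to 1, the composite score stays within the range of the individual feature similarities, so a single threshold could in principle be applied to flag candidate duplicates. Any such threshold would need to be tuned on real catalogue data.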