Biomedical Literature Retrieval Methods and Question Answering System

Pan Haojie; Zhou Fang; Zhang Bowen; Zhang Lele; Fang Fan; Yin Xucheng

Article abstract

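Pan Haojie, Zhou Fang, Zhang Bowen, Zhang Lele, Fang Fan, Yin Xucheng. Biomedical literature retrieval methods and question answering system [J]. 情报工程, 2016, 2(5): 050-057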

Query Processing in Biomedical Literature Retrieval and Question Answering System

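Chinese keywords (translated): biomedical literature retrieval, sequential dependence model, word embedding, pseudo relevance feedback, learning to rank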

English keywords: Biomedical literature retrieval, sequential dependence model, word embedding, pseudo relevance feedback, learning-to-rank

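Funding: This research was supported by the National Natural Science Foundation of China project "Natural scene text recognition combining feedforward and feedback mechanisms" (grant no. 61473036); follow-up theoretical and applied research was carried out on that basis.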

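Author	Affiliation
Pan Haojie	Department of Computer Science and Technology, University of Science and Technology Beijing
Zhou Fang	Department of Computer Science and Technology, University of Science and Technology Beijing
Zhang Bowen	Department of Computer Science and Technology, University of Science and Technology Beijing
Zhang Lele	Department of Computer Science and Technology, University of Science and Technology Beijing
Fang Fan	Department of Computer Science and Technology, University of Science and Technology Beijing
Yin Xucheng	Department of Computer Science and Technology, University of Science and Technology Beijing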

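Abstract (translated from the Chinese):

How to retrieve biomedical literature and mine information from it effectively is a classic problem in computer science and bioinformatics research. For natural language questions posed over the biomedical literature, this paper builds an efficient retrieval and question answering system covering documents, snippets, concepts, and RDF triples. Specifically, for document retrieval we build a general retrieval model that combines the sequential dependence model, word embeddings, and pseudo relevance feedback. The top k documents are then split into sentences and snippets, an index is built over them, and snippet retrieval is performed with the same document retrieval model. For concept mining, we extract biomedical concepts, list the related concepts from five database links exposed as web services, and obtain the final concepts by ranking their scores. The retrieval system achieved good performance on several years of CLEF BioASQ evaluation data.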

English abstract:

How to carry out biomedical literature search and information mining effectively is a classic topic in computer technology and biological information technology research. This study constructed an efficient retrieval and question answering system for natural language questions over the biomedical literature, covering documents, snippets, concepts, and RDF triples. In particular, for document retrieval, this research built a general search model based on the Sequential Dependence Model, word embedding, and pseudo relevance feedback. Moreover, the top K documents were separated into sentences and snippets to build an index, and snippet search was then completed with the document retrieval model. In concept mining, this study extracted biomedical concepts, listed the related concepts from five web-service database links, and obtained the final concepts through score ranking. The results indicated that the retrieval system achieved good performance on the test data from CLEF BioASQ.
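The abstract names the Sequential Dependence Model as the core of document retrieval but gives no formula. As a rough illustration only, the sketch below follows the standard textbook form of that model: a weighted sum of Dirichlet-smoothed log probabilities for single terms, exact adjacent bigrams, and unordered co-occurrences within a window. The function names, the weights (0.85, 0.10, 0.05), the window size of 8, the smoothing parameter `mu`, and the toy corpus are all assumptions chosen for illustration; this is not the authors' implementation, and it omits the word embedding and pseudo relevance feedback components.

```python
import math


def count_ordered(tokens, a, b):
    """Count exact adjacent occurrences of the bigram (a, b)."""
    return sum(1 for x, y in zip(tokens, tokens[1:]) if x == a and y == b)


def count_unordered(tokens, a, b, window=8):
    """Count positions where a and b co-occur, in either order, within `window` tokens."""
    hits = 0
    for i, tok in enumerate(tokens):
        following = tokens[i + 1:i + window]
        if tok == a and b in following:
            hits += 1
        elif tok == b and a in following:
            hits += 1
    return hits


def smoothed_log_prob(tf, doc_len, coll_freq, coll_len, mu=2500.0):
    """Dirichlet-smoothed log probability of a feature in a document."""
    # Assumption: floor the collection count at 0.5 so unseen features avoid log(0).
    coll_prob = max(coll_freq, 0.5) / coll_len
    return math.log((tf + mu * coll_prob) / (doc_len + mu))


def sdm_score(query, doc, collection, weights=(0.85, 0.10, 0.05)):
    """Score one tokenized document against a tokenized query (higher is better)."""
    w_term, w_ordered, w_unordered = weights
    coll_len = sum(len(d) for d in collection)
    score = 0.0
    # Unigram features: each query term on its own.
    for q in query:
        cf = sum(d.count(q) for d in collection)
        score += w_term * smoothed_log_prob(doc.count(q), len(doc), cf, coll_len)
    # Dependence features: every pair of adjacent query terms.
    for a, b in zip(query, query[1:]):
        cf_o = sum(count_ordered(d, a, b) for d in collection)
        cf_u = sum(count_unordered(d, a, b) for d in collection)
        score += w_ordered * smoothed_log_prob(
            count_ordered(doc, a, b), len(doc), cf_o, coll_len)
        score += w_unordered * smoothed_log_prob(
            count_unordered(doc, a, b), len(doc), cf_u, coll_len)
    return score


# Hypothetical toy corpus, tokenized by whitespace.
corpus = {
    "d1": "brca1 gene mutation increases breast cancer risk".split(),
    "d2": "cancer of the breast is linked to a gene called brca1".split(),
    "d3": "protein folding in yeast cells".split(),
}
query = ["breast", "cancer"]
docs = list(corpus.values())
ranked = sorted(corpus, key=lambda name: sdm_score(query, corpus[name], docs), reverse=True)
print(ranked)
```

The design point the sketch is meant to show is that a document containing the query terms as an adjacent phrase earns credit from all three feature types, while a document that only mentions the terms near each other earns credit from two. In a production system the counts would come from an inverted index rather than repeated list scans.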
