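CAO Yan, HE Xiaomin, CHEN Liang, MAO Yilei, SUN Jie. Research on the Application of Related Document Detection Method in Sci-tech Novelty Retrieval [J]. 中国科技资源导刊 (China Science & Technology Resources Review), 2020, (1): 54-61 |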
Research on the Application of Related Document Detection Method in Sci-tech Novelty Retrieval |
Submitted: 2019-11-27 |
DOI: |
Keywords: sci-tech novelty retrieval; related document detection; conditional random field; feature selection; text similarity; co-occurrence vocabulary |
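Funding: Key work project of the Institute of Scientific and Technical Information of China, "Construction of a Knowledge-Base-Driven Integrated Service Environment for Innovation Surveys (Phase I)" (ZD2019-03). |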
Authors | Affiliation | CAO Yan, HE Xiaomin, CHEN Liang, MAO Yilei, SUN Jie | (Institute of Scientific and Technical Information of China, Beijing 100038) |
|
Abstract: |
Sci-tech novelty retrieval today is labor-intensive, inefficient, and hard to replicate, and the quality of
its results depends heavily on the professional skill and domain background of the search staff. Relying purely
on manual searching and manual relevance judgment of the search results cannot meet, in either efficiency or
accuracy, the new demands that scientific and technological innovation places on novelty retrieval. In the era
of big data, computer technology and artificial intelligence can improve the efficiency and quality of novelty
retrieval to a certain extent. This paper first proposes a related document detection method suited to sci-tech
novelty retrieval. The method extends the usable information from text similarity to the lexical, topical, and
semantic dimensions in order to capture how the novelty points and the scientific and technical key points
relate to the related documents, then extracts the corresponding features and integrates them into a
conditional random field for related document detection. Experiments are then carried out on the national
sci-tech novelty retrieval fact database. The experiments show that the proposed method achieves good results.
It helps in understanding the business and data of sci-tech novelty retrieval from the perspective of data
science and artificial intelligence, and provides a reference for making sci-tech novelty retrieval automated
and intelligent. |