From Labor-Intensive to Computing-Intensive: The Transformation of the NSTL Database Construction Mode

Xian Guojian (鲜国建); Luo Tingting (罗婷婷); Zhao Ruixue (赵瑞雪); Zhang Jianyong (张建勇); Yang Zengxiu (杨增秀)

Article abstract

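Xian Guojian, Luo Tingting, Zhao Ruixue, Zhang Jianyong, Yang Zengxiu. From Labor-Intensive to Computing-Intensive: The Transformation of the NSTL Database Construction Mode [J]. Digital Library Forum (数字图书馆论坛), 2020, (7): 52-59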

Research and Practice of the NSTL Database Construction Mode Transformation: From Labor Intensive to Computing Intensive

Submitted: 2020-04-10

DOI: 10.3772/j.issn.1673-2286.2020.07.007

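Chinese keywords (translated): database construction; business process reengineering; multi-source heterogeneous data fusion; scientific and technical literature big data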

English keywords: Database Construction; Business Process Reengineering; Multi-source Heterogeneous Data Fusion; Literature Big Data

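Funding: This research was supported by the National Science and Technology Library special project "Research and System Construction of Multi-source Abstract Data Fusion" (No. 2019XM46) and the Chinese Academy of Agricultural Sciences Science and Technology Innovation Program project "Key Technologies for Fusion Computing of Agricultural Science and Technology Big Data" (No. CAAS-ASTIP-2016-AII).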

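Author	Affiliation
Xian Guojian	Agricultural Information Institute, Chinese Academy of Agricultural Sciences; Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs
Luo Tingting	Agricultural Information Institute, Chinese Academy of Agricultural Sciences
Zhao Ruixue	Agricultural Information Institute, Chinese Academy of Agricultural Sciences; Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs
Zhang Jianyong	National Science Library, Chinese Academy of Sciences
Yang Zengxiu	Machinery Industry Information Institute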

Chinese abstract (translated):

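In recent years, under the guidance of the overall business process reengineering plan of the National Science and Technology Library (NSTL), the NSTL database construction mode has undergone profound change and comprehensive transformation. This paper traces the development path from "fully in-house processing" to "in-house processing plus third-party data utilization" and then to the current "deep fusion and utilization of multi-source heterogeneous abstract data", showing that the NSTL database construction mode is shifting from labor-intensive to computing-intensive; abstract data processing and third-party data utilization over the past ten years confirm this course. On this basis, taking journal abstract data processing as an example, the paper focuses on the deep fusion and utilization mode for multi-source heterogeneous abstract data, covering its basic principles, overall framework, rule design and algorithm implementation, and the design and implementation of the fusion system. Finally, it points out that NSTL database construction will ultimately achieve a comprehensive transformation from labor-intensive to computing-intensive and from processing-workflow-driven to multi-source big-data-driven, and will provide solid support, in the form of big data on digital scientific and technical literature, for NSTL to build a next-generation, intelligent knowledge discovery service system.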

English abstract:

Following the business process reengineering plan of the NSTL, the construction mode of the NSTL database has undergone profound change and comprehensive transformation. This paper summarizes the development stages, from “fully independent digitalization processing” to “independent digitalization processing + third-party data utilization”, and then to the current “deep fusion and utilization of multi-source heterogeneous abstract data”. It shows that the NSTL database construction mode is shifting from labor-intensive to computing-intensive. Abstract data processing and third-party data utilization over the past ten years confirm this development. On this basis, the paper takes journal article abstract processing as an example and focuses on the deep fusion and utilization mode for multi-source heterogeneous abstract data, including its basic principles, overall framework, rule design and algorithm implementation, and fusion system design and implementation. Finally, it points out that the construction of the NSTL database will eventually complete the transformation from labor-intensive to computing-intensive and from workflow-driven to multi-source big-data-driven, and that it will provide solid support, in the form of big data on digital scientific and technical literature, for NSTL to build a next-generation, intelligent knowledge discovery service system.