Research on Panoramic Multi-Path Knowledge Graph Construction: A Case Study of the Rice Grain Shape Gene Domain

Cao Yuqing (曹雨晴); Xian Guojian (鲜国建); Huang Yongwen (黄永文); Chen Boli (陈博立); Li Jiao (李娇); Luo Tingting (罗婷婷); Sun Tan (孙坦)

Article Abstract

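Cao Yuqing, Xian Guojian, Huang Yongwen, Chen Boli, Li Jiao, Luo Tingting, Sun Tan. Research on Panoramic Multi-Path Knowledge Graph Construction: A Case Study of the Rice Grain Shape Gene Domain [J]. Digital Library Forum (数字图书馆论坛), 2022, (4): 25-34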

Research on the Construction of Panorama Domain Knowledge Graph: Using a Case Study of Grain Shape Gene in Rice

Submitted: 2022-04-10

DOI: 10.3772/j.issn.1673-2286.2022.04.004

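Chinese keywords (translated): panoramic ontology schema; knowledge graph; rice grain shape gene; multi-path knowledge extraction; knowledge discovery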

English keywords: Panoramic View of Ontology Schema; Knowledge Graph; Rice Grain Shape Gene; Multiple Approaches for Knowledge Extraction; Knowledge Discovery

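Funding: This research was supported by the National Social Science Fund of China, General Project "Research on the Construction and Application of Panoramic Abstract Knowledge Graphs for Scientific Papers" (No. 19BTQ061).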

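Author	Affiliation
Cao Yuqing	Agricultural Information Institute, Chinese Academy of Agricultural Sciences
Xian Guojian	Agricultural Information Institute, Chinese Academy of Agricultural Sciences; Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs
Huang Yongwen	Agricultural Information Institute, Chinese Academy of Agricultural Sciences; Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration
Chen Boli	Agricultural Information Institute, Chinese Academy of Agricultural Sciences
Li Jiao	Agricultural Information Institute, Chinese Academy of Agricultural Sciences; Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration
Luo Tingting	Agricultural Information Institute, Chinese Academy of Agricultural Sciences; Key Laboratory of Knowledge Mining and Knowledge Services in Agricultural Converging Publishing, National Press and Publication Administration
Sun Tan	Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs; Chinese Academy of Agricultural Sciences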

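Abstract (translated from the Chinese):

Drawing on general data resources (scientific literature, research activities, etc.) and specialized domain knowledge resources (such as omics research data), and taking the rice grain shape gene domain as an example, this paper explores a method for constructing domain knowledge graphs that is broadly applicable, balances the breadth and depth of knowledge coverage (panoramic), and fully inherits and integrates multi-source heterogeneous data and knowledge (multi-path). First, by inheriting and reusing prior expert knowledge from authoritative academic papers together with several domain ontologies, an ontology model for the schema layer of a panoramic rice grain shape gene knowledge graph is designed top-down. Second, the graph is populated with data instances through multiple paths, including graph data extraction, conversion and mapping of structured and semi-structured data, and extraction from unstructured text; new entities and semantic relations discovered through data mining are then used to refine the ontology model iteratively from the bottom up. Third, multi-source knowledge is associated and fused through entity disambiguation, entity linking, and related steps, and the graph data is stored persistently in a Neo4j database. Finally, typical application scenarios for knowledge association and discovery services driven by the domain knowledge graph are discussed. The experimental results indicate that the proposed panoramic, multi-path method for constructing domain knowledge graphs is integrative and general to a degree, and can serve as a reference for knowledge graph construction in specialized vertical domains.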

English abstract:

Drawing on general resources (scientific literature, research activities, etc.) and specialized domain knowledge resources (such as multi-omics research data), this paper explores a broadly applicable method for constructing domain knowledge graphs that balances the breadth and depth of knowledge coverage and fully inherits and integrates multi-source heterogeneous data and knowledge. The method's effectiveness is verified through the deep integration of multi-omics research data and scientific literature. First, building on prior expert knowledge in authoritative academic literature and reusing multiple domain ontologies, the model layer is designed panoramically in a top-down manner. Second, the data layer is populated through multi-path data extraction, including graph knowledge extraction, semi-structured knowledge extraction, and unstructured text extraction. In addition, based on new entity pairs and their semantic relationships discovered in the data, the model layer is refined iteratively from the bottom up. Then, knowledge fusion is achieved through methods including entity disambiguation and entity linking, and Neo4j is used for persistent storage. The experimental results show that the proposed panoramic, multi-path method for constructing domain knowledge graphs is integrative and general to a degree, and can provide a reference for knowledge graph construction in specialized vertical domains. Finally, we explore typical applications of the domain knowledge graph through a preliminary implementation of knowledge discovery services and discuss future prospects.
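
The fusion and storage steps named in the abstract can be pictured with a small sketch. It is not the authors' implementation: the alias table, the entity names (`GeneA`, `GeneB`), the `REGULATES` relation, the `Entity` label, and the helper functions are all hypothetical, and the Cypher text is only generated as a string rather than sent to a Neo4j server. It shows triples from two extraction paths being linked to canonical entities, merged, and rendered as `MERGE` statements.

```python
from collections import defaultdict

# Hypothetical alias table mapping normalised surface forms to canonical names.
ALIASES = {
    "genea": "GeneA",
    "gene a": "GeneA",
    "grain length": "grain length",
}


def link_entity(mention):
    """Return the canonical name for a mention, or the stripped mention if unknown."""
    key = " ".join(mention.lower().split())
    return ALIASES.get(key, mention.strip())


def fuse(*sources):
    """Merge (head, relation, tail) triples from several named extraction paths.

    Each source is a (path_name, triples) pair. The result maps every
    canonical triple to the sorted list of paths that produced it.
    """
    merged = defaultdict(set)
    for name, triples in sources:
        for head, rel, tail in triples:
            merged[(link_entity(head), rel, link_entity(tail))].add(name)
    return {triple: sorted(names) for triple, names in merged.items()}


def to_cypher(triple):
    """Render one triple as a parameterised Cypher MERGE query and its parameters."""
    head, rel, tail = triple
    # Relationship types cannot be passed as Cypher parameters, so the
    # relation is checked before being interpolated into the query text.
    if not rel.isidentifier():
        raise ValueError(f"unsafe relation name: {rel!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Illustrative placeholder triples, not data from the paper.
structured = [("GeneA", "REGULATES", "grain length")]
from_text = [
    ("gene a", "REGULATES", "Grain Length"),
    ("GeneB", "REGULATES", "grain width"),
]

fused = fuse(("structured", structured), ("text", from_text))
for triple, paths in fused.items():
    query, params = to_cypher(triple)
    print(paths, query, params)
```

Using `MERGE` rather than `CREATE` keeps repeated loads from duplicating nodes, which matters when the same entity arrives through more than one path. A real pipeline would need a far more careful disambiguation step than a lookup table.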
