Article Abstract
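Wei Xiangfeng, Miao Jianming, Zhang Quan, Yuan Yi. Research on the Construction of English-Chinese Bilingual Rich Media Knowledge Graph: A Case Study of CNS English Journal [J]. 情报工程, 2023, 9(5): 084-096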
Research on the Construction of English-Chinese Bilingual Rich Media Knowledge Graph: A Case Study of CNS English Journal
  
DOI:10.3772/j.issn.2095-915X.2023.05.007
Keywords: rich media; knowledge graph; entity extraction; entity alignment; move recognition
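Funding: 2022 Open Fund of the Key Laboratory of Rich Media Digital Publishing Content Organization and Knowledge Service, "Research on Cross-Language Rich Media Knowledge Engineering Based on English Scientific and Technical Publications" (ZD2022-10/01).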
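Author affiliations
Wei Xiangfeng (韦向峰): 1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; 2. Key Laboratory of Rich Media Digital Publishing Content Organization and Knowledge Service, Beijing 100038
Miao Jianming (缪建明): 3. China Ordnance Industry Information Center, Beijing 100089
Zhang Quan (张全): 1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190
Yuan Yi (袁毅): 1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190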
Abstract:
      [Objective/Significance] This paper studies methods and a process for automatically constructing an English-Chinese bilingual rich media knowledge graph. The work offers a reference for the automatic construction of cross-language, multimodal knowledge graphs, and it matters for obtaining the latest English-language research results promptly and for monitoring scientific and technological information. [Methods/Process] A combined top-down and bottom-up approach is used. The main entities, attributes, and relations to extract are first designed at the top level; fine-grained entities and attributes are then extracted from the bottom up by analyzing unstructured text data. Entity alignment is applied to ambiguous and cross-lingual entities, entity linking is applied to cross-media entities, and a graph database is used to store the knowledge graph and support its applications. [Limitations] Future work needs to further improve the accuracy of fine-grained entity extraction and to add feature extraction and automatic content recognition for audio and video media. [Results/Conclusions] Taking the websites of CNS (Cell, Nature, Science) and other English-language scientific journals as an example, the authors constructed, stored, and visualized an English-Chinese bilingual rich media knowledge graph through data crawling, entity extraction, attribute extraction, knowledge fusion, and cross-media linking.
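The cross-lingual alignment and graph storage steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bilingual lexicon, the sample mentions, the helper names `normalize`, `align_entities`, and `build_graph`, and the in-memory adjacency dictionary standing in for a graph database are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical bilingual lexicon: Chinese surface form -> canonical English name.
LEXICON = {
    "知识图谱": "knowledge graph",
    "实体对齐": "entity alignment",
}


def normalize(mention: str) -> str:
    """Map a mention in either language to one canonical English key."""
    text = mention.strip()
    return LEXICON.get(text, text.lower())


def align_entities(mentions):
    """Group mentions that normalize to the same canonical key."""
    groups = defaultdict(set)
    for mention in mentions:
        groups[normalize(mention)].add(mention.strip())
    return dict(groups)


def build_graph(triples):
    """Store (head, relation, tail) triples in an adjacency dict.

    The dict is a stand-in for a graph database (assumption).
    """
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[normalize(head)].append((relation, normalize(tail)))
    return dict(graph)


mentions = ["Knowledge Graph", "知识图谱", "实体对齐", "entity alignment "]
aligned = align_entities(mentions)
graph = build_graph([
    ("知识图谱", "uses", "实体对齐"),
    ("Knowledge Graph", "stored_in", "graph database"),
])
```

A dictionary lookup is the simplest possible alignment rule; a real system would more likely need similarity scoring or learned embeddings to handle mentions the lexicon does not cover.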