Article Abstract
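Yu Shirui, Li Aihua, Lin Ziluo, Tang Xiaoli. Keywords Supplementary Topic Recognition and Evolution Analysis Based on SPO Structure [J]. Digital Library Forum, 2023, (6): 13-21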
Keywords Supplementary Topic Recognition and Evolution Analysis Based on SPO Structure
Submitted: 2023-04-26
DOI:10.3772/j.issn.1673-2286.2023.06.002
Keywords: SPO Structure; Text Semantic Mining; Topic Recognition; Theme Evolution; Citation
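Funding: This research was supported by the 2021 major collaborative innovation project "Biomedical Literature Information Assurance and Integrated Service Platform" of the CAMS Innovation Fund for Medical Sciences (Grant No. 2021-I2M-1-033).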
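Author affiliations
Yu Shirui: Institute of Medical Information, Chinese Academy of Medical Sciences
Li Aihua: Institute of Medical Information, Chinese Academy of Medical Sciences
Lin Ziluo: Institute of Medical Information, Chinese Academy of Medical Sciences
Tang Xiaoli: Institute of Medical Information, Chinese Academy of Medical Sciences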
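Abstract (translated from the Chinese):
      This study aims to remedy the shortcomings of topic recognition and evolution analysis based on the SPO (Subject-Predication-Object) semantic structure, namely the loss of some topic information, the inability to identify topics in emerging fields, and insufficiently specific semantic information, and thereby to improve the results of topic recognition and evolution analysis. First, SPO semantic structures are extracted from the titles and abstracts of scientific literature, and keywords are used as a supplement to further enrich the semantics. Next, social network analysis indicators are combined with novelty and relative growth indicators to identify core topics and emerging topics stage by stage. Finally, topic evolution trends are analyzed on the basis of literature citations and the changes in core and emerging topics across stages. The analysis finds that the keyword-supplemented SPO method performs better in relatively new fields, represented here by gene editing. Along the two dimensions of technology and application, the core topics of the three stages can be summarized as the application of ZFN and TALEN; ZFN, TALEN, and CRISPR/Cas9; and CRISPR/Cas and base editing in four directions: gene editing system optimization, basic science, clinical disease treatment, and biotechnology. The emerging topics fall mainly into six directions: disease diagnosis, high-throughput functional genomics, metabolic engineering in synthetic biology, precision editing in precision medicine, gene editing delivery tools, and ethical issues in gene editing. The proposed method effectively identifies the core and emerging topics of a research field and captures their evolution trends, and it improves on the method based on the SPO semantic structure alone.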
Abstract (original English version):
      This paper aims to remedy the defects of the original topic recognition and evolution analysis method based on the SPO (Subject-Predication-Object) structure, such as missing important topics, failing to identify topics in emerging fields, and yielding unspecific semantic information, in order to improve the results of topic recognition and evolution analysis. First, SPO semantic structures are extracted from the titles and abstracts of the literature, and keywords are used as supplements to further enrich the semantics. Then, social network analysis indicators, together with novelty and relative growth indicators, are used to identify the core topics and emerging topics stage by stage. Finally, the topic evolution trend is analyzed based on citations and on the changes in core and emerging topics in each stage. The keyword-supplemented topic recognition and evolution analysis method based on the SPO semantic structure is found to be suitable for emerging fields, represented here by gene editing. The core topics of the three stages can be summarized along the two dimensions of technology and application as ZFN and TALEN; ZFN, TALEN, and CRISPR/Cas9; and CRISPR/Cas and base editing, applied in gene editing system optimization, basic science, clinical disease treatment, and biotechnology. Emerging topics mainly include disease diagnosis, high-throughput functional genomics, metabolic engineering in synthetic biology, precision editing in precision medicine, gene editing delivery tools, and ethical issues in gene editing. The proposed method can effectively identify the core and emerging topics in a research field and capture their evolution trends. Compared with the method based only on the SPO semantic structure, the results are improved.
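The three steps summarized in the abstract (merging SPO structures with keywords, ranking terms by a network indicator, and flagging new or growing terms between stages) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the abstract does not define its indicators, so degree centrality stands in for the "social network analysis indicators", a change in document share stands in for "relative growth", first appearance in a stage stands in for "novelty", and the function names, predicates, and sample triples are hypothetical.

```python
from collections import Counter
from itertools import combinations


def merge_terms(spo_triples, keywords):
    """Union of SPO subjects/objects and keywords for one document (lowercased)."""
    terms = {t.lower() for s, _pred, o in spo_triples for t in (s, o)}
    terms.update(k.lower() for k in keywords)
    return terms


def degree_centrality(docs):
    """Normalized degree of each term in the term co-occurrence network.

    Assumption: two terms are linked when they occur in the same document.
    """
    neighbors = {}
    for terms in docs:
        for t in terms:
            neighbors.setdefault(t, set())
        for a, b in combinations(sorted(terms), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {t: 0.0 for t in neighbors}
    return {t: len(nb) / (n - 1) for t, nb in neighbors.items()}


def share_growth(prev_docs, curr_docs):
    """Change in each term's document share between two non-empty stages.

    Assumption: a stand-in for the paper's relative growth indicator.
    """
    prev = Counter(t for d in prev_docs for t in d)
    curr = Counter(t for d in curr_docs for t in d)
    return {
        t: c / len(curr_docs) - prev.get(t, 0) / len(prev_docs)
        for t, c in curr.items()
    }


def novel_terms(prev_docs, curr_docs):
    """Terms that appear in the current stage but not in the previous one."""
    seen_before = set().union(*prev_docs) if prev_docs else set()
    seen_now = set().union(*curr_docs) if curr_docs else set()
    return seen_now - seen_before


# Hypothetical toy data: two documents per stage.
stage1 = [
    merge_terms([("ZFN", "TREATS", "HIV")], ["gene editing"]),
    merge_terms([("TALEN", "AFFECTS", "gene")], ["gene editing"]),
]
stage2 = [
    merge_terms([("CRISPR/Cas9", "TREATS", "HIV")], ["gene editing", "delivery"]),
    merge_terms([("CRISPR/Cas9", "AFFECTS", "gene")], ["gene editing"]),
]

core_scores = degree_centrality(stage1)
growth = share_growth(stage1, stage2)
emerging_candidates = novel_terms(stage1, stage2)
```

In this toy setup, terms with high `core_scores` would be candidate core topics for a stage, while terms that are in `emerging_candidates` or have a large positive `growth` value would be candidate emerging topics. Any thresholds would have to come from the full paper.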