Research on a Chinese Medical Named Entity Recognition Model Incorporating Lexical Information

Chen Jing (陈晶)*; Sun Yaxuan (孙亚轩)** ***; Xing Kexuan (邢珂萱)** ***

Article Abstract

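Chen Jing*, Sun Yaxuan** ***, Xing Kexuan** ***. Research on Chinese medical naming recognition model with vocabulary information [J]. 高技术通讯 (Chinese High Technology Letters), 2024, 34(10): 1058-1069.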

Research on Chinese medical naming recognition model with vocabulary information

DOI: 10.3772/j.issn.1002-0470.2024.10.005

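Keywords (Chinese, translated): Chinese medical named entity recognition; prior knowledge; embedding layer; gating unit; lexical information

Keywords (English): Chinese medical naming recognition, prior knowledge, embedding layer, gated unit, vocabulary information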

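Author	Affiliation
Chen Jing (陈晶)*	School of Mathematics and Computer Science, Guangdong Ocean University, Zhanjiang 524088
Sun Yaxuan (孙亚轩)** ***	School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004; Hebei Key Laboratory of Virtual Technology and System Integration, Qinhuangdao 066004
Xing Kexuan (邢珂萱)** ***	School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004; Hebei Key Laboratory of Virtual Technology and System Integration, Qinhuangdao 066004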

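Abstract (translated from the Chinese version):

Medical texts contain a large number of specialized terms, so they are more prone than general-domain texts to word segmentation errors and out-of-vocabulary (OOV) words. These problems cause a loss of contextual semantics and reduce the accuracy of named entity recognition (NER). To address them, this paper proposes WI-NER, a Chinese medical NER model based on gated recurrent units that incorporates lexical information. First, based on the characteristics of Chinese medical datasets, the paper describes the task definition, the entity position labels, and the entity category labels for NER in the Chinese medical domain; in the embedding layer, the model performs feature embedding and vector fusion for characters that match domain-specific words. Second, a lexical gating unit is added to the context encoding layer, which uses the memory and forgetting mechanisms of recurrent neural networks to automatically extract the features needed for entity recognition; introducing lexical information and prior knowledge in this way improves Chinese medical NER performance. Finally, the model is evaluated experimentally on three datasets. The results show that the proposed model outperforms the baseline models in accuracy and exhibits the expected medical-domain characteristics.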
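The abstract pairs an entity-position label with an entity-category label for every character and says the embedding layer works on characters that match domain-specific words. The Python sketch below illustrates both data-side steps under assumptions the abstract does not state: a BIOES position scheme, lexicon words capped at six characters, and matched words grouped by the character's position inside each word (B = begin, M = middle, E = end, S = single-character word, in the style of SoftLexicon). The function names `spans_to_bioes` and `match_lexicon`, the sample sentence, and the category name `DIS` are hypothetical illustrations, not WI-NER's published procedure.

```python
from typing import Dict, List, Set, Tuple

# An annotated entity: (start index, end index exclusive, category).
# The BIOES scheme is an assumption; the paper only mentions
# "entity position" and "entity category" labels.
Entity = Tuple[int, int, str]


def spans_to_bioes(sentence: str, entities: List[Entity]) -> List[str]:
    """Give every character a position prefix (B/I/E/S) plus its category, or O."""
    tags = ["O"] * len(sentence)
    for start, end, category in entities:
        if end - start == 1:
            tags[start] = f"S-{category}"  # single-character entity
            continue
        tags[start] = f"B-{category}"  # first character
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{category}"  # inner characters
        tags[end - 1] = f"E-{category}"  # last character
    return tags


def match_lexicon(
    sentence: str, lexicon: Set[str], max_len: int = 6
) -> List[Dict[str, List[str]]]:
    """For each character, collect the lexicon words that cover it, grouped by
    the character's position in the word: B, M, E, or S (hypothetical grouping)."""
    matches = [{"B": [], "M": [], "E": [], "S": []} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        # Try every substring starting at i, up to max_len characters long.
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                matches[i]["S"].append(word)
                continue
            matches[i]["B"].append(word)
            for k in range(i + 1, j - 1):
                matches[k]["M"].append(word)
            matches[j - 1]["E"].append(word)
    return matches
```

For the sample sentence 患者患有糖尿病 ("the patient has diabetes") with a toy lexicon {患者, 糖尿, 糖尿病} and the entity 糖尿病 at span (4, 7), the character 尿 is tagged `I-DIS`, and its match set records 糖尿病 under `M` and 糖尿 under `E`. Keeping the four groups separate preserves where a character sits inside each candidate word, which plain character embeddings lose when segmentation is wrong.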

Abstract (English version):

Medical texts contain a large number of specialized words, which makes them more prone than general-domain texts to word segmentation errors and out-of-vocabulary words, resulting in a loss of contextual semantics and lower accuracy of named entity recognition (NER). To solve these problems, this paper proposes WI-NER, a Chinese medical named entity recognition model based on gated recurrent units that incorporates lexical information. First, based on the characteristics of Chinese medical datasets, the task definition, entity position labels, and entity category labels of NER in the Chinese medical domain are described. In addition, in the embedding layer the model performs feature embedding and vector fusion for characters that match domain-specific words. Second, a lexical gating unit is added to the context encoding layer, and the features required for entity recognition are extracted automatically by the memory and forgetting mechanisms of the recurrent neural network. Introducing lexical information and prior knowledge in this way improves the recognition of Chinese medical named entities. Finally, the model is evaluated experimentally on three datasets, and the results show that the proposed model outperforms the baseline models in accuracy and exhibits the expected characteristics of the medical domain.
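The abstract names a lexical gating unit inside a GRU-based context encoder but does not give its equations. The Python/NumPy sketch below shows one plausible reading, and every detail of it is an assumption: in the embedding layer, each character vector is concatenated with the mean vector of the lexicon words in which that character appears at the begin, middle, or end position or as a single-character word (B/M/E/S groups, zeros when a group is empty); in the encoder, a standard GRU cell is extended with an extra sigmoid gate `g` that mixes a projected word vector into the new hidden state. The names `fuse_char_and_words`, `LexicalGatedGRUCell`, `V_g`, and `W_w`, the convex-mixing form, and the random initialization are hypothetical, not WI-NER's published formulation.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def fuse_char_and_words(char_vec, word_vecs_by_pos, word_dim):
    """Concatenate a character vector with the mean word vector of each
    B/M/E/S match group; an empty or missing group contributes zeros.
    word_vecs_by_pos maps "B"/"M"/"E"/"S" to a list of 1-D arrays."""
    parts = [char_vec]
    for pos in ("B", "M", "E", "S"):
        vecs = word_vecs_by_pos.get(pos, [])
        parts.append(np.mean(vecs, axis=0) if vecs else np.zeros(word_dim))
    return np.concatenate(parts)


class LexicalGatedGRUCell:
    """A standard GRU cell plus a hypothetical lexical gate g that decides how
    much projected word-level information enters the hidden state."""

    def __init__(self, input_dim, hidden_dim, word_dim, seed=0):
        rng = np.random.default_rng(seed)

        def init(rows, cols):
            return rng.normal(0.0, 0.1, size=(rows, cols))

        gates = ("z", "r", "h", "g")
        self.W = {k: init(hidden_dim, input_dim) for k in gates}   # input weights
        self.U = {k: init(hidden_dim, hidden_dim) for k in gates}  # recurrent weights
        self.V_g = init(hidden_dim, word_dim)  # word vector -> lexical gate
        self.W_w = init(hidden_dim, word_dim)  # word vector -> hidden space
        self.b = {k: np.zeros(hidden_dim) for k in ("z", "r", "h", "g", "w")}

    def step(self, x, h, w):
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])  # update gate
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])  # reset gate
        h_tilde = np.tanh(self.W["h"] @ x + self.U["h"] @ (r * h) + self.b["h"])
        h_gru = (1.0 - z) * h + z * h_tilde  # ordinary GRU state update
        # Hypothetical lexical gate, conditioned on input, state, and word vector.
        g = sigmoid(self.W["g"] @ x + self.U["g"] @ h + self.V_g @ w + self.b["g"])
        word_info = np.tanh(self.W_w @ w + self.b["w"])
        return (1.0 - g) * h_gru + g * word_info  # convex mix keeps |h| < 1

    def run(self, xs, ws):
        """Encode a sequence left to right; returns one hidden state per step."""
        h = np.zeros(self.U["z"].shape[0])
        outputs = []
        for x, w in zip(xs, ws):
            h = self.step(x, h, w)
            outputs.append(h)
        return np.stack(outputs)
```

With 4-dimensional character vectors and 3-dimensional word vectors, the fused input has 4 + 4 × 3 = 16 dimensions, so the cell would be built as `LexicalGatedGRUCell(input_dim=16, hidden_dim=5, word_dim=3)`. Mixing through a convex combination keeps every hidden value inside (-1, 1), like a plain GRU. When no lexicon word matches a character, `w` is zero and the gate still pulls the state toward `tanh(b_w)`; a fuller model would likely mask the gate in that case, and a bidirectional encoder with a CRF decoder would usually sit on top.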
