A Chinese Entity Recognition Method That Explicitly and Independently Models Context

Chen Dian (陈点)* **; Cao Yixuan (曹逸轩)* **; Luo Ping (罗平)* ***

Article Abstract

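Chen Dian* **, Cao Yixuan* **, Luo Ping* ***. A Chinese entity recognition method that explicitly and independently models context [J]. 高技术通讯 (Chinese edition), 2024, 34(8): 787-797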

Explicitly modeling the context for Chinese named-entity recognition

DOI: 10.3772/j.issn.1002-0470.2024.08.001

Keywords: natural language processing; Chinese named-entity recognition (NER); independent context modeling; data augmentation

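Author	Affiliation
Chen Dian* **	(* Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (** University of Chinese Academy of Sciences, Beijing 100049) (*** Peng Cheng Laboratory, Shenzhen 518066)
Cao Yixuan* **
Luo Ping* ***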

Abstract:

Existing Chinese named-entity recognition (NER) models perform well on public datasets. However, studies have pointed out that these models rely too heavily on the literal features of entity text and pay too little attention to how context affects recognition. As a result, they perform poorly even in simple invariance tests. This paper therefore proposes to model the context explicitly and independently, so that the model distinguishes contextual information from the literal information of entities. A corresponding data augmentation method is also proposed to train the model's context module, entity surface-name module, and combination module. Experimental results show that, without sacrificing recognition performance on the test set, the proposed method clearly improves the model's performance in invariance tests, reducing the failure rate by 2.3% compared with the baseline model.
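The abstract reports a failure rate on invariance tests but does not spell out how the test is built. The sketch below shows one plausible reading, under loudly stated assumptions: an entity mention is swapped for another mention of the same type while the context is kept, and a case counts as a failure when the tagger recognized the original mention but misses the swapped one. The names `swap_entity`, `invariance_failure_rate`, `lexicon_tagger`, and `context_tagger` are hypothetical and are not taken from the paper; the two toy taggers only illustrate the contrast between relying on literal features and relying on context.

```python
from typing import Callable, List, Tuple

# A span is (start, end, entity_type) over characters; end is exclusive.
Span = Tuple[int, int, str]
Tagger = Callable[[str], List[Span]]
Case = Tuple[str, Span, str]  # (text, gold span, replacement mention)


def swap_entity(text: str, span: Span, new_mention: str) -> Tuple[str, Span]:
    """Replace the mention covered by `span` with `new_mention`, keeping the context."""
    start, end, etype = span
    new_text = text[:start] + new_mention + text[end:]
    return new_text, (start, start + len(new_mention), etype)


def invariance_failure_rate(tagger: Tagger, cases: List[Case]) -> float:
    """Fraction of testable cases that fail (assumed definition, not the paper's).

    A case is testable if the tagger finds the original gold span.
    It fails if the tagger then misses the span after the entity swap.
    """
    testable = 0
    failures = 0
    for text, span, new_mention in cases:
        if span not in tagger(text):
            continue  # original not recognized, so invariance cannot be checked
        testable += 1
        new_text, new_span = swap_entity(text, span, new_mention)
        if new_span not in tagger(new_text):
            failures += 1
    return failures / testable if testable else 0.0


def lexicon_tagger(text: str) -> List[Span]:
    """Toy tagger using only literal features: it knows a single place name."""
    start = text.find("北京")
    return [(start, start + 2, "LOC")] if start != -1 else []


def context_tagger(text: str) -> List[Span]:
    """Toy tagger using only context: whatever sits between '在' and '工作' is a LOC."""
    left = text.find("在")
    right = text.find("工作")
    if left != -1 and right > left + 1:
        return [(left + 1, right, "LOC")]
    return []


cases: List[Case] = [
    ("我在北京工作", (2, 4, "LOC"), "深圳"),
    ("她在北京工作", (2, 4, "LOC"), "乌鲁木齐"),
]

lexicon_rate = invariance_failure_rate(lexicon_tagger, cases)
context_rate = invariance_failure_rate(context_tagger, cases)
```

On these two toy cases the lexicon-only tagger fails every swap while the context-only tagger fails none, which is the kind of gap an invariance test is meant to expose. A real evaluation would use a trained model and a much larger set of same-type replacements.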
