Fine-Grained Entity Typing with Label Denoising via Type Semantic Representations

Xi Pengbi* **; Jin Xiaolong* **; Bai Shuo***; Cheng Xueqi* **

Article Abstract

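Xi Pengbi* **, Jin Xiaolong* **, Bai Shuo***, Cheng Xueqi* **. Fine-grained entity typing with label denoising via type semantic representations [J]. Chinese High Technology Letters (高技术通讯, Chinese edition), 2024, 34(2): 111-122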

English title: Fine-grained entity type classification using type semantic representation for noisy label reduction

DOI: 10.3772/j.issn.1002-0470.2024.02.001

Keywords: entity typing; fine-grained type; multi-label noise reduction; multi-label classification


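Author	Affiliation
Xi Pengbi* **	(* Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (** School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100408) (*** Hundsun Technologies Inc., Hangzhou 310053)
Jin Xiaolong* **
Bai Shuo***
Cheng Xueqi* **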


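Abstract (translated from the Chinese):

Training data for fine-grained entity typing (FET) is usually generated by distant supervision from knowledge in existing knowledge bases, a process that inevitably introduces superfluous noisy labels. Existing work that addresses noise in the training data typically models only the probability distribution over training data and annotated types, and learns too little about the semantics of the fine-grained types. As a result, on training data annotated with multiple fine-grained types, types unrelated to the entity's context are selected for model learning. This paper proposes a fine-grained entity typing method that denoises labels using semantic representations of the fine-grained types. First, the representations of some fine-grained types are learned from training data that has a unique fine-grained type path, and the representations of the remaining types are then learned by incorporating the relations between fine-grained types. Second, among the fine-grained types annotated on each training example, the type most similar to the semantic information of the entity's context is selected as the target type, and the Top K most similar examples are selected from the dataset to aggregate fine-grained type semantic information. Finally, the final fine-grained entity typing model is learned on the aggregated information. Experimental results show that the method effectively selects, from the annotated fine-grained types, the type that best matches the semantics of the entity's context, thereby improving the accuracy of fine-grained entity typing.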
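The target-type selection step in the abstract can be illustrated with a small sketch. It assumes that the entity context and every fine-grained type are already embedded in the same vector space and that similarity is measured by cosine similarity; the abstract does not state the similarity measure, so that choice, the function names `cosine` and `select_target_type`, and the toy vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_target_type(context_vec, candidate_types, type_vecs):
    """Among the distantly supervised candidate labels, keep the one whose
    type representation is closest to the entity-context representation.
    Cosine similarity is an assumption, not the paper's stated measure."""
    scores = {t: cosine(context_vec, type_vecs[t]) for t in candidate_types}
    best = max(scores, key=scores.get)
    return best, scores

# Toy type representations (hypothetical values, for illustration only).
type_vecs = {
    "/person/athlete": [1.0, 0.0, 0.0],
    "/person/politician": [0.0, 1.0, 0.0],
}
# A context vector that lies much closer to the "athlete" direction.
context_vec = [0.9, 0.1, 0.2]
best_type, type_scores = select_target_type(
    context_vec, ["/person/athlete", "/person/politician"], type_vecs
)
print(best_type)
```

In this toy case the mention carries two candidate labels from the knowledge base, and only the one that agrees with the context is kept as the training target.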

English abstract:

The training data for fine-grained entity typing (FET) is usually generated by distant supervision based on a knowledge base, and this process inevitably introduces noisy type labels. Existing work mostly models the probability distribution of the training data and annotated types and lacks semantic learning of the fine-grained types, so types unrelated to the entity context are used during model learning. This paper proposes a fine-grained entity classification method that reduces label noise based on the semantic representation of fine-grained types. First, it learns the representations of some fine-grained types from the data with a unique fine-grained type path in the training set, and learns the representations of the remaining fine-grained types by incorporating the relationship information between fine-grained types. Second, from the set of fine-grained types annotated in the training data, it selects the type most similar to the semantic information of the entity context as the target type, and then selects the Top-K similar sentences from the dataset to aggregate fine-grained semantic information. Last, it learns the final fine-grained entity classification model on the aggregated information. Experimental results and analysis on datasets demonstrate that the model effectively selects, from the set of fine-grained types annotated in the training data, the type that best matches the semantic information of the entity context, and thereby improves the accuracy of fine-grained entity classification.
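The Top-K aggregation step can be sketched in a similar spirit. The abstract does not say how the similar sentences are ranked or how their information is combined, so this sketch assumes cosine-similarity ranking and a plain average of the query vector with its K nearest neighbours; `top_k_indices`, `aggregate_with_neighbors`, and the toy vectors are hypothetical names and values.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_indices(query, pool, k):
    """Indices of the k pool vectors most similar to the query, best first."""
    order = sorted(range(len(pool)), key=lambda i: cosine(query, pool[i]), reverse=True)
    return order[:k]

def aggregate_with_neighbors(query, pool, k):
    """Average the query with its k nearest neighbours.
    Mean pooling is an assumed aggregation, chosen only for simplicity."""
    neighbors = top_k_indices(query, pool, k)
    vectors = [query] + [pool[i] for i in neighbors]
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(len(query))]
    return mean, neighbors

# Toy sentence representations (hypothetical values, for illustration only).
pool = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [-1.0, 0.0]]
query = [1.0, 0.0]
aggregated, neighbor_ids = aggregate_with_neighbors(query, pool, k=2)
print(neighbor_ids, aggregated)
```

A learned, similarity-weighted combination would be a natural alternative to the plain average; the sketch only shows the retrieve-then-aggregate shape of the step.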
