Integration of patterns and deep learning for extracting causal event triples

Huang Qiaojuan (黄俏娟)* **; Cao Cungen (曹存根)*; Chen Zhiwen (陈志文)* **

Article abstract

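Huang Qiaojuan* **, Cao Cungen*, Chen Zhiwen* **. Integration of patterns and deep learning for extracting causal event triples [J]. Chinese High Technology Letters (高技术通讯, Chinese edition), 2024, 34(9): 921-934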

Integration of patterns and deep learning for extracting causal event triples

DOI: 10.3772/j.issn.1002-0470.2024.09.002

Keywords: causal event triples (因果事件三元组); lexical-syntactic patterns (词法句法模式); bidirectional long short-term memory-conditional random field, BiLSTM-CRF (双向长短期记忆-条件随机场); multi-feature fusion (多特征融合); deep learning (深度学习)

Funding:

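Author	Affiliation
Huang Qiaojuan (黄俏娟)* **	(Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (*University of Chinese Academy of Sciences, Beijing 100049)
Cao Cungen (曹存根)*
Chen Zhiwen (陈志文)* **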


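Chinese abstract (translated):

Causal event triples are essential for understanding the logical connections between events. Extracting them from text faces two problems: a lack of high-quality datasets and limited coverage of causal knowledge. To address these, this paper proposes a method that combines patterns with deep learning to extract causal event triples from a Web corpus. First, lexical-syntactic patterns that reflect causal relations are designed and matched against the Web corpus. Second, noise in the pattern-matching results is filtered using inverse document frequency and a causal event boundary-word strategy. Next, rule-based methods normalize the causal events, yielding a high-quality dataset of causal event triples. Finally, characters, words, parts of speech, causal pattern feature words, and causal event boundary words are fused in a bidirectional long short-term memory-conditional random field (BiLSTM-CRF) model, and deep learning strategies are introduced. After training on the causal event triple dataset, the model performs well at extracting causal event triples from a large-scale Web corpus that covers a broad range of domain knowledge. Experiments show an F1 score of 92.44% for causal event triple extraction and a precision of 94.00% for boundary-word recognition. These results demonstrate the effectiveness of combining patterns with deep learning, the high quality of the constructed dataset, and the practical value of the model for extracting causal event triples from Web corpora.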

English abstract:

Causal event triples play a pivotal role in understanding the logical links between events. This work combines pattern-based methods with deep learning to address the lack of high-quality datasets and the limited coverage of causal knowledge when extracting causal event triples from text. First, lexical-syntactic patterns that reflect causal relationships are designed and matched against the Web corpus. Second, inverse document frequency and a causal event boundary-word strategy filter noise from the pattern matches. Then, rule-based normalisation of the causal events follows, resulting in a high-quality causal event triple dataset. Finally, characters, words, parts of speech, causal pattern feature words, and causal event boundary words are integrated in a bidirectional long short-term memory-conditional random field (BiLSTM-CRF) model, and deep learning strategies are introduced. After training on the causal event triple dataset, the model performs well in extracting causal event triples from a large-scale Web corpus covering broad domain knowledge. Experimental results show that the F1 score for causal event triple extraction is 92.44% and that the precision of boundary-word identification is 94.00%. These findings validate the efficient integration of patterns with deep learning, the high quality of the dataset, and the method’s significant value in extracting causal event triples from the Web corpus.
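
The first two steps of the pipeline (pattern matching, then frequency-based noise filtering) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the actual patterns, thresholds, or boundary words, so the two toy English patterns, the function names `match_triples`, `idf_table` and `filter_by_idf`, the threshold value, and the filtering rule itself (keep a triple only if both its cause and its effect contain at least one high-IDF token) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical causal patterns (NOT the paper's): each entry pairs a regex
# with the connective label that becomes the middle slot of the triple.
PATTERNS = [
    (re.compile(r"^because (?P<cause>[^,.]+), (?P<effect>[^,.]+)\.?$", re.I), "because"),
    (re.compile(r"^(?P<cause>[^,.]+?) leads to (?P<effect>[^,.]+)\.?$", re.I), "leads to"),
]


def tokens(text):
    """Lowercased alphabetic tokens; a stand-in for real word segmentation."""
    return re.findall(r"[a-z]+", text.lower())


def match_triples(sentences):
    """Return (cause, connective, effect) for each sentence matching a pattern."""
    triples = []
    for sentence in sentences:
        for regex, label in PATTERNS:
            m = regex.match(sentence.strip())
            if m:
                cause = m["cause"].strip().lower()
                effect = m["effect"].strip().lower()
                triples.append((cause, label, effect))
                break  # first matching pattern wins
    return triples


def idf_table(sentences):
    """Inverse document frequency per token; each sentence counts as a document."""
    n = len(sentences)
    df = Counter()
    for sentence in sentences:
        df.update(set(tokens(sentence)))
    return {tok: math.log(n / count) for tok, count in df.items()}


def filter_by_idf(triples, idf, threshold):
    """Assumed noise filter: cause and effect must each hold a token with IDF >= threshold."""
    def informative(text):
        return any(idf.get(tok, 0.0) >= threshold for tok in tokens(text))
    return [t for t in triples if informative(t[0]) and informative(t[2])]


# Toy corpus, invented for this sketch.
corpus = [
    "Because heavy rain fell, the river flooded.",
    "Smoking leads to lung disease.",
    "It leads to it.",
    "It is late.",
    "It was cold.",
]
triples = match_triples(corpus)
kept = filter_by_idf(triples, idf_table(corpus), threshold=1.0)
print(kept)
```

On this toy corpus the pattern step produces three candidate triples, and the filter drops the uninformative one, `("it", "leads to", "it")`, because "it" occurs in three of the five sentences and therefore has a low IDF. The later steps described in the abstract (rule-based normalisation and the BiLSTM-CRF model with fused features) are not covered by this sketch.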
