Attribute mining from Chinese electronic medical records (面向中文电子病历的属性挖掘)

费超群* ** ***; 张书涵* ** ***; 李阳阳**** *****

Article abstract

费超群, 张书涵, 李阳阳. Attribute mining from Chinese electronic medical records [J]. 高技术通讯 (Chinese edition), 2022, 32(6): 597-606

Attribute mining from Chinese electronic medical records

DOI: 10.3772/j.issn.1002-0470.2022.06.005

Keywords (Chinese): 属性挖掘；电子病历（EMR）；频繁子序列挖掘；词模式；频繁词模式

Keywords (English): attribute mining, electronic medical record (EMR), frequent subsequence mining, word pattern, frequent word pattern

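Author	Affiliation
费超群* *	(Key Laboratory of Intelligent Information Processing, Beijing 100190) (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100049) (Key Laboratory of Management, Decision and Information Systems, Beijing 100190) (***Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190)
张书涵* *	(Key Laboratory of Intelligent Information Processing, Beijing 100190) (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100049) (Key Laboratory of Management, Decision and Information Systems, Beijing 100190) (***Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190)
李阳阳** ***	(Key Laboratory of Intelligent Information Processing, Beijing 100190) (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100049) (Key Laboratory of Management, Decision and Information Systems, Beijing 100190) (***Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190)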

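Chinese abstract (translated):

The attribute mining task for electronic medical records (EMRs) aims to extract a department's medical examination items from a set of medical record texts produced within that department. Traditional frequent itemset and frequent sequence mining techniques cannot be applied to this task directly. This paper proposes a new attribute mining framework that requires no manual intervention and uses a tag-free technique to address the difficulty: it formalizes attribute mining as a semi-structured frequent subsequence mining task and proposes an effective algorithm for mining candidate word patterns from EMRs. Comprehensive experiments on Chinese EMRs show that the proposed method handles the attribute mining task effectively.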

English abstract:

The task of mining frequent attributes from electronic medical records (EMRs) aims at extracting medical examination items from a group of diagnosis records produced by the same clinic. Traditional frequent itemset and sequence mining techniques cannot be directly applied to this case. This paper proposes a tag-free technique to overcome the problem and presents a novel extraction framework that avoids manual tagging. Specifically, it formulates the attribute mining problem as a semi-structured frequent subsequence mining task and proposes an effective algorithm to mine candidate word patterns from written EMRs. Comprehensive experimental results on Chinese EMRs show that the proposed method can tackle the attribute mining problem effectively.
