An efficient algorithm for mining maximal frequent itemsets over data streams

Mao Yimin; Yang Luming; Li Hong; Chen Zhigang; Liu Lixin

Article abstract

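Mao Yimin, Yang Luming, Li Hong, Chen Zhigang, Liu Lixin. An efficient algorithm for mining maximal frequent itemsets over data streams[J]. Chinese High Technology Letters, 2010, 20(3): 246-252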

Keywords: data mining, data stream, landmark window, frequent itemsets, maximal frequent itemsets

Funding: supported by the National Natural Science Foundation of China (60573127)

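Author	Affiliation
Mao Yimin	School of Information Science and Engineering, Central South University; College of Applied Science, Jiangxi University of Science and Technology, Ganzhou
Yang Luming	School of Information Science and Engineering, Central South University
Li Hong	School of Information Science and Engineering, Central South University
Chen Zhigang	School of Information Science and Engineering, Central South University
Liu Lixin	School of Information Science and Engineering, Central South University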

Chinese abstract:

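To address the data and pattern redundancy that arises in frequent itemset mining, this paper studies algorithms for mining maximal frequent itemsets over data streams. Because DSM-MFI, currently a typical algorithm for mining maximal frequent patterns over data streams, consumes a large amount of storage and executes inefficiently, the paper proposes MMFI-DS, an algorithm for mining the maximal frequent itemsets within a landmark window of a data stream. The algorithm first uses a SEFI-tree to store the essential information about the maximal frequent itemsets contained in the continuously growing data stream, while deleting a large number of infrequent items from the SEFI-tree. It then applies a bidirectional search strategy, combining top-down and bottom-up search, to mine the set of maximal frequent itemsets in the landmark window. Theoretical analysis and experiments show that the algorithm is more efficient than DSM-MFI and …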
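The abstract does not describe the layout of the SEFI-tree, so the following is only a minimal Python sketch of the landmark-window setting that such a structure summarizes, under our own assumptions: every transaction that arrives after a fixed landmark stays in the window, minimum support is a fraction of the transactions seen so far, and a plain list inside the hypothetical class `LandmarkWindow` stands in for the compressed tree. It illustrates why deleting infrequent items is safe: by the Apriori property, no itemset containing an infrequent item can be frequent.

```python
from __future__ import annotations

from collections import Counter
from fractions import Fraction
from math import ceil


class LandmarkWindow:
    """Hypothetical landmark-window summary (not the paper's SEFI-tree).

    Every transaction that arrives after the landmark is kept; a plain list
    stands in for the compressed tree, whose layout the abstract omits.
    """

    def __init__(self, min_support: float) -> None:
        # Parse via str() so that e.g. 0.1 becomes exactly 1/10,
        # avoiding floating-point error when computing the threshold.
        self.min_support = Fraction(str(min_support))
        self.transactions: list[frozenset[str]] = []
        self.item_counts: Counter[str] = Counter()

    def add(self, transaction: set[str]) -> None:
        # The window only grows: nothing is evicted after the landmark.
        t = frozenset(transaction)
        self.transactions.append(t)
        self.item_counts.update(t)

    def min_count(self) -> int:
        # Absolute support threshold for the transactions seen so far.
        return ceil(self.min_support * len(self.transactions))

    def frequent_items(self) -> set[str]:
        threshold = self.min_count()
        return {i for i, c in self.item_counts.items() if c >= threshold}

    def pruned_transactions(self) -> list[frozenset[str]]:
        # Apriori property: an itemset containing an infrequent item is itself
        # infrequent, so such items can be deleted before mining. Pruning is
        # done on demand because an item that is infrequent now may become
        # frequent as the stream grows.
        keep = self.frequent_items()
        return [t & keep for t in self.transactions]
```

In this sketch pruning is deferred to mining time rather than applied permanently, a simplification chosen so that the counts stay exact for the current window; a space-efficient structure such as the one the paper proposes would need its own policy for this.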

English abstract:

This paper studies the mining of maximal frequent itemsets from data streams as a way to address the data and pattern redundancy of frequent itemset mining. Because DSM-MFI, a typical algorithm for mining maximal frequent itemsets over data streams, performs poorly in both running time and memory usage, the paper presents an algorithm called MMFI-DS. The algorithm first uses a new compressed tree, the summary extended frequent item tree (SEFI-tree), to maintain the essential information about the maximal frequent itemsets embedded in the stream so far, while pruning the tree to delete a large number of infrequent items. It then employs a combined top-down and bottom-up search to mine the set of all maximal frequent itemsets in landmark windows over the data stream. Theoretical analysis and experimental results show that the algorithm performs much better than previous approaches.
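The mining step can be illustrated with a deliberately simplified search. The Python sketch below is not MMFI-DS: the functions `support` and `maximal_frequent_itemsets` are hypothetical, and the combined search is reduced to one bottom-up level (finding the frequent single items) followed by a top-down sweep from the largest candidate itemset to the smallest, skipping any candidate already covered by a maximal itemset. The sweep is exponential in the number of frequent items and is meant only for toy inputs.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def support(itemset: frozenset[str], transactions: list[frozenset[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent_itemsets(
    transactions: list[frozenset[str]], min_count: int
) -> list[frozenset[str]]:
    # Bottom-up step: only items that are frequent on their own can
    # appear in a frequent itemset (Apriori property).
    counts = Counter(i for t in transactions for i in t)
    items = sorted(i for i, c in counts.items() if c >= min_count)

    maximal: list[frozenset[str]] = []
    # Top-down step: visit candidates from the largest size down to 1.
    for size in range(len(items), 0, -1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # A subset of a known maximal itemset is frequent but not maximal.
            if any(candidate <= m for m in maximal):
                continue
            # All larger candidates were already visited, so a frequent
            # candidate that is not covered has no frequent proper superset.
            if support(candidate, transactions) >= min_count:
                maximal.append(candidate)
    return maximal
```

For the transactions {a, b, c}, {a, b}, {a, d}, {b, c} with a minimum count of 2, item d is dropped in the bottom-up step, {a, b, c} fails the support test, and the sweep keeps {a, b} and {b, c}; the singletons are then skipped because each lies inside one of them.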