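Zhang Zhongping* **, Sun Guangxu*, Yao Chunchen*, Liu Shuo*, Qi Wenxu***. Outlier detection algorithm based on expected kernel density outlier factor [J]. 高技术通讯 (High Technology Letters, Chinese edition), 2024, 34(2): 187-198 |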
Outlier detection algorithm based on expected kernel density outlier factor |
DOI: 10.3772/j.issn.1002-0470.2024.02.008 |
Keywords: data mining; outlier; kernel density estimation (KDE); expected distance; expected kernel density outlier factor |
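Funding: |
Author | Affiliation |
Zhang Zhongping* ** | (*School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004); (**The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004); (***School of Information Systems Engineering, Information Engineering University, Zhengzhou 450001) |
Sun Guangxu* | |
Yao Chunchen* | |
Liu Shuo* | |
Qi Wenxu*** | |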
Abstract: |
Density-based outlier detection methods often lose accuracy on datasets with different distributions. To address this problem, an outlier detection algorithm based on the expected kernel density outlier factor is proposed. First, an extended neighborhood space (ENS), built from the k-nearest neighbors and reverse k-nearest neighbors, replaces the traditional k-neighborhood so that the neighborhood information of each data object is considered more comprehensively. Second, a multivariate Gaussian function is introduced into the traditional kernel density estimation (KDE) method to estimate the density of each data object within the extended neighborhood space, and the idea of an adaptive kernel bandwidth is borrowed so that the estimate better fits the data distribution of different datasets. Third, the concept of expected distance is defined to further distinguish local outliers from normal points located in low-density regions. Finally, the expected kernel density outlier factor is defined to characterize the degree to which a data object is an outlier. Experiments on synthetic and real datasets, including comparisons with several traditional algorithms, verify the effectiveness of the proposed algorithm. |
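The abstract does not give the formulas, so the following is only a rough sketch of the first two ideas it names: a neighborhood formed from the k-nearest and reverse k-nearest neighbors, and a Gaussian kernel density estimate computed over that neighborhood. Everything specific here is an assumption made for illustration, not the authors' definition: the function names, the isotropic Gaussian kernel, the bandwidth rule (distance to the k-th nearest neighbor), and the final LOF-style density ratio, which stands in for the expected kernel density outlier factor. The expected distance is omitted because the abstract does not define it.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def extended_neighborhoods(points, k):
    """Return (knn, ens) where ens[i] = kNN(i) united with reverse kNN(i)."""
    knn = [k_nearest(points, i, k) for i in range(len(points))]
    ens = [set(nbrs) for nbrs in knn]
    for j, nbrs in enumerate(knn):
        for i in nbrs:
            ens[i].add(j)  # i is in kNN(j), so j is a reverse neighbor of i
    return knn, ens


def gaussian_density(points, i, neighbors, h):
    """Isotropic Gaussian kernel density at points[i] over `neighbors`."""
    d = len(points[i])
    norm = (2.0 * math.pi) ** (d / 2.0) * h ** d
    total = sum(
        math.exp(-math.dist(points[i], points[j]) ** 2 / (2.0 * h * h))
        for j in neighbors
    )
    return total / (len(neighbors) * norm)


def outlier_scores(points, k):
    """Higher score = more outlying. Requires len(points) > k >= 1."""
    n = len(points)
    knn, ens = extended_neighborhoods(points, k)
    # Assumed adaptive bandwidth: distance to the k-th nearest neighbor,
    # floored to avoid division by zero when points coincide.
    h = [max(math.dist(points[i], points[knn[i][-1]]), 1e-12) for i in range(n)]
    dens = [gaussian_density(points, i, ens[i], h[i]) for i in range(n)]
    # Assumed stand-in score: mean density of the neighborhood divided by
    # the point's own density (not the paper's outlier factor).
    return [sum(dens[j] for j in ens[i]) / len(ens[i]) / dens[i] for i in range(n)]


# Example: a 3x3 grid of points plus one distant point at index 9.
points = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
scores = outlier_scores(points, k=3)
print(scores.index(max(scores)))  # index of the most outlying point
```

Including reverse neighbors means a point that nobody else counts as a neighbor keeps a small neighborhood, while points in a dense region gain extra members; the brute-force neighbor search is quadratic and is chosen here only for brevity.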