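K-means Clustering Algorithm Fusing the Nearest-Neighbor Matrix and Local Density
KUERBAN Ailiminuer
计算机科学与…

… K-means clustering algorithm. Inspired by the nearest-neighbor absorption principle and the density peaks principle, the algorithm constructs an adjacency matrix by introducing a distance difference value between data objects, and computes the local density from that matrix without any parameter setting. By fusing the nearest-neighbor matrix with the local density, it adaptively determines the number and positions of the initial cluster centers and, at the same time, completes the initial assignment of the non-center points. Experiments on synthetic datasets and UCI datasets, together with comparisons against the traditional K-means algorithm, an outlier-based improved K-means algorithm, and a density-based improved K-means algorithm, show that the proposed adaptive K-means algorithm is highly robust to isolated points in the synthetic datasets and produces more accurate clustering results on the UCI datasets.

Keywords: adaptive K-means clustering algorithm; density peaks principle; nearest-neighbor absorption principle; local density
Document code: A    CLC number: TP301

The Adaptive K-means Algorithm Combining Local Density and Nearest-Neighbor Matrix
KUERBAN Ailiminuer, XIE Juanying, YAO Ruoxia+
School of Computer Science, Shaanxi Normal University, Xi’an 710119, China

Abstract: The traditional K-means algorithm and its density-based variants are sensitive to the initial cluster centers and to outliers, and they require arbitrarily chosen parameters. To overcome these deficiencies, this paper proposes an adaptive K-means clustering algorithm that combines the nearest-neighbor matrix with local density. Inspired by the nearest-neighbor absorption principle and the density peaks principle, the adjacency matrix is constructed by introducing the distance difference between objects. The local density is then calculated from the adjacency matrix alone, with no other parameters. Next, the initial centers and the number of clusters of K-means are determined simultaneously from the nearest-neighbor matrix and the local density, and the remaining objects receive their initial assignment in the same step. Experiments on synthetic datasets and on real-world datasets from the UCI machine learning repository, together with comparisons against the traditional K-means algorithm and improved K-means algorithms based on outliers or on densities, all demonstrate that the proposed adaptive K-means algorithm is robust to outliers on synthetic data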