DB Seminar Series_ HARP_ A Hierarchical Algorithm with Automatic ....ppt
DB Seminar Series:HARP: A Hierarchical Algorithm with Automatic Relevant Attribute Selection for Projected Clustering Presented by: Kevin Yip 20 September 2002 0 Short Summary Our own work (unpublished), supervised by Dr. Cheung and Dr. Ng Problem: to cluster datasets of very high dimensionality Assumption: clusters are formed in subspaces 1 Short Summary Previous approaches: either have special restrictions on the dataset or target clusters, or cannot determine the dimensionality of the clusters automatically Our approach: not restricted by these limitations 2 Presentation Outline Clustering Projected clustering Previous approaches to projected clustering Our approach: HARP Concepts Implementation: Experiments Future work and conclusions 3 Clustering Goal: given a dataset D with N records and d attributes (dimensions), partition the records into k disjoint clusters such that Intra-cluster similarity is maximized Inter-cluster similarity is minimized 4 Clustering How to measure similarity? Distance-based: Manhattan distance, Euclidean distance, etc. Correlation-based: cosine correlation, Pearson correlation, etc. Link-based (common neighbors) Pattern-based 5 Clustering mon types of clustering algorithms: Partitional: selects some representative points for each cluster, assigns all other points to their closest clusters, and then re-determines the new representative points Hierarchical (agglomerative): repeatedly determines the two most similar clusters, and merges them 6 Clustering Partitional clustering: Dataset Representatives Assignment Replacement 7 Clustering Hierarchical clustering: Dataset Similarity calculation Best merge determination Merging 8 Projected Clustering Assumption (general case): each cluster is formed in a subspace Source of figures: ORCLUS (SIGMOD 2000) Assumption (special case): each cluster has a set of relevant attributes Goal: determine the records and relevant attributes of each cluster (to “
DB Seminar Series_ HARP_ A Hierarchical Algorithm with Automatic ... 来自淘豆网m.daumloan.com转载请标明出处.