DB Seminar Series_ HARP_ A Hierarchical Algorithm with Automatic ....ppt


文档分类:高等教育 | 页数:约49页 举报非法文档有奖
1/49
下载提示
  • 1.该资料是网友上传的,本站提供全文预览,预览什么样,下载就什么样。
  • 2.下载该文档所得收入归上传者、原创者。
  • 3.下载的文档,不会出现我们的网址水印。
1/49
文档列表 文档介绍
DB Seminar Series: HARP: A Hierarchical Algorithm with Automatic Relevant Attribute Selection for Projected Clustering
Presented by:
Kevin Yip
20 September 2002
0
Short Summary
Our own work (unpublished), supervised by Dr. Cheung and Dr. Ng
Problem: to cluster datasets of very high dimensionality
Assumption: clusters are formed in subspaces
1
Short Summary
Previous approaches: either have special restrictions on the dataset or target clusters, or cannot determine the dimensionality of the clusters automatically
Our approach: not restricted by these limitations
2
Presentation Outline
Clustering
Projected clustering
Previous approaches to projected clustering
Our approach: HARP
Concepts
Implementation:
Experiments
Future work and conclusions
3
Clustering
Goal: given a dataset D with N records and d attributes (dimensions), partition the records into k disjoint clusters such that
Intra-cluster similarity is maximized
Inter-cluster similarity is minimized
4
Clustering
How to measure similarity?
Distance-based: Manhattan distance, Euclidean distance, etc.
Correlation-based: cosine correlation, Pearson correlation, etc.
Link-based (common neighbors)
Pattern-based
5
Clustering
mon types of clustering algorithms:
Partitional: selects some representative points for each cluster, assigns all other points to their closest clusters, and then re-determines the new representative points
Hierarchical (agglomerative): repeatedly determines the two most similar clusters, and merges them
6
Clustering
Partitional clustering:
Dataset
Representatives
Assignment
Replacement
7
Clustering
Hierarchical clustering:
Dataset
Similarity calculation
Best merge
determination
Merging
8
Projected Clustering
Assumption (general case): each cluster is formed in a subspace
Source of figures:
ORCLUS (SIGMOD 2000)
Assumption (special case): each cluster has a set of relevant attributes
Goal: determine the records and relevant attributes of each cluster (to “

DB Seminar Series_ HARP_ A Hierarchical Algorithm with Automatic ... 来自淘豆网m.daumloan.com转载请标明出处.

非法内容举报中心
文档信息
  • 页数49
  • 收藏数0 收藏
  • 顶次数0
  • 上传人中国课件站
  • 文件大小0 KB
  • 时间2011-12-04