Principal Component Analysis (PCA)

Data Reduction
• summarization of data with many (p) variables by a smaller set of (k) derived (synthetic, composite) variables.
• schematically: data matrix A (n × p) is reduced to matrix X (n × k)
• "Residual" variation is information in A that is not retained in X
• balancing act between
  – clarity of representation, ease of understanding
  – oversimplification: loss of important or relevant information.

Principal Component Analysis (PCA)
• probably the most widely used and well-known of the "standard" multivariate methods
• invented by Pearson (1901) and Hotelling (1933)
• first applied in ecology by Goodall (1954) under the name "factor analysis" ("principal factor analysis" is a synonym of PCA).
• takes a data matrix of n objects by p variables, which may be correlated, and summarizes it by uncorrelated axes (principal components or principal axes) that are linear combinations of the original p variables
• the first k components display as much as possible of the variation among objects.

Geometric Rationale of PCA
• objects are represented as a cloud of n points in a multidimensional space with an axis for each of the p variables
• the centroid of the points is defined by the mean of each variable
• the variance of each variable is the average squared deviation of its n values around the mean of that variable.

$$V_i = \frac{1}{n-1} \sum_{m=1}^{n} \left( X_{im} - \bar{X}_i \right)^2$$

• degree to which the variables are linearly correlated is represented by their covariances.

$$C_{ij} = \frac{1}{n-1} \sum_{m=1}^{n} \left( X_{im} - \bar{X}_i \right) \left( X_{jm} - \bar{X}_j \right)$$

  where C_{ij} is the covariance of variables i and j, the sum runs over all n objects, X_{im} is the value of variable i in object m, \bar{X}_i is the mean of variable i, X_{jm} is the value of variable j in object m, and \bar{X}_j is the mean of variable j.

• objective of PCA is to rigidly rotate the axes of this p-dimensional space to new positions (principal axes) that have the following properties:
  – ordered such that principal axis 1 has the highest variance, axis 2 has the next highest variance, .... , an
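
The variance and covariance definitions above, and the idea of rotating the axes so that they are ordered by variance, can be illustrated with a minimal NumPy sketch. The function names (`covariance_matrix`, `principal_axes`) and the synthetic two-variable data set are assumptions made for illustration and are not part of the slides. The rotation is obtained here from the eigenvectors of the covariance matrix, which is one common way to compute it.

```python
import numpy as np

def covariance_matrix(A):
    """Sample covariance matrix of an n x p data matrix (rows = objects).

    Diagonal entries are the variances V_i, off-diagonal entries the
    covariances C_ij, both with the 1/(n-1) factor used above.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    centered = A - A.mean(axis=0)          # subtract the centroid
    return centered.T @ centered / (n - 1)

def principal_axes(A):
    """Return variances along the principal axes (descending) and the axes."""
    C = covariance_matrix(A)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder: highest variance first
    return eigvals[order], eigvecs[:, order]

# Hypothetical data: n = 200 objects, p = 2 correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
A = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=200)])

C = covariance_matrix(A)
variances, axes = principal_axes(A)

# Coordinates of the objects on the rotated (principal) axes.
scores = (A - A.mean(axis=0)) @ axes
```

In this sketch the columns of `scores` are uncorrelated, and the variance of the first column is at least as large as that of the second, which matches the ordering property described in the last bullet.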