All the time, I know it is a technique for dimensionality reduction. There are tow equivalent views of PCA:

  • A orthogonal projection of the data onto a lower dimensional linear space so that the variance of the projected data is maximized
  • Or we can think it as a mathematics way. It is just calculating the mean squared distance between the data points and their projections.

That is correct. But still need to keep in mind. PCA can also used in data visualization, feature extraction, pattern learning, etc.

Before I dive into it. Let us revisit some background knowledge. <img src=”/img/posts/variance.png”, alt=”variance”, align=”center”/> <img src=”/img/posts/covariance.png”, alt=”covariance”, align=”center”/> If covariance is zero, X and Y are linearly independent of each other.

What about Eigenvalues/Eigenvectors, Av=lambdav. A is a matrix, v is a vector and lambda obviously is a scalar. If the equation is satisfied. v is an eigenvector of A.

Before I explain it with a real example, I want to emphasize that eigenvectors can have length of 1 where V=(v1,v2,v3,v4,….vn).

An example of PCA. To start off, subtract the means along each feature. For x(1) is 1.81 while x(2) is 1.91. Next, calculate the covariance matrix. cov = (var(x1) cov(x1,x2) nextline cov(x2,x1) var(x2)). Third, calculate the eigenvectors andeigenvalues of the covariance matrix(I know the function, but I have no idea to get the result). In the fourth step, order eigenvector by eigenvalues, highest to lowest. #<img src=”/img/posts/covariance.png”, alt=”covariance”, align=”center”/> When we get the feature vector, time to derive the new data set. TransformedData = RowFeatureVector x RowDataAdjust(remember substract the mean?).

Remember PCA’s goal is to maximize the projected variance below:(shown with a pic) by finding a proper u1(v1). We also know that u1 is a unit vector.

What if you want to reconstruct the original data, just do all the things back wards. But when you leave out some eigenvector when applying PCA, You can still reconstruct the original data with some information loss.