Principal component analysis (PCA) is a widely used method for dimension reduction. In high-dimensional data, the "signal" eigenvalues corresponding to weak principal components (PCs) do not necessarily separate from the bulk of the "noise" eigenvalues. In this setting, it is not possible to decide based on the largest eigenvalue alone whether or not there are "signal" PCs in the data. In this talk we explore this phenomenon in a general model that captures the shape of eigenvalue distributions often seen in applications. We show how to construct statistical tests for detecting principal components that use all of the eigenvalues. We also explain how recent computational advances in random matrix theory enable the efficient implementation of our methods.
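The failure of the largest eigenvalue to separate from the noise bulk can be illustrated with a small simulation. The sketch below (not from the talk; parameter choices are illustrative) draws data from a spiked covariance model with one weak signal PC whose strength lies below the classical BBP detection threshold, and compares the top sample eigenvalue to the Marchenko-Pastur bulk edge for white noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 250               # samples and dimension; aspect ratio gamma = p/n
gamma = p / n

# Spiked covariance: identity noise plus one rank-one "signal" spike.
# The spike strength ell = 0.5 is below the BBP threshold sqrt(gamma) ~ 0.707,
# so the top eigenvalue is not expected to separate from the noise bulk.
ell = 0.5
v = np.zeros(p)
v[0] = 1.0
cov = np.eye(p) + ell * np.outer(v, v)

X = rng.multivariate_normal(np.zeros(p), cov, size=n)
sample_cov = X.T @ X / n
eigs = np.linalg.eigvalsh(sample_cov)

# Marchenko-Pastur upper bulk edge for identity noise: (1 + sqrt(gamma))^2
bulk_edge = (1 + np.sqrt(gamma)) ** 2
print(f"top eigenvalue: {eigs[-1]:.3f}, MP bulk edge: {bulk_edge:.3f}")
```

In this regime the top sample eigenvalue sticks to the bulk edge, so a test based on the largest eigenvalue alone has no power; this is the setting in which tests using all eigenvalues become relevant.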
Optimal detection of weak principal components in high-dimensional data
Edgar Dobriban, Stanford University
May 9, 2016 - 2:30pm