Xiuyuan Cheng FPO: Random Matrices in High-Dimensional Data Analysis

Xiuyuan Cheng
Oct 1 2013 - 2:00pm
120 Lewis Library

This thesis studies the spectrum of kernel matrices built from high-dimensional data vectors, a mathematical problem that arises naturally in many applications. In the first part, we consider the spectrum of large kernel matrices built from independent random high-dimensional vectors (the null model). Specifically, we consider n-by-n matrices whose (i, j)-th entry is $f(X_i^T X_j)$, where $X_1, \ldots, X_n$ are i.i.d. random vectors in $\mathbb{R}^p$ and f belongs to a large class of real-valued functions. As p, n → ∞ with p/n → γ, we obtain a family of limiting spectral densities that includes the Marčenko-Pastur density and the semicircle density as special cases. Convergence of the spectral density is first proved for i.i.d. Gaussian vectors, and then extended to i.i.d. vectors that can be “compared” with Gaussian vectors. The study of the null model is fundamental to understanding noise-corrupted kernel matrices, which are built from vectors admitting a “signal + noise” decomposition (the “spiking” model). We provide conjectures for the spiking model based on our results for the null model. The second part addresses an application in cryo-EM, where certain kernel matrices built from microscopic image data are used to study the structure of biological molecules. We consider the situation where the molecule admits non-trivial group symmetries, and study (i) the symmetry detection problem and (ii) the structural reconstruction problem. For the former, we derive a theoretical solution based on estimating the rank of certain auto-correlation kernels. For the latter, we propose two approaches that extend existing methods developed for non-symmetric molecules. For both problems, the proposed methods are tested on simulated data sets. The cryo-EM problem, together with other applications, motivates the study of the random matrix model in the first part of the thesis.
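As a rough illustration of the null model described in the abstract, the following Python sketch builds the n-by-n matrix with entries $f(X_i^T X_j)$ from i.i.d. Gaussian vectors and computes its eigenvalues. The function names `kernel_matrix` and `kernel_spectrum`, the sizes n and p, the 1/sqrt(p) scaling of the vectors, and the choice of f as the identity are all assumptions made here for illustration; the abstract does not specify the normalization used in the thesis.

```python
import numpy as np


def kernel_matrix(X, f):
    """Return the n-by-n matrix whose (i, j) entry is f(X_i^T X_j).

    The rows of X are the data vectors X_1, ..., X_n in R^p.
    f must act elementwise on a NumPy array.
    """
    gram = X @ X.T          # (i, j) entry is the inner product X_i^T X_j
    return f(gram)


def kernel_spectrum(X, f):
    """Return the eigenvalues of the kernel matrix in ascending order."""
    K = kernel_matrix(X, f)
    # The Gram matrix is symmetric and f is applied elementwise,
    # so K is symmetric and eigvalsh applies.
    return np.linalg.eigvalsh(K)


# Hypothetical sizes; gamma = p / n = 0.5 here.
n, p = 400, 200
rng = np.random.default_rng(0)

# Assumed scaling: entries are N(0, 1/p), so each X_i has squared norm near 1.
X = rng.standard_normal((n, p)) / np.sqrt(p)

# With f the identity, the kernel matrix is the Gram matrix X X^T,
# a Wishart-type matrix.
eigs = kernel_spectrum(X, lambda x: x)
print(eigs.min(), eigs.max())
```

A histogram of `eigs` for growing n and p at a fixed ratio gives an empirical view of the spectral density. Replacing the identity with a nonlinear elementwise function such as `np.tanh` gives other members of the kernel family, although any comparison with the limiting densities in the thesis would depend on the normalization, which is assumed here.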