Hongkang Yang
Princeton University
Mathematical Theory of Machine Learning Models for Estimating Probability Distributions
The Program in Applied and Computational Mathematics (PACM)
Announces
FINAL PUBLIC ORAL EXAMINATION OF
Hongkang Yang
Date: WEDNESDAY, MAY 3, 2023
Time: 8:00 PM (EDT)
Location: 214 FINE HALL
An electronic copy of Hongkang's dissertation is available upon request. Please email bwysocka@princeton.edu.
________________________________________________________
Mathematical Theory of Machine Learning Models for Estimating Probability Distributions
The modeling of probability distributions, specifically generative modeling and density estimation, has become an immensely popular subject in recent years by virtue of its outstanding performance on sophisticated data such as images and texts. Nevertheless, a theoretical understanding of its success is still incomplete. One mystery is the paradox between memorization and generalization: in theory, the model is trained to match exactly the empirical distribution of the finitely many training samples, whereas in practice, the trained model can generate new samples or estimate the likelihood of unseen samples. Meanwhile, the overwhelming diversity of distribution learning models calls for a unified perspective on this subject.
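To make the paradox concrete, here is a minimal sketch in generic notation (the symbols below are chosen for exposition and are not necessarily those of the dissertation). Given i.i.d. samples $x_1, \dots, x_n$ from an unknown target distribution $P_*$, the training objective only sees the empirical distribution
$$P^{(n)} = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i},$$
so its exact minimizer over a sufficiently expressive model class is $P^{(n)}$ itself, a measure that can only reproduce the training samples. Yet trained models routinely produce novel samples and assign sensible likelihoods to data outside the training set.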
This dissertation aims to solve these problems. First, we provide a mathematical framework from which all the well-known models can be derived. The key distinction is whether distributions are modeled and evaluated by reweighting or by transport. This distinction leads to different choices of distribution representations and loss types, and their combinations give rise to the diversity of distribution learning models. Beyond a categorization, this perspective greatly facilitates our analysis of training and generalization, so that our proof techniques become applicable to broad categories of models instead of particular instances.
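As a rough illustration of the two mechanisms (again in generic notation assumed for exposition, not taken from the dissertation): a reweighting, or potential-based, representation tilts a reference measure by a learned potential,
$$p_V(x) \propto e^{-V(x)},$$
and is naturally paired with likelihood-type losses, whereas a transport, or generator-based, representation pushes a simple base distribution $\rho$ forward through a learned map $G$,
$$P_G = G_{\#}\rho,$$
and is naturally paired with losses that compare samples, as in generative adversarial networks. Mixing and matching these representations and loss types recovers the familiar families of distribution learning models.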
Second, we resolve the aforementioned paradox by showing that both generalization and memorization take place, but over different time scales. On the one hand, the models satisfy the property of universal convergence, so their concentration onto the empirical distribution is inevitable in the long term. On the other hand, these models enjoy implicit regularization during training, so that their generalization errors at early stopping escape the curse of dimensionality.
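Schematically, and once more in notation assumed for illustration: writing $P_t$ for the model distribution after training time $t$, memorization is the statement that $P_t \to P^{(n)}$ as $t \to \infty$, while generalization is the statement that at a suitable early-stopping time the distance to the true target $P_*$ is already small. For comparison, the empirical distribution itself is a poor estimate in, say, the Wasserstein metric, with $W_2(P^{(n)}, P_*)$ decaying only at a rate of roughly $n^{-1/d}$ in dimension $d$; escaping the curse of dimensionality means that the early-stopped model achieves an error that does not degrade in this dimension-dependent manner.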
Third, we obtain comprehensive results on the training behavior of distribution learning models. For models with either the potential representation or the fixed generator representation, we establish global convergence to the target distributions. For models with the free generator representation, we show that they all possess a large family of spurious critical points, which sheds light on the training difficulty of these models. Furthermore, we uncover the mechanisms underlying the mode collapse phenomenon that disrupts the training of generative adversarial networks.
_____________________________________________________