Final Public Oral Examination

Dec 8, 2025
1:30 - 2:30 pm
Fine Hall 224
On the Instability of Stochastic Gradient Descent: The Effects of Mini-Batch Training on the Loss Landscape of Neural Networks
 
Advisor: René A. Carmona
 
Abstract:
Machine learning systems, particularly neural networks, have become ubiquitous, owing to their exceptional expressive power in modeling highly complex processes and distributions. However, this expressivity brings significant optimization challenges: the resulting loss landscapes are notably non-convex and intricate, limiting the applicability of classical tools from optimization theory. Moreover, large-scale neural network training typically employs stochastic training procedures that further complicate optimization analysis. This dissertation investigates how the geometry of the neural network loss landscape interacts with the stochastic dynamics induced by mini-batch stochastic gradient descent (SGD), which—together with its variants—serves as the dominant optimization method in modern deep learning.
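
For orientation, one step of mini-batch SGD with learning rate η on parameters θ, using a batch B_t sampled at step t, takes the form

    θ_{t+1} = θ_t − η ∇L_{B_t}(θ_t),    L_{B_t}(θ) = (1/|B_t|) Σ_{i ∈ B_t} ℓ_i(θ),

where ℓ_i is the loss on example i; full-batch gradient descent is recovered when B_t is the entire training set. (The notation here is illustrative and is not taken from the dissertation.)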
 
We begin with an overview of prior work on neural network loss landscapes and on empirical and theoretical analyses of SGD noise. Subsequently, we identify and characterize a distinct instability regime of mini-batch SGD training—termed the Edge of Stochastic Stability (EoSS)—which emerges during training and critically shapes the evolving loss landscape, extending the previously known instability regime of full-batch deterministic training. Central to this characterization are our instability framework and a new curvature quantity, Batch Sharpness, which we show—empirically and theoretically—to govern the onset of instability in SGD. Building on the instability framework, we distinguish two types of oscillatory behavior in SGD—noise-driven and curvature-driven—and show that only the latter signals instability relevant to landscape adaptation. We further examine the implications of operating within the EoSS instability regime, detailing its effects on the landscape geometry, its consequences for momentum-based training, and its implications for modeling the dynamics of SGD.
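
For context, instability of full-batch gradient descent is classically associated with the largest Hessian eigenvalue of the training loss reaching roughly 2/η for step size η. The sketch below is only an illustration of the kind of curvature quantity involved in the mini-batch case: it estimates a directional curvature of the sampled batch loss along its own gradient, E_B[g_B^T H_B g_B / ||g_B||^2], via Hessian-vector products in PyTorch. The precise definition and use of Batch Sharpness are given in the dissertation and may differ from this sketch.

    import torch

    def directional_batch_curvature(model, loss_fn, loader, n_batches=8, device="cpu"):
        # Rough estimate of E_B[ g_B^T H_B g_B / ||g_B||^2 ] over a few sampled batches,
        # where g_B and H_B are the gradient and Hessian of the loss on batch B.
        model.to(device)
        params = [p for p in model.parameters() if p.requires_grad]
        vals = []
        for i, (x, y) in enumerate(loader):
            if i >= n_batches:
                break
            x, y = x.to(device), y.to(device)
            loss = loss_fn(model(x), y)
            # Mini-batch gradient g_B, keeping the graph for a second backward pass.
            grads = torch.autograd.grad(loss, params, create_graph=True)
            direction = [g.detach() for g in grads]
            # Hessian-vector product H_B g_B via double backpropagation.
            hv = torch.autograd.grad(grads, params, grad_outputs=direction)
            g = torch.cat([d.reshape(-1) for d in direction])
            h = torch.cat([t.reshape(-1) for t in hv])
            vals.append((g @ h) / (g @ g))
        return torch.stack(vals).mean().item()

The double backward pass (enabled by create_graph=True) yields the Hessian-vector product without ever forming the Hessian explicitly, so the estimate remains tractable even for large networks.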
 
Altogether, this dissertation provides a unified view of how stochastic training alters, constrains, and is constrained by neural network loss landscape geometry, offering a new lens for understanding and predicting the behavior of modern deep learning optimization.