Graduate Student Seminars
Implicit Biases of Stochastic Gradient Descent
Abstract: Deep learning relies on the ability of stochastic gradient descent (SGD) to navigate high-dimensional, non-convex loss landscapes and return minimizers that generalize to unseen data. However, this process remains poorly understood. I will present two recent results that attempt to explain the generalization ability of SGD by proving that SGD has a strong preference for "flatter" minimizers, which tend to generalize better.