Graduate Student Seminars
Implicit Biases of Stochastic Gradient Descent
Abstract: Deep learning relies on the ability of stochastic gradient descent (SGD) to navigate high-dimensional, non-convex loss landscapes and return minimizers that generalize to unseen data. However, this process remains poorly understood. I will present two recent results that attempt to explain the generalization ability of SGD by proving that SGD has a strong preference for "flatter" minimizers, which tend to generalize better.