Learning via Nonconvex Optimization: ReLUs, neural nets and submodular maximization
Many problems of contemporary interest in signal processing and machine learning involve highly non-convex optimization problems. While nonconvex problems are known to be intractable in general, simple local search heuristics such as (stochastic) gradient descent are often surprisingly effective at finding global/high quality optima on real or randomly generated data. In this talk I will discuss some results explaining the success of these heuristics focusing on two problems.
The first problem is about learning the optimal weights of the shallowest of neural networks consisting of a single Rectified Linear Unit (ReLU). I will discuss this problem in the high-dimensional regime where the number of observations are fewer than the ReLU weights. I will show that projected gradient descent on a natural least-squares objective, when initialization at 0, converges at a linear rate to globally optimal weights with a number of samples that is optimal up to numerical constants. I will then discuss how this result can be generalized to single hidden layer networks in the over-parameterized regime. The second problem focuses on maximizing continuous submodular functions that emerge in a variety of areas in machine learning, including utility functions in matrix approximations and network inference. Despite the apparent lack of convexity/concavity in such functions, I will show that stochastic projected gradient methods can provide strong approximation guarantees for maximizing continuous submodular functions with convex constraints. In particular, this result allows us to approximately maximize discrete, monotone submodular optimization problems via projected gradient descent on a continuous relaxation, directly connecting the discrete and continuous domains.
Mahdi Soltanolkotabi is currently an assistant professor in the Ming Hsieh Department of Electrical Engineering at the University of Southern California. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. Mahdi is a recipient of the 2018 AFOSR young investigator award. His research focuses on the design and mathematical understanding of computationally efficient algorithms for optimization, high dimensional statistics, machine learning, signal processing and computational imaging. A main focus of his research has been on developing and analyzing algorithms for nonconvex optimization, with provable guarantees of convergence to global optima.