Previously, greedy algorithms have been shown to have good approximation and estimation properties for superpositions of a sigmoidal function or other ridge activation functions: in each step, the parameters of a new sigmoid are fit to the residuals of the previous sigmoids. However, it has remained challenging to provide guarantees for the parameter search at each step. Here we discuss the development of two algorithms for this task. One is an implementation of adaptive annealing, in which internal modifications of the sigmoid are made, including parameter squashing and variable replication, which permit individual parameter effects to be small; this allows stability of the adaptive annealing solution, at the expense of increased dimensionality. The other is a convergent nonlinear modification of the tensor methods of Anandkumar and her colleagues, motivated by optimization of the inner product of the residuals with certain forms of sigmoids, in the context of data from Gaussian or other rotationally invariant distributions. This work is joint with Jason Klusowski.
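To fix ideas, the greedy scheme the abstract describes can be sketched as follows: at each step a new sigmoid is chosen to correlate with the current residuals, its output weight is set by least squares, and the residuals are updated. This is a minimal illustration only; the parameter search here is plain random candidate sampling, a stand-in for the adaptive annealing and tensor-based searches the talk is about, and the names `sigmoid` and `greedy_sigmoid_fit` are placeholders, not from the talk.

```python
import numpy as np

def sigmoid(z):
    # standard logistic sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def greedy_sigmoid_fit(X, y, n_terms=5, n_candidates=200, seed=0):
    """Greedy (stepwise) fitting of a superposition of sigmoids.

    At each step, pick the candidate sigmoid whose normalized inner
    product with the current residual is largest, set its output
    weight by least squares, and subtract its contribution.
    The candidate search is random sampling -- a simple placeholder
    for the guaranteed searches discussed in the talk.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    residual = y.astype(float).copy()
    terms = []  # list of (direction w, offset b, output weight c)
    for _ in range(n_terms):
        best = None
        for _ in range(n_candidates):
            w = rng.normal(size=d)
            b = rng.normal()
            phi = sigmoid(X @ w + b)
            # normalized correlation with the residual
            score = abs(phi @ residual) / (np.linalg.norm(phi) + 1e-12)
            if best is None or score > best[0]:
                best = (score, w, b, phi)
        _, w, b, phi = best
        c = (phi @ residual) / (phi @ phi)  # least-squares output weight
        residual = residual - c * phi       # fit next term to these residuals
        terms.append((w, b, c))
    return terms, residual

# toy check on synthetic data: residual norm shrinks as terms are added
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.tanh(X @ np.array([1.0, -2.0, 0.5]))
terms, r = greedy_sigmoid_fit(X, y, n_terms=8)
print(np.linalg.norm(r) < np.linalg.norm(y))
```

Because each step subtracts a least-squares projection onto the new sigmoid, the residual norm is non-increasing in the number of terms; the hard part, and the subject of the talk, is certifying that the per-step parameter search finds a sufficiently well-correlated sigmoid.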
Computationally feasible greedy algorithms for neural nets
Andrew Barron, Yale University
Feb 22 2016 - 4:30pm