On Time-stepping Methods for Gradient-flow Optimization


Gradient-based optimization methods are essential for training neural networks across many applications. The evolution of the network parameters can be viewed as an ODE system evolving in pseudo-time toward a local minimum of the objective function. This interpretation allows us to apply different time-stepping schemes to the optimization problem. We will show that existing gradient-descent and momentum methods, such as SGD and Adam, can be viewed as particular time discretizations of the continuous gradient flow. We will also consider implicit-explicit (IMEX) and high-order schemes for improved efficiency in the optimization. Demonstrations on small test problems will be presented.
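As a minimal sketch of the time-stepping view, consider the gradient flow dx/dt = -grad f(x). A forward Euler step of size h gives x_{n+1} = x_n - h grad f(x_n), which is plain gradient descent with learning rate h. The example below is an illustration only and is not taken from the talk: the quadratic objective f(x) = 0.5 x^T A x, the matrix A, the step sizes, and the function names are all assumptions. It compares forward Euler with backward (implicit) Euler on a stiff problem, which hints at why implicit or IMEX treatments can permit larger steps.

```python
import numpy as np

# Hypothetical test objective (an assumption, not from the talk):
# f(x) = 0.5 * x^T A x, so grad f(x) = A x and the minimum is at x = 0.
A = np.diag([1.0, 100.0])  # ill-conditioned, i.e. a stiff gradient flow


def grad(x):
    """Gradient of the quadratic objective."""
    return A @ x


def forward_euler(x0, h, steps):
    """Forward Euler on dx/dt = -grad f(x); identical to gradient descent
    with learning rate h."""
    x = x0.copy()
    for _ in range(steps):
        x = x - h * grad(x)
    return x


def backward_euler(x0, h, steps):
    """Backward (implicit) Euler: solve (I + h A) x_new = x_old each step.
    The linear solve is exact here only because f is quadratic."""
    M = np.eye(len(x0)) + h * A
    x = x0.copy()
    for _ in range(steps):
        x = np.linalg.solve(M, x)
    return x


x0 = np.array([1.0, 1.0])

# Forward Euler is stable on this problem only for h < 2 / 100.
x_small = forward_euler(x0, h=0.015, steps=200)   # converges toward 0
x_large = forward_euler(x0, h=0.03, steps=200)    # diverges
x_implicit = backward_euler(x0, h=0.03, steps=200)  # stable at the same h

print("forward Euler,  h=0.015:", np.linalg.norm(x_small))
print("forward Euler,  h=0.03 :", np.linalg.norm(x_large))
print("backward Euler, h=0.03 :", np.linalg.norm(x_implicit))
```

In this sketch the explicit scheme is limited by the largest eigenvalue of A, while the implicit scheme remains stable at the larger step. For a general nonlinear objective the implicit step would require a nonlinear solve, which is one motivation for treating only part of the right-hand side implicitly.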
