Stochastic approximation (SA) is a key method in statistical learning and reinforcement learning. In this talk, we analyse a general SA scheme for minimising a smooth, non-convex objective function. We consider an update procedure whose drift term depends on a state-controlled Markov chain and whose mean field is not necessarily of gradient type; the scheme is therefore a biased SA scheme. Importantly, we provide the first non-asymptotic convergence rate in such a dynamical setting. We illustrate this setting with the online expectation-maximisation algorithm and the policy-gradient method for average-reward maximisation, highlighting the trade-off between bias and variance in their performance.
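To make the source of bias concrete, here is a minimal sketch of an SA recursion driven by a state-controlled Markov chain, assuming the generic form θ_{k+1} = θ_k − γ_{k+1} H_{θ_k}(X_{k+1}) with X_{k+1} drawn from a kernel P_{θ_k}(X_k, ·). Everything specific here is a hypothetical toy choice, not the scheme analysed in the talk: the function name `biased_sa`, the AR(1) kernel whose stationary mean a + bθ depends on θ, the drift H_θ(x) = θ − x, and the step sizes γ_k = k^(-0.6). Because X_{k+1} is generated from the previous state rather than sampled from the stationary law π_θ, each drift evaluation is a biased estimate of the mean field h(θ) = θ − (a + bθ), whose root is θ* = a / (1 − b).

```python
import numpy as np


def biased_sa(theta0, n_iters, rho=0.9, a=1.0, b=0.5, sigma=0.1, seed=0):
    """Toy biased SA driven by a state-controlled Markov chain (hypothetical example).

    Kernel P_theta: X' = rho*X + (1 - rho)*(a + b*theta) + sigma*noise,
    whose stationary mean is a + b*theta.
    Drift: H_theta(x) = theta - x, so the mean field is
    h(theta) = theta - (a + b*theta), with root theta* = a / (1 - b).
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_iters)
    theta, x = float(theta0), 0.0
    for k in range(1, n_iters + 1):
        # Advance the Markov chain one step using the current parameter theta_k.
        x = rho * x + (1.0 - rho) * (a + b * theta) + sigma * noise[k - 1]
        # X_{k+1} is not a draw from the stationary law pi_theta, so
        # H_theta(X_{k+1}) = theta - X_{k+1} is a biased estimate of h(theta_k).
        gamma = k ** -0.6  # decreasing step size (hypothetical schedule)
        theta -= gamma * (theta - x)
    return float(theta)


# Expected to land near a / (1 - b) = 2.0 for the default parameters.
print(biased_sa(theta0=0.0, n_iters=100_000))
```

In one dimension every mean field is a gradient, so this toy does not capture the non-gradient case mentioned above; it only isolates the bias introduced by Markovian, parameter-dependent noise.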
Hoi-To Wai received his PhD degree in Electrical Engineering from Arizona State University (ASU) in Fall 2017, and his B.Eng. (with First Class Honours) and M.Phil. degrees in Electronic Engineering from The Chinese University of Hong Kong (CUHK) in 2010 and 2012, respectively. He is currently an Assistant Professor in the Department of Systems Engineering & Engineering Management at CUHK. Previously, he held research positions at ASU, UC Davis, Telecom ParisTech, Ecole Polytechnique, and LIDS at MIT. Hoi-To’s research interests are in the broad area of signal processing, machine learning and distributed optimisation, with a focus on their applications to network science. His dissertation received the 2017 Dean’s Dissertation Award from the Ira A. Fulton Schools of Engineering of ASU, and he is a recipient of a Best Student Paper Award at ICASSP 2018.
For more information about the ESD Seminars Series, please contact Karthyek Murthy at email@example.com.