Zap Stochastic Approximation and Reinforcement Learning

Q-learning is known to be slow in practice. We will survey three recent Q-learning algorithms introduced to improve performance: (i) the Zap Q-learning algorithm, which has provably optimal asymptotic variance and resembles the Newton-Raphson method in a deterministic setting; (ii) the PolSA algorithm, which is based on Polyak’s momentum technique but uses a specialized matrix momentum; and (iii) the NeSA algorithm, which is based on Nesterov’s acceleration technique. We will then introduce a recent generalization of Zap stochastic approximation, establish its stability under very general conditions, and discuss its applications to reinforcement learning.
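
To make the Newton-Raphson analogy concrete, here is a minimal sketch of a Zap-style stochastic approximation recursion on a toy linear root-finding problem. It is not Q-learning and not the speaker's implementation: the function names (zap_sa, make_toy_sampler), the matrices, the noise level, and the step-size exponent rho are all illustrative assumptions. The idea shown is the two-time-scale structure: a running matrix estimate is updated with a larger step size than the parameter, and the parameter update is preconditioned by the inverse of that estimate.

```python
import numpy as np

def zap_sa(sample, theta0, n_steps, rho=0.85):
    """Toy Zap-style stochastic approximation for a noisy linear problem.

    Seeks theta with E[A_n] theta = E[b_n], using only noisy samples (A_n, b_n).
    `sample` is a hypothetical callable returning one such pair per call.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    A_hat = np.eye(theta.size)  # running estimate of the mean Jacobian
    for n in range(1, n_steps + 1):
        A_n, b_n = sample()
        alpha = 1.0 / n         # slow step size for the parameter
        gamma = n ** (-rho)     # faster step size for the matrix (gamma/alpha grows)
        # Fast time scale: track the mean Jacobian of f(theta) = A theta - b.
        A_hat += gamma * (A_n - A_hat)
        # Slow time scale: Newton-Raphson-like step using the matrix estimate.
        f_n = A_n @ theta - b_n
        theta -= alpha * np.linalg.solve(A_hat, f_n)
    return theta

def make_toy_sampler(seed=0, noise=0.1):
    """Build a noisy sampler around an assumed 2x2 problem, plus its exact root."""
    rng = np.random.default_rng(seed)
    A_bar = np.array([[2.0, 1.0], [0.0, 3.0]])
    b_bar = np.array([1.0, -1.0])

    def sample():
        A_n = A_bar + noise * rng.standard_normal((2, 2))
        b_n = b_bar + noise * rng.standard_normal(2)
        return A_n, b_n

    return sample, np.linalg.solve(A_bar, b_bar)
```

Calling `sample, theta_star = make_toy_sampler()` and then `zap_sa(sample, np.zeros(2), 20000)` should return an estimate close to `theta_star`. In this sketch the matrix estimate is assumed to stay invertible, which holds for the small noise level chosen here but is not guaranteed in general.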

Friday, October 25, 2019 - 11:00
Inria, room A00
Ana Bušić
Inria Paris / ENS