Q-learning is known to be slow in practice. We will survey three recent Q-learning algorithms introduced to improve performance: (i) the Zap Q-learning algorithm, which has provably optimal asymptotic variance and resembles the Newton-Raphson method in a deterministic setting; (ii) the PolSA algorithm, based on Polyak's momentum technique but with a specialized matrix momentum; and (iii) the NeSA algorithm, based on Nesterov's acceleration technique. We will then introduce a recent generalization of Zap stochastic approximation, establish its stability under very general conditions, and discuss its applications to reinforcement learning.
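For context, a minimal sketch of the baseline tabular Q-learning update whose slow convergence motivates these accelerated variants (this is Watkins' classical algorithm, not the speaker's methods; the toy two-state MDP is a hypothetical example, not from the talk):

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# Action 1 toggles the state, action 0 stays; reward is 1 whenever
# the next state is state 1.
n_states, n_actions, gamma = 2, 2, 0.9

def step(s, a):
    s2 = 1 - s if a == 1 else s
    return s2, float(s2 == 1)

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
alpha = 0.1   # constant step size, for simplicity
s = 0
for t in range(20_000):
    a = int(rng.integers(n_actions))      # uniform exploration (off-policy)
    s2, r = step(s, a)
    # Basic stochastic-approximation update: move Q(s, a) toward the
    # TD target r + gamma * max_a' Q(s', a').
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2

greedy = Q.argmax(axis=1)  # greedy policy: toggle in state 0, stay in state 1
```

Zap Q-learning replaces the scalar step size here with a matrix gain estimated online, in the spirit of Newton-Raphson, while PolSA and NeSA modify the recursion with momentum terms.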

Dates:

Friday, October 25, 2019 - 11:00

Location:

Inria, room A00

Speaker(s):

Ana Bušić

Affiliation(s):

Inria Paris / ENS
