Q-learning is known to be slow in practice. We will survey three recent Q-learning algorithms introduced to improve performance: (i) the Zap Q-learning algorithm, which has provably optimal asymptotic variance and resembles the Newton-Raphson method in a deterministic setting; (ii) the PolSA algorithm, based on Polyak's momentum technique but with a specialized matrix momentum; and (iii) the NeSA algorithm, based on Nesterov's acceleration technique. We will then introduce a recent generalization of Zap stochastic approximation, establish its stability under very general conditions, and discuss its applications to reinforcement learning.
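For context, a minimal sketch of the baseline tabular Q-learning update whose slow convergence motivates these accelerated variants (this is Watkins' classical algorithm, not the speaker's methods; the toy two-state MDP is a hypothetical example, not from the talk):

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# Action 1 toggles the state, action 0 stays; reward is 1 whenever
# the next state is state 1.
n_states, n_actions, gamma = 2, 2, 0.9

def step(s, a):
    s2 = 1 - s if a == 1 else s
    return s2, float(s2 == 1)

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
alpha = 0.1   # constant step size, for simplicity
s = 0
for t in range(20_000):
    a = int(rng.integers(n_actions))      # uniform exploration (off-policy)
    s2, r = step(s, a)
    # Basic stochastic-approximation update: move Q(s, a) toward the
    # TD target r + gamma * max_a' Q(s', a').
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2

greedy = Q.argmax(axis=1)  # greedy policy: toggle in state 0, stay in state 1
```

Zap Q-learning replaces the scalar step size here with a matrix gain estimated online, in the spirit of Newton-Raphson, while PolSA and NeSA modify the recursion with momentum terms.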

Dates:

Friday, October 25, 2019 - 11:00

Location:

Inria, room A00

Speaker(s):

Ana Bušić

Affiliation(s):

Inria Paris / ENS
