Choisir la langue :

Random times, filtrations and the bandit problem

Institutional tag: 

I will discuss two versions of the bandit problem (if time permits) in which dependencies over time play a crucial role and probabilistic machinery is needed to allow for concentration inequalities to be of use. In particular, I will talk about the restless bandit problem in the case where the pay-offs are not necessarily independent over time nor across the arms, but the joint distribution over the arms is phi-mixing. Even though this version of the problem provides a more realistic model for most real-world applications, it cannot be optimally solved in practice since it is known to be PSPACE-hard. I will characterise a special setting where good approximate solutions can indeed be found via simple UCB-type algorithms and I will discuss how one can deal with dependencies introduced by sampling arms at random times. Finally, I will present shortly results for the bandit problem where pay-offs are received anonymously and with some unknown delay. Like in the restless bandit problem dependencies complicate the derivation of algorithms and regret bounds and I will highlight one key difficulty that occurs in this setting.

Friday, December 15, 2017 - 11:00
Inria, room A00
Steffen Grünewalder
Lancaster University