Choisir la langue :

Mirror descent strategies for regret minimization and approachability

First, we present the online linear optimization model, which is a general framework for regret minimization. We construct a family of Mirror Descent strategies with time-varying parameters. We establish regret bounds for those strategies. The time-varying parameters allow to recover as special cases a large number of known strategies: Exponential Weights Algorithm, Smooth Fictitious Play, Vanishingly Smooth Fictious Play, as well as Follow the Perturbed Leader.

Second, we add the assumption that payoff vectors have at most $s$ nonzero components, and aim at determining the optimal regret bounds. We notice that a fundamental difference between gains and losses then appears.

Third, we present a general method for constructing strategies for Blackwell's approachability problem, based on regret minimizing strategies. The unifying character of this approach is illustrated by the construction of optimal strategies for online combinatorial optimization and internal/swap regret minimization. Besides, we establish that Blackwell's strategy is a special case of the family of strategies thus constructed.

Finally, we study the problem of approachability with partial monitoring (i.e. where the decision maker only observes random signals). We establish that the optimal convergence rate is T^{-1/3}.

Friday, December 16, 2016 - 11:00
Inria, room A00
Joon Kwon
CMAP, Ecole Polytechnique