Choisir la langue :

Stochastic Combinatorial Optimization under Bandit Feedback and Its Application to Online Shortest Path Routing

Institutional tag: 
Thematic tag(s): 

In the first part of this talk, we investigate stochastic combinatorial multi-armed bandit problems. We first study a lower bound on the regret that is satisfied by any problem instance. The derivation of our bound leverages techniques developed for optimal control of Markov chains. The proposed lower bound is tight in the sense that there exists an algorithm that attains the bound on all problem instances, although the algorithm might be computationally expensive. As the proposed lower bound is implicit, we further provide a simplified lower bound and investigate its scaling in terms of the dimension of the decision space for some combinatorial structures. We propose ESCB, an index policy that is relying on a novel index function for combinatorial bandit problems. We further show that ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice.The second part of the talk concerns online shortest path routing for packet transmission over multihop networks, where link costs or delays are stochastically varying. A routing policy selects a path to the destination on a packet-by-packet basis. We formulate the problem as a combinatorial bandit optimization problem and consider two scenarios: Source-routing where the path selection can be done at the source, and Hop-by-Hop routing where path selection can be done in the network as the packet progresses toward the destination. In the case of source routing, feedback on the realized delays on each link on the path from source to destination would be available when the packet reaches the destination. For the two scenarios, we provide tight and problem-specific regret lower bounds. We further show that regret lower bounds for source-routing policies and that for hop-by-hop routing policies are identical, indicating that taking routing decisions hop by hop does not bring any advantage asymptotically. We further propose two algorithms, with a tradeoff between computational complexity and performance, which improve upon the regret bound of the existing algorithms for the considered problem.

Friday, September 22, 2017 - 11:00
Inria, room A00
Sadegh Talebi