Multi-armed stochastic bandits are a now well researched problem: lower bounds for the attainable regret are obtainable for most settings and there are several algorithms matching these bounds in the base case. However, while the lower bound theory adapts readily to variations of the problem, the algorithms do not. I will present two settings where we are given structural information, which we want to use to get better performance than the agnostic strategy.
First, I will present a combinatorial semi-bandit problem in which we are given information on the noise structure and design a strategy using this a priori knowledge. The second problem, bandit with categories, will be an example of optimistic algorithms failing to get optimal regret and a case where the lower bound does not capture exactly the optimal behaviour for arms that should not be pulled often. We refine the analysis of the recent OSSB algorithm to get an asymptotically optimal strategy with finite time guarantees.