Collaborative Bandits on a Network

We consider multi-agent, collaborative online learning,
wherein a group of agents connected by a network play a multi-armed
bandit game. Each time an agent takes an action, the corresponding
reward is observed by the agent, as well as its neighbours in the
network. This form of information structure is common in social
networks and distributed learning applications. We study the regret of
various policies in this collaborative learning setting. A key finding
is that plug-in extensions of optimal single-agent learning policies
need not be optimal in the networked setup. More generally, we
identify a class of 'non-altruistic' and 'individually consistent'
policies, and argue by deriving regret lower bounds that they are
liable to suffer a large regret in the networked setting. We also show
that learning performance can be substantially improved if the agents
exploit the structure of the network, and develop a simple learning
algorithm based on dominating sets of the network. Specifically, we
first consider a star network -- a common motif in hierarchical social
networks -- and show analytically that the hub agent can be used as an
information sink to expedite learning and improve the overall regret.
We also derive network- wide regret bounds for the algorithm applied
to general networks. We conduct numerical experiments on a variety of
networks that corroborate our analytical results.
[Joint work with Ravi Kolla and Krishna Jagannathan (IIT Madras).]
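The abstract mentions an algorithm built on dominating sets of the network: a set of agents such that every agent either belongs to the set or has a neighbour in it, so that every reward observation reaches some designated learner. The talk does not specify how the dominating set is computed; below is a minimal, hypothetical sketch of the standard greedy approximation, which illustrates the structure the algorithm exploits (on a star network it selects just the hub).

```python
def greedy_dominating_set(adj):
    """Greedy approximation of a dominating set.

    adj: dict mapping each vertex to the set of its neighbours.
    Returns a list of vertices such that every vertex is either
    in the list or adjacent to a vertex in it.
    """
    undominated = set(adj)
    dominating = []
    while undominated:
        # Pick the vertex that dominates the most still-undominated vertices.
        best = max(adj, key=lambda v: len(undominated & ({v} | adj[v])))
        dominating.append(best)
        undominated -= {best} | adj[best]
    return dominating

# Star network: hub 0 connected to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(greedy_dominating_set(star))  # → [0], the hub alone dominates
```

On the star network the hub alone forms a dominating set, which matches the talk's use of the hub as an information sink: it observes every neighbour's rewards and can learn on behalf of the whole network.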

Friday, March 17, 2017 - 11:00
Inria, room A00
Aditya Gopalan
Indian Institute of Science