Department of Computer Science

Prof. Emo Welzl and Prof. Bernd Gärtner

Mittagsseminar Talk Information

**Date and Time**: Thursday, December 19, 2013, 12:15 pm

**Duration**: 30 minutes

**Location**: CAB G51

**Speaker**: Hemant Tyagi

In this talk, we will look at some results from a paper by Auer et al. (SIAM J. Comput.'02) on the adversarial version of the classical multi-armed bandit problem. In this version of the problem, a player is given access to a set of K strategies (or arms). At each round t = 1, 2, ..., T the player chooses a strategy and receives the corresponding reward. The rewards are assigned to the strategies arbitrarily by an adversary. The goal of the player is to play the strategies so as to minimize the cumulative expected regret, i.e., the difference between the total expected reward of the best constant sequence of strategies (a single strategy played in every round) and the total expected reward obtained by the player over T rounds. Assuming the adversary is oblivious to the actions of the player, we will first see a randomized algorithm that achieves a regret bound of O((K T log K)^{1/2}) against all oblivious adversaries. We will then look at the construction of a hard adversary against which any algorithm incurs a regret of \Omega((K T)^{1/2}).
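
To make the setting above concrete, here is a minimal Python sketch of an exponential-weights bandit strategy in the style of Exp3 from Auer et al. The abstract does not name the algorithm, so treating it as Exp3 is an assumption, as are the function names `exp3` and `exp3_gamma`, the representation of the oblivious adversary as a reward table fixed in advance, the restriction of rewards to [0, 1], and the weight renormalization, which is added only for numerical stability. The choice of `gamma` follows the tuning usually quoted for the O((K T log K)^{1/2}) bound, with T used as the upper bound on the best arm's total reward.

```python
import math
import random


def exp3_gamma(K, T):
    """Exploration rate commonly used to obtain the O(sqrt(K T log K)) bound.

    Assumption: T is used as the upper bound on the best arm's total reward.
    """
    return min(1.0, math.sqrt(K * math.log(K) / ((math.e - 1) * T)))


def exp3(reward_table, gamma, rng=None):
    """Play an Exp3-style strategy against a fixed (oblivious) reward table.

    reward_table[t][i] is the reward in [0, 1] of arm i at round t. The whole
    table is fixed before play starts, which models an oblivious adversary.
    Only the reward of the chosen arm is ever read by the player.
    Returns the total reward collected and the final sampling distribution.
    """
    rng = rng or random.Random(0)
    K = len(reward_table[0])
    weights = [1.0] * K
    probs = [1.0 / K] * K
    total = 0.0
    for rewards in reward_table:
        w_sum = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / K for w in weights]
        arm = rng.choices(range(K), weights=probs)[0]
        x = rewards[arm]
        total += x
        # Importance-weighted estimate: unbiased for the reward of each arm.
        x_hat = x / probs[arm]
        weights[arm] *= math.exp(gamma * x_hat / K)
        # Rescale so the largest weight is 1; the distribution is unchanged.
        w_max = max(weights)
        weights = [w / w_max for w in weights]
    return total, probs


if __name__ == "__main__":
    T, K = 2000, 3
    # Hypothetical reward table: arm 0 always pays 1, the others pay 0.
    table = [[1.0, 0.0, 0.0] for _ in range(T)]
    total, probs = exp3(table, exp3_gamma(K, T))
    print("reward collected:", total, "best single arm:", float(T))
```

The regret on one run is the best single arm's total reward minus the value returned by `exp3`; the bound in the abstract concerns the expectation of this quantity over the player's randomness.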

