Dueling Bandits
Summary and Contributions
Dueling bandits can be viewed as an ‘active learning’ extension of ranking from pairwise comparisons, where learning proceeds in rounds; on each round, the learner can select a pair of items to be compared by the user/environment, and the goal is to identify a good item while incurring low ‘regret’ (exploration overhead). Previous work either made stochastic transitivity assumptions on the underlying pairwise comparison model, or assumed the existence of a Condorcet winner; we developed low-regret algorithms for a broader class of models wherein the goal was to identify items from a general ‘tournament solution set’ (a solution concept widely studied in computational social choice). We also developed new results for dueling bandits where the feedback received by the learner is adversarially corrupted.
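The round-by-round interaction described above can be sketched in code. The following is a minimal, illustrative simulation, not an algorithm from the publications below: feedback comes from a stochastic pairwise comparison matrix `P`, and a naive explore-then-commit baseline identifies the empirical Borda winner. All names here (`duel`, `explore_then_commit`, the matrix `P`) are hypothetical.

```python
import itertools
import random

def duel(P, i, j, rng):
    # Environment feedback for one round: item i beats item j with
    # probability P[i][j] (a stochastic pairwise comparison model).
    return rng.random() < P[i][j]

def explore_then_commit(P, horizon, rng):
    # Toy learner: spend the first half of the horizon cycling uniformly
    # through all ordered pairs, then commit to the empirical Borda
    # winner (the item with the highest total empirical win rate).
    n = len(P)
    wins = [[0] * n for _ in range(n)]
    plays = [[0] * n for _ in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    for _, (i, j) in zip(range(horizon // 2), itertools.cycle(pairs)):
        if duel(P, i, j, rng):
            wins[i][j] += 1
        plays[i][j] += 1

    def borda(i):
        return sum(wins[i][j] / plays[i][j]
                   for j in range(n) if j != i and plays[i][j] > 0)

    return max(range(n), key=borda)

# Hypothetical 3-item model: item 0 beats every other item with
# probability > 1/2, so it is the Condorcet (and Borda) winner.
P = [[0.5, 0.9, 0.9],
     [0.1, 0.5, 0.6],
     [0.1, 0.4, 0.5]]
winner = explore_then_commit(P, horizon=6000, rng=random.Random(0))
```

Stronger algorithms replace the uniform exploration phase with adaptive pair selection, which is what keeps regret low; this sketch only illustrates the protocol and the Condorcet/Borda solution concepts it targets.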
Relevant Publications
- Arpit Agarwal, Shivani Agarwal, and Prathamesh Patil.
Stochastic dueling bandits with adversarial corruption.
In Proceedings of the 32nd International Conference on Algorithmic Learning Theory (ALT), 2021.
[pdf]
- Siddartha Ramamohan, Arun Rajkumar, and Shivani Agarwal.
Dueling bandits: Beyond Condorcet winners to general tournament solutions.
In Advances in Neural Information Processing Systems (NIPS), 2016.
[pdf]