Tag: rewards
-
Preference-based Pure Exploration
arXiv:2412.02988v1. Abstract: We study the preference-based pure exploration problem for bandits with vector-valued rewards. The rewards are ordered using a (given) preference cone $\mathcal{C}$, and the goal is to identify the set of Pareto optimal arms. First, to quantify the impact of preferences, we derive a novel lower…
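To make the target set concrete, here is a minimal sketch of what "Pareto optimal under a preference cone" means when the arm means are already known. It is not the paper's exploration algorithm, which must estimate the means from samples. It assumes a polyhedral cone written as $\mathcal{C} = \{x : Wx \ge 0\}$, and it treats arm $i$ as Pareto optimal when no other arm $j$ satisfies $\mu_j - \mu_i \in \mathcal{C} \setminus \{0\}$. The matrix `W` and the helpers `in_cone` and `pareto_optimal_arms` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def in_cone(d: Vector, W: Sequence[Vector], tol: float = 1e-12) -> bool:
    """Return True if d lies in the (assumed) polyhedral cone {x : W x >= 0}."""
    return all(sum(wk * dk for wk, dk in zip(w, d)) >= -tol for w in W)


def pareto_optimal_arms(means: Sequence[Vector], W: Sequence[Vector],
                        tol: float = 1e-12) -> List[int]:
    """Indices of arms whose mean vector is not strictly dominated under the cone order.

    Arm j strictly dominates arm i when mu_j - mu_i is a nonzero element of the cone.
    """
    optimal = []
    for i, mu_i in enumerate(means):
        dominated = False
        for j, mu_j in enumerate(means):
            if j == i:
                continue
            diff = [a - b for a, b in zip(mu_j, mu_i)]
            is_nonzero = any(abs(x) > tol for x in diff)
            if is_nonzero and in_cone(diff, W, tol):
                dominated = True
                break
        if not dominated:
            optimal.append(i)
    return optimal


# The nonnegative orthant (W = identity) recovers the usual componentwise Pareto order.
ORTHANT = [[1.0, 0.0], [0.0, 1.0]]
print(pareto_optimal_arms([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.4, 0.4]], ORTHANT))
# prints [0, 1, 2]  (arm 3 is dominated by arm 2)

# A wider cone encodes stronger preferences, so fewer arms remain Pareto optimal.
WIDE = [[2.0, 1.0], [1.0, 2.0]]
print(pareto_optimal_arms([[1.0, 0.0], [0.9, 0.3]], ORTHANT))  # prints [0, 1]
print(pareto_optimal_arms([[1.0, 0.0], [0.9, 0.3]], WIDE))     # prints [1]
```

The last two calls illustrate why the choice of $\mathcal{C}$ matters: the same pair of arms is mutually incomparable under the orthant, but under the wider (still pointed) cone the second arm dominates the first.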