The Epsilon-Greedy /UCB ("upper confidence bound") for MAB (Multiarmed-bandit) problem sometime in reinforcement learning (RL)

NoSuchKey