If bandits for model evaluation are to determine the payout (e.g., prediction accuracy) of each model, contextual bandits are to determine the payout of each action. In the case of recommendations, an action is an item to show to users, and the payout is how likely it is that a user will click on it. Currently, the industry's standard for online model evaluation is A/B testing.
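A minimal sketch of the item-selection idea, using a context-free epsilon-greedy bandit in Python (a true contextual bandit would additionally condition the click estimates on user or session features; the item names and click feedback here are made up for illustration):

```python
import random


class EpsilonGreedyRecommender:
    """Toy bandit recommender: each action is an item to show,
    and the payout is whether the user clicked it (1) or not (0)."""

    def __init__(self, items, epsilon=0.1, seed=0):
        self.items = list(items)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {item: 0 for item in self.items}   # times each item was shown
        self.clicks = {item: 0 for item in self.items}  # total payout per item

    def estimate(self, item):
        # Estimated click-through rate; unseen items get +inf so each
        # item is tried at least once before exploitation kicks in.
        if self.shows[item] == 0:
            return float("inf")
        return self.clicks[item] / self.shows[item]

    def select(self):
        # Explore a random item with probability epsilon,
        # otherwise exploit the item with the best CTR estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        return max(self.items, key=self.estimate)

    def update(self, item, clicked):
        self.shows[item] += 1
        self.clicks[item] += int(clicked)
```

Unlike A/B testing, which splits traffic by fixed proportions, the bandit shifts traffic toward better-performing items as click feedback accumulates.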