Contextual Bandits for Advertising Budget Allocation

Benjamin Han and Jared Gabor


When allocating budgets across different ad campaigns, advertisers confront the challenge that payouts, or returns, are uncertain. In this paper, we describe a system for optimizing advertising campaign budgets to ensure long-term profitability in the face of this uncertainty. Our modified contextual bandit system 1) applies supervised learning to predict ad campaign payouts based on context features and historical performance; 2) extrapolates the payouts to out-of-sample budgets using a simple functional form for the distribution of payouts; and 3) uses Thompson Sampling from the predicted payout distributions to manage the explore-exploit trade-off when selecting budgets. Using Markov Chain Monte Carlo, we measure an overall efficiency improvement of (22 ± 10)% in mean Cost Per Acquisition relative to the previous budget allocation strategy. This system is now responsible for managing hundreds of millions of dollars of annual marketing spend at Lyft.
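To make step 3 concrete, the following is a minimal illustrative sketch of Thompson Sampling over a discrete set of candidate budgets. All names and values here are hypothetical: we assume an upstream supervised model (step 1) has produced a predicted mean payout and a predictive uncertainty for each candidate budget, and model each payout distribution as Gaussian for simplicity; the paper's actual functional form and feature set are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate budgets with predicted payouts from an upstream
# supervised model (means) and its predictive uncertainty (std devs).
budgets = np.array([100.0, 500.0, 1000.0, 5000.0])
pred_mean = np.array([10.0, 45.0, 80.0, 300.0])  # predicted payout per budget
pred_std = np.array([2.0, 10.0, 25.0, 120.0])    # payout uncertainty

def thompson_select(mean, std, rng):
    """Draw one sample from each budget's predicted payout distribution
    and pick the budget whose sampled payout is largest. Budgets with
    high uncertainty occasionally win, which drives exploration."""
    samples = rng.normal(mean, std)
    return int(np.argmax(samples))

choice = thompson_select(pred_mean, pred_std, rng)
print(f"selected budget: {budgets[choice]}")
```

Because selection is driven by samples rather than point estimates, uncertain budgets are tried often enough to refine their payout predictions, while budgets with confidently high predicted payouts are exploited most of the time.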