Contextual Bandits for Advertising Budget Allocation
Benjamin Han and Jared Gabor
When allocating budgets across different ad campaigns, advertisers confront the challenge that the payouts, or returns, are uncertain. In this paper, we describe a system for optimizing advertising campaign budgets to ensure long-term profitability in the face of this uncertainty. Our modified contextual bandit system 1) applies supervised learning to predict ad campaign payouts based on context features and historical performance; 2) extrapolates the payouts to out-of-sample budgets using a simple functional form for the distribution of payouts; then 3) uses Thompson Sampling from the predicted payout distributions to manage the explore-exploit trade-off when selecting budgets. Using our system, we measure an overall efficiency improvement of (22 ± 10)% in the mean Cost Per Acquisition over the previous budget allocation strategy using Markov Chain Monte Carlo. This system is now responsible for managing hundreds of millions of dollars of annual marketing spend at Lyft.
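As a concrete illustration of step 3, the following is a minimal sketch of Thompson Sampling over discretized budget levels for a single campaign. The normal predictive distribution, the power-law extrapolation `a * budget ** gamma` standing in for the "simple functional form," and all function and variable names are illustrative assumptions, not the system described in the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def extrapolated_payout_mean(budget, a=0.9, gamma=0.8):
    """Hypothetical functional form for extrapolating predicted acquisitions
    to out-of-sample budgets (diminishing returns when gamma < 1)."""
    return a * budget ** gamma

def thompson_select_budget(candidate_budgets, payout_means, payout_stds):
    """Thompson Sampling over candidate budget levels for one campaign:
    draw one payout sample per candidate budget, then pick the budget whose
    sampled Cost Per Acquisition (budget / sampled payout) is lowest."""
    sampled = rng.normal(payout_means, payout_stds)  # one draw per candidate
    sampled = np.clip(sampled, 1e-6, None)           # avoid division by <= 0
    sampled_cpa = candidate_budgets / sampled
    return candidate_budgets[np.argmin(sampled_cpa)]

# Illustrative use: three candidate weekly budgets for a single campaign.
budgets = np.array([1_000.0, 5_000.0, 10_000.0])
means = extrapolated_payout_mean(budgets)   # predicted acquisitions per budget
stds = 0.25 * means                         # assumed predictive uncertainty
print(thompson_select_budget(budgets, means, stds))
```

Because the draw is stochastic, budgets whose predicted payout distributions overlap are each selected with some probability, which is how the sampling step balances exploring uncertain budget levels against exploiting the one that currently looks most efficient.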