Scaling Generative Pre-training for User Ad Activity Sequences
Sharad Chitlangia, Krishna Reddy Kesari and Rajat Agarwal
User activity sequence modeling has significantly improved performance across a range of advertising tasks, from supervised tasks such as ad response prediction to unsupervised tasks such as robot and ad fraud detection. Self-supervised learning with autoregressive generative models has garnered interest because of the performance improvements it has delivered on time series and natural language data. In this paper, we present a scalable autoregressive generative pre-training framework for modeling user ad activity sequences and examine its scaling properties with respect to model size, dataset size and compute. We show that test loss on the pre-training task follows a power law with respect to model size, and that larger models are more data- and compute-efficient than smaller models. We also demonstrate that improvements in pre-training test loss translate into better downstream task performance by benchmarking the models on conversion prediction and robot detection tasks in advertising.
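The power-law claim can be made concrete with a small fitting sketch. It assumes the form L(N) = (N_c / N)^alpha, which is common in earlier language-model scaling work; the paper's exact parameterization is not stated here, so treat that form as an assumption. Taking logs turns the power law into a straight line in log N with slope -alpha, so ordinary least squares in log-log space recovers alpha and N_c. The function names `fit_power_law` and `predict_loss` and the sample numbers are hypothetical and for illustration only.

```python
import math


def fit_power_law(sizes, losses):
    """Fit L(N) = (N_c / N) ** alpha by least squares in log-log space.

    Assumed form (hypothetical here): log L = alpha * log N_c - alpha * log N,
    which is a straight line in log N with slope -alpha.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    # Ordinary least-squares slope and intercept of log L against log N.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    alpha = -slope
    # intercept = alpha * log N_c, so N_c = exp(intercept / alpha).
    n_c = math.exp(intercept / alpha)
    return alpha, n_c


def predict_loss(n, alpha, n_c):
    """Evaluate the fitted power law at model size n (parameter count)."""
    return (n_c / n) ** alpha


if __name__ == "__main__":
    # Synthetic, made-up data: not results from the paper.
    sizes = [1e6, 1e7, 1e8, 1e9]
    losses = [predict_loss(n, 0.1, 1e10) for n in sizes]
    alpha, n_c = fit_power_law(sizes, losses)
    print(f"alpha={alpha:.3f}, N_c={n_c:.3g}")
```

On noise-free synthetic data the fit recovers the generating parameters; with real test losses one would typically fit only the range of model sizes where the trend is visibly linear on a log-log plot.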