A Large Scale Benchmark for Uplift Modeling
Eustache Diemert, Artem Betlei, Christophe Renaudin and Massih-Reza Amini.
Uplift modeling is an important yet relatively new area of machine learning research that aims to explain and estimate the causal impact of a treatment at the individual level. In the digital advertising industry, the treatment is exposure to different ads, and uplift modeling is used to direct marketing efforts towards the users for whom they are most effective. To foster research on this topic, we release a publicly available collection of 25 million samples from a randomized controlled trial, scaling up previously available datasets by a factor of 590. We provide details on the data collection and on the sanity checks performed, which allow this data to be used for counterfactual prediction. We formalize the uplift prediction task that can be performed with this data, along with the relevant evaluation metrics. Finally, we show that the dataset size now makes it possible to reach statistical significance when evaluating baseline methods on the most challenging target.