Estimating the instantaneous survival rate of digital advertising and marketing IDs: LIFESPAN by Cox-Proportional

Nilamadhaba Mohapatra, Humeil Makhija and Swapnasarit Sahu

Identifying active and inactive device IDs (ID) in the digital advertising and marketing domain is one of the most crucial tasks in terms of cost and quality. Retaining IDs for a long time increases the load on downstream pipelines, which incurs higher storage and computation costs. It can also lead to digital campaigns (advertising or marketing) with few active users, thus degrading performance. Though quality can be improved by assigning a constant time to live (TTL) to each ID, determining an optimal TTL is a tedious task. These IDs are the unique identifiers of the digital domain and are hence treated as its currency. They also play an important role in the engineering framework, since all other attributes in storage are linked to them. A small TTL therefore causes IDs to be lost prematurely, which in turn causes loss of the information linked to them. This can substantially affect the segment volume exported for a campaign. Conversely, a higher TTL leads back to the original problem of cost and computation. Checking whether an individual ID is active in real time is almost impossible. While most non-feedback systems rely on TTL-based methods to purge IDs and clean the database, in this paper we propose a granular machine learning-based approach that learns from implicit feedback. We take bid requests from DSPs as feedback, which can act as a proxy for an individual ID's activity. We created multiple duration parameters from this implicit feedback and experimented with different techniques, such as Kaplan-Meier and Cox Proportional Hazard models, to build a robust, learnable, and incremental model. We considered the attributes present for the IDs as covariates and built a Cox Proportional Hazard model with a concordance score of 0.9. For a billion-scale profile store, this is an excellent benchmark.
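As an illustration of two quantities named above, the following is a minimal sketch, not the authors' implementation, of a Kaplan-Meier survival estimate and a concordance score in plain Python. The duration unit (days between the first and last bid request seen for an ID), the event definition (an ID counted as churned), and the function names `kaplan_meier` and `concordance_index` are assumptions made for this example. The concordance computation is a simplified form of Harrell's C that skips pairs with tied durations.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: observed lifetime per ID (hypothetical unit: days).
    observed:  True if the ID churned (event), False if censored.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # events and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def concordance_index(durations, observed, risk_scores):
    """Simplified Harrell's C: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when i had an event and a strictly
    shorter duration than j. Higher risk should mean shorter lifetime.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not observed[i]:
            continue
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable if comparable else float("nan")


# Toy data, invented purely for illustration.
durations = [5, 8, 8, 12, 20]
observed = [True, True, False, True, False]
risk_scores = [0.9, 0.4, 0.6, 0.5, 0.1]  # hypothetical model outputs

km_curve = kaplan_meier(durations, observed)
c_index = concordance_index(durations, observed, risk_scores)
```

A concordance of 1.0 means every comparable pair of IDs is ranked in the correct order by the model's risk score, while 0.5 corresponds to random ranking. In a Cox Proportional Hazard model the risk score would come from the fitted covariate effects rather than from hand-picked values as here.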