Making Rewards More Rewarding: Sequential Learnable Environments for Deep Reinforcement Learning-based Sponsored Ranking

Chen Wang, Aidan Finn and Nishan Subedi

Reinforcement Learning (RL) methods have risen in popularity among general ranking systems. However, despite having properties well suited to sponsored ranking problems, RL methods remain underexplored in this area. A major reason for this gap is the dilemma of exploration: random exploration is prohibitively expensive in sponsored search ranking, where it can cause significant revenue loss. To address this concern, we study the properties of a simulated environment for RL in sponsored ranking. We demonstrate that augmenting a learnable simulated environment according to intuitive design principles significantly improves RL performance and boosts the explainability of the model. We test our method with a Deep Deterministic Policy Gradient agent, and experimental results show that our learned simulated environment outperforms existing methods. Furthermore, since our method is agent-agnostic, it paves the way for a wide range of RL applications to the sponsored ranking problem.