Analytics at Wharton Research
Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments
Firms using online advertising regularly run experiments with multiple versions of their ads since they are uncertain about which ones are most effective. Within a campaign, firms try to adapt to intermediate results of their tests, optimizing what they earn while learning about their ads. But how should they decide what percentage of impressions to allocate to each ad? This paper answers that question, resolving the well-known “learn-and-earn” trade-off using multi-armed bandit (MAB) methods. The online advertiser’s MAB problem, however, contains particular challenges, such as a hierarchical structure (ads within a website), attributes of actions (creative elements of an ad), and batched decisions (millions of impressions at a time), that are not fully accommodated by existing MAB methods. Our approach captures how the impact of observable ad attributes on ad effectiveness differs by website in unobserved ways, and our policy generates allocations of impressions that can be used in practice.
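To make the allocation question concrete, the following is a minimal sketch of one standard MAB rule for batched decisions: Thompson sampling with independent Beta-Bernoulli arms, where each ad's share of the next batch of impressions equals the estimated posterior probability that it has the highest conversion rate. It is a deliberate simplification and not the policy developed in this paper: it ignores the hierarchical structure (ads within websites) and the ad attributes described above. The function name `thompson_allocation`, the uniform Beta(1, 1) prior, and the counts in the example are all illustrative assumptions.

```python
import numpy as np

def thompson_allocation(conversions, impressions, n_draws=10_000, seed=0):
    """Return the share of the next impression batch for each ad.

    Each share is a Monte Carlo estimate of the posterior probability
    that the ad has the highest conversion rate (randomized probability
    matching). Assumes independent ads and a Beta(1, 1) prior.
    """
    rng = np.random.default_rng(seed)
    conversions = np.asarray(conversions, dtype=float)
    impressions = np.asarray(impressions, dtype=float)

    # Beta posterior parameters per ad: successes and failures plus the prior.
    alpha = 1.0 + conversions
    beta = 1.0 + (impressions - conversions)

    # Draw conversion rates from every ad's posterior, one row per draw.
    draws = rng.beta(alpha, beta, size=(n_draws, alpha.size))

    # Count how often each ad has the highest sampled rate.
    best = draws.argmax(axis=1)
    return np.bincount(best, minlength=alpha.size) / n_draws

# Hypothetical cumulative results for three ads after the first batches.
shares = thompson_allocation(
    conversions=[10, 30, 12],
    impressions=[10_000, 10_000, 10_000],
)
print(shares)
```

Re-running the function after each batch with updated cumulative counts shifts impressions toward the ads that appear most effective while still leaving some traffic on the others, which is the learn-and-earn balance in its simplest form.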
We implemented this policy in a live field experiment delivering over 700 million ad impressions in an online display campaign with a large retail bank. Over the course of two months, our policy achieved an 8% improvement in the customer acquisition rate, relative to a control policy, without any additional costs to the bank. Beyond the actual experiment, we performed counterfactual simulations to evaluate a range of alternative model specifications and allocation rules in MAB policies. Finally, we show that customer acquisition would decrease by about 10% if the firm were to optimize click-through rates instead of optimizing conversions directly, a finding that has implications for understanding the marketing funnel.
Keywords: multi-armed bandit, online advertising, field experiments, A/B testing, adaptive experiments, sequential decision making, explore-exploit, earn-and-learn, reinforcement learning, hierarchical models