Experiment Methodology

First, a brief overview of our preferred incrementality experimentation methodology is in order. Other documents and materials delve deeper into the data science, so the purpose of this explanation is only to provide you with a high-level intuition.

The Old Way: Randomized Controlled Trials

There are many ways to design incrementality experiments. The most popular has historically been a randomized controlled trial (RCT), usually called incrementality testing. This design is not privacy-compatible, though, since a large number of device IDs must be known before the experiment begins so that they can be randomly divided into a control group and a treatment group, where only the treatment group is served ads.
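To illustrate why this design is at odds with privacy: the random split requires every participating device ID up front. A minimal sketch, using hypothetical device IDs:

```python
import random

# Hypothetical device IDs. An RCT needs the full list known before the
# experiment starts, which is exactly what makes the design privacy-incompatible.
device_ids = [f"device-{i}" for i in range(10_000)]

random.seed(0)
random.shuffle(device_ids)
control = device_ids[:5_000]    # holdout group: sees no ads
treatment = device_ids[5_000:]  # served ads; later compared against control
```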


The New Way: Privacy-Compatible Synthetic Controls

We chose a privacy-compatible methodology that is robust, widely used in other fields, and impervious to platform changes, including ATT/SKAdNetwork on iOS 14. The technique is called the synthetic control method. Polaris automatically designs, validates, and proposes experiments that affect only a single country (the treatment country).

Control Fitting and Validation

In the validation stage, Polaris fits a synthetic control group: essentially a weighted average of countries other than the treatment country that closely matches the treatment country's historical performance. The idea is to use the synthetic control to predict what would have happened had the treatment (usually a pause) never occurred. Each metric gets its own synthetic control. If a closely matching control can't be identified for a metric, the experiment is discarded and the treatment country is flagged as invalid for experiments.
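A minimal sketch of this fitting step, assuming the standard synthetic control formulation (nonnegative donor-country weights that sum to 1, fit by least squares on toy pre-treatment data):

```python
import numpy as np
from scipy.optimize import minimize

def fit_synthetic_control(treated, donors):
    """Fit nonnegative weights (summing to 1) over donor countries so the
    weighted average of donors tracks the treated country's pre-treatment data."""
    n = donors.shape[1]
    loss = lambda w: np.sum((treated - donors @ w) ** 2)
    res = minimize(
        loss,
        x0=np.full(n, 1.0 / n),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x

# Toy pre-treatment panel: 30 days x 3 donor countries; the "treated" country
# behaves like a 70/30 blend of the first two donors by construction.
rng = np.random.default_rng(0)
donors = rng.uniform(50.0, 150.0, size=(30, 3))
treated = 0.7 * donors[:, 0] + 0.3 * donors[:, 1]
w = fit_synthetic_control(treated, donors)  # recovers roughly (0.7, 0.3, 0.0)
```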

This methodology ensures that any incremental impact discovered by the experiment was caused directly by the treatment and not some other factor, but it only works if any extraordinary events that happen during the experiment affect both the treatment and control countries. For example, if a product bug is pushed only to users in the treatment country, that country's performance could tank; because no other country was affected, we can't control for it, and the decrease would be misattributed to the treatment. If the bug is pushed globally, the synthetic control can account for the performance hit.

Each potential experiment is further validated through placebo testing, which can be thought of as a battery of simulated experiments on the historical data set. Prior to running an experiment, no treatment has happened, so simulating experiments in random countries with random treatment dates should find no statistically significant impact in any of the "placebo tests"; when that holds, we can trust that the future experiment will be valid. This process is also used to estimate the minimum experiment duration required for statistical significance.
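To build intuition, here is a toy placebo battery on simulated, untreated data. For brevity the control here is a simple mean of the other countries rather than a fitted synthetic control; the panel and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy panel: 20 countries x 120 days of one metric, with no real treatment anywhere.
panel = rng.normal(100.0, 3.0, size=(20, 120))

def placebo_effect(panel, country, t0):
    """Estimate a 'treatment effect' for a fake treatment applied to `country`
    on day `t0`, using the mean of all other countries as a simplified control."""
    control = np.delete(panel, country, axis=0).mean(axis=0)
    gap = panel[country] - control
    # Post-period gap minus pre-period gap: any real treatment would show up here.
    return gap[t0:].mean() - gap[:t0].mean()

# Battery of simulated experiments: random country, random treatment date.
effects = [
    placebo_effect(panel, int(rng.integers(0, 20)), int(rng.integers(40, 80)))
    for _ in range(200)
]
mean_effect = float(np.mean(effects))  # should sit near zero: nothing was treated
```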

Results Computation

Once an experiment is scheduled and executed, the incremental impact of the treatment (usually a traffic pause) can be computed by subtracting the actual performance in the treatment country from the expectation set by the synthetic control. In most cases, a pause will cause metrics to decrease in the treatment country. Any decrease below the synthetic control's expectation represents the incrementality that was lost by pausing the traffic versus the parallel universe in which the pause never happened.
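The arithmetic of this step can be sketched with toy post-treatment numbers (hypothetical daily installs):

```python
import numpy as np

# Toy post-treatment daily values of a single metric.
synthetic = np.array([120.0, 118.0, 121.0, 119.0, 122.0])  # counterfactual: "no pause"
actual    = np.array([ 95.0,  93.0,  96.0,  94.0,  97.0])  # observed during the pause

daily_gap = synthetic - actual         # expectation minus actual, per the text
lost_incrementality = daily_gap.sum()  # total volume attributable to the pause
```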

MMM Calibration

These ground-truth findings are automatically used to calibrate the model that outputs all incrementality metrics across the entire media mix, even for traffic that was not treated in the experiment. Therefore, experiments have a dual purpose: 1) to increase the accuracy of incrementality metrics for some traffic in a specific country to 100%, and 2) to substantially increase the accuracy of incrementality metrics for all other traffic, but to less than 100%.

Calibration improves the accuracy of the MMM, in order from greatest impact to least:

1) 100% accuracy for treated traffic in the treatment country
2) Very large accuracy improvements for non-treated traffic in the treatment country
3) Potentially large, but usually smaller, accuracy improvements for treated traffic in non-treatment countries (based on how similar the model feels each country is to the treatment country)
4) Potentially large, but usually smaller, accuracy improvements for non-treated traffic in non-treatment countries (on the same similarity basis)

We accomplish these improvements through the use of Bayesian priors. We use extremely strong priors for treated traffic in the treatment country and weaker priors for treated traffic in non-treatment countries. In essence, the model is solving for a lot of unknown variables, each equating to a factor that may impact overall performance including organic demand and marketing incrementality. By injecting a prior based on ground truth findings, we are essentially replacing one unknown variable (e.g., a single channel's incrementality) with a known value, making it far easier for the model to accurately solve for the remaining unknown variables.
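The prior-injection idea can be sketched with a toy conjugate Bayesian linear regression. The "channels", coefficients, prior strengths, and data below are all illustrative, not the actual MMM:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy "media mix": metric = 2.0 * channel_1 + 0.5 * channel_2 + noise.
X = rng.uniform(0.0, 1.0, size=(100, 2))
y = X @ np.array([2.0, 0.5]) + rng.normal(0.0, 1.0, size=100)

# Suppose an experiment measured channel 1's incrementality as ~2.0: give that
# coefficient a tight (strong) prior, and leave channel 2 only weakly informed.
prior_mean = np.array([2.0, 0.0])
prior_prec = np.diag([100.0, 0.01])  # precision = 1 / prior variance

# Conjugate Gaussian posterior for linear regression with known noise precision.
noise_prec = 1.0
post_cov = np.linalg.inv(prior_prec + noise_prec * X.T @ X)
post_mean = post_cov @ (prior_prec @ prior_mean + noise_prec * X.T @ y)
# post_mean[0] is pinned near the experimental ground truth, which in turn
# makes post_mean[1] easier to estimate from the remaining data.
```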

Updated on: 11/08/2022
