Experiment Result Analysis

Accessing Experiment Results

Understanding how to interpret the results of an executed experiment is crucial for having confidence in the incrementality data. To open the results screen for a completed experiment, click the results button on the right side of its row in the Experiments screen. For experiments in the Completed status, this button replaces the edit button.

Result Graphs

The experiment results screen displays information about the experiment in the left sidebar. In the main area, a dropdown at the top lets you change the metric being viewed. Below it are two line graphs. In both graphs, the treatment date (the date on which the action specified in the experiment was executed) is marked by the gray vertical dotted line.

Actual vs Synthetic Control

The first graph plots three series for the completed experiment over time:

Blue - The actual values of the selected metric in the treatment country as a whole
Solid Gold - The values predicted by the synthetic control
Dotted Gold - The upper and lower bounds of the 95% confidence interval. This can be interpreted as the margin of error in the synthetic control’s approximation of the treatment country’s actual values.
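For intuition about the dotted gold bounds, here is a minimal Python sketch of one common way to build such a band. It assumes, hypothetically, that the band is derived from the spread of pre-treatment residuals (actual minus synthetic control) under a normal approximation (z = 1.96 for 95%). This article doesn't describe how Polaris actually computes its interval, and the function name confidence_band is illustrative.

```python
import statistics


def confidence_band(pre_actual, pre_predicted, predicted, z=1.96):
    """Return (lower, upper) bounds around each synthetic control prediction.

    Hypothetical approach: the band half-width is z times the sample standard
    deviation of pre-treatment residuals (actual minus predicted).
    """
    residuals = [a - p for a, p in zip(pre_actual, pre_predicted)]
    sd = statistics.stdev(residuals)  # needs at least two pre-treatment points
    lower = [p - z * sd for p in predicted]
    upper = [p + z * sd for p in predicted]
    return lower, upper
```

A poor pre-treatment fit produces large residuals and therefore a wide band, which matches the idea of the interval as the synthetic control's margin of error.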

Pre-Treatment Period

The gray vertical dotted line represents the treatment date and the start of the experiment. The area to the left of it is the pre-treatment period. As described in the Methodology section, the blue line and the solid gold line should track each other closely in this period; if they did not, the experiment would have been rejected.
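The requirement that the lines stay close before the treatment date can be illustrated with a hypothetical fit check. This sketch uses mean absolute percentage error (MAPE) and an arbitrary 10% threshold. Neither the metric nor the threshold comes from the Methodology section, and the function name is invented for illustration.

```python
def pre_treatment_fit_ok(actual, predicted, max_mape=0.10):
    """Hypothetical check that the synthetic control tracks the treatment
    country closely before the treatment date.

    max_mape is an illustrative threshold, not Polaris's actual rejection rule.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty series")
    # Mean absolute percentage error; assumes every actual value is non-zero.
    mape = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
    return mape <= max_mape
```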

Post-Treatment Period

The area to the right of the gray vertical dotted line is the post-treatment period, the time during which the experiment was running. In this period, the blue line represents the selected metric's actual values in the treatment country, and the gold line represents the values expected had the treatment never occurred.

Therefore, any difference between the lines in that period represents the change in the selected metric within the treatment country caused by the treatment, which is the incrementality being measured. If incrementality exists, the blue line is expected to drop below the gold line.

Differences

For convenience, the second graph plots the difference between the blue and gold lines (blue minus gold); it is simply another view of the data plotted in the first graph. In the pre-treatment period, this line should hover around zero, meaning the actual and synthetic control values are very close. Any value below zero in the post-treatment period represents incrementality in the selected metric.
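To make the difference graph concrete, the following sketch computes the blue-minus-gold series and splits it at the treatment date. The function name, the sample data, and the choice to count the treatment date itself as post-treatment are all assumptions for illustration.

```python
from datetime import date


def split_differences(dates, actual, synthetic, treatment_date):
    """Return (pre, post) lists of actual-minus-synthetic differences.

    Assumption: days on or after treatment_date count as post-treatment.
    """
    pre, post = [], []
    for d, a, s in zip(dates, actual, synthetic):
        diff = a - s  # blue minus gold
        if d >= treatment_date:
            post.append(diff)
        else:
            pre.append(diff)
    return pre, post


pre, post = split_differences(
    [date(2022, 11, 1), date(2022, 11, 2), date(2022, 11, 3), date(2022, 11, 4)],
    [100, 101, 90, 88],
    [100, 100, 99, 98],
    treatment_date=date(2022, 11, 3),
)
print(pre, post, sum(post))  # [0, 1] [-9, -10] -19
```

Summing the post-treatment differences gives a rough cumulative impact; a negative total corresponds to the blue line sitting below the gold line.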

Statistical Significance

The p-value is shown at the top of the Experiment Results screen. Lower p-values indicate a more statistically significant impact of the treatment (that is, statistically significant incrementality), and higher p-values indicate a less significant one. More technically, the null hypothesis is that there is no incrementality, so a lower p-value is stronger evidence for rejecting the null hypothesis. The further a p-value rises above 0.05, the more likely it is that one or more of the following is true:

There was not enough of a difference between the actual values in the treatment country and the values predicted by the synthetic control in the post-treatment period.
The synthetic control wasn't confident in its predictions, resulting in a wide confidence interval.
The actual values in the treatment country were so low that treatment impact was difficult to detect.
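This article doesn't specify how Polaris computes its p-value. As a stand-in, the sketch below shows one well-known approach for synthetic control studies, a placebo-in-time (permutation) test: it asks how often a window of pre-treatment differences, taken when no treatment was active, is at least as negative as the observed post-treatment difference. The function name and the one-sided direction (testing for a drop) are assumptions chosen to match the null hypothesis of no incrementality.

```python
def placebo_p_value(pre_diffs, post_diffs):
    """One-sided placebo-in-time p-value for a drop in the metric.

    Compares the mean post-treatment difference (actual minus synthetic)
    with the mean of every same-length window of pre-treatment differences.
    """
    k = len(post_diffs)
    if k == 0 or len(pre_diffs) < k:
        raise ValueError("need a non-empty post period no longer than the pre period")
    observed = sum(post_diffs) / k
    placebo_means = [
        sum(pre_diffs[i:i + k]) / k for i in range(len(pre_diffs) - k + 1)
    ]
    # Count placebo windows at least as negative as the observed effect;
    # the +1 terms include the observed window itself, so p is never 0.
    extreme = sum(1 for m in placebo_means if m <= observed)
    return (1 + extreme) / (1 + len(placebo_means))
```

A large, sustained drop after the treatment date is rarely matched by any pre-treatment window, which yields a small p-value; a post-treatment difference that looks like ordinary pre-treatment noise yields a large one.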

Keep in mind that statistical significance refers to the impact, or incrementality, not to the data. A high p-value doesn't mean there wasn't enough data or that the experiment didn't run long enough; it just means the incrementality found wasn't statistically significant. A high p-value also doesn't mean the results are thrown away: we still record and use them in the same way as results with low p-values.

Timing of Results

Ground truth incrementality results are computed only for metrics that are available, as they become available, and only when new input data is imported into Polaris. For example, the day after an experiment is completed, assuming new input data is imported that day, results will be available only for cohort day 0 metrics, which include installs. Day 1 metrics will be available the following day, and so on (assuming new data is imported each day).
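The availability schedule above can be expressed as a small hypothetical helper. It assumes new input data is imported every day after completion, and that max_cohort_day (an invented parameter) is the last cohort day tracked for the metric.

```python
def available_cohort_days(days_since_completion, max_cohort_day):
    """Cohort days with results available, assuming daily data imports.

    Day 1 after completion unlocks cohort day 0 only; each additional day of
    imports unlocks one more cohort day, capped at max_cohort_day.
    """
    if days_since_completion < 1:
        return []
    last = min(days_since_completion - 1, max_cohort_day)
    return list(range(last + 1))
```

For example, three days after completion this returns [0, 1, 2].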

Updated on: 12/08/2022
