Marketing Input Multicollinearity

Basics



Multicollinearity can be a problem in MMMs because the more closely one channel's spend or impressions track those of other channels, the harder it is for the model to isolate that channel's impact. That said, the longer the time period covered by the input data, the less likely multicollinearity is to be a significant problem. Many of the factors that determine spend and impressions for a given campaign are outside the marketer's control (e.g., ad network algorithms, bid competition, industry fluctuations). Over enough time, the daily variation in each campaign's spend and impressions will usually diverge naturally from that of the others.


Collinearities File



The collinearities file contains a VIF (variance inflation factor) statistic for each channel, campaign, and site, in the vif column. These statistics are computed after the input data is transformed and standardized (a step that is itself meant, at least in part, to help deal with multicollinearity), and they measure how much the variance of a coefficient is inflated by collinearity. Because most media mix models (MMMs) include spend/impressions for a large number of channels and campaigns, simple collinearity diagnostics like pairwise correlation matrices don’t suffice: an input can be close to a combination of several others without being strongly correlated with any single one of them.
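To make the statistic concrete, here is a minimal sketch of the textbook definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing input j on all the other inputs. It is not a description of how the collinearities file is actually produced; the channel names and the synthetic data are assumptions made purely for illustration. The example also shows why a pairwise correlation matrix can miss the problem.

```python
import numpy as np


def vif(X):
    """Textbook VIF for each column of X (rows = days, columns = inputs).

    The j-th diagonal entry of the inverse correlation matrix equals
    1 / (1 - R_j^2), so no explicit regressions are needed.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic daily series; the channel names are hypothetical.
rng = np.random.default_rng(0)
n_days = 365
search = rng.normal(size=n_days)
social = rng.normal(size=n_days)
# "display" is almost exactly search + social, plus a little noise.
display = search + social + 0.1 * rng.normal(size=n_days)

X = np.column_stack([search, social, display])
pairwise = np.corrcoef(X, rowvar=False)
vifs = vif(X)

# Largest pairwise correlation between two different channels.
off_diagonal = np.abs(pairwise - np.eye(3))
print("max |pairwise correlation|:", off_diagonal.max().round(2))
print("VIF per channel:", vifs.round(1))
```

In this synthetic setup no two channels are very strongly correlated with each other, yet every VIF is far above 10, because display is nearly a linear combination of the other two inputs.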

In general, a VIF above 10 means that enough multicollinearity has been detected to create some level of uncertainty in the model. The higher the VIF statistic, the greater the uncertainty. It is often helpful to run experiments on traffic that has a high VIF. The resulting ground truth is especially valuable there, because calibrating against it automatically improves the model and reduces that uncertainty.
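One way to apply the rule of thumb is to list the rows whose VIF exceeds 10, highest first, as candidates for experiments. The sketch below assumes a CSV layout: only the vif column name comes from this article, while the name column, the sample values, and the high_vif_rows helper are hypothetical.

```python
import csv
import io

VIF_THRESHOLD = 10.0  # rule-of-thumb cutoff described above

# Hypothetical stand-in for a collinearities file. Only the "vif" column
# name is taken from the article; "name" and the values are made up.
SAMPLE = """name,vif
paid_search,3.2
paid_social,14.7
display,27.9
"""


def high_vif_rows(handle, threshold=VIF_THRESHOLD):
    """Return (name, vif) pairs above the threshold, highest VIF first."""
    rows = [(row["name"], float(row["vif"])) for row in csv.DictReader(handle)]
    flagged = [row for row in rows if row[1] > threshold]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


flagged = high_vif_rows(io.StringIO(SAMPLE))
for name, value in flagged:
    print(f"{name}: VIF {value:.1f}")
```

With a real file, the same function could be given an open file handle instead of the in-memory sample, after adjusting the column names to match the actual export.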

Updated on: 22/08/2022
