Ingredients for Successful Promotional Forecasting (Part I)
Introduction
Most retailers derive significant revenue from items sold on promotion. Yet few retailers have successfully implemented a promotional forecasting solution. The reasons stem from several challenges associated with managing promotions and accurately predicting their effects. Below is a list of three key challenges and proven solutions that make promotional forecasting more practical and successful. This post is written in the context of forecasting and store replenishment at the SKU/Store/Week level, but the concepts are general.
1. Calculate promotional lifts at an aggregate level
A common challenge in statistical retail demand forecasting is the sparsity of historical sales at the SKU/Store/Week level. The well-known solution to this challenge is to generate forecasts at an aggregate level. In RDF, the forecast generation level is referred to as the Source Level. Source level forecasts are then pushed back down to the final SKU/Store/Week level using spreading ratios. The spreading ratios are derived from some form of historical data aggregated over time and possibly over other hierarchies as well. With an appropriate source level, this aggregation/spreading process is well known to generate more accurate forecasts than generating forecasts directly at the final SKU/Store/Week level.
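The aggregation/spreading idea can be sketched in a few lines of Python. This is only an illustration, not how RDF computes its ratios: the function names, the sample data, and the choice of each SKU/Store's share of total historical sales as the spreading ratio are all assumptions made for the example.

```python
def spreading_ratios(history):
    """history maps (sku, store) -> list of weekly sales for ONE source-level group.

    Returns each SKU/Store's share of the group's total historical sales.
    (Using the all-time share is an assumption made for this sketch.)
    """
    totals = {key: sum(sales) for key, sales in history.items()}
    group_total = sum(totals.values())
    if group_total == 0:
        # No sales history at all: fall back to an even split (an assumption).
        return {key: 1.0 / len(history) for key in history}
    return {key: total / group_total for key, total in totals.items()}


def spread_forecast(source_forecast, ratios):
    """Push a source-level weekly forecast down to SKU/Store/Week."""
    return {
        key: [week_units * ratio for week_units in source_forecast]
        for key, ratio in ratios.items()
    }


# Hypothetical sparse history for three SKU/Store pairs in one Subclass/Area.
history = {
    ("coat-A", "store-1"): [4, 0, 2, 6],
    ("coat-A", "store-2"): [1, 0, 0, 2],
    ("coat-B", "store-1"): [0, 3, 1, 1],
}
ratios = spreading_ratios(history)

# A two-week forecast produced at the source level by any method.
source_forecast = [10.0, 20.0]
sku_store_forecast = spread_forecast(source_forecast, ratios)
```

By construction, the SKU/Store forecasts for each week add back up to the source-level forecast, so the aggregate signal is preserved while each item receives a share consistent with its own history.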
The key characteristic of a good source level is similarity of trend and seasonal patterns of the underlying historical data. For instance, if all SKU/Store historical sales for a given Subclass (e.g., Men’s Coat) over all stores in a given Area (e.g., Northern States) follow the same trend and seasonal pattern then Subclass/Area would likely be an appropriate source level for all SKU/Store combinations belonging to this particular Men’s Coat/Northern States Subclass/Area.
Promotional forecasting generated at the SKU/Store level can work well for high velocity items. However, for the same sparsity reason as above, it does not provide robust forecasts and promotional lifts for slow movers. Some items may also lack enough promotional history for the system to calculate the effects of some planned promotions. To leverage promotional forecasting for a broad range of items, including slow movers with a variety of promotional plans, promotional effects must often be derived at an aggregate level. Picking an adequate source level to calculate promotional lifts imposes additional constraints beyond those that apply to source levels for seasonal forecasting. All items making up a particular aggregation level (e.g., Men’s Coat Subclass/Northern States Area) should have the same promotional schedule (i.e., they should be on promotion at the same time). They should also react the same way to the promotions (i.e., have similar lifts).
Retailers often have to define alternate roll-ups to fulfill the conditions for an appropriate promotional forecasting source level. Known business attributes typically define those roll-ups. For instance, stores sharing the same back-to-school calendar could define a natural roll-up for items heavily sold and promoted during this period. Statistical analysts can also derive custom product and location clusters that optimize the definition of adequate promotional forecasting source levels.
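To see why pooling helps slow movers, consider the following minimal Python sketch. It assumes the lift is defined as promoted sales divided by baseline sales over the promoted weeks, and that a baseline is already available for each item. The function name and the sample numbers are hypothetical.

```python
def pooled_lift(group):
    """group is a list of (sales, baseline, promo_flags) tuples, one per SKU/Store.

    The lift is total promoted sales over total baseline, summed across every
    promoted week of every item in the roll-up. Returns None if there is no
    promoted baseline volume to divide by.
    """
    promo_sales = 0.0
    promo_base = 0.0
    for sales, baseline, flags in group:
        for units, base, on_promo in zip(sales, baseline, flags):
            if on_promo:
                promo_sales += units
                promo_base += base
    if promo_base == 0:
        return None
    return promo_sales / promo_base


# Four hypothetical slow movers, each with a baseline of half a unit per week,
# all promoted in week 3 (same promotional schedule, as the roll-up requires).
flags = [0, 0, 1, 0]
base = [0.5, 0.5, 0.5, 0.5]
group = [
    ([1, 0, 0, 1], base, flags),
    ([0, 1, 1, 0], base, flags),
    ([0, 0, 2, 1], base, flags),
    ([1, 0, 1, 0], base, flags),
]

# Item-level lifts swing widely because each rests on a single sparse week.
item_lifts = [pooled_lift([item]) for item in group]
# The pooled lift averages out that noise across the whole roll-up.
group_lift = pooled_lift(group)
```

In this made-up data the item-level lifts range from 0 to 4 while the pooled lift is 2. Pooling is only meaningful when the items share a promotional schedule and respond similarly, which is exactly what the roll-up conditions above are meant to guarantee.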
2. Separate baseline and promotional modeling
As discussed above, promotional forecasting imposes additional constraints on adequate source levels beyond those of the more standard statistical seasonal baseline forecasting. As a consequence, the optimum source level for evaluating promotions may not be the same as the one that best captures trend and seasonal patterns. For instance, SKU/Store/Week might be an appropriate level to evaluate promotional lifts for some high or medium velocity items, while a less noisy seasonal pattern could be obtained at an aggregate level. It is also necessary to look at the historical lifts of other items when a specific promotional event is planned for an item for the first time. Cases where the best seasonal and promotional source levels differ can be numerous. In those cases, it is appropriate to separate the seasonal baseline forecast generation process from promotional effect modeling. In principle, one could also consider having different source levels for different promotional attributes, though at the price of significant additional complexity that may not be worth the effort.
Such a strategy improves promotional forecast accuracy and robustness with much less manual intervention from forecast analysts. This comes, of course, at the price of a more sophisticated solution that involves separating the historical seasonal baseline and trend from the promotional peaks, generating baseline forecasts independently of the promotional lifts, and then applying the promotional lifts on top of the baseline. Experience has shown that the extra effort to implement such a process is well worth the long term benefits.
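The three steps (separate the baseline from the promotional peaks, forecast the baseline on its own, apply the lift on top) can be illustrated with a deliberately crude Python sketch. Everything here is an assumption made for illustration: promoted weeks are replaced by the mean of regular weeks, the baseline forecast is a flat average, and the lift is multiplicative. A production system would use proper seasonal models and possibly different source levels for each step.

```python
def depromote(sales, promo_flags):
    """Replace promoted weeks with the mean of regular weeks (a crude baseline)."""
    regular = [units for units, on_promo in zip(sales, promo_flags) if not on_promo]
    if not regular:
        raise ValueError("need at least one non-promoted week")
    base_level = sum(regular) / len(regular)
    return [base_level if on_promo else units
            for units, on_promo in zip(sales, promo_flags)]


def estimate_lift(sales, baseline, promo_flags):
    """Multiplicative lift: promoted sales over baseline in the promoted weeks."""
    promo_sales = sum(u for u, p in zip(sales, promo_flags) if p)
    promo_base = sum(b for b, p in zip(baseline, promo_flags) if p)
    return promo_sales / promo_base


def apply_lift(baseline_forecast, planned_promo, lift):
    """Apply the lift only to the weeks where a promotion is planned."""
    return [b * lift if p else b for b, p in zip(baseline_forecast, planned_promo)]


# Hypothetical history: weeks 3 and 6 were promoted.
sales = [10, 12, 30, 8, 10, 27]
promo_flags = [0, 0, 1, 0, 0, 1]

baseline_history = depromote(sales, promo_flags)           # step 1
lift = estimate_lift(sales, baseline_history, promo_flags)

# Step 2: forecast the baseline independently (flat average as a stand-in).
level = sum(baseline_history) / len(baseline_history)
baseline_forecast = [level] * 3

# Step 3: apply the lift to the planned promotion in the second forecast week.
final_forecast = apply_lift(baseline_forecast, [0, 1, 0], lift)
```

Because the lift and the baseline are estimated separately, each could in principle be computed at its own source level and recombined only in the last step.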
3. Cluster promotions
A promotional forecasting system derives the demand lifts to be expected from running promotions. At a given time, a specific promotional event typically has the following attributes.
3.1. Price incentive (e.g., BOGO, 20% off).
3.2. Advertisement medium (e.g., mailer, coupon).
3.3. Placement (e.g., front page, store display).
The total number of attribute combinations making up a specific promotional event might be large, over a hundred for some retailers. Traditional statistical promotional forecasting software packages like RDF rely upon an underlying stepwise regression algorithm. In the presence of many causal indicators, stepwise regression has limitations that are well documented in the literature. In lay terms, every planned promotional event for an item must have been run at least once in that item's own history. With raw promotional attributes as causal indicators to the stepwise regression algorithm, this assumption is often violated. For retailers, the business impact can be disastrous if this is not appropriately handled. For instance, the lift attributed to a single promotional attribute could reach unreasonable values, in the millions for example. Hence, it is important to reduce the number of causal variables passed to the stepwise regression model. However, too few causal variables lead to poor model fits and eventually bad promotional forecasts.
The objective of promotion clustering is to reduce the number of causal factors that the underlying stepwise regression algorithm uses. It does so by first analyzing the significance of different causal factors so as to eliminate the non-significant ones: causal factors that are consistently insignificant or those that have alternates are dropped. Then it combines the causal factors that have similar lifts: those that are mutually exclusive and produce a similar effect are combined into one causal variable.
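The two stages described above (drop the non-significant factors, then merge those with similar lifts) might look like the following Python sketch. It is not the algorithm used by any particular package. It assumes that a lift and a t-statistic have already been estimated for each causal factor, that the factors passed in are mutually exclusive, and that "similar" means within a relative tolerance of the smallest lift in the cluster. The thresholds and factor names are hypothetical.

```python
def cluster_promotions(factors, min_abs_t=2.0, rel_tol=0.15):
    """factors maps a causal factor name -> (estimated lift, t-statistic).

    Stage 1: keep only factors whose |t| reaches min_abs_t (assumed threshold).
    Stage 2: sort the survivors by lift and greedily merge each one into the
             current cluster while its lift stays within rel_tol of the
             cluster's smallest lift; otherwise start a new cluster.
    Returns a list of clusters, each a list of factor names.
    """
    significant = sorted(
        (lift, name)
        for name, (lift, t_stat) in factors.items()
        if abs(t_stat) >= min_abs_t
    )
    clusters = []
    for lift, name in significant:
        if clusters and lift <= clusters[-1]["anchor"] * (1 + rel_tol):
            clusters[-1]["members"].append(name)
        else:
            clusters.append({"anchor": lift, "members": [name]})
    return [cluster["members"] for cluster in clusters]


# Hypothetical, mutually exclusive factors with made-up estimates.
factors = {
    "10% off": (1.20, 3.1),
    "15% off": (1.30, 2.8),
    "BOGO": (2.10, 5.0),
    "50% off": (2.25, 4.2),
    "loyalty email": (1.02, 0.4),   # not significant, so it is dropped
}
promo_clusters = cluster_promotions(factors)
```

With these numbers, five raw factors collapse into two causal variables, one for the shallow discounts and one for the deep ones. The tolerance controls the trade-off noted earlier: too loose and distinct effects are merged into a poor fit, too tight and the regression is again left with too many sparse indicators.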
Conclusion
The above techniques are proven ingredients of a successful promotional forecasting solution. Applying them can make the difference between success and failure. In a future post I will address more advanced topics related to promotional modeling.

