Asking the Right Questions in Data Analysis

What is required for a company to do effective data analysis? Many would respond, “People with advanced degrees in statistics.” This is most assuredly a worthwhile qualification. However, I would also add, “The ability to ask the right question!”

Asking the right question typically does not require an advanced degree in statistics. Conversely, an advanced degree in statistics does not guarantee that the right question will be asked. I have seen too many data mining professionals with advanced degrees who have trouble asking the right question. Often, these individuals get so bollixed up in the numbers that they find it difficult to think strategically.

An Example

I recently participated in a discussion group on the best ways to build a retail attrition model. The stated goal was to predict which customers are likely to defect, and when. It was clear that my fellow participants were both brainy and highly educated. For example, there were several references to dense academic papers on data mining. Nevertheless, until I raised the question, no one had asked, “Does it even make sense to try to build an attrition model in a retail environment?”

There is no question that attrition models are appropriate for industries in which contractual relationships between a company and its customers are the norm. Financial services, publishing, and telecommunications immediately come to mind. In these industries, we know exactly when a customer cancels his or her credit card, magazine subscription, or cell phone plan. Therefore, it makes good sense to build models to predict which customers will defect. It might even be possible to predict when this will occur.

It is far less clear that attrition models make sense for non-contractual verticals such as retail, catalog, and e-commerce, where whether someone remains a customer can only be expressed as a probability. In these industries, we typically never know for sure whether a customer has defected, much less the exact moment at which the defection took place. And even when we think we know for sure, the reality is not so cut-and-dried. For example:

· Sometimes, a customer asks never to be contacted again. However, with the advent of the Web, the absence of future promotions does not necessarily mean that there will be no future purchases.

· Sometimes, companies receive notification of a customer’s death. However, I have seen responder records that had been flagged as deceased by a deceased-file overlay. How could that be? The answer is that, in many instances, purchase decisions are made at the household level rather than the individual level, so other members of the household keep buying.

So, what options are available to non-contractual industries where there is no way to know for sure if a customer has defected, much less when? Fortunately, many of us have been predicting attrition all along. It’s just that we didn’t realize it. I am referring to ‘implicit’ attrition models that are a byproduct of trying to predict upcoming purchase activity; for example, near-term dollar volume.

Specifically, as a given customer’s point score and corresponding model-segment assignment decline over time, the likelihood that defection has taken place increases accordingly. (For companies that employ rules-driven segments such as recency, frequency, and monetary value (RFM), the equivalent is being assigned to less favorable cells over time.)
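
To make the rules-driven alternative concrete, here is a minimal Python sketch of assigning a customer to an RFM cell at two points in time and checking whether the cell has worsened. Everything in it is a hypothetical illustration: the 1-to-3 scale per dimension, the breakpoints (6 and 12 months, 5 and 2 orders, $500 and $100), the 24-month window, and the function names are assumptions, not values from any real business. The digit-sum ordering of cells is also an assumption made for brevity; in practice, cells are more often ranked by their own historical performance.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    months_since_last_purchase: int   # recency
    orders_last_24_months: int        # frequency (hypothetical 24-month window)
    dollars_last_24_months: float     # monetary value


def recency_code(months: int) -> int:
    """1 = most recent (best), 3 = least recent (worst). Hypothetical cutoffs."""
    if months <= 6:
        return 1
    if months <= 12:
        return 2
    return 3


def frequency_code(orders: int) -> int:
    """1 = most frequent (best), 3 = least frequent (worst). Hypothetical cutoffs."""
    if orders >= 5:
        return 1
    if orders >= 2:
        return 2
    return 3


def monetary_code(dollars: float) -> int:
    """1 = highest spend (best), 3 = lowest spend (worst). Hypothetical cutoffs."""
    if dollars >= 500.0:
        return 1
    if dollars >= 100.0:
        return 2
    return 3


def rfm_cell(c: Customer) -> str:
    """Concatenate the three codes into a cell label such as '112'."""
    return (f"{recency_code(c.months_since_last_purchase)}"
            f"{frequency_code(c.orders_last_24_months)}"
            f"{monetary_code(c.dollars_last_24_months)}")


def cell_rank(cell: str) -> int:
    """Crude, assumed ordering: a higher digit sum means a less favorable cell."""
    return sum(int(d) for d in cell)


# The same (made-up) customer observed at two points in time.
last_year = Customer(months_since_last_purchase=2, orders_last_24_months=6,
                     dollars_last_24_months=900.0)
this_year = Customer(months_since_last_purchase=11, orders_last_24_months=3,
                     dollars_last_24_months=250.0)

before, after = rfm_cell(last_year), rfm_cell(this_year)
print(before, after, cell_rank(after) > cell_rank(before))  # 111 222 True
```

The move from cell 111 to cell 222 does not prove that the customer has defected; it only signals that the probability of defection has risen, which is the sense in which such segments act as an implicit attrition model.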

It is important to note that, in non-contractual verticals, even customers who are assigned to the worst segments retain some probability of making a future purchase. The same is true of recency, a popular single-variable proxy for defection. I once had a client whose business dynamics were such that several reasonably sized subsets of customers whose most recent purchase was as long as nine years earlier could be consistently mailed at a profit.

Four Rules of Thumb

The following are four rules of thumb for effectively dealing with attrition in non-contractual industries such as retail, catalog, and e-commerce:

1. Don't even try to build an explicit attrition model. Instead, build a model to predict future purchase activity; that is, where the dependent variable (‘target’) is revenue, response, and the like.

2. Such a model will, by definition, also serve as an implicit attrition model. As a given customer’s score and segment assignment degrades, his or her likelihood of having defected increases.

3. To reduce attrition, develop business rules that are triggered by patterns of downward segment migration, and then run tests over time to measure their effectiveness. For example, analysis might indicate that customers who first drop from Decile 1 to Decile 2, and then to Decile 3, are very likely never to make another purchase. In this way, analysis can establish, retrospectively, that attrition has almost certainly taken place. (But, again, we typically never know for sure.) Therefore, it might make sense to execute an anticipatory intervention strategy as soon as a customer makes that first drop, from Decile 1 to Decile 2.

4. Be sure that the model segments (e.g., deciles) are predefined (i.e., ‘hard-coded’) as fixed score cutoffs. Otherwise, the segment boundaries will shift every time the model is deployed, which will undermine the business rules and corresponding intervention strategies that you have developed.
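
Rules 3 and 4 can be sketched together in Python: decile cutoffs are computed once from a calibration scoring run and then frozen, every later scoring run reuses those stored cutoffs, and a simple trigger fires when a customer makes the first drop from Decile 1 to Decile 2. This is a minimal sketch under stated assumptions: equal-count deciles, the function names `freeze_decile_cutoffs`, `assign_decile`, and `first_drop_from_top`, and the stand-in scores 1 through 100 are all hypothetical, and a production version would also need to handle ties and persist the cutoffs somewhere durable.

```python
import bisect
from typing import Sequence


def freeze_decile_cutoffs(calibration_scores: Sequence[float]) -> list[float]:
    """Compute nine score cutoffs once, from a calibration run (hypothetical helper).

    Returned in ascending order; deciles are assumed to be equal-count,
    with Decile 1 holding the highest scores.
    """
    ranked = sorted(calibration_scores)
    n = len(ranked)
    # Boundaries at the 10th, 20th, ..., 90th percentile positions.
    return [ranked[(k * n) // 10] for k in range(1, 10)]


def assign_decile(score: float, cutoffs: Sequence[float]) -> int:
    """Map a score to Decile 1 (best) through 10 (worst) using frozen cutoffs."""
    # bisect_right counts how many cutoffs are <= score (0 through 9).
    return 10 - bisect.bisect_right(cutoffs, score)


def first_drop_from_top(history: Sequence[int]) -> bool:
    """Hypothetical trigger: True if the latest move is Decile 1 -> Decile 2."""
    return len(history) >= 2 and history[-2] == 1 and history[-1] == 2


calibration = [float(s) for s in range(1, 101)]   # stand-in scores, not real data
CUTOFFS = freeze_decile_cutoffs(calibration)      # computed once, then stored

# Later scoring runs reuse the stored cutoffs rather than re-ranking everyone.
history = [assign_decile(s, CUTOFFS) for s in (97.0, 85.0)]
print(history, first_drop_from_top(history))  # [1, 2] True
```

The design choice that matters is that `assign_decile` never re-ranks the current population. If deciles were recomputed from each new run, a customer whose own score held steady could still change deciles simply because other customers' scores moved, and a trigger keyed to "Decile 1 to Decile 2" would no longer mean the same thing from one deployment to the next.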

Jim Wheaton is a principal at Wheaton Group.
