Separate Models for Separate Segments?

Posted by Chief Marketer Staff

One of the ways you can improve modeling results is to look for segments within your customer database that have different relationships to variables such as recency, frequency, monetary value and products purchased.

Using contrived data, let’s look at how this works.

First Step Is Key

The trick is to determine if the strength of the relationships differs from segment to segment. For example, suppose you believe your sales are correlated with two variables, X1 and X2. Ask your statistician to create two “scatter diagrams” so you can see the relationships and calculate the correlation coefficients.

(The correlation coefficient measures the strength and direction of the linear relationship between two variables. Its value ranges from -1, a perfect negative relationship, through 0, meaning no linear relationship, to 1, a perfect positive relationship.)
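For readers who want to see the arithmetic, here is a minimal sketch of how a correlation coefficient can be computed. The numbers are made up for illustration and are not the column's data set; the function name `pearson_r` is our own. The second series is the first one reversed, which shows that the coefficient takes a negative sign when sales fall as the variable rises.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # deviations from the mean
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Made-up illustrative values (assumption: not the column's data).
x1 = [1, 2, 3, 4, 5]
sales_up = [12, 15, 19, 22, 30]    # sales rise with x1
sales_down = [30, 22, 19, 15, 12]  # same values reversed: sales fall with x1

r_up = pearson_r(x1, sales_up)
r_down = pearson_r(x1, sales_down)
print(round(r_up, 3), round(r_down, 3))
```

The same value is available from `np.corrcoef(x1, sales_up)[0, 1]`; the hand-written version is shown only to make the formula visible.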

Using a data set created for this column, we’ve done that in the graphs shown at right.

Your hunch appears to be correct. Your sales (Y) are positively correlated with X1 and also with X2. And while the correlation coefficients – .45 and .64 – are not in the great .8 to .9 range, they’re not weak, either.

Now that you’ve discovered two variables related to sales, you want to build a regression model. Using the same data set that produced the results above, have your statistician run the data through a regression procedure, which will produce the following equation:

Y = 31.5 + 9.2*X1 + 6.7*X2, with an R-squared of 59%.

Not bad. Our simple two-variable example produced an equation, or model, that explains 59% of the variation in sales across customers. (R-squared is a statistic that measures the percentage of the variation in Y explained by the model.)
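The regression step can be sketched as follows. This is an illustration under stated assumptions, not the procedure your statistician would necessarily use: the data are simulated, the noise level is arbitrary, and the helper `fit_ols` is a hypothetical name. The coefficients from the equation above are borrowed only to generate plausible-looking sales, so the fitted numbers and R-squared will not match the column's.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # add the constant term
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)                 # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total variation
    return coef, 1.0 - ss_res / ss_tot

# Simulated customers (assumption: not the column's data set).
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
noise = rng.normal(0, 40, n)  # arbitrary noise level
y = 31.5 + 9.2 * x1 + 6.7 * x2 + noise

coef, r2 = fit_ols(np.column_stack([x1, x2]), y)
print("intercept, b1, b2:", np.round(coef, 2))
print("R-squared:", round(r2, 3))
```

R-squared here is one minus the ratio of unexplained to total variation, which is the "percent of variation explained" described above.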

What if Relationships Differ?

Next, suppose you realize that while sales to your customers were correlated with variables X1 and X2, your customer file was really made up of three demographic segments – young, middle-aged and old – and you suspect the relationship between sales and X1 and X2 might not be the same for each segment.

What could you do? Because you’ve identified three segments you could use this information in your model. How? Have your statistician create two new dummy variables DY and DM. Young customers would be assigned a “1” on variable DY and a “0” on variable DM and vice versa for middle-aged customers. Customers with “0” values on both dummy variables would be understood to be in the old segment.

Your statistician would then run the data through the regression program again and would arrive at the equation that follows.

The equation now has four variables – the original two and the two dummy variables:

Y = 428 + 8.4*X1 + 7.6*X2 – 539.5*DY – 804.4*DM, and R-squared goes to 86%.

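Again, your hunch was correct: the segments do differ, and accounting for them raises R-squared from 59% to 86%. Note, though, that dummy variables on their own only shift the constant term, giving each segment its own baseline level of sales. The coefficients on X1 and X2 are still shared by all three segments.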
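The dummy coding described above can be sketched like this. The customers, the values and the coefficients used to simulate sales are all made up, and `make_dummies` is a hypothetical helper name. The simulated sales contain no noise, so the fit recovers the coefficients that generated them; real data would not behave so neatly.

```python
import numpy as np

def make_dummies(segments):
    """Code segment labels as (DY, DM); 'old' is the baseline with 0 on both."""
    seg = np.asarray(segments)
    dy = (seg == "young").astype(float)
    dm = (seg == "middle").astype(float)
    return dy, dm

# Hypothetical customers (assumption: illustrative values only).
segments = ["young", "young", "young",
            "middle", "middle", "middle",
            "old", "old", "old"]
x1 = np.array([1.0, 2.0, 4.0, 1.0, 3.0, 5.0, 2.0, 3.0, 6.0])
x2 = np.array([3.0, 1.0, 2.0, 4.0, 1.0, 2.0, 5.0, 2.0, 1.0])
dy, dm = make_dummies(segments)

# Simulated sales: shared slopes, segment-specific baseline, no noise.
y = 10.0 + 2.0 * x1 + 3.0 * x2 - 5.0 * dy + 4.0 * dm

# Regress sales on the constant, the two original variables and the two dummies.
A = np.column_stack([np.ones(len(y)), x1, x2, dy, dm])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 3))  # order: constant, X1, X2, DY, DM
```

Only two dummies are needed for three segments: a third would duplicate information already carried by the constant term.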

Further Improvement

Your statistician now suggests that the results could be improved even more if you looked for the interaction between the segment identifiers and the individual variables themselves. You have no idea what this means but it sounds good. So you try it.

What you come up with is an equation with eight variables: the two original, the two dummies and four variables representing the interaction of the original variables with the dummy variables:

Y = 4 + 7*X1 + 13*X2 – 1*DY + 1*DM – 2*DY*X1 – 5*DY*X2 + 4*DM*X1 – 10*DM*X2, and an R-squared of 100%.

What happened?

Well, we discovered in our made-up example that each segment behaves differently with regard to variables X1 and X2. What’s more, by understanding the different relationships between sales and the two variables, we built – in this artificial case – a perfect model!

(Of course, in actual practice you will never be able to build anything close to a perfect model.)
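To see what the eight-variable equation says about each segment, set the dummy variables to the values that identify the segment and collect terms. The short sketch below does that arithmetic using the coefficients printed in the equation; the dictionary keys and the function name `segment_equation` are our own labels.

```python
# Coefficients copied from the eight-variable equation in the text.
coef = {
    "const": 4, "X1": 7, "X2": 13,
    "DY": -1, "DM": 1,
    "DY*X1": -2, "DY*X2": -5,
    "DM*X1": 4, "DM*X2": -10,
}

def segment_equation(dy, dm):
    """Return (intercept, slope on X1, slope on X2) for given dummy values."""
    intercept = coef["const"] + coef["DY"] * dy + coef["DM"] * dm
    slope_x1 = coef["X1"] + coef["DY*X1"] * dy + coef["DM*X1"] * dm
    slope_x2 = coef["X2"] + coef["DY*X2"] * dy + coef["DM*X2"] * dm
    return intercept, slope_x1, slope_x2

equations = {
    "young": segment_equation(1, 0),
    "middle": segment_equation(0, 1),
    "old": segment_equation(0, 0),
}
for name, (b0, b1, b2) in equations.items():
    print(f"{name}: Y = {b0} + {b1}*X1 + {b2}*X2")
# young: Y = 3 + 5*X1 + 8*X2
# middle: Y = 5 + 11*X1 + 3*X2
# old: Y = 4 + 7*X1 + 13*X2
```

In other words, the single equation with interaction terms is three separate straight-line equations folded into one, each with its own constant and its own slopes.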

Result: A Simpler Solution

So what’s the lesson to be learned? If you suspect that different demographic or lifestyle or attitudinal segments might display different relationships with your key performance variables, try building separate models for each segment.

Building separate models, rather than creating one equation with all dummy and interaction variables, as we did above, is a simpler solution that’s more likely to be understood and less prone to implementation errors.
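As a rough sketch of the separate-models approach, the code below fits one small regression per segment. The data are simulated without noise from the three per-segment equations implied by the eight-variable equation above (obtained by setting DY and DM to 0 or 1), so each fit simply recovers its own segment's equation. The sample sizes, the random values and the name `fit_segment` are assumptions made for illustration.

```python
import numpy as np

# Per-segment (intercept, slope on X1, slope on X2), derived from the
# eight-variable equation in the text by plugging in the dummy values.
TRUE_EQUATIONS = {
    "young": (3.0, 5.0, 8.0),
    "middle": (5.0, 11.0, 3.0),
    "old": (4.0, 7.0, 13.0),
}

def fit_segment(x1, x2, y):
    """Fit Y = b0 + b1*X1 + b2*X2 on one segment's customers only."""
    A = np.column_stack([np.ones(len(y)), x1, x2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
fitted = {}
for name, (b0, b1, b2) in TRUE_EQUATIONS.items():
    # Simulated customers for this segment (assumption: 30 per segment).
    x1 = rng.uniform(0, 10, 30)
    x2 = rng.uniform(0, 10, 30)
    y = b0 + b1 * x1 + b2 * x2  # noise-free, as in the artificial example
    fitted[name] = fit_segment(x1, x2, y)

for name, coef in fitted.items():
    print(name, np.round(coef, 2))
```

Each segment's model has only three numbers to read and to implement, which is the practical appeal of this route over one equation with eight variables.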

For a copy of the data set used in this column, e-mail [email protected].
