Updating the New DM

What if It All Went Away?

The trend is definitely not good.

Privacy is not an issue that's going to disappear, so direct marketers will have less data available for individual- or household-level list overlays. If auto registration data is no longer available except on an opt-in basis, can credit card data, except for the explicit purpose of granting credit, be far behind? Probably not.

But the real question is: Can marketers live without household-level demographic and financial data? Of course they can. And if they honed their modeling skills, they might even be better off.

Let’s examine some of the ways household-level data is used:

– To build better customer response and performance models. With some exceptions, customer transaction data (recency/frequency/monetary, product purchase, tenure, source) and a handful of other related variables are all that's necessary to build satisfactory response and performance models. More often than not, additional demographic variables do not produce a larger spread or a more accurate model.

– To profile customer response and performance deciles. Based on the recent case regarding auto registration information, it may still be possible to use overlay data for research purposes, and profiling would qualify as research. But even if overlay data couldn't be used to profile the deciles of a gains chart, short telemarketing surveys could provide all the profiling information required, at a reasonable cost and perhaps with more accuracy.

– To build new customer acquisition models to be used against response lists. For those not familiar with this application, the idea here is to append household-level data to the names coming out of a merge/purge of multiple response lists.

Then these names are scored using a model, built from prior mailings, that gives specific weights to the demographic variables it contains. Prospects with scores below some cutoff (perhaps the bottom two deciles) are dropped from the promotion.

This process is both time-consuming and complicated: time-consuming because scoring and suppression have to take place after the merge, and complicated because of several issues – among them, how non-matched names will be scored and how missing data will be handled within the matched population. (The sketch below walks through this scoring-and-suppression step.)
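To make those issues concrete, here is a minimal sketch of the post-merge scoring step in Python with pandas. Everything specific – the column names, the model weights, and the choices to impute missing overlay values with matched-population means and to give non-matched names the matched median score – is an illustrative assumption, not a description of any particular shop's procedure:

```python
# Post-merge/purge scoring and suppression - a sketch only. Column
# names, weights, and the missing-data rules are illustrative.
import pandas as pd

def score_and_suppress(prospects, weights, intercept, drop_deciles=2):
    """Score the merge/purge output with a prior-mailing model and
    drop the weakest deciles before the promotion."""
    X = prospects[list(weights)].copy()
    matched = X.notna().any(axis=1)          # did the overlay match?
    # One convention for missing data: impute each variable's
    # matched-population mean (a judgment call, as noted above).
    X = X.fillna(X[matched].mean())
    score = intercept + X.mul(pd.Series(weights)).sum(axis=1)
    # Non-matched names get the matched median rather than a score
    # driven entirely by imputed values.
    score[~matched] = score[matched].median()
    out = prospects.assign(score=score)
    out["decile"] = pd.qcut(out["score"], 10, labels=False) + 1  # 1 = worst
    return out[out["decile"] > drop_deciles]

# Hypothetical usage, with weights from a model built on prior mailings:
# mailable = score_and_suppress(merged_names,
#                               weights={"income_est": 0.8, "age": -0.2},
#                               intercept=-1.5)
```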

Some Alternatives

Working on the assumption that direct marketers will still have access to the detailed census data that's collected at the block-group level and recompiled at the ZIP+4 level, they should be able to build response and performance models that for all practical purposes are as effective as models built on household-level data. What's more, these ZIP or ZIP+4 models are easier to construct and are much less expensive to implement than those based on household-level data.

Model-Building Tips

There are two keys to building good models based on census data. The first has to do with variable creation, the second with technique.

– Creating new variables: Companies should assemble their own historical response and performance indices based on past promotions and customer behavior.

Working at the ZIP+4 level, it's possible to build historical indices, or simply historical response and/or performance rates, which can then be aggregated at the five-digit ZIP code level, the sectional center facility (SCF) level, or, what's frequently even better, at a PRIZM or MicroVision segment level. (Each commercial clustering scheme associates a demographic or lifestyle segment with a ZIP+4 code.) These historical results are then treated as potential independent variables in a marketer's response or performance models, and in our experience, one or more of these historical variables will enter a model as one of its most important variables. (A rough sketch of the aggregation step follows.)
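Here is a minimal pandas sketch of the aggregation idea. The file layouts, column names, and the ZIP+4-to-cluster lookup (the sort of table a clustering vendor supplies) are all hypothetical, and ZIP+4 is assumed stored as "NNNNN-NNNN":

```python
# Build historical response-rate variables from past promotions.
# File layouts, column names, and the cluster lookup are assumptions.
import pandas as pd

history = pd.read_csv("past_promotions.csv")   # one row per name mailed
clusters = pd.read_csv("zip4_clusters.csv")    # ZIP+4 -> lifestyle cluster

history = history.merge(clusters, on="zip4", how="left")
history["zip5"] = history["zip4"].str[:5]      # five-digit ZIP
history["scf"] = history["zip4"].str[:3]       # sectional center facility

overall = history["responded"].mean()          # overall response rate
for level in ["zip5", "scf", "cluster"]:
    rate = history.groupby(level)["responded"].mean()
    history[f"hist_rate_{level}"] = history[level].map(rate)
    # Index form: 100 = average, 150 = responds at 1.5x the norm.
    history[f"hist_idx_{level}"] = 100 * history[f"hist_rate_{level}"] / overall
```

In practice these rates would come from earlier campaigns and then be appended to the current prospect file, with thin cells smoothed or rolled up to the next level so that a handful of past mailings doesn't produce a wildly noisy index.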

For example, a response model we built for a continuity program contained only three variables, and two of them were historical indices. (The model had a top-decile lift of 270; that is, the top decile responded at 2.7 times the rate of the mailing as a whole – see the worked example below.) The third was a principal component analysis (PCA) variable that compared each ZIP code's educational level with the average educational level within the entire mailing population.
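To make the lift arithmetic concrete, here is a tiny self-contained calculation; the response rates are invented purely for illustration:

```python
# Top-decile lift: the top decile's response rate indexed to the
# overall rate (100 = average). The rates below are made up.
overall_rate = 0.020        # 2.0% response across the whole mailing
top_decile_rate = 0.054     # 5.4% response in the best-scoring tenth

lift = 100 * top_decile_rate / overall_rate
print(lift)  # 270.0 -> a "top-decile lift of 270"
```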

– Which brings us back to the subject of modeling technique, the second component of good census-data models.

If you've dealt with census data you know that while there are some 300 to 400 census variables, there are only about 20 major data categories, and the variables within each category are presented as frequency distributions (for example, the share of households falling into each educational level).

In our experience, models built on individual census variables – rather than on a PCA of each census category – are much easier to construct, but they are far less stable and generally of poorer quality.
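As an illustration of the PCA approach – a sketch of the general technique, not this column's actual code – here is how the education category's frequency-distribution variables might be collapsed into a single composite score per ZIP. The file and column names are assumptions:

```python
# PCA on one census category (education shares by ZIP), yielding a
# single composite input for the response model. Names are illustrative.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

zips = pd.read_csv("census_by_zip.csv")   # one row per ZIP (or ZIP+4)
edu_cols = ["pct_no_hs", "pct_hs_grad", "pct_some_college",
            "pct_college_grad", "pct_graduate_degree"]

# Standardize so no single share dominates, then keep the first
# principal component: a composite "educational level" per ZIP.
X = StandardScaler().fit_transform(zips[edu_cols])
zips["edu_pc1"] = PCA(n_components=1).fit_transform(X)[:, 0]

# After standardization the component is mean-zero, so positive values
# mark ZIPs above the mailing population's average - the same centering
# idea as the continuity-program example above.
```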

The Bottom Line

We certainly don't wish to see the demise of individual- or household-level overlay data for direct marketing use. But what if it should happen? Well, if we're smart and take advantage of the data and techniques at our disposal, we'll be able to make up most, if not all, of the losses imposed on us.