There is a wide range of data modeling approaches and algorithms that can give marketers great insight, but how do you know which one makes sense for your company?
Before you choose a data modeling technique, decide on your goals and key performance indicators (KPIs) so you have a strong sense of what kind of marketing data quality is most important to you. This isn’t always easy, because there is a trade-off between the clarity and insight that a data model provides and the analytical accuracy of that model.
These two dimensions pull against each other, and you usually have to give up something in one to gain something in the other. The more accurate and complex modeling techniques (neural networks and “deep learning,” for example) are less intuitive even to a skilled modeler and provide little insight into the actual behavior beyond their direct prediction. This is in contrast to decision trees and linear regression, which are usually less accurate but can be easily grasped and understood.
Problems With Panels
Making the most of predictive data analytics isn’t simply about the accuracy of the data modeling technique, but also about the quality of the input to that technique. The saying “Garbage in, garbage out” applies here. When you start to build a model, you will build a panel of explanatory variables for that model. A panel is defined as a “dataset in which the behavior of entities are observed across time.” The quality of your results will depend heavily on the quality of the panel.
A typical issue with panels is not having enough explanatory variables for the model to find the information it needs. If you want to predict who will buy a sports car, and all you know is the client’s height and hair color, you probably won’t be able to provide a very good prediction.
The panel may also have bugs or the wrong input. An example would be explanatory variables that “see the future,” meaning they contain information that would not be available at the time the prediction is made. The behavior pattern the model learns from such variables is not applicable in real life. In these cases, the data models are working hard but making predictions based on illusions, and will therefore not prove valuable in production.
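One way to catch variables that see the future is to record when each value became known and compare that date with the date of the prediction. The following is a minimal sketch, assuming a hypothetical panel row in which every variable carries the date it was observed; the function name, variable names, and dates are illustrative only.

```python
from datetime import date

def leaking_variables(panel_row, prediction_date):
    """Return the names of variables that became known after the prediction date."""
    return sorted(
        name
        for name, (value, observed_on) in panel_row.items()
        if observed_on > prediction_date
    )

# Hypothetical panel row: variable name -> (value, date the value became known)
row = {
    "purchases_last_90d": (3, date(2024, 1, 10)),
    "support_tickets": (1, date(2024, 1, 12)),
    "cancellation_reason": ("price", date(2024, 2, 20)),  # only known after the customer left
}

suspects = leaking_variables(row, prediction_date=date(2024, 1, 15))
print(suspects)
```

Any variable flagged this way should be removed or rebuilt from data available before the prediction date before the model is trained.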
Panels must be debugged to ensure the integrity of the data. Additionally, the more trust you place in the initial panel, the more complex you can get with the data modeling technique you select.
Data Modeling Techniques
Before you decide on the best technique, your goals need to be clearly established. Common goals for predictive marketing analytics include:
- Identifying the best target for acquisition within a prospecting list
- Determining the best ways to cross-sell products and services to existing customers
- Determining opportunities to deep-sell to existing customers
- Preventing churn
Decision Trees
A decision tree is a technique that uses a tree-like graph or model of decisions and their possible consequences. Decision trees are fairly simple yet have reasonable predictive power, so they are very common and fairly easy to use and understand. They involve multiple variables, so they can help you explain behavior that occurred because of multiple factors.
Decision trees are an automated tool for providing rule-based predictions.
Here is an example of a type of prediction a decision tree can provide: Your best prospects in the New England area are males aged 25-45, but in the Pacific area acquisition efforts are better spent on females aged 35-40, and only above a certain income level.
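To give a feel for how such rules are found automatically, here is a minimal sketch of the core step of a decision tree: choosing the single cut-off on one variable that best separates buyers from non-buyers, measured by Gini impurity. The data and function names are hypothetical, and a full tree would repeat this step on each resulting group and across all variables.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Find the threshold on one numeric variable with the lowest weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    # The largest value is skipped because it would leave the right-hand group empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical prospects: age and whether they converted (1) or not (0)
ages = [22, 27, 31, 38, 44, 52, 58, 63]
converted = [0, 1, 1, 1, 1, 0, 0, 0]

threshold, impurity = best_split(ages, converted)
print(f"Rule: age <= {threshold} (weighted impurity {impurity:.2f})")
```

The resulting rule reads directly as a targeting statement, which is what makes this family of models easy to explain.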
Linear Regression
Linear regression is a technique for modeling the relationship between a dependent variable and one or more explanatory variables. (With a single explanatory variable, it is called simple linear regression.) This is a simple model and works well for situations where you want to predict a numerical value, whether whole or fractional.
For example, if you wanted to determine the optimal discount to give a client for the purposes of maximum conversion or determine how much a customer is likely to spend in their next transaction, a linear regression model would work well.
The technique can also be ideal for situations where you want to predict the lifetime value of a customer, projecting what new customers will spend over their entire lifetime. It is critical to estimate lifetime value accurately, and linear regression lets you minimize the accumulated error of the model.
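As an illustration, the sketch below fits a straight line with ordinary least squares, the standard way of minimizing the accumulated (squared) error, using a single explanatory variable. The data is made up, and the variable names are assumptions for the example only.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one explanatory variable: y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: number of past orders -> spend in the next transaction
past_orders = [1, 2, 3, 4, 5]
next_spend = [32.0, 48.0, 71.0, 89.0, 110.0]

intercept, slope = fit_line(past_orders, next_spend)
predicted_spend = intercept + slope * 6  # estimate for a customer with 6 past orders
print(f"spend ~ {intercept:.1f} + {slope:.1f} * orders; 6 orders -> {predicted_spend:.1f}")
```

The fitted slope can be read as the expected change in spend per additional past order, which is part of why this model is easy to interpret.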
Logistic Regression Techniques
Logistic regression techniques apply to situations where the dependent variable is binary (it has only two possible categories). Examples of binary values include answers to questions with two options, such as yes or no, or whether a statement is true or false. For example, will a specific customer churn in the next 3 months?
Logistic regression can identify connections between variables that are more complex than those captured by linear regression. It is also better suited to working with non-numeric variables.
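For a churn question like the one above, a logistic regression produces a probability between 0 and 1 for each customer. Below is a minimal sketch that fits one by plain gradient descent on a single variable; the data, learning rate, and number of iterations are assumptions chosen for illustration, and production tools use more robust fitting methods.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= learning_rate * sum(errors) / n
        b1 -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Hypothetical data: months since last purchase -> churned within 3 months (1) or not (0)
months_inactive = [0, 1, 2, 3, 4, 5, 6, 7]
churned = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(months_inactive, churned)
for months in (1, 6):
    print(months, round(sigmoid(b0 + b1 * months), 2))
```

Customers can then be ranked by predicted churn probability, so retention efforts go to those most at risk.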
Logistic Model Trees
Logistic model trees combine logistic regression and decision trees. In that respect they combine the advantages of both modeling techniques, in both their accuracy and the level of insight they give into the data. The downside is that they are more complex to implement and validate, and they tend not to be very common.
Measuring Error
In all of these modeling techniques, the quality of the prediction depends upon the way the automated model quantifies the price of a prediction mistake. Most modern modeling tools allow you to choose that error function. This will also help you choose the right modeling technique.
Therefore, when starting a modeling task, you should look at the way you will use the model’s predictions. This will help you quantify the error of every specific model. Suppose you are only interested in the highest part of the distribution: out of a population of 10 million people, you want to choose 100,000 for a specific campaign. It would then make sense to choose a type of model that focuses on the topmost portion of the population, to ensure this population is the right one. In this situation, a linear regression would be less suitable, and logistic regression might be more relevant. If, on the other hand, you want to minimize your mistake across the whole population, such as in specific uses of client lifetime value, linear regression might be the right approach.
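The effect of the error function can be seen in a small sketch that scores two hypothetical models in two ways: mean squared error over the whole population, and precision among the top-scored customers only. The scores and responses are invented for illustration, and the two model names are placeholders rather than specific techniques.

```python
def mean_squared_error(actual, predicted):
    """Average squared mistake across the whole population."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def precision_at_k(actual, scores, k):
    """Share of the k highest-scored customers who actually responded."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(actual[i] for i in ranked[:k]) / k

# Hypothetical campaign: 1 = responded, 0 = did not respond
responded = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Model A ranks the top customers correctly but overestimates everyone else.
model_a = [0.9, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
# Model B is closer on average but puts a non-responder at the top.
model_b = [0.3, 0.2, 0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

for name, scores in (("A", model_a), ("B", model_b)):
    mse = mean_squared_error(responded, scores)
    top = precision_at_k(responded, scores, k=2)
    print(f"Model {name}: MSE={mse:.3f}, precision in top 2={top:.2f}")
```

Which model is better depends on the measure: one wins when only the top of the list matters, the other when mistakes across the whole population count.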
In the end, it is critical that marketers align their data modeling approach with their marketing goals. While you might be lucky enough to have an expert research team that handles this, it is important that you have an understanding of the basics of the approaches.
Marketing today is data-driven, and effective marketing leaders need a well-rounded grasp of all marketing disciplines, particularly those that will empower them to make better use of the data they already have.
Guy Gildor is CTO of Pursway.