Put Your Money in the Right Place

Posted on by Chief Marketer Staff

Problems often can occur when joining tables in your marketing database. Joining tables can be thought of as linking them based on one or more common fields, such as a customer number. Potential dilemmas can include getting different answers based on whether the link is an inner join, which is simply the intersection between two tables, or an outer join, which behaves like a union in that it returns data from one table whether or not there’s a match in the other.

I’m not saying you shouldn’t join tables. Indeed, you don’t have much of a choice, unless your database consists of a single Great Big Table. Database administrators like to store data in separate tables for efficiency. For example, if a single customer purchases three items, it’s not a good idea to repeat her name and address three times. Instead, a good DBA will store the name and address once in a customer table, and then write out each of the three items to a separate items table. The fun begins when we need to link these tables back together again in order to run a selection that needs data from each table.

Whenever a row in one table matches more than one row in another, SQL databases repeat the values from the first row for each matching row in the other table. In other words, if a row in Customer matches three rows in Items, the database repeats the address data for each item, precisely as if our DBA had stored them together to begin with. Under normal circumstances, this is exactly what we want, because it lets us generate reports that show item sales by location and so on.

On occasion, however, joining tables can cause some really funky results, and you need to be aware of them before you rush in and tell your boss about the spectacular results you’ve achieved with your creative genius.

Let’s use a catalog database as our example. Most are very straightforward, containing three main tables:

  • Customer

    Storing names and addresses along with demographic data.

  • Orders

    Data such as catalog number, promotion code, order number, order date and payment type (cash vs. credit card).

  • Items

    Order line-item detail such as SKU number, quantity purchased, amount and tax.

To make marketing queries easier and faster, what many people like to do is maintain aggregate totals at various levels of detail. For example, you might calculate a total lifetime paid amount and most recent order date and store that on the customer table for fast and convenient access. It’s also pretty common to compute the total order amount and place that on the orders table. Our problems arise when we join a table containing an aggregated value with another table, and the aggregated value gets repeated for each matching row.

Say we ran a promotion that offered a free gift to everyone placing an order of $15 or more, and we want to see how well it did. We’ll produce a report showing the number of orders, total revenue, number of items shipped and average order size. Because this report requires order as well as item information, we need to start by joining the tables.

This is precisely where you can get into trouble if you’re not careful.

All databases contain standard built-in functions for aggregating data. Three that are frequently used in marketing queries are COUNT (to count how many rows match a given condition), SUM (to summarize a numeric field) and AVG (to compute an average). So to produce our report, it seems reasonable to assume that we’ll COUNT the number of orders, SUM the order amount, SUM the item quantity and AVG the order amount.

The reason our assumption doesn’t work is made clear in the chart on page 43. Note that customer Bonnie has placed a single order for items A, B and C. The total order amount for these three items is $28.50. But because this order contains three items, the amount gets repeated for each one when the tables are joined. This means when we SUM the order amount, we’re producing a total that is three times too large. Sounds like a nifty new Enron-like accounting technique, doesn’t it?

Further study of the diagram shows that the order amount is simply a sum of the item amounts on each order, and what we should be totaling to get the right answer is the item amount, by entering SUM (Item_Amount).

But there are a couple of other problems that warrant discussion: how to count the number of orders, and how to calculate the average order size.

There are two

More

Related Posts

Chief Marketer Videos

by Chief Marketer Staff

In our latest Marketers on Fire LinkedIn Live, Anywhere Real Estate CMO Esther-Mireya Tejeda discusses consumer targeting strategies, the evolution of the CMO role and advice for aspiring C-suite marketers.



CALL FOR ENTRIES OPEN



CALL FOR ENTRIES OPEN