“False Positives” an E-mail Negative

For those e-mailers who go out of their way to do everything right (getting permission, using double opt-in, employing an e-mail service provider, applying SPF authentication, and generally treating the Can-Spam Act as the e-mail equivalent of the Bill of Rights), here’s some disturbing news: Those measures are no guarantee that your message will get delivered. That’s the main conclusion of a study by Pivotal Veracity, a service company that lets e-mailers track their messages all the way from their servers to recipients’ inboxes. According to that research, 54% of the companies studied looked like spammers to ISPs at least some of the time, and the ISPs shunted some of their e-mail into recipients’ spam folders.

These “false positives” cause two problems in the e-mail channel, says Deirdre Baird, Pivotal Veracity president/CEO. First and foremost, they keep legitimate messages from reaching the user. Equally troublesome, the sender never learns that a message didn’t make it into the recipient’s inbox, because most e-mail reports track only bounces: “hard” bounces for invalid addresses and “soft” bounces for temporary problems such as a full mailbox.

Mailers have no idea how badly false positives affect them, Baird says, because standard e-mail technology doesn’t track what happens to a message after it reaches the receiving ISP. “Many mailers calculate that the amount of their e-mail that gets ‘delivered’ is the amount of mail they send minus bounces they receive back,” she says.

Pivotal Veracity, which operates a network of test e-mail accounts as part of its deliverability tracking system, wanted to find out how far off the mark that assumption was. Beginning in April, the company chose 100 mailers from a range of industries, including some B-to-B merchants and nonprofit organizations, and signed up to receive their communications at three separate e-mail accounts, one each at Yahoo Mail, MSN Hotmail, and Google’s Gmail. These accounts were not part of Pivotal Veracity’s existing test network and were used for no other purpose. The companies were not told they were part of the research, so that they would not alter their mailing behavior. They included merchants such as L.L. Bean, Neiman Marcus, Target, and Buy.com; the four largest online travel agencies; media companies such as Business Week, CNET, and the Wall Street Journal; and noncommercial mailers such as the Food and Drug Administration and the American Association of Retired Persons. None were current Pivotal Veracity clients.

With the registrations done, Pivotal Veracity sat back and waited for the mail to arrive, then checked whether the ISPs sent each message to the account’s inbox or to its spam folder.

Pivotal Veracity never received any e-mail from 10 of the 100 companies chosen for study. Since researchers had no way of knowing whether these companies tried and failed to send e-mail or simply did not mail, those no-shows were not counted as failures in the final results. Nor were cases in which researchers got e-mail from a company at one or two of their accounts but not the third, since a company could conceivably mail to Hotmail accounts but not to Gmail ones.

Even by those conservative measures, 54 of the 90 companies that sent e-mail had at least one message falsely flagged as spam, a result Baird says surprised her. (The actual proportion was higher: in the week after the study concluded, e-mail from six more of the companies turned up in those accounts’ spam folders.)

“The problem was bigger than we thought,” Baird says. Pivotal Veracity usually examines clients’ individual e-mail campaigns, mailing to a handful of seed accounts to test delivery and recommending adjustments based on the results. In that work, Baird says, the company is used to seeing initial false positive rates of 20% to 30%. But “never in a million years” did she expect that a broader industry view would reveal that more than half the companies had a false positive problem.

The microlevel results were equally surprising. Ninety-nine of the 100 companies selected used either opt-in or double opt-in for their e-mail registrations. Eighty-one used opt-in, either confirmed or unconfirmed. Eighteen used a double opt-in procedure, in which recipients have to respond to an e-mail to confirm that they indeed wanted to receive communications.

But those permission safeguards didn’t prevent false positives. Fifty-nine percent of the companies using opt-in experienced at least one false positive, as did 39% of the companies using double opt-in. In a few cases, Baird says, the message that landed in the spam folder was the double opt-in confirmation request itself, so those recipients never confirmed and were lost from the company’s mailing list.

Of the companies that had at least one e-mail diverted, 18% found transactional e-mail, not just informational messages or offers, going into the spam folder. These included the double opt-in confirmation requests mentioned above as well as thank-you and welcome messages. The count did not include newsletters, although many e-mailers consider those transactional too.

Surprisingly, e-mailers that took part in accreditation or “bonded sender” programs actually fared slightly worse in the Pivotal Veracity study than those that did not. More than a third of the companies tested pay for some form of accreditation or certification program that offers to ensure inbox delivery. But 55% of those companies were hit with at least one false positive, compared to a 53% rate for the companies that pay nothing toward delivery assurance.

Is there any good news here? Yes. “Companies that outsourced their e-mail deployment—a bit more than half those studied—had a slightly lower false positive rate than those who managed their e-mail in-house,” Baird says. “Slightly” is the operative word here: Fifty-two percent of companies that mailed through ESPs encountered at least one false positive problem, compared to 67% of those sending their own e-mail.

What conclusions can be drawn from this reality check? For one thing, says Baird, it’s heartening to see such a high proportion of e-mailers adopting best practices such as SPF and double opt-in. “Unfortunately, it’s doing them no good when it comes to getting their messages into the inbox,” she adds. Almost three-quarters of the companies in the study used Sender Policy Framework (SPF) authentication for their e-mail. But that widespread adoption had no apparent effect on whether a company’s e-mail got tagged as spam: Seventy-three percent of the senders hit by false positives were using SPF, about the same share as among all the companies studied.

More important, Baird says, companies that make e-mail a crucial part of their marketing programs need to recognize that different players in the industry calculate the false positive rate in different ways. Vendors of spam filters commonly claim that their products produce false positives less than 1% of the time. “But they’re calculating that figure using the universe of all e-mail, including the 70% to 80% they already say is spam,” Baird says. A rate measured against that universe says little about what fraction of legitimate e-mail gets misidentified.
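To see why the denominator matters, here is a minimal Python sketch. The 75% spam share is simply the midpoint of the 70% to 80% range Baird cites, and the message counts are hypothetical, chosen so that misfiled legitimate mail equals the 1% of all mail that filter vendors quote.

```python
def false_positive_rates(total_messages: int, spam_messages: int,
                         misfiled_legit: int) -> tuple[float, float]:
    """Return the same false positive count expressed two ways:
    (share of all mail, share of legitimate mail only).
    misfiled_legit is the number of legitimate messages flagged as spam."""
    legitimate = total_messages - spam_messages
    if legitimate <= 0:
        raise ValueError("no legitimate mail in the sample")
    rate_vs_all_mail = misfiled_legit / total_messages
    rate_vs_legit_mail = misfiled_legit / legitimate
    return rate_vs_all_mail, rate_vs_legit_mail


# Hypothetical sample: 1,000,000 messages, 75% of them spam,
# and 10,000 legitimate messages wrongly sent to the spam folder.
vendor_style, sender_style = false_positive_rates(1_000_000, 750_000, 10_000)
print(f"vs. all mail: {vendor_style:.1%}")         # vs. all mail: 1.0%
print(f"vs. legitimate mail: {sender_style:.1%}")  # vs. legitimate mail: 4.0%
```

With the same 10,000 misfiled messages and an 80% spam share, the rate measured against legitimate mail rises to 5%.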

And ESPs, which commonly claim delivery rates of 90% to 98%, most often use the “mailed-minus-bounces” formula, which can show only whether a message reached the ISP’s servers, not whether it then went into the recipient’s inbox or spam folder. Mailers need to recognize that neither figure necessarily measures what they care about most: the proportion of legitimate e-mail that gets diverted into recipients’ spam folders. (Baird acknowledges that her company makes a business of helping mailers find this figure, but maintains that this has no bearing on the validity of the study’s findings.)
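The gap between the two measures can be sketched in a few lines of Python. The class, field names, and figures below are hypothetical; the seed-account count loosely follows the approach the article attributes to Pivotal Veracity (mailing to test accounts and checking where each message lands), but it is an illustration, not the company’s actual method.

```python
from dataclasses import dataclass


@dataclass
class CampaignCounts:
    sent: int        # messages the mailer sent
    bounced: int     # hard plus soft bounces reported back to the mailer
    seed_inbox: int  # seed accounts where the message reached the inbox
    seed_spam: int   # seed accounts where it landed in the spam folder


def mailed_minus_bounces_rate(c: CampaignCounts) -> float:
    """The 'delivery rate' many ESPs report: anything that didn't bounce
    counts as delivered, whether it reached the inbox or the spam folder."""
    return (c.sent - c.bounced) / c.sent


def seed_inbox_rate(c: CampaignCounts) -> float:
    """Share of seed accounts that received the message in the inbox."""
    received = c.seed_inbox + c.seed_spam
    if received == 0:
        raise ValueError("no seed accounts received the message")
    return c.seed_inbox / received


# Hypothetical campaign: 4% bounced, but 10 of 40 seed copies went to spam.
campaign = CampaignCounts(sent=100_000, bounced=4_000, seed_inbox=30, seed_spam=10)
print(f"reported delivery: {mailed_minus_bounces_rate(campaign):.0%}")  # reported delivery: 96%
print(f"seed inbox placement: {seed_inbox_rate(campaign):.0%}")         # seed inbox placement: 75%
```

The first number is what a mailed-minus-bounces report would show; only the second reflects what happened after the mail reached the ISP.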

Another distinction the Pivotal Veracity report lays bare is the impact of differing definitions of “spam.” Most mailers and ESPs strive to comply with the Can-Spam Act, which sets rules such as honoring recipients’ opt-out requests and identifying the sender accurately. But ISPs layer other requirements on top of those rules, Baird says, so mailers that meet Can-Spam standards can still wind up in spam folders.

Worse yet, these additional requirements vary from ISP to ISP. Some label as spam any mail sent from servers that lack reverse domain name system (DNS) records, which let an ISP look up the host name behind a sending IP address and check it against the domain the mail claims to come from. Others consider mailing more than 500 messages an hour from a single IP address to be a spam indicator. AOL says you can’t mail from a dynamic IP address; Yahoo won’t consider white-listing a mailer that rents out its e-mail list to others.
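As an illustration of how a volume rule like the 500-messages-an-hour threshold might be applied, here is a minimal Python sketch of a per-IP sliding one-hour window. The sliding-window design and the class name are assumptions for illustration only; ISPs do not publish how they actually implement such checks.

```python
from collections import defaultdict, deque


class HourlyVolumeCheck:
    """Hypothetical per-IP volume rule: flag an IP that sends more than
    `limit` messages within any trailing `window_seconds` period."""

    def __init__(self, limit: int = 500, window_seconds: int = 3600):
        self.limit = limit
        self.window_seconds = window_seconds
        self._sent_at = defaultdict(deque)  # IP -> timestamps of recent messages

    def record(self, ip: str, timestamp: float) -> bool:
        """Log one message from `ip` at `timestamp` (seconds, non-decreasing
        per IP) and return True if the IP is now over the limit."""
        times = self._sent_at[ip]
        times.append(timestamp)
        # Drop messages that fall outside the trailing window.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        return len(times) > self.limit


check = HourlyVolumeCheck()
# One message per second from a single (documentation-range) IP address.
flags = [check.record("192.0.2.10", t) for t in range(501)]
print(flags[499], flags[500])  # False True
```

The 500th message in the hour passes; the 501st trips the rule.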

ISPs are notoriously reluctant to provide details about their spam-trapping techniques for fear that spammers will game their systems. And they’re right to guard the content and format specifics that lead them to flag certain messages as spam, Baird says. But she maintains that they should be explicit about the additional requirements they impose on mailers beyond content filtering, many of which are not published anywhere. She also points out that the problems are even more acute for B-to-B merchants, who must contend with the broader spam policies enforced by corporate networks.

“Mailers today are at the mercy of the ISPs,” she says. “They’re saying, ‘Just tell me what to do and I’ll do it.’ ISPs should recognize that their own filters are inaccurate, and they have a responsibility to identify and resolve the problems that come from those imperfect filters.”