Understanding data quality

Written by Matt Cheung | March 27 2015

We’ve blogged before about data quality management and some of the key steps to achieving good quality data. Here we’ll discuss some of the finer points of understanding data quality.

When someone says “I’ve got a real problem with duplicate data”, or “my data isn’t very good”, and they’re able to show me the statistics to prove it, my first instinct is to ask them what kind of data they think they have a problem with. And often the response is that account and contact data is somehow unsatisfactory.

But what does that really mean? There are different types of account or contact data to consider, but first of all we must turn our attention to the lifecycle.

As your real-life customer goes through the marketing, sales and billing cycle, the data that defines the customer in your systems should go through a similar cycle. Perhaps they start as a lead, then move to become a prospect and finally a customer. At some point they may become a former customer and you’re likely to want to be able to understand that transition too.

If this lifecycle isn’t in place, then it’s likely that your staff are struggling to understand who is who in a morass of data. If it is broken, then it is a critical item to fix: you need to put in the groundwork to determine what the lifecycle stages are, and make the necessary changes in the impacted systems so that people can filter data to see what’s important to them. Marketing types, for example, aren’t only interested in leads. Understanding customers is vital too, so they can see where whitespace and cross-sell opportunities are. On the other hand, service teams are probably only interested in customers.
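To make the idea concrete, here is a minimal sketch in Python of a lifecycle with explicit stages and a per-team filter. The stage names come from the lifecycle described above; the allowed transitions, the team names and the helper functions are assumptions made purely for illustration, and your own systems will almost certainly differ.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    LEAD = "lead"
    PROSPECT = "prospect"
    CUSTOMER = "customer"
    FORMER_CUSTOMER = "former customer"


# Allowed moves between stages (an assumption for illustration only).
ALLOWED = {
    Stage.LEAD: {Stage.PROSPECT},
    Stage.PROSPECT: {Stage.CUSTOMER},
    Stage.CUSTOMER: {Stage.FORMER_CUSTOMER},
    Stage.FORMER_CUSTOMER: {Stage.CUSTOMER},
}

# Which stages each team wants to see (hypothetical team names).
TEAM_VIEW = {
    "marketing": {Stage.LEAD, Stage.PROSPECT, Stage.CUSTOMER},
    "service": {Stage.CUSTOMER},
}


@dataclass
class Record:
    name: str
    stage: Stage


def advance(record, new_stage):
    """Move a record to new_stage, rejecting transitions not in ALLOWED."""
    if new_stage not in ALLOWED[record.stage]:
        raise ValueError(f"{record.stage.value} -> {new_stage.value} not allowed")
    record.stage = new_stage


def visible_to(team, records):
    """Return the records whose stage is relevant to the given team."""
    return [r for r in records if r.stage in TEAM_VIEW[team]]
```

The point of holding the transitions in one table is that a record cannot quietly jump from lead to former customer, which is exactly the kind of transition you want to be able to understand later.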

If you’ve got a lifecycle and your systems are using it and you’re still getting complaints, then I’d start to look at different types of data. Accounts and contacts diverge slightly here, so we’ll cover accounts first.

Before we get much further, we have to decide what an account is. I like to define an account as an address, because it eliminates a significant amount of uncertainty. Ideally this would be a physical address, but sometimes that is a step too far. This is the first point where I often see duplication, perhaps as a result of data being created from different sources, or due to restrictive data visibility.
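If an account is an address, then a first pass at finding duplicates is to group accounts whose addresses match once trivial differences are ignored. The sketch below is deliberately crude: the normalisation rule (lower-case, punctuation stripped) and the function names are my own assumptions, and real address matching usually needs far more than this.

```python
import re
from collections import defaultdict


def address_key(address):
    """Lower-case the address and collapse punctuation and spacing."""
    return re.sub(r"[^a-z0-9]+", " ", address.lower()).strip()


def possible_duplicates(accounts):
    """Group (account_id, address) pairs that share a normalised address.

    Returns only the groups that contain more than one account.
    """
    groups = defaultdict(list)
    for account_id, address in accounts:
        groups[address_key(address)].append(account_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Anything this flags is a candidate for review, not a confirmed duplicate; two genuinely different accounts can share a building.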

Accounts then broadly subdivide into three categories: service (or delivery), billing and legal entity. An account may be all three, or any combination of them. Where systems can’t accommodate this model, it often causes duplication.
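One way a system can accommodate this is to let a single account record carry any combination of the three categories, rather than forcing a separate record for each. A small sketch, assuming a Python model with hypothetical Role and Account types:

```python
from dataclasses import dataclass
from enum import Flag, auto


class Role(Flag):
    SERVICE = auto()       # service (or delivery) address
    BILLING = auto()
    LEGAL_ENTITY = auto()


@dataclass
class Account:
    address: str
    roles: Role  # any combination, e.g. Role.BILLING | Role.LEGAL_ENTITY


def accounts_with(role, accounts):
    """Return the accounts that carry the given role."""
    return [a for a in accounts if role in a.roles]
```

With this shape, a head office that is both the billing address and the legal entity is one record with two roles, not two records that later look like duplicates.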

Where you’re dealing with large businesses, they should also be arranged in a hierarchy that reflects that business or the sales approach you want to take to that business. Again, failing to do this, or creating “dummy” accounts, causes problems.
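A hierarchy can be as simple as each account pointing at its parent. The sketch below assumes a plain mapping of account id to parent id (None at the top); the ids and the function name are invented for illustration. It walks up to the top-level parent and refuses to loop forever if someone has created a circular hierarchy.

```python
def top_parent(account_id, parent_of):
    """Follow parent links from account_id up to the top of the hierarchy."""
    seen = set()
    while parent_of.get(account_id) is not None:
        if account_id in seen:
            raise ValueError("cycle in account hierarchy")
        seen.add(account_id)
        account_id = parent_of[account_id]
    return account_id
```

For example, with parent_of = {"acme-leeds": "acme-uk", "acme-uk": "acme-group", "acme-group": None}, calling top_parent("acme-leeds", parent_of) walks up through "acme-uk" and returns "acme-group".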

For contacts it is possible to identify a similar set of considerations. The definition of a contact is a little easier: a contact should be a person with whom you do business or want to do business. There’s a tricky area around how you handle generic contacts like accounts payable, but you should apply the 80:20 rule to determine whether to expend much brain power on it, or to worry about it later.

Contacts are again often only relevant to certain areas of the organisation. Finance and order management are likely to be quite protective of their billing contacts, whilst marketing is probably only interested when they need to send a price rise letter or some other communication. Your users should be able to identify the type of contact they’re entering – whether decision maker, influencer, accounts payable or end user.

Once you’ve worked through this exercise, it’ll become a lot clearer where the specific issue lies; perhaps you only have an issue with customer contacts, or billing contacts, and that is blurring the overall quality of data. Equally, if you do have an across-the-board issue, this categorisation allows you to prioritise your efforts.
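As a rough illustration of that prioritisation, the sketch below takes records that have already been tagged with a category and a duplicate flag, and ranks the categories by duplicate rate. The input shape and the function name are assumptions; how you flag a duplicate in the first place is a separate exercise.

```python
from collections import Counter


def duplicate_rate_by_category(records):
    """Rank categories by the share of their records flagged as duplicates.

    records is an iterable of (category, is_duplicate) pairs.
    Returns (category, rate) pairs, worst category first.
    """
    totals = Counter()
    dupes = Counter()
    for category, is_duplicate in records:
        totals[category] += 1
        if is_duplicate:
            dupes[category] += 1
    rates = {c: dupes[c] / totals[c] for c in totals}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

A single overall duplicate percentage hides this breakdown, which is why the categorisation has to come first.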