
Data Quality, Still Bigger Than Big Data

Is data quality really that big of a deal?

The answer is yes: in a recent TDWI (The Data Warehousing Institute) study, 53% of companies reported that they have suffered losses, problems, or costs due to poor data quality. In a data warehouse, data quality issues can come from many sources: historical data, data migrations, legacy systems, or data entry errors. Whatever its source, bad data drives up costs, hurts performance, and distorts metrics. For marketing data warehouses, downstream costs such as postal mailings to non-existent or wrong addresses add up quickly.

At Quaero, data quality is a promise to our customers. No matter what kind of data we receive from list sources, we strive to cleanse that data to a very high standard. We achieve this through four processes: Standardization, Data Parsing, Address Validation & Information, and Matching & Building Best Record.

1) Standardization: Data is standardized with the help of data quality tools, ensuring that it can be shared across the enterprise and moved through further processing according to the predefined columns in the database. This establishes trustworthy data for use by other applications in the organization. Ideally, standardization is performed during data loading.
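As a rough illustration of what standardization does, here is a minimal Python sketch that maps free-form state names and phone numbers onto a single canonical format. The lookup table and function names are hypothetical; a real data quality tool ships with far richer reference data.

```python
import re

# Hypothetical lookup table; a real tool ships full reference data.
STATE_ABBREVIATIONS = {"north carolina": "NC", "n.c.": "NC", "nc": "NC"}

def standardize_state(value: str) -> str:
    """Map a free-form state name to the canonical two-letter code."""
    return STATE_ABBREVIATIONS.get(value.strip().lower(), value.strip().upper())

def standardize_phone(value: str) -> str:
    """Reduce a US phone number to one canonical format."""
    digits = re.sub(r"\D", "", value)  # keep only the digits
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return value  # leave unrecognized values untouched for review

print(standardize_state("North Carolina"))  # NC
print(standardize_phone("704.555.1212"))    # (704) 555-1212
```

Once every record uses the same codes and formats, downstream applications can consume the data without per-source special cases.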


2) Data Parsing: Data parsing is closely related to standardization and is equally important for massaging data: parsing breaks a data string apart into its individual attributes.
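A simple parsing sketch in Python, splitting a raw name string into prefix, first name, and last name; the `parse_name` helper and its prefix list are hypothetical, standing in for the parsing rules a data quality tool would apply.

```python
def parse_name(raw: str) -> dict:
    """Split a raw name string into its component attributes."""
    prefixes = {"mr", "mrs", "ms", "dr"}  # illustrative, not exhaustive
    tokens = raw.replace(".", "").split()
    result = {"prefix": "", "first": "", "last": ""}
    if tokens and tokens[0].lower() in prefixes:
        result["prefix"] = tokens.pop(0)
    if tokens:
        result["first"] = tokens[0]
    if len(tokens) > 1:
        result["last"] = tokens[-1]
    return result

print(parse_name("Dr. Jane A. Smith"))
# {'prefix': 'Dr', 'first': 'Jane', 'last': 'Smith'}
```

Parsed attributes like these are what the later validation and matching steps operate on.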


3) Address Validation & Information: After standardization and parsing, the most important data quality addition to a record is address validation. Data quality tools feed the parsed address into an address engine connected to various USPS-approved address directories, which are updated monthly. Because this cleansing and validation step is demanding, powerful, dedicated servers are used to run it with the necessary speed, performance, and reliability.


Address validation saves time and money by ensuring that the mailing list is accurate before the mail goes out of the mail house. Address validation helps in:

  • Reaching more prospects and customers on time
  • Avoiding wrong and missed connections
  • Enhancing addresses by automatically providing and adding missing information to a record
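A minimal sketch of the validate-and-enhance step, assuming a small in-memory directory keyed by ZIP code; `ZIP_DIRECTORY` and `validate_address` are hypothetical stand-ins for the USPS-approved directories and address engine described above.

```python
# Hypothetical slice of a USPS-style directory, keyed by ZIP code.
ZIP_DIRECTORY = {
    "28202": {"city": "Charlotte", "state": "NC"},
}

def validate_address(record: dict) -> dict:
    """Check the record's ZIP against the directory; if found, fill in or
    correct city and state from the directory entry (enhancement)."""
    entry = ZIP_DIRECTORY.get(record.get("zip", ""))
    if entry is None:
        return {**record, "valid": False}
    return {**record, **entry, "valid": True}

print(validate_address({"street": "101 S Tryon St", "zip": "28202"}))
```

Note how the returned record gains `city` and `state` it never had: that is the "automatically providing and adding missing information" benefit from the list above.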

4) Matching & Building Best Record: Data matching links entities within or between databases that are not already connected by unique master keys.

Matching draws on Master Data and Transaction Data, and works by matching Transaction Data to Master Data.

  • Incoming from source: Transaction Data accurately identifies who, what, where, and when
  • Processed records in the database: Master Data accurately describes who, what, and where

Matching of data/records generally includes de-duplication, forming match groups, and then choosing and preparing the best record from each group. The process goes something like this:

  • Group records together based on match criteria
  • Match on source ID, assigned ID, address, name, etc.
  • Rank records within a group: master, subordinate, unique, etc.
  • Associate transform: create a master grouping from several sub-groupings (A matches B, B matches C, therefore A matches C)
  • Build best record
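The steps above can be sketched in Python using a union-find structure for the associate transform and a simple completeness score as a stand-in for real survivorship rules; all names here (`build_best_records`, the sample records, the ranking rule) are hypothetical illustrations, not Quaero's actual implementation.

```python
from collections import defaultdict

def find(parent, x):
    """Follow parent links to the root of x's match group (path compression)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def build_best_records(records, matches):
    """Group records by pairwise matches (A~B, B~C => A~C), then pick
    the most complete record in each group as the best record."""
    parent = {r["id"]: r["id"] for r in records}
    for a, b in matches:  # associate transform via union-find
        parent[find(parent, a)] = find(parent, b)
    groups = defaultdict(list)
    for r in records:
        groups[find(parent, r["id"])].append(r)
    # Rank within each group: here, most non-empty fields wins (a stand-in
    # for real rules based on source trust, recency, and so on).
    return [max(g, key=lambda r: sum(bool(v) for v in r.values()))
            for g in groups.values()]

records = [
    {"id": "A", "name": "J. Smith", "email": ""},
    {"id": "B", "name": "Jane Smith", "email": "jane@example.com"},
    {"id": "C", "name": "Jane Smith", "email": ""},
]
best = build_best_records(records, [("A", "B"), ("B", "C")])
print(best)  # one best record per match group
```

Even though A and C were never matched directly, the A-B and B-C matches pull all three records into one group, and the most complete record survives as the best record.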

This is how a source record becomes a processed best record in the database.


Quaero’s data quality services are comprehensive and leverage our years of marketing and data quality experience. Ultimately, our customers care about their own customers and prospects, about retention, and about acquisition costs. Clean data makes every step of the marketing process more effective, at the lowest possible cost.
