Data Quality – Go for Gold

July 16, 2012 By Altis Consulting

From time to time, ETL processes will expose data quality issues. There is often a choice between fixing the issue at the source and fixing it in the ETL processes.

In this post I argue that the issue should be fixed at the source, but where that isn’t practical, I provide some guidelines to ensure the best outcome.

Correcting the issue at the source is the gold standard, as this will not only correct the data itself but may also address the root cause of the underlying problem (be it a process, a standard or a technical error). It’s always worth aiming to have the problem fixed at the source, for the following reasons. In other words … Go for Gold!

It stays fixed

If the issue is fixed at the source, it can prevent similar or related data quality issues from recurring. A one-off correction made within the receiving system is no guarantee that the data won’t be reloaded or resubmitted incorrectly in the future, in which case the issue will reappear. Correct it at the source and it stays fixed.

It stays fixed for everyone

If the issue is fixed at the source, it is fixed not only for the warehouse but also for the source system and any other downstream users of the data. Never underestimate how widely a data quality issue can spread. Picture a spider’s web, with tangled threads radiating out from the centre, and you’re looking at how data quality issues can thread through an organisation. Enforcing a solution at the source means everyone benefits from a consistent and correct view of the data.

It’s usually cheaper

Even though it may appear easier to just patch the data in the warehouse, appearances are often deceptive. It is worth keeping the following saying in mind:

“If you want quick and dirty, we can guarantee the dirty but not the quick.” One example of a hidden cost is reconciling the source and the target, which no longer agree once the warehouse holds patched data.

The reality, though, is that sometimes we are forced to apply a patch in the data warehouse. The root cause may involve external systems over which we have little control. Co-ordinating the fix may be slow and complex, and meanwhile we need to do something to keep the data flowing.

In this case we need to settle (at least temporarily) for silver, but here are some steps to ensure that the fix is as effective as possible:

Raise the Issue

Ensure the issue is communicated so that other users of the data are aware of it. Those responsible for the root cause should be told, so that at the very least its impact doesn’t grow. Highlighting the impact of a data quality issue may be all that’s needed to get the ball rolling towards addressing it.

Isolate the fix

Minimise the flow-on effect of the issue. In other words, avoid becoming part of the problem. Once the root cause is fixed, it should be easy to remove the temporary fix. Don’t compromise on your standards in pursuit of the best possible data quality. Remember – go for gold!
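As an illustration only, here is a minimal Python sketch of one way to keep a warehouse-side patch isolated. The defect (a source sending "UK" where the warehouse standard expects "GB"), the `Row` type, the `TEMP_FIX_ENABLED` switch and the function names are all hypothetical assumptions, not part of any particular ETL tool. The idea is simply that the patch lives in its own clearly labelled step that can be switched off or deleted once the source is fixed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Row:
    customer_id: str
    country_code: str


# Hypothetical switch: set to False (or delete the step) once the source is fixed.
TEMP_FIX_ENABLED = True


def temp_fix_country_code(rows):
    """Hypothetical temporary patch for a known source defect ("UK" instead of "GB").

    Returns the corrected rows plus the number of rows changed, so the
    patch's ongoing usage can be monitored. Input rows are not mutated.
    """
    fixed = []
    changed = 0
    for row in rows:
        if row.country_code == "UK":
            row = replace(row, country_code="GB")  # new Row; original left intact
            changed += 1
        fixed.append(row)
    return fixed, changed


def transform(rows):
    # Standard transformations would go here; the temporary fix is a separate,
    # optional step rather than being woven into the main logic.
    if TEMP_FIX_ENABLED:
        rows, changed = temp_fix_country_code(rows)
        print(f"temp_fix_country_code corrected {changed} row(s)")
    return rows
```

Calling `transform([Row("c1", "UK"), Row("c2", "GB")])` returns both rows with `country_code="GB"` and reports one corrected row. Counting the rows the patch touches also helps when revisiting the issue: if the count falls to zero over several loads, the source may have been fixed and the patch is a candidate for removal.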

Revisit the Issue

Keep an eye on progress towards fixing the root cause. Don’t drop the issue just because the symptoms have disappeared for the moment.

Keep aiming for gold in data quality!

Simon McAlister


About Andy Painter

A passionate Information and Data Architect with experience in the financial services industry, Andy’s background spans pharmaceuticals, publishing, e-commerce, retail banking and insurance, but always with a focus on data. One of Andy’s principal philosophies is that data is a key business asset.
This entry was posted in Data Quality.
