Should You Wait for Data Quality?
Originally published in Information Management in July 2006.
A couple of IT managers recently approached me with a thorny issue. The key business sponsor for their data warehousing effort asked them if they should hold off building an enterprise data warehouse until they fixed the data quality in their source systems.
It is always a good sign when both the business and IT groups think about data quality before starting a project. Whether they're working on a business intelligence (BI) project, building a data warehouse or dashboard, or getting their feet wet with corporate performance management (CPM), unresolved issues with data quality can bite them in the tail.
Ideally, everyone would be aware that their source systems have data quality issues - and honestly, most source systems, even yours, do have data quality issues. Haven't we all witnessed projects that start off with the assumptions that the data is fine or that the data quality problems are "owned" by the source systems, so the data warehouse project team does not have to worry about it?
Who Owns the Problem?
Assumptions like these are naïve and can undermine the use and ROI of the data warehouse. If you publish the data, regardless of where the data quality problems originate, the data warehouse owns the problem. I do not mean you own the responsibility to fix it, but you own the responsibility to be proactive in explaining data issues to your business-group customers.
It is never too early to start thinking about data quality. In fact, you should insist on data quality requirements and metrics from the business group when you are working with them to determine the business specifications of the project. The data warehouse project team needs to consider the data quality metrics as their own key performance indicators (KPIs). That means they have to monitor, measure and report on the data quality KPIs. They need to generate reports or dashboards to show to the business.
Although the data warehouse team is not responsible for fixing the problems, it does need to monitor and report on them. Perhaps this information will be used to help justify an initiative to address the data quality issues. If that happens, your data quality metrics can be used to determine how effective those efforts have been.
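To make "monitor, measure and report" a little more concrete, here is a minimal sketch in Python of how a team might compute data quality KPIs as pass rates against targets agreed with the business. Everything in it is an assumption made for illustration: the rule names, the target percentages, the sample customer rows and the helper functions are hypothetical, and a real warehouse would more likely run checks like these inside its ETL or profiling tools.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence

# A record is one row loaded from a source system (hypothetical shape).
Record = Mapping[str, Optional[str]]


@dataclass(frozen=True)
class QualityRule:
    """One data quality KPI: a named check plus the agreed target."""
    name: str
    check: Callable[[Record], bool]  # True when the row passes the rule
    target: float                    # minimum acceptable pass rate, 0.0 to 1.0


def pass_rate(records: Sequence[Record], rule: QualityRule) -> float:
    """Fraction of rows that pass the rule.

    Assumption: an empty load scores 0.0 so that it shows up as a miss
    instead of silently looking perfect.
    """
    if not records:
        return 0.0
    passed = sum(1 for row in records if rule.check(row))
    return passed / len(records)


def quality_report(records: Sequence[Record],
                   rules: Sequence[QualityRule]) -> list:
    """Measure every KPI and flag whether it meets its target."""
    report = []
    for rule in rules:
        rate = pass_rate(records, rule)
        report.append({
            "kpi": rule.name,
            "pass_rate": rate,
            "target": rule.target,
            "meets_target": rate >= rule.target,
        })
    return report


# Hypothetical sample data and rules, invented for this example.
rows = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": None},
    {"customer_id": "C3", "email": "not-an-email"},
    {"customer_id": "C4", "email": "d@example.com"},
]

rules = [
    QualityRule("customer_id_present",
                lambda r: bool(r.get("customer_id")), target=1.0),
    QualityRule("email_present",
                lambda r: bool(r.get("email")), target=0.95),
    QualityRule("email_contains_at_sign",
                lambda r: "@" in (r.get("email") or ""), target=0.95),
]

report = quality_report(rows, rules)

# Print a plain-text summary the team could share with the business.
for line in report:
    status = "OK" if line["meets_target"] else "BELOW TARGET"
    print(f"{line['kpi']}: {line['pass_rate']:.0%} "
          f"(target {line['target']:.0%}) {status}")
```

Running the same report after each load, and keeping the results, gives you the trend line you would need to show whether a later data quality initiative is actually working.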
Back to the question the IT managers asked me. Should they wait for perfect data before proceeding with their project? My answer to the initial question was, "Hell no! In fact, you would be performing a disservice to the business if you waited for the data quality to be perfect."
How can someone who is so adamant about data quality say this? It may sound like a cop-out, but I am just trying to be realistic. I am certainly not suggesting that anyone abandon his or her quest for quality data, but it is important to remember that data quality is not the only thing the business group needs. They also need access to data without having to wait for it.
Businesspeople must make decisions on a daily basis. They need to make these decisions regardless of the availability or quality of the data. If there is no data warehouse, then the businesspeople will make do with what they have. That usually results in gathering data from various places, putting it into spreadsheets and analyzing it - building a data shadow system (as discussed in my previous DM Review columns). That's a perfect recipe for even more data quality problems, not to mention lost productivity while the business group attempts to grow their own solution. They would be better off with a real data warehouse that uses their less-than-perfect data for the time being. After all, you cannot fix a problem until you recognize it. Hidden data quality issues can cost your company dearly when you're tackling governmental regulations, competitive challenges and financial pressures.
Data Quality is a Journey
The data warehouse with less-than-perfect data is just the first step. Ideally, the business group started addressing data quality issues while the data warehouse project was underway, and better-quality data will soon find its way into the picture. But will it be perfect? Will there be an end point when the team members can pat themselves on the back, knowing that data quality issues are behind them? It is doubtful. Data quality is a journey, not a destination.
Changes in source systems, new tools and the creation of new generations of data warehouses will all affect the quality and accuracy of data in the future, as will business pressures such as mergers, acquisitions and new government regulations. So keep the momentum going. Keep moving forward with your BI initiatives, and understand that even though you're not waiting for the day when your data is perfect, you should never give up the quest for it.