| Upcoming
Speaking Engagements

Business Intelligence & Data Warehousing Conference
Chicago — February 7 - 9, 2006
Session: Moving from Spreadsheets to Analytics,
Tuesday, February 7, 2006, 3:15 PM - 4:30 PM

Gartner Business Intelligence Summit
Chicago, March 6-8, 2006
Session: Data Shadow Systems,
Monday, March 6, 2:45 PM - 3:30 PM

Recent Articles in DM Review
Align Metadata and Business Initiatives, January 2006
The Enterprise Data Warehouse Strikes Again, Part II, December 2005
The Enterprise Data Warehouse Strikes Again, Part 1, November 2005
ODS Redux, Part 2, August 2005
(Contact
us to have Rick Sherman speak at your
event or deliver onsite
data warehouse training to your employees.)
|
If
you're attending DCI's show in Chicago next month,
be sure to stop by. Details for my session are
on the right side of this newsletter.
And if you're attending Gartner's Business
Intelligence Summit March 6-8 (also in Chicago),
I'll be in the Solution Showcases on Monday and
Tuesday, and delivering a session on Monday.
Be Prepared to Duel with Data Quality
by Rick Sherman, Athena IT Solutions
Plenty of business intelligence (BI) and data
warehouse projects have been blindsided by
complications related to data quality. Sometimes
these issues aren't apparent until business users
start testing the system just before the project
goes live.
So what causes BI project teams to get caught
off guard by data quality issues? And why do these
problems surface so late in the project?
There
are two common pitfalls: defining data quality
too narrowly and assuming data quality is the
responsibility of the source systems.
People
often assume that data quality simply means eliminating
bad data -- data that is missing, inaccurate or
incorrect. Bad data is certainly a problem, but
it isn't the only problem. Good data quality programs
also ensure that data is comprehensive, consistent,
relevant and timely.
Don't blame the source systems
Defining
data quality too narrowly often leads people to
assume that source transactional systems -- either
through data entry or systemic errors -- cause
the bad data. Although they may be a source of
some errors, the more likely culprits are
either
inconsistent dimensions across source systems
(such as customer or product identifiers) or inconsistent
definitions for derived data across organizations.
Conforming dimensions -- developing consistent
customer or product identifiers -- is essential
for accessing and analyzing data across a company.
The source systems do not own the data quality
issues that span other systems; the BI project team
does. The source systems need to ensure that the
data within their own silo is correct, but the
BI project team is responsible for providing the
business with data that is consistent across the
enterprise.
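
To make the idea of conforming a dimension concrete, here is a
minimal sketch in Python. Everything in it is an assumption made
for illustration: the source system names, the customer
identifiers, the cross-reference table and the helper functions
are all hypothetical, and a real project would likely keep the
cross-reference in the warehouse rather than in code. It simply
shows two source systems that identify the same customer
differently being mapped to one conformed key, with unmapped
records set aside for follow-up.

```python
# Hypothetical cross-reference, for illustration only:
# (source system, local customer ID) -> conformed enterprise customer key.
CUSTOMER_XREF = {
    ("billing", "C-1001"): "CUST-001",
    ("crm", "ACME-US"): "CUST-001",
    ("crm", "GLOBEX"): "CUST-002",
}


def conform(records, xref):
    """Attach the conformed key to each record; collect records with no mapping."""
    conformed, unmatched = [], []
    for rec in records:
        key = xref.get((rec["source"], rec["customer_id"]))
        if key is None:
            unmatched.append(rec)  # a data quality issue to raise with the business
        else:
            conformed.append({**rec, "customer_key": key})
    return conformed, unmatched


def totals_by_customer(conformed):
    """Sum amounts per conformed customer key, regardless of source system."""
    totals = {}
    for rec in conformed:
        totals[rec["customer_key"]] = totals.get(rec["customer_key"], 0) + rec["amount"]
    return totals


# Made-up sample records from two hypothetical source systems.
records = [
    {"source": "billing", "customer_id": "C-1001", "amount": 250},
    {"source": "crm", "customer_id": "ACME-US", "amount": 100},
    {"source": "crm", "customer_id": "GLOBEX", "amount": 75},
    {"source": "billing", "customer_id": "C-9999", "amount": 40},
]

conformed, unmatched = conform(records, CUSTOMER_XREF)
totals = totals_by_customer(conformed)
```

Without the conformed key, the billing and CRM records for the
same customer would be reported as two different customers. The
unmatched list is the part the BI project team owns: neither
source system is wrong on its own terms, but the gap only shows
up once the data is brought together.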
Similarly,
each organization within the enterprise may have
valid business reasons to derive data differently
than others. For example, their position in a
set of business processes may determine how they
view their data. The individual organizations
aren't tasked with developing common definitions
for derived data, but the BI project team is.
Many BI project teams try to claim that data quality
issues aren't their responsibility. However, from
a practical viewpoint, the BI team does need to
make these issues their own, since their job is
to ensure the highest data quality possible. The
BI project team is packaging the data for consumption
by business users and they will be held accountable
for the data quality. This may not seem fair,
but the success of their project depends on it.
Don't shortchange the pilot
Surprises often surface when the initial pilot
or release involves only a small subset of source
systems. There may be many good reasons to keep a
pilot's scope narrow, but a narrow pilot won't
show you how much effort it takes to conform
dimensions as the number of source systems
expands.
Sometimes a pilot involves only a single
organization and uses only that organization's
definitions for derived data. Once again, the
tough issue is how to accommodate the differences
in derivation definitions between organizations.
In both cases, the real challenges appear only
when dealing with multiple systems and
organizations. The business users need to look at
the big picture, and that is only possible when
they can access and analyze data across the
enterprise.
Steps to address data quality
To
ensure data quality, the BI project team has to
address it from the very beginning. Here are several
significant steps to consider:
- Require the business to define data quality
  in a broad sense, establish metrics to monitor
  and measure it, and determine what should be
  done if the data fails to meet these metrics.
- Undertake a comprehensive data profiling effort
  when performing a source systems analysis. The
  team needs to identify data anomalies across
  source systems and over time (historical data
  does not always age well!) so that it can
  address them with the business early on.
- Incorporate data quality into all data integration
  and business intelligence processes, from data
  sourcing to information consumption by the business
  user. Data quality issues need to be detected
  as early in the processes as possible and dealt
  with as defined in the business requirements.
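
The first two steps can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the
metric names, the threshold values, the field names and the
sample rows are all made up, and the business would define the
real ones. The sketch profiles a small extract for completeness
and timeliness, then compares the scores against the agreed
thresholds so that failures are flagged early.

```python
from datetime import date

# Hypothetical thresholds agreed with the business: minimum acceptable score per metric.
THRESHOLDS = {"completeness": 0.70, "timeliness": 0.90}


def completeness(rows, field):
    """Share of rows in which the field is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows whose date field is no older than max_age_days as of a given date."""
    if not rows:
        return 0.0
    fresh = sum(
        1
        for r in rows
        if r.get(field) is not None and (as_of - r[field]).days <= max_age_days
    )
    return fresh / len(rows)


def evaluate(scores, thresholds):
    """Return only the metrics whose score falls below the agreed threshold."""
    return {name: score for name, score in scores.items() if score < thresholds[name]}


# Made-up sample extract from a hypothetical source system.
rows = [
    {"customer_key": "CUST-001", "region": "East", "updated": date(2006, 1, 30)},
    {"customer_key": "CUST-002", "region": "", "updated": date(2006, 1, 29)},
    {"customer_key": "CUST-003", "region": "West", "updated": date(2005, 6, 1)},
    {"customer_key": "CUST-004", "region": "West", "updated": date(2006, 1, 31)},
]

scores = {
    "completeness": completeness(rows, "region"),
    "timeliness": timeliness(rows, "updated", as_of=date(2006, 1, 31), max_age_days=30),
}
failures = evaluate(scores, THRESHOLDS)
```

What happens to a failing metric (reject the load, quarantine
the rows, or load with a warning) is deliberately left out,
because that decision belongs in the business requirements.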
Enterprises
must present data that meets very stringent data
quality levels, especially in light of recent
compliance regulations and demands. The level
of data transparency needed can only result from
establishing a strong commitment to data quality
and building the processes to ensure it.
For
more information on training, see our page on
data
warehouse training.
|