|
Data "Huh?-housing"
There
are many data warehousing misconceptions, but
one of the overarching ones that permeates so
many failed projects is a basic misunderstanding
of what data warehousing really is.
This
month's article tackles the "huh?" factor
of data warehousing. For one thing, it's not a
single data warehouse.
Also,
I'd like to extend a special welcome to the new
subscribers I met at the DCI show in Orlando in
September. It was a great conference as always,
and it never hurts to be in sunny Orlando! And
special thanks to conference organizer Barbara
Gavin for all her patience with my late submission
of my materials. I still owe her a case of beer!
Do
People Really Understand Data Warehousing?
by
Rick
Sherman, Athena IT Solutions
It's
been 20 years since the term "data warehousing"
was first coined. Since then, it has evolved into
a mainstream activity for Fortune 1000 companies,
with many undertaking one or more data warehousing
projects over the last decade. But, even after
20 years in the marketplace, it's evident that
data warehousing is still misunderstood.
The
goal of data warehousing is to create "one
version of the truth." But in these attempts,
many companies often create a proliferation of
data silos instead. Here are some classic examples:
-
Companies implement a plethora of databases
under the data warehousing umbrella -- many
of them inconsistent or not integrated with
each other. According to a 2004 The Data Warehousing
Institute survey, "On average, organizations
have two data warehouses, six independent data
marts, 4.5 operational data stores and 28.5
spreadmarts."
-
Many ERP reporting and business intelligence
(BI) efforts within companies continue to be
siloed -- organizationally and technically.
This is true despite the fact that ERP vendors
have embraced data warehousing and have built
data warehousing modules as part of their product
offerings. Unfortunately, these modules are
often implemented separately from the corporate
data warehouses companies have already built.
- Corporate
performance management (CPM) presents a perfect
business justification to implement a comprehensive,
consistent and integrated data warehousing effort
across an enterprise. However, a CPM project
often implements a prebuilt solution from a
BI vendor that is, just like the ERP one above,
separate from the company's corporate data warehouse.
This approach produces short-term, quick business
wins. That's OK for the short term , but in
the long term it just expands the silos and
data inconsistency across a company.
|
“Too often data warehousing is associated
merely with a data warehouse rather than the
entire architecture and process. The problem
is, when you narrow your focus to a single
database, you lose the entire context of the
staging of data. ” |
Just because a company's data warehousing efforts
have resulted in more data silos doesn't mean
that data warehousing is a poor choice. What it
might mean is that people simply do not understand
or cannot get company-wide buy-in about how to
do data warehousing right.
Why is there such a large gap between the goal
of data warehousing and the reality of what a
company has actually implemented? Quite often,
it's confusion.
People
often confuse a "data warehouse" with
"data warehousing." Data warehousing
encompasses a complete architecture and process;
it's not just having a single data warehouse.
Data
warehousing is the transformation of data to information,
thereby enabling the business to examine its operations
and performance. This task is accomplished by
the staging and transformation of data from data
sources, enabling the business to access and analyze
information. The data stores may be persistent
(stored on disk) or transient (using disk or memory).
In
addition, the workflow usually involves multiple
data stores to support the staging and transformation
of data into information such as operational data
stores, data warehouses, data marts, online analytical
processing cubes, files such as a flat file (comma-separated
values extract, for example), XML data and even
spreadsheets.
Any
time you get data from the operational systems
to perform reporting and analysis you are performing
a data warehousing process. In the old days, it
was called decision support; now the term is business
intelligence. Data warehousing is what lies beneath
the graphs and pivots presented by BI. In fact,
BI is merely the presentation layer of the data
warehousing architecture.
Too
often data warehousing is associated merely with
a data warehouse rather than the entire architecture
and process. The problem is, when you narrow your
focus to a single database, you lose the entire
context of the staging of data. Data quality,
consistency and integrity -- not to mention being
able to audit the data trail -- is only achieved
when the entire data staging (or data
warehousing) architecture is considered.
A
data warehousing program is much more than a data
warehouse. With a single, narrow DW focus, separate
efforts for ERP, CPM and BI recreate the DW architecture
and create their own data silos. Ideally, an overall
architectural view would let all these efforts
leverage each other's work and reuse tools, code,
processes, data and standards. A company would
be able to implement these systems more economically,
with a higher ROI, lower overall operating and
maintenance costs, but also strive toward the
"single version of the truth."
|