They are billed as a handy shortcut to data warehousing success, but prebuilt data models are more trouble than they’re worth.
By Margy Ross (Kimball University)
September 13, 2010
Industry-standard data models are an appealing concept at first blush, but they aren’t the time savers they are cracked up to be. What’s more, these prebuilt models may inhibit data warehouse project success.
Vendors and proponents argue that standard, prebuilt models allow for more rapid, less risky implementations by reducing the scope of the data model design effort.
Every manufacturer takes orders and ships products to fulfill the orders. Every insurance company sells policies and processes claims. Every transportation company moves cargo between an origin and a destination. The list goes on across other industries.
Why bother “recreating the wheel” by designing custom data models to support these common business processes when you can buy an industry-standard model instead?
Yes, most businesses in a given industry perform common functions. But if everyone’s approach to these functions were really so similar, why are there so many competing organizations in each industry? Don’t most organizations do things slightly differently than their industry peers? And if so, how are these competitive advantages addressed by a pre-defined industry model?
True business intelligence requires the injection of an organization’s own intellectual capital. Would you really want to use the identical industry solution as your peers?
In virtually every data warehouse design and delivery project, the vocabulary used by the operational source system’s data model needs to be translated into business vernacular. Some might argue that the source system speaks “geek” rather than Greek. Embracing an industry-standard model introduces the need for yet another pocket dictionary.
First, the data in the source system’s language needs to be translated and transformed into the industry model’s generic language. This is no small feat; while some data will translate without too much compromise, other data will need to be wrestled and coerced into the pre-defined model and invariably some source data just won’t fit.
Once the source has been manipulated into the model’s supposedly universal language, the data then needs to go through a second translation so that the vocabulary used in the final presentation layer makes sense to the business users. The challenges surrounding these multiple transformations, and the opportunity to lose something in the translations between three languages — source system, industry model, and business usage — are extensive but often overlooked when considering a standardized model.
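To make the double-translation problem concrete, here is a minimal sketch of the two hops described above: source vocabulary to the industry model’s generic language, then generic language to business vernacular. All field names and mappings below are invented for illustration, not drawn from any actual vendor model.

```python
# Hypothetical illustration of two-stage vocabulary translation.
# Every name here is made up; real mappings involve hundreds of fields.

source_row = {"CUST_NBR": "10017", "ORD_DT": "2010-09-13", "SKU": "A-42"}

# Hop 1: source system vocabulary -> industry model's generic language
source_to_generic = {
    "CUST_NBR": "PartyIdentifier",
    "ORD_DT": "TransactionDate",
    "SKU": "ProductReference",
}

# Hop 2: generic language -> business users' vernacular
generic_to_business = {
    "PartyIdentifier": "Customer Number",
    "TransactionDate": "Order Date",
    "ProductReference": "Product",
}

def translate(row, mapping):
    # Fields with no mapping are silently dropped -- the "some source
    # data just won't fit" problem, and one way meaning gets lost.
    return {mapping[k]: v for k, v in row.items() if k in mapping}

generic_row = translate(source_row, source_to_generic)
business_row = translate(generic_row, generic_to_business)
print(business_row)
# -> {'Customer Number': '10017', 'Order Date': '2010-09-13', 'Product': 'A-42'}
```

Each hop is a separate place where an attribute can be dropped, coerced, or renamed into something the business no longer recognizes; a direct source-to-business design has only one such hop.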
Of course, the transformation effort is less onerous if the source data capture system and industry model are supplied by the same vendor. But there are still some sharp corners lurking even in this scenario. First, you’ll need to incorporate any custom extensions or flex field data elements from your implementation into the vendor’s generic model. Second, you’ll need to worry about the integration of any source data that’s outside the vendor’s domain.
Can you readily conform the industry model’s dimensions with other internally available master data? If not, the industry model is destined to become another isolated stovepipe dataset. Clearly, this outcome is unappealing, but it may be less of an issue if all your operational systems are supported by the same ERP vendor or you’re a very small organization without an IT shop doing independent development.
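A quick way to see the conformance problem is to compare the attributes of the industry model’s dimension against your internal master data. The sketch below is purely hypothetical; the attribute names are invented to show the pattern, not taken from any real model.

```python
# Hypothetical audit: can an industry model's customer dimension be
# conformed with internal master data? All attribute names are invented.

industry_model_customer = {"party_id", "party_name", "party_type"}
internal_master_customer = {"customer_id", "customer_name", "segment"}

# Conformed dimensions require attributes with identical names and meanings.
shared = industry_model_customer & internal_master_customer
print(sorted(shared))  # -> [] : no natural overlap, a stovepipe in the making

# Conformance then demands an explicit cross-reference, built and
# stewarded by people who understand both vocabularies.
conforming_map = {"party_id": "customer_id", "party_name": "customer_name"}
unmapped = industry_model_customer - set(conforming_map)
print(sorted(unmapped))  # -> ['party_type'] : left unconformed
```

If a cross-reference like this can’t be built and maintained, the industry model’s data can’t be combined with the rest of the warehouse, which is exactly the stovepipe outcome described above.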
What can you realistically expect to gain from an industry-standard model? A pre-built generic model can help identify core business processes and associated common dimensions for a given industry. That provides some comfort for data warehouse project teams feeling initially overwhelmed by design tasks.
However, is this knowledge worth six figures? Alternatively, you could likely gain this same insight by spending a few weeks with the business users; you’d not only improve your understanding of the business’s needs, but you’d also begin “bonding” business users to the DW/BI initiative.
Ultimately, the business’s ownership and stewardship of the effort are critical to its long-term success. Even if you buy an industry-standard model, you’re still going to need to spend time with the business to understand the final translations required to address their requirements. There’s no dodging this necessary step. In Kimball Group’s experience, after a few days or weeks studying the standard model, most teams gain enough confidence to want to customize the schema extensively for “their data.”
It’s also worth mentioning that just because you spend thousands of dollars on a standard model doesn’t mean it exhibits generally-accepted dimensional modeling best practices. Unfortunately, some pre-built models embody common dimensional modeling design flaws; this isn’t surprising if the model’s designers have focused more on best practices for source system data capture rather than those required for business reporting and analytics.
Under the heading of full disclosure, Kimball Group helps clients design dimensional models; but given that we’re a six-person organization, the overall demand for customized dimensional models far exceeds our capacity, with or without broad adoption of industry-standard models.
I’ll also acknowledge that we’ve worked with several source system vendors on their complementary industry model solutions; we appreciate that it’s much harder than you would imagine to design a pre-defined generic model — even when you own the data capture source code.
We teach organizations via our Kimball University classes, our Toolkit books, and our free Design Tips, articles and Webinars to design dimensional models themselves to address their business users’ needs.
Based on our experience and observations, business-centric custom models are more likely to be adopted by business users than vendor-supplied pre-built standard models, which appeal more to IT teams less attuned to the business’s nuances.
Margy Ross is President of the Kimball Group and teaches dimensional modeling and data warehouse and business intelligence project lifecycle classes through Kimball University. She co-authored The Data Warehouse Lifecycle Toolkit, 2nd Edition (Wiley 2008).