William Giovinazzo writes:
The chief promise of business intelligence is the delivery to decision-makers of the information necessary to make informed choices. The unspoken assumption in this statement is that the data from which this information is derived is correct. Unfortunately, as is frequently the case, unspoken assumptions are translated into unaddressed requirements. It is, therefore, essential that the BI project team address data quality with the same rigor and drive as they would such requirements as system up-time, system response time or network performance. Data quality must be integrated into the very DNA of the BI system.
Although a BI system can only be as successful as the confidence decision-makers have in the data, the issue of data quality extends well beyond the front-end reporting tool or back-end data warehouse to the entire information infrastructure of the organization. The irony is that because the BI system brings the issue of data quality to light, it is quite often blamed for poor data quality. Prior to the data warehouse, data is viewed at the transaction level or within disparate islands of information. The errors are hidden, therefore, by the lack of a larger view. In other words, the forest of errors cannot be seen because of the trees of data. The aggregation of the data in the BI system amplifies the issues with the data, giving the organization a view of the entire forest. The BI project manager should see this not as a problem, but rather as an opportunity to deliver greater return on the BI system by improving the data quality for all information systems.
Many studies have been performed and much has been written about the financial impact of data quality. We will not dwell on these issues here. For the purposes of this discussion, let us agree that there is a cost to poor data quality and a benefit to good. Let us also agree that these costs and benefits provide sufficient ROI to merit funding of data quality efforts. The question here is how BI project managers can integrate data quality into projects to help not only ensure the overall success of the data warehouse but also raise the quality of information throughout the organization.
Recognizing this opportunity to increase value, the BI project team during the planning phase must address data quality within their quality management plan. Although traditionally the quality management plan addresses the quality of the project deliverables, the data quality plan itself should be seen as a BI project deliverable. This plan is employed first by the project team, establishing the required organizational structures and processes. It is then delivered as part of the go-live transition to the support organization. This in turn becomes the basis for the organization’s data governance system.
The data quality management plan begins with requirements. Everything begins with requirements. At a minimum, data quality requirements definitions should include the following:
- Comprehensiveness. Missing data can be just as problematic as incorrect data, especially if the user community assumes that the data within the warehouse is complete. The user community must define what data is needed. We should also bear in mind that the BI project team cannot boil the ocean. It may not be practical in the first phase of the project to integrate all source systems; in this situation, the user community collaborates with IT to prioritize which data sources are provided in which phases. This not only ensures that user requirements are met in the correct order, but helps set customer expectations that not all data sources will be included in the first phase of the project. In addition to addressing which source systems are included, the user community also defines which data elements are included. Is it acceptable for a customer account to be missing the Social Security number? What about a fax number? It is up to the users to tell us what they need.
- Accuracy. The users and IT collaborate on the cleanliness of the data. Is it significant when a customer’s Social Security number is incorrect? What about an incorrect birthday? If multiple systems disagree, which of those systems is the trusted system? Also, what level of data accuracy is required? Of course, we are all tempted to say that we want the data to be 100 percent accurate, but the business must be cautioned that there is a cost to accuracy. In some scenarios, the incremental benefit of greater data accuracy may be outweighed by the incremental cost. If going from a 90 to a 92 percent accuracy rate doubles the cost, the business may decide that it is not worth the investment. Marketing may use the data to drive a mailing campaign where the additional two percent would not be worth the investment. Others may be using the data to drive security, where the slightest breach in security may mean the loss of millions of dollars or, worse yet, the loss of life. In such cases, the additional cost is money well spent. Again, the user community provides this input.
- Consistency. Here we are referring not to the consistency of the data itself, but to that of the metadata. This is the definition of the various data elements and the rules around the data. What is a customer? One would assume that such a question is easy to answer, but in many organizations, different departments have different, often conflicting, responses. The chart of accounts is an area where many organizations struggle. Even when there is agreement on the structure and values in the chart of accounts, there is conflict in how it is used. IT works side by side with the users to create a data glossary in which the data elements are well defined, establishing business rules to provide for consistent usage across the organization. The data glossary not only defines the metadata of the BI system, it also documents the business rules: how the data is created and used by the business.
- Frequency. How frequently does the data need to be refreshed, and at what times? Typically, a nightly refresh of the data warehouse is sufficient. During month-end close, however, accounting may require the warehouse to be updated on an hourly basis, or even in real time. Again, the user community provides this input.
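One way to keep such requirements from remaining unspoken is to record them as explicit rules that can be checked against the data. The following is only a minimal sketch in Python, not a prescribed implementation: the `FieldRule` structure, the `check_field` function, the sample records and the 90 percent threshold are all hypothetical. Note also that the format check shown is merely a stand-in for accuracy; verifying true accuracy requires comparison against whichever system the users designate as trusted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FieldRule:
    """One user-defined requirement for a data element (hypothetical structure)."""
    name: str
    required: bool                   # comprehensiveness: may the value be missing?
    min_valid_rate: float            # agreed share of valid values, 0.0 to 1.0
    is_valid: Callable[[str], bool]  # format check used as a proxy for accuracy


def check_field(rule, records):
    """Return a list of requirement violations for one data element."""
    values = [record.get(rule.name) for record in records]
    present = [v for v in values if v not in (None, "")]
    problems = []
    if rule.required and len(present) < len(values):
        missing = len(values) - len(present)
        problems.append(f"{rule.name}: {missing} missing value(s)")
    if present:
        valid_rate = sum(1 for v in present if rule.is_valid(v)) / len(present)
        if valid_rate < rule.min_valid_rate:
            problems.append(
                f"{rule.name}: valid rate {valid_rate:.0%} "
                f"below {rule.min_valid_rate:.0%}"
            )
    return problems


# Hypothetical answers from the user community: the Social Security number
# is required and must be at least 90 percent valid; the fax number is optional.
rules = [
    FieldRule("ssn", True, 0.90, lambda v: v.isdigit() and len(v) == 9),
    FieldRule("fax", False, 0.0, lambda v: True),
]

# Invented sample records, for illustration only.
customers = [
    {"ssn": "123456789", "fax": ""},
    {"ssn": "12345", "fax": "555-0100"},
    {"ssn": "", "fax": ""},
]

report = [problem for rule in rules for problem in check_field(rule, customers)]
for line in report:
    print(line)
```

Here the missing and malformed Social Security numbers are reported, while the missing fax numbers are not, because the users declared that element optional. The threshold is a business decision, not a technical one, which is why it appears as a parameter rather than a constant.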
As we can see from the requirements listed, the BI project team must work in close cooperation with the user community. A data warehouse, however, is not static. In fact, a parameter of BI success is that the system changes; it grows and evolves with the business. The collaboration between the BI project team and the user community, therefore, lives beyond the project itself. Given this perspective, the data governance plan must define the organizational structures necessary to maintain this relationship.
Within this organizational structure, key individuals are responsible for data quality with the authority necessary to establish the correct data quality policies and procedures. Within each organization, there needs to be a data governance board that is responsible for the quality of the data. The number of members participating on this board will vary with the size of the organization, but the board should comprise, at a minimum, the following:
- Executive Sponsor. The executive sponsor is a C-level executive who drives corporate commitment to the data governance initiative. As we have seen in data warehouse projects, in order for cross line-of-business systems to be successful, executive sponsorship needs to be established to create sufficient commitment at the lower levels of the organization. Preferably, this sponsorship should come from outside IT.
- Data Quality Manager. The role of the data quality manager is to coordinate the activities of the data governance board and ensure the quality of the board’s deliverables. Some in the past have referred to this role as a program or project manager. Neither term is correct; a project is a temporary endeavor to create a unique result while a program is composed of projects. The data quality manager is a role that is established during the development of the data warehouse and continues past project closure. It is not temporary.
- Subject matter steward. The subject matter steward is the user community’s representative on the data governance board. This person understands how the business uses the data and is responsible for defining the data quality requirements described earlier. This person will work shoulder to shoulder with the data steward.
- Data steward. The data steward is the IT person who works with the subject matter steward. The data steward translates the requirements defined by the subject matter steward into a technical specification used by the development team to create a system.
Each role is performed on either a full- or a part-time basis, depending on the needs of the particular organization. The number of stewards may vary with the organization. Also, the relationship between the subject matter stewards and the data stewards may be many-to-one or many-to-many, depending on situational requirements.
To determine who participates on this governance board, we create a simple matrix. Along one axis, we list each of the subject areas or dimensions within the data warehouse. We might list, for example, store, customer or product. Along the other axis, we list the related functions by area. Here, we would expect to see entries such as sales, with the various sales activities, or supply chain, with its related functions. At the intersection of each row and column, we identify whether that function creates, reads, updates or deletes that dimension. For each functional area that interacts with a particular dimension, a subject matter steward is selected to represent them on the data governance board. Note that one subject matter steward can span several dimensions or several functional areas. So, a single subject matter steward can represent all of finance, while another represents all of sales. Similarly, a single subject matter steward could be responsible for customer data across both finance and sales, while another is responsible for product.
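The matrix described above can be sketched in a few lines of Python. This is an illustration only: the functional areas, the dimensions and the create/read/update/delete entries are invented, and `stewards_needed` is a hypothetical helper rather than part of any tool.

```python
# Hypothetical matrix: functional area -> dimension -> operations performed,
# where C, R, U and D stand for create, read, update and delete.
crud_matrix = {
    "sales": {"customer": "CRU", "product": "R", "store": "R"},
    "finance": {"customer": "RU", "product": "R"},
    "supply_chain": {"product": "CRUD", "store": "RU"},
}


def stewards_needed(matrix):
    """Map each dimension to the functional areas that interact with it."""
    needed = {}
    for function, dimensions in matrix.items():
        for dimension, operations in dimensions.items():
            if operations:  # any of C, R, U or D counts as an interaction
                needed.setdefault(dimension, []).append(function)
    return needed


representation = stewards_needed(crud_matrix)
for dimension, functions in representation.items():
    print(f"{dimension}: represented by {', '.join(functions)}")
```

Each functional area listed against a dimension needs a voice on the board for that dimension. Whether that voice is a separate subject matter steward or one steward spanning several cells remains an organizational decision that the matrix informs but does not make.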
The data governance board has both tactical and strategic responsibilities. Tactically, they are responsible for the monitoring of data quality, working to resolve issues as they arise. Issues are escalated according to an escalation policy, which is one of the responsibilities of this board. Strategically, the data governance board is responsible for delivering the procedures and policies necessary to progressively improve the overall quality of the data within all the organization’s systems. These policies include topics such as:
- When a new data source is added to or dropped from the data warehouse,
- Business rules for the data elements,
- What determines when a report is no longer necessary,
- How a new data source is integrated into the environment and
- Which data elements receive what level of security.
In addition to defining policy, the data governance board establishes the procedures necessary to enforce these policies. These procedures would include:
- Data glossary maintenance,
- Incorporation of third-party data,
- Integration of new data sources,
- Data access and
- Data content review.
We began this discussion by noting that a BI system can only be as successful as the confidence the decision-makers have in the data. A chief aspect of this method of data governance is the inclusion of the user community in the process. A common solution to any management problem is to empower the people who are affected by the problem to help fix it; after all, they are usually the most knowledgeable about the situation. By establishing data governance procedures that create a collaboration between users and IT, we give users a greater understanding of the issues of data quality while also giving them partial ownership of the resolution. This understanding will increase the users’ confidence in the BI system, ultimately increasing the success of the BI system.