Data Quality is the degree to which a set of data values meets the usage requirements of its various users. In other words: is the data “fit” to be used for its intended purpose(s)?
Therefore, “data quality” is relative to each individual or group of individuals that uses the data. This relativity is what can make the implementation of a robust data quality function a challenging exercise. While one can create measures simply showing how often a data field is populated, its range of values, and how the values are distributed, it is not until business context is applied that these measures become relevant and effective controls can be established.
So, if data quality is in the eye of the beholder (the data user), data quality controls are driven by those users’ priorities, processing needs, and rules. Those needs will shift and change over time and, therefore, so will the controls. If your organization is serious about managing data quality, it will establish an enduring data quality control function. Why is this important?
Data, the digital building blocks of information, is a valuable corporate asset that outlasts applications and processes. Data is used as input into tactical and strategic decisions. Analysis of data allows organizations to learn and grow. Data is received from and transmitted to customers and business partners. Data is reported to investors and regulators and used by the media. Data flows through every process your organization runs and is used by every individual your organization employs. Data is as important to your organization as your central nervous system is to your body. Data signals when you should move, how fast, and in what direction. If those signals are faulty, poorly managed, or constantly in conflict, you may find yourself paralyzed into inaction or moving in the wrong direction at the wrong speed. So, what steps can you take to ensure your data’s quality is understood and sufficient for your organization?
“Quality Control”: an aggregate of activities (as design analysis and inspection for defects) designed to ensure adequate quality of a target object or output.
A Data Quality function is nothing more than a quality control process applied against the data relied upon by your organization. As the above definition implies, a quality control function is a set of activities needing to be both designed and executed effectively. These activity categories include:
Data Profiling – the process of understanding the current state of your data. This is a necessary initial analysis step to provide basic information about your data. This step identifies potential problem spots, spawns follow-up questions, and informs stakeholders on the true (vs. perceived) state of the profiled data. It provides facts and can be accomplished quickly by using a data quality or data management tool.
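As an illustration only, the basic profiling facts mentioned above (how often a field is populated, its range of values, and how the values are distributed) can be sketched in a few lines of Python. The function name, the sample loan amounts, and the choice to treat blank strings as unpopulated are all assumptions made for this example; a data quality tool would normally produce these facts for you.

```python
from collections import Counter

def profile_field(values):
    """Return basic profile facts for one field: fill rate, range, distribution."""
    # Assumption: None and blank strings both count as "not populated".
    populated = [v for v in values if v is not None and str(v).strip() != ""]
    total = len(values)
    return {
        "count": total,
        "populated": len(populated),
        "fill_rate": len(populated) / total if total else 0.0,
        "min": min(populated) if populated else None,
        "max": max(populated) if populated else None,
        # The three most frequent values and how often each occurs.
        "top_values": Counter(populated).most_common(3),
    }

# Hypothetical sample: loan amounts with two missing entries.
loan_amounts = [250000, 180000, None, 250000, 320000, None, 95000, 250000]
profile = profile_field(loan_amounts)
print(profile)
```

Facts like these say nothing yet about whether the data is good enough; they simply give stakeholders a shared, factual starting point.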
Data Quality Rules – the applicable constraints in relation to the data’s business context. This is where the data rubber starts to meet the business road. Some rules are generic and simply measure how well populated a field is, whether the data is valid for its basic data type, whether the data passes a broad reasonability test, etc. Other rules are more specific to the industry, company, process, and subject matter context that the data is being used within.
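To make the distinction between generic and context-specific rules concrete, here is a minimal Python sketch. The field names, the 5,000,000 ceiling, and the date rule are hypothetical and chosen purely for illustration; your own rules will come from your business users. The first three rules are generic (population, data type, reasonability), while the last depends on business context.

```python
from datetime import date

def is_populated(value):
    return value is not None and str(value).strip() != ""

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

# Each rule is a (name, predicate) pair; a record passes when the predicate is True.
RULES = [
    ("customer_id is populated", lambda r: is_populated(r.get("customer_id"))),
    ("loan_amount is numeric", lambda r: is_number(r.get("loan_amount"))),
    ("loan_amount is within a reasonable range",  # hypothetical threshold
     lambda r: is_number(r.get("loan_amount")) and 0 < r["loan_amount"] <= 5_000_000),
    ("closing_date is on or after application_date",  # business-context rule
     lambda r: isinstance(r.get("application_date"), date)
     and isinstance(r.get("closing_date"), date)
     and r["application_date"] <= r["closing_date"]),
]

def evaluate(record):
    """Return the names of the rules that the record fails."""
    return [name for name, passes in RULES if not passes(record)]

good = {"customer_id": "C001", "loan_amount": 250000,
        "application_date": date(2023, 1, 10), "closing_date": date(2023, 2, 15)}
bad = {"customer_id": "", "loan_amount": 75_000_000,
       "application_date": date(2023, 3, 1), "closing_date": date(2023, 2, 1)}

print(evaluate(good))
print(evaluate(bad))
```

Keeping rules as named, independent checks makes it easier to add, retire, or adjust them as user needs change.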
Data Quality Controls – the rules, processes, and systems used to identify and address the risk that the data does not meet its required level of quality. Controls can be automated, manual, or a mix of both. When implementing controls, consider the mix of control types that can be used:
o Detective – identify defects but do not stop them from initial processing
o Corrective – fix or enrich data that is defective or deficient
o Preventative – identify and stop defective data from being processed
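The difference between the three control types can be sketched as follows. This is a simplified illustration, not a prescribed design: the state-code rule, the reference list, and the function names are all hypothetical.

```python
# Hypothetical reference list of valid state codes.
VALID_STATES = {"VA", "MD", "DC"}

def is_defective(record):
    return record.get("state") not in VALID_STATES

def detective(records):
    """Identify defects, but let every record continue to processing."""
    defects = [r for r in records if is_defective(r)]
    return records, defects

def corrective(records):
    """Fix what can be fixed (here: stray whitespace and letter case)."""
    fixed = []
    for r in records:
        state = (r.get("state") or "").strip().upper()
        fixed.append(dict(r, state=state) if state in VALID_STATES else r)
    return fixed

def preventative(records):
    """Stop defective records from being processed; pass only clean ones."""
    accepted = [r for r in records if not is_defective(r)]
    rejected = [r for r in records if is_defective(r)]
    return accepted, rejected

records = [{"id": 1, "state": "VA"}, {"id": 2, "state": " md "}, {"id": 3, "state": None}]

passed_on, defects = detective(records)      # all records continue; two are flagged
corrected = corrective(records)              # record 2 is repaired to "MD"
accepted, rejected = preventative(corrected) # record 3 is still defective and is stopped
```

In this sketch the controls are chained, so the corrective step reduces how much the preventative step has to reject. Whether to chain them, and in what order, is a design decision for your own information supply chain.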
When designing controls, consider the following:
o Where will the controls be placed (implemented) along your information supply chain?
o In which cases will the control stop data from being processed versus allowing processing with notification for follow-up?
o Who will monitor for control exceptions (defects)?
o If a risk, issue, or defect is identified, how will it be tracked, prioritized, and resolved, and who will do this?
o How will controls for process-level risk differ from controls for systemic (entity-level) risk? Do you need/want both? Why or why not?
Data Quality Reporting & Metrics – information about data quality levels across a given process, data set, or an enterprise. Reports and metrics are a type of detective control. They inform the parties responsible for ensuring data quality as to how well they are doing their job. They inform data users as to the overall quality of the data. Reports and metrics can be developed to cover a single point in time view and to show trends and variance over time. Metrics (facts) on the reports can be summarized, aggregated, and viewed from different perspectives and across various dimensions.
When defining and developing data quality metrics and reports, consider the following:
o Who will be able to view the reports and what metrics and metric views will they want to see?
o How often shall the reports be refreshed?
o What is the right mix of pre-canned and ad-hoc reports?
o Are there existing quality control metrics at the company that can be leveraged?
o Can existing techniques used at your organization for operations monitoring be leveraged, such as Six Sigma, Total Quality Management (TQM), and Statistical Process Control (SPC)?
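As one example of borrowing from Statistical Process Control, a daily defect rate can be tracked against control limits so that trends and variance over time become visible. The sketch below uses the standard p-chart formula (center line plus or minus three standard deviations of a proportion) with a constant sample size. The defect counts and the sample size of 1,000 records per day are made-up numbers, and the function names are assumptions for this illustration.

```python
import math

def p_chart_limits(defect_counts, sample_size):
    """Return (lower limit, center line, upper limit) for a p-chart."""
    p_bar = sum(defect_counts) / (len(defect_counts) * sample_size)
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    # A proportion cannot fall outside the range 0 to 1.
    return max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)

def out_of_control(defects, sample_size, limits):
    """True when the observed defect rate falls outside the control limits."""
    lower, _, upper = limits
    rate = defects / sample_size
    return rate < lower or rate > upper

# Hypothetical baseline: defective records found per day in 1,000 records sampled.
baseline = [12, 9, 11, 10, 8, 13, 10, 9, 11, 7]
limits = p_chart_limits(baseline, sample_size=1000)

print(limits)
print(out_of_control(14, 1000, limits))  # a day with 14 defects
print(out_of_control(25, 1000, limits))  # a day with 25 defects
```

A point outside the limits does not explain what went wrong; it tells the monitoring team that the variation is unusual enough to investigate.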
Ultimately, all of your organization’s data should have some degree of quality control. However, it is not realistic to assume all data will have the same level of rigor applied. Combining and applying the dimensions of criticality, commonality, and quality to your organization’s data assets to establish a prioritized list of data categories is a key initial step on the path to better data quality.
Some data is more critical than others. Its degree of importance is driven by its usage – i.e. the criticality of the processes, reports, or decisions relying upon that data. Each organization will have a different view of its most critical processes and decisions and, therefore, of its most critical pieces of information. To make things more complex, this view will change over time.
Additionally, some data is more highly shared than others. Customer and Product data may be used repeatedly across many of your organization’s processes, while other information may be limited to use by a single function. A good indicator of the importance of a data subject to your entire enterprise is its level of sharing or common use. Finally, some data is simply better than others. Higher quality data typically needs less attention than data that is known to be defective.
Once some intelligence is gathered, create a prioritized list of processes and/or data categories. Warning: the prioritization process has the potential to drag on due to unresolved conflict and an overly broad span of coverage. Be sure that basic decision-making roles, processes, and timeframes are in place prior to embarking on this initial step. But most importantly, get started. With each successive data quality project more data risks and defects will be addressed, you will become more adept at prioritization, control coverage will expand, and you will mature in the art of designing and implementing controls.
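One simple way to turn the three dimensions into a prioritized list is to score each data category and sort. The sketch below is only one possible approach: the 1-to-5 scales, the equal weighting, the scoring formula, and the sample categories are all assumptions for illustration, not a recommended standard.

```python
def priority_score(criticality, commonality, quality):
    """Higher score = more attention needed. All inputs are on a 1-5 scale.

    Quality is inverted (6 - quality) because data already known to be
    of high quality typically needs less attention.
    """
    return criticality + commonality + (6 - quality)

# Hypothetical ratings: (criticality, commonality, current quality).
categories = {
    "Customer": (5, 5, 2),
    "Product": (4, 4, 4),
    "Marketing survey": (2, 1, 3),
}

ranked = sorted(categories, key=lambda name: priority_score(*categories[name]), reverse=True)
for name in ranked:
    print(name, priority_score(*categories[name]))
```

The value of a score like this lies less in the arithmetic than in forcing agreement on the ratings, which is where the decision-making roles mentioned above come into play.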
Improving data quality is an iterative process by which both the data itself and the controls that manage data quality levels are improved and matured over successive cycles. Here are some guiding principles to follow:
Understand current data quality levels and practices at a broad, high level; get started in a targeted area, then expand scope.
Data Quality is never “done”. It is a continuous improvement process baked into everyday business and IT functions.
Data content is owned by the business. Data systems are run by IT. Data Quality depends upon both constituent groups. Don’t drive the effort exclusively from one side or the other.
Establish a core competency center in data quality.
When forming this center, create a team(s) possessing the following attributes:
o Strong communication skills and ability to understand and work with others
o Substantial data analysis skills from both a business and technology perspective
o Deep expertise in statistics, quality control, and process engineering
o Demonstrated knowledge and experience using a data quality tool
o Demonstrated ability to get things done.
While there may be no one person on the team that embodies all of the above characteristics, the combined group should cover all of them. The number of teams and their size will vary depending upon the nature of your organization and initial scope. The general guideline is to start small (one team with no more than 3-5 resources). Additional data quality competency teams can be established later, if needed.
o Controls must be both designed effectively and executed effectively.
o It does no good to architect an elegant suite of controlled processes if the operational function that will employ and react to deviations does not function well.
o A diligent operational function may be hampered by poorly placed or highly manual controls that limit control coverage or inhibit the efficiency with which defects are identified, prioritized, and fixed.
o Focus on both operational controls at a local process level as well as enterprise control reports that gauge overall data quality risk for the enterprise.
In conclusion, data is an extremely important asset to any organization, one that must be continuously improved based upon data consumers’ feedback and priorities. By leveraging a well-planned Data Quality effort, the value your organization receives from your data in relation to the associated data acquisition, management, and usage costs and risks will increase significantly.
Eric Hartung is the Chief Information Officer of Unissant, Inc., and Managing Partner of InfoShare360, llc. He has extensive program management and operations expertise in the areas of Data Strategy & Governance, Data Architecture, Data Quality, Data Management, Data Warehousing, Business Intelligence, and System Integration. Mr. Hartung also possesses deep industry experience in the mortgage and financial services industry, having worked at large financial institutions in both business operations and information technology roles for over 16 years. Eric holds a Bachelor of Science degree in Finance from Virginia Tech, Pamplin College of Business, Blacksburg, VA and a Master of Business Administration degree with a concentration in Information Systems from George Mason University, Fairfax, VA.