Data Quality Management
Abstract
Data quality management (DQM) serves the objective of continuously improving the quality of data relevant to an organisation, program or project[1]. It is important to understand that the end goal of DQM is not simply to improve data quality for its own sake, but rather to achieve desired outcomes that rely on high-quality data[2]. DQM is the management of people, processes, technology and data through coordinated activities aimed at directing and controlling a project or program in terms of data quality[3].
Data quality has a significant impact on both the efficiency and effectiveness of organisations[4]. As part of the digital transformation, data has become more readily available and more important than ever before. Organisations are performing data analytics to leverage key resources and optimise processes to gain a competitive advantage. As such, data is becoming increasingly valuable to program and project managers who base their decisions on data insight. However, if the data quality is poor, managers risk making misguided decisions based on unreliable data. It is therefore imperative that a proper DQM system is in place to ensure that decisions are driven by high-quality data. This article explores the fundamentals behind DQM using references to industry best practices and ISO guidelines.
Overview
Data Quality
An important element of DQM is understanding the dimensions and complexity of the term data quality. As per ISO 8000-2 guidelines, data is defined as "reinterpretable representation of information in a formalised manner suitable for communication, interpretation, or processing" while data quality is defined as the "degree to which a set of inherent characteristics of data fulfils requirements"[5]. Data quality is a multifaceted concept that spans several dimensions, which makes it difficult to measure[6]. Dimensions discussed in the literature include accuracy, completeness, consistency, integrity, representation, timeliness, uniqueness and validity. The ISO 8000-8 guidelines divide data quality into three categories based on semiotic theory. Semiotic theory concerns the usage of symbols such as letters and numbers to communicate information[7]. The three semiotic categories relevant to data quality are syntactic quality, semantic quality, and pragmatic quality[8]. These categories provide a basis for measuring data quality and are important terms to recognise before establishing a DQM program.
Syntactic Data Quality
The goal of syntactic data quality is consistency. Consistency concerns the use of consistent syntax and symbolic representation for particular data. Syntactic quality can be measured based on percentage of inconsistencies in data values. Consistency is often developed through a set of rules concerning syntax for data input[9].
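The measurement described above can be sketched in a few lines of Python. This is only an illustrative sketch: the rule (dates written as YYYY-MM-DD), the sample values and the function name inconsistency_rate are assumptions made for the example and are not taken from the ISO guidelines or the cited literature.

```python
import re

# Hypothetical syntax rule for this example: dates must be written as YYYY-MM-DD.
DATE_RULE = re.compile(r"\d{4}-\d{2}-\d{2}")


def inconsistency_rate(values, rule=DATE_RULE):
    """Return the percentage of values that do not follow the syntax rule."""
    if not values:
        return 0.0
    violations = [value for value in values if not rule.fullmatch(value)]
    return 100.0 * len(violations) / len(values)


# Illustrative sample: two of the four values break the rule.
sample = ["2021-03-01", "01/03/2021", "2021-03-02", "March 3, 2021"]
rate = inconsistency_rate(sample)
print(f"{rate:.1f}% of the values are syntactically inconsistent")
```

In practice, an organisation would maintain one such rule per data element and track the resulting percentage over time.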
Semantic Data Quality
The goal of semantic data quality is accuracy and comprehensiveness. Accuracy can be defined as the degree of conformity a data element holds compared to the truth of the real world. Comprehensiveness can be understood as the extent to which relevant states in the real world system are represented in a data warehouse[10]. Properties that fall under semantic quality are completeness, unambiguity, meaningfulness and correctness[11].
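One possible way to quantify these two goals is to compare stored records against a trusted reference that stands in for the real world. The sketch below assumes such a reference exists; the function names, the customer keys and the city values are hypothetical and serve only to illustrate the idea.

```python
def accuracy(stored, reference):
    """Share of stored records whose value matches the trusted reference."""
    checked = [key for key in stored if key in reference]
    if not checked:
        return 0.0
    correct = sum(1 for key in checked if stored[key] == reference[key])
    return correct / len(checked)


def comprehensiveness(stored, reference):
    """Share of real-world entities in the reference that are stored at all."""
    if not reference:
        return 0.0
    return sum(1 for key in reference if key in stored) / len(reference)


# Hypothetical reference ("real world") and the data actually stored.
reference = {"C1": "Copenhagen", "C2": "Aarhus", "C3": "Odense", "C4": "Aalborg"}
stored = {"C1": "Copenhagen", "C2": "Arhus", "C3": "Odense"}

acc = accuracy(stored, reference)            # C2 is misspelled
comp = comprehensiveness(stored, reference)  # C4 is missing
print(f"accuracy={acc:.2f}, comprehensiveness={comp:.2f}")
```

A trusted reference is rarely available for a whole data set, so measures like these are typically computed on a verified sample.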
Pragmatic Data Quality
The goal of pragmatic data quality is usability and usefulness. Usability refers to how easily a stakeholder can access and interact with the data, while usefulness refers to the ability of the data to support the stakeholder in accomplishing tasks and making decisions. Data may be more useful or usable for some stakeholders than others, depending on their ability to interpret the data and the context of their tasks. Pragmatic data quality involves the properties of timeliness, conciseness, accessibility, reputability and understandability[12].
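Because pragmatic quality depends on the stakeholder, the same data set can score differently for different users. The following sketch illustrates this for timeliness only. The age thresholds, the timestamps and the function name timely_share are assumptions chosen for the example, not values prescribed by any standard.

```python
from datetime import datetime, timedelta


def timely_share(last_updated, now, max_age):
    """Share of records that were updated within max_age of the time `now`."""
    if not last_updated:
        return 0.0
    fresh = sum(1 for timestamp in last_updated if now - timestamp <= max_age)
    return fresh / len(last_updated)


# Hypothetical update timestamps for four records.
now = datetime(2024, 1, 31)
updates = [
    datetime(2024, 1, 30),
    datetime(2024, 1, 20),
    datetime(2023, 12, 1),
    datetime(2024, 1, 31),
]

# Assumed requirements: operations needs data at most 2 days old,
# while a planner accepts data up to 30 days old.
operations_score = timely_share(updates, now, timedelta(days=2))
planner_score = timely_share(updates, now, timedelta(days=30))
print(f"operations={operations_score:.2f}, planner={planner_score:.2f}")
```

The same records are timely enough for the planner in three out of four cases but only in two out of four for operations, which is why pragmatic measures should be defined per stakeholder and task.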
Framework: Data Quality Life Cycle
Insert Diagram!
Quality Management
ISO 9001, reasons and benefits of implementing a quality management system
Fundamental Principles of a Data Quality Management Program
Before investigating the principles that make up a DQM program, it is important to recognise that DQM often functions as one of many building blocks of a larger data governance program[13]. Figure 3.A highlights the various functions which make up a data governance program: DQM, data architecture, metadata management, master data management, data distribution, data security, and information lifecycle management. DQM itself does not cover these other building blocks of data governance; however, there is often a strong interplay between the different functions. The ISO 8000 guidelines define the three pillars of a DQM program as People, Processes, and Improvement. The following section explains these three pillars according to ISO specifications.
Three Pillars of DQM Program
People
Process
Improvement
ISO 8000-61 Framework for the DQM Process
The Basic Structure of the DQM Process
Insert ISO Inspired Diagram
Detailed Structure of the DQM Process
Implementation
Plan Do Check Act
Data-Related Support
Resource Provision
Glossary
DQM: Data Quality Management
ISO: International Organization for Standardization
Bibliography
Batini, C. and Scannapieco, M. (2006): Data Quality: Concepts, Methodologies and Techniques. Berlin: Springer. This book explores various concepts, methodologies and techniques involving data quality processes. It provides a solid introduction to the topic of data quality.
Knowledgent (2014): Building a Successful DQM Program. Knowledgent White Paper Series. This paper provides an introduction to DQM within enterprise information management, explaining the basic concepts behind DQM and also explaining the data quality cycle framework.
Shanks, G. and Darke, P. (1998): Understanding Data Quality in a Data Warehouse: A Semiotic Approach. Massachusetts USA: University of Massachusetts Lowell, pg. 292-309. This paper provides an overview of data quality measures using a semiotic approach, explaining each semiotic level and how it is linked to data quality. The semiotic theory discussed is similar to the one later adopted by the ISO 8000-8 standard for data quality.
References
- ↑ Pg. 3, 2014 ed. Building a Successful Data Quality Management Program, Knowledgent
- ↑ Pg. 3, 2014 ed. Building a Successful Data Quality Management Program, Knowledgent
- ↑ 2017 ed. ISO 8000-2:2015 Data Quality - Part 2: Vocabulary, ISO
- ↑ Pg. 2, 2006 ed. Data Quality: Concepts, Methodologies and Techniques, Carlo Batini & Monica Scannapieco
- ↑ 2017 ed. ISO 8000-2:2015 Data Quality - Part 2: Vocabulary, ISO
- ↑ Pg. 6, 2006 ed. Data Quality: Concepts, Methodologies and Techniques, Carlo Batini & Monica Scannapieco
- ↑ Pg. 298, 1998. Understanding Data Quality in a Data Warehouse: A Semiotic Approach, Shanks, G and Darke, P.
- ↑ 2015 ed. ISO 8000-8:2015 Data Quality - Part 8: Information and data quality: Concepts and measuring, ISO
- ↑ Pg. 303, 1998. Understanding Data Quality in a Data Warehouse: A Semiotic Approach, Shanks, G and Darke, P.
- ↑ Pg. 301, 1998. Understanding Data Quality in a Data Warehouse: A Semiotic Approach, Shanks, G and Darke, P.
- ↑ Pg. 303, 1998. Understanding Data Quality in a Data Warehouse: A Semiotic Approach, Shanks, G and Darke, P.
- ↑ Pg. 302, 1998. Understanding Data Quality in a Data Warehouse: A Semiotic Approach, Shanks, G and Darke, P.
- ↑ Pg. 3, 2014 ed. Building a Successful Data Quality Management Program, Knowledgent