Data Quality Management
Developed by Oliver Adam Mølskov Bech
Abstract
Data quality has a significant impact on both the efficiency and effectiveness of organisations[1]. As part of the digital transformation, data has become more readily available and more important than ever before. Project teams perform data analytics to leverage key resources and optimise processes in order to gain insight and make informed decisions. As such, data is becoming increasingly valuable to project managers who drive decision making based on high-quality data insight. However, if data quality is poor, project managers risk making misguided decisions based on unreliable data. Data quality management (DQM) can be utilised to ensure that communication and decisions are driven by high-quality data.
DQM serves the objective of continuously improving the quality of data relevant to a project or program within an organisation[2]. It is important to understand that the end goal of DQM is not about simply enhancing data quality in the interest of having high-quality data, but rather to achieve desired project or program outcomes that rely on high-quality data[3]. DQM revolves around the management of people, processes, technology and data through coordinated activities aimed at improving data quality[4]. The following article explores DQM as a process that can be applied by project and program managers alike, delving deeper into the meaning behind the term data quality and investigating the process for DQM as reflected by the ISO 8000-61 framework.
Overview
Importance of Data Quality
Data is defined as the "reinterpretable representation of information in a formalised manner suitable for communication, interpretation, or processing" (ISO 8000-2)[5], while quality is defined as "the degree to which a set of inherent characteristics fulfil requirements" (ISO 9000)[6]. One can argue that data quality has the ability to impact stakeholders involved in each knowledge area of project management. For example, a supply chain planner may expect a given material to be received on a specific date based on data for a vendor lead time; however, if that lead time data is highly inaccurate, the planner may be misguided in terms of project time management and conduct inappropriate schedule development. Similarly, a project cost controller overseeing a construction project may provide inaccurate input for project cost management if the data for different cost parameters is of low quality. Likewise, a risk consultant using master data for risk quantification may provide project managers with an inaccurate risk assessment, which could detrimentally influence a project outcome through poor project risk management - all due to low-quality data.
Data is used everywhere and within each area of project management, and as such it can have an enormous influence on both project execution and outcomes. It is for this reason that DQM is a highly important process. DQM is relevant to, and can be utilised within, several areas of the Project Management Body of Knowledge (PMBOK®) guide. One area where the DQM process is particularly applicable is project quality management, which shares many parallels with the purpose and guiding principles of the DQM process. As such, the project quality management area and its link to DQM is briefly described below.
DQM within Project Quality Management
The PMBOK® guide describes project quality management as a set of processes and activities of an organisation aimed at defining "quality policies, objectives and responsibilities" in order for a project to sufficiently fulfil the requirements of its purpose[7]. DQM is very relevant as a supporting process within project quality management, as it can be used to enhance the processes ensuring project quality requirements are satisfied. As described in the PMBOK® guide, project quality management relies on the use of procedures and policies to support an organisation's quality management system[8]. Project quality management is divided into three underlying processes: plan quality management, perform quality assurance, and control quality, as illustrated in figure 1[9].
- Plan Quality Management: process involving defining quality requirements for a project deliverable and documenting how specified quality requirements will achieve compliance.
- Perform Quality Assurance: process of ensuring requirements for quality standards are fulfilled. Includes auditing of quality requirements and use of quality control results to perform quality assurance.
- Control Quality: monitoring process involving control activities related to quality performance. Includes providing future recommendations to improve quality based on findings from monitoring activities, i.e. striving for continuous improvement.
The guidelines for project quality management described in PMBOK® are designed to be compatible with ISO standards[10]. The DQM framework described in this article is based on the ISO 8000-61 guidelines, and as such there are many parallels which allow this DQM process to be compatible with, and function as a tool to support, the overarching project quality management processes. Parallels in ISO compatibility between the DQM process and the project quality management processes include concepts such as continuous improvement using plan-do-check-act (PDCA) cycles[11]. As stated by PMBOK®, every project should ideally include and follow a plan for quality management, and such a plan should also include data to support compliance with quality requirements[12].
Data Quality
An important element of DQM is understanding the dimensions and complexity of the term data quality. As per the ISO 8000-2 guidelines, data quality is defined as the "degree to which a set of inherent characteristics of data fulfils requirements"[13]. However, data quality is a multifaceted concept which considers various dimensions[14], and it can therefore be difficult to define in one sentence, let alone measure. Data quality dimensions discussed in the literature include: accuracy, completeness, consistency, conformity, integrity, precision, privacy, representation, timeliness, uniqueness, unambiguity and validity. The ISO 8000-8 guidelines divide data quality into three manageable categories based on semiotic theory. Semiotic theory concerns the usage of symbols such as letters and numbers to communicate information[15]. The three semiotic categories relevant to discussing data quality are: syntactic quality, semantic quality, and pragmatic quality[16]. As illustrated in figure 2, these categories provide a base for measuring data quality and are important terms for understanding overall data quality. To successfully conduct the DQM process one needs to understand the principles that define data quality.
Syntactic Data Quality
The goal of syntactic data quality is consistency. Consistency concerns the use of consistent syntax and symbolic representation for particular data. Syntactic quality can be measured based on percentage of inconsistencies in data values. Consistency is often developed through a set of rules concerning syntax for data input[17].
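As a minimal illustration, syntactic quality can be estimated by checking values against a syntax rule and reporting the percentage of violations. The material-ID pattern below is a hypothetical example of such a rule, not part of the ISO standard:

```python
import re

# Hypothetical syntax rule: material IDs must match "MAT-" followed by six digits.
ID_PATTERN = re.compile(r"^MAT-\d{6}$")

def syntactic_inconsistency(values):
    """Return the percentage of values that violate the syntax rule."""
    violations = [v for v in values if not ID_PATTERN.match(v)]
    return 100.0 * len(violations) / len(values)

ids = ["MAT-000123", "MAT-4512", "mat-000124", "MAT-000125"]
print(syntactic_inconsistency(ids))  # 2 of 4 values violate the rule -> 50.0
```

In practice such rules would be derived from the data specifications agreed in the planning phase, one rule per field.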
Semantic Data Quality
The goal of semantic data quality is accuracy and comprehensiveness. Accuracy can be defined as the degree of conformity a data element holds compared to the truth of the real world. Comprehensiveness can be understood as the extent to which relevant states in the real world system are represented in a data warehouse[18]. Properties that fall under semantic quality are completeness, unambiguity, meaningfulness and correctness[19].
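Semantic quality metrics such as completeness and accuracy can be sketched as simple comparisons against a known real-world reference; the vendor names and lead-time values below are illustrative assumptions:

```python
def completeness(recorded_ids, real_world_ids):
    """Share of real-world entities actually represented in the data store."""
    return 100.0 * len(real_world_ids & recorded_ids) / len(real_world_ids)

def accuracy(records, ground_truth):
    """Share of recorded values that agree with the real-world truth."""
    matches = sum(1 for key, value in records.items() if ground_truth.get(key) == value)
    return 100.0 * matches / len(records)

real_world = {"vendor-a", "vendor-b", "vendor-c", "vendor-d"}
recorded = {"vendor-a", "vendor-b", "vendor-d"}
print(completeness(recorded, real_world))  # 3 of 4 entities captured -> 75.0

lead_times = {"vendor-a": 14, "vendor-b": 21}
true_lead_times = {"vendor-a": 14, "vendor-b": 28}
print(accuracy(lead_times, true_lead_times))  # 1 of 2 values correct -> 50.0
```

The hard part in practice is obtaining the ground truth to compare against, which is why semantic quality is often assessed on audited samples rather than whole data sets.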
Pragmatic Data Quality
The goals of pragmatic data quality are usability and usefulness. Usability refers to how easily a stakeholder can effectively access and interact with the data, while usefulness refers to the ability of the data to support a stakeholder in accomplishing tasks and aid decision-making. Data may be more useful or usable for some stakeholders than others, depending on their ability to interpret the data and the context of their tasks. Pragmatic data quality involves the properties of timeliness, conciseness, accessibility, reputability and understandability[20].
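Timeliness, one of the pragmatic properties, can be estimated as the share of records refreshed within an acceptable age window. The threshold and dates below are illustrative assumptions:

```python
from datetime import date, timedelta

def timeliness(last_updated, max_age_days, today=None):
    """Share of records refreshed within the acceptable age window."""
    today = today or date.today()
    fresh = [d for d in last_updated if (today - d) <= timedelta(days=max_age_days)]
    return 100.0 * len(fresh) / len(last_updated)

# Last-update dates for three hypothetical master data records.
updates = [date(2024, 1, 1), date(2024, 3, 1), date(2024, 3, 10)]
print(timeliness(updates, max_age_days=30, today=date(2024, 3, 15)))  # 2 of 3 fresh -> ~66.7
```

What counts as an acceptable age is context-dependent, which reflects the point above: pragmatic quality depends on the stakeholder and the task.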
Figure 3 highlights the categories addressed above and summarises each quality category's goal, properties and example methods for measuring it empirically. There are various empirical models that build on the semiotic theory for categorising data quality, some of which use different regression and weighting models to empirically measure data quality. These can be studied further in Moody et al.'s paper: Evaluating the quality of process models: empirical analysis of a quality framework.
Fundamental Principles of DQM
It is important to recognise that DQM often functions as one of many building blocks of a larger data governance project or program[21]. Figure 4 highlights the various tools and building blocks which make up data governance; these include: DQM, data architecture, metadata management, master data management, data distribution, data security, and information lifecycle management. DQM does not encompass these other building blocks of data governance, although there is often some overlap between the different functions. DQM functions as a support to the "processes, roles, and standards" of data governance[22]. The ISO 8000-61 guidelines define the three fundamentals of DQM as: process approach, continuous improvement, and involvement of people. These three fundamental principles act as pillars in building and managing a process for the assurance of high-quality data, as illustrated in figure 5.
ISO 8000-61 Principles of DQM
Process Approach
The first fundamental principle is the process approach, which concerns defining and operating the processes that use, create and update relevant data[23]. This principle states that a successful DQM program requires a process approach to managing key process activities, which involves defining and operating recurring and reliable processes to support DQM.
Continuous Improvement
The principle of continuous improvement forms the second fundamental. This principle establishes the idea that data must be constantly improved through effective measurement, remediation and correction of data nonconformities. As stated by the ISO 8000-61 guidelines, continuous improvement depends on "analysing, tracing and removing the root causes of poor data quality", which may require adjustments to faulty processes[24]. This fundamental is closely linked to the concept of Kaizen and is also an important approach within project quality management as described in PMBOK®.
Involvement of People
The third fundamental principle highlights the importance of people to DQM. This principle states that different responsibilities are allocated to individuals at different levels within a project or program, including managers, data specialists and end users. Top-level management provides the necessary and sufficient resources to guide the DQM process towards achieving specific data quality goals. Data specialists perform activities such as implementation of processes, intervention, control and the embedding of future processes for continuous improvement, while end users perform direct data processing activities such as data input and analysis. End users typically have the greatest direct influence on actual data quality, as they are the individuals in closest contact with the data itself[25].
ISO 8000-61 Framework for the DQM Process
The Basic Structure of the DQM Process
The basic structure of the DQM process is illustrated in figure 6. The structure consists of three overarching and interlinked processes: implementation, data-related support, and resource provision[26].
- Implementation Process: this stage is aimed at achieving continual improvement of data quality using a systematic and cyclic PDCA process. The cycle involves planning (plan), control (do), assurance (check), and improvement (act).
- Data-Related Support: this stage provides input in the form of information and technology related support to the implementation stage.
- Resource Provision: this stage involves training the individuals performing data-related tasks and providing sufficient resources to effectively and efficiently manage the implementation and data-related support processes. This includes the provision of resources such as IT systems and various data collection instruments. As highlighted by figure 6, it provides input to both implementation and data-related support.
The PDCA cycle of the implementation stage is a process promoting a high level of data quality through continuous improvement. The PDCA cycle is described in ISO 8000 in terms of DQM as follows[27]:
- Plan: developing strategic plans of action for implementation and delivery of results in regard to data quality requirements. Plan to enhance and maintain quality.
- Do: implement and conduct plan for data quality control.
- Check: monitor, measure and compare results against data requirements. Conduct performance reporting.
- Act: remediate for continuous improvement of process. Also concerns preventing and reducing undesired effects noticed in the check stage.
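The PDCA loop above can be sketched as a small driver that repeatedly measures data quality against a planned target and remediates until the target is met. The `measure` and `cleanse` functions here are hypothetical stand-ins for the check and do/act activities:

```python
def pdca_cycle(data, target_pct, measure, cleanse, max_iterations=5):
    """Plan a quality target, do/act by cleansing, check by measuring."""
    for _ in range(max_iterations):
        score = measure(data)        # Check: measure against the data requirements
        if score >= target_pct:      # Plan: the agreed data quality target
            return data, score
        data = cleanse(data)         # Do / Act: remediate nonconformities
    return data, measure(data)

# Toy usage: strip whitespace defects until at least 90 % of values are clean.
measure = lambda xs: 100.0 * sum(x == x.strip() for x in xs) / len(xs)
cleanse = lambda xs: [x.strip() for x in xs]
clean, score = pdca_cycle([" a", "b", "c "], 90.0, measure, cleanse)
print(clean, score)  # ['a', 'b', 'c'] 100.0
```

A real implementation would of course involve human review and process changes between iterations rather than a fully automated loop; the sketch only shows the control flow of the cycle.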
The Detailed Structure of the DQM Process
Figure 7 reveals the lower levels of the DQM process, which are explained below:
Implementation Process
Implementation of the DQM process comprises the following four sub-processes: data quality planning, control, assurance, and improvement.
- Data quality planning: the four sub-processes involved in data quality 'planning' are: requirements management, data quality strategy management, policies and procedures, and implementation planning. Data quality planning serves the purpose of creating and refining a strategy for data quality objectives that is in alignment with the data quality requirements. Strategy management provides goals for the DQM process, while implementation planning establishes a plan of action for achieving the strategic goals defined. Planning ensures the 'do' phase of implementation is carried out according to a valid plan. The planning phase involves carefully balancing data quality levels, cost, and resources to meet needs[28].
- Data quality control: this phase is conducted based on the data quality plan established prior. It concerns three sub-processes regarding: provision of data specifications and work instructions, data processing, and monitoring and control activities. This is the 'doing' phase which ensures data quality is delivered to a standard which fulfils project needs and stakeholder requirements. The main activities include creating, using, and updating of data according to standard operating procedures. It also involves monitoring to ensure data is conforming to specifications[29].
- Data quality assurance: this phase involves 'checking', it considers four sub-processes known as: review of data quality issues, provision of measurement criteria, measurement of data quality performance, and evaluation of measurement results. This phase involves activities such as measuring data quality levels and process performance. It aims to provide an assessment of the data quality level and identifying potential data issues or nonconformities through evaluation and analysis of data[30].
- Data quality improvement: this is the 'act' phase of implementation, which consists of the following sub-processes: root cause analysis and solution development, data cleansing, and process improvement. It concerns improving data quality by remediating and correcting data nonconformities. This phase takes the results of the data quality assurance phase and investigates the root causes of any data issues identified. The root cause analysis helps build solutions that address the underlying causes of the issues identified. Data cleansing ensures that data sets that previously contained nonconformities are corrected. Often, processes will need to be transformed so that the same nonconformities do not reoccur. The solutions developed in this phase act as input to the 'plan' phase[31].
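The hand-off from assurance ('check') to improvement ('act') can be sketched as an evaluation of measured quality levels against agreed criteria, with failing dimensions routed to root cause analysis and remediation. The dimension names and thresholds below are illustrative assumptions, not values from the standard:

```python
# Agreed minimum quality levels (per cent) from the data quality plan.
criteria = {"consistency": 95.0, "completeness": 98.0, "timeliness": 90.0}

# Levels measured during the data quality assurance phase.
measured = {"consistency": 97.2, "completeness": 91.4, "timeliness": 93.0}

def evaluate(measured, criteria):
    """Return the dimensions whose measured level falls below its criterion."""
    return {dim: (measured[dim], minimum)
            for dim, minimum in criteria.items()
            if measured[dim] < minimum}

nonconformities = evaluate(measured, criteria)
print(nonconformities)  # {'completeness': (91.4, 98.0)}
```

Each flagged dimension would then become the subject of root cause analysis in the improvement phase, and the resulting solutions feed back into the next 'plan' iteration.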
Data-Related Support Process
The data-related support process provides input data, control information and support to the implementation process[32]. This process is divided into data architecture management, data transfer management, data operations management and data security management. Data architecture management concerns the disposition of data, in terms of structure and storage. Data transfer management ensures records of data are kept to guarantee traceability and transparency of data flow. This is in regard to data flow both within and out of an organisation. Data operations management maintains the technology required for operation of the DQM process. Lastly, the data security process concerns data confidentiality and accessibility.
Resource Provision Process
This process is divided into two sub-processes: data quality organisation management and human resource management. It allows the entire DQM process to function, providing input to the two other overarching processes by supplying the necessary resources and management required for both data-related support and implementation. Data quality organisation management involves establishing the organisational structure of the DQM process. This entails allocating roles and responsibilities and giving authority for the execution of tasks. Human resource management in the DQM process involves training individuals to perform the required data quality tasks. This entails developing the knowledge and skills of the individuals within the organisation in regard to the DQM process[33].
Limitations and Difficulties of DQM
There are a number of difficulties associated with the DQM process explained above. Some of the most prevalent limitations are described below:
- Defining data quality requirements: in the implementation phase, the strategy aims to define the goals and objectives in terms of achieving sufficient data quality. However, defining the right strategy for the required data quality level can be immensely difficult due to the complexity of the term data quality. If the goals and objectives are set too high, the data quality level may be excessively high compared to the actual needs and requirements. However, if the goals for data quality are set too low, there is an increased risk that project outcomes may be adversely influenced. As mentioned earlier, the strategy must balance goals, costs and resources closely. Defining the required data quality level for the project is crucial in achieving the right balance, something that can be very difficult.
- Complexity of processes: the activities involved in the DQM process can be complex and require extensive analysis and understanding of different IT systems. Complexity of the subject area can be an obstacle as it requires training and development of knowledge. In other words, it is not a simple process.
- Number of processes: the number of different processes and sub-processes involved in running a successful DQM process can make it time-consuming and costly.
- Obtaining resource provision: it may be difficult to obtain support for the provision of the required resources for a successful DQM system. Appropriate IT systems and data specialists are required, both of which can be costly to a project or program.
- Extensive documentation of data: the DQM process requires data architecture and data transfer management that allows data to be stored systematically to ensure traceability and transparency of data. Over time, the accumulation of data can be extremely large and data warehouses can be difficult to manage as they get larger.
- Initiation: the PDCA cycle of the DQM process does not include an initiation phase. Without a clearly defined initiation phase, it can be difficult to kick-start a DQM process. Perhaps the initiating, planning, executing, controlling and closing (IPECC) scheme highlighted in PMBOK® could be used to formulate a superior framework in the future, as it also helps define procedures for initiating and closing a process.
Abbreviations
- DQM: Data Quality Management
- IPECC: Initiating-Planning-Executing-Controlling-Closing
- ISO: International Organisation for Standardization
- PDCA: Plan-Do-Check-Act
- PMBOK: Project Management Body of Knowledge
Bibliography
Batini, C. and Scannapieco, M. (2006): Data Quality: Concepts, Methodologies and Techniques. Berlin: Springer. This book explores various concepts, methodologies and techniques involving data quality processes. A solid introduction to the topic of data quality and the dimensions used to measure data quality.
ISO 8000-2. (2017): Data Quality - Part 2: Vocabulary. International Organisation for Standardisation. Ref: ISO 8000-2:2017(E). The ISO standard for data quality vocabulary. It provides clear, concise, and authoritative definitions of data quality terms.
ISO 8000-61. (2016): Data Quality - Part 61: Data Quality Management: Process Reference Model. International Organisation for Standardisation. Ref: ISO 8000-61:2016(E). This is the ISO standard for data quality management processes. It provides an excellent and concise overview of the industry best practices regarding DQM processes, explaining the fundamental principles behind DQM and elaborating on process procedures through a framework guide.
ISO 8000-8. (2015): Data Quality - Part 8: Information and Data Quality: Concepts and Measuring. International Organisation for Standardisation. Ref: ISO 8000-8:2015(E). This is the ISO standard for data quality concepts and measurement theory. It introduces the semiotic theories for data quality.
Knowledgent. (2014): Building a Successful DQM Program. Knowledgent White Paper Series. This paper provides an introduction to DQM within enterprise information management, explaining the basic concepts behind DQM and also explaining the data quality cycle framework.
PMI. (2013): A Guide to the Project Management Body of Knowledge (PMBOK® Guide), 5th ed. Pennsylvania USA. The PMBOK® guide provides a wealth of knowledge regarding the different knowledge bodies of project management. This article references chapter 8 of the PMBOK®: project quality management.
Shanks, G. and Darke, P. (1998): Understanding Data Quality in a Data Warehouse: A Semiotic Approach. Massachusetts USA: University of Massachusetts Lowell, pg. 292-309. This paper provides an overview of data quality measures using a semiotic approach, explaining each semiotic level and how they are interlinked with data quality. The semiotic theory discussed is similar to the one later adopted by the ISO 8000-8 standard for data quality.
References
- ↑ Pg. 2, 2006 ed. Data Quality: Concepts, Methodologies and Techniques, Carlo Batini & Monica Scannapieco
- ↑ Pg. 3, 2014 ed. Building a Successful Data Quality Management Program, Knowledgent
- ↑ Pg. 3, 2014 ed. Building a Successful Data Quality Management Program, Knowledgent
- ↑ 2017 ed. ISO 8000-2: Data Quality - Part 2: Vocabulary, ISO
- ↑ 2017 ed. ISO 8000-2: Data Quality - Part 2: Vocabulary, ISO
- ↑ Chapter 8. 2013 ed. A Guide to Project Management Body of Knowledge (PMBOK®): Project Quality Management. ed 5. PMI
- ↑ Chapter 8. 2013 ed. A Guide to Project Management Body of Knowledge (PMBOK®): Project Quality Management. ed 5. PMI
- ↑ Chapter 8. 2013 ed. A Guide to Project Management Body of Knowledge (PMBOK®): Project Quality Management. ed 5. PMI
- ↑ Chapter 8. 2013 ed. A Guide to Project Management Body of Knowledge (PMBOK®): Project Quality Management. ed 5. PMI
- ↑ Chapter 8. 2013 ed. A Guide to Project Management Body of Knowledge (PMBOK®): Project Quality Management. ed 5. PMI
- ↑ Chapter 8. 2013 ed. A Guide to Project Management Body of Knowledge (PMBOK®): Project Quality Management. ed 5. PMI
- ↑ Chapter 8. 2013 ed. A Guide to Project Management Body of Knowledge (PMBOK®): Project Quality Management. ed 5. PMI
- ↑ 2017 ed. ISO 8000-2: Data Quality - Part 2: Vocabulary, ISO
- ↑ Pg. 6, 2006 ed. Data Quality: Concepts, Methodologies and Techniques, Carlo Batini & Monica Scannapieco
- ↑ Pg. 298, 1998. Understanding Data Quality in a Data Warehouse: A Semiotic Approach, Shanks, G and Darke, P.
- ↑ 2015 ed. ISO 8000-8: Data Quality - Part 8: Information and data quality: Concepts and measuring, ISO
- ↑ Pg. 303, 1998. Understanding Data Quality in a Data Warehouse: A Semiotic Approach, Shanks, G and Darke, P.
- ↑ Pg. 301, 1998. Understanding Data Quality in a Data Warehouse: A Semiotic Approach, Shanks, G and Darke, P.
- ↑ Pg. 303, 1998. Understanding Data Quality in a Data Warehouse: A Semiotic Approach, Shanks, G and Darke, P.
- ↑ Pg. 302, 1998. Understanding Data Quality in a Data Warehouse: A Semiotic Approach, Shanks, G and Darke, P.
- ↑ Pg. 3, 2014 ed. Building a Successful Data Quality Management Program, Knowledgent
- ↑ Pg. 19, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 2, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 2, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 2, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 2, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 3, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 6, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 9, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 11, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 13, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 15, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO
- ↑ Pg. 17, 2016 ed. ISO 8000-61: Data quality management: Process reference model, ISO