Data Quality Flashcards
What is the DMBoK definition of data quality management?
The planning, implementation, and control of activities that apply quality management techniques to data, in order to assure it is fit for consumption and meets the needs of data consumers.
What are the 4 business drivers for establishing a formal Data Quality Management program?
- Increasing the value of organizational data and the opportunities to use it
- Reducing risks and costs associated with poor quality data
- Improving organizational efficiency and productivity
- Protecting and enhancing the organization’s reputation
7 direct costs are associated with poor quality data. Name 4.
- Inability to invoice correctly
- Increased customer service calls and decreased ability to resolve them
- Revenue loss due to missed business opportunities
- Delay of integration during mergers and acquisitions
- Increased exposure to fraud
- Loss due to bad business decisions driven by bad data
- Loss of business due to lack of good credit standing
What are the 4 goals Data Quality programs focus on?
- Developing a governed approach to make data fit for purpose based on data consumers’ requirements
- Defining standards and specifications for data quality controls as part of the data lifecycle
- Defining and implementing processes to measure, monitor, and report on data quality levels
- Identifying and advocating for opportunities to improve the quality of data, through changes to processes and systems and engaging in activities that measurably improve the quality of data based on data consumer requirements
Which 10 principles should guide Data Quality programs?
- Criticality
- Lifecycle management
- Prevention
- Root cause remediation
- Governance
- Standards-driven
- Objective measurement and transparency
- Embedded in business processes
- Systematically enforced
- Connected to service levels
Which principle of Data Quality Management is to focus improvement efforts on data that is most important to the organization and its customers?
Criticality or Critical Data
What are the six core dimensions of data quality?
- Completeness: The proportion of data stored against the potential for 100%.
- Uniqueness: No entity instance (thing) will be recorded more than once based upon how that thing is identified.
- Timeliness: The degree to which data represent reality from the required point in time.
- Validity: Data is valid if it conforms to the syntax (format, type, range) of its definition.
- Accuracy: The degree to which data correctly describes the ‘real world’ object or event being described.
- Consistency: The absence of difference when comparing two or more representations of a thing against a definition.
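Completeness, uniqueness, and validity, as defined above, can each be expressed as a simple ratio over a data set. The Python sketch below is only an illustration under stated assumptions, not a DMBoK-prescribed method: the sample records, the field names, and the simplified email pattern are hypothetical, and measuring uniqueness as distinct identifiers divided by total rows is one common proxy rather than the only possible measure.

```python
import re

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "bob@example"},      # duplicate id, malformed email
    {"id": 3, "email": "cy@example.org"},
]

# Assumed validity rule: a simplified email syntax check, not full RFC 5322.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows where `field` is populated (stored vs. the potential 100%)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, key):
    """Distinct identifiers divided by total rows; 1.0 means no duplicates."""
    return len({r[key] for r in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of populated values that conform to the field's syntax rule."""
    values = [r[field] for r in rows if r[field] is not None]
    if not values:
        return 1.0  # nothing populated, so nothing violates the rule
    return sum(bool(pattern.match(v)) for v in values) / len(values)


if __name__ == "__main__":
    print(f"email completeness: {completeness(records, 'email'):.2f}")  # 0.75
    print(f"id uniqueness:      {uniqueness(records, 'id'):.2f}")       # 0.75
    print(f"email validity:     {validity(records, 'email', EMAIL_RE):.2f}")  # 0.67
```

Note that validity is computed only over populated values here, so a missing email lowers completeness but not validity; keeping the two dimensions separate avoids penalizing the same defect twice.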
The _________ cycle is a problem-solving model known as “plan-do-check-act”.
Shewhart / Deming
In the ________ stage of the DQ Improvement Life Cycle, the Data Quality team assesses the scope, impact, and priority of known issues, and evaluates alternatives to address them.
Plan
In the ________ stage of the DQ Improvement Life Cycle, the DQ team leads efforts to address the root causes of issues and plan for ongoing monitoring of data.
Do
In the ________ stage of the DQ Improvement Life Cycle, the team actively monitors the quality of data as measured against requirements. As long as data meets defined thresholds for quality, additional actions are not required.
Check
In the ________ stage of the DQ Improvement Life Cycle, activities occur to address and resolve emerging data quality issues.
Act
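The Check stage amounts to comparing measured quality scores against agreed thresholds and escalating only when a threshold is missed. A minimal Python sketch, assuming hypothetical rule names and threshold values; any breach it reports would be handed to the Act stage:

```python
from dataclasses import dataclass


@dataclass
class QualityRule:
    """A hypothetical DQ requirement: a measured metric and its minimum score."""
    name: str
    threshold: float  # minimum acceptable score, 0.0 to 1.0


def check(rules, measurements):
    """Return the names of rules whose measured score falls below the threshold.

    An empty list means the data meets its defined thresholds, so no further
    action is required; a non-empty list is passed on to the Act stage.
    """
    breaches = []
    for rule in rules:
        score = measurements.get(rule.name)
        # A missing measurement counts as a breach so it is not silently ignored.
        if score is None or score < rule.threshold:
            breaches.append(rule.name)
    return breaches


rules = [
    QualityRule("email_completeness", 0.95),
    QualityRule("customer_id_uniqueness", 1.0),
]
measurements = {"email_completeness": 0.97, "customer_id_uniqueness": 0.99}

issues = check(rules, measurements)
if issues:
    print("Act: investigate", issues)
else:
    print("Check passed: no action required")
```

With these sample values, completeness passes (0.97 ≥ 0.95) while uniqueness misses its 1.0 threshold, so only `customer_id_uniqueness` is reported.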
Which framework focuses on data consumers’ perceptions of data and describes 15 dimensions across four general categories of data quality?
Strong-Wang Framework
What 4 general categories are described in the Strong-Wang framework?
- Intrinsic DQ
- Contextual DQ
- Representational DQ
- Accessibility DQ
In the Strong-Wang framework, what 4 dimensions are there in Intrinsic Data Quality?
o Accuracy
o Objectivity
o Believability
o Reputation
In the Strong-Wang framework, which of these dimensions is not part of Contextual Data Quality?
o Value-added
o Interpretability
o Timeliness
o Completeness
o Appropriate amount of data
Interpretability. It belongs to Representational DQ; the fifth Contextual DQ dimension is Relevancy.