Data Quality Flashcards

1
Q

What is the DMBoK definition of data quality management?

A

The planning, implementation, and control of activities that apply quality management techniques to data, in order to assure it is fit for consumption and meets the needs of data consumers.

2
Q

What are the 4 business drivers for establishing a formal Data Quality Management program?

A
  • Increasing the value of organizational data and the opportunities to use it
  • Reducing risks and costs associated with poor quality data
  • Improving organizational efficiency and productivity
  • Protecting and enhancing the organization’s reputation
3
Q

Seven direct costs are associated with poor quality data. Name 4 of them.

A
  • Inability to invoice correctly
  • Increased customer service calls and decreased ability to resolve them
  • Revenue loss due to missed business opportunities
  • Delay of integration during mergers and acquisitions
  • Increased exposure to fraud
  • Loss due to bad business decisions driven by bad data
  • Loss of business due to lack of good credit standing
4
Q

What are the 4 goals Data Quality programs focus on?

A
  • Developing a governed approach to make data fit for purpose based on data consumers’ requirements
  • Defining standards and specifications for data quality controls as part of the data lifecycle
  • Defining and implementing processes to measure, monitor, and report on data quality levels
  • Identifying and advocating for opportunities to improve the quality of data, through changes to processes and systems and engaging in activities that measurably improve the quality of data based on data consumer requirements
5
Q

What 10 principles should guide Data Quality programs?

A
  • Criticality
  • Lifecycle management
  • Prevention
  • Root cause remediation
  • Governance
  • Standards-driven
  • Objective measurement and transparency
  • Embedded in business processes
  • Systematically enforced
  • Connected to service levels
6
Q

Which principle of Data Quality Management is to focus improvement efforts on data that is most important to the organization and its customers?

A

Criticality or Critical Data

7
Q

What are the six core dimensions of data quality?

A
  • Completeness: The proportion of data stored against the potential of being 100% complete.
  • Uniqueness: No entity instance (thing) will be recorded more than once based upon how that thing is identified.
  • Timeliness: The degree to which data represent reality from the required point in time.
  • Validity: Data is valid if it conforms to the syntax (format, type, range) of its definition.
  • Accuracy: The degree to which data correctly describes the ‘real world’ object or event being described.
  • Consistency: The absence of difference when comparing two or more representations of a thing against a definition.
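Completeness, uniqueness, and validity can each be expressed as a simple ratio. The Python sketch below is illustrative only: it assumes the data set is a list of dictionaries, uses a hypothetical simplified email syntax rule, and treats the share of distinct identifiers as a rough proxy for uniqueness. Timeliness, accuracy, and consistency need an external reference (the real world, a point in time, or another representation) and are left out.

```python
import re

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "bob@example"},
    {"id": 3, "email": "cy@example.org"},
]

# Simplified email syntax rule; an assumption for illustration, not a full standard.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$")


def completeness(rows, field):
    """Share of rows where the field is populated (1.0 means 100% complete)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, key):
    """Distinct identifiers per row; below 1.0 means some entity is recorded more than once."""
    return len({r[key] for r in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of populated values that conform to the field's syntax rule."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(bool(pattern.match(v)) for v in values) / len(values)


print(completeness(records, "email"))  # 0.75
print(uniqueness(records, "id"))       # 0.75
print(validity(records, "email", EMAIL_PATTERN))
```

Here the duplicated id 2 lowers uniqueness, the missing email lowers completeness, and "bob@example" fails the syntax rule, so it lowers validity without affecting completeness.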
8
Q

The _________ cycle is a problem-solving model known as “plan-do-check-act”.

A

Shewhart / Deming

9
Q

In the ________ stage of the DQ Improvement Life Cycle, the Data Quality team assesses the scope, impact, and priority of known issues, and evaluates alternatives to address them.

A

Plan

10
Q

In the ________ stage of the DQ Improvement Life Cycle, the DQ team leads efforts to address the root causes of issues and plan for ongoing monitoring of data.

A

Do

11
Q

In the ________ stage of the DQ Improvement Life Cycle, the team actively monitors the quality of data as measured against requirements. As long as data meets defined thresholds for quality, additional actions are not required.

A

Check
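The Check stage boils down to comparing measured quality against defined thresholds. A minimal Python sketch, assuming hypothetical metric names and threshold values held in dictionaries:

```python
def check_stage(measurements, thresholds):
    """Return the metrics that fall below their threshold; an empty dict means no action is needed."""
    return {name: value for name, value in measurements.items() if value < thresholds[name]}


# Hypothetical measurements and thresholds for one data set.
measurements = {"completeness": 0.97, "validity": 0.88}
thresholds = {"completeness": 0.95, "validity": 0.90}

breaches = check_stage(measurements, thresholds)
print(breaches)  # {'validity': 0.88}
```

A non-empty result is the point at which the cycle moves on to the Act stage.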

12
Q

In the ________ stage of the DQ Improvement Life Cycle, activities occur to address and resolve emerging data quality issues.

A

Act

13
Q

Which framework focuses on data consumers’ perceptions of data and describes 15 dimensions across four general categories of data quality?

A

Strong-Wang Framework

14
Q

What 4 general categories are described in the Strong-Wang framework?

A
  • Intrinsic DQ
  • Contextual DQ
  • Representational DQ
  • Accessibility DQ
15
Q

In the Strong-Wang framework, what 4 dimensions are there in Intrinsic Data Quality?

A

o Accuracy
o Objectivity
o Believability
o Reputation

16
Q

In the Strong-Wang framework, which of these dimensions is not part of Contextual Data Quality?
o Value-added
o Interpretability
o Timeliness
o Completeness
o Appropriate amount of data

A

Interpretability. It belongs to Representational DQ; the missing Contextual DQ dimension is Relevancy.

17
Q

In the Strong-Wang framework, to which data quality category do these dimensions belong?
o Interpretability
o Ease of understanding
o Representational consistency
o Concise representation

A

Representational DQ

18
Q

The Accessibility DQ category of the Strong-Wang framework has two dimensions. What are they?

A

o Accessibility
o Access security

19
Q

There are 8 DQ issues caused by poor system design. Name 6 of them.

A
  • Failure to enforce referential integrity
  • Failure to enforce uniqueness constraints
  • Coding inaccuracies and gaps
  • Data model inaccuracies
  • Field overloading: Re-use of fields over time for different purposes
  • Temporal data mismatches: In the absence of a consolidated data dictionary, multiple systems could implement disparate date formats or timings, which in turn lead to data mismatch and data loss when data synchronization takes place between different source systems.
  • Weak Master Data Management
  • Data duplication:
    o Single Source / Multiple Local Instances
    o Multiple Sources / Single Instance
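The first two issues are usually prevented by declaring constraints in the database rather than relying on application code. A minimal sketch using Python's built-in sqlite3 module; the customer and orders tables are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'ann@example.com')")


def try_insert(sql, params):
    """Attempt an insert; return the constraint error message, or None if the row was accepted."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Uniqueness constraint rejects a second customer with the same email.
dup_email = try_insert("INSERT INTO customer VALUES (?, ?)", (2, "ann@example.com"))
# Referential integrity rejects an order for a customer that does not exist.
orphan_order = try_insert("INSERT INTO orders VALUES (?, ?)", (10, 99))
# A valid order is accepted.
good_order = try_insert("INSERT INTO orders VALUES (?, ?)", (11, 1))
print(dup_email, orphan_order, good_order, sep="\n")
```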
20
Q

________________ is a form of data analysis used to inspect data and assess quality. It uses statistical techniques to discover the true structure, content, and quality of a collection of data.

A

Data Profiling

21
Q

Name the 2 activities prevalent in Data Quality Management.

A

Maturity Assessment and Profiling

22
Q

What are the 5 statistical techniques used to inspect data and assess quality in data profiling?

A
  • Counts of nulls
  • Max/Min value
  • Max/Min length
  • Frequency distribution of values for individual columns
  • Data type and format
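These five statistics can be computed for a single column in a few lines. A hedged Python sketch, assuming the column arrives as a list in which None means null and the non-null values are mutually comparable; format_pattern is a hypothetical helper that masks digits as 9 and letters as A to expose formats.

```python
from collections import Counter


def format_pattern(value):
    """Mask a value's characters: digits become 9, letters become A, others are kept."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in str(value))


def profile_column(values):
    """Basic profile of one column; assumes the non-null values are mutually comparable."""
    present = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in present]
    return {
        "null_count": len(values) - len(present),
        "min_value": min(present),
        "max_value": max(present),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "frequencies": Counter(present),
        "types": Counter(type(v).__name__ for v in present),
        "formats": Counter(format_pattern(v) for v in present),
    }


ages = [34, 29, None, 41, 29, 130, None]  # hypothetical column
profile = profile_column(ages)
print(profile["null_count"], profile["max_value"])  # 2 130
```

An outlier such as 130 in an age column, or a rare format pattern, is the kind of finding that profiling surfaces for follow-up with data stewards.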
23
Q

Profiling also includes __________ analysis, which can identify overlapping or duplicate columns and expose embedded value dependencies.

A

cross-column
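Cross-column analysis compares columns within the same table. The Python sketch below, using hypothetical address rows, shows two simple checks: whether two columns always hold the same value (a possible duplicate column), and whether an expected dependency such as zip code to city is violated.

```python
from collections import defaultdict

# Hypothetical rows from a single table.
rows = [
    {"zip": "10001", "city": "New York", "ship_city": "New York"},
    {"zip": "10001", "city": "New York", "ship_city": "New York"},
    {"zip": "94105", "city": "San Francisco", "ship_city": "San Francisco"},
    {"zip": "94105", "city": "SF", "ship_city": "SF"},
]


def identical_columns(rows, a, b):
    """True if two columns hold the same value in every row (a possible duplicate column)."""
    return all(r[a] == r[b] for r in rows)


def dependency_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value."""
    seen = defaultdict(set)
    for r in rows:
        seen[r[determinant]].add(r[dependent])
    return {key: values for key, values in seen.items() if len(values) > 1}


print(identical_columns(rows, "city", "ship_city"))  # True
print(dependency_violations(rows, "zip", "city"))
```

Zip 94105 maps to both "San Francisco" and "SF", which exposes an inconsistent value rather than a true exception to the dependency.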

24
Q

_____________ analysis explores overlapping value sets and helps identify foreign key relationships.

A

Inter-table
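Inter-table analysis can be approximated by measuring how much of one column's value set appears in another table's column. A minimal Python sketch with hypothetical columns; a high overlap hints at a candidate foreign key, and the values outside the overlap are orphans to investigate.

```python
def value_overlap(child_values, parent_values):
    """Fraction of distinct child values that also appear in the parent column."""
    child, parent = set(child_values), set(parent_values)
    return len(child & parent) / len(child) if child else 0.0


# Hypothetical columns from two tables.
customer_ids = [1, 2, 3, 4]
order_customer_ids = [1, 1, 2, 3, 5]

overlap = value_overlap(order_customer_ids, customer_ids)
orphans = set(order_customer_ids) - set(customer_ids)
print(overlap, orphans)  # 0.75 {5}
```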

25
Q

______________ is the process of adding attributes to a data set to increase its quality and usability.

A

Data enhancement or enrichment

26
Q

There are 8 types of data enrichment or enhancement listed in the DMBoK. Name 6 of them.

A
  • Time/Date stamps
  • Audit data
  • Reference vocabularies
  • Contextual information
  • Geographic information
  • Demographic information
  • Psychographic information
  • Valuation information
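Time/date stamps and audit data are the simplest of these enrichments to add during loading. A hedged Python sketch: the attribute names loaded_at, source_system, and batch_id are hypothetical, chosen only to show a time stamp plus lineage information being attached to a copy of each record.

```python
from datetime import datetime, timezone


def enrich(record, source_system, batch_id):
    """Return a copy of the record with a load time stamp and audit (lineage) attributes."""
    enriched = dict(record)
    enriched["loaded_at"] = datetime.now(timezone.utc).isoformat()  # time/date stamp
    enriched["source_system"] = source_system  # audit data: where the record came from
    enriched["batch_id"] = batch_id  # audit data: which load produced it
    return enriched


row = enrich({"customer_id": 7}, source_system="crm", batch_id="2024-06-01-01")
print(row)
```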
27
Q

Which data enrichment or enhancement type adds context such as location, environment, or access methods, and tags data for review and analysis?

A

Contextual Information

28
Q

Which data enrichment or enhancement type is used to segment the target populations by specific behaviors, habits, or preferences, such as product and brand preferences, organization memberships, leisure activities, commuting transportation style, shopping time preferences, etc.?

A

Psychographic information

29
Q

Which data enrichment or enhancement type would include data lineage?

A

Audit data

30
Q

Which data enrichment or enhancement type uses business-specific terminology, ontologies, and glossaries to enhance understanding and control while bringing customized business context?

A

Reference vocabularies

31
Q

Which data enrichment or enhancement is used for asset valuation, inventory, and sale?

A

Valuation Information

32
Q

_____________ is the process of analyzing data using pre-determined rules to define its content or value. This process enables the data analyst to define sets of patterns that feed into a rule engine used to distinguish between valid and invalid data values. Matching specific pattern(s) triggers actions.

A

Data Parsing
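Parsing pairs each pre-determined pattern with an action, as described in the question above. A minimal Python sketch of such a rule set for a hypothetical US-style phone number field: the first matching pattern decides whether a value is accepted or standardized, and a value matching no pattern is rejected.

```python
import re

# Hypothetical rules for a phone number field: (pattern, triggered action).
RULES = [
    (re.compile(r"^\d{3}-\d{3}-\d{4}$"), "accept"),
    (re.compile(r"^\(\d{3}\) \d{3}-\d{4}$"), "standardize"),
    (re.compile(r"^\d{10}$"), "standardize"),
]


def classify(value):
    """Return the action of the first matching pattern; values matching none are rejected."""
    for pattern, action in RULES:
        if pattern.match(value):
            return action
    return "reject"


def standardize(value):
    """Rewrite a recognized phone number into the canonical 999-999-9999 form."""
    digits = re.sub(r"\D", "", value)
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


for raw in ["212-555-0147", "(212) 555-0147", "2125550147", "555-01"]:
    action = classify(raw)
    print(raw, action, standardize(raw) if action == "standardize" else raw)
```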

33
Q

In Data Quality, having proven that the improvement process can work, the next goal is to apply it strategically. Doing so requires _______ and ______ potential improvements.

A

identifying and prioritizing

34
Q

Provide continuous monitoring by incorporating ________ and __________ processes into the information processing flow.

A

control and measurement

35
Q

Data quality incident tracking requires that staff be trained on how issues should be _______, _________, and _________.

A

logged, classified, and tracked

36
Q

What 7 areas should data quality reporting focus on?

A
  • Data quality scorecard, which provides a high-level view of the scores associated with various metrics, reported to different levels of the organization within established thresholds
  • Data quality trends, which show over time how the quality of data is measured, and whether trending is up or down
  • SLA Metrics, such as whether operational data quality staff diagnose and respond to data quality incidents in a timely manner
  • Data quality issue management, which monitors the status of issues and resolutions
  • Conformance of the Data Quality team to governance policies
  • Conformance of IT and business teams to Data Quality policies
  • Positive effects of improvement projects
37
Q

Name 6 ways to prevent poor quality data from entering an organization.

A
  • Establish data entry controls
  • Train data producers
  • Define and enforce rules: Create a ‘data firewall,’ which has a table with all the business data quality rules used to check that the quality of data is good before it is used in an application such as a data warehouse.
  • Demand high quality data from data suppliers
  • Implement Data Governance and Stewardship: Ensure roles and responsibilities are defined that describe and enforce rules of engagement, decision rights, and accountabilities for effective management of data and information assets (McGilvray, 2008). Work with data stewards to revise the process of, and mechanisms for, generating, sending, and receiving data.
  • Institute formal change control: Ensure all changes to stored data are defined and tested before being implemented. Prevent changes directly to data outside of normal processing by establishing gating processes.
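The ‘data firewall’ idea (a table of business rules applied before data reaches an application) can be sketched in Python as a list of named checks. The rules, field names, and the choice to quarantine failing records are hypothetical assumptions for illustration.

```python
from datetime import date

# Hypothetical rule table: (rule name, check that returns True when the record passes).
RULES = [
    ("customer_id present", lambda r: r.get("customer_id") is not None),
    ("amount non-negative",
     lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    ("order date not in future",
     lambda r: r.get("order_date") is not None and r["order_date"] <= date.today()),
]


def firewall(records):
    """Split records into those passing every rule and those quarantined with the rules they broke."""
    passed, quarantined = [], []
    for rec in records:
        failures = [name for name, check in RULES if not check(rec)]
        if failures:
            quarantined.append((rec, failures))
        else:
            passed.append(rec)
    return passed, quarantined


batch = [
    {"customer_id": 7, "amount": 19.99, "order_date": date(2024, 1, 5)},
    {"customer_id": None, "amount": -5, "order_date": date(2024, 1, 6)},
]
passed, quarantined = firewall(batch)
print(len(passed), quarantined[0][1])  # 1 ['customer_id present', 'amount non-negative']
```

Keeping the rules in one table, rather than scattering checks through load code, makes them easier for data stewards to review and change.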
38
Q

____________ actions are implemented after a problem has occurred and been detected.

A

Corrective

39
Q

What are 3 corrective actions that can be taken against poor data?

A
  • Automated correction
  • Manually-directed correction
  • Manual correction
40
Q

Which corrective action uses automated tools to remediate and correct data but requires manual review before committing the corrections to persistent storage?

A

Manually-directed correction
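Manually-directed correction splits the work into an automated proposal step and a human approval step. A hedged Python sketch, assuming a hypothetical lookup of standardized state codes; nothing is changed until a reviewer approves specific rows, and the original records are left untouched.

```python
def propose_corrections(records, field, fixes):
    """Automated step: list (row index, old value, new value) without changing anything."""
    return [(i, r[field], fixes[r[field]]) for i, r in enumerate(records) if r[field] in fixes]


def commit(records, field, proposals, approved_rows):
    """Apply only the proposals whose row index a reviewer approved; return corrected copies."""
    corrected = [dict(r) for r in records]
    for i, _old, new in proposals:
        if i in approved_rows:
            corrected[i][field] = new
    return corrected


# Hypothetical state values and a standardization lookup.
records = [{"state": "Calif."}, {"state": "CA"}, {"state": "N.Y."}]
fixes = {"Calif.": "CA", "N.Y.": "NY"}

proposals = propose_corrections(records, "state", fixes)
print(proposals)  # [(0, 'Calif.', 'CA'), (2, 'N.Y.', 'NY')]

# The reviewer approves row 0 and holds back row 2 before anything is persisted.
corrected = commit(records, "state", proposals, approved_rows={0})
```

Fully automated correction would apply every proposal directly, while manual correction would skip the proposal step entirely.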

41
Q

What are the 4 goals of a data quality program?

A
  • Developing a governed approach to make data fit for purpose based on data consumers’ requirements
  • Defining standards and specifications for data quality controls as part of the data lifecycle
  • Defining and implementing processes to measure, monitor, and report on data quality levels
  • Identifying and advocating for opportunities to improve the quality of data, through changes to processes and systems and engaging in activities that measurably improve the quality of data based on data consumer requirements