Data Quality Flashcards

1
Q

Data Quality Indicators (TARMAC)

A

Trackability, Acceptability, Relevance, Measurability, Accountability, Controllability

2
Q

TARMAC - trackability

A

Can measure data quality over time

3
Q

TARMAC - Acceptability

A

Being able to define what good looks like

4
Q

TARMAC - Relevance

A

Make sure you are measuring something relevant to the business

5
Q

TARMAC - Measurability

A

What will actually be measured, and how can it be measured?

6
Q

TARMAC - Accountability/Stewardship

A

Who will be held accountable if it goes wrong?

7
Q

TARMAC - Controllability

A

Defining remedial actions in advance of the thing going wrong

8
Q

What must you know to be able to define data quality?

A

The intended use (purpose) of the data

9
Q

How can you manage data quality when you don’t know the purpose?

A

Don’t over-assume; stick to basic validity checks

10
Q

Reference Data

A

Relatively static data used to classify or categorise other data, e.g., codes and identifiers such as country codes

11
Q

Master Data

A

Descriptive attributes of business entities

12
Q

What can be defined as data standards?

A

data types, acceptable values, attribute domains, metadata format

13
Q

What are the two types of data quality management?

A
  1. Governing/Strategic
  2. Tactical
14
Q

What is tactical data quality management?

A

Short-term fixing of problems

15
Q

What is governing/strategic data quality management?

A

Overarching long-term goals, e.g., root cause analysis

16
Q

Common DQ mistakes

A
  • failing to consider the intended use of the data
  • confusing validity and accuracy
  • treating it as a one-time activity
  • not fixing at the source
  • applying software quality principles
  • laziness, blaming the system
  • believing good data quality is the end goal (not the use of that data)
  • believing that quantity beats quality
17
Q

Data quality firewall

A

taking external data and applying data cleansing before it is stored in the DB
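A minimal Python sketch of the firewall idea, assuming incoming records arrive as dictionaries. The cleansing step (trimming whitespace) and the rules (a populated customer_id and a non-negative numeric amount) are hypothetical examples: cleansed records that pass are accepted for storage, the rest are quarantined for review.

```python
def cleanse(record):
    # Cleansing step: trim surrounding whitespace from every string field
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def passes_rules(record):
    # Hypothetical rules: customer_id must be populated and amount must be
    # a non-negative number
    if not record.get("customer_id"):
        return False
    amount = record.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


def firewall(incoming):
    """Cleanse external records, then split them into accepted and quarantined."""
    accepted, quarantined = [], []
    for raw in incoming:
        record = cleanse(raw)
        (accepted if passes_rules(record) else quarantined).append(record)
    return accepted, quarantined


batch = [
    {"customer_id": " C1 ", "amount": 10.5},
    {"customer_id": "   ", "amount": 3},
    {"customer_id": "C2", "amount": "abc"},
]
accepted, quarantined = firewall(batch)
print(accepted)          # [{'customer_id': 'C1', 'amount': 10.5}]
print(len(quarantined))  # 2
```

Only accepted records would go on to the DB; quarantined ones are kept aside rather than silently dropped.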

18
Q

Impacts of poor data

A
  1. aggravation
  2. loss of reputation
  3. loss of business
  4. regulatory risk
  5. loss of life
19
Q

Why is it important to communicate the cost of poor quality data?

A

to raise awareness

20
Q

Data Quality Management Cycle

A

Plan -> Deploy -> Monitor -> Act

21
Q

What are the four data quality governance steps?

A
  • standardisation
  • assignment
  • escalation
  • completion
22
Q

Causes of data issues

A

Human causes
Organisational (system)
Physical causes

23
Q

Roles of the data quality oversight board

A
  • setting data quality improvement priorities
  • establishing communications & feedback mechanisms
  • producing certification & compliance policies
  • approving data quality strategies
24
Q

Data Quality Service Level Agreement (SLA) will include

A

defining roles & responsibilities for data quality

25
Q

What is a key process of defining data quality business rules?

A

Separating data that does not meet business needs from the data that does

26
Q

Why are top-down and bottom-up profiling best done together?

A

it balances the business relevance and the actual state of the data

27
Q

Steps in root cause analysis

A
  1. define the problem
  2. collect data
  3. identify all possible causal factors
  4. identify root cause(s)
  5. recommend and implement solutions
28
Q

Dimensions of data quality

A

Completeness
Consistency
Currency
Reasonableness
Integrity
Timeliness
Validity
Accuracy
Uniqueness
(privacy)
(precision)

29
Q

Completeness Data Quality Dimension

A

All mandatory values are present
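A minimal Python sketch of a completeness measure, assuming records are dictionaries and that the caller supplies the list of mandatory fields (the field names here are hypothetical). Optional fields such as email are simply not passed in, so gaps there do not count against the score.

```python
def completeness(records, mandatory_fields):
    """Percentage of records in which every mandatory field is populated."""
    if not records:
        return 100.0  # design choice: an empty dataset has no missing values
    complete = sum(
        all(r.get(f) not in (None, "") for f in mandatory_fields) for r in records
    )
    return 100.0 * complete / len(records)


customers = [
    {"id": 1, "name": "Ann", "email": "ann@example.com"},
    {"id": 2, "name": "Bob", "email": ""},
    {"id": 3, "name": None, "email": None},
    {"id": 4, "name": "Dee", "email": None},
]
print(completeness(customers, ["id", "name"]))           # 75.0
print(completeness(customers, ["id", "name", "email"]))  # 25.0
```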

30
Q

Consistency Data Quality Dimension

A

Data of one concept corresponds with the same concept in another system
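A minimal Python sketch of a cross-system consistency check, assuming each system can be reduced to a mapping from a shared key (e.g. a customer id) to the value being compared. The "crm" and "billing" systems are hypothetical.

```python
def inconsistent_keys(system_a, system_b):
    """Return keys present in both systems whose values disagree."""
    shared = system_a.keys() & system_b.keys()
    return sorted(k for k in shared if system_a[k] != system_b[k])


crm = {"C1": "London", "C2": "Leeds", "C3": "York"}
billing = {"C1": "London", "C2": "Hull", "C4": "Bath"}
print(inconsistent_keys(crm, billing))  # ['C2']
```

Keys held in only one system (C3, C4) are not inconsistencies by this definition; they would be picked up by a separate completeness or reconciliation check.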

31
Q

Currency Data Quality Dimension

A

Is the data up to date?
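A minimal Python sketch of a currency check, assuming each record carries a last_updated date. The 365-day threshold is a hypothetical business choice; in practice the acceptable age depends on how the data is used.

```python
from datetime import date, timedelta


def stale_records(records, as_of, max_age_days):
    """Return ids of records last updated more than max_age_days before as_of."""
    cutoff = as_of - timedelta(days=max_age_days)
    return [r["id"] for r in records if r["last_updated"] < cutoff]


addresses = [
    {"id": "A1", "last_updated": date(2024, 1, 10)},
    {"id": "A2", "last_updated": date(2022, 6, 1)},
]
print(stale_records(addresses, as_of=date(2024, 3, 1), max_age_days=365))  # ['A2']
```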

32
Q

Reasonableness Data Quality Dimension

A

Business rules: does it feel right, i.e., is it in line with what is expected?
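A minimal Python sketch of reasonableness checks expressed as business rules. The two rules (an order cannot ship before it was placed, and quantity should fall in an expected range) and the max_quantity of 1000 are hypothetical examples.

```python
from datetime import date


def unreasonable_orders(orders, max_quantity=1000):
    """Flag orders that break simple business-rule expectations.

    Only the first broken rule is reported for each order.
    """
    flagged = []
    for o in orders:
        if o["shipped"] < o["ordered"]:
            flagged.append((o["id"], "shipped before ordered"))
        elif not 0 < o["quantity"] <= max_quantity:
            flagged.append((o["id"], "quantity outside expected range"))
    return flagged


orders = [
    {"id": "O1", "ordered": date(2024, 5, 1), "shipped": date(2024, 5, 3), "quantity": 2},
    {"id": "O2", "ordered": date(2024, 5, 4), "shipped": date(2024, 5, 2), "quantity": 1},
    {"id": "O3", "ordered": date(2024, 5, 1), "shipped": date(2024, 5, 2), "quantity": 5000},
]
print(unreasonable_orders(orders))
# [('O2', 'shipped before ordered'), ('O3', 'quantity outside expected range')]
```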

33
Q

Integrity Data Quality Dimension

A

Child data must have a parent

34
Q

Timeliness Data Quality Dimension

A

Accessibility/availability: is the data available when it is needed?

35
Q

Validity Data Quality Dimension

A

Is the value in the correct domain?
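A minimal Python sketch of domain-based validity checks. Both domains are hypothetical: an enumerated set of status codes and a pattern for two-letter upper-case country codes.

```python
import re

# Hypothetical domains: an enumerated value list and a format pattern
VALID_STATUS = {"ACTIVE", "SUSPENDED", "CLOSED"}
COUNTRY_CODE = re.compile(r"[A-Z]{2}")


def is_valid(record):
    """True if status is in its value list and country matches its format."""
    return (
        record["status"] in VALID_STATUS
        and COUNTRY_CODE.fullmatch(record["country"]) is not None
    )


print(is_valid({"status": "ACTIVE", "country": "GB"}))   # True
print(is_valid({"status": "OPEN", "country": "GB"}))     # False
print(is_valid({"status": "CLOSED", "country": "gbr"}))  # False
```

Note that "ZZ" would pass the pattern even though it is not a real country: a value can be valid without being accurate.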

36
Q

Accuracy Data Quality Dimension

A

Does the data correctly represent the real-world entity or event?

37
Q

Uniqueness Data Quality Dimension

A

A business concept (e.g., a customer) must not be recorded more than once.
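A minimal Python sketch of a uniqueness check, assuming the caller chooses which fields identify the business concept. The normalisation (trim and lower-case) is a simple assumption; real matching often needs fuzzier rules.

```python
from collections import Counter


def duplicate_keys(records, key_fields):
    """Return identifying key values that occur more than once."""
    def key(r):
        # Normalise each identifying field before comparing
        return tuple(str(r[f]).strip().lower() for f in key_fields)

    counts = Counter(key(r) for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


people = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "ann lee ", "email": "ANN@example.com"},
    {"name": "Bob Ray", "email": "bob@example.com"},
]
print(duplicate_keys(people, ["name", "email"]))  # [('ann lee', 'ann@example.com')]
```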

38
Q

How to measure data quality?

A

Stats (sampling / basic summaries/ process control charts)
Profiling (manual, tools, columnar, intra-table, cross-table)
Information flow diagrams
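A simplified Python sketch of a process control chart check: control limits are set at the mean plus or minus k standard deviations of past measurements, and a new measurement outside them is flagged. Using the population standard deviation and k = 3 are simplifying assumptions; formal Shewhart charts often estimate spread differently (e.g. from moving ranges). The daily error-rate figures are made up for illustration.

```python
from statistics import mean, pstdev


def control_limits(history, k=3):
    """Lower and upper control limits: mean -/+ k standard deviations."""
    centre = mean(history)
    spread = pstdev(history)
    return centre - k * spread, centre + k * spread


def out_of_control(history, new_value, k=3):
    lower, upper = control_limits(history, k)
    return not lower <= new_value <= upper


# Hypothetical daily error rates (%) for one data feed
history = [2.0, 2.2, 1.8, 2.1, 1.9]
print(out_of_control(history, 2.3))  # False
print(out_of_control(history, 3.5))  # True
```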

39
Q

Process of profiling

A
  1. identify subset of data
  2. understand business use
  3. put into profiling tool
  4. list potential anomalies
  5. prioritise by criticality
40
Q

What’s the output of a profiling tool?

A

counts, summaries, data types, PKs, percentage of completeness, identification (e.g., duplicated records, out of range values).
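A minimal Python sketch of the kind of column profile a profiling tool produces: counts, percentage completeness, distinct values, duplicated values, observed data types and whether the column could serve as a primary key. Treating None and empty strings as missing is an assumption.

```python
from collections import Counter


def profile_column(values):
    """Basic profile of one column of values."""
    populated = [v for v in values if v not in (None, "")]
    counts = Counter(populated)
    return {
        "count": len(values),
        "populated": len(populated),
        "completeness_pct": 100.0 * len(populated) / len(values) if values else 0.0,
        "distinct": len(counts),
        # repr as sort key so mixed types can be sorted together
        "duplicated_values": sorted((v for v, n in counts.items() if n > 1), key=repr),
        "types": sorted({type(v).__name__ for v in populated}),
        # candidate PK: every row populated and every value distinct
        "candidate_key": len(values) > 0 and len(counts) == len(values),
    }


print(profile_column([101, 102, 102, None, "x"]))
```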

41
Q

Inspecting the quality of data using statistical techniques is called

A

data profiling

42
Q

Advantages of defining data quality rules upfront

A
  • setting clear expectations for data quality
  • creating the foundation for ongoing data quality measurement
  • providing the requirements for system control to prevent quality issues
  • providing data quality requirements to external parties
43
Q

What 3 levels of data granularity should you measure for data quality?

A

Data element value, record & dataset
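A minimal Python sketch of measuring at the three levels. The element rule (a non-blank string) and the field names are hypothetical; the point is that a record passes only if its elements pass, and the dataset score aggregates record results.

```python
def element_ok(value):
    # Element level (hypothetical rule): a non-blank string
    return isinstance(value, str) and value.strip() != ""


def record_ok(record, fields):
    # Record level: every element of interest passes
    return all(element_ok(record.get(f)) for f in fields)


def dataset_score(records, fields):
    # Dataset level: percentage of records that pass
    if not records:
        return 0.0
    return 100.0 * sum(record_ok(r, fields) for r in records) / len(records)


rows = [{"name": "Ann", "city": "York"}, {"name": "Bob", "city": " "}]
print(dataset_score(rows, ["name", "city"]))  # 50.0
```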

44
Q

How are data quality management and data governance linked?

A
  • both are essential for organisational success
  • both are ongoing efforts
  • governance supports DQ
  • DQ sustains governance
45
Q

Where to focus for DQ

A
  • focus on critical data
  • focus on preventing errors (not just fixing)
  • address the root cause of problems
  • enforce quality standards
46
Q

Shewhart data quality cycle stages

A

Plan, Do, Check, Act
(Plan, Deploy, Monitor, Act)

47
Q

When measuring data quality which three levels of granularity should you measure?

A

Data element, record, dataset

48
Q

Data profiling software

A

Data profiling software investigates data to understand its structure, content and quality. It helps us find patterns and problems in the data.

49
Q

Data Quality dimensions

A

Accuracy
Completeness
Integrity
Uniqueness
Consistency

50
Q

Data Accuracy

A

How closely data represents reality
(hard to measure; usually checked against a trusted source, e.g., check that postcodes match real postcodes)

51
Q

Complete data

A

all data is present, no gaps
(depends on mandatory or optional fields)

52
Q

Data consistency

A

Making sure two or more representations of something are the same

53
Q

Data Integrity

A

Making sure data is complete, accurate and consistent
(making sure data objects are connected properly)

54
Q

Referential Integrity

A

The connections between data objects are consistent

55
Q

Internal Consistency Problem Example

A

A list of names and emails where two people have the same name, or some names don’t have emails.

56
Q

Orphan

A

A data object with a missing or invalid reference to another data object
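A minimal Python sketch of orphan detection, assuming child records (orders) hold a foreign key to parent records (customers). The field names customer_id and id are hypothetical defaults.

```python
def find_orphans(children, parents, fk="customer_id", pk="id"):
    """Return child records whose reference is missing or matches no parent."""
    parent_keys = {p[pk] for p in parents}
    return [c for c in children if c.get(fk) not in parent_keys]


customers = [{"id": 1}, {"id": 2}]
orders = [
    {"order": "O1", "customer_id": 1},
    {"order": "O2", "customer_id": 9},  # invalid reference
    {"order": "O3"},                    # missing reference
]
print([o["order"] for o in find_orphans(orders, customers)])  # ['O2', 'O3']
```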

57
Q

Data Quality Oversight Board

A

Provides strategic direction with policies & activities.

58
Q

Data Value Domain

A

A set of rules that describe the set of values an attribute can take.

59
Q

Business rules are not required for…

A

critical data improvement