15 Data Quality and Management Flashcards

1
Q

What is quality control?

A

The process of testing data to ensure data integrity

Quality control is essential because bad data can lead to misleading results.

2
Q

When should you check for data quality?

A

Any time there is a major change, such as:
* Data acquisition
* Data transformation
* Data manipulation
* Final product review

Routine checks between major changes are also important.

3
Q

What is data acquisition?

A

The process of obtaining new data

It requires checking for bias and the current state of the data.

4
Q

What does data transformation involve?

A

Changing data from one form to another, including:
* Intrahops
* Pass-throughs
* Conversions

Transformations should ideally be done in new variables.
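
A minimal sketch of the "new variable" advice, assuming hypothetical records with a `temp_f` field (the field names are illustrative, not from the source). The converted value goes into a new `temp_c` key, so the original stays available for checking the conversion.

```python
# Hypothetical records; field names are assumptions for illustration.
records = [{"id": 1, "temp_f": 32.0}, {"id": 2, "temp_f": 212.0}]


def add_celsius(record):
    """Return a copy of the record with a new temp_c field; temp_f is left untouched."""
    converted = dict(record)
    converted["temp_c"] = (record["temp_f"] - 32.0) * 5.0 / 9.0
    return converted


transformed = [add_celsius(r) for r in records]
```

Because the source field is preserved, a reviewer can recompute the conversion and compare it with the stored result.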

5
Q

What is data manipulation?

A

Changing the shape of the data without altering its content

Examples include breaking down or combining variables.
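
As a rough illustration of breaking down and combining variables, the sketch below splits a hypothetical `full_name` field into two fields and joins them back. The field names and the single-space split rule are assumptions made for the example.

```python
def split_name(record):
    """Break one full_name variable into two; no information is added or lost."""
    first, last = record["full_name"].split(" ", 1)
    return {"first_name": first, "last_name": last}


def combine_name(record):
    """Combine first_name and last_name back into a single full_name variable."""
    return {"full_name": f"{record['first_name']} {record['last_name']}"}


original = {"full_name": "Ada Lovelace"}
parts = split_name(original)
round_trip = combine_name(parts)
```

The round trip returning the original record is one way to confirm that only the shape changed, not the content.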

6
Q

What are the data quality dimensions?

A

Key dimensions include:
* Data consistency
* Data accuracy
* Data completeness
* Data integrity
* Data attribute limitations

These dimensions help assess the quality of data.

7
Q

What is data consistency?

A

Ensuring data is uniform and reported the same way across different levels

This applies to both individual variables and broader databases.
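
One possible consistency check for a single variable, sketched under the assumption that values are strings: reduce each value to its "shape" and see whether more than one shape occurs. The `shape` helper and the sample dates are hypothetical.

```python
import re
from collections import Counter


def shape(value):
    """Reduce a value to its pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


# Hypothetical date column that mixes two reporting formats.
dates = ["2024-01-15", "2024-02-03", "03/04/2024"]
shapes = Counter(shape(d) for d in dates)
is_consistent = len(shapes) == 1
```

Here two shapes appear (`9999-99-99` and `99/99/9999`), so the variable is not reported the same way throughout.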

8
Q

What is data accuracy?

A

Whether the data is correct

Checking data accuracy often involves verifying it against an outside source.

9
Q

What is data completeness?

A

Checking for gaps in the data, such as missing values or entire missing variables

This is essential for valid analyses.
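
A minimal completeness check might look like the sketch below, assuming rows are dictionaries and that a missing value is either an absent key or `None`. The field list and sample rows are hypothetical.

```python
# Hypothetical expected schema and sample rows.
expected_fields = ["id", "age", "country"]
rows = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "CA"},
    {"id": 3, "age": 51},
]


def completeness_report(rows, expected_fields):
    """Count, per field, the rows where the value is absent or None."""
    return {f: sum(1 for r in rows if r.get(f) is None) for f in expected_fields}


report = completeness_report(rows, expected_fields)
# A field missing from every row counts as an entirely missing variable.
missing_variables = [f for f, n in report.items() if n == len(rows)]
```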

10
Q

What does data integrity encompass?

A

It includes consistency, accuracy, completeness, and security

Data integrity is crucial in regulated fields like pharmaceuticals.

11
Q

What are data quality rules and metrics?

A

Guidelines that define acceptable data standards and formats

These include cutoff scores and conformity rules.
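
To make "conformity rule" and "cutoff score" concrete, here is a sketch that assumes a five-digit postal code rule and a 95% pass-rate cutoff. Both the rule and the cutoff value are hypothetical choices for illustration.

```python
import re

ZIP_RULE = re.compile(r"\d{5}")  # hypothetical conformity rule: exactly five digits
CUTOFF = 0.95                    # hypothetical cutoff: 95% of values must conform


def pass_rate(values, rule):
    """Fraction of values that fully match the conformity rule."""
    return sum(1 for v in values if rule.fullmatch(v)) / len(values)


zips = ["30301", "9021", "10001", "ABCDE"]
rate = pass_rate(zips, ZIP_RULE)
meets_cutoff = rate >= CUTOFF
```

The pass rate serves as the metric, and comparing it with the cutoff turns the rule into an accept or reject decision.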

12
Q

What is cross-validation?

A

A statistical analysis that checks if results can be generalized

It helps assess model effectiveness and estimate test error.
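
A small k-fold cross-validation sketch using only the standard library. The "model" here is deliberately trivial (it predicts the mean of the training fold) and is an assumption chosen to keep the example short; the fold assignment by interleaved index is likewise just one simple option.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i tests on every k-th index starting at i."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


def cross_validated_mse(y, k):
    """Average held-out mean squared error of a model that predicts the training mean."""
    fold_errors = []
    for train, test in k_fold_indices(len(y), k):
        prediction = mean(y[j] for j in train)
        fold_errors.append(mean((y[j] - prediction) ** 2 for j in test))
    return mean(fold_errors)


y = [2.0, 4.0, 6.0, 8.0]
cv_mse = cross_validated_mse(y, k=2)
```

Each observation is held out exactly once, so the averaged error reflects how the model performs on data it was not fitted to.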

13
Q

What are sample/spot checks?

A

Quick checks focusing on one or two data quality dimensions

They are often prompted by unusual data observations.

14
Q

What are reasonable expectations in data quality?

A

Assessing whether data values make sense based on historical norms

This can involve formalized processes for flagging outliers.
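
One way to formalize a reasonable-expectations check is sketched below: flag new values that fall more than a chosen number of standard deviations from the historical mean. The three-standard-deviation threshold and the sample numbers are assumptions, not a rule from the source.

```python
from statistics import mean, stdev


def flag_unexpected(history, new_values, threshold=3.0):
    """Return new values more than `threshold` sample standard deviations from the historical mean."""
    center = mean(history)
    spread = stdev(history)
    return [v for v in new_values if abs(v - center) > threshold * spread]


# Hypothetical historical readings and a new batch to screen.
history = [100, 102, 98, 101, 99]
flagged = flag_unexpected(history, [100, 150, 97])
```

Flagged values are candidates for review rather than automatic deletions, since an unusual value can still be correct.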

15
Q

What is data profiling?

A

A formal process that checks data quality across entire databases

It usually includes structure, content, and relationship discovery.
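
The three kinds of discovery can be sketched on a toy scale as follows. The tables, column names, and helper functions are hypothetical, and a real profiling tool would cover far more than this.

```python
def profile_column(rows, column):
    """Summarize one column: value types (structure), null and distinct counts (content)."""
    values = [r.get(column) for r in rows]
    present = [v for v in values if v is not None]
    return {
        "types": sorted({type(v).__name__ for v in present}),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
    }


def orphaned_keys(child_rows, child_key, parent_rows, parent_key):
    """Relationship check: child key values that have no matching parent row."""
    parent_ids = {r[parent_key] for r in parent_rows}
    return sorted({r[child_key] for r in child_rows} - parent_ids)


# Hypothetical tables.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": None}]
orders = [{"order_id": 10, "customer_id": 1}, {"order_id": 11, "customer_id": 3}]

name_profile = profile_column(customers, "name")
orphans = orphaned_keys(orders, "customer_id", customers, "id")
```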

16
Q

What is a data audit?

A

A systematic check to see if a dataset meets specific goals

Audits are often scheduled and performed at all stages of the data lifecycle.

17
Q

What is master data management (MDM)?

A

The process of creating and managing a centralized data system

MDM aims to create a ‘golden record’ for improved data quality.

18
Q

When should MDM be used?

A

During:
* Mergers and acquisitions
* Compliance checks
* Streamlining data access

MDM helps integrate disparate data sources and manage protected data.

19
Q

What is the benefit of having a golden record?

A

It provides a single source of truth with clean, standardized data

This facilitates faster access and higher data quality.

20
Q

What challenges are associated with implementing MDM?

A

It can be labor-intensive and expensive to set up

Many companies may only implement MDM for specific data types.

21
Q

What is policy in the context of data management?

A

Policy refers to compliance: keeping all records organized so that regulatory checks are easier.

22
Q

What does streamlining data access mean?

A

Streamlining data access allows faster retrieval of data from a single table without complex queries.

23
Q

What is the first step in the MDM process?

A

Consolidation

24
Q

What does consolidation involve in MDM?

A

Consolidation involves creating the golden record by combining data from multiple sources into one place.
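
A minimal consolidation sketch, assuming two hypothetical sources (`crm` and `billing`) that share a `customer_id` key. The merge rule used here, where earlier sources win and later ones only fill gaps, is one assumption among many possible survivorship rules.

```python
def consolidate(*sources):
    """Merge records by customer_id; later sources only fill fields still missing or None."""
    golden = {}
    for source in sources:
        for record in source:
            merged = golden.setdefault(record["customer_id"], {})
            for field, value in record.items():
                if merged.get(field) is None:
                    merged[field] = value
    return golden


# Hypothetical source systems describing the same customer.
crm = [{"customer_id": 7, "name": "Ann Lee", "email": None}]
billing = [{"customer_id": 7, "email": "ann@example.com", "plan": "pro"}]
golden_records = consolidate(crm, billing)
```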

25
Q

What is the purpose of standardization in MDM?

A

Standardization makes data uniform, ensuring all data works together and is consistent.
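
Standardization can be as simple as agreeing on one representation per field. The sketch below assumes phone numbers should be stored as digits only and country codes as uppercase; both conventions are hypothetical.

```python
import re


def standardize_phone(raw):
    """Keep only digits so differently formatted phone numbers compare equal."""
    return re.sub(r"\D", "", raw)


def standardize_country(raw):
    """Trim whitespace and uppercase a country code."""
    return raw.strip().upper()


same_number = standardize_phone("(555) 010-2000") == standardize_phone("555.010.2000")
```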

26
Q

What is a data dictionary?

A

A data dictionary is a document that defines variables, their attributes, structure, and relationships.
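
A data dictionary is usually a document, but a machine-readable version can double as a check. The sketch below assumes a hypothetical two-variable dictionary and a helper that reports fields that are undocumented or have the wrong type.

```python
# Hypothetical data dictionary: definition, type, and relationship per variable.
data_dictionary = {
    "customer_id": {
        "type": int,
        "description": "Unique customer identifier",
        "relates_to": "orders.customer_id",
    },
    "signup_date": {
        "type": str,
        "description": "Date the account was created, as YYYY-MM-DD",
        "relates_to": None,
    },
}


def check_record(record, dictionary):
    """List fields that are not documented or whose value has an unexpected type."""
    problems = []
    for field, value in record.items():
        if field not in dictionary:
            problems.append(f"{field}: not in data dictionary")
        elif not isinstance(value, dictionary[field]["type"]):
            problems.append(f"{field}: expected {dictionary[field]['type'].__name__}")
    return problems


problems = check_record({"customer_id": "7", "region": "EU"}, data_dictionary)
```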

27
Q

Why are data dictionaries important?

A

They help ensure that multiple users understand the data and its usage.

28
Q

What does data quality control involve?

A

Data quality control involves checking for accuracy, consistency, and reliability of data.

29
Q

When should data quality be checked?

A

Data quality should be checked after data manipulation, after data transformation, and before the final report.

30
Q

Which of the following is a data quality dimension: Data completeness, Data retention, Rows passed, or Data manipulation?

A

Data completeness

31
Q

What is data profiling?

A

Data profiling is a structured formal process for assessing the quality and efficiency of an entire database.

32
Q

True or False: Acquisitions are an appropriate time to institute MDM.

A

True

33
Q

Creating a document that explains variables in a dataset represents which part of the MDM process?

A

Data dictionary

34
Q

Fill in the blank: _______ is the process of combining data from multiple sources into one place.

A

Consolidation

35
Q

Fill in the blank: A data dictionary provides definitions for every variable, as well as how they are used and how they _______.

A

relate to other variables

36
Q

What should be included in a data dictionary?

A

Definitions and attributes for every variable, structure, relationships, and data organization.

37
Q

What is the significance of having a data dictionary in a collaborative database environment?

A

It ensures that all users understand the data and its usage, preventing confusion.

38
Q

List three circumstances where data quality should be checked.

A
* After data manipulation
* After data transformation
* Before the final report
39
Q

What is the main goal of standardization in data management?

A

To ensure all data works together and is consistent across different sources.