Chapter 10 Flashcards

1
Q

What is data governance?

A
  • High-level organizational groups and processes overseeing data stewardship across the organization
2
Q

What is a data steward?

A

A person responsible for ensuring that organizational applications properly support the organization’s data quality goals

3
Q

What are the requirements for data governance to be successful?

A
  • Sponsorship from both senior management and business units
  • A data steward manager to support, train, and coordinate data stewards
  • Data stewards for different business units, subjects, and/or source systems
  • A governance committee to provide data management guidelines and standards
4
Q

Why is data quality important?

A

If the data are bad, the business fails. Period.

  • GIGO - Garbage in, garbage out
  • Sarbanes-Oxley (SOX) compliance by law sets data and metadata quality standards
5
Q

What is the purpose of data quality?

A
  • Minimize IT project risk
  • Make timely business decisions
  • Ensure regulatory compliance
  • Expand customer base
6
Q

What are the characteristics of quality data?

A
  • Uniqueness
  • Accuracy
  • Consistency
  • Completeness
  • Timeliness
  • Currency
  • Conformance
  • Referential Integrity
7
Q

What are some causes of poor data quality?

A
  • External data sources (Lack of control over data quality)
  • Redundant data storage and inconsistent metadata (Proliferation of databases with uncontrolled redundancy and metadata)
  • Data entry (Poor data capture controls)
  • Lack of organizational commitment (Not recognizing poor data quality as an organizational issue)
8
Q

What are some steps that can be taken to improve data quality?

A
  • Get business buy-in
  • Perform data quality audit
  • Establish data stewardship program
  • Improve data capture processes
  • Apply modern data management principles and technology
  • Apply total quality management (TQM) practices
9
Q

How can you create business buy-in?

A
  • Executive sponsorship
  • Building a business case
  • Prove a return on investment (ROI)
  • Avoidance of cost
  • Avoidance of opportunity loss
10
Q

What do you do in a data quality audit?

A
  • Statistically profile all data files
  • Document the set of values for all fields
  • Analyze data patterns (distribution, outliers, frequencies)
  • Verify whether controls and business rules are enforced
  • Use specialized data profiling tools
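
The profiling steps above can be sketched in a few lines. This is a minimal illustration, not a substitute for a specialized profiling tool: the function names, the sample values, and the two-standard-deviation outlier threshold are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev

def profile_field(values):
    """Summarize one field: row count, missing count, distinct values, frequencies."""
    present = [v for v in values if v is not None and v != ""]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": sorted(set(present)),
        "frequencies": Counter(present),
    }

def find_outliers(numbers, z_threshold=2.0):
    """Flag values more than z_threshold standard deviations from the mean."""
    mu, sigma = mean(numbers), pstdev(numbers)
    if sigma == 0:
        return []
    return [x for x in numbers if abs(x - mu) / sigma > z_threshold]

# Hypothetical sample data for illustration only.
states = ["OH", "OH", "CA", "", "ca", None, "NY"]
ages = [34, 29, 41, 38, 35, 31, 240]

state_profile = profile_field(states)
age_outliers = find_outliers(ages)
```

The documented value set exposes the inconsistent "ca" next to "CA", and the outlier check flags the implausible age for review.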
11
Q

What are the roles of a data steward?

A
  • Oversight of data stewardship program
  • Manage data subject area
  • Oversee data definitions
  • Oversee production of data
  • Oversee use of data
12
Q

How can you improve data capture processes?

A
  • Automate data entry as much as possible
  • Manual data entry should be selected from preset options
  • Use trained operators when possible
  • Follow good user interface design principles
  • Immediate data validation for entered data
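
Two of the points above, preset options and immediate validation, might look like this at the point of entry. The field names, the allowed state list, and the rules are hypothetical; a real form would take them from the organization's own reference data and business rules.

```python
from datetime import date

# Hypothetical preset options; a real form would load these from reference data.
ALLOWED_STATES = {"CA", "NY", "OH"}

def validate_entry(entry):
    """Return a list of error messages for one entered record (empty if valid)."""
    errors = []
    if entry.get("state") not in ALLOWED_STATES:
        errors.append("state must be one of the preset options")
    try:
        birth = date.fromisoformat(str(entry.get("birth_date", "")))
        if birth > date.today():
            errors.append("birth_date cannot be in the future")
    except ValueError:
        errors.append("birth_date must be a valid YYYY-MM-DD date")
    return errors

good = validate_entry({"state": "OH", "birth_date": "1990-04-12"})
bad = validate_entry({"state": "Ohio", "birth_date": "1990-02-30"})
```

Running the check as soon as the record is entered lets the operator correct the value while the source document is still at hand.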
13
Q

What are some software tools for analyzing and correcting data quality problems?

A
  • Pattern matching
  • Fuzzy logic
  • Expert systems
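
As a rough illustration of the first two techniques: pattern matching can be done with a regular expression, and approximate string matching from Python's `difflib` is used here as a simple stand-in for fuzzy matching (it is not fuzzy logic in the formal sense). The ZIP code pattern, the city list, and the 0.8 similarity cutoff are assumptions for the example.

```python
import re
from difflib import get_close_matches

# Assumed pattern: US ZIP code, five digits with an optional four-digit suffix.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

def is_valid_zip(text):
    """Pattern matching: accept only text that fits the ZIP pattern exactly."""
    return ZIP_PATTERN.fullmatch(text) is not None

# Hypothetical reference list of correctly spelled city names.
KNOWN_CITIES = ["Cincinnati", "Cleveland", "Columbus"]

def suggest_city(raw, cutoff=0.8):
    """Approximate matching: return the closest known city, or None."""
    matches = get_close_matches(raw, KNOWN_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

corrected = suggest_city("Cincinatti")
```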
14
Q

Besides software tools, what other modern tools can be applied to data management?

A
  • Sound data modeling and database design
15
Q

What does TQM stand for?

A

Total Quality Management

16
Q

What are the TQM Principles?

A
  • Defect prevention
  • Continuous Improvement
  • Use of enterprise data standards
17
Q

What are the components of a balanced focus?

A
  • Customer
  • Product/Service
  • Strong foundation of measurement
18
Q

What is master data management (MDM)?

A

Disciplines, technologies, and methods to ensure the currency, meaning, and quality of reference data within and across various subject areas

19
Q

What are the three main architectures of MDM?

A
  • Identity registry
  • Integration hub
  • Persistent
20
Q

What is an identity registry in MDM?

A

Master data remains in source systems; registry provides applications with location

21
Q

What is an integration hub in MDM?

A

Data changes broadcast through central service to subscribing databases
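
One way to picture the broadcast is a small publish/subscribe sketch. The class name, the callback signature, and the two dictionary "databases" are hypothetical; real hubs add queuing, delivery guarantees, and conflict handling that are left out here.

```python
class IntegrationHub:
    """Central service that forwards each master data change to subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callable that accepts (key, record)."""
        self._subscribers.append(callback)

    def publish(self, key, record):
        """Broadcast one change to every subscriber."""
        for callback in self._subscribers:
            callback(key, record)

# Two hypothetical subscribing databases, modeled as dictionaries.
billing_db, shipping_db = {}, {}

hub = IntegrationHub()
hub.subscribe(lambda key, record: billing_db.update({key: dict(record)}))
hub.subscribe(lambda key, record: shipping_db.update({key: dict(record)}))

hub.publish("C001", {"name": "Acme Corp", "city": "Dayton"})
```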

22
Q

What is the persistent approach in MDM?

A

Central “golden record” maintained; all applications have access. Requires applications to push data. Prone to data duplication.

23
Q

What does data integration do?

A

Creates a unified view of business data

Other possibilities:

  • Application integration
  • Business process integration
  • User interaction integration
24
Q

In data integration, what does any approach require?

A

Change data capture (CDC)

25
Q

What does changed data capture do?

A

Indicates which data have changed since previous data integration activity
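
A simple way to see the idea is to compare two snapshots keyed by primary key. This is only one possible technique (log-based and trigger-based capture are common alternatives), and the function name and sample rows are assumptions for the sketch.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and classify the differences."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}

# Hypothetical snapshots taken at the previous and current integration runs.
previous = {1: {"name": "Ann", "city": "Akron"}, 2: {"name": "Bob", "city": "Toledo"}}
current = {1: {"name": "Ann", "city": "Dayton"}, 3: {"name": "Cy", "city": "Canton"}}

changes = capture_changes(previous, current)
```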

26
Q

What are three techniques for data integration?

A
  • Consolidation (ETL)
  • Data federation (EII)
  • Data propagation (EAI and EDR)
27
Q

What is consolidation (ETL) in data integration?

A
  • Consolidating all data into a centralized database (like a data warehouse)
28
Q

What is data federation (EII) in data integration?

A
  • Provides a virtual view of data without actually creating one centralized database
29
Q

What is data propagation (EAI and EDR) in data integration?

A
  • Duplicate data across databases, with near real-time delay
30
Q

Comparison of consolidation, federation, and propagation forms of data integration

A
31
Q

In the reconciled data layer, what are the characteristics of typical operational data?

A
  • Transient - not historical
  • Not normalized (perhaps due to denormalization for performance)
  • Restricted in scope - not comprehensive
  • Sometimes poor quality - inconsistencies and errors
32
Q

After ETL what characteristics should data have?

A
  • Detailed - not summarized yet
  • Historical - periodic
  • Normalized - 3rd normal form or higher
  • Comprehensive - enterprise-wide perspective
  • Timely - data should be current enough to assist decision-making
  • Quality controlled - accurate with full integrity
33
Q

What does ETL stand for?

A

Extract, Transform, Load

34
Q

What is the ETL process?

A
  • Capture/Extract
  • Scrub or data cleansing
  • Transform
  • Load and Index
35
Q

When is the ETL process done?

A
  • During initial load of Enterprise Data Warehouse (EDW)
  • During subsequent periodic updates to EDW
36
Q

When is mapping and metadata management completed?

A

It’s a design step prior to performing ETL

37
Q

What is mapping?

A

Required data are mapped to data sources

(Graphical or matrix representations)

38
Q

What information should mapping provide?

A
  • Explanations of reformatting, transformations, and cleansing actions to be done
  • Process flow involving tasks and jobs
39
Q

What makes good metadata for mapping?

A
  • Identifies data sources
  • Recognizes same data in different systems
  • Represents process flow steps
40
Q

What is static extract?

A

Capturing a snapshot of the source data at a point in time

41
Q

What is an incremental extract?

A

Capturing changes that have occurred since the last static extract
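
The difference between the two extract types can be sketched as follows, assuming the source rows carry a last-modified timestamp. The `modified` column and the sample rows are hypothetical, and a timestamp filter like this does not detect deleted rows.

```python
from datetime import datetime

# Hypothetical source rows; "modified" is an assumed last-modified column.
source_rows = [
    {"id": 1, "amount": 100, "modified": datetime(2024, 1, 5)},
    {"id": 2, "amount": 250, "modified": datetime(2024, 2, 1)},
    {"id": 3, "amount": 75, "modified": datetime(2024, 2, 9)},
]

def static_extract(rows):
    """Static extract: a snapshot of every row at the time of the call."""
    return [dict(row) for row in rows]

def incremental_extract(rows, last_extract_time):
    """Incremental extract: only rows modified after the previous extract."""
    return [dict(row) for row in rows if row["modified"] > last_extract_time]

snapshot = static_extract(source_rows)
delta = incremental_extract(source_rows, datetime(2024, 1, 31))
```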

42
Q

What is capture/extract?

A

Obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse

43
Q

Visual of data reconciliation in ETL

A
44
Q

What is scrub/cleanse?

A

Uses pattern recognition and AI techniques to upgrade data quality

45
Q

What are you looking for when fixing errors?

A
  • Misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
46
Q

What are some other things to look for in scrub/cleanse?

A

Decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
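
A few of the checks named above (missing data, erroneous dates, duplicate data, and error logging) can be sketched with plain rules. This version is rule-based rather than the pattern recognition and AI techniques that scrubbing tools may use, and the record layout is a hypothetical example.

```python
from datetime import datetime

def scrub(records):
    """Split records into clean rows and a log of (id, problem) errors."""
    clean, errors, seen_ids = [], [], set()
    for record in records:
        if not record.get("name"):
            errors.append((record["id"], "missing name"))
            continue
        try:
            datetime.strptime(record["order_date"], "%Y-%m-%d")
        except ValueError:
            errors.append((record["id"], "erroneous date"))
            continue
        if record["id"] in seen_ids:
            errors.append((record["id"], "duplicate record"))
            continue
        seen_ids.add(record["id"])
        clean.append(record)
    return clean, errors

# Hypothetical raw rows for illustration only.
raw = [
    {"id": 1, "name": "Ann", "order_date": "2024-03-01"},
    {"id": 2, "name": "", "order_date": "2024-03-02"},
    {"id": 3, "name": "Cy", "order_date": "2024-13-40"},
    {"id": 1, "name": "Ann", "order_date": "2024-03-01"},
]
clean_rows, error_log = scrub(raw)
```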

47
Q

What does transform mean?

A

Convert data from format of operational system to format of data warehouse

48
Q

What transformations happen at the record level?

A

Selection - data partitioning

Joining - data combining

Aggregation - data summarization

49
Q

What transformations happen at the field level?

A

Single-field - from one field to one field

Multi-field - from many fields to one, or one field to many

50
Q

What is load/index?

A

Place transformed data into the warehouse and create indexes

51
Q

What is refresh mode?

A

Bulk rewriting of target data at periodic intervals

52
Q

What is update mode?

A

Only changes in source data are written to data warehouse
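
The contrast between refresh mode and update mode can be sketched with a dictionary standing in for the warehouse table. The function names and sample values are assumptions for illustration.

```python
def refresh_load(target, source_snapshot):
    """Refresh mode: discard the target contents and rewrite them in bulk."""
    target.clear()
    target.update(source_snapshot)

def update_load(target, changed_rows):
    """Update mode: write only the changed source rows into the target."""
    for key, row in changed_rows.items():
        target[key] = row

# Hypothetical warehouse table keyed by primary key.
warehouse = {1: "old A", 2: "old B"}

refresh_load(warehouse, {1: "A", 2: "B", 3: "C"})  # periodic bulk rewrite
update_load(warehouse, {2: "B2"})                  # only the changed row
```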

53
Q

What are the four record level transformation functions?

A
  • Selection
  • Joining
  • Normalization
  • Aggregation
54
Q

What is the process of partitioning data according to predefined criteria?

A

Selection

55
Q

What is the process of combining data from various sources into a single table or view?

A

Joining

56
Q

What is the process of decomposing relations with anomalies to produce smaller, well-structured relations?

A

Normalization

57
Q

What is the process of transforming data from detailed to summary level?

A

Aggregation
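
Three of the four record-level functions above (selection, joining, and aggregation) can be shown on a tiny data set; normalization is a design activity and is left out. The tables, column names, and the 50.0 selection threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical source tables for illustration.
orders = [
    {"order_id": 1, "cust_id": "C1", "amount": 120.0},
    {"order_id": 2, "cust_id": "C2", "amount": 40.0},
    {"order_id": 3, "cust_id": "C1", "amount": 80.0},
]
customers = {"C1": {"region": "East"}, "C2": {"region": "West"}}

# Selection: keep only rows that meet a predefined criterion.
large_orders = [o for o in orders if o["amount"] >= 50.0]

# Joining: combine each order row with its customer's attributes.
joined = [{**o, **customers[o["cust_id"]]} for o in orders]

# Aggregation: summarize detail rows to one total per region.
totals = defaultdict(float)
for row in joined:
    totals[row["region"]] += row["amount"]
```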

58
Q

What is basic representation in single-field transformation?

A

Translates data from old form to new form

59
Q

What is algorithmic transformation in single-field transformation?

A

Uses a formula or logical expression

60
Q

What is table lookup in single-field transformation?

A

Uses a separate table keyed by source record code
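
The algorithmic and table lookup forms of single-field transformation can be contrasted in a short sketch. The temperature conversion and the state code table are illustrative choices, not examples taken from the deck.

```python
# Algorithmic: a formula converts the source value (Celsius to Fahrenheit here).
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# Table lookup: a separate table keyed by the source record code.
# The codes and names are hypothetical.
STATE_NAMES = {"OH": "Ohio", "CA": "California"}

def lookup_state(code):
    """Return the target value for a source code, or a marker if unknown."""
    return STATE_NAMES.get(code, "UNKNOWN")
```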

61
Q

What is multi-field transformation?

A

Converting many source fields to one target field

or

Converting one source to many targets
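
Both directions can be sketched briefly. The name format and the "BRAND-ITEM" product code layout are assumptions made for the example.

```python
def combine_name(first, last):
    """Many-to-one: two source fields become one target field."""
    return f"{last}, {first}"

def split_product_code(code):
    """One-to-many: one source field becomes two target fields.

    Assumes a hypothetical 'BRAND-ITEM' code format.
    """
    brand, item = code.split("-", 1)
    return {"brand": brand, "item": item}

full_name = combine_name("Ann", "Lee")
parts = split_product_code("ACME-1042")
```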