Chapter 10 Flashcards

1
Q

What is data governance?

A
  • High-level organizational groups and processes overseeing data stewardship across the organization
2
Q

What is a data steward?

A

A person responsible for ensuring that organizational applications properly support the organization’s data quality goals

3
Q

What are the requirements for data governance to be successful?

A
  • Sponsorship from both senior management and business units
  • A data steward manager to support, train, and coordinate data stewards
  • Data stewards for different business units, subjects, and/or source systems
  • A governance committee to provide data management guidelines and standards
4
Q

Why is data quality important?

A

If the data are bad, the business fails. Period.

  • GIGO - Garbage in, garbage out
  • The Sarbanes-Oxley Act (SOX) legally mandates standards for data and metadata quality
5
Q

What is the purpose of data quality?

A
  • Minimize IT project risk
  • Make timely business decisions
  • Ensure regulatory compliance
  • Expand customer base
6
Q

What are the characteristics of quality data?

A
  • Uniqueness
  • Accuracy
  • Consistency
  • Completeness
  • Timeliness
  • Currency
  • Conformance
  • Referential Integrity
7
Q

What are some causes of poor data quality?

A
  • External data sources (Lack of control over data quality)
  • Redundant data storage and inconsistent metadata (Proliferation of databases with uncontrolled redundancy and metadata)
  • Data entry (Poor data capture controls)
  • Lack of organizational commitment (Not recognizing poor data quality as an organizational issue)
8
Q

What are some steps that can be taken to improve data quality?

A
  • Get business buy-in
  • Perform data quality audit
  • Establish data stewardship program
  • Improve data capture processes
  • Apply modern data management principles and technology
  • Apply total quality management (TQM) practices
9
Q

How can you create business buy-in?

A
  • Executive sponsorship
  • Building a business case
  • Prove a return on investment (ROI)
  • Avoidance of cost
  • Avoidance of opportunity loss
10
Q

What do you do in a data quality audit?

A
  • Statistically profile all data files
  • Document the set of values for all fields
  • Analyze data patterns (distribution, outliers, frequencies)
  • Verify whether controls and business rules are enforced
  • Use specialized data profiling tools
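
Note: a minimal sketch of the profiling idea in plain Python, assuming a handful of hypothetical customer records and an invented age rule; a real audit would normally rely on dedicated profiling tools.

```python
from collections import Counter

# Hypothetical customer records; a real audit would profile actual data files.
rows = [
    {"id": 1, "state": "WI", "age": 34},
    {"id": 2, "state": "WI", "age": 210},   # violates the age business rule
    {"id": 3, "state": "mn", "age": None},  # inconsistent case, missing value
    {"id": 4, "state": "MN", "age": 41},
]

# Profile each field: missing values, distinct values, and value frequencies.
for field in ("state", "age"):
    values = [r[field] for r in rows]
    print(field,
          "missing:", sum(v is None for v in values),
          "distinct:", len(set(values)),
          "frequencies:", Counter(values))

# Verify a business rule: age must be between 0 and 120 when present.
violations = [r["id"] for r in rows if r["age"] is not None and not 0 <= r["age"] <= 120]
print("age rule violations (ids):", violations)
```
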
11
Q

What are the roles of a data steward?

A
  • Oversight of data stewardship program
  • Manage data subject area
  • Oversee data definitions
  • Oversee production of data
  • Oversee use of data
12
Q

How can you improve data capture processes?

A
  • Automate data entry as much as possible
  • Where manual data entry is required, have users select from preset options
  • Use trained operators when possible
  • Follow good user interface design principles
  • Validate entered data immediately (see the sketch below)
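
Note: a small sketch of "preset options" plus immediate validation; the field names (customer_id, state) and rules are made up for the example.

```python
VALID_STATES = {"MN", "WI", "IA"}  # preset options instead of free-form text

def validate_entry(entry):
    """Return a list of validation errors; an empty list means the entry is acceptable."""
    errors = []
    if entry.get("state") not in VALID_STATES:
        errors.append("state must be one of " + ", ".join(sorted(VALID_STATES)))
    if not str(entry.get("customer_id", "")).isdigit():
        errors.append("customer_id must be numeric")
    return errors

# Validate immediately, before the record is ever stored.
print(validate_entry({"customer_id": "1001", "state": "WI"}))   # []
print(validate_entry({"customer_id": "A7", "state": "Texas"}))  # two errors
```
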
13
Q

What are some software tools for analyzing and correcting data quality problems?

A
  • Pattern matching
  • Fuzzy logic
  • Expert systems
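
Note: a rough sketch of the first two tool categories using only the standard library; the phone-number pattern and the name-similarity check are illustrative assumptions, not any specific product's behavior.

```python
import re
from difflib import SequenceMatcher

# Pattern matching: recognize US-style phone numbers regardless of punctuation.
phone = re.compile(r"^\(?(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})$")
print(phone.match("(612) 555-1234").groups())   # ('612', '555', '1234')
print(bool(phone.match("not a phone")))         # False

# A small stand-in for fuzzy matching: score string similarity to
# catch near-duplicate names such as misspellings.
def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(round(similarity("Jon Smith", "John Smith"), 2))   # high score -> likely duplicates
print(round(similarity("Jon Smith", "Mary Jones"), 2))   # low score
```
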
14
Q

Besides software tools, what other modern tools can be applied to data management?

A
  • Sound data modeling and database design
15
Q

What does TQM stand for?

A

Total Quality Management

16
Q

What are the TQM Principles?

A
  • Defect prevention
  • Continuous Improvement
  • Use of enterprise data standards
17
Q

What are the components of a balanced focus?

A
  • Customer
  • Product/Service
  • Strong foundation of measurement
18
Q

What is master data management (MDM)?

A

Disciplines, technologies, and methods to ensure the currency, meaning, and quality of reference data within and across various subject areas

19
Q

What are the three main architectures of MDM?

A
  • Identity registry
  • Integration hub
  • Persistent
20
Q

What is Identity registry in MDM?

A

Master data remains in the source systems; a registry provides applications with the location of that data

21
Q

What is an integration hub in MDM?

A

Data changes are broadcast through a central service to subscribing databases

22
Q

What is persistent in MDM?

A

Central “golden record” maintained; all applications have access. Requires applications to push data. Prone to data duplication.

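
Note: a toy sketch of the persistent ("golden record") idea; the customer fields and the merge rule (non-empty incoming values overwrite stored ones) are assumptions for illustration, not a prescribed MDM design.

```python
# Toy "golden record" store: source systems push their view of a customer,
# and the hub keeps one consolidated record per customer id.
golden = {}

def push(customer_id, source_fields):
    record = golden.setdefault(customer_id, {})
    # Merge rule (an assumption for this sketch): a non-empty incoming value
    # overwrites whatever is currently stored.
    for field, value in source_fields.items():
        if value not in (None, ""):
            record[field] = value

push("C-1", {"name": "Pat Lee", "email": ""})
push("C-1", {"email": "pat@example.com", "phone": "555-0100"})
print(golden["C-1"])  # {'name': 'Pat Lee', 'email': 'pat@example.com', 'phone': '555-0100'}
```
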
23
Q

What does data integration do?

A

Creates a unified view of business data

Other possibilities:

  • Application integration
  • Business process integration
  • User interaction integration
24
Q

In data integration, what does any approach require?

A

Change data capture (CDC)

25
Q

What does changed data capture do?

A

Indicates which data have changed since the previous data integration activity
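
Note: a minimal sketch of changed data capture, assuming the source rows carry a last_modified timestamp that can be compared with the time of the previous integration run.

```python
from datetime import datetime

# Hypothetical source rows with a last_modified timestamp column.
source_rows = [
    {"id": 1, "last_modified": datetime(2024, 1, 10)},
    {"id": 2, "last_modified": datetime(2024, 3, 5)},
    {"id": 3, "last_modified": datetime(2024, 3, 20)},
]

last_run = datetime(2024, 3, 1)  # when the previous integration ran

# Changed data capture: keep only the rows modified since the previous run.
changed = [row for row in source_rows if row["last_modified"] > last_run]
print([row["id"] for row in changed])  # [2, 3]
```
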
26
Q

What are three techniques for data integration?

A
  • Consolidation (ETL)
  • Data federation (EII)
  • Data propagation (EAI and EDR)
27
Q

What is consolidation (ETL) in data integration?

A

Consolidating all data into a centralized database (like a data warehouse)

28
Q

What is data federation (EII) in data integration?

A

Provides a virtual view of data without actually creating one centralized database

29
Q

What is data propagation (EAI and EDR) in data integration?

A

Duplicates data across databases, with near-real-time delay

30

Comparison of the consolidation, federation, and propagation forms of data integration (diagram)

31
Q

In the reconciled data layer, what are the characteristics of typical operational data?

A
  • Transient - not historical
  • Not normalized (perhaps due to denormalization for performance)
  • Restricted in scope - not comprehensive
  • Sometimes poor quality - inconsistencies and errors
32
Q

After ETL, what characteristics should data have?

A
  • Detailed - not summarized yet
  • Historical - periodic
  • Normalized - third normal form or higher
  • Comprehensive - enterprise-wide perspective
  • Timely - current enough to assist decision-making
  • Quality controlled - accurate with full integrity
33
Q

What does ETL stand for?

A

Extract, Transform, Load

34
Q

What is the ETL process?

A
  • Capture/Extract
  • Scrub or data cleansing
  • Transform
  • Load and Index
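
Note: a skeleton of the four ETL steps over in-memory records; the field names and the cleansing rule are invented for the example.

```python
# Skeleton of the four ETL steps for a list of dictionaries.

def extract(source):
    return list(source)                                # capture/extract: take a snapshot

def scrub(rows):
    return [r for r in rows if r.get("customer_id")]   # drop rows missing the key

def transform(rows):
    return [{"customer_id": r["customer_id"],
             "state": r["state"].strip().upper()} for r in rows]

def load(rows, warehouse, index):
    for r in rows:
        warehouse.append(r)
        index[r["customer_id"]] = r                    # "index" step: key-based lookup

warehouse, index = [], {}
source = [{"customer_id": 1, "state": " wi "}, {"customer_id": None, "state": "MN"}]
load(transform(scrub(extract(source))), warehouse, index)
print(warehouse, index[1])
```
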
35
Q

When is the ETL process done?

A
  • During the initial load of the Enterprise Data Warehouse (EDW)
  • During subsequent periodic updates to the EDW
36
Q

When is mapping and metadata management completed?

A

It is a design step performed prior to the ETL process

37
Q

What is mapping?

A

Required data are mapped to data sources (graphical or matrix representations)

38
Q

What information should mapping provide?

A
  • Explanations of the reformatting, transformations, and cleansing actions to be done
  • Process flow involving tasks and jobs
39
Q

What makes good metadata for mapping?

A
  • Identifies data sources
  • Recognizes the same data in different systems
  • Represents process flow steps
40
Q

What is a static extract?

A

Capturing a snapshot of the source data at a point in time

41
Q

What is an incremental extract?

A

Capturing changes that have occurred since the last static extract

42
Q

What is capture/extract?

A

Obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
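
Note: a small sketch contrasting a static extract and an incremental extract; the "active" subset rule and the updated_at column are assumptions for illustration.

```python
from datetime import datetime

def static_extract(rows):
    """Snapshot of a chosen subset of the source data at a point in time."""
    return [r for r in rows if r["active"]]          # subset rule is an assumption

def incremental_extract(rows, since):
    """Only the changes that occurred after the last static extract."""
    return [r for r in rows if r["updated_at"] > since]

rows = [
    {"id": 1, "active": True,  "updated_at": datetime(2024, 1, 5)},
    {"id": 2, "active": True,  "updated_at": datetime(2024, 4, 2)},
    {"id": 3, "active": False, "updated_at": datetime(2024, 4, 9)},
]
print(len(static_extract(rows)))                                            # 2
print([r["id"] for r in incremental_extract(rows, datetime(2024, 3, 1))])   # [2, 3]
```
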
43

Visual of data reconciliation in ETL (diagram)

44
Q

What is scrub/cleanse?

A

Uses pattern recognition and AI techniques to upgrade data quality

45
Q

What are you looking for when fixing errors?

A

Misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies

46
Q

What are some other things to look for in scrub/cleanse?

A

Decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
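
Note: a tiny cleansing pass showing reformatting, conversion, duplicate detection, and missing-data logging; the fields and rules are illustrative assumptions.

```python
raw = [
    {"name": "ACME corp ", "zip": "55401", "phone": "612.555.1234"},
    {"name": "Acme Corp",  "zip": "55401", "phone": "(612) 555-1234"},
    {"name": "Beta LLC",   "zip": None,    "phone": "651-555-9876"},
]

def clean(row):
    return {
        "name": row["name"].strip().title(),                           # reformatting
        "zip": row["zip"],
        "phone": "".join(ch for ch in row["phone"] if ch.isdigit()),   # conversion
    }

cleaned, seen, errors = [], set(), []
for row in map(clean, raw):
    key = (row["name"], row["phone"])
    if key in seen:
        errors.append("duplicate record: " + str(key))        # error detection/logging
        continue
    if row["zip"] is None:
        errors.append("missing zip for " + row["name"])       # locating missing data
    seen.add(key)
    cleaned.append(row)

print(len(cleaned), errors)
```
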
47
Q

What does transform mean?

A

Convert data from the format of the operational system to the format of the data warehouse

48
Q

What transformations occur at the record level?

A
  • Selection - data partitioning
  • Joining - data combining
  • Aggregation - data summarization
49
Q

What transformations occur at the field level?

A
  • Single-field - from one field to one field
  • Multi-field - from many fields to one, or one field to many
50
Q

What is load/index?

A

Place transformed data into the warehouse and create indexes

51
Q

What is refresh mode?

A

Bulk rewriting of the target data at periodic intervals

52
Q

What is update mode?

A

Only changes in the source data are written to the data warehouse
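
Note: a sketch of refresh mode versus update mode against a warehouse table held as a dictionary keyed by id; the row shape is an assumption for the example.

```python
warehouse = {1: {"id": 1, "amount": 100}, 2: {"id": 2, "amount": 200}}

def refresh(warehouse, source_rows):
    """Refresh mode: discard the target and bulk-rewrite everything."""
    warehouse.clear()
    warehouse.update({r["id"]: r for r in source_rows})

def update(warehouse, changed_rows):
    """Update mode: write only the rows that changed in the source."""
    for r in changed_rows:
        warehouse[r["id"]] = r

update(warehouse, [{"id": 2, "amount": 250}, {"id": 3, "amount": 50}])
print(sorted(warehouse))          # [1, 2, 3]
refresh(warehouse, [{"id": 9, "amount": 75}])
print(sorted(warehouse))          # [9]
```
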
53
Q

What are the four record-level transformation functions?

A
  • Selection
  • Joining
  • Normalization
  • Aggregation
54
Q

What is the process of partitioning data according to predefined criteria?

A

Selection

55
Q

What is the process of combining data from various sources into a single table or view?

A

Joining

56
Q

What is the process of decomposing relations with anomalies to produce smaller, well-structured relations?

A

Normalization

57
Q

What is the process of transforming data from detailed to summary level?

A

Aggregation
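
Note: a sketch of selection, joining, and aggregation over in-memory rows (normalization is a design activity and is not shown); the order and customer data are invented for the example.

```python
from collections import defaultdict

orders = [
    {"order_id": 1, "cust_id": 10, "amount": 40.0},
    {"order_id": 2, "cust_id": 10, "amount": 60.0},
    {"order_id": 3, "cust_id": 11, "amount": 15.0},
]
customers = {10: "Acme Corp", 11: "Beta LLC"}

# Selection: partition the data according to a predefined criterion.
large_orders = [o for o in orders if o["amount"] >= 50]

# Joining: combine data from two sources into a single view.
joined = [{**o, "customer": customers[o["cust_id"]]} for o in orders]

# Aggregation: move from detailed to summary level (total amount per customer).
totals = defaultdict(float)
for o in orders:
    totals[o["cust_id"]] += o["amount"]

print(len(large_orders), joined[0]["customer"], dict(totals))
```
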
58
Q

What is basic representation in single-field transformation?

A

Translates data from the old form to the new form

59
Q

What is algorithmic transformation in single-field transformation?

A

Uses a formula or logical expression

60
Q

What is table lookup in single-field transformation?

A

Uses a separate table keyed by the source record code

61
Q

What is multi-field transformation?

A

Converting many source fields to one target field, or one source field to many target fields
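
Note: a sketch of a single-field table lookup, a single-field algorithmic transformation, and a multi-field transformation; the lookup table and name format are assumptions for the example.

```python
# Single-field table lookup: a separate table keyed by the source code.
STATE_LOOKUP = {"MN": "Minnesota", "WI": "Wisconsin"}

# Single-field algorithmic transformation: a formula applied to one field.
def fahrenheit_to_celsius(f):
    return round((f - 32) * 5 / 9, 1)

# Multi-field transformation: many source fields to one target (and back).
def full_name(first, last):
    return last + ", " + first                # many -> one

def split_name(name):
    last, first = name.split(", ")            # one -> many
    return {"first": first, "last": last}

print(STATE_LOOKUP["MN"])                     # Minnesota
print(fahrenheit_to_celsius(72))              # 22.2
print(split_name(full_name("Pat", "Lee")))    # {'first': 'Pat', 'last': 'Lee'}
```
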