Chapter 9 Flashcards

1
Q

What is a data warehouse?

A

A subject-oriented, integrated, time-variant, non-updateable collection of data used in support of management decision-making processes.

(Subject-oriented = customers, patients, etc)

(Integrated = consistent naming conventions formats, encoding sturctures; from multiple data sources)

(Time-variant = can study trends and changes)

(Non-updatable = read-only, periodically refreshed)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a data mart?

A

A data warehouse that is limited in scope

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Why is there a need for data warehousing?

A
  • For an integrated, company-wide view of high-quality information (from disparate databases)
  • For seperation of operational and informational systems and data (for improved performance)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are some issues with company-wide view?

A
  • Inconsistent key structures
  • Synonyms
  • Free-form vs. structured fields
  • Inconsistent data values
  • Missing data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What organizational trends drive data warehouses?

A
  • There is no single system of records
  • They have multiple systems that are not synchronized
  • They have an organizational need to analyze activities in a balanced way
  • Customer relationship management
  • Supplier relationship management
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is an operational system?

A

A system that is used to run a business in real time, based on current data; also called a system of record

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is an informational system?

A

A system designed to support decision making based on historical point-in-time and prediction data for complex queries or data-mining applications

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Comparison of operational and inofrmational systems

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the data warehouse architectures?

A
  • Independent data mart
  • Dependent data mart and operational data store
  • Logical data mart with real-time data warehouse
  • three-layer architecture

(all involve some form of extract, transform and load (ETL)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Independent Data Mart visual

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the limitations of an independent data mart?

A
  • Seperate ETL process for each data mart -> redundant data and processing
  • Inconsistency between data marts
  • Difficult to drill down for related facts between data marts
  • Excessive scaling costs as more applications are built
  • High cost for obtaining consistency between marts
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Dependent data mart visual

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

logical data mart visual

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Differences in Data warehouse and Data mart

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Visual of three-layer data architecture for a data warehouse

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is an event?

A

a database action (creat/update/delete) that results from a transaction

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Status vs event visual

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What happens with transient data?

A

Changes to existing records are written over previous records, thus destroying the previous data content.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What happens with periodic data?

A

They are never physically altered or delected once they have been added to the store.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Besides transient data coverting to periodic data what other six changes must happen when going from a data mart to data warehouse?

A
  • New descriptive attributes
  • New business activity attributes
  • New classes of descriptive attributes
  • Descriptive attributes become more refined
  • Descriptive data are related to one another
  • New source of data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is derived data?

A

Data that have been selected, formatted, and aggregate for end-user decision support applications

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What are the objectives of derived data?

A
  • Ease of use for decision support applications
  • Fast response to predefined user queries
  • Customized data for particular target audiences
  • Ad-hoc query support
  • Data mining capabilities
23
Q

What are the characteristics of derived data?

A
  • Detailed (mostly periodic) data
  • Aggregate (for summary)
  • Distributed (to departmental servers)
24
Q

What is the most common data model?

A

Dimensional model (usually implemented as a star schema)

25
Q

Star schema visual

A
26
Q

Start schema example

A
27
Q

What does surrogate mean?

A

Non-intelligent and non-business related

28
Q

Why should dimension table keys be surrogate?

A
  • Business keys may change over time
  • Helps keep track of nonkey attribute values for a given production key
  • Surrogate keys are simpler and shorter
  • Surrogate keys can be same length and format for all keys
29
Q

What are the fact table granularities?

A

Transactional grain - finest level

Aggregrate grain - more summarized

30
Q

What are the advatantages of a finer grain?

A
  • Better market basket analysis capability
  • More dimenstion tables, more rows in a fact table

( In web-based commerce, finest graunlarity is a click)

31
Q

What are the durations of a database?

A
  • Natural duration - 13 months or 5 quarters
  • Financial institutions may need longer durations
  • Older data is more difficult to source and cleanse
32
Q

What determines the size of a fact table?

A
  • It depends on the number of dimensions and the grain of the fact table
  • Number of rows = product of number of possible values for each dimension associated witht he fact table
33
Q

Example of how to determine the size of a fact table

A
34
Q

Benefits of multiple facts tables

A
  • Can improve performance
  • Often used to store facts for different combinations of dimensions
  • Conformed dimensions
35
Q

What is a factless facts table?

A
  • It has no nonkey data, but foreign keys for associated dimensions
  • Used for; Tracking events, Inventory coverage
36
Q

Conformed dimensions example

A
37
Q

Example of a factless fact table

A
38
Q

What are multivalued Dimensions?

A
  • Facts qualified by a set of values for the same business subject
  • Normalization involves creating a table for an associative entity between dimensions
39
Q

What are hierarchies?

A
  • Sometimes a dimension forms a natural, fixed depth hierarchy
  • Design options
40
Q

What are design options for hierarchies?

A
  • Include all information for each level in a single denormalized table
  • Normalize the dimension into a nested set of 1:M table relationships
41
Q

What is a Helper table?

A

An associative entity that implements a M:N relationship between deminsion and fact.

42
Q

Multivalued Dimension example

A
43
Q

What do dimension hierarchies provide?

A

Levels of aggregation for users wanting summary information in a data warehouse.

44
Q

Example of fixed product hierarchy

A
45
Q

How are Kimball’s approaches to maintaining knowledge of the past in Slowly Changing Dimensions (SCD)?

A
  • Type 1: Just replace old data with new (lose historical data)
  • Type 2: For each changing attribute, create a current value field and several old-valued fields (multivalued)
  • Type 3: Create a new dimension table row each time the dimension object changes, with all dimension characteristics at the time of change. Most common approach
46
Q

Type 2 SCD example

A
47
Q

What should be used for rapidly changing attributes (hot attributes) instead of Type 2 SCD which creates too many rows and too much redundant data?

A

Dimension segmentation

48
Q

What are 10 essential rules for dimensional modeling?

A
  • Use atomic facts - Honor hierarchies
  • Create single-process fact tables - Decode dimension tables
  • Include a date dimension for each fact table - Use surrogate keys
  • Enforce consistent grain - Conform dimensions
  • Disallow null keys in fact tables - Balance requirements with actual data
49
Q

What is the future of data warehousing?

A

Integrations of Big data and Analytics

50
Q

What are some problems with the integration of big data and analytics?

A
  • Big data has huge volume and is often unstructured
51
Q

How is the speed of processing an issue with the integration of big data and analytics with data warehousing?

A
  • Design/purchase storage, database, and networking aspects in tandem
  • Use in-memory databases (RAM instead of disk)
  • Add analytics capabilities closer to the original data sources instead of seperate data warehouses
52
Q

What Cost of Data Storage considerations are there with Data warehousing/big data integration?

A
  • Moving data warehouse to the cloud
  • Columnar databases optimize storage
53
Q
A