Data Warehousing Flashcards

Question 1

Q

Data warehouse:

Answer

A

A subject-oriented, integrated, non-volatile, time-variant data store in support of management’s decisions.

Question 2

Q

Data mart:

Answer

A

A specialised, subject-oriented, integrated, volatile, time-variant data store in support of a specific subset of management’s decisions.

Question 3

Q

ETL process

Answer

A

The process of extracting, transforming and loading data in order to create a data warehouse.
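
As a rough illustration of the three steps, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in warehouse. The source systems, their date formats, the table name and all values are assumptions made up for this example; a real ETL pipeline would be considerably more involved.

```python
import sqlite3

# Hypothetical rows as they might arrive from two OLTP systems (assumed data).
SALES_SYSTEM = [("2024-01-05", "alice ", "19.99"), ("2024-01-06", "BOB", "5.00")]
WEB_SHOP = [("06/01/2024", "Carol", "7.50")]


def extract():
    """Extract: pull raw rows from each source, tagging where they came from."""
    return [("sales", r) for r in SALES_SYSTEM] + [("web", r) for r in WEB_SHOP]


def transform(tagged_rows):
    """Transform: unify date formats, tidy names, convert amounts to integer cents."""
    clean = []
    for source, (date, customer, amount) in tagged_rows:
        if source == "web":  # the web shop is assumed to use DD/MM/YYYY
            day, month, year = date.split("/")
            date = f"{year}-{month}-{day}"
        clean.append((date, customer.strip().title(), round(float(amount) * 100)))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(sale_date TEXT, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT * FROM sales ORDER BY sale_date, customer"
).fetchall()
print(loaded)
```

The transform step is where data from the different sources is made consistent, which is what makes the resulting store "integrated".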

Question 4

Q

Why individual data extracts are popular:

Answer

A

The extracted data is moved out of the way of high-performance processing.
The end user now owns the data.

Question 5

Q

Why is individual extraction bad?

Answer

A

We end up with a spider web of extract processing.

Question 6

Q

Problems with the spider web?

Answer

A

no time basis of data
algorithmic differential
levels of extraction
external data
no common data source

Question 7

Q

No time basis of data:

Answer

A

Data changes over time, so when different departments extract it at different times they get different results.
Any correlation between their results is then coincidental.
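
A small sketch of the effect, in Python. The account names, balances and the two departments are invented for illustration only.

```python
# Hypothetical account balances in the operational system (assumed data).
balances = {"acc-1": 100, "acc-2": 250}


def extract_total(source):
    """Copy the current state, as a departmental extract would, and total it."""
    return sum(dict(source).values())


finance_total = extract_total(balances)    # finance extracts first
balances["acc-1"] -= 40                    # the operational data keeps changing
balances["acc-3"] = 75
marketing_total = extract_total(balances)  # marketing extracts later

print(finance_total, marketing_total)
```

Both departments ran the same calculation on the same source, yet the totals differ purely because of when each extract was taken.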

Question 8

Q

Algorithmic differential:

Answer

A

This is the difference that arises when different departments choose different rows to process and analyse, and then neglect to mention this when presenting their results.
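
To make this concrete, here is a minimal Python sketch in which two departments compute "average orders per customer" from the same records but select different rows. The records and the selection rule are assumptions for illustration.

```python
# Hypothetical customer records shared by both departments (assumed data).
customers = [
    {"name": "A", "orders": 12, "active": True},
    {"name": "B", "orders": 3, "active": False},
    {"name": "C", "orders": 7, "active": True},
    {"name": "D", "orders": 0, "active": False},
]


def average_orders(rows):
    """Average number of orders over whichever rows were selected."""
    return sum(r["orders"] for r in rows) / len(rows)


# Department 1 analyses every customer.
dept1_avg = average_orders(customers)
# Department 2 silently restricts the analysis to active customers.
dept2_avg = average_orders([r for r in customers if r["active"]])

print(dept1_avg, dept2_avg)
```

Neither figure is wrong, but unless the selection rule is stated alongside the result the two numbers cannot be reconciled.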

Question 9

Q

Levels of extraction:

Answer

A

Extracting from an extract magnifies the problems of time basis and algorithmic differentials

Question 10

Q

External data:

Answer

A

Including external data in the mainstream analysis without specifying its source creates knowledge gaps among users, because each of us is using our own storage space.

Question 11

Q

No common source:

Answer

A

There is no synchronisation or sharing of data, so you can’t expect the same results.

Question 12

Q

OLTP:

Answer

A

Online Transaction Processing
Data warehouse = data from a variety of OLTPs kept on a different platform (the DW)

Question 13

Q

Bad data:

Answer

A

Inconsistent (inconsistency can occur when there are multiple versions of the data)
Inaccurate -> misleading
Incomplete -> not useful
Untimely -> irrelevant

Question 14

Q

Good data

Answer

A

The right data for the right person at the right time

Question 15

Q

Impact of Decision Support Systems on OLTP systems

Answer

A

A DSS accesses large volumes of data, which slows down the OLTP system. The DSS can sometimes even lock this data, slowing OLTP further.
A DSS has unpredictable requirements, which makes performance tuning difficult.

Question 16

Q

Subject-oriented vs Process-oriented

Answer

A

Process: all the data you need for a process
Subject: complete and consistent data per subject in the organisation

Question 17

Q

DW scale

Answer

A

Data warehouses are large in scale. They span the organisation. They store historical data (5-10 years old). They integrate data from independent silos in the organisation. Their data is continually in sync with data sources

Question 18

Q

Data granularity:

Answer

A

How summarised or raw the data is.
Low granularity -> summaries
High granularity -> all details
High granularity -> more flexible, and can be used for further analysis in future
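
The flexibility point can be sketched in Python as follows. The sales rows and the helper name total_by are assumptions for illustration; "detailed" here corresponds to what this card calls high granularity.

```python
from collections import defaultdict

# Detailed data: one row per sale as (day, region, amount). Hypothetical values.
sales = [
    ("2024-01-05", "north", 20),
    ("2024-01-05", "south", 35),
    ("2024-01-06", "north", 15),
]


def total_by(rows, key_index):
    """Summarise the amounts by the chosen column of each row."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


daily = total_by(sales, 0)      # one possible summary: totals per day
by_region = total_by(sales, 1)  # a different summary from the same detail

print(daily)
print(by_region)
```

If only the daily summary had been stored, the regional totals could not be recovered from it, whereas the detailed rows support both summaries and any others needed later.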

Question 19

Q

Star schema

Answer

A

Fact table at the centre (it stores transactional data) -> facts: measurable events.
Dimension tables connected to the fact table -> dimensions: descriptive context which we summarise by.
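
A minimal sketch of this layout, using Python's standard-library sqlite3 module with an in-memory database. The table names, columns and rows (fact_sales, dim_product, dim_date) are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
-- Fact table: one row per measurable event, pointing at each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    quantity INTEGER,
    revenue_cents INTEGER
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Mug', 'Kitchen');
INSERT INTO dim_date VALUES (1, '2024-01-05', '2024-01'), (2, '2024-02-01', '2024-02');
INSERT INTO fact_sales VALUES (1, 1, 10, 1500), (2, 1, 2, 1800), (1, 2, 4, 600);
""")

# Summarise the facts by a dimension attribute with a single join.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue_cents)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

Each dimension is one join away from the fact table, which is what gives the schema its star shape.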

Question 20

Q

Snowflake schema

Answer

A

Similar to a star schema, but the dimension tables are normalised (split into smaller related tables).
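
A minimal sketch of a snowflaked dimension, using Python's standard-library sqlite3 module with an in-memory database. The table names, columns and rows are invented for this example: the product dimension no longer holds the category name itself but points to a separate dim_category table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category attribute is split out of the product dimension.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue_cents INTEGER
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Kitchen');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Mug', 2), (3, 'Pencil', 1);
INSERT INTO fact_sales VALUES (1, 1500), (2, 1800), (3, 400);
""")

# Reaching the category name now takes two joins instead of one.
snowflake_revenue = conn.execute("""
    SELECT c.name, SUM(f.revenue_cents)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(snowflake_revenue)
```

Each category name is stored once rather than repeated per product, at the cost of an extra join when summarising.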