Data Warehousing Flashcards

1
Q

Data warehouse:

A

A subject-oriented, integrated, non- volatile, time-variant data store in support of management’s decisions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Data mart:

A

A specialised, subject-oriented, integrated, volatile time-variant data store in support of a specific subset of management’s decisions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

ETL process

A

The process of extraction, transformation and loading of the data in creating a data warehouse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Why individual data extracts are popular:

A

Data is out of the way of high performance processing
End user now owns the data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Why is individual extraction bad?

A

We end up with a spider web of extract processing.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Problems with the spider web?

A
  • no time basis of data
  • algorithmic differential
  • levels of extraction
  • external data
  • no commons data source
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

No time basis of data:

A

Data changes so when different departments extract data at different times they get different results.
Any correlation is then coincidental

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Algorithmic differential:

A

This is the difference that occurs when different departments choose different rows to process and analyse. They neglect to mention this when presenting results

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Levels of extraction:

A

Extracting from an extract magnifies the problems of time basis and algorithmic differentials

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

External data:

A

Inclusion of external data in the mainstream analysis and not specifying, creates knowledge gaps among users. Because we’re each using our own storage spaces

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

No common source:

A

No synchronisation or sharing of data so you can’t expect the same results

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

OLTP:

A

Online Transaction Processing
Data warehouse = data from a variety of OLTPs kept on a different platform (the DW)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Bad data:

A

Inconsistent (when there are multiple versions of the data inconsistency can occur)
Inaccurate -> misleading
Incomplete -> not useful
Untimely -> irrelevant

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Good data

A

The right data for the right person at the right time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Impact of Decision Support Systems on OLTP systems

A

DSS accesses large volumes of data and slows down OLTP. This data can sometimes even be locked by DSS and slow OLTP more.
DSS has unpredictable requirements that make performance tuning difficult.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Subject-oriented vs Process-oriented

A

Process: all the data you need for a process
Subject: complete and consistent data per subject in the organisation

17
Q

DW scale

A

Data warehouses are large in scale. They span the organisation. They store historical data (5-10 years old). They integrate data from independent silos in the organisation. Their data is continually in sync with data sources

18
Q

Data granularity:

A

How summarised or raw the data is. Low granularity-> summaries
High granularity -> all details
High -> more flexible and can be used for further analysis in future

19
Q

Star schema

A
  • fact table at the centre (it stores transactional data) -> facts: measurable events.
  • dimension table connected to the fact table -> dimension: descriptive context which we summarise by.
20
Q

Snowflake schema

A
  • similar to star but dimension tables are normalized (split into smaller related tables)