Data Warehousing Flashcards

1
Q

Data warehouse

A

A collection of data that helps analysts to make decisions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

True or false: Data warehouses store operational databases

A

False! Data warehouses store historical data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Features of a data warehouse

A

Subject-oriented - provides info around a subject rather than operations
Integrated - data from many different sources
Time-variant - data is identified with a particular period (e.g. last 12 months)
Non-volatile – data is not erased when new data is added

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Information processing

A

Processing data via queries or statistical analysis

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Analytical processing

A

Processing data via Online Analytical Processing (OLAP) tools

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Data mining

A

Finding hidden patterns and associations in data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Features of an enterprise data warehouse

A

o An EDW
o An operational data store
o Data marts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Operational data store

A

A hybrid data warehouse containing integrated information

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Data extraction

A

Gathering data from a variety of sources

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Data cleaning

A

Finding and correcting errors

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Data transformation

A

Converting data to warehouse format

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Data loading

A

Sorting/summarising/consolidating/checking integrity and building patterns accordingly

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Refreshing

A

Updating data sources to the warehouse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Dimensional modelling

A

Different individual models (e.g. separate models for sales and inventory)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

True or false: Dimensional modelling leads to fewer tables

A

True! Data is grouped together and includes redundancies.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Fact table

A

Measurement of business (e.g. sales)

17
Q

True or false: Fact tables are usually small

A

False! Fact tables usually range from 1 to 5 TB.

18
Q

Transaction fact table

A

Information related to events (e.g. product sales)

19
Q

Snapshot fact table

A

Information from specific moments in time (e.g. year-end accounts)

20
Q

Accumulating snapshot

A

A running tally (e.g. year to year sales figures)

21
Q

Dimension table

A

Describes dimensions of the business (e.g. product, customer)

22
Q

True or false: Dimension tables are larger than fact tables

A

False! They are usually smaller than fact tables.

23
Q

Types of dimension table

A
o	Time
o	Geography (address/location)
o	Product
o	Customer
o	Range (range of values for time, price, etc.)
24
Q

Star schema

A

Dimension tables connected to a central fact table

25
Q

True or false: A star schema is normalised

A

Both! The fact table is normalised but the dimension tables aren’t, resulting in faster read queries and analysis.

26
Q

How does a star schema work?

A

Fact data is pulled from dimension tables, duplicated and stored in the fact table

27
Q

Advantages of star schema

A

o Simpler queries – dimension tables are treated as one large table
o Easier business insights
o Better query performance
o No bottlenecks from normalised schemas
o Faster queries
o Read only
o Can be used to build OLAP cubes

28
Q

Disadvantages of star schema

A

o Less accurate data due to normalisation
o Less capable of handling complex queries
o No many-to-many relationships

29
Q

Snowflake schema

A

Like a star schema, but the dimension tables connect to other dimension tables

30
Q

True or false: A snowflake schema is normalised

A

True! Its purpose is to normalise data in a star schema.

31
Q

Advantages of snowflake schema

A

o No integrity issues
o Faster write queries
o Uses less space than star schema

32
Q

Why are read queries slower in a snowflake schema?

A

The DBMS needs to search through multiple tables

33
Q

Roll-up

A

Removes dimensions in a data cube by:
o Climbing up a hierarchy for a dimension
o Dimension reduction (e.g. orders grouped by city rather than country)

34
Q

Drill-down

A

Adds dimensions to a data cube to create more detailed data

35
Q

Slice

A

Selects a dimension and creates a sub-cube

36
Q

Dice

A

Slicing multiple dimensions

37
Q

Pivot/rotation

A

Swapping data axes to provide an alternative presentation of data