Data Warehousing Flashcards
Data warehouse:
A subject-oriented, integrated, non- volatile, time-variant data store in support of management’s decisions
Data mart:
A specialised, subject-oriented, integrated, volatile time-variant data store in support of a specific subset of management’s decisions.
ETL process
The process of extraction, transformation and loading of the data in creating a data warehouse
Why individual data extracts are popular:
Data is out of the way of high performance processing
End user now owns the data
Why is individual extraction bad?
We end up with a spider web of extract processing.
Problems with the spider web?
- no time basis of data
- algorithmic differential
- levels of extraction
- external data
- no commons data source
No time basis of data:
Data changes so when different departments extract data at different times they get different results.
Any correlation is then coincidental
Algorithmic differential:
This is the difference that occurs when different departments choose different rows to process and analyse. They neglect to mention this when presenting results
Levels of extraction:
Extracting from an extract magnifies the problems of time basis and algorithmic differentials
External data:
Inclusion of external data in the mainstream analysis and not specifying, creates knowledge gaps among users. Because we’re each using our own storage spaces
No common source:
No synchronisation or sharing of data so you can’t expect the same results
OLTP:
Online Transaction Processing
Data warehouse = data from a variety of OLTPs kept on a different platform (the DW)
Bad data:
Inconsistent (when there are multiple versions of the data inconsistency can occur)
Inaccurate -> misleading
Incomplete -> not useful
Untimely -> irrelevant
Good data
The right data for the right person at the right time
Impact of Decision Support Systems on OLTP systems
DSS accesses large volumes of data and slows down OLTP. This data can sometimes even be locked by DSS and slow OLTP more.
DSS has unpredictable requirements that make performance tuning difficult.