Exam 1 Flashcards
Four major components of BI
Data warehouse
Business analytics
Business performance management (BPM)
User interface
Descriptive Analytics
Answers “What happened?”
Uses historical data
Predictive Analytics
Determine what is likely to happen in the future
Prescriptive Analytics
“Prescribe” a solution based on data
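As a rough illustration of how the three analytics types differ, the sketch below applies each one to the same toy data. The sales figures, the naive trend forecast, and the reorder rule are all assumptions made up for this example, not methods taken from the course material.

```python
from statistics import mean

# Hypothetical monthly unit sales (illustrative data only).
monthly_sales = [100, 110, 120, 130]

# Descriptive: summarize what already happened.
average_sales = mean(monthly_sales)

# Predictive: estimate what is likely to happen next.
# Assumption: a naive forecast that extends the average month-over-month change.
changes = [later - earlier for earlier, later in zip(monthly_sales, monthly_sales[1:])]
forecast_next_month = monthly_sales[-1] + mean(changes)

# Prescriptive: recommend an action based on the prediction.
# Assumption: a made-up rule that reorders when forecast demand exceeds stock.
units_in_stock = 125
recommendation = "reorder" if forecast_next_month > units_in_stock else "hold"

print(average_sales, forecast_next_month, recommendation)
```

The same data feeds all three steps; only the question being asked changes.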
What is a data warehouse?
o A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
Data Warehouse (formal definition)
o A subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process
Subject Oriented
- Data organized by detailed subject such as sales or customers
Integrated
- Data inconsistencies are removed; data from diverse operational applications is integrated
Time-Variant
- Contains historical data. Time is the one important dimension that all data warehouses must support
Nonvolatile
- Users cannot change or update the data
Data Mart
A departmental smaller-scale “DW” that stores only limited/relevant data focused on a particular subject or department
Dependent Data Mart
Subset that is created directly from a data warehouse
Independent Data Mart
Small data warehouse designed for a strategic business unit or department, but its source is not an EDW.
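A minimal sketch of a dependent data mart, assuming an in-memory SQLite database stands in for the data warehouse. The table names (`dw_sales`, `retail_mart`) and the rows are hypothetical and only illustrate the idea of a subset created directly from the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database standing in for the warehouse

# Hypothetical enterprise-wide table (illustrative names and rows).
conn.execute("CREATE TABLE dw_sales (department TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?)",
    [
        ("marketing", "ads", 500.0),
        ("retail", "shoes", 120.0),
        ("retail", "hats", 40.0),
    ],
)

# Dependent data mart: a subset created directly from the warehouse table.
conn.execute(
    "CREATE TABLE retail_mart AS "
    "SELECT product, amount FROM dw_sales WHERE department = 'retail'"
)

retail_rows = conn.execute(
    "SELECT product, amount FROM retail_mart ORDER BY product"
).fetchall()
print(retail_rows)
```

An independent data mart would instead be loaded straight from operational sources, with no `dw_sales` table behind it.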
Compare DW & DL
The nature of Data
o Data Warehouse – Structured, processed
o Data Lake – Any data in raw/native format
Compare DW & DL
Processing
o Data Warehouse – Schema-on-write (SQL)
o Data Lake – Schema-on-read (NoSQL)
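The difference between schema-on-write and schema-on-read can be sketched as follows. This is a simplified illustration: SQLite plays the warehouse role and a list of raw JSON strings plays the lake role. The table, the field names, and the choice to default a missing total to 0.0 are assumptions made for the example.

```python
import json
import sqlite3

# Schema-on-write: the table structure is fixed before any data is stored,
# so a row that does not fit is rejected at load time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER NOT NULL, total REAL NOT NULL)")
conn.execute("INSERT INTO orders VALUES (?, ?)", (1, 19.99))
try:
    conn.execute("INSERT INTO orders VALUES (?, ?)", (2, None))  # violates NOT NULL
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Schema-on-read: raw records are stored as-is, and structure is applied
# only when the data is read.
raw_lake_records = [
    '{"order_id": 1, "total": 19.99}',
    '{"order_id": 2, "note": "total missing"}',
]
parsed = [json.loads(line) for line in raw_lake_records]
# The reader, not the storage layer, decides how to handle the missing field.
totals = [record.get("total", 0.0) for record in parsed]

print(rejected, totals)
```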
Compare DW & DL
Retrieval Speed
o Data Warehouse – Very Fast
o Data Lake – Slow
Compare DW & DL
Cost
o Data Warehouse – Expensive for large data volumes
o Data Lake – Designed for low-cost storage
Compare DW & DL
Agility
o Data Warehouse – Less agile, fixed configuration
o Data Lake – Highly agile, flexible configuration
Compare DW & DL
Novelty
o Data Warehouse – Not new; mature
o Data Lake – Very new; still maturing
Compare DW & DL
Security
o Data Warehouse – Well-secured
o Data Lake – Not yet well-secured
Compare DW & DL
Users
o Data Warehouse – Business professionals
o Data Lake – Data scientists
What is Extract?
Reading data from one or more sources (e.g., OLTP databases, personal databases, spreadsheets)
What is Transform?
Converting the extracted data into the appropriate form, including cleaning the data
What is Load?
Putting the data in the data warehouse
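The three ETL steps can be sketched end to end as below. This is a minimal example under stated assumptions: the source is a hypothetical spreadsheet export held in a string, the warehouse is an in-memory SQLite database, and the function and table names (`extract`, `transform`, `load`, `sales`) are illustrative rather than part of any specific tool.

```python
import csv
import io
import sqlite3

# Hypothetical source: a spreadsheet export with inconsistent formatting.
source_csv = "customer,amount\n alice ,10.50\nBOB,n/a\nCarol,7\n"

def extract(text):
    """Extract: read rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: standardize names and drop rows whose amount is not numeric."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleaning step: skip unusable records
        cleaned.append((row["customer"].strip().title(), amount))
    return cleaned

def load(conn, rows):
    """Load: put the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source_csv)))
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping each step in its own function mirrors the flashcards: the source can change without touching the cleaning rules, and the cleaning rules can change without touching the load.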
Inmon Model
EDW approach (top-down)
Highly consistent dimensional view of data
Large scale and scope of project
High up-front cost
Long duration; may be inflexible and unresponsive to changing business needs during implementation
Flexible to support organizational changes as a whole
Kimball Model
Data mart approach (bottom-up)
Emphasizes delivering the value of the data warehouse to business users as quickly as possible
Focuses on one business process at a time, which gives a quick return on investment
Lacks the big picture of enterprise data warehousing (e.g., missing or redundant dimensions)
Lower cost
Fairly simple