Information Management and BI 3(b) Flashcards
What are “Data Warehouses” used for?
(1) BI
(2) Dashboard systems
(3) Data mining
(4) Decision Support Systems (DSS)
(5) Executive Information Systems (EIS)
(6) Online Analytical Processing (OLAP) “cubes” (data analysis, “slice and dice” approach)
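The “slice and dice” idea in (6) can be illustrated with a small sketch. This is not a real OLAP engine: the fact rows, the dimension names (region, product, quarter) and the helper functions are all hypothetical, chosen only to show the operations.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales)
FACTS = [
    ("East", "Widget", "Q1", 100),
    ("East", "Gadget", "Q1", 150),
    ("West", "Widget", "Q1", 200),
    ("West", "Widget", "Q2", 250),
    ("East", "Widget", "Q2", 120),
]
DIMS = ("region", "product", "quarter")  # assumed cube dimensions


def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


def dice_cube(facts, **allowed):
    """Dice: keep rows whose dimensions fall inside the allowed value sets."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


def roll_up(facts, dim):
    """Aggregate the sales measure along one dimension."""
    i = DIMS.index(dim)
    totals = defaultdict(int)
    for row in facts:
        totals[row[i]] += row[3]
    return dict(totals)


q1_only = slice_cube(FACTS, "quarter", "Q1")
east_widgets = dice_cube(FACTS, region={"East"}, product={"Widget"})
sales_by_region = roll_up(FACTS, "region")
```

A slice fixes one dimension; a dice restricts several at once; the roll-up then summarizes whatever rows remain.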
What are the primary purposes of a “Data Warehouse”?
- Used to clean, transform, catalog, and make data available for strategic use and analytics
- For management and business professionals
What is the Extract, Transform, Load (ETL) process?
(1) Extract data from various sources outside of DW
(2) Transform data into usable form for DW (data analytics, data mining or other BI purposes)
(3) Load transformed data into DW
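A minimal sketch of the three ETL steps above, using only the Python standard library. The CSV text, the `fact_orders` table and its column names are assumptions made for illustration; an in-memory SQLite database stands in for the DW.

```python
import csv
import io
import sqlite3

# Hypothetical source data, standing in for an export from an operational system.
SOURCE_CSV = """order_id,amount,currency
1,10.50,usd
2,20.00,USD
3,,usd
"""


def extract(text):
    """(1) Extract: read raw rows from a source outside the DW."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """(2) Transform: drop rows with no amount, fix types, standardize codes."""
    out = []
    for r in rows:
        if not r["amount"]:
            continue  # unusable row: amount is missing
        out.append((int(r["order_id"]), float(r["amount"]), r["currency"].upper()))
    return out


def load(conn, rows):
    """(3) Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the DW
load(conn, transform(extract(SOURCE_CSV)))
```

Keeping each step as its own function mirrors the flashcard: the source format can change without touching the load step, and vice versa.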
What is a “Data Mart”?
- A data mart is a subset of a DW or BI-type database, built specifically to help users analyze data
- Purpose is tied to business users, especially managers, who become the “owner”
What are “Data Marts” created for?
- Easy access to frequently needed data
- Collective view of a group of users
- To improve end-user response time
- To offload DW data to separate computer for greater efficiency
- Security (separate authorized data subset)
- Expediency
- Politics
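One way to picture the “separate authorized data subset” reason is a mart defined as a restricted view over a warehouse table. This is only a logical sketch under assumed names (`dw_sales`, `mart_marketing`); a view does not offload data to a separate computer, which a physical data mart would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the DW
conn.executescript("""
CREATE TABLE dw_sales (region TEXT, dept TEXT, amount REAL, cost REAL);
INSERT INTO dw_sales VALUES
  ('East', 'Marketing', 100.0, 60.0),
  ('West', 'Marketing', 200.0, 120.0),
  ('East', 'Finance',   300.0, 250.0);
-- Hypothetical mart: Marketing rows only, and only the columns that group may see.
CREATE VIEW mart_marketing AS
  SELECT region, amount FROM dw_sales WHERE dept = 'Marketing';
""")

mart_rows = conn.execute(
    "SELECT region, amount FROM mart_marketing ORDER BY region"
).fetchall()
```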
What are 4 steps to “Data Preparation”?
CCTR
(1) Data Consolidation
- Identify sources of data
(2) Data Cleaning
- Scrub for missing values, outliers and inconsistencies
(3) Data Transformation
- Takes clean data and transforms to usable form for data mining or BI
(4) Data Reduction
- Reduce volume of data loaded into DW
What are 3 steps to “Data Consolidation”?
CSI
(1) Collect and map data
- Based on thorough understanding of data structure and flows
(2) Select needed records and variables from various sources
(3) Integrate data into target db
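The three consolidation steps can be sketched as follows. The two source systems, their field names and the mapping table are hypothetical; the target db is represented by a plain dictionary keyed on customer ID.

```python
# Hypothetical source extracts that name the same customer key differently.
CRM_ROWS = [
    {"cust_id": 1, "full_name": "Ann Lee", "phone": "555-0100"},
    {"cust_id": 2, "full_name": "Bo Chan", "phone": "555-0101"},
]
BILLING_ROWS = [
    {"customer": 1, "balance": 40.0, "last_invoice": "2024-01-05"},
    {"customer": 2, "balance": 0.0, "last_invoice": "2024-02-11"},
]

# (1) Collect and map: source field -> target field (assumed mapping).
FIELD_MAP = {
    "crm": {"cust_id": "customer_id", "full_name": "name"},
    "billing": {"customer": "customer_id", "balance": "balance"},
}


def select_and_rename(rows, mapping):
    """(2) Select only the mapped fields and rename them to target names."""
    return [{target: row[src] for src, target in mapping.items()} for row in rows]


def integrate(*sources):
    """(3) Integrate: merge records that share a customer_id into one record."""
    target = {}
    for rows in sources:
        for row in rows:
            target.setdefault(row["customer_id"], {}).update(row)
    return target


target_db = integrate(
    select_and_rename(CRM_ROWS, FIELD_MAP["crm"]),
    select_and_rename(BILLING_ROWS, FIELD_MAP["billing"]),
)
```

Fields that are not in the mapping (phone, last_invoice) never reach the target, which is the “select needed variables” step in action.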
What are 3 steps to “Data Cleaning”?
(1) Impute Missing Values
- If values are missing, decide whether to ignore them or enter an imputed value
(2) Reduce Noise in Data
- Correct outliers and errors
(3) Eliminate Inconsistencies
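A rough sketch of the three cleaning steps on a toy record set. The records, the plausible age range and the state alias table are assumptions; median imputation is just one possible choice. Here out-of-range values are treated as missing first, so noise does not distort the imputed value.

```python
from statistics import median

# Hypothetical raw records with a missing value, an outlier and mixed spellings.
RAW = [
    {"state": "ca", "age": 34},
    {"state": "CA ", "age": None},    # missing value
    {"state": "Calif.", "age": 29},   # inconsistent spelling
    {"state": "NY", "age": 410},      # likely keying error (outlier)
    {"state": "ny", "age": 41},
]
STATE_ALIASES = {"CALIF.": "CA"}  # assumed alias table
AGE_RANGE = (0, 120)              # assumed plausible range


def clean(rows):
    lo, hi = AGE_RANGE
    # Only in-range, non-missing ages are used to compute the fill value.
    valid = [r["age"] for r in rows if r["age"] is not None and lo <= r["age"] <= hi]
    fill = median(valid)
    cleaned = []
    for r in rows:
        age = r["age"]
        # (1) Impute missing values and (2) reduce noise: replace with the median.
        if age is None or not (lo <= age <= hi):
            age = fill
        # (3) Eliminate inconsistencies: one canonical code per state.
        state = r["state"].strip().upper()
        state = STATE_ALIASES.get(state, state)
        cleaned.append({"state": state, "age": age})
    return cleaned


CLEANED = clean(RAW)
```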
What are 3 steps to “Data Transformation”?
(1) Normalize Data
- Rescale values so all variables (columns, fields) are treated equally by data analyses
(2) Discretize / Aggregate Data
- Convert numbers to categories (high, medium, low)
(3) Construct New Attributes
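The three transformation steps, sketched on made-up numbers. Min-max scaling is only one of several normalization methods, and the cut points and the debt-to-income attribute are illustrative assumptions.

```python
def min_max(values):
    """(1) Normalize: rescale values to the 0..1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid dividing by zero on constant columns
    return [(v - lo) / (hi - lo) for v in values]


def discretize(x, low_cut=1 / 3, high_cut=2 / 3):
    """(2) Discretize: convert a scaled number into a category."""
    if x < low_cut:
        return "low"
    if x < high_cut:
        return "medium"
    return "high"


# Hypothetical columns for three customers.
incomes = [20000, 50000, 80000]
debts = [5000, 10000, 40000]

scaled = min_max(incomes)
labels = [discretize(s) for s in scaled]

# (3) Construct a new attribute from existing ones: debt-to-income ratio.
debt_to_income = [d / i for d, i in zip(debts, incomes)]
```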
What are 3 steps to “Data Reduction”?
(1) Reduce Number of Variables
(2) Reduce Number of Records
- Too many records decrease speed of using DW
(3) Balance Skewed Data
- Use stratified sampling rather than simple random sampling
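A small sketch of why stratified sampling helps with skewed data. The record layout, the `fraud` flag and the per-stratum quota are hypothetical. With 10 rare cases among 200 records, a simple random sample of 20 would contain about one rare case on average, whereas sampling each stratum separately keeps all of them.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, per_stratum, seed=0):
    """Draw up to per_stratum rows from each distinct value of row[key]."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for members in strata.values():
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical skewed data: 10 fraud records out of 200.
ROWS = [{"id": i, "fraud": i % 20 == 0} for i in range(200)]

balanced = stratified_sample(ROWS, key="fraud", per_stratum=10)
```

The result also has far fewer records than the original, so the same step serves (2) and (3).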