Data Warehouse (2) Flashcards
Part 2
Explain the Data Warehouse characteristics
- supports Decision Support data storage
- includes data extraction and integration filter
- conform to uniform structures & formats to avoid data conflicts
- supports decision making
What is Multidimensional Data Analysis?
- processing of data to view as part of Multidimensional Structure
- data related in different ways
What are the OLAP client/server components?
OLAP GUI
- in end user’s computer
- client side
- main contact between system and end user
- provides interface allowing end user query DW contents
Analytical Processing Logic (APL)
- client side (speed) server side (better admin)
- performs complex transformations required for business data analysis e.g aggregation, MDA
Data Processing Logic (DPL)
- server level
- presents data analysis requests to proper data objects in DW
What is MOLAP
Multidimensional Online Analytical Processing
- has OLAP functionality using MDBMS
- functionality: stores and analyzed Multidimensional data
- has special techniques to store data in matrix-like arrays of n-dimensions
Checklist when selecting either using ROLAP or MOLAP
- implementation price
- supported hard/soft
- hard/soft and DBMS compatibility
- performance
- availability, extent, type of admin tools
- support DB schema
- handle current & projected DB size
- DB architecture
- Resources
- Scalability
- flexibility
- cost of ownership
Explain Multidimensional cubes and purpose of slice and dice
Multidimensional analysis focus on specific slices of the cube. E.g product manager, product dimension.
Intersection of slices create smaller cubes allows dicing of multidimensional cube. Therefore, examining cubes allow precise analysis of variable components
Common performance improvement for Star Schema
Normalized Dimensional Tables
- semantic simplicity
- ease end user navigation through dimensions
- e.g a dimension can be simplified in 3NF using data filtering operations
Creating & Maintaining Multiple Fact Tables
- speeds query operations
- save processor cycles at run time = speeds analysis
- e.g region, city, state is pre-computed at data loading phase at run-time
Demoralising Fact Tables
- improves data access performance = save data storage
- stores in one record what usually takes many records in different tables
- e.g computing total sales in all region required access to Region Sales aggregates for summarizing. May have 300,000 records which means summarizing 300,000. Denormalized tables is better for aggregation.
YEAR_TOTAL YEAR_ID MONTH_1 MONTH_2 MONTH_3
Table Partitioning & Replication
- required when DSS is implemented in dispersed geographic areas
- Partitioning: splits a table into subsets of rows and columns which is placed in or near the client computer to improve data
- Replication: makes a copy of a table and places it in a different location for same reason
What is Data Mining?
- analyzes large sets of data
- uncovers hidden trends, patterns and relationships
- forms computer models to simulate and explain the results and use the models to support decision making
- can also be the basis to create advanced predictive data models
What are the 4 phases of Data Mining?
DATA PREPARATION
- data sets used by data mining operation are identified
- data is cleansed from any impurities
DATA ANALYSIS and CLASSIFICATION
- studies data to identify common data characteristics or patterns
- data mining tool applies specific algorithms to find:
• groupings, classifications, sequences
• dependencies, links, relationships
• patterns, trends, deviations
KNOWLEDGE ACQUISITION - uses results of previous phase - data mining tool selects the appropriate modelling or knowledge acquisition algorithms - typical algorithms used are based on: • neural networks • decision trees • rules induction • genetic algorithms • regression trees • memory - based reasoning
PROGNOSIS PHASE - data mining findings used to predict future behaviour and forecast business outcomes - e.g giving % of chances - findings presented in: • decision tree • neural net • forecasting model