Week 13 UAS Flashcards
What is Data Warehousing?
A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.
Characteristics of a Data Warehouse?
- Subject-oriented
The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, and sales) rather than the major application areas (e.g. customer invoicing, stock control, and product sales).
- Integrated
Data from different source systems is often inconsistent, so the integrated data source must be made consistent to present a unified view of the data to users.
- Time-variant
Data in the warehouse is accurate and valid only at some point in time or over some time interval.
- Nonvolatile
Data is not updated in real time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database, rather than as a replacement.
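The time-variant and nonvolatile characteristics can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the `warehouse` list, `refresh`, and `balance_as_of` are hypothetical names, and a real warehouse would keep these rows in a DBMS table. Each refresh appends a dated snapshot instead of overwriting earlier rows, so past states stay queryable.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical warehouse table: one row per (snapshot_date, customer_id, balance).
# Rows are only ever appended, never updated in place (nonvolatile), and each
# row records the date it was valid for (time-variant).
warehouse: List[Tuple[date, str, float]] = []


def refresh(snapshot_date: date, operational_rows: Dict[str, float]) -> None:
    """Append a snapshot of the current operational data; old rows are kept."""
    for customer_id, balance in sorted(operational_rows.items()):
        warehouse.append((snapshot_date, customer_id, balance))


def balance_as_of(customer_id: str, as_of: date) -> Optional[float]:
    """Return the most recent balance recorded on or before as_of, if any."""
    matches = [(d, bal) for d, cid, bal in warehouse if cid == customer_id and d <= as_of]
    return max(matches)[1] if matches else None


# Two monthly refreshes from a hypothetical operational system.
refresh(date(2024, 1, 31), {"C1": 100.0, "C2": 50.0})
refresh(date(2024, 2, 29), {"C1": 120.0, "C2": 50.0})
print(balance_as_of("C1", date(2024, 2, 15)))  # 100.0: the January snapshot is still there
```

The operational system would only hold the latest balance (120.0); the warehouse keeps both snapshots.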
Benefits of Data Warehousing?
- Potential high returns on investment
- Competitive advantage
- Increased productivity of corporate decision-makers
Comparison of OLTP Systems?
Main purpose: support operational processing
Data age: current
Data latency: real-time
Data granularity: detailed data
Data processing: predictable pattern of data operations and queries; high level of transaction throughput
Reporting: predictable, one-dimensional, static reporting
Users: serves a large number of operational users
Comparison of Data Warehousing Systems?
Main purpose: support analytical processing
Data age: historic
Data latency: time-variant
Data granularity: detailed data, lightly and highly summarized data
Data processing: less predictable pattern; medium to low level of transaction throughput
Reporting: unpredictable, multidimensional, dynamic reporting
Users: serves a lower number of managerial users
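The difference in workload between the two comparisons above can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the `sale` table and `record_sale` function are hypothetical, and in practice the OLTP system and the warehouse are separate databases; one in-memory database is used here just to contrast the two query shapes.

```python
import sqlite3

# One in-memory database stands in for both systems purely to contrast
# the two workloads; the 'sale' table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, sale_date TEXT, "
    "branch TEXT, product TEXT, amount REAL)"
)


def record_sale(sale_date: str, branch: str, product: str, amount: float) -> None:
    """OLTP-style work: a short, predictable transaction on one detailed row."""
    with conn:  # commits (or rolls back) this single transaction
        conn.execute(
            "INSERT INTO sale (sale_date, branch, product, amount) VALUES (?, ?, ?, ?)",
            (sale_date, branch, product, amount),
        )


for row in [
    ("2023-11-02", "North", "Laptop", 900.0),
    ("2023-12-15", "South", "Phone", 400.0),
    ("2024-01-10", "North", "Phone", 450.0),
    ("2024-02-20", "North", "Laptop", 950.0),
]:
    record_sale(*row)

# Warehouse-style work: an ad hoc query that scans historic rows and
# summarizes them along two dimensions (year and branch).
summary = conn.execute(
    "SELECT substr(sale_date, 1, 4) AS sale_year, branch, SUM(amount) "
    "FROM sale GROUP BY sale_year, branch ORDER BY sale_year, branch"
).fetchall()
print(summary)  # [('2023', 'North', 900.0), ('2023', 'South', 400.0), ('2024', 'North', 1400.0)]
```

The OLTP side runs many tiny transactions that are known in advance; the analytical side asks one unpredictable question over all of history.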
Problems of Data Warehousing?
- Underestimation of resources for data ETL
- Hidden problems with source systems
- Required data not captured
- Increased end-user demands
- Data homogenization
- High demand for resources
- Data ownership
- High maintenance
- Long-duration projects
- Complexity of integration
Data Warehouse Architecture?
- Operational Data
- Operational Data Store
- ETL Manager
- Warehouse Manager
- Query Manager
- Detailed Data
- Lightly and Highly Summarized Data
- Archive/Backup Data
- Metadata
- End-User Access Tools
End-User Access Tools?
- reporting and query tools
- application development tools
- online analytical processing (OLAP) tools
- data mining tools
What is a Data Mart?
A database that contains a subset of corporate data to support the analytical requirements of a particular business unit (such as the Sales department) or to support users who share the same requirements to analyze a particular business process (such as property sales).
Benefits:
- To give users access to the data they need to analyze most often.
- To improve end-user response time due to the reduction in the volume of data to be accessed.
- To provide appropriately structured data as dictated by the requirements of the end-user access tools.
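A data mart for the Sales department can be sketched with Python's `sqlite3` module. This is a minimal illustration under assumptions: the `property_sale`, `staff`, and `sales_by_branch_month` tables and their rows are hypothetical. The mart is built as a separate database that copies only the sales subject area, pre-summarized at the level the department usually analyzes.

```python
import sqlite3

# Hypothetical corporate warehouse holding two subject areas.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
CREATE TABLE property_sale (branch TEXT, sale_month TEXT, price REAL);
CREATE TABLE staff (staff_id TEXT, branch TEXT, salary REAL);
INSERT INTO property_sale VALUES
    ('B1', '2024-01', 250000.0),
    ('B1', '2024-01', 310000.0),
    ('B2', '2024-02', 180000.0);
INSERT INTO staff VALUES ('S1', 'B1', 30000.0);
""")

# The data mart is a separate, smaller database for the Sales department:
# it keeps only property sales (no staff data), pre-summarized by branch and month.
mart = sqlite3.connect(":memory:")
mart.execute(
    "CREATE TABLE sales_by_branch_month "
    "(branch TEXT, sale_month TEXT, n_sales INTEGER, total_price REAL)"
)
rows = warehouse.execute(
    "SELECT branch, sale_month, COUNT(*), SUM(price) "
    "FROM property_sale GROUP BY branch, sale_month"
).fetchall()
mart.executemany("INSERT INTO sales_by_branch_month VALUES (?, ?, ?, ?)", rows)
mart.commit()

# In practice, queries against the mart scan far less data than the full warehouse.
print(mart.execute("SELECT * FROM sales_by_branch_month ORDER BY branch").fetchall())
```

Dropping unrelated subjects and pre-summarizing are what give the faster end-user response time listed in the benefits.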
Data Warehousing Tools and Technologies?
- Extraction
- Transformation
- Loading
- ETL Tools
- Data profiling and data quality control
- Metadata Management
The requirements for a data warehouse DBMS?
- Load performance
- Load processing
- Data quality management
- Query performance
- Terabyte scalability
- Mass user scalability
- Networked data warehouse
- Warehouse administration
- Integrated dimensional analysis
- Advanced query functionality
ETL (Extraction, Transformation, Loading) processes?
- Extraction
- Transformation
- Loading
The data destined for a data warehouse must first be extracted from one or more data sources, transformed into a form that is easy to analyze and consistent with data already in the warehouse, and then finally loaded into the data warehouse.
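The three ETL steps can be sketched in Python with the built-in `sqlite3` module as the target. This is a toy under stated assumptions: the two sources (`crm_rows`, `shop_rows`), their formats, and the `purchase` table are hypothetical. Real ETL tools add scheduling, error handling, and data quality checks that are omitted here.

```python
import sqlite3

# Hypothetical source 1: a CRM export with DD/MM/YYYY dates and text amounts.
crm_rows = [
    {"cust": " alice ", "date": "05/01/2024", "amount": "120.50"},
    {"cust": "BOB", "date": "17/01/2024", "amount": "80"},
]
# Hypothetical source 2: a web-shop feed with ISO dates and amounts in cents.
shop_rows = [("Alice", "2024-01-20", 4999)]


def extract():
    """Extract: read raw records from each source, tagged with their origin."""
    return [("crm", r) for r in crm_rows] + [("shop", r) for r in shop_rows]


def transform(records):
    """Transform: map every source onto one consistent (customer, ISO date, amount) form."""
    clean = []
    for source, rec in records:
        if source == "crm":
            day, month, year = rec["date"].split("/")
            clean.append((rec["cust"].strip().title(), f"{year}-{month}-{day}", float(rec["amount"])))
        else:  # source == "shop"
            name, iso_date, cents = rec
            clean.append((name.strip().title(), iso_date, cents / 100))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table in one transaction."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS purchase "
            "(customer TEXT, purchase_date TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(transform(extract()), warehouse)
print(warehouse.execute(
    "SELECT customer, SUM(amount) FROM purchase GROUP BY customer ORDER BY customer"
).fetchall())
```

Because the transform step normalizes names, dates, and currency units, " alice " from the CRM and "Alice" from the shop end up as one customer in the warehouse.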
4 main operations in Data Mining?
- Predictive modelling
a. Classification
b. Value prediction
- Database segmentation
a. Demographic clustering
b. Neural clustering
- Link analysis
a. Association discovery
b. Sequential pattern discovery
c. Similar time sequence discovery
- Deviation detection
a. Statistics
b. Visualization
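Deviation detection using statistics can be sketched with a simple z-score rule in Python. This is one possible statistical technique, chosen as an assumption for illustration; the `deviations` function, the 2.0 threshold, and the sales figures are all hypothetical.

```python
from statistics import mean, stdev


def deviations(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily sales with one unusual day.
daily_sales = [102, 98, 101, 99, 100, 97, 103, 250, 100, 101]
print(deviations(daily_sales))  # [(7, 250)]
```

The outlier itself inflates the mean and standard deviation, so on small samples a robust measure (such as the median) may detect deviations more reliably.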
Example applications:
Retail / Marketing
- Identifying buying patterns of customers
- Finding associations among customer demographic characteristics
- Predicting response to mailing campaigns
- Market basket analysis
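Market basket analysis (a form of association discovery) can be sketched in Python by counting item pairs and computing support and confidence. This is a simplified illustration, not a full association-rule algorithm such as Apriori: it only considers pairs of items, and the `pair_rules` function, thresholds, and baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return rules (a, b, support, confidence) meaning 'customers who buy a also buy b'."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for a, b in ((x, y), (y, x)):
            confidence = count / item_counts[a]  # share of baskets with a that also have b
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules)


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
for rule in pair_rules(baskets):
    print(rule)
```

With these baskets, "butter implies bread" has confidence 1.0 (every basket with butter also has bread), while "bread implies milk" is dropped because only half the bread baskets contain milk.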
What is OLAP?
Online analytical processing (OLAP) is the dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data.
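Two typical OLAP operations, roll-up and slice, can be sketched over a tiny in-memory fact table in Python. This is an illustration only: real OLAP tools run on cube engines or SQL `GROUP BY` queries, and the `facts` rows, `DIMENSIONS` tuple, `roll_up`, and `slice_cube` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales).
facts = [
    (2024, "Q1", "North", "Laptop", 120.0),
    (2024, "Q1", "South", "Laptop", 80.0),
    (2024, "Q2", "North", "Phone", 60.0),
    (2024, "Q2", "North", "Laptop", 100.0),
    (2025, "Q1", "South", "Phone", 90.0),
]
DIMENSIONS = ("year", "quarter", "region", "product")


def roll_up(rows, keep):
    """Roll-up: total sales over every dimension not listed in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_cube(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]


print(roll_up(facts, ["year", "region"]))
# {(2024, 'North'): 280.0, (2024, 'South'): 80.0, (2025, 'South'): 90.0}
print(roll_up(slice_cube(facts, "region", "North"), ["quarter"]))
# {('Q1',): 120.0, ('Q2',): 160.0}
```

Each call answers a different multidimensional question from the same detailed data, which is the dynamic, unpredictable reporting style contrasted with OLTP earlier.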
Phases of the CRISP-DM Model?
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment