Lecture 4 Flashcards
OLTP?
Online Transaction Processing
OLAP
Online Analytical Processing
Where is OLTP used?
E-commerce
Banking
Airline Reservations
OLTP (User facing)
Real-Time
Low latency
Highly-concurrent
OLTP (Tasks)
Relatively small set of standard transactional queries
OLTP (Data access pattern)
Random leads
Updates
Writes
Where is OLAP used?
Business intelligence
Data mining
OLAP (Back-end processing)
Batch workloads
Less concurrency
OLAP (Tasks)
Complex analytical queries
Ad-hoc
OLAP (Data access pattern)
Table scans
Large amounts of data involved per query
OLAP Example?
Amazon analyzes purchases by its customers to come up with an individual screen with products of likely interest to the customer
Downsides of co-existing OLTP and OLAP workloads?
Poor memory management
Conflicting data access patterns
Variable latency
Solution for having both OLTP and OLAP with no downsides?
Separate databases
User-facing OLTP database for high-volume transactions
Data warehouse for OLAP workloads
How to connect OLTP with OLAP?
ETL
What does ETL stand for?
Extract
Transform
Load
`What is Data Warehouse?
A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
Characteristics of DWs?
Subject oriented Integrated Time-variant Nonvolatile Summarized Not normalized Metadata Web based Client/server
What is Data Mart?
A departmental small-scale DW that stores only limited/relevant data
Two types of Data Mart?
Dependent data mart
Independent data mart
Dependent data mart?
A subset that is created directly from a data warehouse
Independent data mart?
A small data warehouse designed for a strategic business unit or a department
Other DW Components?
ODS
Oper marts
EDW
Metadata
ODS?
Operational Data Stores
A type of database often used as an interim area for a data warehouse
Oper Marts?
An operational data mart
EDW?
Enterprise Data Warehouse
A data warehouse for the enterprise
Metadata?
Data about data
Types of DW Architectures?
Three-Tier Architecture
Two-Tier Architecture
Three-Tier Architecture?
- Data acquisition software (backend)
- The DW that contains the data and the software
- Client (Front-end) software that allows users to access and analyze data from the warehouse
Two-Tier Architecture?
First two tiers in 3Tier architecture are combined into one
Example of Three-Tier Architecture?
Client Workstation
Application Server
Database Server
Example of Two-Tier Architecture?
Client workstation
Application and database server
What does Web-based DW Architecture contain?
Client (Web browser)
Web server -> Web pages
Application server
Data warehouse
Data warehouse development approaches?
Inmon Model
Kimball Model
Inmon Model?
EDW approach (top-down)
Kimball Model?
Data mart approach (bottom-up)
Characteristics of DM approach?
One subject area
Months to develop
Costs 10k to 100k plus
Low to medium difficulty
Data prerequisite is common within the business area
Sources include only some operational and external systems
Size can range from megabytes to gigabytes
Time horizon - near current and historical data
Data transformation - low to medium
Characteristics of EDW approach?
Several subject areas
Years to develop
Costs 1mil plus
Hard difficulty
Data prerequisite is common across the enterprise
Sources include many operational and external systems
Size can range from gigabytes to petabytes
Time horizon - historical data
Data transformation - high
Tables in DW?
Fact table
Dimensional tables
Fact table?
A very large accumulation of facts such as sales
Dimension tables?
Smaller, generally static information about the entities involved in the facts
Representation of Data in DW?
Dimensional Modelling
Star Schema
Snowflakes Schema
Dimensional Modelling?
A retrieval-based system that supports high-volume query access
Star Schema?
The most commonly used and the simplest style of dimensional modeling
Contain a fact table surrounded by and connected to several dimension tables
Snowflakes Schema?
An extension of star schema where the diagram resembles a snowflake in shape
What is multidimensionality?
The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product
Multidimensional presentation contents?
Dimensions
Measures
Time
Dimensions?
Product Salespeople Market Segments Business units
Measures?
Money
Sales volume
Head count
Inventory profit
Time?
Daily
Weekly
Monthly
Two classes of fact-table attributes?
Dimension attributes
- The key of a dimension table
Dependent attributes
- A value determined by the dimension attributes of the tuple
Warehouse Data Models?
Relations
Stars and snowflakes
Cubes
Warehouse Operators?
Slice and dice
Roll-up, drill down
Pivoting
Other
Slice?
A subset of a dimensional array
Dice?
A slice on more than two dimensions
Drill down/Up?
Navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
Roll Up?
Computing all of the data relationships for one or more dimensions
Pivot?
Used to change the dimensional orientation of a report or an ad hoc query-page display
ROLAP?
Relational Online Analytical Processing
MOLAP?
Multi-Dimensional Online Analytical Processing