3 - Data Warehouses and OLAP Flashcards
Business Intelligence and Data Management
Business Intelligence
- consilidate, analyse and present data for decision support
- corresponds to descriptive analytics according to the definitions here
- relies on extensive operational data sources, usually consolidated in a data warehouse
Business Intelligence and Data Management
Steps
- collect and store data in a database system
- extract, transform and load into a central data warehouse
- analyse and present data for business intelligence
Database Systems
- Store large data sets
- Examples: all orders accepted in the last five years, all of an online bookseller’s customers’ addresses
- this data is stored in relational tables - Analyse these data sets
- SQL (Structured Query Language) can access and analyze relational data
- Examples: list all customers in Berlin who ordered at least three times a year; what products produced the highest revenue during the last five years
Databases in IT Systems
Presentation:
- user interface in an app(lication) or in a web browser
Logic:
- computation as implemented in a programming language
Storage:
- Data Bases system
Components of a Database system
data base
+ data base management system
= data base system
Tasks of the Database Management System
- interpret queries
- optimise queries
- ensure integrity
- control access
- manage simultaneous access
- manage back-ups
- manage files
Tasks of the Database Management System
Interpret Queries
Translate SQL from the logic layer into “low-level”-routines
Tasks of the Database Management System
Optimise Queries
Search for the “best” way to access data given an SQL-query, access statics, and index data
Tasks of the Database Management System
Ensure Integrity
Enforce that data can only be manipulated according to defined integrity conditions
Tasks of the Database Management System
Control Access
Test whether the user is authorized to access the data
Tasks of the Database Management System
Manage simultaneous access
Synchronise simultaneous access to data and restrict data manipulation that would lead to conflicts
Tasks of the Database Management System
Manage back-ups
Ensure that system break-downs or errors have limited consequences
Tasks of the Database Management System
Manage files
Control the allocation of memory and access to the hard drive
The relational data model
Describing a relation (=table)
- define attributes, data types, primary keys (unique identifiers) and table name: e.g. Students (Student-ID, First Name, Last Name)
- there can be multiple relations with the same type of data, e.g. when splitting up data from multiple years across data bases
Data Warehouses
What’s a data warehouse?
- A data warehouse is a data database with reporting and query functions that stores operative and historical data
Data Warehouses
Data in a data warehouse
- data is extracted from diverse operational systems and processed to support management reports and analyses (-> OLAP, Data mining)
- data warehouses are multi-dimensional and offer long-term storage for historical, cleaned, validated, synthetic, operative data from internal and external sources
The ETL Process
What does ETL stand for?
EXTRACT relevant data from diverse sources
TRANSFORM data into the format of the central database system
LOAD data into the central database system
The ETL process and data warehouses
The Data Warehouse is an organization-wide concept aiming to provide
- a central, complete, consistent data basis
- that is independent of operational databases and
- can support diverse analytics projects
Working with a data Warehouse
Steps
- select well-suited attributes from operative data
- add selected data from external sources
- transform and load data
- store data in dimensions
- manage database (as in operational data bases)
- search and analyse via reports or OLAP
Components of a Data-Warehouse-System
Data Marts
Central data warehouse
Enterprise data warehouse
Components of a Data-Warehouse-System
Data Marts
- databases targeted to the analytics requirements of a specific user group
- managed by a decentral team
- simple data model and development
- mostly self-contained
Components of a Data-Warehouse-System
Central data warehouse
- analytics database that feeds data into local data sets
- can still provide information for just a part of the organization
Components of a Data-Warehouse-System
Enterprise Data Warehouse
- provides data to support analytics across the organization
Hierarchical DWH-Architecture
- central data warehouse (CDWH) feeds and coordinates local data marts
- CDWH extracts, integrates, and allocates data
Data Marts:
- serve for queries and analyses
- are tailored to one functional part of an organization
Example: Customer Relationship management
CRM-Analytics examines customer data to support decisions on improving products, services, and the market interface
Consistent view of the customer by integrating view from all customer contacts (from call. centre, sales representative, website) in the data warehouse
Example: Customer Relationship management
Customer data is analysed to …
Customer data is analysed to
- identify customer preferences
- decide on offers (e.g. bundling, cross-selling)
- classify customers by profitability and potential to target marketing efforts
- target the degree of service (e.g. profitable customers are handled by key accounted, less profitable customers end in the call centre queue)
Example: Customer Relationship management
Analytics via …
OLAP
Data Mining
User Access to data warehouses
Decision makers have to be able to flexibly access data to support complex analyses
Ways of access
- Static reports
- query languages
- OLAP
User Access to data warehouses
Static reports
- can be parametrized
- any structural changes have to be implemented by programmers
User Access to data warehouses
Query languages
- standardized and powerful
- complex to learn
- e.g. SQL, QBE
User Access to data warehouses
OLAP
- enables flexible ad-hoc queries without requiring high technical expertise
(- insights that are not included in the reports)
OLAP: Online Analytical Processing
- frequent solution to present data from a warehouse
- data from multiple dimensions are clearly arranged
- common: cube representation for three dimensions
OLAP vs. OLTP
OLAP (Online Analytical Processing):
Query Method that lets users quickly access multiple dimensions of data for interactive analysis of data from data warehouses
OLTP (Online Transactional Processing):
Processing of transactional data from operational databases
OLAP vs. OLTP:
OLTP
Data: Access: Granularity: Topicality: Main operations: Memory requirements: Tools:
Data: operational transactional data
Access: complex
Granularity: microscopic (original transactional data)
Topicality: complete current status
Main operations: read and write
Memory requirements: large
Tools: SQL
OLAP vs. OLTP
OLAP
Data: Access: Granularity: Topicality: Main operations: Memory requirements: Tools:
Data: management-relevant data for analytics
Access: easy
Granularity: macroscopic (transformed and aggregated data)
Topicality: historical snapshot
Main operations: read
Memory requirements: smaller
Tools: proprietary
OLAP Standard functions
- different approaches to representation
- specific cube operations allow browsing the data
OLAP Standard functions
Different approaches to representation
- absolute vs. relative
- 3D-analysis using multiple hierarchical levels
- diverse indicator computations
OLAP Standard functions
Specific cube operations allow browsing the data
Drilling:
- drill up/down: vary the degree of aggregation along one dimension
- drill through: access the level of detail given by the operational data base
Pivoting (rotating):
- switch rows and columns
Slicing:
- reduce the number of dimensions
Dicing:
- cut out parts of the current cube (“filter”)
OLAP
Slice
- cut out one horizontal slice from the cube
- any slice is another cube (and can be divided up further according to hierarchies)
- slicing can target any hierarchical level of a dimension
OLAP
Pivoting
- cubes can be turned as needed
- this can provide a better representation
OLAP
Drill down and dice
- one dimension is shown on another aggregation level (drilled down), often filtering to consider a particular subset of data (a dice)
Summary:
Business Intelligence
- supports analytics for decision makers (managers)
- Uses OLAP, statistics, and data mining
- Data marts provide function-specific extracts from the data warehouse
- the data warehouse provides an enterprise-wide database for analytics
- automated ETL process connects operational databases to the data warehouse
What is NoSQL?
- NoSQL databases, e.g. Mongo DB, do not split up data to ensure consistent links
- e.g. they store all information for one order in one row (customer name, address, product type, …)
Advantages of NoSQL
- availability
- partition tolerance
- speed
- provides direct input for data mining approaches that would span multiple relational tables