W03 Data Ware Houses and OLAP Flashcards
Data Munging
3 steps
need to know?
Selection
Pre-Processing
Transformation
How and Why data is stored?
Business Intelligence
3 steps
1 Collect and Store in database system
2 Extract, transform and load into central data warehouse
3 Analyze and present data for business intelligence
- > consolidate, analyse and present data for decision support
- > descriptive analytics
Database Systems
1 Store large data sets
- all addresses
- all orders
- all current contracts
- …
- stored in relational tables
2 Analyse data sets
- vis SQL
- list of all customers in berlin with revenue higher 10k
Databases in IT Systems (layers)
Presentation
-user interface in app or web browser
Logic
-computed as implemented in a programming language
Storage
-Data Base System
A Data Base System (equation)
Data Base + Data Base Management System =Data Base System
Tasks of the Database Management system
1 interpret queries (log layer (sql) to low level)
2 optimise queries for given sql query
3 ensure integrity
no disintegrating manipulations
4 control access: user authorization
5 manage simultaneous access
6 manage back-ups
7 manage files: memory allocation etc
The Relational Data Model
primary keys (unique identifiers)
Entity Relationship Model
One employee (with unique ID) attends to one (or more) customer with unique ID who each have only one employee….
Data Warehouses
- database with reporting and query functions
- stores operative and historical data
- data extracted form diverse operational systems
- processes to support management reports and analysis (OLAP)
- multi-dimensional
- long-term
- historical, cleaned, validated, synthetic, operative data
- internal and external sources
Operational Database vs Data Warehouse
operational data (day to day transactions) VS strategic data
complete data VS historical data
detailed data VS aggregate data
non-redundant data VS more read than edited
dynamic data VS ad-hoc queries
complex data model VS polished user interface
ETL Process
Extract
relevant data from diverse sources
Transform
data into the format of the central database system
Load
data into the central database systen
Data Warehouse is an organisation-wide concept aiming to provide
- complete, central, consistent data basis
- independent of operational databases
- support diverse analytics projects
working with a data warehouse
1 select well-suited attributes form operative data bases
2 add selected data from external sources
3 transform and load data
4 store data in dimensions
5 manage database
6 search and analyze via reports or OLAP
Components of a Data-Warehouse-System
1 Data Marts
2 Central Data Warehouse
3 Enterprise Data Warehouse
1 Data Marts
-databases targeted to analytics requirements of user group
-managed by a decentral team
simple data model and development
-mostly self-contained
2 Central Data Warehouse
- analytics database that feeds data into local data sets
- can still provide information for just a part of the organisation
3 Enterprise Data Warehouse
provides data to support analyitcs across organization
Hierarchical Data-Warehouse structure
central data warehouse (CDWH) feeds and coordinates local data marts
Exemplary Queries
-personal data mart
analyse customers according to dimension “location” in hierarchies “country” ; “region” ; “city”
CRM (Customer Relationship Management)
- what?
- how?
- used with what tools?
- used with what goals?
CRM Analytics: examine customer data to support decisions on improving products, services, and the market interface
- consistent view of customer by integrating all data sources in data warehouse
- OLAP and data mining
- identify preferences
- decide on offers (cross-selling)
- classify by profitability and potential for marketing
- determine degree of service
OLAP (online analytical processing)
what?
reports?
query languages?
OLAP?
enables decision makers to flexibly access data to support complex analyses
Static Reports
- can be parametrized
- structural changes have to be implemented by programmers
Query Languages
- standardised and powerful
- complex to learn
- SQL, QBE
OLAP enables flexible ad-hoc queries without requiring high technical expertise
OLAP cube
to present data from warehouse clear arrangement dimensions: time period area product
OLTP
OnLine Transactional Processing (from operational databases)
OLAP
OnLine Analyticsl Processing (lets users quickyl access multiple dimensions of data for interactive analysis of data from data warehouse)
Functions of OLAP
- representation approaches
- cube operations
- absolute vs relative
- 3d analysis, hierarchical levels
- diverse indicator computations
- drilling (aggregation level)
- pivoting (switching rows and columns)
- slicing (reduce number of dimensions)
- dicing (cut out parts of current cube, “filter”)
OLAP Slice
horizontal slice of OLAP cube
e.g. ONE product variant across areas and time
Pivoting
Results in better representation
Drill-Down and Dice
subset of data (a dice)
Limits of OLAP
machine-supported manual search
+limited number of hypotheses
+limited scaled applicability
“What drives cancellation probabilities?”
->need for manually-supported machine search -> Data Mining
Summary: Business Intelligence
1 Support analytics for managers
2 OLAP, statistics and datamining
3 data marts provide function-specific EXTRACTS from data warehouse
4 data warehouse provides enterprise-wide database for analytics
5 automated ETL process connects databases to datawarehouse
NoSQL
+availability
+partition tolerance
+speed
+direkt input for datamining
-can sacrifice consistency and integrity