3 - Data Warehouses and OLAP Flashcards

1
Q

Business Intelligence and Data Management

Business Intelligence

A
  • consilidate, analyse and present data for decision support
  • corresponds to descriptive analytics according to the definitions here
  • relies on extensive operational data sources, usually consolidated in a data warehouse
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Business Intelligence and Data Management

Steps

A
  1. collect and store data in a database system
  2. extract, transform and load into a central data warehouse
  3. analyse and present data for business intelligence
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Database Systems

A
  1. Store large data sets
    - Examples: all orders accepted in the last five years, all of an online bookseller’s customers’ addresses
    - this data is stored in relational tables
  2. Analyse these data sets
    - SQL (Structured Query Language) can access and analyze relational data
    - Examples: list all customers in Berlin who ordered at least three times a year; what products produced the highest revenue during the last five years
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Databases in IT Systems

A

Presentation:
- user interface in an app(lication) or in a web browser

Logic:
- computation as implemented in a programming language

Storage:
- Data Bases system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Components of a Database system

A

data base
+ data base management system
= data base system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Tasks of the Database Management System

A
  • interpret queries
  • optimise queries
  • ensure integrity
  • control access
  • manage simultaneous access
  • manage back-ups
  • manage files
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Tasks of the Database Management System

Interpret Queries

A

Translate SQL from the logic layer into “low-level”-routines

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Tasks of the Database Management System

Optimise Queries

A

Search for the “best” way to access data given an SQL-query, access statics, and index data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Tasks of the Database Management System

Ensure Integrity

A

Enforce that data can only be manipulated according to defined integrity conditions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Tasks of the Database Management System

Control Access

A

Test whether the user is authorized to access the data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Tasks of the Database Management System

Manage simultaneous access

A

Synchronise simultaneous access to data and restrict data manipulation that would lead to conflicts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Tasks of the Database Management System

Manage back-ups

A

Ensure that system break-downs or errors have limited consequences

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Tasks of the Database Management System

Manage files

A

Control the allocation of memory and access to the hard drive

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

The relational data model

A

Describing a relation (=table)

  • define attributes, data types, primary keys (unique identifiers) and table name: e.g. Students (Student-ID, First Name, Last Name)
  • there can be multiple relations with the same type of data, e.g. when splitting up data from multiple years across data bases
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Data Warehouses

What’s a data warehouse?

A
  • A data warehouse is a data database with reporting and query functions that stores operative and historical data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Data Warehouses

Data in a data warehouse

A
  • data is extracted from diverse operational systems and processed to support management reports and analyses (-> OLAP, Data mining)
  • data warehouses are multi-dimensional and offer long-term storage for historical, cleaned, validated, synthetic, operative data from internal and external sources
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

The ETL Process

What does ETL stand for?

A

EXTRACT relevant data from diverse sources

TRANSFORM data into the format of the central database system

LOAD data into the central database system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

The ETL process and data warehouses

A

The Data Warehouse is an organization-wide concept aiming to provide

  • a central, complete, consistent data basis
  • that is independent of operational databases and
  • can support diverse analytics projects
19
Q

Working with a data Warehouse

Steps

A
  1. select well-suited attributes from operative data
  2. add selected data from external sources
  3. transform and load data
  4. store data in dimensions
  5. manage database (as in operational data bases)
  6. search and analyse via reports or OLAP
20
Q

Components of a Data-Warehouse-System

A

Data Marts
Central data warehouse
Enterprise data warehouse

21
Q

Components of a Data-Warehouse-System

Data Marts

A
  • databases targeted to the analytics requirements of a specific user group
  • managed by a decentral team
  • simple data model and development
  • mostly self-contained
22
Q

Components of a Data-Warehouse-System

Central data warehouse

A
  • analytics database that feeds data into local data sets

- can still provide information for just a part of the organization

23
Q

Components of a Data-Warehouse-System

Enterprise Data Warehouse

A
  • provides data to support analytics across the organization
24
Q

Hierarchical DWH-Architecture

A
  • central data warehouse (CDWH) feeds and coordinates local data marts
  • CDWH extracts, integrates, and allocates data

Data Marts:

  • serve for queries and analyses
  • are tailored to one functional part of an organization
25
Q

Example: Customer Relationship management

A

CRM-Analytics examines customer data to support decisions on improving products, services, and the market interface

Consistent view of the customer by integrating view from all customer contacts (from call. centre, sales representative, website) in the data warehouse

26
Q

Example: Customer Relationship management

Customer data is analysed to …

A

Customer data is analysed to

  • identify customer preferences
  • decide on offers (e.g. bundling, cross-selling)
  • classify customers by profitability and potential to target marketing efforts
  • target the degree of service (e.g. profitable customers are handled by key accounted, less profitable customers end in the call centre queue)
27
Q

Example: Customer Relationship management

Analytics via …

A

OLAP

Data Mining

28
Q

User Access to data warehouses

A

Decision makers have to be able to flexibly access data to support complex analyses

Ways of access

  • Static reports
  • query languages
  • OLAP
29
Q

User Access to data warehouses

Static reports

A
  • can be parametrized

- any structural changes have to be implemented by programmers

30
Q

User Access to data warehouses

Query languages

A
  • standardized and powerful
  • complex to learn
  • e.g. SQL, QBE
31
Q

User Access to data warehouses

OLAP

A
  • enables flexible ad-hoc queries without requiring high technical expertise
    (- insights that are not included in the reports)
32
Q

OLAP: Online Analytical Processing

A
  • frequent solution to present data from a warehouse
  • data from multiple dimensions are clearly arranged
  • common: cube representation for three dimensions
33
Q

OLAP vs. OLTP

A

OLAP (Online Analytical Processing):
Query Method that lets users quickly access multiple dimensions of data for interactive analysis of data from data warehouses

OLTP (Online Transactional Processing):
Processing of transactional data from operational databases

34
Q

OLAP vs. OLTP:
OLTP

Data:
Access:
Granularity:
Topicality:
Main operations:
Memory requirements:
Tools:
A

Data: operational transactional data

Access: complex

Granularity: microscopic (original transactional data)

Topicality: complete current status

Main operations: read and write

Memory requirements: large

Tools: SQL

35
Q

OLAP vs. OLTP
OLAP

Data:
Access:
Granularity:
Topicality:
Main operations:
Memory requirements:
Tools:
A

Data: management-relevant data for analytics

Access: easy

Granularity: macroscopic (transformed and aggregated data)

Topicality: historical snapshot

Main operations: read

Memory requirements: smaller

Tools: proprietary

36
Q

OLAP Standard functions

A
  • different approaches to representation

- specific cube operations allow browsing the data

37
Q

OLAP Standard functions

Different approaches to representation

A
  • absolute vs. relative
  • 3D-analysis using multiple hierarchical levels
  • diverse indicator computations
38
Q

OLAP Standard functions

Specific cube operations allow browsing the data

A

Drilling:

  • drill up/down: vary the degree of aggregation along one dimension
  • drill through: access the level of detail given by the operational data base

Pivoting (rotating):
- switch rows and columns

Slicing:
- reduce the number of dimensions

Dicing:
- cut out parts of the current cube (“filter”)

39
Q

OLAP

Slice

A
  • cut out one horizontal slice from the cube
  • any slice is another cube (and can be divided up further according to hierarchies)
  • slicing can target any hierarchical level of a dimension
40
Q

OLAP

Pivoting

A
  • cubes can be turned as needed

- this can provide a better representation

41
Q

OLAP

Drill down and dice

A
  • one dimension is shown on another aggregation level (drilled down), often filtering to consider a particular subset of data (a dice)
42
Q

Summary:

Business Intelligence

A
  • supports analytics for decision makers (managers)
  • Uses OLAP, statistics, and data mining
  • Data marts provide function-specific extracts from the data warehouse
  • the data warehouse provides an enterprise-wide database for analytics
  • automated ETL process connects operational databases to the data warehouse
43
Q

What is NoSQL?

A
  • NoSQL databases, e.g. Mongo DB, do not split up data to ensure consistent links
  • e.g. they store all information for one order in one row (customer name, address, product type, …)
44
Q

Advantages of NoSQL

A
  • availability
  • partition tolerance
  • speed
  • provides direct input for data mining approaches that would span multiple relational tables