BIS II - Data Warehouses Flashcards

1
Q

Data warehouse – Definition

A
  1. Collection of databases
  2. Pool of data to support decision making
  3. Repository of current and historical data
  4. “Single version of truth”
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Data Warehouse – Characteristics

A
  1. Subject oriented
  2. Integrated
  3. Time-variant (time series)
  4. Nonvolatile
  5. Metadata
  6. Web based, client/server
  7. Relational/multi-dimensional
  8. Real-time
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Major Components of Data Warehousing Process

A
  1. Data sources: data are sourced from multiple independent operational “legacy” systems
  2. Data extraction and transformation: data are extracted and properly transformed using custom-written or commercial software called ETL
  3. Data loading: data are loaded into staging area, where they are transformed and cleansed; the data is then ready to load into the DW or data marts
  4. Comprehensive database: essentially the enterprise data warehouse to support all decision analysis; provides summarized and detailed information from multiple sources
  5. Metadata: are maintained so that they can be easily accessed by IT personnel and end users; incl. software programs about data and rules for summaries, that are easy to index and search
  6. Middleware tools: enable access to the data warehouse; power users may write their own scripts (e.g. SQL); Many front-end applications allow interaction with data stored in the DW
  7. Applications (Visualization), e.g. with OLAP,
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Data Integration and the ETL Process: Data Integration (Second Step in Data Warehousing Process)

A

ETL = Extract Transform Load
- A decision maker needs access to multiple sources of data that must be integrated
- Data Integration comprises three major processes:
o Data access
o Data federation
o Change capture

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Types of Data Warehouses

A

Data mart
- Departmental small-scale “DW” that stores only limited/relevant data
- Two types:
o Dependent data mart: subset that is created directly from a data warehouse
o Independent data mart: small data warehouse designed for strategic business unit or a department

Operational data Stores (ODS)

  • Type of database used as an interim area for a data warehouse
  • Similar to short-term memory (DW is a “long-term memory”)

Enterprise Data Warehouse (EDW)
- A data warehouse for the enterprise

Metadata

  • Data about data
  • In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Analysis of data in data warehouse

A

OLTP (Online Transaction Processing)
• Capturing and storing data from ERP, CRM, POS
• The main focus is on efficiency of routine tasks
OLAP (Online Analytical Processing)
• Converting Data into information for decision support
• Data cubes, drill-down/rollup, slice & dice, …
• Requesting ad hoc reports
• Conducting statistical and other analyses
• Developing multimedia-based applications

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

(!) Analysis of data in data warehouse - OLAP vs. OTLP

A

OLTP:
Purpose: to carry out day-to-day business functions
Data Source: transaction database, normalized data repository, primarily focused on efficiency and consistency
Reporting: routine, periodic, narrowly focused reports
Resource Requirements: ordinary relational database
Execution Speed: fast; recording of business transactions and routine reports

OLAP:
Purpose: to support decision making, to provide answers for business and management queries
Data Source: data warehouse/data mart; a non-normalized data repository, primarily focused on accuracy and completeness
Reporting: ad hoc, multidimensional, broadly focused reports and queries
Resource Requirements: multiprocessor, large-capacity, specialized databases
Execution Speed: slow, resource intensive, complex, large-scale queries

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

OLAP Operations: Slice

A
  • • Definition: a subset of data corresponding to a single value set for one or more dimensions not in the subset
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

OLAP Operations: Dice

A
  • Definition: a slice on more than two dimensions

* (e.g. putting filters on two columns in an excel or “SELECT * FROM xxx WHERE xxx AND xxx” in SQL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Other OLAP Operations

A

Drill Down/Up: Navigating among levels of data ranging from the most summarized (up) to the most detailed (down)
Roll Up: Computing all of the data relationships for one or more dimensions. A formula might be needed.
Pivot: used to change the dimensional orientation of a report or an ad hoc query-page display

How well did you know this?
1
Not at all
2
3
4
5
Perfectly