Article: Introduction to Data warehousing Flashcards

1
Q

Decision Support System

A
A set of expandable, interactive IT techniques and tools designed for processing and analyzing data and for supporting managers in decision making. In practice the system provided value estimates for each alternative.
Different forms of DSS: 
- Passive 
- Active
- Collaborative
- Model-driven
- Etc.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Data warehouse

A

A collection of data that supports decision-making processes. A Data-warehouse is:
- Subject-oriented
- Integrated
- Consistent
- Shows evolution over time
- Not vollatile
IT systems can be divided into transactional (OLTP) and analytical (OLAP). The type of query required in data warehouses is OLAP.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Data warehousing architectures

A
  • Single layer architecture
  • Two-layer architecture (Primary DW or Corporate DW)
  • Three-layer architecture
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Single layer architecture

A

Not often used in practice. Goal is to minimize the amount of data stored by removing data redundancies. Composed of a source layer, data warehouse and analytics.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Two-layer architecture

A

Composed of a source layer, data staging, data warehouse (with meta-data and data marts), and analysis layer.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Three-layer architecture

A

The third layer is reconciled data layer (or operational data store). This materalizes operational data obtained after integrating and cleansing source data. Main advantage: creates a common reference data model for the whole enterprise. Simultaneously, it sharply separates the problems of source data extraction and integration from those of data warehouse population. However, reconciled data leads to more redundancy of operational source data. This architecture is composed of a source layer, data staging, reconciled layer, loading, DW layer and analysis.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Central architecture vs Federated architecture

A
  • The amount of interdependent information exchanged between business units. When there is a lot of interdependent data, people tend to implement an Enterprise-wide architecture.
  • Time available for the implementation. When there is little time, people tend to use a Federated architecture.
  • Minor role of data-warehouse played in enterprise strategies. When the role is small, people tend to choose independent data marts.
  • The frequent need for integrating pre-existing data-warehouses can require a federated architecture type.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Extract, Transform, Load (ETL)

A

The ETL process takes place in the staging layer. Process takes place when a data-warehouse is populated for the first time, after that it occurs every time the data-warehouse is regularly updated. 4 different steps:

  1. Extract
  2. Cleanse - rectifying data values with rectification and homogenization.
  3. Transform - managing data formats.
  4. Load
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Multi-dimensional modelling

A

This concept looks at the measures and dimensions of values of a ‘fact’, and is the basis for the OLAP cube. Measure is a value that is numeric and can be used for calculating. Dimension is associated with hierarchy of aggregation levels, such as date. Using these dimensions and measures, you can perform restrictions (selections) and aggregations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Meta-data

A

Data to define other data. Can be classified in two categories:

  • Internal meta-data
  • External meta-data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Internal meta-data

A

The users that are interested in internal meta-data are system administrators, because it contains information about data sources, transformation processes, population policies, logical and physical schemata, constraints and user profiles.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

External meta-data

A

The users that are interested in external meta-data are end-users.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

After data-warehouse

A
  • Reports
  • OLAP
  • Dashboards
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Reports

A

A query and a layout -> a query generally implies a restriction and an aggregation of multidimensional data. Data-warehouses add value to reports, because they have one version of the truth (aggregated values are reliable and correct), and performance is improved by separating transaction processes and analysis.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

OLAP

A

A way to explore and analyze the data in the data-warehouse by means of multi-dimensional modelling. The most common operators are roll-up, drill-down, slice-and-dice, pivot, drill-across and drill-through.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Dashboards

A

GUI that displays a limited amount of relevant data in a brief and easy-to-read format.

17
Q

ROLAP

A

Relational OLAP, an implementation based on relational DBMS’s.
Advantages:
- Can handle large amounts of data.
- Can leverage functionalities inherent in the relational database.
Disadvantages:
- Performance can be slow.
- Limited by SQL functionalities.

18
Q

MOLAP

A

Multi-dimensional OLAP, an implementation on multi-dimensional DBMS’s.
Advantages:
- Excellent performance
- Can perform complex calculations
Disadvantages:
- Limited in the amount of data it can handle
- Requires additional investment.

19
Q

HOLAP

A

Hybrid OLAP, an implementation using both relational and multi-dimensional DBMS’s.

20
Q

Other data-warehouse issues

A
  • Quality: data in the warehouse should be accurate, up-to-date, complete, consistent, available, traceable, and clear. Also documentation is an important topic, because most documentation is not standardized yet.
  • Security
  • Evolution