Lecture 2 Flashcards

1
Q

Cardinality

A

The way tables are related. 3 types:

  • One-to-one (or zero-to-one)
  • One-to-many
  • Many-to-many (M-to-N) often via a Bridge table.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Bridge table

A

Decomposes the M-to-N table in to one-to-many relationships.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

CIF and Programming languages composed of

A
  • Production (also called back-end)
  • Assembly, logistics, and storage (includes Data-warehousing, is a DBMS)
  • Processing, analysis and consumption (also called front-end)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

SQL basic form

A
Select [fields]
from [one or more tables]
Where [criteria]
Group by [product]
Having [criteria]
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Join

A

Generally composed of an equality comparison between foreign key and primary key of related tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Different ways of creating joins (Venn Diagrams)

A
  • Inner join (result is overlap)
  • Left outer join (result is left)
  • Right outer join (result is right)
  • Full outer join (result is everything)
  • Minus (result is left or right without overlap)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Trends in the database world

A
  1. From disk-based to in-memory databases (SAP HANA).
    History: CPU is relatively fast, memory is very slow. -> the further away from the CPU, the slower.
    Current/future: memory became cheaper, and larger. Therefor it is possible to store databases in-memory. However there is a technical movement -> instead of working in rows, they use columns, which makes it faster.
  2. From local databases to cloud databases (Azure, Google Cloud etc.)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Data warehouse

A

A database that is maintained separately from the organization’s operational databases for the purpose of managerial decision-making.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Data warehousing

A

The process of constructing and using data warehouses.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Why do we need a separate data warehouse?

A
  1. Data quality: If you want to make use of the data, you want to have high data quality.
  2. Historical data: The problem is time-related. Mostly, databases in the back-end do not have much historical data.
  3. Data consolidation: You want to aggregate the data.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Components of a data warehouse framework

A
  1. Production
  2. ETL
  3. Data warehouse
  4. Data marts
  5. Meta data
  6. BI Applications
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Production

A

Contains the different databases.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

ETL

A

Extract, Transform, Load. Within the component, you want to clean, convert, and link the data to improve the data quality.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Data warehouse:

A

The data in this part is more ‘formed’ to support the information requirements for managers and decision-making. This data is cleaned, ordered and linked. -> Star and Snowflake modelling. This part does not have up-to-date data. There is a certain delay because of the ETL process.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Data Marts

A

A subset of a data warehouse. There are less details, history and dimensions. It is user/group oriented.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Dependent Data Mart

A

Dependent on the central warehouse.

17
Q

Independent Data Mart

A

Independent from the central warehouse and is directly linked to the ETL.

18
Q

Metadata

A

Data about the data (such as location, meaning, transformations etc.)

19
Q

BI applications

A

Connected with the data warehouse or data mart. Two trends:

  • Integrated BI/BA-suits
  • Enterprise information portals.
20
Q

What is a datawarehouse

A
  • Subject oriented
  • Integrated
  • Time-variant
  • Nonvolatile
    Collection of data in support of management’s decision-making process.
21
Q

Subject oriented

A
  • Focussing on the analysis of data or decision makers not on on daily operations/transition processing.
  • Providing a simple view around particular subjects by excluding data that are not useful in the decision support.
22
Q

Integrated

A
  • Constructed by integrating multiple heterogenous data sources.
  • Data cleaning and data integration techniques are applied.
    + Ensure consistency in naming conventions.
    + When data is moved to the data warehouse, it is converted.
23
Q

Time-variant

A
  • Time horizon for the warehouse is significantly longer than that of operational systems.
    + Operational database
    + Data warehouse data: provide information from historical perspective.
  • Every key structure in the data warehouse.
    + Contains an element of time.
24
Q

Non-volatile

A
  • Operational updates of data do not occur in the data warehouse environment. This requires 2 operations:
    + Initial loading (SQL: Insert)
    + Access of data (SQL: Select)
25
Q

Different types of data warehouses

A
  • Direct BI on source systems
  • Canned data warehouse (in ERP system)
  • Independent data marts
  • Bus architecture
  • Enterprise data warehouse
  • Hub & spoke
  • Federated data warehouse
  • Data lake
26
Q

Central data warehouse

A

A pure architect -> Top down -> Enterprise data warehouse approach.

  • Has one integrated warehouse.
  • Advantage is that it has a simple ETL process.
  • Entity-relational modelling.
  • Examples: Hub & Spoke and Federated data warehouse
27
Q

Federated architecture

A

Architecture is composed of data marts (no central data warehouse) -> Bottom up -> Canned data warehouse.
- Start at the requirements of a single department, and after that they scale up. Data warehouse = collection of data marts. Typical feature = dimensional modelling. Cheap to build.
- Dimensional modelling: consistency achieved by conformed dimensions.
- Problems:
+ very much ETL has to be done.
+ low data consistency -> solution is data mart bus architecture, which links the different data marts by conformed dimensions.
- Examples: Independent data marts, bus architecture, and the canned data warehouse are within this category.

28
Q

Data mart vs Enterprise data warehouse

A

No one-size fits all strategy to data-warehousing. Depends on:

  • Management’s information needs -> when management want a central overview, the enterprise data warehouse approach is probably better.
  • Information interdependence between organisational units.