Data Management - What I missed from his slides Flashcards by Joost Kok

What is business intelligence?

Data driven decision making

- Transforming data into meaningful information/knowledge to support business decision making

How well did you know this?

Not at all

Perfectly

What is knowledge?

Processed data or information that is applicabale to a business decision problem

How well did you know this?

Not at all

Perfectly

What are descriptive analytics, predictive analytics and prescriptive analytics?

Using data to understand past and present
Predict future behaviour based on past performance
Make decisions or recommendations to achieve the best performance

How well did you know this?

Not at all

Perfectly

How is business intelligence a product?

It provides information and knowledge that enables decision making

How well did you know this?

Not at all

Perfectly

How is business intelligence a process?

You have to extract information and knowledge from data. This requires a series of actions.

How well did you know this?

Not at all

Perfectly

What is a database management system (DBMS)?

The software that controls the data.

How well did you know this?

Not at all

Perfectly

What is the difference between a super key and a candidate key?

A superkey is any combination of attributes that identifies each row.

A candidate key is a superkey without any unnecessary attributes

How well did you know this?

Not at all

Perfectly

What is cardinality?

The way tables are related. Their relationship.

How well did you know this?

Not at all

Perfectly

What is the difference in terms of time between a data warehouse and an operational system?

Operational: Current value data

Data warehouse: More historical perspective.
In a data warehouse, every key structure contains an element of time (whether you know it or not).

How well did you know this?

Not at all

Perfectly

What are 4 conceptual characteristics of a data warehouse?

Subject-oriented
Integrated
time-variant
nonvolatile (no updates)

How well did you know this?

Not at all

Perfectly

What is an independent data mart architecture?

Different data marts are seperately designed and built in a nonintegrated fashion. They are loaded with data from the source layer.

How well did you know this?

Not at all

Perfectly

Wat is a bus architecture?

Similar to an independent data mart system, except that the data marts have conformed dimensions.

How well did you know this?

Not at all

Perfectly

What is a hub-and-bespoke architecture?

You have datamarts that are loaded with data from the reconciled layer.

One of the most used architectures. Much attention to scalability and extensibility.

Reconciled layer: already integrated, consistent and correct data.

+ creates a common reference model
- leads to more redundancy

How well did you know this?

Not at all

Perfectly

What is a centralized architecture?

It is an implementation of the hub and bespoke, in which the reconciled layer and the data marts are collapsed into a single physical repository.

How well did you know this?

Not at all

Perfectly

What is a federated architecture?

You have datamarts filled by the operational data sources. BUT the data marts are integrated (virtually or physically) after that.

Happens often when there is a merger or acquisition and some datawarehouse needs to continue to exist.

How well did you know this?

Not at all

Perfectly

What is the difference between the grouped name ‘Data Mart Approach’ (bottom-up) and ‘Enterprise data warehouse approach’ (top-down?

Study These Flashcards

Data mart approach means you have a collection of data marts. Uses dimensional modelling.
Enterprise warehouse means that (even though you might have data marts) they are integrated. Uses Entity-relational modeling.

What is a data lake?

Study These Flashcards

Similar to a datawarehouse, except that the data is stored in its natural/raw form untill it is needed.

What two types of loads are their in the ETL-loading stage?

Study These Flashcards

Integral load
All records are taken and processed by comparing them with the records in the datawarehouse
Delta/incremental load
Only the changed records are taken and processed systematically in the datawarehouse.

What are the different generations of ETL tools?

Study These Flashcards

Generation 0 - do it yourself (SQL procedures)
Generation 1 - code generators
Generation 2 - ETL engines

What is the difference between ‘Query & reporting’ , ‘OLAP’ and ‘Data Mining Models’?

Study These Flashcards

Query and reporting gives static reports through SQL.

OLAP gives interactive analysis of multidimensional data

Data mining discovers new valuable knowledge
(which customer is at risk of leaving?)

What is a lattice of cubes?

Study These Flashcards

If you roll up the base cube over all its associated dimensions and dimension hierarchies.

Bottom most cube = base cub Cb

Topmost cube = apex = Ct = contains only one cell

How do you calculate how many cubes there are in a multidimensional database if there are no hierarchies?

Study These Flashcards

You have n dimensions.

2^n

How do you calculate how many cubes there are in a multidimensional database if there are hierarchies?

Study These Flashcards

You have n dimensions and each dimension has L levels.

(L + 1)^n

What is the purpose of a star schema?

Study These Flashcards

To limit the number of joins in a multi-dimensional model to increase query performance.

What is given in a star schema?

You have the fact table in the middle. This table has a list of all the dimension keys. These dimension keys are linked to other tables that are the dimension table. These have lists of their layers. The fact table also include the measure keys.

What is a snowflake schema?

Similar to a star schema, except that data is more normalized. This means that some dimension tables are linked to a secondary dimension table further specifying a specific layer in the dimension. In the first dimension table that means that there is an extra 'dimension key' to link to the secondary table.

How do you calculate density in a cube?

Number of actual rows / number of potential rows. If the Fact sales has dimension Time, Product and Store and you set the dimension on Time(day), products and Stores. There are 4 days, 5 products and 3 stores. Potential rows: 4 * 5 * 3 Actual rows = given = 35 Densitity = 35 / 65 = 53.33%

How can you deal with slow changes in dimension attributes?

1. Overwrite old values 2. Add version number field as part of the key 3. Add new columns to the dimension table.

Data Management - What I missed from his slides Flashcards

(28 cards)