data warehousing 2/3 Flashcards

1
Q

What is metadata in the context of data warehousing?

A

Metadata describes the structure, lineage, and definitions of data stored in the warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the three types of metadata in a data warehouse?

A

Business Metadata: Data ownership, business definitions.
Technical Metadata: Table/column names, data types, keys.
Operational Metadata: Data state (active, archived), lineage.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is the difference between ETL and ELT?

A

ETL (Extract, Transform, Load): Data is transformed before loading into the warehouse.
ELT (Extract, Load, Transform): Data is loaded first, then transformed within the warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the strengths of ETL processes?

A

Shorter development time.
Targeted data extraction.
Wide availability of tools.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the purpose of data staging in a warehouse?

A

Acts as a temporary space for data transformation and integration before loading into the warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a data mart, and how does it differ from a data warehouse?

A

A data mart is a subset of a data warehouse tailored for specific departments or functions.
It is smaller, faster to build, and focused on departmental needs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the types of data marts?

A

Independent: Stand-alone, specific to one department.
Dependent: Derived from the central data warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the key index structures in data warehousing?

A

Inverted Index: Maps terms to their data locations for quick lookup.
Bitmap Index: Efficient for columns with few distinct values.
Join Index: Pre-computed joins between fact and dimension tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the difference between inverted and bitmap indexes?

A

Inverted Index: Links terms to data records.
Bitmap Index: Uses binary vectors for efficient intersections of query results.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are the three key security aspects of data warehousing?

A

Confidentiality, integrity, and availability.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How can users be classified in a data warehouse security framework?

A

By role, department, job type, or grade.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are common reasons for ‘dirty’ data in a data warehouse?

A

Dummy values, missing data, cryptic data, non-unique identifiers, and violation of business rules.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the purpose of parsing in data cleaning?

A

Identifying and isolating data components, such as splitting names or addresses into parts.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How is data standardized during cleaning?

A

By applying consistent formats using business rules, such as replacing nicknames or adding prefixes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is data enrichment?

A

Enhancing existing data by incorporating external sources to provide additional context.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is the role of scoring in a data warehouse?

A

Assigning probabilities to events like customer churn or likelihood of purchase.

17
Q

What are refresh techniques in data warehousing?

A

Full Extract: Reloads the entire source table (expensive).
Incremental Extract: Updates based on changes since the last extract.

18
Q

What is an inverted index used for?

A

To associate index terms with data locations, enabling fast query lookups.

19
Q

When is a bitmap index most effective?

A

For columns with a small number of distinct values.

20
Q

What are join indexes, and how do they improve performance?

A

Pre-computed joins that link fact and dimension tables, reducing query execution time.

21
Q

What are the steps in managing security for a data warehouse?

A

Define access rights based on roles and sensitivity levels.
Use encryption and filtration for data transfer.
Implement auditing to monitor access and changes.

22
Q

What is the purpose of a security audit in data warehousing?

A

To review and test the adequacy of system controls and recommend changes to policies.

23
Q

What are the three types of security classifications for data objects?

A

Public, confidential, and top secret.

24
Q

What legal requirements should be considered in data warehousing?

A

Handling sensitive data, limiting third-party sharing, and adhering to privacy regulations.

25
Q

What is volume analysis in the context of data warehousing?

A

Estimating storage needs, including current usage and future growth over time.

26
Q

What is usage analysis, and why is it important?

A

Analyzing transaction types and frequencies to optimize system performance and identify constraints.

27
Q

What are the three main types of OLAP architecture?

A

Relational OLAP (ROLAP).
Multidimensional OLAP (MOLAP).
Hybrid OLAP (HOLAP).

28
Q

What is the difference between ROLAP and MOLAP?

A

ROLAP uses relational databases and is suitable for large datasets.
MOLAP uses pre-aggregated arrays and is faster for smaller, predefined queries.