5.2 Quiz: Data Warehousing & Business Intelligence Flashcards

1
Q

A good datawarehouse schema to track events is

a star schema.
a snowflake schema.
a factless fact table.
a fact constellation.

A

a factless fact table.

Star Schema:
Definition: A data warehouse schema design where a central fact table is connected to dimension tables in a star-like structure.
Characteristics: Simple, with one main fact table at the center and dimension tables surrounding it. Facilitates efficient querying and reporting.

Snowflake Schema:
Definition: A variation of the star schema where dimension tables are normalized into multiple related tables, creating a snowflake-like shape.
Characteristics: More normalized than a star schema, which can save storage space. However, it may introduce additional complexity in querying.

Factless Fact Table:
Definition: A fact table that lacks measures (numeric values) but captures the relationships between dimensions.
Use Cases: Useful for scenarios where you want to record events or associations without numerical data, such as tracking attendance or relationships between entities.

Fact Constellation:
Definition: A data warehouse design with multiple fact tables connected to shared dimension tables.
Characteristics: Involves multiple star or snowflake schemas sharing common dimensions. Suitable for complex analytical scenarios where different types of facts need to be analyzed together.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

A data warehouse is integrated because information from different functional domains within an organization are brought together with external data in one centralized system.

True
False

A

True

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Which OLAP operation is shown below?
2x8x4 -> 2x3x2

Dice
Roll-up
Slice
Drill-down

A

Dice

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

OLTP databases are usually normalized in order to optimize query performance. This is necessary to support users with quick results for complex queries. A normalized structure will improve the speed with which information can be retrieved from the database.

True
False

A

False

normalized to reduce redundancy and improve data integrity

denormalization= For systems where quick query performance is a priority,

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Analytics is an example of

verification-based Business Intelligence.
discovery-oriented Business Intelligence.

A

discovery-oriented Business Intelligence.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

An independent datawarehouse pulls data directly from the datawarehouse.

True
False

A

False ->indepent directly
dependent -> directly

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Hybrid OLAP (HOLAP) tries to combine the best of both MOLAP and ROLAP. An RDBMS can then be used to store the detailed data in a relational data warehouse whereas the pre-computed aggregated data can be kept as a multidimensional array managed by a MDBMS.

True
False

A

True

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

In a star schema, the foreign key in a dimension table, refers to a single record in the fact table.

True
False

A

False;corresponds to multiple records

As the name suggests, a star schema has one large central fact table which is connected to various smaller dimension tables. As illustrated in this figure, the fact table has multiple foreign keys referring to each of the dimension tables, implementing a 1:N relationship type. The primary key of the fact table consists of the composition of all these foreign keys.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Which statement about Hybrid OLAP (HOLAP) is CORRECT?

Hybrid OLAP (HOLAP) tries to combine the best of both MOLAP and ROLAP.

In HOLAP, an RDBMS can be used to store the detailed data in a relational data warehouse whereas the pre-computed aggregated data can be kept as a multidimensional array managed by a MDBMS.

In HOLAP , the OLAP analysis can first start from the multidimensional database. If more detail is needed (e.g., during drill-down), the analysis can shift to the relational database.

All statements are correct.

A

All statements are correct.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Which statement about data marts is NOT CORRECT?

A data mart is a scaled down version of a data warehouse aimed at meeting the information needs of a homogeneous small group of end-users such as a department or business unit (e.g., marketing, finance, logistics, HR, etc.).

A data mart typically contains some form of aggregated data and is used as the primary source for report generation and analysis by this end-user group.

Data marts provide focused content such as finance, sales or accounting information, in a format tailored to the user group at hand.

Data marts can also improve query performance by offloading complex queries, and therefore workload, from other data sources (e.g., a data warehouse).

All statements are correct.

A

All statements are correct.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

A snowflake schema

normalizes the dimension tables.
denormalizes the dimension tables.

A

normalizes the dimension tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Consider the following statement : a datawarehouse always contains up to date information. This statement is

True
False

A

false

A data warehouse is a large, centralized repository of data that is used for reporting and analysis. Unlike operational databases (OLTP databases), which are designed for transactional processing and often contain real-time or near-real-time data, data warehouses are typically updated periodically, such as daily, weekly, or monthly.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Consider the following statement: datawarehouses are usually designed at the logical level since there is no conceptual data model (e.g., ER, UML) available. This statement is

True
False

A

True

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

In case front-end tools such as reporting applications and data mining software get direct access to the operational databases, i.e. without the use of a true datawarehouse and/or OLAP server, we denote this as a “virtual datawarehouse”.

True
False

A

True

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

The non-volatile characteristic of datawarehouse implies that the data is

read-only.
write-only.
both read-only and write-only.

A

read-only

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Independent data marts are standalone systems, drawing data directly from the operational systems, external sources or a combination of both.

True
False

A

True

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

OLAP is an example of verification based Business Intelligence (BI).

True
False

A

True

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What statement about Multidimensional OLAP (MOLAP) is NOT CORRECT?

MOLAP can be fast in terms of data retrieval but it needs more storage to accomplish this.

MOLAP scales easily when the number of dimensions increases.

MDBMSs make use of proprietary data structures and data manipulation languages (DML), so no universal SQL-like standard is provided for data handling, which impedes their adoption.

MDBMSs are not optimized for transaction processing. Updating, inserting or deleting data is usually quite inefficient.

A

MOLAP scales easily when the number of dimensions increases.

18
Q

Operational systems focus on

INSERT, UPDATE, DELETE and/or SELECT statements.
only INSERT statements.
only UPDATE statements.
only SELECT statements.
only DELETE statements.

A

INSERT, UPDATE, DELETE and/or SELECT statements.

19
Q

In a star schema, every foreign key out of the set of foreign keys forming the composed key of a fact table, refers to one dimension table.

True
False

A

True

20
Q

In OLAP, accessing information from 2 or more connected fact tables is an example of

drill-across
drill-up
drill-down
roll-across

A

drill-across

Drill-Across: This operation involves navigating through different fact tables to combine data from multiple sources. It allows users to analyze data from different dimensions and levels across separate but related fact tables.

21
Q

An Operational Data Store (ODS) provides no query facilities.

True
False

A

False

An Operational Data Store (ODS) typically does provide query facilities. An ODS is a database that is used for short-term storage and integration of operational data from various sources. It acts as an intermediate storage and processing layer between the operational systems and the data warehouse.

22
Q

A datawarehouse is non-volatile because the records are never overwritten.

True
False

A

True

23
Q

One part of the ETL process consists in cleaning, integrating and transforming the input data. For most organizations, the design and implementation of these ETL functionalities requires a lot of effort, given that most data sources in organizations not integrated and contain many errors.

True
False

A

True

24
Q

What statement about Multidimensional OLAP (MOLAP) is NOT CORRECT?

MOLAP stores the multidimensional data using a Multidimensional DBMS (MDBMS) whereby the data is stored in a multi-dimensional array-based data structure optimized for efficient storage and quick access.

In MOLAP, the array cells represent the index keys of the array, whereas the dimensions contain the actual fact data (e.g., sales).

In MOLAP, also totals can be precomputed and stored in the array.

A potential problem with MOLAP is that the array may get sparse with many 0 s if only a limited number of combinations of dimension values occur.

A

In MOLAP, the array cells represent the index keys of the array, whereas the dimensions contain the actual fact data (e.g., sales).

the array cells do not represent the index keys of the array; rather, the array cells contain the actual fact data (e.g., sales). The dimensions in MOLAP represent the index keys.

25
Q

Which OLAP operation is shown below?
4x4x4 -> 4x4x2

Dice
Roll-up
Slice
Drill-down

A

Roll-up

Roll-up= reduce dimension by aggregating it (aka reduce data, but keep dimension)
Drill down= opposite of roll-up, open up a dimension aka show observations.
Slice= remove a dimension (eg. from 3 dimension to 2 dimension)
Dice= selects multiple fixed vamues form a dimension
pivot= changes x and y-axis from place, keeps all data

26
Q

Which statement is NOT CORRECT?

A virtual data warehouse can be built as a set of SQL queries directly on the underlying operational data sources as you see here.

A virtual datawarehouse can be built as an extra layer on top of a collection of physical independent data marts as illustrated here.

A virtual datawarehouse should provide a uniform and consistent metadata model and data manipulation language (e.g., SQL).

A key advantage of virtualization is that it is easy to keep track of historical data.

A

A key advantage of virtualization is that it is easy to keep track of historical data.

Virtualization itself is a technology or approach that abstracts and separates the physical implementation details from the logical representation of data. It is not inherently focused on keeping track of historical data. Virtualization can be applied to various aspects of computing, including storage, network, and data.

tracking historical data is more closely associated with concepts like data warehousing and temporal databases.

27
Q

Normalisation is typically not applied to datawarehouses because of query performance: users want to have quick results for complex queries and a denormalized structure will improve the speed at which the results are retrieved from the database.

True
False

A

True

28
Q

What type of schema is shown below?
middle fact table linked to 3 table which in turn are linked to other tabales

Snowflake schema
Star schema
Multidimensional schema
HOLAP schema

A

Snowflake schema

29
Q

What type of schema is shown below?
3 tables linked to a middle table

Snowflake schema
Star schema
Multidimensional schema
HOLAP schema

A

star schema

30
Q

Which statement about query and reporting is NOT CORRECT?

Query and reporting tools provide a user-friendly graphical user interface (GUI) where the business user can graphically and interactively design a report.

It is not a business user doing the query and reporting, but the IT expert.

Some tools provide an intermediate Query by Example (QBE) facility that sits in between the database and the business concepts. The idea is that a query is composed in a user-friendly way by visualizing database tables whereby the business user can enter conditions for each field that needs to be included in the query. This can then be translated to a formal data manipulation language such as SQL.

Query and reporting tools implement innovative visualization techniques aimed at making interesting data patterns stand out more prominently.

A

It is not a business user doing the query and reporting, but the IT expert.

31
Q

Which statement is CORRECT?

Traditional databases were primarily developed for these operational decisions and have a strong transactional focus in which many transactions need to be processed in small units of time.

At the tactical level, decisions are made by middle management with a medium-term focus (e.g., a month, a quarter, a year).

At the strategic level, decisions are made by senior management with long term implications (e.g., 1, 2, 5 years or more).

All statements are correct.

A

All statements are correct.

32
Q

An datawarehouse is called volatile or time-variant because new data is continuously added to the database.

True
False

A

False

33
Q

A virtual datawarehouse or data mart always contains

Outdated data
Up-to-date data

A

Up-to-date data

34
Q

A fact constellation schema can have more than one fact table.

True
False

A

True

35
Q

A data mart which pulls its data directly from a central data warehouse is a

dependent data mart
independent data mart

A

dependent

36
Q

Which statement about an Operational Data Store (ODS) is NOT CORRECT?

An Operational Data Store (ODS) deals with the disadvantage of data warehouses not containing up to date data.

An ODS can be considered as a staging area that provides query facilities.

Analysis tools that need data that is closer to real time, can query the ODS data as it is received from the respective source systems, before time consuming transformation and loading operations.

An ODS and datawarehouse can never be combined.

A

An ODS and datawarehouse can never be combined.

37
Q

Adding data to a datawarehouse usually happens daily or weekly by means of the so-called ETL-process. During this Extract, Transform and Load process, data from internal (operational) databases and data from external sources is processed in a so-called data staging area. This is a database used to manipulate the data before it is once and for all loaded into the (enterprise) datawarehouse.

True
False

A

True

38
Q

Which statement about pivot or cross-tables is NOT CORRECT?

A pivot or cross-table essentially cross-tabulates a set of dimensions in such a way that multidimensional data can be represented in a two-dimensional tabular format.

A pivot table also contains row and column totals.

BI tools also provide various user-friendly graphical facilities to customize the pivot table by dragging and dropping dimensions of interest.

Pivoting means that the rows and columns always stay fixed and cannot be rotated.

Pivot tables typically also provide drill-down facilities whereby either a dimension is further de-aggregated into more detail or new dimensions are added.

A

Pivoting means that the rows and columns always stay fixed and cannot be rotated.

39
Q

Which statement about the ETL (Extraction, Transformation and Loading) is NOT CORRECT?

According to a full extraction strategy, only the changes since the previous extraction are considered, which is also called Changed Data Capture or CDC.

During extraction, it is important to properly accommodate the different types of data sources, operating systems, and hardware environments from where the data is sourced.

The transformation usually entails the following activities: formatting, cleansing, aggregation and merging, and enrichment.

During the loading step, the data warehouse is populated by filling the fact and dimension tables hereby also generating the necessary surrogate keys to link it all up. Dimension rows should be inserted/updated before the fact rows can refer to them.

A

According to a full extraction strategy, only the changes since the previous extraction are considered, which is also called Changed Data Capture or CDC.

False. According to a full extraction strategy, the entire dataset is extracted from the source system, regardless of whether the data has changed since the previous extraction. This approach contrasts with incremental extraction strategies, where only the changes since the last extraction (Changed Data Capture or CDC) are considered.

40
Q

In a datawarehouse the data is structured, whereas in a data lake the data is unstructured.

True
False

A

True

41
Q

A snowflake schema is beneficial

if the dimension tables grow too large and a more efficient usage of storage capacity is required.

if it turns out that most queries don t make use of the outer level dimension tables and only need access to the dimension tables directly connected to the fact table.

both of the above.

A

both of the above.

42
Q

Multidimensional OLAP (MOLAP) stores the multidimensional data using a Multidimensional DBMS (MDBMS) whereby the data is stored in a multi-dimensional array-based data structure optimized for efficient storage and quick access.

True
False

A

True