Data warehousing Flashcards

1
Q

constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making.

A

data warehouse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Who first coined the term data warehouse in 1990?

A

Bill Inmon

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

According to Inmon, a data warehouse is a subject oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts to take informed decisions in an organization.

A

TRUE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is OLAP?

A

Online Analytical Processing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

4 features of Data Warehouse

A

Subject Oriented
Integrated
Time Variant
Non-volatile

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

A data warehouse is subject oriented because it provides information around a subject rather than the organization’s ongoing operations. These subjects can be product, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on the ongoing operations, rather it focuses on modelling and analysis of data for decision making.

A

Subject Oriented

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.

A

Integrated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

The data collected in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from the historical point of view.

A

Time Variant

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

This means the previous data is not erased when new data is added to it. A data warehouse is kept separate from the operational database and therefore frequent changes in operational database is not reflected in the data warehouse.

A

Non-volatile

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

T or F

A data warehouse does not require transaction processing, recovery, and concurrency controls, because it is physically stored and separate from the operational database.

A

TRUE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

T OR F

data warehouse helps business executives to organize, analyze, and use their data for decision making. A data warehouse serves as a sole part of a plan-execute-assess “closed-loop” feedback system for the enterprise management.

A

TRUE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Data Warehouse Applications

A

Financial services
Banking services
Consumer goods
Retail sectors
Controlled manufacturing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

3 types of Data Warehouse

A

Information Processing
Analytical Processing
Data Mining

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

A data warehouse allows to process the data stored in it. The data can be processed by means of querying, basic statistical analysis, reporting using crosstabs, tables, charts, or graphs.

A

Information Processing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice-and-dice, drill down, drill up, and pivoting.

A

Analytical Processing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, performing classification and prediction. These mining results can be presented using the visualization tools.

A

Data Mining

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources that support analytical reporting, structured and/or ad hoc queries, and decision making. involves data cleaning, data integration, and data consolidations.

A

Data Warehousing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Functions of Data Warehouse Tools and Utilities

A

Data Extraction
Data Cleaning
Data Transformation
Data Loading
Refreshing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Involves gathering data from multiple heterogeneous sources.

A

Data Extraction

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Involves finding and correcting the errors in data.

A

Data Cleaning

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Involves converting the data from legacy format to warehouse format.

A

Data Transformation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.

A

Data Loading

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Involves updating from data sources to warehouse.

A

Refreshing

24
Q

These are important steps in improving the quality of data and data mining results.

A

Data cleaning and data transformation

25
Q

simply defined as data about data. The data that are used to represent other data

A

metadata

26
Q

a road-map to data warehouse.
in data warehouse defines the warehouse objects.
acts as a directory. This directory helps the decision support system to locate the contents of a data warehouse.

A

Metadata

27
Q

an integral part of a data warehouse system.

A

Metadata repository

28
Q

It contains the data ownership information, business definition, and changing policies.

A

Business metadata

29
Q

It includes currency of data and data lineage. Currency of data refers to the data being active, archived, or purged. Lineage of data means history of data migrated and transformation applied on it.

A

Operational metadata

30
Q

It metadata includes source databases and their contents, data extraction, data partition, cleaning, transformation rules, data refresh and purging rules.

A

Data for mapping from operational environment to data warehouse

31
Q

It includes dimension algorithms, data on granularity, aggregation, summarizing, etc.

A

The algorithms for summarization

32
Q

helps us represent data in multiple dimensions. It is defined by dimensions and facts. The dimensions are the entities with respect to which an enterprise preserves the records.

A

Data Cube

33
Q

contain a subset of organization-wide data that is valuable to specific groups of people in an organization. contains only those data that is specific to a particular group.

A

Data Mart

34
Q

The view over an operational data warehouse. It is easy to build and it requires excess capacity on operational database servers.

A

Virtual Warehouse

35
Q

They get the information from the data warehouses to measure the performance and make critical adjustments in order to win over other business holders in the market.

A

business analyst

36
Q

T OR F

A data warehouse provides us a consistent view of customers and items, hence, it helps us manage customer relationship.

A

TRUE

37
Q

To design an effective and efficient data warehouse, we need to understand and analyze the business needs and construct a business analysis framework.

A

TRUE

38
Q

this view allows the selection of relevant information needed for a data warehouse.

A

The top-down view

39
Q

This view presents the information being captured, stored, and managed by the operational system.

A

The data source view

40
Q

This view includes the fact tables and dimension tables. It represents the information stored inside the data warehouse.

A

The data warehouse view

41
Q

It is the view of the data from the viewpoint of the end-user

A

The business query view

42
Q

tier of the architecture is the data warehouse database server. It is the relational database system. We use the back end tools and utilities to feed data into the bottom tier. These back end tools and utilities perform the Extract, Clean, Load, and refresh functions.

A

Bottom Tier

43
Q

In this tier, we have the OLAP server that can be implemented by ROLAP & MOLAP

A

Middle Tier

44
Q

This tier is the front-end client layer. This layer holds the query tools and reporting tools, analysis tools and data mining tools.

A

Top-Tier

45
Q

It allows managers, and analysts to get an insight of the information through fast, consistent, and interactive access to information.

A

Online Analytical Processing Server (OLAP)

46
Q

Types of OLAP Servers

A

Relational OLAP (ROLAP)
Multidimensional OLAP (MOLAP)
Hybrid OLAP (HOLAP)
Specialized SQL Servers

47
Q

servers are placed between relational back-end server and client front-end tools. To store and manage warehouse data, this uses relational or extended-relational DBMS.

A

ROLAP/Relational OLAP

48
Q

uses array-based multidimensional storage engines for multidimensional views of data. With multidimensional data stores, the storage utilization may be low if the data set is sparse.

A

MOLAP/Multidimensional OLAP

49
Q

is a combination of both ROLAP and MOLAP. It offers higher scalability of ROLAP and faster computation of MOLAP. This servers allows to store the large data volumes of detailed information. The aggregations are stored separately in MOLAP store.

A

Hybrid OLAP

50
Q

provide advanced query language and query processing support for SQL queries over star and snowflake schemas in a read-only environment.

A

Specialized SQL Servers

51
Q

OLAP Operations

A

Roll-up
Drill-down
Slice and dice
Pivot (rotate)

52
Q

is a logical description of the entire database. It includes the name and description of records of all record types including all associated data-items and aggregates.

A

Schema

53
Q

Each dimension is represented with only one-dimension table. This dimension table contains the set of attributes.

A

Star Schema

54
Q

Some dimension tables in this scheme are normalized.The normalization splits up the data into additional tables.

A

Snowflake Schema

55
Q

Due to normalization in the Snowflake schema, the redundancy is reduced and therefore, it becomes easy to maintain and the save storage space.

A

TRUE

56
Q

This has has multiple fact tables. It is also known as galaxy schema.

A

Fact Constellation Schema

57
Q

Why Do We Need a Data Mart?

A

To partition data in order to impose access control strategies.

To speed up the queries by reducing the volume of data to be scanned.

To segment data into different hardware platforms.

To structure data in a form suitable for a user access tool.