Data warehousing Flashcards
constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making.
data warehouse
Who first coined the term data warehouse in 1990?
Bill Inmon
According to Inmon, a data warehouse is a subject oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts to take informed decisions in an organization.
TRUE
What is OLAP?
Online Analytical Processing
4 features of Data Warehouse
Subject Oriented
Integrated
Time Variant
Non-volatile
A data warehouse is subject oriented because it provides information around a subject rather than the organization’s ongoing operations. These subjects can be product, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on the ongoing operations, rather it focuses on modelling and analysis of data for decision making.
Subject Oriented
A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.
Integrated
The data collected in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from the historical point of view.
Time Variant
This means the previous data is not erased when new data is added to it. A data warehouse is kept separate from the operational database and therefore frequent changes in operational database is not reflected in the data warehouse.
Non-volatile
T or F
A data warehouse does not require transaction processing, recovery, and concurrency controls, because it is physically stored and separate from the operational database.
TRUE
T OR F
data warehouse helps business executives to organize, analyze, and use their data for decision making. A data warehouse serves as a sole part of a plan-execute-assess “closed-loop” feedback system for the enterprise management.
TRUE
Data Warehouse Applications
Financial services
Banking services
Consumer goods
Retail sectors
Controlled manufacturing
3 types of Data Warehouse
Information Processing
Analytical Processing
Data Mining
A data warehouse allows to process the data stored in it. The data can be processed by means of querying, basic statistical analysis, reporting using crosstabs, tables, charts, or graphs.
Information Processing
A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice-and-dice, drill down, drill up, and pivoting.
Analytical Processing
supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, performing classification and prediction. These mining results can be presented using the visualization tools.
Data Mining
the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources that support analytical reporting, structured and/or ad hoc queries, and decision making. involves data cleaning, data integration, and data consolidations.
Data Warehousing
Functions of Data Warehouse Tools and Utilities
Data Extraction
Data Cleaning
Data Transformation
Data Loading
Refreshing
Involves gathering data from multiple heterogeneous sources.
Data Extraction
Involves finding and correcting the errors in data.
Data Cleaning
Involves converting the data from legacy format to warehouse format.
Data Transformation
Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.
Data Loading
Involves updating from data sources to warehouse.
Refreshing
These are important steps in improving the quality of data and data mining results.
Data cleaning and data transformation
simply defined as data about data. The data that are used to represent other data
metadata
a road-map to data warehouse.
in data warehouse defines the warehouse objects.
acts as a directory. This directory helps the decision support system to locate the contents of a data warehouse.
Metadata
an integral part of a data warehouse system.
Metadata repository
It contains the data ownership information, business definition, and changing policies.
Business metadata
It includes currency of data and data lineage. Currency of data refers to the data being active, archived, or purged. Lineage of data means history of data migrated and transformation applied on it.
Operational metadata
It metadata includes source databases and their contents, data extraction, data partition, cleaning, transformation rules, data refresh and purging rules.
Data for mapping from operational environment to data warehouse
It includes dimension algorithms, data on granularity, aggregation, summarizing, etc.
The algorithms for summarization
helps us represent data in multiple dimensions. It is defined by dimensions and facts. The dimensions are the entities with respect to which an enterprise preserves the records.
Data Cube
contain a subset of organization-wide data that is valuable to specific groups of people in an organization. contains only those data that is specific to a particular group.
Data Mart
The view over an operational data warehouse. It is easy to build and it requires excess capacity on operational database servers.
Virtual Warehouse
They get the information from the data warehouses to measure the performance and make critical adjustments in order to win over other business holders in the market.
business analyst
T OR F
A data warehouse provides us a consistent view of customers and items, hence, it helps us manage customer relationship.
TRUE
To design an effective and efficient data warehouse, we need to understand and analyze the business needs and construct a business analysis framework.
TRUE
this view allows the selection of relevant information needed for a data warehouse.
The top-down view
This view presents the information being captured, stored, and managed by the operational system.
The data source view
This view includes the fact tables and dimension tables. It represents the information stored inside the data warehouse.
The data warehouse view
It is the view of the data from the viewpoint of the end-user
The business query view
tier of the architecture is the data warehouse database server. It is the relational database system. We use the back end tools and utilities to feed data into the bottom tier. These back end tools and utilities perform the Extract, Clean, Load, and refresh functions.
Bottom Tier
In this tier, we have the OLAP server that can be implemented by ROLAP & MOLAP
Middle Tier
This tier is the front-end client layer. This layer holds the query tools and reporting tools, analysis tools and data mining tools.
Top-Tier
It allows managers, and analysts to get an insight of the information through fast, consistent, and interactive access to information.
Online Analytical Processing Server (OLAP)
Types of OLAP Servers
Relational OLAP (ROLAP)
Multidimensional OLAP (MOLAP)
Hybrid OLAP (HOLAP)
Specialized SQL Servers
servers are placed between relational back-end server and client front-end tools. To store and manage warehouse data, this uses relational or extended-relational DBMS.
ROLAP/Relational OLAP
uses array-based multidimensional storage engines for multidimensional views of data. With multidimensional data stores, the storage utilization may be low if the data set is sparse.
MOLAP/Multidimensional OLAP
is a combination of both ROLAP and MOLAP. It offers higher scalability of ROLAP and faster computation of MOLAP. This servers allows to store the large data volumes of detailed information. The aggregations are stored separately in MOLAP store.
Hybrid OLAP
provide advanced query language and query processing support for SQL queries over star and snowflake schemas in a read-only environment.
Specialized SQL Servers
OLAP Operations
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
is a logical description of the entire database. It includes the name and description of records of all record types including all associated data-items and aggregates.
Schema
Each dimension is represented with only one-dimension table. This dimension table contains the set of attributes.
Star Schema
Some dimension tables in this scheme are normalized.The normalization splits up the data into additional tables.
Snowflake Schema
Due to normalization in the Snowflake schema, the redundancy is reduced and therefore, it becomes easy to maintain and the save storage space.
TRUE
This has has multiple fact tables. It is also known as galaxy schema.
Fact Constellation Schema
Why Do We Need a Data Mart?
To partition data in order to impose access control strategies.
To speed up the queries by reducing the volume of data to be scanned.
To segment data into different hardware platforms.
To structure data in a form suitable for a user access tool.