Data warehousing Flashcards
what is data warehousing
collection of data that helps analysts to make informed decisions in an organisation, allow us to analyse data in a multi-dimensional space, results in data mining and data generalisation
what kind of data do data warehouses store
very large store of historical data
features of data warehouses
subject oriented
integrated
time - variant
non - volatile
what does subject oriented mean
provides information around a subject rather than ongoing operations. focuses on modelling and analysis of data. examples of this are product, customers, suppliers, sales, revenue etc.
what does integrated mean
constructed by integrating data from many different sources such as relation databases or flat files
time - variant
data collected is identified with a particular time period e.g. in the last 12 months
non - volatile
data is not erased when new data is added
what is different in a Data warehouse over a database (only data warehouse points)
Queries are often complex and present a general form of data.
it is a database but there is no frequent updating
OLAP query only needs read-only access to stored data
Doesn’t require transaction processing, recovery and concurrency controls as its stored separately from the operational database
what is different in an operational database over a database (only operational database points)
Built for well-known tasks and work such as searching particular records and indexing.
Support concurrent processing of multiple transactions (OLTP)
Concurrency control and recovery mechanisms are needed.
Allows to read and modify operations
maintains current data - goes through frequent changes on a daily basis
Types of data warehouse
Information processing
Analytical processing
Data mining
What is information processing
allows processing of data stored in it
can be processed by querying, basic statistical analysis, reporting using cross tabs, tables, charts or graphs
What is Analytical processing
supports analytical processing of the data within it by means of OLAP operations
these are: slice and dice, drill down, drill up and pivot
What is data mining
support knowledge discovery by finding hidden patterns and associates.
Constructing analytical models, performing classification and prediction.
Results can be presented using visualisation tools
What is Enterprise Data Warehouse(EDW)
Services entire enterprise
What might an EDW environment have
An EDW(enterprise data warehouse)
an operational datastore
physical and virtual data marts
what are data marts
they are used by individual departments or groups
use a dimensional data model to build
what is an operational data store
Subject oriented system, optimised for looking up 1 or 2 records at a time for decision making.
allows access to more current data more quickly
a hybrid form of data warehouse containing integrated information
Is a data warehouse OLAP or OLTP
OLAP
Is a operational database OLAP or OLTP
OLTP
True or false? Historical processing of information is done by a Data warehouse
True, operational databases do day to day processing
does a data warehouse or operational database analyse the business
A data warehouse analyses the business and operational databases run the business
which would be larger in size, a data warehouse or an operational database
a datawarehouse(100GB - 100TB)
what does the top tier include
Data mining, reporting tools for analysis and business intelligence.
Query tools, reporting tools, analysis or data mining tools
what does the middle tier include
OLAP server
used to provide information to business analysts and managers
interacts with the bottom tier and passes on insights to the top tier tools
what does the bottom tier include
database server extracts data from different sources using a gateway
data sources fed into the bottom tier can include operational databases and other types of front end data
functions of data warehouse tools
data extraction data cleaning data transformation data losing refreshing
what is refreshing
updating data sources to the warehouse
data losing
sorting, summarising, consolidating, checking the integrity and building indices and patterns
data transformation
convert the data to warehouse format
data cleaning
finding and correcting errors in data
data extraction
gather data from many different sources
metadata
data about data which defines the data warehouse
used for building, maintaining and managing the data warehouse
Fact tables
contain measurements of business(sales, purchase orders, shipment)
can be quite large(50 billion records, 1-5 terabytes)
dimension tables
store the descriptions of the dimensions of the business (product, customer, vendor, store)
smaller than fact tables
what are data warehouses used for
analysis and separate to the day-to-day operational database