ADBS - Data Warehousing Flashcards
What is data warehousing?
Data warehousing is a set of hardware and software components used to better analyse data to make better business decisions.
What are the benefits of data warehousing?
- Understand business trends and make better forecasting decisions
- Bring better products to market in a more timely manner
- Analyse daily sales information and make quick decisions that can significantly affect your company’s performance
ROI - return on investment can range between 40% and 600%
Competitive Advantage - discover trends, customers and demands
Increase in productivity from decision makers - contains integrated database of consistent, subject oriented, historical data
Transforms data into meaningful information
Name 3 applications of data warehousing
- Sales and marketing analysis across all industries
- Inventory turn and product tracking in manufacturing
- Category management, vendor analysis, and marketing program effectiveness analysis in retail
- Profitability analysis or risk assessment in banking
- Claims analysis or fraud detection in insurance
What is a data warehouse?
A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management’s decisions. (Inmon05)
What is subject oriented?
data stored in a warehouse is about the major subjects of an enterprise (such as customers, products, and sales)
What is integrated?
source data for a warehouse is from different enterprise-wide applications systems. The data is often inconsistent, for example, in different formats. The source data must be integrated into a consistent format in order to present a unified view of the data to users.
What is non-volatile?
New data is always added as a supplement to the database, rather than a replacement.
The database continually absorbs this new data, incrementally integrating it with the previous data.
What is time-variant?
data in a warehouse is only accurate and valid at some point in time or over some time interval.
What is Operational Data?
data used to run business. This data is normally stored, retrieved, and updated by an Online Transactional Processing (OLTP) system. An OLTP system may be, for example, a reservations system, an accounting application, or an order entry application.
What are the characteristics of operational data?
Updated often and through online transactions
Non-historical data (not more than three to six months old)
Optimised for transactional processing
Highly normalised in the relational database for easy update, maintenance, and integrity
What is informational data?
Informational data is typically stored in a format that makes analysis much easier. Analysis can be in the form of decision support (queries), report generation, executive information systems, and more in-depth statistical analysis.
What are the characteristics of informational data?
Summarised operational data
De-normalised and replicated data
Infrequently updated from the operational systems
Optimised for decision support applications
Possibly “read only” (no updates allowed)
Stored on separate systems to lessen impact on operational systems
What are differences between operational data and informational data
Operational Data
current, detailed, dynamic, repetitive processing, transaction driven, application oriented, day to day, serves clerical/operational users
Informational
historical data, summarised, static data, ad hoc unstructured heuristic, analysis driven, subject oriented, managerial, strategic
Name 10 problems of data warehousing
Underestimate resources for data loading Hidden problems with source systems Required data not captured Increased end user demands data homogenisation high demand for resources - space data ownership - sensitivity is lost high maintenance time to implement difficulty in integration
Describe the architecture of a data warehouse
Operational data, ODS, Load Manager, Warehouse Manager, Query manager,
In the architecture, describe the operational data
data from multiple sources sych as hierarchical or network databases, VSAM, RMS, Relational DBMS, private data, data from workstations or servers
In the architecture, describe the operational data store
this is a repository of operational and integrated operational data used for analysis, already extracted and cleaned
In the architecture, describe the Load Manager
responsible for extracting and loading data into the warehouse
In the architecture, describe the Warehouse manager
Management of data in the warehouse analyzes data for consistency transforms and merges data create index and views aggregates data back up and archiving
In the architecture, describe the Query manager
management of user queries and directs to appropriate data source
consists of end user access tools, monitoring tools, OLAP, data mining
What is meta data?
data about data
map data to a common view of data
automate the production of summary tables
direct to most appropriate data source
helps to define data source, consists of technical and business data.
What is a data mart?
a subset of a data warehouse used to support the requirements of a particular department
Name three differences between a Data Mart and a Data Warehouse
focuses on requirements of users within department
does not contain detailed operational data
contains substantially less data making it easier to navigate
Name 5 reasons for implementing a data mart
reduced cost and time
users can access the data they need
less management of data - reduction in volume of data to be accessed
Less data - cleansing, loading, transformation and integration is easier
provide appropriately structured data in conjunction with end-user access tools
all users have the same collective view of the data in a particular department
easily targeted users for department as opposed to corporation