Mid Term Flashcards
Data Mart Advantages
Faster implementation
Earlier return on investment
Less risk of failure
Gives the project team time to learn and grow.
Data Warehouse Advantages
A truly corporate effort, an enterprise view of data
Inherently architected, not a union of disparate data marts
Single, central storage of data about the content
Centralized rules and control
May see quick results if implemented with iterations
Data Warehouse Disadvantages
Takes longer to build even with an iterative method
High exposure to risk of failure
Needs high level of cross-functional skills,
High outlay without proof of concept
Data Mart/Bottom-Up Disadvantages
Each data mart has its own narrow view of data
Permeates redundant data in every data mart
Perpetuates inconsistent and irreconcilable data
Proliferates unmanageable interfaces (Not required)
Phases of data warehouse development lifecycle.
Planning, Analysis, Design, Implementation
What is the most costly lifecycle activity
ETL - You spend at least 75% of your time on that.
What does ETL stand for?
Extract, Transform, Load
Which Data Warehouse Lifecycle phase is analogous to logical design?
Dimensional modeling
Which two data warehouse lifecycle phases have parallel levels of activities?
Design and implementation, which are the last two phases.
The data warehouse architecture that does not rely on the creation of a new data structure is the what?
Federated.
What is a data mart?
A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area.
What is a data warehouse?
Central repositories of integrated data from one or more disparate sources.
What activities are required in the analysis phase?
Analytic themes, which breaks requirements into categories;
bus matrix, which is the business processes mapped to the data needed;
Information packet development, maps which dimensions are needed for each fact table.
What are the different types of data architectures, and what do they mean?
Federated: a collection of independent database systems are united into a loosely coupled federation in order to share and exchange information.
Hub and Spoke: The communication is not made between pairs of applications but between each application (spoke) and the central hub.
Data Mart Bus: A bus architecture is composed of a set of tightly integrated data marts that get their power from conformed dimensions and fact tables. A bus architecture uses top-down planning and a grid of business functions and dimensions to deliver a set of tightly integrated data marts.
Centralized Enterprise-Wide: A centralized data architecture means the data from each domain/subject (i.e. payroll, operations, finance) is copied to one location (i.e. a data lake under one storage account), and that the data from the multiple domains/subjects are combined to create centralized data models and unified views.
Independent Data Marts
What are junk dimensions?
Junk dimensions are used for dimensions that only involve a few responses, and which don’t fit squarely in any other dimensions.
What is a degenerate dimension?
When dimensions are stored in a fact table because there are so many.
What are different options for slowly changing dimensions?
Zero option: do nothing, which is easiest, but you have inaccurate data.
One option: overwrite it.
Two: you retain the former value, but you create a new row. Problematic because you can’t duplicate a surrogate key. So instead, you have to identify the new information via a business key.
Three: instead of creating a new row, you create an additional column for “current” data.
What are the basic dimensional model schemas?
Star schema and snowflake schema.
What is Business Intelligence?
Business intelligence encompasses both the applications and methods used for the gathering, cleansing and storing of data, as well as the transformation of that data into usable knowledge capable of informing key business decisions and strategies.
What is integrated data?
Integrated data refers to data that is pulled from many disparate data sources and organized to depict a useful, unified view of the business or parts of the business. The practice of data integration refers to the act of pulling data together to construct a unified view of the business data, and it can be done in multiple different ways.
Who was Ralph Kimball?
Ralph Kimball is a top authority within the study of data warehousing, and a proponent of the bottom-up approach to data warehousing described above. He published the book “The Data Warehousing Toolkit” in 1996.
What occurs in the Data Warehousing planning phase?
Define Initial project scope, Project Goals/Objectives, Project Success
Organize Team, Team roles, Team members
Organize Project, Project cost, Project schedule
What are the requirements of Data Warehousing analysis?
Interview Summaries - Prose summarizing interviews
Analytics - Analysis requirements grouped into “categories”
DW Bus Matrix - Business processes mapped to data needed (dimensions used, eg time, student)
DM Information Package - Prioritized processes
What is a degenerate dimension?
An attribute (dimension) stored in fact table
Attribute does NOT link to a dimension table
Typically a high-cardinality attribute
Typically groups records
Examples: PO#, Order#, Shipment#, …
Often used for drill-downs and/or data mining (e.g. Market Basket Analysis)