Mid Term Flashcards
Data Mart Advantages
Faster implementation
Earlier return on investment
Less risk of failure
Gives the project team time to learn and grow.
Data Warehouse Advantages
A truly corporate effort, an enterprise view of data
Inherently architected, not a union of disparate data marts
Single, central storage of data about the content
Centralized rules and control
May see quick results if implemented with iterations
Data Warehouse Disadvantages
Takes longer to build even with an iterative method
High exposure to risk of failure
Needs high level of cross-functional skills,
High outlay without proof of concept
Data Mart/Bottom-Up Disadvantages
Each data mart has its own narrow view of data
Permeates redundant data in every data mart
Perpetuates inconsistent and irreconcilable data
Proliferates unmanageable interfaces (Not required)
Phases of data warehouse development lifecycle.
Planning, Analysis, Design, Implementation
What is the most costly lifecycle activity
ETL - You spend at least 75% of your time on that.
What does ETL stand for?
Extract, Transform, Load
Which Data Warehouse Lifecycle phase is analogous to logical design?
Dimensional modeling
Which two data warehouse lifecycle phases have parallel levels of activities?
Design and implementation, which are the last two phases.
The data warehouse architecture that does not rely on the creation of a new data structure is the what?
Federated.
What is a data mart?
A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area.
What is a data warehouse?
Central repositories of integrated data from one or more disparate sources.
What activities are required in the analysis phase?
Analytic themes, which breaks requirements into categories;
bus matrix, which is the business processes mapped to the data needed;
Information packet development, maps which dimensions are needed for each fact table.
What are the different types of data architectures, and what do they mean?
Federated: a collection of independent database systems are united into a loosely coupled federation in order to share and exchange information.
Hub and Spoke: The communication is not made between pairs of applications but between each application (spoke) and the central hub.
Data Mart Bus: A bus architecture is composed of a set of tightly integrated data marts that get their power from conformed dimensions and fact tables. A bus architecture uses top-down planning and a grid of business functions and dimensions to deliver a set of tightly integrated data marts.
Centralized Enterprise-Wide: A centralized data architecture means the data from each domain/subject (i.e. payroll, operations, finance) is copied to one location (i.e. a data lake under one storage account), and that the data from the multiple domains/subjects are combined to create centralized data models and unified views.
Independent Data Marts
What are junk dimensions?
Junk dimensions are used for dimensions that only involve a few responses, and which don’t fit squarely in any other dimensions.