BI and Data Warehousing Flashcards
The concept of a data warehouse is seen as a way to do the following three things:
- Reduce data redundancy
- Improve the consistency of information
- Enable an enterprise to use its data to make better decisions
Planning, implementation, and control processes to provide decision support data and support knowledge workers engaged in reporting, query, and analysis are known as:
Data warehousing and business intelligence
The primary driver for data warehousing is to support these three things:
- Operational functions
- Compliance requirements
- BI activities
The implementation of a Data Warehouse should follow these eight guiding principles:
- Focus on business goals
- Start with the end in mind
- Think and design globally; act and build locally
- Summarize and optimize last, not first
- Promote transparency and self-service
- Build metadata with the warehouse
- Collaborate
- One size does not fit all
A data warehouse is a combination of two primary components:
- An integrated decision support database
- The related software programs used to collect, cleanse, transform, and store data from a variety of operational and external sources
Unstructured data refers to data that is not predefined through a ______?
data model
Bill Inmon’s approach: Inmon defines a data warehouse as a “_____ ______, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process”
subject oriented
Kimball defines a warehouse as a “copy of ______ data specifically structured for query and analysis”
transaction
Kimball’s approach calls for what type of model?
Dimensional
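Example (illustrative only): a dimensional model is commonly laid out as a star schema, with one fact table of measures joined to descriptive dimension tables. The sketch below uses Python's built-in sqlite3 module. The table names (fact_sales, dim_date, dim_product), the columns, and the sample rows are assumptions made up for this card, not part of Kimball's definition.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables hold descriptive attributes.
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);

        -- The fact table holds measures plus a key into each dimension.
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );

        INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
        INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
        INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 20.0), (2, 1, 2.5);
    """)
    return conn


def sales_by_year_and_category(conn):
    """Join the fact table to its dimensions and total the measure."""
    return conn.execute("""
        SELECT d.calendar_year, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d ON d.date_key = f.date_key
        JOIN dim_product p ON p.product_key = f.product_key
        GROUP BY d.calendar_year, p.category
        ORDER BY d.calendar_year, p.category
    """).fetchall()
```

Calling sales_by_year_and_category(build_star_schema()) returns one total per year and category, which is the kind of query-and-analysis access the dimensional layout is meant to make simple.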
In a data warehouse, data that needs correction is rejected, corrected at its source, and ideally ____?
re-fed through the system
Often the content for the data dictionary comes directly from the ______ model.
logical
Documented data lineage serves these three purposes:
- Investigation of the root causes of data issues
- Impact analysis for system changes or data issues
- Ability to determine the reliability of data, based on its origin
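Example (illustrative only): the three purposes above can be pictured as walks over a lineage graph. This is a minimal sketch that assumes lineage is recorded as a mapping from each dataset to the datasets it is derived from. The dataset names and the function names (upstream, downstream, original_sources) are hypothetical, and real lineage tools store much richer metadata.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is derived from.
LINEAGE = {
    "sales_report": ["fact_sales"],
    "fact_sales": ["stg_orders", "dim_product"],
    "stg_orders": ["crm_orders"],
    "dim_product": ["erp_products"],
    "crm_orders": [],
    "erp_products": [],
}


def upstream(dataset, lineage=LINEAGE):
    """All datasets that feed `dataset`: where to look for a root cause."""
    seen, queue = set(), deque(lineage.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(lineage.get(node, []))
    return seen


def downstream(dataset, lineage=LINEAGE):
    """All datasets derived from `dataset`: what a change or issue would affect."""
    return {name for name in lineage if dataset in upstream(name, lineage)}


def original_sources(dataset, lineage=LINEAGE):
    """Upstream datasets with no parents: the origins used to judge reliability."""
    return {d for d in upstream(dataset, lineage) if not lineage.get(d)}
```

In this sketch, upstream("sales_report") supports root-cause investigation, downstream("crm_orders") supports impact analysis, and original_sources("sales_report") shows which source systems the report ultimately depends on.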