Data Warehousing and Business Intelligence Management (Including Big Data) Flashcards
The definition of _______________________: is the planning, implementation, and control processes to provide decision support data and support knowledge workers engaged in reporting, query, and analysis.
Data Warehousing and Business Intelligence
What are the goals of Data Warehousing and Business Intelligence?
- To build and maintain the technical environment and technical and business processes needed to deliver
integrated data in support of operational functions, compliance requirements, and business intelligence
activities. - To support and enable effective business analysis and decision making by knowledge workers.
What are two models and approaches to data warehousing
Inmon and Kimball
One of the component parts of Inmon’s definition of a data warehouse includes: ___________________ where the data warehouse is organized based on major business entities, rather than focusing on a functional or application.
subject oriented
A component part of Inmon’s definition of a data warehouse includes: ___________________ where the data warehouse unified and cohesive. The same key structures, encoding and decoding of structures, data definitions, naming conventions are applied consistently throughout the warehouse
Integrated
A component part of Inmon’s definition of a data warehouse includes: ___________________ where the data warehouse stores data as it exists in a set point in time. Records in the DW are like snapshots. Each one reflects the state of the data at a moment of time. This means that querying data based on a specific time period will always produce the same result, regardless of when the query
is submitted.
Time variant
In Inmon’s DW, __________________ , describes records are not normally updated as they are in operational systems. Instead, new data is appended to existing data. A set of records may represent different states of the same transaction.
non volatiles (records)
When warehouses were first established, cost and space considerations drove the need to ____________ data. This data can be persistent (stored in a table) or non-persistent (rendered in a view) in contemporary DW environments. The deciding factor in whether to persist data is usually performance.
Summarize
The focus of operational systems is current data. Warehouses contain __________ data as well. Often they house vast amounts of it.
Historical
in the context of the Corporate Information Factory (CIF):
__________ perform operational processes. Pulls from these systems bring data into the data warehouse and the operational data stores (ODS) where it can be analyzed.
Applications
in the context of the Corporate Information Factory (CIF): ________ is a database that stands between the operational source databases and the target databases. It is where the extract, transform, and load effort takes place. It is not used by end users. Most data in this area is transient, although typically there is some relatively small amount of persistent data.
Staging Area
in the context of the Corporate Information Factory (CIF):
In the _________ layer, data from disparate sources is transformed
so that it can be integrated into the standard corporate representation / model in the DW and ODS.
integration
An ______________ is integrated database of operational data. It may be sourced directly from applications or from other databases. These databases generally contain current or near term data
(30-90 days), while a DW contains historical data as well (often several years of data). Data in this database is volatile, while warehouse data is stable. Not all organizations use these. They evolved as to meet the need for low latency data. These sometimes serve as the primary source for a data warehouse; it may
also be used to audit a data warehouse.
ODS Operational Data Store
_____________ provide data prepared for analysis. This data is often a sub-set of warehouse data designed to support particular kinds of analysis or a specific group of data consumers. For example, these can aggregate data to support faster analysis. Dimensional modeling (using denormalization techniques) is often used to design user-oriented data marts.
Data marts
An ________ is a data mart focused on tactical decision support. It is
sourced directly from an ODS, rather than from a DW. It shares characteristics of the ODS: it contains current or near-term data. Its contents are volatile.
OpDM Operational Data Mart
The _________ provides a single integration point for corporate data to support management decision-making, and strategic analysis and planning. The data flows into this from the application systems and ODS, and flows out to the data marts, usually in one direction only. Data that needs correction is rejected, corrected at its source, and ideally re-fed through the system.
DW Data Warehouse
_____________________ is the other primary pattern for DW development. It defines a data warehouse simply as “a copy of transaction data specifically structured for query and analysis”
Kimball’s Dimensional Data Warehouse
___________ are comprised facts, which contain quantitative data
about business processes (e.g., sales numbers), and dimensions, which store descriptive attributes related to fact data and allow data consumers to answer questions about the facts (e.g., how many units of product X were sold this quarter?)
Dimensional Models
Also known as star schemas
The ______________ shows the intersection of business processes that generate fact data and data subject areas that represent dimensions.
DW bus matrix
The enterprise DW bus matrix can be used to represent the long-term data content requirements for the DW/BI system, independent of technology. This tool enables an organization to scope manageable development efforts. Each implementation builds an increment of the overall architecture.
Is the below part of the Kimball or Inmon Data warehouse?
—Operational source systems: operational / transactional applications within the enterprise create the data that is integrated into the Operational Data Store (ODS) and then into Data Warehouse
—Data staging area: clean, combine, standardize, conform dimensions, sort, and sequence
—Data presentation area: datamarts linked by DW bus of conformed dimensions
—Data access tools that focus on end users’ data requirements
Components of Kimball Data Warehouse
Kimball’s Data Warehouse Bus represents _______ or ______ dimensions unifying multiple datamarts
shared or conformed
Name Data Storage areas
—Staging area: intermediate data store between original data source and centralized data repository; data is staged before transformation, integration, and prep for loading into warehouse —Reference and Master Data conformed dimensions
—Central Warehouse: maintains historical atomic data as well as latest instance of batch run. Considerations include:
○ Relationship between business key and surrogate keys for performance
○ Creation of indices and foreign keys to support dimensions
○ Change data capture (CDC) techniques used to detect, maintain, and store history
—Operational Data Store (ODS): lower latency for operational use; single time window
—Data mart: presents a departmental or functional subset of data warehouse
—Cubes: support Online Analytical Processing (OLAP); can be relational, multi-dimensional, or hybrid
Kimball or Inmon: single data warehouse layer with atomic level data
Inmon
Kimball or Inmon: departmental data marts informing conformed dimensions and facts
Kimball