T3 – Data Warehousing Flashcards
What is the definition of the data warehouse?
“The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”
From the Characteristics of Data Warehousing: what is subject oriented?
data are organized by detailed subject, containing information relevant for decision support.
This ability to define a data warehouse by subject matter makes the data warehouse subject oriented.
e.g. sales, products, or customers. For example, to learn more about your company’s sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like “Who was our best customer for this item last year?”
From the Characteristics of Data Warehousing: what is integrated?
data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole. Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
From the Characteristics of Data Warehousing: what is time variant?
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse’s focus on change over time is what is meant by the term time variant.
From the Characteristics of Data Warehousing: what is nonvolatile?
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred
What in short are the 4 defining characteristics of the DWH?
4 Defining Characteristics:
- integrated: data will be connected
- subject-oriented: measured things depend on subject we are treating
- non-volatile: once data is loaded in data warehouse, it is not updated (tasks will be performed on top of data)
- relevant to some moment in time: uploading data over time, creating history
What are three main types of data warehouses?
- Data marts
- Operational Data Stores (ODS)
- Enterprise Data Warehouses (EDW)
What is a data mart?
- Whereas a data warehouse combines databases across an entire enterprise, a data mart is usually smaller and focuses on a particular subject or department.
- A data mart is a subset of a data warehouse, typically consisting of a single subject area (e.g. marketing, operations).
- Two possibilities:
- Dependent data mart
- Independent data mart
What is a dependent data mart?
- = subset that is created directly from the DWH
- Has the advantages of using a consistent data model & provides quality data
- Dependent data marts support the concept of a single enterprise-wide data model, but the DWH must be constructed first
- A dependent data mart ensures that the end user is viewing the same version of the data that are accessed by all other DWH users
What is an independent data mart?
- Lower-cost, scaled-down version of a DWH
- A small DM designed for a strategic busines unit or a department, but is source is not an EDW
- potential disadvantage: each department has their own and e.g. different departments might end up with different numbers -> as e.g. marketing calculates a specific number differently then finance (there is no universal standard)
What are OPERATIONAL DATA STORES (ODS)?
- A type of database often used as an interim area for a DWH
- Unlike the static content of a DWH, the contents of an ODS are updated through the course of business operations.
- An ODS is used for short-term decisions involving mission-critical applications rather then for the medium and long-term decisions associated with a EDW
- An ODS consolidates data from multiple source systems and provides a near-real time, integrated view of volatile, current data.
What are OPER MARTS?
- An operational data mart.
- An oper mart is a small-scale data mart typically used by a single department or functional area in an organization.
- Oper marts are created when operational data need to be analyzed multidimensionally.
- The data for an oper mart comes from an ODS
What is an ENTERPRISE DATA WAREHOUSE
(EDW)?
- An enterprise data warehouse (EDW) is a large-scale DWH that is used across the enterprise for decision support.
- The large-scale nature provides integration of data from many sources into a standard format for effective BI and decision support applications.
- EDW are used to provide data for many types of DSS, including CRM, SCM, BPM, BAZM, PLM, KMS, etc.
What is Metadata?
- Data about data.
- In a DWH, metadata describe the contents of a DWH and the manner of its acquisition and use
- We can classify metadata in: technical vs business metadata
- We can also make a distinction between:
- Syntactic metadata – (i.e. data describing the syntax of the data)
- Structural metadata – (i.e. data describing the structure of the data)
- Semantic metadata – (i.e. data describing the meaning of the data in a specific domain)
- Metadata design, creation and usage can involve ethical questions
What are the Two Data Warehousing Strategies?
- Enterprise-wide warehouse, top down, the Inmon methodology
- Data mart, bottom up, the Kimball methodology
What is the Data Mart Strategy (bottom up, Kimball)?
- The most common approach
- Begins with a single mart and architected marts are added over time for more subject areas
- Relatively inexpensive and easy to implement
- Can be used as a proof of concept for data warehousing
- Can perpetuate the “silos of information” problem
- Can postpone difficult decisions and activities
- Requires an overall integration plan
What is the Enterprise-Wide Strategy (top down, Inmon)?
- A comprehensive warehouse is built initially
- An initial dependent data mart is built from subset of the data in the warehouse
- Additional data marts are built from subsets of the warehouse data
- Complex project: expensive, time consuming & prone to failure
- When successful: results in an integrated, scalable warehouse
Note: if organization has ERP and that is the only system available, it is easy to implement the enterprise-wide strategy
What is the difference between OLTP and DWH in terms of indexes?
- OLTP: few
- DWH: many
What are Indexes?
Indexes are data structures which hold field values from the indexed column(s) and pointers to the related record(s). This data structure is then sorted and binary searches are performed to quickly find the record.
What is the difference between OLTP and DWH in terms of joins?
- OLTP: many
- DWH: some