quiz 1 Flashcards
datawarehouse vs. datamart
datawarehouse: application independent centralized planned data-lightly denormalized multiple subjects many int/ext sources flexible data oriented
DataMart: specific application decentralized organic highly denormalized one central topic few sources short life
what are the 3 datawarehouse architechtures
independant datamart, Dependent Data Mart and Operational Data Store
Logical Data Mart and Real-Time Data Warehouse.
all have some form of ETL
Fact and Dimension Table Keys
Use surrogate keys as primary keys
Integer keys, sequential
Advantages of surrogate keys
Buffer from operational changes Integrate multiple source systems Improve performance Handle null or unknown conditions Dimension attribute change tracking
Snowflake Schemas
In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions
adv of normalization of Dim Tables
Advantages of Normalization of DimTables
Saves space by reducing redundancy
Need to update only once
disadvant of normalization of Dim Tables
Disadvantages of normalization of DimTables
Model becomes complex
Numerous joins make query slow
Characteristics of Data Warehouse
Subject-oriented
Focused on business related subjects and organizational activity
Ex: Customers, employees, products, suppliers
Integrated
Data from different sources are stored in a consistent format
Requires addressing naming conflicts, unit discrepancies
Time-variant
Maintains historical data
Each source may contain data at different time points –day, week, month etc. Used to analyze trends.
Nonvolatile
Users cannot change data after it is entered. It is periodically updated.
Structured for query and analysis
queries
//Selecting total sales amount across all dimensions.
SELECT [Measures].[Sales Amount] ON COLUMNS
FROM [Adventure Works DW2017];
//Slicing the cube to select total sales amount on due date 2014.
SELECT [Measures].[Sales Amount] ON COLUMNS
FROM [Adventure Works DW2017]
WHERE [Due Date].[Calendar Year].&[2014];
//Selecting total sales amount on columns and due dates on rows.
SELECT [Measures].[Sales Amount] ON COLUMNS,
[Due Date].[Calendar Year].[Calendar Year] ON ROWS
FROM [Adventure Works DW2017];
//Selecting nonempty sales amount on columns and due dates on rows.
SELECT [Measures].[Sales Amount] ON COLUMNS,
NONEMPTY([Due Date].[Calendar Year].[Calendar Year]) ON ROWS
FROM [Adventure Works DW2017];
//Selecting sales amount and Tax amount on columns and due dates 2013 and 2014 on rows.
SELECT
{[Measures].[Sales Amount], [Measures].[Tax Amt]} ON COLUMNS,
{[Due Date].[Calendar Year].&[2013], [Due Date].[Calendar Year].&[2014]} ON ROWS
FROM [Adventure Works DW2017];
//Selecting sales amount and Tax amount on columns and due dates 2013 and 2014 on rows for All-Purpose Bike Stand.
SELECT
{[Measures].[Sales Amount], [Measures].[Tax Amt]} ON COLUMNS,
{[Due Date].[Calendar Year].&[2013], [Due Date].[Calendar Year].&[2014]} ON ROWS
FROM [Adventure Works DW2017]
WHERE ([Dim Product].[Model Name].&[All-Purpose Bike Stand]);
//Selecting nonempty sales amount on rows and due dates on columns.
SELECT NONEMPTY([Due Date].[Calendar Year].[Calendar Year]) ON COLUMNS,
{[Measures].[Sales Amount]} ON ROWS
FROM [Adventure Works DW2017];
//Selecting nonempty sales amount on rows and due dates on columns including sum of all years.
SELECT {NONEMPTY([Due Date].[Calendar Year].[Calendar Year]), [Due Date].[Calendar Year]} ON COLUMNS,
{[Measures].[Sales Amount]} ON ROWS
FROM [Adventure Works DW2017];
//Selecting nonempty count of model names on rows and due dates on columns including sum of all years.
SELECT
{NONEMPTY([Due Date].[Calendar Year].[Calendar Year]), [Due Date].[Calendar Year]} ON COLUMNS,
NONEMPTY([Dim Product].[Model Name].[Model Name]) ON ROWS
FROM [Adventure Works DW2017];
//Selecting nonempty count of model names including sum of all products on rows and due dates on columns including sum of all years.
SELECT
{NONEMPTY( [Due Date].[Calendar Year].[Calendar Year]), [Due Date].[Calendar Year]} ON COLUMNS,
{NONEMPTY([Dim Product].[Model Name].[Model Name]),[Dim Product].[Model Name]} ON ROWS
FROM [Adventure Works DW2017]
WHERE [Measures].[Sales Amount];
Periodic data
Periodic data are never physically altered or deleted once they have been added to the store
Database limitations
Relational databases consists of tables Tables are flat one-dimensional Cross-tabs can be 2-dimensional How to represent multi-dimensional tables? The job of data warehousing/OLAP
Dimensional Modeling
Star Schemas: Dimensional modeling implemented in relational DBMS (ROLAP)
OLAP (online analytical processing): Dimensional modeling in a multi-dimensional DB environment
OLAP cubes deliver superior query performance because of pre-calculations, indexing strategies, and other optimizations.
Dimension tables
Dimension tables often have many columns (attribute)
Each dimension table contains data for one dimension
Dimension table often represent hierarchical relationships
Product roll up into brands and then into categories
Each dimension is defined by a single primary key (PK)
PK serves as the basis of referential integrity with the given fact table to which it is joined.
Joining Facts and Dimensions
example is suppliers, parts, shipments.
s1-s10 is the suppliers(dim) pk
p1-p10 is parts(dim) pk
combine both s1,p1 to make pk of the fact table which is shipments.
Fact Table Examples
Retail – number of units sold, sales amount
Telecommunications – length of call in minutes, average number of calls
Banking – average monthly balance
Insurance – claims amount
Extending Star Schema
DW may also store pre-aggregated data
Separate fact table (monthly, weekly..)
Same fact table (level field + nulls)
Fact constellations – multiple fact tables that share dimensions
DB Server Issues – Index structures, parallel processing, complex sql queries etc.
OLAP Buzzwords
OLAP Buzz words Cube – A multidimensional structure consisting of “Data Cubes” Dimension – Sides of a cube Measure – Facts in a fact table Aggregation – Projection of the cube
what is a cube?
The Cube in the Dimensional Data Warehouse is a Summary of Tables That Exist in the Relational Database
what is aggregation
Is this like the SQL “Group By” Clause?
The Customer Dimension has been Collapsed/Projected/Aggregated onto the other two Dimensions
OLAP Operations
Cube slicing – come up with 2-D view of data by filtering (fixing) a dimension.
Cube dicing – come up with a small cube (sub-cube) by selecting a subset of all dimensions.
Drill-down – going from summary to more detailed views.
Roll-up – going from detailed views to a summary view.
Pivoting (rotating) – to rotate the cube across a dimension to see various faces.
what is a slice?
A Slice Limits the Members Over Which the Aggregation is Made
Is this like the SQL “Where” Clause?
SLICE
Users can slice the data cube to obtain a simple 2-dimensional table. Slice is for the product named shoes. Other views can be developed using ‘drag and drop’ operations.
what is a cube?
a cube is a summary of tables that exist in a relational database.
what is a side of a cube?
a dimension
what provides the elements of a cube
fact table
what does an aggregate do to a cube
colapses a dimension. IT’s a sql group by clause. it’s a projection.