quiz 1 Flashcards

1
Q

datawarehouse vs. datamart

A
datawarehouse:
application independent
centralized
planned
data-lightly denormalized
multiple subjects
many int/ext sources
flexible 
data oriented
DataMart:
specific application
decentralized
organic
highly denormalized
one central topic
few sources
short life
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

what are the 3 datawarehouse architechtures

A

independant datamart, Dependent Data Mart and Operational Data Store
Logical Data Mart and Real-Time Data Warehouse.
all have some form of ETL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Fact and Dimension Table Keys

A

Use surrogate keys as primary keys

Integer keys, sequential

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Advantages of surrogate keys

A
Buffer from operational changes
Integrate multiple source systems
Improve performance
Handle null or unknown conditions
Dimension attribute change tracking
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Snowflake Schemas

A

In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

adv of normalization of Dim Tables

A

Advantages of Normalization of DimTables
Saves space by reducing redundancy
Need to update only once

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

disadvant of normalization of Dim Tables

A

Disadvantages of normalization of DimTables
Model becomes complex
Numerous joins make query slow

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Characteristics of Data Warehouse

A

Subject-oriented
Focused on business related subjects and organizational activity
Ex: Customers, employees, products, suppliers
Integrated
Data from different sources are stored in a consistent format
Requires addressing naming conflicts, unit discrepancies
Time-variant
Maintains historical data
Each source may contain data at different time points –day, week, month etc. Used to analyze trends.
Nonvolatile
Users cannot change data after it is entered. It is periodically updated.
Structured for query and analysis

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

queries

A

//Selecting total sales amount across all dimensions.
SELECT [Measures].[Sales Amount] ON COLUMNS
FROM [Adventure Works DW2017];
//Slicing the cube to select total sales amount on due date 2014.
SELECT [Measures].[Sales Amount] ON COLUMNS
FROM [Adventure Works DW2017]
WHERE [Due Date].[Calendar Year].&[2014];
//Selecting total sales amount on columns and due dates on rows.
SELECT [Measures].[Sales Amount] ON COLUMNS,
[Due Date].[Calendar Year].[Calendar Year] ON ROWS
FROM [Adventure Works DW2017];
//Selecting nonempty sales amount on columns and due dates on rows.
SELECT [Measures].[Sales Amount] ON COLUMNS,
NONEMPTY([Due Date].[Calendar Year].[Calendar Year]) ON ROWS
FROM [Adventure Works DW2017];
//Selecting sales amount and Tax amount on columns and due dates 2013 and 2014 on rows.
SELECT
{[Measures].[Sales Amount], [Measures].[Tax Amt]} ON COLUMNS,
{[Due Date].[Calendar Year].&[2013], [Due Date].[Calendar Year].&[2014]} ON ROWS
FROM [Adventure Works DW2017];
//Selecting sales amount and Tax amount on columns and due dates 2013 and 2014 on rows for All-Purpose Bike Stand.
SELECT
{[Measures].[Sales Amount], [Measures].[Tax Amt]} ON COLUMNS,
{[Due Date].[Calendar Year].&[2013], [Due Date].[Calendar Year].&[2014]} ON ROWS
FROM [Adventure Works DW2017]
WHERE ([Dim Product].[Model Name].&[All-Purpose Bike Stand]);

//Selecting nonempty sales amount on rows and due dates on columns.
SELECT NONEMPTY([Due Date].[Calendar Year].[Calendar Year]) ON COLUMNS,
{[Measures].[Sales Amount]} ON ROWS
FROM [Adventure Works DW2017];
//Selecting nonempty sales amount on rows and due dates on columns including sum of all years.
SELECT {NONEMPTY([Due Date].[Calendar Year].[Calendar Year]), [Due Date].[Calendar Year]} ON COLUMNS,
{[Measures].[Sales Amount]} ON ROWS
FROM [Adventure Works DW2017];
//Selecting nonempty count of model names on rows and due dates on columns including sum of all years.
SELECT
{NONEMPTY([Due Date].[Calendar Year].[Calendar Year]), [Due Date].[Calendar Year]} ON COLUMNS,
NONEMPTY([Dim Product].[Model Name].[Model Name]) ON ROWS
FROM [Adventure Works DW2017];
//Selecting nonempty count of model names including sum of all products on rows and due dates on columns including sum of all years.
SELECT
{NONEMPTY( [Due Date].[Calendar Year].[Calendar Year]), [Due Date].[Calendar Year]} ON COLUMNS,
{NONEMPTY([Dim Product].[Model Name].[Model Name]),[Dim Product].[Model Name]} ON ROWS
FROM [Adventure Works DW2017]
WHERE [Measures].[Sales Amount];

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Periodic data

A

Periodic data are never physically altered or deleted once they have been added to the store

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Database limitations

A
Relational databases consists of tables
Tables are flat
one-dimensional
Cross-tabs can be 2-dimensional
How to represent multi-dimensional tables?
The job of data warehousing/OLAP
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Dimensional Modeling

A

Star Schemas: Dimensional modeling implemented in relational DBMS (ROLAP)

OLAP (online analytical processing): Dimensional modeling in a multi-dimensional DB environment

OLAP cubes deliver superior query performance because of pre-calculations, indexing strategies, and other optimizations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Dimension tables

A

Dimension tables often have many columns (attribute)

Each dimension table contains data for one dimension

Dimension table often represent hierarchical relationships
Product roll up into brands and then into categories

Each dimension is defined by a single primary key (PK)

PK serves as the basis of referential integrity with the given fact table to which it is joined.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Joining Facts and Dimensions

A

example is suppliers, parts, shipments.
s1-s10 is the suppliers(dim) pk
p1-p10 is parts(dim) pk
combine both s1,p1 to make pk of the fact table which is shipments.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Fact Table Examples

A

Retail – number of units sold, sales amount
Telecommunications – length of call in minutes, average number of calls
Banking – average monthly balance
Insurance – claims amount

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Extending Star Schema

A

DW may also store pre-aggregated data
Separate fact table (monthly, weekly..)
Same fact table (level field + nulls)
Fact constellations – multiple fact tables that share dimensions
DB Server Issues – Index structures, parallel processing, complex sql queries etc.

17
Q

OLAP Buzzwords

A
OLAP Buzz words
Cube – A multidimensional structure consisting of “Data Cubes”
Dimension – Sides of a cube
Measure – Facts in a fact table
Aggregation – Projection of the cube
18
Q

what is a cube?

A

The Cube in the Dimensional Data Warehouse is a Summary of Tables That Exist in the Relational Database

19
Q

what is aggregation

A

Is this like the SQL “Group By” Clause?

The Customer Dimension has been Collapsed/Projected/Aggregated onto the other two Dimensions

20
Q

OLAP Operations

A

Cube slicing – come up with 2-D view of data by filtering (fixing) a dimension.
Cube dicing – come up with a small cube (sub-cube) by selecting a subset of all dimensions.
Drill-down – going from summary to more detailed views.
Roll-up – going from detailed views to a summary view.
Pivoting (rotating) – to rotate the cube across a dimension to see various faces.

21
Q

what is a slice?

A

A Slice Limits the Members Over Which the Aggregation is Made
Is this like the SQL “Where” Clause?

22
Q

SLICE

A

Users can slice the data cube to obtain a simple 2-dimensional table. Slice is for the product named shoes. Other views can be developed using ‘drag and drop’ operations.

23
Q

what is a cube?

A

a cube is a summary of tables that exist in a relational database.

24
Q

what is a side of a cube?

A

a dimension

25
Q

what provides the elements of a cube

A

fact table

26
Q

what does an aggregate do to a cube

A

colapses a dimension. IT’s a sql group by clause. it’s a projection.