Class 2 - Chapter 3 Flashcards

1
Q

Characteristics of Data Warehouse

A

Subject-oriented
Focused on business related subjects and organizational activity
Ex: Customers, employees, products, suppliers
Integrated
Data from different sources are stored in a consistent format
Requires addressing naming conflicts, unit discrepancies
Time-variant
Maintains historical data
Each source may contain data at different time points –day, week, month etc. Used to analyze trends.
Nonvolatile
Users cannot change data after it is entered. It is periodically updated.
Structured for query and analysis

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

3 different data Warehouse Architectures

A

independant Data Mart- each data mart has own
ETL process. not centralized storage location. stored in different datamarts, users have access to all data

Dependenat Data Mart and Operational Data Store
ETL process where data is parititoned, same format, standardization because loaded from same systems.

Logical Dat Mart and Real-Time Data Warehouse. real time update of the data. you get data from sources and save in data warehouse, different sections for different data mart. It’s not actual data but a view. more of a dashboard thing.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

OLAP

A

multidimensional view of the data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is aggregation?

A

is like the SQL “Group BY’ summing up a 2 dimension view. so if you had 3 dimensions, you aggregate 1 dimension to make 2 dimension.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

slice

A

fix one attribute of one dimension. filter male. fix one and becomes 2 dimensional view.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

dice

A

take subset of the data. you have cube that you slice and create a subcube.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

drill down

A

data has heirarchies, you have say date dimension. you can drill down to week.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

rollup

A

you aggregate

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

foru step dimensional modeling process.

Step 1?

A

select business process. talk to users. sequence of tasks to create outputs from input. production process, procurement process.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

foru step dimensional modeling process.

Step 12?

A

Declare

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Business process?

A

Business Process
What is a business process?
Sequence of tasks or activities that take a set of inputs and convert them into desired output.
Procurement, Fulfillment, Production, etc.
Common characteristics
Action verbs
Supported by IT systems
Input/Output/Metrics/KPI
Examples: Retail -> sales, Healthcare -> treatment
. Interview the actors. operations person, etc.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How to select Business Process?

A

QUICK WINS
Activities that give the best return based on the effort. They are valuable and fundamental to your success.
MAJOR PROJECTS
Activities what provide long term returns but may be more complex to execute.
FILL-INS
Unimportant activities including everyday tasks.
TIME WASTERS
These activities are time-consuming and require resources that are better spent on other things

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Step 2: Declare the Grain

A

Grain: Level of detail associated with fact table measurements
What level of granularity should be used?
“How do you describe each row in a fact table?”

Note: Include lowest level (atomic) of detail as it can be constrained and rolled up in every possible way.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Step 2: Declare the Grain (Contd.)

A

Examples
One row per scan of individual product on customer sales transaction
One row per product under promotion per store each day.
One row per bank account each month.
One row per individual boarding pass scanned at an airport gate
Insurance?
Transportation? Ride Share Service?
Education?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

fact table

A

has all the measures, the grain.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Step 3: Identify Dimensions

A

Context of measurements
Who, what, why, where, when, how
Data types – fact vs dimensions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Natural Key

A

– System generated operational identifier.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Step 4: Identify Facts

A

What is the process measuring?
Key performance metrics
Typical facts are numeric additive figures
Business requirements vs source data
Examples? Quantity sold, current balance, unit price.

Note: facts in a design must be true to the grain defines in Step 2.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Fully Additive facts:

A

Measures can be summed across any dimensions Examples?

Sales, Cost

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Semi Additive Facts

A

Semi-Additive: Measures can be summed across some dimensions, but not all. Examples?
Balance amount across time dimension

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

grain

A

level of detail you want to capture, should be same across all rows in fact table.

22
Q

Non Additive Facts

A

Non-Additive: Measures cannot be summed across any dimension. Example?
Ratios, unit price, salary,

23
Q

Dimensional table

A

doesnt usually have facts, more like product description, location, when, where purchased. connected to fact table using pk/fk. fact table has fk of other tables.

24
Q

Wall mart example:

A
What are the data captured by Walmart retail stores?
Point of Sale data: Sales Transactions
Products Sold
Promotions
Customer
Goods delivery data:
Vendor
Products
Time of delivery
Product return data:
Products returned
Damage/exchange
25
Q

Retail: Step 1 Select Business Process

A

What are the business requirements?
Logistics of ordering, stocking, and selling products while maximizing profit.

Start with most critical process as well as the most feasible.
Process: POS Retail Sales Transactions Process
Which products are selling?
in which store?
on which days?
under which promotion?
These will be fore dimension tables

26
Q

Retail: Step 2 Declare Grain

A

“What level of data detail should be made available in a dimensional model?”
Use most detailed atomic information provides maximum analytical flexibility because it can be rolled up in every single way possible.
In many situations users are not given access to atomic data which reduces its capability.

Most granular data in retail -> POS transactions at product level.

27
Q

Retail: Step 3 Identify Dimensions.

Typically what are the dimensions for a retail store?

A
Typically what are the dimensions for a retail store?
Product
Store
Date/Time
Payment Method
Promotion
Cashier
28
Q

Retail: Step 4 Identify Facts

A
“Facts must be true to grain”
Quantity
Regular unit price
Discount unit price
Net unit price
Extended discount price
Extended Sales dollar amount
Extended cost dollar amount
Extended gross profit
29
Q

what is retail dimension table

A

dimensions: date, payment method, product, promotion, cashier . in star in center is Retail Sales Facts

30
Q

Modeling Time Dimension

A

Is date dimension needed?
Almost all businesses capture time series of performance metrics.
It is frequently used for partitioning data.
Unlike other dimensions you can build the date dimension in advance.
There are many date attributes not supported by the SQL date function such as week numbers, fiscal periods, seasons, holidays, and weekends.

31
Q

Is date dimension needed?

A

Yes, mostly. Date dimension that can be created before you create other tables.Its standardized on the different databases. You need to standardize date format. Traverse over date dimensin. can capture time series of perfrmance metrics. you can partition the data.

32
Q

what is Date Dimension

A
What should date dimension capture?
Day, Month, Year
Weekday, Weekend
Day of Week
Day of Month
Calendar vs Fiscal

Date vs. Time of Day dimensions
Date -> 20yrs -> 7300 approx.
Time of Day -> 20 x 365 x 24 x 60 = 10,512,000

33
Q

what is Product Dimension

A

Product dimension contains every single SKU (Stock Keeping Unit)
Hierarchy: Product

34
Q

Numeric Values as Attributes or Facts

Should Standard List Price be in Product dimension or in the fact table?

A

Standard List Price change infrequently.
Standard List Price is non-additive, however, if multiplied by quantity, it gives sales amount which is additive.

Note:
If the numeric value is used primarily for calculation purpose, it likely belongs to the fact table.
If the stable numeric value is used predominantly for filtering and grouping, it should be treated as a dimensional attribute.

35
Q

what is Store Dimension

A

Describes every store
Primary geographic dimension
Geographic Hierarchy
Internal District Hierarchy

36
Q

what is Promotion Dimension

A

Most interesting dimension, often called causal dimension
Promotion conditions: price reduction, end aisle display, news paper ad, coupons

Whether a promotion is effective?
Need to capture each combination of promotion
Modeled as separate dimensions or combined
Null values in fact or dimension tables

37
Q

what is causal dimension

A

Most interesting dimension, often called causal dimension

Promotion conditions: price reduction, end aisle display, news paper ad, coupons

38
Q

what is Schema Extensibility

A

Example: Retailer adds frequent shopper program – how would this be supported?
Add new dimension table, surrogate keys: Prior to Frequent shopper Program, Frequent shopper Not Identified

Modeling at most granular level allows extensibility
Extension schemas by adding
New dimensions
New dimension attributes
New measured facts
39
Q

What are Factless Fact Table

A

Some business processes track events without any real measures. If the event happens, we get an entry in the system; if not, there is no row.

Example: Employee hiring and firing, student attending a class.

The fact tables that track these events typically do not have any actual fact measurements, so they are called factless fact tables.

We usually add a column called EventCount containing number 1 to count the number of events by summing the EventCount fact.

40
Q

Do you need a Promotion Dimension

A

Yes, because in product dimension you would have

a lot of rows with NULL values. How do you connect with Fact Table.

41
Q

DD

A

attribute that doesnt have it’s own dimension. POST Transaction# . generally system generated values. can be used as serigate key.

42
Q

Factless Fact Tables – Promotion Coverage

A

What products were on promotion but did not sell?
Add a Promotion Coverage fact table to include product, date, store and promotion attributes for products which are under promotion.

43
Q

Factless Fact Tables – Promotion Coverage

A

The Promotion Coverage fact table is a factless fact table. Each row of this table shows the product on promotion regardless of whether the product sold.
The grain is different from fact table. not just what was sold but was on promotion.

Question: Find the products which were on promotion but not sold.
Step1: Query the Promotion Coverage table to identify the products under promotion on a given day.
Step 2: Determine what products sold from POS sales fact table on that day.
Step 3: Find the difference between above two lists of products.

44
Q

Fact and Dimension Table Keys

A

Use surrogate keys as primary keys
Integer keys, sequential
Natural key: operational system identifier -> store number, time stamp

45
Q

What are advantages of surrogate keys

A
Advantages of surrogate keys
Buffer from operational changes
Integrate multiple source systems
Improve performance
Handle null or unknown conditions
Dimension attribute change tracking
46
Q

snowflake schema

A

hierarchy. you are normalizing the tables.
Advantages of Normalization of DimTables
Saves space by reducing redundancy
Need to update only once.
Disadvantages of normalization of DimTables
Model becomes complex
Numerous joins make query slow

47
Q

slicing query example:

A

-Slicing: This query generates a two dimensional slice by fixing (filtering) one of the dimension value.
SELECT CustomerID, ProductID, StoreID, Quantity FROM FactPurchase_t WHERE CustomerID=1;

48
Q

dicing query example:

A

–Dicing: This query selects a subset over all dimensions.
SELECT CustomerID, ProductID, StoreID, Quantity FROM FactPurchase_t WHERE CustomerID=1 and ProductID=11 and StoreID=101;

49
Q

all types rollup. cube

A

[dbo].[DimCustomer_t]

50
Q

grouping sets

A

instead of union, to merge into reports
-The GROUPING SETS operator generates a result set equivalent to that generated by a UNION ALL of multiple simple GROUP BY clauses.
SELECT CustomerID, ProductID, StoreID, SUM(Quantity) AS TotalCount FROM FactPurchase_t GROUP BY GROUPING SETS(CustomerID, ProductID, StoreID);