Week 9 Flashcards

1
Q

what a data warehouse

A

subject oriented
integrated
time varaint
non updatable

COLLECTION OF DATA1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

4 CHARACTERISTICS

A

subject oriented: CUSTOMERS/PATIENTS,STUDENTS

integrated: consistent naming conventions, info from multiple sources

time varaint: can study trends and changes

non updatable: read only, periodicaly refershed

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

why do we need daata warehousing 2 NEEDS FOR WAREHOUSING

A
  1. searaintg operational and info systems and data
  2. integrated company wife view of high quality info form disprate databaseds
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

what is OLTP

A

OPERATIONAL SYSTEM

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

OLTP/ operational system characterisitcs
- primary purppose
- type of data
- primary users
- scope of usage
- design goal
- volume

A
  • run the business on a current basis

current representaiton of state of the business

clerks, salespersons, admins

narrow, planned and simple updates and queries

erformance: throguhput, availability, relaibility, alignment with business rules

many constant updates and quesies on one or fw table rows

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

INFO systems/ OLAP
- primary purppose
- type of data
- primary users
- scope of usage
- design goal
- volume

** FOR DATA MINING

A

support managerial desciion making

historical point in time SNAPSHOTS + predictions

managers, business analysts, customers

broad, ad hoc, complex quieries ad analysis

ase and low cost of flexible access and use

periodic batch updates and queries requring many or all rows

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the issue with comany wide views?

A

incosnistent key structures, snyonyms, FREE FORM VS STRUCTURED FIEDLS, MISSING/INCSSINSTNT DATA

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

3 DATA MART ARCHITECTURES

A

-INDEPENDANT
-DEPENDANT/OPERATIONAL
-LOGICAL & REAL-TIME

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a data mart

A

smaller data structure that stores the data for a particualr purpose

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

A DATA WAREHOUSE IS JUST ASSEMBLED VERSION OF DATA MARTS

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

what is ETL

A

extract, trasnform, and load (ETL) process of combining data from multiple sources into large central repo (DATA WAREHOUSE)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

INDEPENDANT DATA MART ARCHIETECTURE:

WHAT IS flow of data from source system to end user

A

source data systems (compters, internal, external)

> extract >

data staging areas (proceessing) / ttransform; PUT INTOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO DM

> load >

data & metadata storage area (data warehouse consisting od several smaller marts)

>

end user presentation tools : ad-hoc query, OLAP tools, modeling, visualzing, business performance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

limitaitons of indepednant data marts

A
  • separate ETL processes for each mart leads to redundant data & prcoessing
  • incosistency between marts
  • difficult to drill down for related facts
  • excess scaling costs when ore app are built
  • high cost fo robratining consstisncy between marts
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

DEPENDANT DATA MART WITH OPERATIONAL DATA STORE ARCHITECTURE

ODS provides options for transofrming current data, single ETL for ent data warehouse (EDW), dependant data marts loaded from EDW

A

source data systems

> extract >

data staging area

> load into> data and metadata storage area
feed> end user presentation tools

–Data and metadata storage area set up–
instead of separate data mart for everything ,there is one central data base (enterprise EDW), AND YOU LOAD OUT DIFFERENT DATA MARTS AFTERWARDS!

–Data storage area– IS ODS!!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

what is ODS

A

smaller data warehouse that stores recently ahppened data for operationa luse

FOR ANALYTICAL PURPOSE NOT TRANSACTIONAL PURPOSE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

logical DATA MART AND REAL TIME WAREHOUSE

A

There is a lot of data! so updating data real time must be really hard

this is very similar to dependant data mart with ODS,b ut the ODS & EDW [data staging area & data/metadata storage area] they BECOME ONE

DATA MARTS ARE NOT SEPEARATE DATABASE, BUT LOGICLA VIEWS OF THE DATA WAREHOUSE!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

pros of Logical data mart & real time wahreohsueo

A

EASIER TO CREATE NEW DATA MARTS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

logical DATA MART AND REAL TIME WAREHOUSE ARCHITECTURE

A

source data systems

> extract>

data staging area + data/metadata storage area

> feed>

end user presentaiton tools

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

data warehouse vs data marts: scope comparison

A

dw: applciation independant, centralized/ent wide, planned

dm: specfiici dss applicaiton, decnetralized, organic/unplanned at times

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

data warehouse vs data marts: subject and sources comparison

A

DW: data is historical, detialed, summarized
- lightly denomalized (STILL TONS OF SEPARATER DATa)
- multi subjects
- many internal and external sources

DM: data has some history, detail and summarizd
- highly denomrialize (less tables!)
- one central subject or conecrn to usesr
- few internal and external sources
HIGHLY DENORMALIZED DATA MARTS= MROE EFFIECTIN AT TIMES CUZ REPEATED DUPLICATED DATA

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

DW vs DM : other comaprisons

A

DW- flexible, data orientedd, long llife, large, single complex structure

DM: restircitve, project oreindted, sohrtlife, starts small become large, multi-simi-complex structures together complex

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

3 layer data archietecture for data warehous

A

TOP: Derived data (from DATA mart METADATA)

MIDDLE: reconciled data (from EDW metadata)

BOTTOM: operational data (from oeprational metadata)

YOU MOVE UPWARDS!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

DW data characteristics

A

Status vs Event Data:
- status: before and after images
- event: something that causes change to status (like a transaction)

Transient vs periodic data
- transient: changes to existing records are written over previous records (DESTORY PREVIOUS DATA)
- Periodic: data is never deleted after being added to the store, just updated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

transient vs periodic daata

A

Transient vs periodic data
- transient: changes to existing records are written over previous records (DESTORY PREVIOUS DATA)
- Periodic: data is never deleted after being added to the store, just updated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Derived data is ?

A

DATA MART

26
Q

oBJECTIVES OF DERIVED DATA

A

EASE OF USE FOR DS applications

fast response to predefined user queries

Customized data for particualr target audiences

Ad-hoc query support

data mining capabilities

27
Q

Characteristics of derived data

A

detailed/periodic data

aggregate (for summary)

distributed (to deparmetnetal servers)

28
Q

most common derived data model?

A

dimensional model -> star scema

29
Q

What is star schema

A

CONTAINS A FACT TABLE AS THE CENTRE, AN ASSOCIATIVE ENTITIY

and the dimensions taht are relevant ot this fact are the tables that come off the centre

one dimension: only one “arm” of tables comming off the fact table… if you have multi dimension- yo uhave tables coming out of the tables

30
Q

what are dimesnion table keys

A

SURROGATE KEYS (NON-intelligent & non-business related) BECAUSE

  • keys may change
  • helps keep track of non key attribute value
  • surrgoate keys are simpler and shorter
  • surrogate keys can be same length

these are the alternative keys in the table- like AW000010001 then AW000010002…

31
Q

WHAT is granularity of the fact table

A

GRANULARITY- WHAT LEVEL OF DETAIL DO YOU WANT?

Transactional grain: finest level
Aggregated grain: more summarized

32
Q

Finer grains bring beter market based anlsysis caabitliy

Finer grain implies more dimensions tables, more rows in fact table..

A
33
Q

in web based commerce, the finest granularity

A

is A CLICK

34
Q

IS Date improatnt in fact table?

A

YES! for every fact, there should be a date, and for each of them there should be a key

so like diff attributes are: OrderDateKey, OrderDate

in the dimension table you need a table for DimDate

35
Q

some variations of the star schema

A

multi fact table: can improve performance, often used to store facts for differnt combo of dimensions, confromed dimensions

Factless Facts tables: no non key data, but foreign keys for associated dims, used for TRACKING EVENTS / INVETORY COVERAGE

36
Q

Role of Metadata
Operational metadata: Describe the data in the various operational systems that feed the enterprise DW. It may exist in a number of different formats.
EDW: Derived from the enterprise data model, describes the reconciled data layer and the rules for ETL operational data into reconciled data.
Data mart metadata: describes the derived data layer and the rules for transforming reconciled data into derived data

A
37
Q

factless fact table

A

no data in fact table, just keys assocaiting dimensions records, just erecords the currency of something

like attendance

38
Q

2 OPTIONS WHN NORMALIZING DIMENSION TABLES

A
  • include all info in a single denormalized table
  • normalize the dimensions
39
Q

Fixed product hierarchy

A

dimension hierarchies help to provide levls of aggregation for users who want to summary info in a DW

this is snowflake schema

40
Q

snowflake schema= searching is more efficeint

A
41
Q

slowly changing dimensions

KIMBALL APPROACH

A

for some dimensions taht change slowly youw ant to keep history of the past

Kimball’s approach:
— Create a new dimension table row each time the
dimension object changes, with all dimension
characteristics at the time of chanG

BASICALLY HAVE A STARTDATAE AND END DATE AS DIMESNSIONS

42
Q

ETL PROCESS

A

ETL = Extract, transform, and load
— Capture/Extract
— Scrub or data cleansing
— Transform
— Load and Index

43
Q

DATA RECONCILIATION PROCESS

A

Data reconciliation involves capture/extract, cleanse, transform, and
load/index.

https://d2l.ucalgary.ca/d2l/le/content/617907/viewContent/6720112/View
36

44
Q

RECORD LEVEL TRANSFORMAITON FUNCTIONS

Transofrming info at the record level

A

Selection – the process of partitioning data according to
predefined criteria

Joining – the process of combining data from various sources
into a single table or view

Normalization – the process of decomposing relations with
anomalies to produce smaller, well-structured relations

Aggregation – the process of transforming data from detailed
to summary leve

45
Q

single filed transofrmations: basic

A

sourec record: datapoint x

goes through transformationg t

target record: f(x)

46
Q

single filed transofrmations: algortihimic

A

same as basic, but transformed by using some formula or logical expression

47
Q

single filed transofrmations table lookup

A

same as basic but transformation refecens a different table

48
Q

multifield taransformations

A

source record: select 2+ attributes

transofrmatinon T

Target record: combined the 2 atteibutes to one attribute

49
Q

multifield taransformations one source to many targets

A

source record: select 1

tranformaiotn to split it into separate ares

target recrod: 2+

50
Q

pros and cons of fact tabel

A

pro: excellent for ad hoc queries

cons: bad for onlin tps

51
Q

A class of database technology used to store textual and other unstructured data is called:

A

nosql

52
Q

DW IS A COLLECTION OF DMS

A
53
Q

DW does nnot have current data

A

it is not supporting operational data processing

54
Q

the independant data marts approach does not create one dataa warehous

A

creates many separaate data marts

55
Q

Operational data store (ODS)
An integrated, subject-oriented,
continuously updateable, current-
valued (with recent history),
enterprise-wide, detailed database
designed to serve operational
users as they do decision support
processing.

A
56
Q

A logical data mart is a(n):

A

data mart created by a relational view of a slightly denormalized data warehouse.

57
Q

FACTLESS FACT TBALES MAY APPLY WHEN

A

we are deleting correlated data.

58
Q

Data that are detailed, current, and intended to be the single, authoritative source of all decision support applications are called ________ data.

A

RECONCILED

59
Q

Informational and operational data differ in all of the following ways EXCEPT:

A

LEVEL OF DETAIL

60
Q

A data mart is a(n):

A

DW that is limited in scop

61
Q

All of the following are tasks of data cleansing EXCEPT:

A

creating FKS

62
Q
A