Week 9 Flashcards
what a data warehouse
subject oriented
integrated
time varaint
non updatable
COLLECTION OF DATA1
4 CHARACTERISTICS
subject oriented: CUSTOMERS/PATIENTS,STUDENTS
integrated: consistent naming conventions, info from multiple sources
time varaint: can study trends and changes
non updatable: read only, periodicaly refershed
why do we need daata warehousing 2 NEEDS FOR WAREHOUSING
- searaintg operational and info systems and data
- integrated company wife view of high quality info form disprate databaseds
what is OLTP
OPERATIONAL SYSTEM
OLTP/ operational system characterisitcs
- primary purppose
- type of data
- primary users
- scope of usage
- design goal
- volume
- run the business on a current basis
current representaiton of state of the business
clerks, salespersons, admins
narrow, planned and simple updates and queries
erformance: throguhput, availability, relaibility, alignment with business rules
many constant updates and quesies on one or fw table rows
INFO systems/ OLAP
- primary purppose
- type of data
- primary users
- scope of usage
- design goal
- volume
** FOR DATA MINING
support managerial desciion making
historical point in time SNAPSHOTS + predictions
managers, business analysts, customers
broad, ad hoc, complex quieries ad analysis
ase and low cost of flexible access and use
periodic batch updates and queries requring many or all rows
What is the issue with comany wide views?
incosnistent key structures, snyonyms, FREE FORM VS STRUCTURED FIEDLS, MISSING/INCSSINSTNT DATA
3 DATA MART ARCHITECTURES
-INDEPENDANT
-DEPENDANT/OPERATIONAL
-LOGICAL & REAL-TIME
What is a data mart
smaller data structure that stores the data for a particualr purpose
A DATA WAREHOUSE IS JUST ASSEMBLED VERSION OF DATA MARTS
what is ETL
extract, trasnform, and load (ETL) process of combining data from multiple sources into large central repo (DATA WAREHOUSE)
INDEPENDANT DATA MART ARCHIETECTURE:
WHAT IS flow of data from source system to end user
source data systems (compters, internal, external)
> extract >
data staging areas (proceessing) / ttransform; PUT INTOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO DM
> load >
data & metadata storage area (data warehouse consisting od several smaller marts)
>
end user presentation tools : ad-hoc query, OLAP tools, modeling, visualzing, business performance
limitaitons of indepednant data marts
- separate ETL processes for each mart leads to redundant data & prcoessing
- incosistency between marts
- difficult to drill down for related facts
- excess scaling costs when ore app are built
- high cost fo robratining consstisncy between marts
DEPENDANT DATA MART WITH OPERATIONAL DATA STORE ARCHITECTURE
ODS provides options for transofrming current data, single ETL for ent data warehouse (EDW), dependant data marts loaded from EDW
source data systems
> extract >
data staging area
> load into> data and metadata storage area
feed> end user presentation tools
–Data and metadata storage area set up–
instead of separate data mart for everything ,there is one central data base (enterprise EDW), AND YOU LOAD OUT DIFFERENT DATA MARTS AFTERWARDS!
–Data storage area– IS ODS!!
what is ODS
smaller data warehouse that stores recently ahppened data for operationa luse
FOR ANALYTICAL PURPOSE NOT TRANSACTIONAL PURPOSE
logical DATA MART AND REAL TIME WAREHOUSE
There is a lot of data! so updating data real time must be really hard
this is very similar to dependant data mart with ODS,b ut the ODS & EDW [data staging area & data/metadata storage area] they BECOME ONE
DATA MARTS ARE NOT SEPEARATE DATABASE, BUT LOGICLA VIEWS OF THE DATA WAREHOUSE!
pros of Logical data mart & real time wahreohsueo
EASIER TO CREATE NEW DATA MARTS
logical DATA MART AND REAL TIME WAREHOUSE ARCHITECTURE
source data systems
> extract>
data staging area + data/metadata storage area
> feed>
end user presentaiton tools
data warehouse vs data marts: scope comparison
dw: applciation independant, centralized/ent wide, planned
dm: specfiici dss applicaiton, decnetralized, organic/unplanned at times
data warehouse vs data marts: subject and sources comparison
DW: data is historical, detialed, summarized
- lightly denomalized (STILL TONS OF SEPARATER DATa)
- multi subjects
- many internal and external sources
DM: data has some history, detail and summarizd
- highly denomrialize (less tables!)
- one central subject or conecrn to usesr
- few internal and external sources
HIGHLY DENORMALIZED DATA MARTS= MROE EFFIECTIN AT TIMES CUZ REPEATED DUPLICATED DATA
DW vs DM : other comaprisons
DW- flexible, data orientedd, long llife, large, single complex structure
DM: restircitve, project oreindted, sohrtlife, starts small become large, multi-simi-complex structures together complex
3 layer data archietecture for data warehous
TOP: Derived data (from DATA mart METADATA)
MIDDLE: reconciled data (from EDW metadata)
BOTTOM: operational data (from oeprational metadata)
YOU MOVE UPWARDS!
DW data characteristics
Status vs Event Data:
- status: before and after images
- event: something that causes change to status (like a transaction)
Transient vs periodic data
- transient: changes to existing records are written over previous records (DESTORY PREVIOUS DATA)
- Periodic: data is never deleted after being added to the store, just updated
transient vs periodic daata
Transient vs periodic data
- transient: changes to existing records are written over previous records (DESTORY PREVIOUS DATA)
- Periodic: data is never deleted after being added to the store, just updated