10/16 Class Flashcards
OLAP
online analytical processing
main goal: to support ad-hoc but complex queries
places key performance indicators (measures) into context (dimensions):
measures are pre-aggregated
data retrieval is significantly faster
OLTP
online transaction processing
Why OLAP
dimensional modeling is a natural presentation of data for business analytics.
OLAP Technology is very fast
Most reports run within 1-3 seconds
Speed advantage substantial in highly aggregated reports such as multi-year trends
Without OLAP, the burden is on the developer to extract relevant data and build aggregations
Pre-calculated results
Produces consistent information
Roll up (drill up)
summarize data by climbing up a concept hierarchy or by reducing dimensions
example: drill up from city to state
drill up by reducing the location dimension
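A minimal sketch of both kinds of roll up, assuming a tiny made-up fact table (the rows, the column order, and the helper name `roll_up` are illustrative, not from the class slides):

```python
from collections import defaultdict

# Hypothetical fact rows: (state, city, year, units_sold)
sales = [
    ("IL", "Chicago", 2023, 120),
    ("IL", "Springfield", 2023, 30),
    ("IL", "Chicago", 2024, 150),
    ("WI", "Madison", 2024, 40),
]

def roll_up(rows, key):
    """Sum units_sold over all rows that share the same grouping key."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

# Climb the location hierarchy: city -> state (year is kept).
by_state_year = roll_up(sales, key=lambda r: (r[0], r[2]))

# Reduce dimensions: drop location entirely and keep only year.
by_year = roll_up(sales, key=lambda r: r[2])

print(by_state_year)
print(by_year)
```

Both results have fewer, coarser cells than the city-level data, which is what roll up means.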
Drill down
analyze more detailed data by moving down a concept hierarchy or by adding dimension(s)
example:
drill down from city to dealer
drill down by adding the time dimension
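A minimal sketch of both kinds of drill down, assuming made-up dealer-level rows (the data and the helper name `aggregate` are illustrative only):

```python
from collections import defaultdict

# Hypothetical fact rows: (city, dealer, year, units_sold)
sales = [
    ("Chicago", "Dealer A", 2023, 70),
    ("Chicago", "Dealer B", 2023, 50),
    ("Chicago", "Dealer A", 2024, 90),
    ("Madison", "Dealer C", 2024, 40),
]

def aggregate(rows, key):
    """Sum units_sold over all rows that share the same grouping key."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

# Starting point: a summary by city only.
by_city = aggregate(sales, key=lambda r: r[0])

# Move down the location hierarchy: city -> dealer.
by_city_dealer = aggregate(sales, key=lambda r: (r[0], r[1]))

# Add a dimension: break the city totals out by year.
by_city_year = aggregate(sales, key=lambda r: (r[0], r[2]))

print(by_city)
print(by_city_dealer)
print(by_city_year)
```

Each drill-down result splits the city totals into more detailed cells, and the detailed cells still add up to the city totals.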
slice
creates a slice from the cube by choosing a single value for one of the dimensions
dice
creates a sub-cube from the cube by choosing two or more values for one or more of the dimensions
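A minimal sketch of slice and dice on a small cube stored as a dictionary, assuming made-up cells keyed by (city, product, year); the function names `slice_cube` and `dice_cube` are hypothetical:

```python
# Hypothetical cube cells: (city, product, year) -> units_sold
cube = {
    ("Chicago", "Sedan", 2023): 70,
    ("Chicago", "SUV", 2023): 50,
    ("Chicago", "Sedan", 2024): 90,
    ("Madison", "Sedan", 2024): 40,
    ("Madison", "SUV", 2024): 25,
}

def slice_cube(cube, dim, value):
    """Fix dimension `dim` to one value and drop it from the cell key."""
    return {
        key[:dim] + key[dim + 1:]: units
        for key, units in cube.items()
        if key[dim] == value
    }

def dice_cube(cube, allowed):
    """Keep cells whose coordinates fall in the allowed sets.

    `allowed` maps a dimension index to the set of values to keep.
    """
    return {
        key: units
        for key, units in cube.items()
        if all(key[dim] in values for dim, values in allowed.items())
    }

# Slice: year = 2024 only, leaving a (city, product) table.
slice_2024 = slice_cube(cube, dim=2, value=2024)

# Dice: one city and two years, still a three-dimensional sub-cube.
chicago_dice = dice_cube(cube, {0: {"Chicago"}, 2: {2023, 2024}})

print(slice_2024)
print(chicago_dice)
```

The slice loses a dimension because it is pinned to one value; the dice keeps every dimension but covers a smaller range of each.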
hadoop
big data warehouse: volume, velocity, variety
Hadoop: Apache open-source software for reliable, scalable, distributed computing
mapreduce
programming model introduced by Google in 2004
map step: read in input and produce key-value pairs
reduce step: combine all values that share the same key
example:
key value
map workers: machines in the cluster assigned to run map tasks
reduce workers: machines in the cluster assigned to run reduce tasks
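A minimal single-machine sketch of the map, shuffle, and reduce steps using the classic word-count example. It runs in one Python process rather than on Hadoop; in a real cluster the map and reduce calls would be spread across the map and reduce workers. The function names and input strings are made up for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: read input and emit (key, value) pairs, here (word, 1)."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group values by key so each reduce call sees one key's values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key, here by summing."""
    return key, sum(values)

documents = ["big data big cluster", "data warehouse"]

# Each document could be handled by a different map worker.
mapped = [pair for doc in documents for pair in map_phase(doc)]

# Each key could be handled by a different reduce worker.
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```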
exam review
understand the difference between bitmap and B+ tree indexes
ETL
policies for data warehouse maintenance are on the maintenance slide: user-driven policy vs. warehouse-driven policy
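For the bitmap index item in the review list, a minimal sketch of the idea, assuming a made-up low-cardinality column and using Python integers as bit vectors (the names `build_bitmap_index` and `rows_of` are hypothetical). A B+ tree instead keeps keys sorted in a balanced tree of pages, which suits range lookups on high-cardinality columns; it is not sketched here.

```python
# Hypothetical column with few distinct values, one entry per row.
region = ["east", "west", "east", "north", "west", "east"]

def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def rows_of(bitmap):
    """List the row ids whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

index = build_bitmap_index(region)

# "region = east OR region = north" becomes a single bitwise OR.
east_or_north = index["east"] | index["north"]

print(rows_of(east_or_north))
```

Combining conditions with cheap bitwise operations is the usual reason bitmap indexes fit warehouse-style queries on columns with few distinct values.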