Big Data Lecture 14 Data Cubes Flashcards
What is OLTP?
Online transaction processing, both read and write intensive system in real time, example is relational database.
What is OLAP?
Online analytical processing, we want to agreggate data, can be not fully online, example are Data Cubes.
What are the main differences of OLTP and OLAP in IO?
OLTP writes mostly, OLAP reads often.
What is the difference of OLAP and OLTP in detail of information it provides about data?
OLTP has detailed individual records, OLAP historical summarized consolidated data.
What is the difference of OLTP and OLAP in terms of size of transactions?
OLTP has small transactions, OLAP has few huge queries.
What is the difference of OLAP and OLTP in execution time?
OLTP is interactive, OLAP takes its time.
What is the difference of OLAP and OLTP in terms of duplication and denormalization?
OLTP is consistent, OLAP has redundant data.
What 4 adjectives define OLAP?
<ul><li>Subject oriented: usually for one use case (web analytics, sales...),</li><li>time-variant: the data tends to be historical,</li><li>integrated: loads data from many different sources,</li><li>non-volatile: the data is locked once it is loaded.</li></ul>
What is the main data shape in OLAP?
Data cube! Multidimensional object with many dimensions, which are indexed by members.<br></br>E.g. YEAR x PRODUCT x COUNTRY identifies a unique product sold.
What is a fact table?
Table that stores for each coordinate combination a value.
What is a measurement?
Value stored in the table.
What is slicing?
Taking a slice of the cube over some dimensions.
What is dicing?
Reformatting the table such that the axis of the table become the dimensions, this is called the cross-tabulated view.
How is aggregation from the POV of the cube?
It is like squeezing a cube along one dimension.
What is a roll-up?
We combine two categories in aggregation together, to create a lower dimensional view of the data.
What is a drill-down?
Opposite of roll up. We make the data more granular, by considering more subvariables in each category, making the view more precise, less aggregated.
What is the difference of ROLAP and MOLAP?
Relational OLAP (built on top of RDBMS) and Multidimensional OLAP (natively deals with multiple dimensions).
What is pivoting (and unpivoting)?
Pivoting takes dimensions and makes the associated values into a different measurements. Unpivoting reverses that.
What is a star schema of tables?
We keep one main table with all dimensions, and then we have <i>satellite tables </i>which reference the remaining data.
What is snowflake schema?
Within each table of the star schema, we further normalize the data into more tables.
What is MDX?
Stands for Multi-Dimensional eXpressions. The query language for Data Cubes.
How to roll up in SQL?
Use group by, and the dimensions that you want to keep insert into the groupings.
How to cube in SQL?
You can ‘GROUP BY GROUPING SETS’ and then denote all the combinations, or ‘GROUP BY CUBE’ and just input the attributes, it will produce the 2^n different sets we need to group on.
How to roll up in SQL?
You can use ‘GROUP BY ROLLUP’ that produces all the attributes for (n+1) types that we need.
How do authorities deal with fancy reports from companies to make the readeable to the system?
They let them produce the reports in XHTML, and make them annotate the important attributes for parsing. Later, data cubes can be used for simple aggregation online.