5 - Data Analysis in SQL Flashcards

1
Q

What are the main goals of OLAP?

A
  • Decision support to knowledge worker (analyse company data to discover patterns/trends)
  • based on integrated data warehouse [separate from OLTP systems -> periodic refresh)
    • > Organize and centralize corporate data (historical) from various heterogeneous sources (ETL)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Compare OLAP applications with OLTP applications.

A

OLTP: many simple (point update) queries, small amounts of data, current data, fast update + high (transaction) throughput
OLAP: few complex queries (repeated), large data, current+historical data, fast calculation + load throughput + query tthroughput

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a Data Warehouse and its basic architecture?

A

Separate datastore modeled by multidimensional historical data from various sources, supporting OLAP.
SOURCES -> ETL -> WAREHOUSE -> TOOLS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Explain the goals of multidimensional Data Modelling.

A
  • > set of numerical measures associated to a combination of (hierarchical) dimensions
  • > Dimensions group and qualify numerical fact data, aggregating measures on the groups

MOLAP (specific storage for multidimensional data) vs ROLAP (relies on RDBMS capabilities, enhanced SQL).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the role of the multidimensional cube in OLAP and its supported operations?

A
  • > way to arrange/interpret multidimensional data: each dimension in an axis
    • > points contain measures (cube cells)
  • > rollup/drill-down: increase/decrease level of aggregation (day month, country city)
  • > Slice/dice: selection and projection (fix one dimension/restrict (internal) one or more dim.)
  • > Split/Merge: rem/add dimensions.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What are the main concepts of Multidimensional Data Modelling?

A

MD Schema: set of dimensions + set of measures (of a cube)

**dimensions: partially ordered set of category attributes (primary(finest granule), classification (hierarchy), dimension(additional info))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Explain and compare the terms Star Schema and Snowflake Schema.

A

Ways of modelling MD data in RDBMS
Star:
* fact table on the center has pointers to dimension tables and associated measures
* dimension attributes on dimension table -> more performance
Snowflake:
* Dimension tables are normalized, reflecting dimension hierarchy
* less redundant

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the idea of slice and dice operations?

A

cub: join fact table with dimension tables
- > retrieve portions of interest from the cube (in WHERE clause)
- > slice: fix one dimension -> dice: restrict values to an internal

  • > choice of partitioning: GROUP BY
  • > drill-down, rollup common in slice/dice queries

drill-down: finer partitions
rollup: coarser partitions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the main functionality provided by the SQL OLAP extensions?

A

extesions to GROUP BY: ROLLUP, CUBE, GROUPING SETS (grouping by different sets of attributes)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Give an overview over the ROLLUP operator.

A

Extends grouping semantics to combination of groups

GROUP BY ROLLUP (month, city, producer)

  • > producer within city within month (normal GROUP BY)
  • > City within month (producer = null)
  • > month (city, producer=null)
  • > grand total (all null)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Give an overview over the CUBE operator.

A

Extends ROLLUP to produce multidimensional grouping (all group combinations)
GROUP BY CUBE (month, city, producer) = m,c,p+m,c+m,p+c,p+m+c+p+0

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are grouping sets and how do they generalize other group operators?

A

Allow multiple groupings on a single scan (pass)
-> arbitrary definition of desired groups

ROLLUP (a,b,c) = GROUPING SETS ((a,b,c), (a,b), (a), ())
**() -> special syntax for grand total
CUBE(a,b,c) = GROUPING SETS((a,b,c),(a,b),(a,c),(b,c),(a),(b), (c), ())

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How does the analysis of functional dependencies provide more flexibility to expressing aggregations in SQL queries?

A

GROUP BY rule: columns projected must either be agg functions or grouped
-> functional dependency analysis may detect that column value is unique within group, allowing violation of the rule.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly