Chapter 4 Test Deck Flashcards

1
Q

Data Warehouse

A

A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.

2
Q

Data warehousing

A

The process of constructing and using data warehouses

3
Q

Subject-Oriented

A

Organized around major subjects such as customer, product, sales
Focuses on modeling and analysis of data for decision makers,
not on daily operations or transaction processing (OLTP, Online Transaction Processing)

4
Q

Integrated

A

Constructed by integrating multiple, heterogeneous data sources,
such as relational databases, flat files, and online transaction records
Data cleaning and data integration techniques are applied

5
Q

Time-variant

A

Time horizon is significantly longer than that of operational systems
Operational database: current value data
Data warehouse data: provides information from a historical perspective (e.g., the past 5-10 years), including summarized and historical records and patterns
Every key structure contains an element of time, explicitly or implicitly

6
Q

Nonvolatile

A

A physically separate store of data, transformed from the operational environment
Operational update of data does not occur in the data warehouse environment
Does not require transaction processing, recovery, or concurrency control mechanisms
Requires only two operations: initial loading of data and access of data
(OLAP, Online Analytical Processing)

7
Q

OLTP

A
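Users: clerk, IT professional
Function: day-to-day operations
DB design: application-oriented, E-R model
Data: current, up-to-date, detailed
Usage: repetitive
Access: read/write
Unit of work: short, simple transaction
Records accessed: tens
Number of users: thousands
DB size: 100 MB to GB
Metric: transaction throughput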
8
Q

OLAP

A
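Users: knowledge worker
Function: decision support
DB design: subject-oriented, star schema
Data: historical, multidimensional
Usage: ad hoc
Access: lots of scans
Unit of work: complex query
Records accessed: millions
Number of users: hundreds
DB size: 100 GB to TB
Metric: query throughput, response time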
9
Q

DBMS

A

Tuned for OLTP: access methods, indexing, concurrency control, recovery

10
Q

warehouse

A

tuned for OLAP: complex OLAP queries, multidimensional view, consolidation

11
Q

Enterprise warehouse

A

collects all of the information about subjects spanning the entire organization

12
Q

Data Mart

A

A subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart. Data marts can be independent or dependent.

13
Q

virtual warehouse

A

A set of views over operational databases

only some of the possible summary views may be materialized
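
A minimal sketch of the idea on this card, using Python's built-in sqlite3 module. The table, view, and column names (orders, sales_by_region, region, amount) and the rows are hypothetical. The view stores no data of its own, so a new operational row shows up in the summary immediately. SQLite does not offer materialized views; materializing a summary would mean storing the query result in a table and refreshing it.

```python
import sqlite3

# Hypothetical operational table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("east", 15.0), ("west", 7.5)],
)

# The "virtual warehouse" here is a single view: nothing is copied, and the
# summary is recomputed from the operational table on every query.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)


def region_totals(connection):
    """Return {region: total} as seen through the view."""
    return dict(connection.execute("SELECT region, total FROM sales_by_region"))


totals_before = region_totals(conn)

# A new operational row is visible through the view without any reload step.
conn.execute("INSERT INTO orders (region, amount) VALUES ('west', 2.5)")
totals_after = region_totals(conn)
```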

14
Q

Data extraction

A

Get data from multiple, heterogeneous, and external sources

15
Q

Data cleaning

A

detect errors in the data and rectify them when possible

16
Q

Data Transformation

A

Convert data from legacy or host format to warehouse format

17
Q

Load

A

Sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
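
The extraction, cleaning, transformation, and load cards above can be chained as a small pipeline. This is only a sketch under stated assumptions: the two sources (crm_rows, csv_lines), the field names, and the choice to drop rows whose amount cannot be parsed are all hypothetical, and the "warehouse" is just a dictionary of per-customer totals.

```python
# Hypothetical source records in two different "legacy" formats.
crm_rows = [
    {"cust": " Alice ", "amount": "10.50"},
    {"cust": "bob", "amount": "n/a"},  # dirty value that cannot be repaired
]
csv_lines = ["alice,4.50", "carol,3.00"]


def extract():
    """Extraction: gather rows from multiple heterogeneous sources."""
    for row in crm_rows:
        yield row["cust"], row["amount"]
    for line in csv_lines:
        name, amount = line.split(",")
        yield name, amount


def clean(rows):
    """Cleaning: detect errors; rectify where possible, otherwise drop the row."""
    for name, amount in rows:
        try:
            yield name.strip().lower(), float(amount)
        except ValueError:
            continue  # amount is not a number, so the row is skipped


def transform(rows):
    """Transformation: convert to the (assumed) warehouse format."""
    for name, amount in rows:
        yield {"customer": name, "amount_cents": round(amount * 100)}


def load(facts):
    """Load: consolidate and summarize into one total per customer."""
    totals = {}
    for fact in facts:
        key = fact["customer"]
        totals[key] = totals.get(key, 0) + fact["amount_cents"]
    return totals


warehouse = load(transform(clean(extract())))
```

A refresh step would rerun the same pipeline on only the source rows that changed since the last load.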

18
Q

Refresh

A

propagate the updates from the data sources to the warehouse

19
Q

Meta data

A

The data defining warehouse objects. It stores:
A description of the structure of the data warehouse: schema, views, dimensions, etc.
The algorithms used for summarization
The mapping from the operational environment to the data warehouse
Data related to system performance
Operational meta-data
Business data: business terms, ownership of data, charging policies

20
Q

operational meta-data

A

Data lineage (history of migrated data and transformation path)
Currency of data (active, archived, or purged)
Monitoring information (warehouse usage statistics, error reports, audit trails)

21
Q

A data cube

A

Allows data to be modeled and viewed in multiple dimensions
Defined by dimension tables and fact tables
It is n-dimensional; a 4-D cube can be viewed as a series of 3-D cubes
Physical storage may differ from its logical representation

22
Q

Cuboid

A

A data cube at one level of summarization is often referred to as a cuboid

The lattice of cuboids forms a data cube

23
Q

base cuboid

A

The n-D base cube, which holds the lowest level of summarization

24
Q

apex cuboid

A

The topmost 0-D cuboid, which holds the highest level of summarization
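
The cuboid cards above can be made concrete by enumerating the lattice. The sketch below assumes three hypothetical dimensions (time, item, location) and ignores concept hierarchies within a dimension, so n dimensions give 2^n cuboids, one per subset of dimensions to group by.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Return every cuboid (group-by) as a tuple of dimension names,
    ordered from the n-D base cuboid down to the 0-D apex cuboid."""
    n = len(dimensions)
    return [
        combo
        for k in range(n, -1, -1)
        for combo in combinations(dimensions, k)
    ]


dims = ("time", "item", "location")  # hypothetical dimensions
lattice = cuboid_lattice(dims)

base_cuboid = lattice[0]   # groups by all dimensions: least summarized
apex_cuboid = lattice[-1]  # groups by no dimension: a single grand total
```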

25
Star schema
A fact table in the middle connected, via foreign-key-to-primary-key relationships, to a set of dimension tables
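
As a rough illustration of the star schema card, the sketch below builds one fact table and two dimension tables with Python's built-in sqlite3 module. All table names, column names, and rows (fact_sales, dim_item, dim_location, dollars_sold) are hypothetical. A typical query joins the fact table to a dimension on its key and aggregates the measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT);
-- Fact table: one foreign key per dimension plus the measure.
CREATE TABLE fact_sales (
    item_key     INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    dollars_sold REAL
);
INSERT INTO dim_item VALUES (1, 'pen'), (2, 'ink');
INSERT INTO dim_location VALUES (1, 'Lyon'), (2, 'Oslo');
INSERT INTO fact_sales VALUES (1, 1, 2.0), (1, 2, 3.0), (2, 1, 5.0);
""")

# Join the fact table to one dimension and aggregate the measure.
sales_by_city = conn.execute("""
    SELECT l.city, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_location AS l ON l.location_key = f.location_key
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
```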
26
Snowflake schema
A refinement of the star schema in which some dimensional hierarchies are normalized into a set of smaller dimension tables, forming a shape similar to a snowflake. Reduces redundancy.
27
Fact Constellations
Multiple fact tables share dimension tables; viewed as a collection of stars, and therefore called a galaxy schema or fact constellation
28
Data warehouse vs data mart
A data warehouse collects information about subjects that span the entire organization; it is enterprise-wide, and the fact constellation schema is commonly used. A data mart is a departmental subset of the data warehouse; it is department-wide, and the star or snowflake schema is commonly used. The star schema is more popular and efficient because it needs fewer joins.
29
Roll up or drill up
Summarize data by climbing up a hierarchy or by dimension reduction
30
drill down or roll down
The reverse of roll up: from a higher-level summary to a lower-level summary or detailed data, or by introducing new dimensions
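
The two forms of roll-up can be sketched on a tiny cube held in a dictionary. The cells, the city-to-country hierarchy, and the function names are hypothetical. Drill-down runs in the opposite direction, which is only possible if the more detailed cells (here, the original cube) are still available.

```python
from collections import defaultdict

# Hypothetical cells: (city, quarter) -> units sold
cube = {
    ("Lyon", "Q1"): 4,
    ("Lyon", "Q2"): 6,
    ("Paris", "Q1"): 1,
    ("Oslo", "Q1"): 3,
}
city_to_country = {"Lyon": "France", "Paris": "France", "Oslo": "Norway"}


def roll_up_location(cells, hierarchy):
    """Roll up by climbing the location hierarchy: city -> country."""
    out = defaultdict(int)
    for (city, quarter), units in cells.items():
        out[(hierarchy[city], quarter)] += units
    return dict(out)


def roll_up_drop_time(cells):
    """Roll up by dimension reduction: aggregate the time dimension away."""
    out = defaultdict(int)
    for (city, _quarter), units in cells.items():
        out[city] += units
    return dict(out)


by_country = roll_up_location(cube, city_to_country)
by_city = roll_up_drop_time(cube)
```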
31
slice and dice
Slice: select on one dimension of the cube, giving a subcube. Dice: select on two or more dimensions, giving a subcube
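
A small sketch of slice and dice as selections over cube cells. The three dimensions (city, quarter, item), the cell values, and the helper name select are hypothetical; a slice fixes one dimension, while a dice restricts two or more.

```python
# Hypothetical cells: (city, quarter, item) -> units sold
cube = {
    ("Lyon", "Q1", "pen"): 4,
    ("Lyon", "Q2", "pen"): 6,
    ("Oslo", "Q1", "ink"): 3,
    ("Oslo", "Q2", "pen"): 2,
}


def select(cells, city=None, quarter=None, item=None):
    """Keep cells whose coordinates fall in every given set (None = no filter)."""
    filters = (city, quarter, item)
    return {
        key: value
        for key, value in cells.items()
        if all(
            allowed is None or coord in allowed
            for coord, allowed in zip(key, filters)
        )
    }


# Slice: a selection on one dimension (quarter).
slice_q1 = select(cube, quarter={"Q1"})

# Dice: a selection on two dimensions (quarter and item).
dice_q2_pen = select(cube, quarter={"Q2"}, item={"pen"})
```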
32
pivot or rotate
Reorient the cube for visualization, e.g., view a 3-D cube as a series of 2-D planes
33
drill across
involving more than one fact table
34
drill through
Through the bottom level of the cube to its back-end relational tables (using SQL)
35
Information processing
Supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts, and graphs
36
analytical processing
Multidimensional analysis of data warehouse data; supports basic OLAP operations: slice-dice, drilling, pivoting
37
Data mining
Knowledge discovery from hidden patterns; supports finding associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools
38
Relational OLAP or ROLAP
Uses a relational or extended-relational DBMS to store and manage warehouse data, with OLAP middleware supporting the client front end; greater scalability than MOLAP technology
39
Multidimensional OLAP or MOLAP
Sparse array-based multidimensional storage engine; fast indexing to precomputed summarized data
40
Hybrid OLAP or HOLAP
Combines ROLAP and MOLAP technology (e.g., Microsoft SQL Server); flexible, e.g., low level relational, high level array
41
Specialized SQL servers
e.g., Red Brick; specialized support for SQL queries over star/snowflake schemas