Chapter 14 Flashcards

Question 1

Q

Why replication?

Answer

A

- increase availability
- improve performance

Question 2

Q

Eager (synchronous) replication

Answer

A

The transaction synchronizes with all copies of the replicated data items before it commits.
Advantages: guarantees one-copy serializable execution; avoids inconsistencies between copies.
Potential problems: update overhead (reduced update performance and increased transaction response time); deadlocks; lack of scalability; cannot be used if nodes are disconnected (e.g., mobile databases) or unavailable.
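
This is a minimal in-memory sketch of eager replication. It assumes a hypothetical `Replica` class and uses a two-phase prepare/commit vote as the synchronization step. That is a common choice, not necessarily the one the course has in mind. The write commits only if every copy can take part, so a single unavailable node blocks the update.

```python
class Replica:
    """Hypothetical in-memory replica; `available` simulates a disconnected node."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}
        self.pending = None

    def prepare(self, key, value):
        # Phase 1: a replica can only vote yes if it is reachable.
        if not self.available:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        # Phase 2: install the prepared write.
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def abort(self):
        self.pending = None


def eager_write(replicas, key, value):
    """Synchronize with every copy before commit: all copies change or none do."""
    votes = [r.prepare(key, value) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.abort()
    return False


nodes = [Replica("A"), Replica("B"), Replica("C")]
print(eager_write(nodes, "x", 1))   # True: all copies now hold x = 1
nodes[2].available = False
print(eager_write(nodes, "x", 2))   # False: C is unreachable, so no copy changes
```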

Question 3

Q

Lazy (asynchronous) replication

Answer

A

Changes introduced at one site are propagated to the other sites (as separate transactions) only after the originating transaction commits.

Advantages: minimal update overhead (better response time than eager replication); also works if sites are disconnected or unavailable.
Potential problems: out-of-date data; conflicting updates on different replicas can cause inconsistencies between copies.
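
This is a minimal sketch of lazy replication. It assumes a hypothetical `LazySite` class and uses plain dicts to stand in for the other sites. The write commits locally at once and is queued; a later `propagate()` call pushes it out. Between the two steps, the other copies hold out-of-date data.

```python
from collections import deque


class LazySite:
    """Hypothetical site that commits locally first and propagates afterwards."""

    def __init__(self, other_sites):
        self.data = {}
        self.other_sites = other_sites   # plain dicts standing in for remote copies
        self.outbox = deque()            # committed changes not yet propagated

    def write(self, key, value):
        # Commit locally at once; no other site is contacted before commit.
        self.data[key] = value
        self.outbox.append((key, value))

    def propagate(self):
        # Runs later: each committed change is applied at the other sites
        # (in a real system, as a separate transaction per site).
        while self.outbox:
            key, value = self.outbox.popleft()
            for site in self.other_sites:
                site[key] = value


site_b, site_c = {}, {}
site_a = LazySite([site_b, site_c])
site_a.write("x", 1)
print(site_b.get("x"))   # None: B still has out-of-date data
site_a.propagate()
print(site_b.get("x"))   # 1: B has caught up
```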

Question 4

Q

Single-master primary-copy replication

Answer

A

One replica is the primary copy and the others are secondaries. All updates are made at the primary and propagated from there to the secondaries, using either eager or lazy replication.
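
Here is a sketch of single-master primary-copy replication. It assumes a hypothetical `PrimaryCopyGroup` class in which the first listed site is the primary. Updates sent to a secondary are refused. The `eager` flag switches between two modes: pushing changes to the secondaries immediately, or queuing them for a later `flush()`.

```python
class PrimaryCopyGroup:
    """Hypothetical single-master group: only the primary accepts updates."""

    def __init__(self, names, eager=True):
        self.copies = {name: {} for name in names}
        self.primary = names[0]      # assumption: the first site listed is the primary
        self.eager = eager
        self.backlog = []            # changes waiting to be propagated (lazy mode)

    def update(self, site, key, value):
        if site != self.primary:
            raise PermissionError(f"{site} is a secondary; updates must go to {self.primary}")
        self.copies[self.primary][key] = value
        if self.eager:
            self._push(key, value)   # secondaries updated before update() returns
        else:
            self.backlog.append((key, value))

    def flush(self):
        # Lazy mode: propagate the backlog to the secondaries later.
        for key, value in self.backlog:
            self._push(key, value)
        self.backlog.clear()

    def _push(self, key, value):
        for name, copy in self.copies.items():
            if name != self.primary:
                copy[key] = value

    def read(self, site, key):
        # Reads may be served by any copy, possibly stale in lazy mode.
        return self.copies[site].get(key)


group = PrimaryCopyGroup(["P", "S1", "S2"], eager=False)
group.update("P", "x", 1)
print(group.read("S1", "x"))   # None: not propagated yet
group.flush()
print(group.read("S1", "x"))   # 1
try:
    group.update("S1", "x", 2)
except PermissionError as err:
    print(err)                 # S1 is a secondary; updates must go to P
```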

Question 5

Q

What happens if the primary copy fails?

Answer

A

option 1 - disallow updates until the primary recovers

option 2 - a secondary takes over as the new primary
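
The two options can be contrasted with a small hypothetical `PrimaryFailover` class. Two simplifying assumptions are made under option 2: the first secondary becomes the new primary, and a recovered primary rejoins as a secondary. With lazy propagation, a real system would also need to consider which secondary is most up to date.

```python
class PrimaryFailover:
    """Hypothetical primary-copy group contrasting the two failure options."""

    def __init__(self, primary, secondaries, policy):
        self.primary = primary
        self.secondaries = list(secondaries)
        self.policy = policy             # "wait" = option 1, "takeover" = option 2

    def primary_fails(self):
        if self.policy == "takeover":
            # Option 2: a secondary becomes the new primary
            # (assumption: simply the first one in the list).
            self.primary = self.secondaries.pop(0)
        else:
            # Option 1: no primary, so updates are disallowed for now.
            self.primary = None

    def recover(self, site):
        if self.primary is None:
            self.primary = site            # option 1: the old primary resumes
        else:
            self.secondaries.append(site)  # option 2 (assumption): it rejoins as a secondary

    def route_update(self):
        """Return the site that must apply an update, or refuse the update."""
        if self.primary is None:
            raise RuntimeError("primary unavailable: updates disallowed until it recovers")
        return self.primary


wait = PrimaryFailover("P", ["S1", "S2"], policy="wait")
wait.primary_fails()
try:
    wait.route_update()
except RuntimeError as err:
    print(err)
wait.recover("P")
print(wait.route_update())         # P

takeover = PrimaryFailover("P", ["S1", "S2"], policy="takeover")
takeover.primary_fails()
print(takeover.route_update())     # S1
takeover.recover("P")
print(takeover.secondaries)        # ['S2', 'P']
```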

Question 6

Q

Multi-master replication

Answer

A

"Update anywhere" model: updates can be made at any replica.

The replicas "race" each other to propagate their updates to all the other nodes (potential for lost updates).
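
Here is a minimal sketch of the race. It assumes two masters that propagate a new value by blindly overwriting the other copy, with no conflict detection at all. Both start with x = 10; A adds 5 and B adds 3 concurrently.

```python
def multi_master_race():
    # Two masters start with the same copy of x.
    a = {"x": 10}
    b = {"x": 10}

    # "Update anywhere": both masters update x concurrently, unaware of each other.
    a["x"] += 5          # A commits 15 locally
    b["x"] += 3          # B commits 13 locally

    # Each master then propagates its new value by overwriting the other copy.
    msg_from_a = a["x"]
    msg_from_b = b["x"]
    b["x"] = msg_from_a  # B is overwritten with 15: B's +3 is lost at B
    a["x"] = msg_from_b  # A is overwritten with 13: A's +5 is lost at A
    return a["x"], b["x"]


print(multi_master_race())   # (13, 15): the copies diverge and neither equals 18
```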

Question 7

Q

How to detect conflicts in multi-master replication?

Answer

A

Based on timestamps.
Each incoming replica update carries the object's old timestamp, i.e. the timestamp the object had before the update. The receiving node compares this old timestamp with its own timestamp for the object. If they are the same, the update is accepted; if not, the incoming update transaction is rejected and submitted for reconciliation.
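
Here is a minimal sketch of this check. It assumes integer timestamps and a hypothetical `Update` record that carries the old timestamp, the new timestamp and the new value. A node accepts an update only if its own timestamp for the object equals the update's old timestamp. Otherwise, the update goes to a reconciliation list.

```python
from dataclasses import dataclass


@dataclass
class Update:
    key: str
    old_ts: int      # timestamp the object had at the sender before this update
    new_ts: int      # timestamp the sender assigned to the updated object
    value: object


class Node:
    def __init__(self):
        self.store = {}                   # key -> (value, timestamp)
        self.pending_reconciliation = []  # rejected updates

    def apply_incoming(self, update):
        _, local_ts = self.store.get(update.key, (None, 0))
        if local_ts == update.old_ts:
            # Sender changed the same version we hold: no conflict, accept.
            self.store[update.key] = (update.value, update.new_ts)
            return True
        # Our copy changed in the meantime: conflict, reject.
        self.pending_reconciliation.append(update)
        return False


node = Node()
node.store["x"] = (10, 1)
print(node.apply_incoming(Update("x", old_ts=1, new_ts=2, value=15)))  # True
print(node.apply_incoming(Update("x", old_ts=1, new_ts=3, value=13)))  # False: local ts is now 2
```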

Question 8

Q

Reconciliation: what are the two approaches?

Answer

A

1 - automatically, based on rules

2 - manually
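
This is a sketch of rule-based reconciliation with a manual fallback. The two rules (`latest_timestamp_wins`, `site_priority`) and the `SITE_PRIORITY` table are hypothetical examples, not rules prescribed by the course. Each rule returns the winning version, or `None` when it cannot decide. Conflicts that no rule resolves are queued for a human.

```python
def reconcile(local, incoming, rules, manual_queue):
    """Try each rule in order; if none decides, hand the conflict to a human."""
    for rule in rules:
        winner = rule(local, incoming)
        if winner is not None:
            return winner
    manual_queue.append((local, incoming))   # manual reconciliation
    return local                             # keep the local version until resolved


# Hypothetical rule: the version with the later timestamp wins.
def latest_timestamp_wins(local, incoming):
    if local["ts"] != incoming["ts"]:
        return max(local, incoming, key=lambda v: v["ts"])
    return None


SITE_PRIORITY = {"HQ": 0, "branch": 1}       # hypothetical: lower number = higher priority


# Hypothetical rule: the version from the higher-priority site wins.
def site_priority(local, incoming):
    pl, pi = SITE_PRIORITY[local["site"]], SITE_PRIORITY[incoming["site"]]
    if pl != pi:
        return local if pl < pi else incoming
    return None


manual = []
rules = [latest_timestamp_wins, site_priority]
a = {"value": 15, "ts": 2, "site": "branch"}
b = {"value": 13, "ts": 3, "site": "branch"}
c = {"value": 20, "ts": 3, "site": "branch"}
print(reconcile(a, b, rules, manual)["value"])   # 13: later timestamp wins
reconcile(b, c, rules, manual)                   # same timestamp, same site
print(len(manual))                               # 1: left for manual reconciliation
```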

Question 9

Q

What are the two alternatives for conflict detection?

Answer

A

1 - semantic synchronization: permit commutative transactions (e.g., processing checks at a bank gives the same result regardless of order) and provide acceptance criteria for detecting conflicts (an update is accepted only if it passes the acceptance tests)

2 - avoid conflicts by implementing update strategies in the application:
fragmentation by key (each key or key range is updated at only one site)
fragmentation by time (update rights are assigned to only one site at a time)
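
This sketch illustrates both ideas under stated assumptions. Bank operations are modelled as signed amounts (deposits positive, checks negative). The acceptance test is a hypothetical "balance must not go negative" rule. The key ranges in `OWNERS` are invented for illustration. Only fragmentation by key is shown.

```python
from itertools import permutations


# Semantic synchronization: deposits (+) and cashed checks (-) commute,
# so replicas applying them in any order reach the same balance.
def apply_all(balance, ops):
    for amount in ops:
        balance += amount
    return balance


ops = [+100, -40, -25, +10]
results = {apply_all(500, p) for p in permutations(ops)}
print(results)                      # {545}: all 24 orders agree


# Hypothetical acceptance test: an update is accepted only if it passes.
def acceptance_test(balance):
    return balance >= 0


print(acceptance_test(apply_all(500, ops)), acceptance_test(apply_all(20, [-40])))  # True False


# Conflict avoidance by key fragmentation: each site owns a key range and
# is the only site allowed to update it (hypothetical ranges).
OWNERS = {"EU": range(0, 1000), "US": range(1000, 2000)}


def may_update(site, key):
    return key in OWNERS[site]


print(may_update("EU", 42), may_update("US", 42))   # True False
```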

Question 10

Q

What is the goal of data warehousing?

Answer

A

materialized integration of data from numerous heterogeneous sources to enable powerful strategic data analysis

Question 11

Q

OLAP (online analytical processing)

fact table:

Answer

A

events or objects of interest (e.g., a sales event, with information about the product sold, the store, the sale date, and the price)

Question 12

Q

OLAP dimension table:

Answer

A

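Facts (events or objects of interest) can often be thought of as arranged in a multi-dimensional space or cube whose axes are the dimensions (e.g., sales events have store, product, and time-period dimensions). Each dimension is described by its own dimension table.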
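
This sketch shows the cube view in Python, using invented sales facts. Each fact is placed in the cell addressed by its (store, product, period) dimension values. Fixing one dimension gives a slice of the cube.

```python
from collections import defaultdict

# Hypothetical sales facts: (store, product, period, amount).
sales = [
    ("Berlin", "TV", "2024-Q1", 900),
    ("Berlin", "TV", "2024-Q1", 600),
    ("Berlin", "Radio", "2024-Q2", 50),
    ("Munich", "TV", "2024-Q1", 800),
]

# Each fact lands in the cube cell addressed by its dimension values.
cube = defaultdict(int)
for store, product, period, amount in sales:
    cube[(store, product, period)] += amount

print(cube[("Berlin", "TV", "2024-Q1")])     # 1500

# Slicing the cube: fix one dimension (store = "Berlin"), keep the others.
berlin = {(p, t): v for (s, p, t), v in cube.items() if s == "Berlin"}
print(berlin)   # {('TV', '2024-Q1'): 1500, ('Radio', '2024-Q2'): 50}
```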

Question 13

Q

What is the structure of a relational OLAP star schema?

Answer

A

fact table (center of the star): tends to be huge; stores the measures (dependent attributes)
dimension tables (each linked to the fact table): tend to be small
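
This is a minimal star schema built with Python's standard `sqlite3` module. Table names, column names and rows are hypothetical. The point is the shape: one fact table holding foreign keys plus measures, surrounded by small dimension tables that are each one join away.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Small dimension tables.
    CREATE TABLE store   (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE period  (period_id INTEGER PRIMARY KEY, month TEXT, quarter TEXT, year INTEGER);

    -- Central fact table: one row per sales event, a foreign key to every
    -- dimension plus the measures (dependent attributes).
    CREATE TABLE sales (
        store_id   INTEGER REFERENCES store(store_id),
        product_id INTEGER REFERENCES product(product_id),
        period_id  INTEGER REFERENCES period(period_id),
        quantity   INTEGER,
        price      REAL
    );
""")
con.executemany("INSERT INTO store VALUES (?, ?, ?)", [(1, "Berlin", "East"), (2, "Munich", "South")])
con.executemany("INSERT INTO product VALUES (?, ?, ?)", [(1, "TV", "Electronics"), (2, "Desk", "Furniture")])
con.execute("INSERT INTO period VALUES (1, '2024-01', '2024-Q1', 2024)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
                [(1, 1, 1, 2, 499.0), (2, 1, 1, 1, 479.0), (2, 2, 1, 3, 150.0)])

# Every dimension is exactly one join away from the fact table.
rows = con.execute("""
    SELECT s.city, p.name, f.quantity
    FROM sales f
    JOIN store s   ON f.store_id = s.store_id
    JOIN product p ON f.product_id = p.product_id
    ORDER BY s.city, p.name
""").fetchall()
print(rows)   # [('Berlin', 'TV', 2), ('Munich', 'Desk', 3), ('Munich', 'TV', 1)]
```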

Question 14

Q

What is a snowflake schema?

Answer

A

"normalized" dimensions
each dimension is split into multiple tables to avoid redundancy
this requires additional joins for OLAP queries
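
This sketch shows a snowflaked product dimension using `sqlite3`; table names and rows are hypothetical. Category attributes are moved into their own table. As a result, aggregating sales by category takes two joins instead of one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Snowflaked product dimension: category attributes live in their own table
    -- instead of being repeated on every product row.
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT, department TEXT);
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES category(category_id));
    CREATE TABLE sales    (product_id INTEGER REFERENCES product(product_id), amount REAL);

    INSERT INTO category VALUES (1, 'Electronics', 'Technology'), (2, 'Furniture', 'Home');
    INSERT INTO product  VALUES (1, 'TV', 1), (2, 'Radio', 1), (3, 'Desk', 2);
    INSERT INTO sales    VALUES (1, 900), (2, 50), (3, 300), (1, 600);
""")

# Grouping by category needs the extra join sales -> product -> category;
# in a star schema, category would simply be a column of product.
rows = con.execute("""
    SELECT c.name, SUM(f.amount)
    FROM sales f
    JOIN product p  ON f.product_id = p.product_id
    JOIN category c ON p.category_id = c.category_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)   # [('Electronics', 1550.0), ('Furniture', 300.0)]
```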

Question 15

Q

What do OLAP queries group by, and what do they compute?

Answer

A

OLAP queries usually group by the dimensions and compute aggregate values of the measures (e.g., SUM, COUNT, AVG).
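
This is a minimal OLAP-style aggregation with `sqlite3`. For brevity, the dimension values are stored directly in a denormalized, hypothetical `sales` table instead of as foreign keys into dimension tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (store TEXT, quarter TEXT, product TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Berlin", "Q1", "TV", 900),
    ("Berlin", "Q1", "Radio", 50),
    ("Berlin", "Q2", "TV", 600),
    ("Munich", "Q1", "TV", 800),
])

# Group by two dimensions (store, quarter) and aggregate the measure (amount).
# The product dimension is not grouped on, so it is aggregated away.
rows = con.execute("""
    SELECT store, quarter, SUM(amount) AS revenue, COUNT(*) AS n_sales
    FROM sales
    GROUP BY store, quarter
    ORDER BY store, quarter
""").fetchall()
for row in rows:
    print(row)
# ('Berlin', 'Q1', 950, 2)
# ('Berlin', 'Q2', 600, 1)
# ('Munich', 'Q1', 800, 1)
```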

Question 16

Q

How are the data preparation steps (ETL) conducted in a data warehouse?

Answer

A

1 - A monitor discovers and reports changes in the data sources.
2 - Extractors select data from the data sources and transport it into the staging area.
3 - Transformers perform standardization and integration of the data.
4 - Loaders insert the data from the staging area into the main warehouse.
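
This is a toy end-to-end pipeline that follows the four steps, with one hypothetical function per component. Several details are assumptions chosen only to show where each step sits: change detection via a version counter, the field names, and the string-to-float price cleanup. Integration of several sources is not modelled.

```python
def monitor(source, last_seen):
    """Report rows changed since the last run (assumption: a version counter per row)."""
    return [row for row in source if row["version"] > last_seen]


def extract(changed_rows, staging):
    # Select the needed fields and copy them into the staging area.
    staging.extend({"id": r["id"], "name": r["name"], "price": r["price"]} for r in changed_rows)


def transform(staging):
    # Standardize: trimmed upper-case names, prices like "499,00" converted to float.
    for r in staging:
        r["name"] = r["name"].strip().upper()
        r["price"] = float(str(r["price"]).replace(",", "."))


def load(staging, warehouse):
    # Insert (or overwrite) staged rows in the warehouse, then empty the staging area.
    for r in staging:
        warehouse[r["id"]] = r
    staging.clear()


source = [
    {"id": 1, "name": " tv ", "price": "499,00", "version": 3},
    {"id": 2, "name": "radio", "price": 49.9, "version": 1},
]
staging, warehouse = [], {}
extract(monitor(source, last_seen=2), staging)
transform(staging)
load(staging, warehouse)
print(warehouse)   # {1: {'id': 1, 'name': 'TV', 'price': 499.0}}
```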

Question 17

Q

What are the approaches to monitor data changes?

Answer

A

log-based: the DBMS writes information about updates into its transaction log, which is then analyzed to extract the change data
trigger-based: database triggers are used to gather the change data
replication middleware: uses the approaches above
audit columns: timestamp-based (each row records when it was last modified)
snapshot differentials: compare the current state of the data source with an earlier snapshot (expensive)
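
Two of these approaches are easy to sketch in plain Python. The first is a snapshot differential, which needs the full old and new snapshots (hence the cost). The second is an audit-column filter. Some details are assumptions: snapshots are modelled as dicts keyed by primary key, and the hypothetical `modified_at` column holds integer timestamps.

```python
def snapshot_diff(old, new):
    """Compare two full snapshots (key -> row) and classify the changes."""
    inserted = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    updated = {k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]}
    return inserted, deleted, updated


yesterday = {1: ("TV", 499), 2: ("Radio", 49), 3: ("Desk", 150)}
today = {1: ("TV", 479), 3: ("Desk", 150), 4: ("Lamp", 30)}
print(snapshot_diff(yesterday, today))
# ({4: ('Lamp', 30)}, {2: ('Radio', 49)}, {1: ('TV', 479)})


def audit_column_changes(rows, last_run):
    # Audit-column approach: each row carries a last-modified timestamp
    # (hypothetical column "modified_at"); only rows changed since last_run qualify.
    return [r for r in rows if r["modified_at"] > last_run]


rows = [{"id": 1, "modified_at": 105}, {"id": 3, "modified_at": 90}]
print(audit_column_changes(rows, last_run=100))   # [{'id': 1, 'modified_at': 105}]
```

The audit-column filter cannot see deleted rows, because a deleted row leaves no timestamp behind; the snapshot differential does detect them.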

(17 cards)