Chapter 14 Flashcards
Why replication?
- increase availability
- improve performance
Eager (synchronous) replication
A transaction synchronizes with the copies of replicated elements before it commits
Advantages: guarantees one-copy serializable execution; avoids inconsistencies
Potential problems: update overhead (reduced update performance and increased transaction response time); deadlocks; lack of scalability; cannot be used if nodes are disconnected (e.g., mobile databases) or unavailable
Lazy (asynchronous) replication
Changes introduced at one site are propagated (as separate transactions) to other sites only after commit
Advantages: minimal update overhead (better response time than eager replication); also works if sites are disconnected or unavailable
Potential problems: Out-of-date data; conflicting updates on different replicas can cause inconsistencies between copies.
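The difference between the two propagation styles can be sketched in a few lines. This is a toy model, not a real protocol: `Replica`, `eager_write`, and `LazyMaster` are hypothetical names, each replica is just an in-memory dict, and locking, failures, and atomic commit are deliberately left out.

```python
from collections import deque


class Replica:
    """A single copy of the data, modeled as a plain dict (assumption)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def eager_write(replicas, key, value):
    """Eager: every copy is updated before the write counts as committed."""
    for replica in replicas:
        replica.data[key] = value
    return "committed"


class LazyMaster:
    """Lazy: commit locally first, propagate to the other copies later."""

    def __init__(self, local, others):
        self.local = local
        self.others = others
        self.pending = deque()  # committed locally, not yet propagated

    def write(self, key, value):
        self.local.data[key] = value
        self.pending.append((key, value))
        return "committed"

    def propagate(self):
        """Apply pending updates at the other sites, after the local commit."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.others:
                replica.data[key] = value


# Eager: both copies agree as soon as the write commits.
a, b = Replica("A"), Replica("B")
eager_write([a, b], "x", 1)
eager_in_sync = a.data == b.data

# Lazy: the second copy is out of date until propagation runs.
c, d = Replica("C"), Replica("D")
master = LazyMaster(c, [d])
master.write("x", 1)
stale_before = d.data.get("x")
master.propagate()
fresh_after = d.data.get("x")
```

Between `write` and `propagate`, a read at replica D returns out-of-date data, which is the first of the potential problems listed for lazy replication.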
Single-master primary-copy replication
One replica is the primary copy and the others are secondaries. Either eager or lazy replication can be used with this scheme
If a failure occurs in the primary copy:
option 1 - disallow updates until primary recovers
option 2 - a secondary takes over as primary
Multi-master replication
"Update anywhere" model.
Nodes "race" each other to propagate their updates to all the other nodes (potential for lost updates)
How to detect conflicts in multi-master replication?
Based on timestamps.
Each node compares the old object timestamp carried by an incoming replica update with the timestamp of its own copy of the object. If they are the same, the update is accepted. If not, the incoming update transaction is rejected and submitted for reconciliation
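The timestamp comparison described above could look roughly like this. It is a minimal sketch: `Update`, `Node`, and the integer timestamps are illustrative assumptions, not part of any specific replication product.

```python
from dataclasses import dataclass


@dataclass
class Update:
    key: str
    old_ts: int   # timestamp of the object version the sender modified
    new_ts: int   # timestamp the object gets if the update is accepted
    value: str


class Node:
    def __init__(self):
        self.store = {}         # key -> (timestamp, value)
        self.to_reconcile = []  # rejected updates awaiting reconciliation

    def receive(self, update):
        """Accept the update only if it was based on our current version."""
        local_ts, _ = self.store.get(update.key, (0, None))
        if update.old_ts == local_ts:
            self.store[update.key] = (update.new_ts, update.value)
            return True
        self.to_reconcile.append(update)
        return False


node = Node()
node.store["acct"] = (1, "balance=100")

# Two other nodes both updated version 1 of the same object.
first = node.receive(Update("acct", old_ts=1, new_ts=2, value="balance=90"))
second = node.receive(Update("acct", old_ts=1, new_ts=3, value="balance=150"))
```

The first update matches the local timestamp and is applied. The second was also based on version 1, which no longer exists locally, so it is queued for reconciliation instead of silently overwriting the first.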
Reconciliation
What are the two approaches?
Automatically, based on rules
Manually
What are the 2 alternatives for conflict detection?
1 - semantic synchronization: permit commutative transactions (e.g., processing checks at a bank has the same result independent of order)
provide acceptance criteria for detecting conflicts (an update must pass the acceptance tests)
2 - avoid conflicts by implementing update strategies in the application:
fragmentation by key
fragmentation by time
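Both alternatives above can be illustrated with a small sketch. Everything here is hypothetical: the bank amounts, the `passes_acceptance` rule (no overdraft), and the alphabetical split used for fragmentation by key are assumptions chosen only for illustration.

```python
from itertools import permutations


def apply_deltas(balance, deltas):
    """Apply signed amounts; addition commutes, so order is irrelevant."""
    for delta in deltas:
        balance += delta
    return balance


def passes_acceptance(balance):
    """Hypothetical acceptance criterion: the account is not overdrawn."""
    return balance >= 0


# Semantic synchronization: every processing order gives the same result.
checks = [-30, -20, 50]
results = {apply_deltas(100, order) for order in permutations(checks)}
accepted = all(passes_acceptance(r) for r in results)


def owning_node(customer_name):
    """Fragmentation by key: each key range is updated at one node only."""
    return "node_1" if customer_name[0].upper() <= "M" else "node_2"


def may_update(node, customer_name):
    """A node may update a record only if it owns that record's key."""
    return owning_node(customer_name) == node
```

With fragmentation by key, two nodes never update the same record, so conflicting updates cannot arise in the first place. Fragmentation by time would follow the same pattern with a time window in place of the key range.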
What is the goal of data warehousing?
materialized integration of data from numerous heterogeneous sources to enable powerful strategic data analysis
OLAP - online analytical processing
Fact table:
events or objects of interest (e.g., a sales event, with info about the product sold, the store, the sales date, and the price)
OLAP dimension table:
objects can often be thought of as arranged in a multi-dimensional space or cube (e.g., sales events have store, product, and time period dimensions)
How is a relational OLAP star schema structured?
Dimension tables (linked to the fact table) tend to be small
The fact table tends to be huge
The fact table holds the measures (dependent attributes)
How is a snowflake schema structured?
"Normalized" dimensions
It has multiple tables per dimension to avoid redundancy
It requires additional joins for OLAP queries
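The extra join that a snowflake schema costs can be seen in a small sketch using Python's built-in `sqlite3` module. The table and column names (`category`, `product`, `sales`) and the sample rows are invented for illustration. The product dimension is normalized into `product` plus `category`, so a query by category needs two joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized ("snowflaked") product dimension: category is its own table.
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES category(category_id)
);
-- Fact table with a single measure (price).
CREATE TABLE sales (
    product_id INTEGER REFERENCES product(product_id),
    price      REAL
);
INSERT INTO category VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO product VALUES (10, 'Tea', 1), (11, 'Coffee', 1), (12, 'Chips', 2);
INSERT INTO sales VALUES (10, 3.0), (11, 4.0), (11, 4.0), (12, 2.0);
""")

# Revenue per category: fact -> product -> category (two joins).
rows = conn.execute("""
    SELECT c.category_name, SUM(s.price)
    FROM sales AS s
    JOIN product  AS p ON p.product_id  = s.product_id
    JOIN category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
conn.close()
```

In a star schema the category name would be stored redundantly in the `product` dimension table, and the same query would need only the join from `sales` to `product`.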
What do OLAP queries group by, and what do they compute?
OLAP queries usually "group by" the dimensions and compute aggregate values of the measures
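A typical query of this kind, sketched against a tiny star schema with Python's built-in `sqlite3` module. The schema (`store`, `product`, `sales`) and all sample rows are assumptions made up for this example. The query groups by two dimensions and aggregates the `price` measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (small).
CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
-- Fact table (huge in practice): one row per sales event.
CREATE TABLE sales (
    store_id   INTEGER REFERENCES store(store_id),
    product_id INTEGER REFERENCES product(product_id),
    sale_date  TEXT,
    price      REAL
);
INSERT INTO store VALUES (1, 'Berlin'), (2, 'Munich');
INSERT INTO product VALUES (10, 'Tea'), (11, 'Coffee');
INSERT INTO sales VALUES
    (1, 10, '2024-01-05', 3.0),
    (1, 11, '2024-01-06', 4.0),
    (2, 11, '2024-01-06', 4.5),
    (2, 11, '2024-02-01', 4.5);
""")

# Group by the store and product dimensions; aggregate the price measure.
totals = conn.execute("""
    SELECT st.city, p.product_name, SUM(s.price) AS revenue, COUNT(*) AS n_sales
    FROM sales AS s
    JOIN store   AS st ON st.store_id  = s.store_id
    JOIN product AS p  ON p.product_id = s.product_id
    GROUP BY st.city, p.product_name
    ORDER BY st.city, p.product_name
""").fetchall()
conn.close()
```

Each result row corresponds to one cell of the store-by-product cube, holding the summed revenue and the number of sales events for that combination.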