Distributed database management systems Flashcards
Distributed architectures
Distributed DB
- different DBMS servers on different network nodes
- autonomous
- able to cooperate
- Guaranteeing ACID properties requires complex techniques
Client/server
- the simplest architecture: clients send queries to a single DBMS server
Data replication
- a replica is a copy of the data stored on a different network node
- the replication server autonomously manages the update of the copies
- simpler architecture than a distributed database
Parallel architecture
- Performance is the objective
- multiprocessor machines vs CPU clusters
Data warehouses
- decision support
Distributed systems relevant properties
Portability
- Capability of moving a program from one system to another
- for databases, this is guaranteed by the SQL standard
Interoperability
- capability of different DBMS servers to cooperate on a given task
- interaction protocols are needed
SQL execution types
Compile and go
- query sent to server
- query prepared (execution plan)
- query executed
- result shipped
Compile and store
- query sent to server
- query prepared (execution plan)
- plan is stored for later use
- query executed
- result shipped
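A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for the server (so "sent to server" is only simulated): compile and go re-prepares the statement on every call, while compile and store prepares a parameterized statement once and reuses its plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(1, "production"), (2, "sales")])

# Compile and go: the full SQL text is prepared on every call and the
# execution plan is discarded once the result has been shipped back.
for dept in ("production", "sales"):
    rows = conn.execute(
        "SELECT id FROM employee WHERE dept = '%s'" % dept).fetchall()

# Compile and store: a parameterized statement is prepared once and its
# plan is reused for each execution (sqlite3 caches prepared statements
# keyed on the SQL text, so reusing the same string reuses the plan).
stored = "SELECT id FROM employee WHERE dept = ?"
for dept in ("production", "sales"):
    rows = conn.execute(stored, (dept,)).fetchall()
```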
Advantages of distributed database systems
Functional:
- Appropriate localization of data and applications
Technological
- Increased data availability
- enhanced scalability
Data fragmentation types and properties (distributed database systems)
Given a relation R, a data fragment is a subset of R in terms of tuples, schema, or both.
Fragmentation criteria:
- horizontal
- subsets of tuples with the same schema as R
- obtained by a selection predicate (e.g. Employee where dept = 'production')
- fragments do not overlap
- vertical
- a subset of the schema of R
- selects a set of columns for each node
- the primary key is included in each fragment
- all tuples are included
- mixed
Properties:
- completeness: every piece of information is contained in at least one fragment
- correctness: the information in R can be rebuilt from its fragments (see the sketch below)
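A toy Python model of the two criteria and the two properties; the Employee relation and all attribute names are made up for the example.

```python
employee = [
    {"id": 1, "name": "Ada", "dept": "production", "salary": 50},
    {"id": 2, "name": "Bob", "dept": "sales", "salary": 40},
]

# Horizontal fragmentation: selection predicates partition the tuples.
prod = [t for t in employee if t["dept"] == "production"]
rest = [t for t in employee if t["dept"] != "production"]
assert all(t in prod + rest for t in employee)   # completeness
rebuilt_h = prod + rest                          # correctness: union rebuilds R

# Vertical fragmentation: each fragment projects a subset of columns,
# always keeping the primary key so the tuples can be re-joined.
frag1 = [{"id": t["id"], "name": t["name"]} for t in employee]
frag2 = [{"id": t["id"], "dept": t["dept"], "salary": t["salary"]}
         for t in employee]
rebuilt_v = [{**a, **b} for a in frag1 for b in frag2 if a["id"] == b["id"]]
assert rebuilt_v == employee                     # correctness: join on the key rebuilds R
```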
Transparency levels (distributed dbms)
describe the knowledge of data distribution
an application should operate differently depending on the transparency level supported by the DBMS
Fragmentation transparency
- the application addresses the logical table; fragmentation and allocation of data are not visible
allocation transparency
- the application knows the fragments but not their allocation to nodes
- it is not aware of replicas
- it must enumerate all fragments
language transparency
- the programmer must specify both fragments and their allocation
- queries written at higher transparency levels are converted to this format
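For concreteness, here is the same lookup written at each level, held in Python strings; the fragment names (emp1, emp2), node names (nodeA, nodeB), and the table@node syntax are invented for the example.

```python
# Fragmentation transparency: the application addresses the logical table.
q_fragmentation = "SELECT name FROM employee WHERE id = :id"

# Allocation transparency: the application names the fragments but not
# the nodes, so it must try each fragment in turn.
q_allocation = [
    "SELECT name FROM emp1 WHERE id = :id",
    "SELECT name FROM emp2 WHERE id = :id",
]

# Language (no) transparency: fragment and allocation are both spelled out.
q_language = [
    "SELECT name FROM emp1@nodeA WHERE id = :id",
    "SELECT name FROM emp2@nodeB WHERE id = :id",
]
```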
Transaction types in distributed dbms
- remote request (read-only on single server)
- remote transaction (single server)
- distributed transaction (each SQL statement is addressed to a single server)
- global atomicity is needed (2PC)
- Distributed request
- each sql command may refer to data on different nodes
- distributed optimization is needed (performed by the DBMS receiving the request)
- fragmentation transparency is in this class only
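Illustrative SQL for each class (held in Python strings), assuming account data split across two hypothetical servers, acc1@node1 and acc2@node2; the @node syntax is for illustration only.

```python
remote_request = "SELECT balance FROM acc1@node1 WHERE num = 45"  # read-only, one server

remote_transaction = [  # several statements, still one server
    "UPDATE acc1@node1 SET balance = balance - 10 WHERE num = 45",
]

distributed_transaction = [  # several servers, each statement on a single server
    "UPDATE acc1@node1 SET balance = balance - 10 WHERE num = 45",
    "UPDATE acc2@node2 SET balance = balance + 10 WHERE num = 77",
]

distributed_request = (  # a single statement spanning several servers
    "SELECT a.num FROM acc1@node1 a JOIN acc2@node2 b ON a.cust = b.cust"
)
```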
ACID properties in relation to distributed dbms
Atomicity
- requires 2PC
- all nodes participating in a distributed transaction must implement the same decision (commit or rollback)
- possible failures: node crash, network failure, network partitioning
Consistency
- constraints are currently enforced only locally
Isolation
- requires strict 2PL and 2PC
Durability
- Requires extending local recovery procedures to manage atomicity in the presence of failures
2PC
Goal: coordination of the conclusion of a distributed transaction
Parallel with a wedding:
- Priest coordinates the agreement
- Couple participates in the agreement
One coordinator (the transaction manager, TM), several participating DBMS servers (resource managers, RMs); any participant can act as coordinator (even the client)
TM new log records:
- Prepare (contains the identity of all participating RMs)
- Global commit/abort
- Complete
RM new log record:
- Ready
- RM is willing to perform commit of the transaction
- decision cannot be changed afterwards
- the RM loses its autonomy (it must wait for the global decision)
Protocol
- Phase 1
- TM writes the prepare record and sends the prepare message to all RMs
- TM sets timeout
- if RMs receive the message
- state = reliable: write ready record, send the ready message
- state = not reliable: send not ready, terminate protocol, rollback
- state = crashed: no answer
- TM collects the incoming messages; if it receives a not ready or the timeout expires, it decides global abort
- Phase 2
- TM sends the global decision to the RMs and sets a timeout
- RMs wait for the global decision; when they receive it, they write commit/abort in the log and send an ACK back
- TM waits for all ACKs; if the timeout expires, it sets a new one and resends the decision message
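A single-process Python sketch of this message flow; the logs are in-memory lists (a real system forces them to stable storage), timeouts are omitted, and class/message names are illustrative.

```python
class RM:
    def __init__(self, name, reliable=True):
        self.name, self.reliable, self.log = name, reliable, []

    def on_prepare(self):
        if self.reliable:
            self.log.append("ready")   # from here on the decision is out of the RM's hands
            return "ready"
        return "not ready"             # local rollback; the RM leaves the protocol

    def on_decision(self, decision):
        self.log.append(decision)      # commit or abort written to the log
        return "ack"

class TM:
    def __init__(self, rms):
        self.rms, self.log = rms, []

    def run(self):
        # Phase 1: write prepare (with the RM identities), collect the votes.
        self.log.append("prepare " + str([rm.name for rm in self.rms]))
        votes = {rm: rm.on_prepare() for rm in self.rms}
        decision = "commit" if all(v == "ready" for v in votes.values()) else "abort"
        self.log.append("global " + decision)
        # Phase 2: send the decision to the RMs that voted ready and
        # wait for every ACK before writing the complete record.
        acks = [rm.on_decision(decision) for rm, v in votes.items() if v == "ready"]
        assert all(a == "ack" for a in acks)
        self.log.append("complete")
        return "global " + decision

print(TM([RM("rm1"), RM("rm2", reliable=False)]).run())  # -> global abort
```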
Failure of participant and coordinator in distributed systems
RM
- Warm restart procedure is modified with a new case
- if the last record for transaction T is ready, the RM does not know the global decision for T
- Recovery
- Ready list: collects the IDs of all transactions in the ready state
- for each transaction in the list, the global decision is requested from the corresponding TM
TM
- Messages can be lost
- Prepare (out)
- Ready (in)
- Global decision (out)
- Recovery
- if the last record in the TM log is prepare: the global abort decision is written in the log and sent to all RMs (the alternative of repeating phase 1 is not implemented)
- if the last record in the TM log is a global decision: phase 2 is repeated
Network problems:
- in phase 1 -> global abort
- in phase 2 -> repetition of phase 2
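The recovery rules above can be sketched as follows; the log layout and the helper names (ask_tm, resend_decision) are hypothetical.

```python
def last_record(log, t):
    # Last record written for transaction t, or None.
    return next((rec for tid, rec in reversed(log) if tid == t), None)

def rm_warm_restart(rm_log, transactions, ask_tm):
    # New warm-restart case: transactions whose last record is "ready"
    # are in doubt, so the global decision is requested from the TM.
    ready_list = [t for t in transactions if last_record(rm_log, t) == "ready"]
    return {t: ask_tm(t) for t in ready_list}

def tm_recovery(tm_log, t, resend_decision):
    rec = last_record(tm_log, t)
    if rec == "prepare":
        # Global abort is decided (repeating phase 1 is not implemented).
        tm_log.append((t, "global abort"))
        resend_decision(t, "abort")
    elif rec in ("global commit", "global abort"):
        resend_decision(t, rec.split()[1])  # repeat phase 2
```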
X/Open DTP
- protocol for coordination of distributed transactions
- guarantees interoperability between heterogeneous DBMSs
- one client, one TM, several RMs
- defines interfaces between client and TM (TM interface) and between TM and RMs (XA interface)
- RMs are passive and only answer to TM
- two optimizations of 2PC (see the sketch below):
- presumed abort: when no information is available in the log, the TM answers abort to a remote recovery request by an RM
- reduces the number of synchronous log writes
- read only: used by RMs that did not modify the database during the transaction
- the RM answers read only to the prepare request
- it does not write the log and terminates the protocol locally
- TM will ignore the RM in phase 2
- Heuristic decision: evolution in the presence of TM failures
- during the uncertainty window
- the blocked transaction evolves locally
- on TM recovery, the decisions are compared; if they differ, atomicity is lost and the inconsistency is notified
- resolving the inconsistency is up to the user applications
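A Python sketch of the two optimizations; the function and message names are illustrative, not part of the actual XA interface.

```python
def rm_vote(db_was_modified):
    # Read-only optimization: an RM that did not modify the database
    # answers "read only", writes no log record, and terminates the
    # protocol locally; the TM will skip it in phase 2.
    return "ready" if db_was_modified else "read only"

def tm_answer_remote_recovery(last_tm_record_for_t):
    # Presumed abort: when the TM log holds no information about the
    # transaction, the answer to an RM's recovery request is abort, so
    # abort-related records never need synchronous log writes.
    return last_tm_record_for_t or "abort"
```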
Parallel DBMS
Through: multiprocessor systems or computer clusters
Queries can be efficiently parallelized: large table scans, GROUP BY aggregations
Inter-query parallelism: different queries are scheduled on different processors, often used in OLTP systems
Intra-query parallelism: subparts of the same query are executed on different processors (OLAP, heavy queries)
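A toy illustration of intra-query parallelism with Python's process pool: one heavy GROUP BY is split into partial aggregations over partitions and merged at the end; inter-query parallelism would instead submit different queries as separate tasks. All names are illustrative.

```python
from concurrent.futures import ProcessPoolExecutor

table = [("production", 50), ("sales", 40)] * 250_000  # in-memory "table"

def partial_group_by(rows):
    # Per-partition partial aggregate for GROUP BY dept / SUM(salary).
    acc = {}
    for dept, salary in rows:
        acc[dept] = acc.get(dept, 0) + salary
    return acc

if __name__ == "__main__":
    n = 4
    partitions = [table[i::n] for i in range(n)]
    with ProcessPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(partial_group_by, partitions))
    result = {}
    for p in partials:  # merge the partial aggregates
        for dept, s in p.items():
            result[dept] = result.get(dept, 0) + s
    print(result)       # {'production': 12500000, 'sales': 10000000}
```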
DBMS benchmarks
TPC C (TPC E): emulates OLTP
TPC H: OLAP
TPCx-HS: Big data management (Hadoop clusters)
2PC: Uncertainty Window
It spans from when the RM sends the ready message until it receives the global decision from the TM.
Local resources stay locked during this time; that is why the uncertainty window should be as small as possible.