Distributed database management systems Flashcards

1
Q

Distributed architectures

A

Distributed DB

  • different DBMS servers on different network nodes
    • autonomous
    • able to cooperate
  • Guaranteeing ACID properties requires complex techniques

Client/server

  • simplest

Data replication

  • a replica is a copy of the data stored on a different network node
  • the replication server autonomously manages copy update
  • simpler architecture than a distributed database

Parallel architecture

  • Performance is the objective
  • multiprocessor machines vs cpu clusters

Data warehouses

  • decision support
2
Q

Distributed systems relevant properties

A

Portability

  • Capability of moving a program from system to system
  • for dbs, this is guaranteed by the sql standard

Interoperability

  • capability of different dbms servers to cooperate on a given task
  • interaction protocols are needed
3
Q

SQL execution types

A

Compile and go

  • query sent to server
  • query prepared (execution plan)
  • query executed
  • result shipped

Compile and store

  • query sent to server
  • query prepared (execution plan)
  • plan is stored for later use
  • query executed
  • result shipped
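
The difference between the two execution types can be sketched with a toy server that counts how often a query is prepared. `ToyServer`, its methods and the fake plan are hypothetical names invented for this illustration; no real DBMS API is implied.

```python
class ToyServer:
    """Hypothetical server that counts how often a plan is compiled."""

    def __init__(self):
        self.plan_cache = {}    # query text -> stored execution plan
        self.compilations = 0

    def _prepare(self, query):
        # Stand-in for parsing and optimisation (builds the execution plan).
        self.compilations += 1
        return ("PLAN", query)

    def _execute(self, plan, params):
        # Stand-in for execution; returns a fake result to ship to the client.
        return {"plan": plan, "params": params}

    def compile_and_go(self, query, params):
        plan = self._prepare(query)           # prepared on every call
        return self._execute(plan, params)    # executed, result shipped

    def compile_and_store(self, query, params):
        if query not in self.plan_cache:      # prepared once, plan stored
            self.plan_cache[query] = self._prepare(query)
        return self._execute(self.plan_cache[query], params)
```

Running the same query three times costs three preparations with `compile_and_go` and one with `compile_and_store`.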
4
Q

Advantages of distributed database systems

A

Functional:

  • Appropriate localization of data and applications

Technological

  • Increased data availability
  • enhanced scalability

5
Q

Data fragmentation types and properties (distributed database systems)

A

Given a relation R, a data fragment is a subset of R in terms of tuples, schema, or both.

Fragmentation criteria:

  • horizontal
    • subsets of tuples with the same schema as R
    • obtained by a selection predicate (e.g. Employee with dept = "production")
    • fragments do not overlap
  • vertical
    • subset of the schema of R
    • selects a set of columns for each node
    • primary keys are included
    • all tuples are included
  • mixed

Properties:

  • completeness: each piece of information in R is contained in at least one fragment
  • correctness: the information in R can be rebuilt from its fragments
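
A minimal sketch of horizontal and vertical fragmentation, assuming a relation held as a list of dicts. The `employees` data and every function name are made up for the illustration. Horizontal fragments are rebuilt by union, vertical fragments by joining on the primary key.

```python
employees = [
    {"id": 1, "name": "Ann", "dept": "production", "salary": 30},
    {"id": 2, "name": "Bob", "dept": "sales", "salary": 25},
    {"id": 3, "name": "Cid", "dept": "production", "salary": 28},
]

def horizontal(relation, predicate):
    # Subset of tuples, same schema as the relation.
    return [row for row in relation if predicate(row)]

def vertical(relation, columns, key="id"):
    # Subset of columns; the primary key is always included.
    cols = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in cols} for row in relation]

def rebuild_horizontal(*fragments, key="id"):
    # Union of the fragments, ordered by key for comparison.
    return sorted((row for f in fragments for row in f), key=lambda r: r[key])

def rebuild_vertical(left, right, key="id"):
    # Join of the fragments on the primary key.
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

prod = horizontal(employees, lambda r: r["dept"] == "production")
other = horizontal(employees, lambda r: r["dept"] != "production")
personal = vertical(employees, ["name", "dept"])
pay = vertical(employees, ["salary"])
```

Here `rebuild_horizontal(prod, other)` and `rebuild_vertical(personal, pay)` both give back `employees`, which is what correctness asks for.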
6
Q

Transparency levels (distributed dbms)

A

Transparency levels describe how much an application must know about data distribution.

An application operates differently depending on the transparency level supported by the DBMS.

Fragmentation transparency

  • the application knows the relations but not their fragments; data distribution is not visible

allocation transparency

  • app. knows fragments but not the allocation
    • not aware of replicas
    • must enumerate all fragments

language transparency

  • the programmer must select both the fragment and its allocation
  • queries written at higher transparency levels are translated into this format
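
The three levels can be sketched as what the application has to name in order to read one relation. The catalog, the relation and node names, and the `fragment@node` notation are all assumptions made for this example.

```python
# Hypothetical catalog: relation -> fragments -> nodes holding a replica.
CATALOG = {
    "SUPPLIER": {
        "SUPPLIER1": ["node_a"],
        "SUPPLIER2": ["node_b", "node_c"],
    }
}

def fragmentation_level(relation):
    # The application names only the relation.
    return [relation]

def allocation_level(relation):
    # The application enumerates all fragments, but not where they are stored.
    return list(CATALOG[relation])

def language_level(relation):
    # The application names each fragment and a node (first replica chosen here).
    return [f"{frag}@{nodes[0]}" for frag, nodes in CATALOG[relation].items()]
```

For `"SUPPLIER"` the three functions return one name, two fragment names, and two fragment-at-node names respectively.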
7
Q

Transaction types in distributed dbms

A
  • remote request (read-only on single server)
  • remote transaction (single server)
  • distributed transaction (each SQL statement is addressed to one single server)
    • global atomicity is needed (2PC)
  • Distributed request
    • each sql command may refer to data on different nodes
    • distributed optimization is needed (performed by dbms receiving request)
    • fragmentation transparency is available only in this class
8
Q

ACID properties in relation to distributed dbms

A

Atomicity

  • requires 2PC
  • all nodes participating in a distributed transaction must implement the same decision (commit or rollback)
  • possible failures: node failure, network failure, network partitioning

Consistency

  • constraints are currently enforced only locally

Isolation

  • requires strict 2PL and 2PC

Durability

  • requires extending the local procedures to manage atomicity in the presence of failures
9
Q

2PC

A

Goal: coordination of the conclusion of a distributed transaction

Analogy with a wedding:

  • The priest coordinates the agreement
  • The couple participates in the agreement

One coordinator (transaction manager, TM) and several DBMS servers that participate (resource managers, RMs). Any participant can act as coordinator (even the client).

TM new log records:

  • Prepare (contains the identity of all participating RMs)
  • Global commit/abort
  • Complete

RM new log record:

  • Ready
    • the RM is willing to commit the transaction
    • the decision cannot be changed afterwards
    • the RM loses control over the outcome of the transaction

Protocol

  • Phase 1
    1. TM writes the prepare record and sends the prepare message to all RMs
    2. TM sets a timeout
    3. if an RM receives the message
      • state = reliable: writes the ready record, sends the ready message
      • state = not reliable: sends not ready, terminates the protocol, rolls back
      • state = crashed: no answer
    4. TM collects incoming messages: if all RMs answer ready, it writes global commit in the log; if it receives a not ready or the timeout expires, it writes global abort
  • Phase 2
    1. TM sends the global decision to the RMs and sets a timeout
    2. RMs wait for the global decision; when they receive it, they write commit/abort in the log and send an ACK back
    3. TM waits for all ACKs; if the timeout expires, it sets another one and sends the global decision again. When all ACKs have arrived, it writes the complete record
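
The two phases can be sketched as a single-process simulation. This is a simplification under stated assumptions: messages are plain method calls, a missing answer (`None`) stands for an expired timeout, and retransmission in phase 2 is left out. `ResourceManager` and `two_phase_commit` are hypothetical names.

```python
class ResourceManager:
    def __init__(self, name, state="reliable"):
        self.name = name
        self.state = state   # "reliable", "not reliable" or "crashed"
        self.log = []

    def on_prepare(self):
        if self.state == "crashed":
            return None                  # no answer: the TM timeout expires
        if self.state == "not reliable":
            self.log.append("abort")     # unilateral rollback
            return "not ready"
        self.log.append("ready")         # from now on the RM must wait for the TM
        return "ready"

    def on_decision(self, decision):
        self.log.append(decision)        # commit or abort written in the log
        return "ack"

def two_phase_commit(rms):
    tm_log = ["prepare " + ",".join(rm.name for rm in rms)]
    # Phase 1: collect the answers to the prepare message.
    answers = {rm.name: rm.on_prepare() for rm in rms}
    all_ready = all(a == "ready" for a in answers.values())
    decision = "commit" if all_ready else "abort"
    tm_log.append("global " + decision)
    # Phase 2: only RMs in the ready state are waiting for the decision.
    waiting = [rm for rm in rms if answers[rm.name] == "ready"]
    acks = [rm.on_decision(decision) for rm in waiting]
    if all(a == "ack" for a in acks):
        tm_log.append("complete")
    return decision, tm_log
```

With three reliable RMs the decision is `commit`; replacing one of them with a crashed or not reliable RM turns it into `abort` for everyone.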
10
Q

Failure of participant and coordinator in distributed systems

A

RM

  • The warm restart procedure is modified with a new case
    • if the last record for transaction T is “ready”, then the RM does not know the global decision for T
  • Recovery
    • Ready list: collects the IDs of all transactions in the ready state
    • for every transaction in the list, the global decision is requested from the corresponding TM

TM

  • Messages can be lost
    • Prepare (out)
    • Ready (in)
    • Global decision (out)
  • Recovery
    • if the last record in the TM log is prepare: the global abort decision is written in the log and sent to all RMs (the alternative, repeating phase I, is not implemented)
    • if the last record in the TM log is the global decision: repeat phase II

Network problems:

  • in phase I -> abort
  • in phase II -> repetition of phase II
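
The recovery rules above reduce to a lookup on the last log record of a transaction. The sketch below assumes the usual warm restart handling for the other cases (redo committed transactions, undo the rest); the function names and returned strings are invented for the illustration.

```python
def rm_recovery_action(last_record):
    """Action taken by an RM at warm restart for one transaction."""
    if last_record == "ready":
        # New case: the transaction goes in the ready list.
        return "ask TM for global decision"
    if last_record == "commit":
        return "redo"
    return "undo"   # aborted, or still active at the time of the failure

def tm_recovery_action(last_record):
    """Action taken by the TM at recovery for one transaction."""
    if last_record == "prepare":
        return "write global abort and send it to all RMs"
    if last_record in ("global commit", "global abort"):
        return "repeat phase II"
    return "nothing to do"   # the complete record was already written
```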
11
Q

X-OPEN-DTP

A
  • protocol for coordination of distributed transactions
  • guarantees interoperability between heterogeneous DBMSs
  • one client, one TM, several RMs
  • defines interfaces between client and TM (TM interface) and between TM and RMs (XA interface)
  • RMs are passive and only answer to the TM
  • 2 optimizations of 2PC:
    • presumed abort: when no information is available in the log, the TM answers abort to a remote recovery request from an RM
      • reduces the number of synchronous log writes
    • read only: used by an RM that did not modify the database during the transaction
      • the RM answers "read only" to the prepare request
      • it does not write the log and locally terminates the protocol
      • the TM ignores the RM in phase 2
  • Heuristic decision: allows transactions to evolve in the presence of TM failures
    • applies during the uncertainty window
    • the blocked transaction evolves locally
    • on TM recovery the decisions are compared; if they differ, atomicity is lost and the inconsistency is notified
    • resolving the inconsistency is up to user applications
12
Q

Parallel DBMS

A

Implemented through multiprocessor systems or computer clusters.

Some queries can be parallelized efficiently: large table scans, group bys.

Inter-query parallelism: different queries are scheduled on different processors, often used in OLTP systems.

Intra-query parallelism: subparts of the same query are executed on different processors (OLAP, heavy queries).
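
A minimal sketch of intra-query parallelism for a group by with a sum: the table is split into partitions, each worker computes partial totals, and the partial results are merged. Threads stand in for processors here (in CPython they do not give real CPU parallelism); the `rows` data and the function names are illustrative assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]

def partial_group_sum(partition):
    # Each worker scans only its own partition of the table.
    totals = Counter()
    for key, value in partition:
        totals[key] += value
    return totals

def parallel_group_sum(table, workers=2):
    # Round-robin partitioning of the table among the workers.
    partitions = [table[i::workers] for i in range(workers)]
    merged = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(partial_group_sum, partitions):
            merged.update(partial)   # adds the partial sums per group
    return dict(merged)
```

`parallel_group_sum(rows)` gives the same totals as a sequential scan. Inter-query parallelism would instead submit whole, independent queries to the pool.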

13
Q

DBMS benchmarks

A

TPC-C (TPC-E): emulates OLTP

TPC-H: OLAP

TPCx-HS: Big data management (Hadoop clusters)

14
Q

2PC: Uncertainty Window

A

It is the interval that starts when the RM sends the ready message and ends when it receives the global decision from the TM.

Local resources stay locked during this time. That is why the uncertainty window should be as small as possible.
