Distributed DBMS Flashcards

1
Q

Advantages of DDBMS

A
  • Data is located near the sites of greatest demand
  • Faster data access
  • Process data at different sites
  • New sites can be added without affecting other sites
  • Cheaper to add nodes to a system than to upgrade a mainframe
  • Less danger of a single point of failure (SPOF)
2
Q

Disadvantages of DDBMS

A
  • Complexity of management and control
  • Technological difficulty - replication, query optimization, transaction management
  • Increased storage requirements (for replication)
  • Higher cost
3
Q

Components of DDBMS

A
  • Transaction processor (TP)
  • Data processor (DP)
  • Communications network
4
Q

Distributed processing

A

A database's logical processing is shared among two or more physically independent sites connected via a network

5
Q

Distributed database

A

A logically related database stored across two or more physically independent sites connected via a computer network

6
Q

Database fragment

A

One part of a database in a distributed database system; the full database is composed of many such fragments, which can be stored at different sites

7
Q

Database level fragmentation

A

Whole tables are stored at different sites, e.g. Table1 in Location1, Table2 in Location2

8
Q

Table level fragmentation

A

Same tables with different data in different locations
- e.g. Payroll & Ops tables in Halifax office with Halifax data, Payroll & Ops tables in Bedford office with Bedford data

9
Q

Single site processing, single site data

A
  • TP and DP in one computer
  • End user has dumb terminal
10
Q

Multi-site processing, single site data

A
  • Multiple TPs run on different computers, sharing a single data repository (DP)
  • Accessed through LAN
  • Client/server architecture
11
Q

Multi-site processing, multi-site data

A
  • Fully distributed database management system
  • Supports multiple DPs and TPs at multiple sites
12
Q

Homogeneous DDBMS

A
  • Integrates multiple instances of the same DBMS over a network
  • e.g. MySQL v5 at 3 locations
13
Q

Heterogeneous DDBMS

A
  • Integrates different types of DBMSs over a network
  • e.g. MySQL in Asia, Oracle in EU, Microsoft SQL Server in US
14
Q

Fully heterogeneous DDBMS

A
  • Supports different DBMSs, each of which may support a different data model and run under a different computer system
  • DB level fragmentation
  • e.g. tables T1 and T2 in Oracle at location L1, tables T3 and T4 in Postgres at location L2
15
Q

Minimum desirable DDBMS transparency

A
  1. Distribution Transparency
  2. Transaction Transparency
  3. Failure Transparency
  4. Performance Transparency
  5. Heterogeneity Transparency
16
Q

What is Distribution Transparency?

A

Distributed DB treated as a single logical database

17
Q

Levels of Distribution Transparency

A

fragmentation (highest), location, and local mapping (lowest)

18
Q

Fragmentation Transparency

A

Query has no fragment name, no location

19
Q

Location Transparency

A

Query has fragment name, no location

20
Q

Local Mapping Transparency

A

Query has fragment name and location. Faster data retrieval.
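
As a rough illustration of the three levels of distribution transparency, the sketch below shows what a query would have to name at each level. The EMPLOYEE table, the fragment names E1 and E2, and the node names are hypothetical, and the `NODE` keyword is illustrative pseudo-SQL rather than the syntax of any particular DBMS.

```python
# Hypothetical catalog: logical table -> fragments -> node holding each fragment.
CATALOG = {"EMPLOYEE": {"E1": "HALIFAX", "E2": "BEDFORD"}}


def build_query(table, level, catalog=CATALOG):
    """Return the pseudo-SQL a user would have to write at a given transparency level."""
    fragments = catalog[table]
    if level == "fragmentation":
        # Highest level: neither fragment names nor locations appear.
        parts = [f"SELECT * FROM {table}"]
    elif level == "location":
        # Fragment names appear, locations do not.
        parts = [f"SELECT * FROM {frag}" for frag in fragments]
    elif level == "local_mapping":
        # Lowest level: both fragment names and locations appear.
        parts = [f"SELECT * FROM {frag} NODE {node}" for frag, node in fragments.items()]
    else:
        raise ValueError(f"unknown level: {level}")
    return " UNION ".join(parts)
```

The lower the level, the more of the physical layout the user has to spell out in the query.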

21
Q

When to use local mapping transparency?

A

When the user needs to know exactly which fragment at which site holds the data, e.g.
- for security researchers to track data loss
- for DBAs to track duplicate records

22
Q

What contains the entire description of Distributed DB?

A

The distributed data dictionary (DDD), also called the distributed data catalog (DDC)

23
Q

What is a distributed global schema?

A

The common database schema used by TPs to translate user requests into subqueries that are processed by different DPs

24
Q

Pros & cons of storing a large number of attributes in the DDD (e.g. node name, IP address)

A

Pros: more efficient data retrieval
Cons: the DDD needs to be updated often

25
Q

Are rows in each fragment unique?

A

Yes

26
Q

What is Transaction Transparency?

A
  • Ensures that database transactions maintain data integrity and consistency
  • Ensures that a transaction is completed only when all database sites involved complete their part
27
Q

Remote request

A

Single SQL statement accesses data processed by a single remote database processor

28
Q

Remote transaction

A

A transaction composed of several requests that accesses data at a single remote site

29
Q

Distributed transaction

A

A transaction that requests data from several different remote sites on the network; each individual request still references only one site

30
Q

Distributed request

A

Single SQL statement references data at several DP sites
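
The four cases above (remote request, remote transaction, distributed transaction, distributed request) differ only in how many SQL statements are involved and how many DP sites each one touches. A minimal sketch of that classification, assuming all listed sites are remote and using a hypothetical `classify` helper:

```python
def classify(statement_sites):
    """Classify a unit of work from the DP sites each SQL statement touches.

    statement_sites: list with one non-empty set of site names per SQL statement.
    """
    if not statement_sites or any(not sites for sites in statement_sites):
        raise ValueError("need at least one statement, each touching at least one site")
    if any(len(sites) > 1 for sites in statement_sites):
        # A single statement references data at several DP sites.
        return "distributed request"
    all_sites = set().union(*statement_sites)
    if len(all_sites) == 1:
        # Everything happens at one remote site.
        return "remote request" if len(statement_sites) == 1 else "remote transaction"
    # Several statements, each at one site, but the sites differ.
    return "distributed transaction"
```

This is a simplification: a transaction that contains one multi-site statement is labeled by that statement here.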

31
Q

Solution to concurrency control in distributed DBMS?

A

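Two-phase commit (2PC) or three-phase commit (3PC), which coordinate the commit across all participating sites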

32
Q

What is 2PC?

A

One node would be acting as a coordinator.

  • prepare phase: asking other nodes whether they can commit the proposed transaction.
  • commit phase: commanding other nodes to commit the proposed transaction.
  • if any node votes to abort during the prepare phase, the coordinator issues a global abort; the transaction can then be retried
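
The two phases can be sketched in a few lines. This is a simplified, in-memory illustration only: the `Participant` class and `two_phase_commit` function are hypothetical names, and real implementations also need logging, timeouts, and recovery, which are omitted here.

```python
class Participant:
    """Hypothetical node that votes in the prepare phase and then obeys the coordinator."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Vote yes only if the local part of the transaction can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run one round of 2PC as the coordinator and return the global decision."""
    # Phase 1 (prepare): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): every node voted yes, so commit everywhere.
        for p in participants:
            p.commit()
        return "commit"
    # At least one "no" vote: issue a global abort.
    for p in participants:
        p.abort()
    return "abort"
```

A single "no" vote is enough to abort the transaction at every node, including the nodes that voted yes.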
33
Q

Why is 2PC a blocking protocol?

A

Nodes that have voted to commit are stuck waiting for the coordinator's global decision; if the coordinator fails, they remain blocked until it recovers

34
Q

What is 3PC?

A
  • prepare
  • pre-commit (guarantees that nodes can commit)
  • commit
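
As a very rough sketch of how the extra phase changes the message sequence, the hypothetical function below lists the coordinator's messages for one 3PC round. It assumes every participant acknowledges the pre-commit and leaves out timeouts and failure handling entirely.

```python
def three_phase_commit(votes):
    """Return the coordinator's messages for one 3PC round.

    votes: one boolean prepare-phase vote per participant (hypothetical input).
    """
    messages = ["PREPARE"]
    if not all(votes):
        # Any "no" vote ends the round with a global abort.
        messages.append("ABORT")
        return messages
    # All nodes voted yes: tell them the commit is coming before actually committing.
    messages.append("PRE-COMMIT")
    # Acknowledgements of the pre-commit are assumed here, so the commit follows.
    messages.append("COMMIT")
    return messages
```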
35
Q

What is Performance transparency?

A

Allows a DDBMS to perform as if it were a centralized database

36
Q

What is Failure transparency?

A

Ensures the system continues to operate in the event of a node or network failure

37
Q

Advantages of query optimization

A

Lower cost
- Access time (I/O) cost involved in accessing data from multiple remote sites
- Communication costs associated with data transmission
- CPU time cost associated with the processing overhead
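
One way to picture the three cost components is as terms of a total that the optimizer tries to minimize across candidate plans. The sketch below assumes all three costs are expressed in the same hypothetical unit so they can simply be added; the `Plan` class and `cheapest` function are illustrative names, not part of any DBMS.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A candidate execution plan with its three estimated cost components."""
    name: str
    io_cost: float    # access time (I/O) at the sites holding the data
    comm_cost: float  # data transmission between sites
    cpu_cost: float   # processing overhead

    def total(self):
        return self.io_cost + self.comm_cost + self.cpu_cost


def cheapest(plans):
    """Pick the plan with the lowest total estimated cost."""
    return min(plans, key=lambda plan: plan.total())
```

In a distributed setting the communication term often dominates, which is why plans that ship fewer rows between sites tend to win.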

38
Q

Replica transparency

A

hide multiple copies of data from the user

39
Q

Network latency

A

Delay imposed by the amount of time required for a data packet to make a round trip

40
Q

Network partitioning

A

Delay imposed when nodes become suddenly unavailable due to a network failure

41
Q

CAP theorem

A

A distributed system can guarantee at most two of three properties at the same time: Consistency, Availability, Partition-Tolerance

42
Q

Data replication

A

Storage of data copies at multiple sites served by a computer network

43
Q

Strategies of data fragmentation

A

Horizontal, vertical, mixed

44
Q

How does horizontal fragmentation work?

A

divide by rows on a partition key (e.g. location)
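
A minimal sketch of row-wise splitting, assuming rows are plain dictionaries and using a hypothetical `horizontal_fragments` helper:

```python
from collections import defaultdict


def horizontal_fragments(rows, partition_key):
    """Group whole rows into fragments by the value of the partition key."""
    fragments = defaultdict(list)
    for row in rows:
        # Every row keeps all of its columns; only its destination fragment varies.
        fragments[row[partition_key]].append(row)
    return dict(fragments)
```

Each row lands in exactly one fragment, so the union of the fragments gives back the original table.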

45
Q

How does vertical fragmentation work?

A

divide by columns
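- e.g. Suppose that the company is divided into two departments: the service department and the collections department. Each department's fragment holds only the columns that department uses, plus the key column so that rows can be matched back together.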
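
A minimal sketch of column-wise splitting, assuming rows are plain dictionaries. The `vertical_fragment` helper and the column names in the usage note are hypothetical.

```python
def vertical_fragment(rows, key, columns):
    """Keep only the key column plus the chosen columns from every row."""
    # The key is repeated in every fragment so the fragments can be joined back together.
    return [{col: row[col] for col in (key, *columns)} for row in rows]
```

For example, a service fragment might be built with `vertical_fragment(rows, "cus_id", ["name", "address"])` and a collections fragment with `vertical_fragment(rows, "cus_id", ["balance"])`.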

46
Q

How does mixed fragmentation work?

A

A combination of both horizontal & vertical: the table is divided by rows, and each subset of rows is then divided by columns
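
A minimal sketch of combining the two strategies, assuming rows are plain dictionaries. The `mixed_fragments` helper and its parameters are hypothetical.

```python
def mixed_fragments(rows, partition_key, key, column_groups):
    """Split rows by partition key, then split each row subset by column group.

    column_groups: mapping of group name -> list of columns in that group.
    Returns a dict keyed by (partition value, group name).
    """
    fragments = {}
    for row in rows:
        for group_name, columns in column_groups.items():
            fragment_id = (row[partition_key], group_name)
            # Keep the key column in every fragment so rows can be rejoined later.
            piece = {col: row[col] for col in (key, *columns)}
            fragments.setdefault(fragment_id, []).append(piece)
    return fragments
```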

47
Q

Two modes of data replication

A

Push & pull

48
Q

What is push replication?

A

The originating DP node sends the changes to the replica nodes to ensure that data is immediately updated
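
A minimal in-memory sketch of the push idea, with hypothetical `Replica` and `Origin` classes and no real networking:

```python
class Replica:
    """Hypothetical replica node holding a copy of the data."""

    def __init__(self):
        self.data = {}


class Origin:
    """Hypothetical originating DP node that pushes every change to its replicas."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        # Push: apply the change at every replica before the write returns.
        for replica in self.replicas:
            replica.data[key] = value
```

The write does not finish until every replica has been updated, which is where the extra latency comes from.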

49
Q

When to use push replication?

A

When consistency is important.
- trade-off: ensuring consistency adds latency to each update

50
Q

What is pull replication?

A

The originating DP node sends "messages" to the replica nodes to notify them of the update. The replica nodes decide when to apply the updates to their local fragment.
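
A minimal in-memory sketch of the pull idea, with hypothetical `PullReplica` and `PullOrigin` classes. As a simplifying assumption, the notification here carries the change itself; in other designs the replica would fetch the data from the origin when it decides to apply the update.

```python
class PullReplica:
    """Hypothetical replica that queues notifications and applies them when it chooses."""

    def __init__(self):
        self.data = {}
        self.pending = []  # notifications received but not yet applied

    def notify(self, key, value):
        self.pending.append((key, value))

    def apply_pending(self):
        # The replica decides when to call this, so it may lag behind the origin.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class PullOrigin:
    """Hypothetical originating DP node that only notifies replicas of updates."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            replica.notify(key, value)
```

Between `write` and `apply_pending` the replica serves stale data, which is the temporary inconsistency that pull replication accepts.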

51
Q

When to use pull replication?

A

When availability is important.
- data updates propagate more slowly to the replicas
- temporary inconsistency