Distributed DBMS Flashcards
Advantages of DDBMS
- Data is located near the sites of greatest demand
- Faster data access
- Process data at different sites
- New sites can be added without affecting other sites
- Cheaper to add nodes to a system than updating a mainframe
- Less danger of a single point of failure (SPOF)
Disadvantages of DDBMS
- Complexity of management and control
- Technological difficulty - replication, query optimization, transaction management
- Increased storage requirements (for replication)
- Higher cost
Components of DDBMS
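- Transaction processor (TP): software at each site that requests data
- Data processor (DP): software at each site that stores and retrieves the data held there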
- Communications network
Distributed processing
The database’s logical processing is shared among two or more physically independent sites connected through a network
Distributed database
Stores a logically related database over two or more physically independent sites connected through a computer network
Database fragment
One of the parts that a database is divided into in a distributed database system; each fragment can be stored at a different site
Database level fragmentation
Table1 in Location1, Table2 in Location2
Table level fragmentation
Same tables with different data in different locations
- e.g. Payroll & Ops tables in Halifax office with Halifax data, Payroll & Ops tables in Bedford office with Bedford data
Single site processing, single site data
- TP and DP in one computer
- End user has dumb terminal
Multi-site processing, single site data
- Multiple TP run on different computers sharing a single data repository (DP)
- Accessed through LAN
- Client/server architecture
Multi-site processing, multi-site data
- Fully distributed database management system
- Support multiple DP and TP at multiple sites
Homogeneous DDBMS
- integrate multiple instances of the same DBMS over a network
- e.g. MySQL v5 at 3 locations
Heterogeneous DDBMS
- integrate different types of DBMSs over a network
- e.g. MySQL in Asia, Oracle in EU, MSSQL Server in US
Fully heterogeneous DDBMS
- supports different DBMSs, each supporting a different data model, running under different computer systems
- DB level fragmentation
- e.g. tables T1 and T2 on Oracle at location L1, tables T3 and T4 on Postgres at location L2
Minimum desirable DDBMS transparency
- Distribution Transparency
- Transaction Transparency
- Failure Transparency
- Performance Transparency
- Heterogeneity Transparency
What is Distribution Transparency?
Distributed DB treated as a single logical database
Levels of Distribution Transparency
fragmentation (highest), location, and local mapping (lowest)
Fragmentation Transparency
Query has no fragment name, no location
Location Transparency
Query has fragment name, no location
Local Mapping Transparency
Query has fragment name and location. Faster data retrieval.
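The three levels can be sketched with Python's built-in sqlite3 module. This is only an illustration, not a real DDBMS: two attached in-memory databases stand in for two sites, and the names halifax, bedford, e1, e2, and employee are all hypothetical.

```python
import sqlite3

# Hypothetical setup: two attached in-memory databases stand in for two sites.
# Autocommit mode keeps ATTACH outside of any open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("ATTACH DATABASE ':memory:' AS halifax")
conn.execute("ATTACH DATABASE ':memory:' AS bedford")

# Fragments e1 and e2 of one logical employee table, one fragment per site.
conn.execute("CREATE TABLE halifax.e1 (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE bedford.e2 (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO halifax.e1 VALUES (1, 'Ada')")
conn.execute("INSERT INTO bedford.e2 VALUES (2, 'Grace')")

# A TEMP view may span attached databases; here it plays the role of the
# single logical table that the user sees.
conn.execute(
    "CREATE TEMP VIEW employee AS "
    "SELECT * FROM halifax.e1 UNION ALL SELECT * FROM bedford.e2"
)

# Fragmentation transparency: no fragment name, no location.
frag = conn.execute("SELECT name FROM employee").fetchall()

# Location transparency: fragment names, no location.
loc = conn.execute(
    "SELECT name FROM e1 UNION ALL SELECT name FROM e2"
).fetchall()

# Local mapping transparency: fragment names and locations.
local = conn.execute(
    "SELECT name FROM halifax.e1 UNION ALL SELECT name FROM bedford.e2"
).fetchall()
```

All three queries return the same rows; what changes is how much the query author has to know about fragments and sites.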
When to use local mapping transparency?
for security researchers to track data loss
for a DBA to track duplicate records
What contains the entire description of Distributed DB?
distributed data dictionary (DDD) or distributed data catalog (DDC)
What is a distributed global schema?
common database schema used to translate user requests into subqueries
Pros & cons of storing many details in the DDD (e.g. node name, IP address)
Pros: efficient data retrieval
Cons: need to update often
Are rows in each fragment unique?
Yes
What is Transaction Transparency?
- Ensures database transactions maintain data integrity and consistency
- Ensures a transaction is completed only when all database sites involved complete their part
Remote request
Single SQL statement accesses data processed by a single remote database processor
Remote transaction
Composed of several requests, all accessing data at a single remote site
Distributed transaction
Transaction whose requests access data at several different remote sites on the network; each individual request references only one site
Distributed request
Single SQL statement references data at several DP sites
Solution to concurrency control in distributed DBMS?
Two-phase commit (2PC) / three-phase commit (3PC), so that a transaction commits at every site or at none
What is 2PC?
One node acts as the coordinator.
- prepare phase: the coordinator asks the other nodes whether they can commit the proposed transaction.
- commit phase: if every node voted yes, the coordinator commands the other nodes to commit the proposed transaction.
- if any node votes to abort in the prepare phase, the coordinator issues a global abort to all nodes; a node that has voted yes can no longer abort on its own
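A minimal single-process sketch of the 2PC message flow, assuming no crashes, timeouts, or logging. The Participant class, its can_commit flag, and the two_phase_commit function are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    can_commit: bool = True   # hypothetical flag: how this node will vote
    state: str = "INIT"

    def prepare(self) -> bool:
        # Phase 1: vote. A "yes" vote is a promise to commit if told to.
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self) -> None:
        self.state = "COMMITTED"

    def abort(self) -> None:
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Play the coordinator role: return "COMMIT" or "ABORT"."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only on a unanimous "yes", otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.abort()
    return "ABORT"
```

For example, two_phase_commit([Participant("A"), Participant("B", can_commit=False)]) ends with every participant in the ABORTED state, because one "no" vote is enough for a global abort.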
Why is 2PC a blocking protocol?
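nodes that have voted yes are stuck waiting for the coordinator’s global commit or abort; if the coordinator fails at that point, they cannot decide on their own and stay blocked until it recovers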
What is 3PC?
- prepare
- pre-commit (guarantees that nodes can commit)
- commit
What is Performance transparency?
allows a DDBMS to perform as if it were a centralized database
What is Failure transparency?
ensures the system continues to operate in case of a node or network failure
Advantages of query optimization
Lower total cost, where cost is made up of:
- Access time (I/O) cost involved in accessing data from multiple remote sites
- Communication costs associated with data transmission
- CPU time cost associated with the processing overhead
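A toy sketch of the idea: the optimizer compares candidate plans by the sum of the three costs and picks the cheapest. The Plan class, the plan names, and every number below are made up for illustration; a real optimizer estimates these costs from statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    io_cost: float     # access time (I/O) at the sites involved
    comm_cost: float   # data transmission between sites
    cpu_cost: float    # processing overhead

    @property
    def total(self) -> float:
        return self.io_cost + self.comm_cost + self.cpu_cost


def cheapest(plans):
    """Return the plan with the lowest total cost."""
    return min(plans, key=lambda p: p.total)


# Hypothetical cost figures in arbitrary units.
plans = [
    Plan("ship whole table to client", io_cost=40, comm_cost=500, cpu_cost=10),
    Plan("filter at remote site first", io_cost=60, comm_cost=50, cpu_cost=20),
]
best = cheapest(plans)
```

With these numbers, filtering at the remote site wins because the saving in communication cost outweighs the extra I/O and CPU work.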
Replica transparency
hide multiple copies of data from the user
Network latency
delay imposed by the amount of time required
for a data packet to make a round trip
Network partitioning
delay or loss of service imposed when nodes suddenly become unavailable because a network failure cuts them off from the rest of the system
CAP theorem
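Of Consistency, Availability, and Partition tolerance, a distributed system can guarantee at most 2 at once; in practice, when a network partition occurs, the choice is between consistency and availability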
Data replication
Storage of data copies at multiple sites served by a computer network
Strategies of data fragmentation
Horizontal, vertical, mixed
How does horizontal fragmentation work?
divide by rows on a partition key (e.g. location)
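A small sketch of horizontal fragmentation, assuming rows are plain dictionaries. The function name, the customers data, and the column names are hypothetical.

```python
from collections import defaultdict


def horizontal_fragments(rows, key):
    """Group whole rows into fragments by the value of the partition key."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key]].append(row)
    return dict(fragments)


# Hypothetical sample data.
customers = [
    {"cust_id": 1, "name": "Ada", "location": "Halifax"},
    {"cust_id": 2, "name": "Grace", "location": "Bedford"},
    {"cust_id": 3, "name": "Edsger", "location": "Halifax"},
]
by_site = horizontal_fragments(customers, "location")
```

Every fragment has the same columns, and each row lands in exactly one fragment, so the union of the fragments gives back the original table.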
How does vertical fragmentation work?
divide by columns
- e.g. a company is divided into two departments, the service department and the collections department; each department stores only the columns it needs, and every fragment keeps the key column so rows can be rejoined
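A small sketch of vertical fragmentation, assuming rows are plain dictionaries. The function names, the sample data, and the column split between the two departments are hypothetical.

```python
def vertical_fragment(rows, pk, columns):
    """Keep only the given columns, plus the primary key, for every row."""
    return [{c: row[c] for c in (pk, *columns)} for row in rows]


def rejoin(left, right, pk):
    """Rebuild full rows by matching the two fragments on the primary key."""
    by_pk = {row[pk]: row for row in right}
    return [{**row, **by_pk[row[pk]]} for row in left]


# Hypothetical sample data.
customers = [
    {"cust_id": 1, "name": "Ada", "service_address": "1 Main St", "balance": 0.0},
    {"cust_id": 2, "name": "Grace", "service_address": "9 Oak Ave", "balance": 125.5},
]
service = vertical_fragment(customers, "cust_id", ["name", "service_address"])
collections = vertical_fragment(customers, "cust_id", ["balance"])
```

Both fragments contain every row but different columns; because each keeps cust_id, rejoin(service, collections, "cust_id") reproduces the original rows.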
How does mixed fragmentation work?
both horizontal & vertical
Two modes of data replication
Push & pull
What is push replication?
After a data update, the originating DP node sends the changes to the replica nodes to ensure that data is immediately updated
When to use push replication?
When consistency is important.
- trade-off: availability decreases because of the latency involved in keeping every replica consistent
What is pull replication?
After a data update, the originating DP node sends “messages” to the replica nodes to notify them of the update. The replica nodes decide when to apply the updates to their local fragment.
When to use pull replication?
When availability is important.
- trade-off: data updates propagate more slowly to the replicas
- trade-off: replicas may be temporarily inconsistent
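The difference between the two replication modes can be sketched in a few lines, assuming in-memory dictionaries in place of real DP nodes and no network failures. The Replica and Origin classes and their method names are hypothetical.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.pending = []   # updates announced by the origin but not yet applied

    def apply(self, key, value):
        # Push: the change is applied as soon as it arrives.
        self.data[key] = value

    def notify(self, key, value):
        # Pull: only record that an update exists.
        self.pending.append((key, value))

    def sync(self):
        # Pull: the replica decides when to apply the announced updates.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class Origin:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write_push(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            replica.apply(key, value)    # replicas are updated immediately

    def write_pull(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            replica.notify(key, value)   # replicas stay stale until sync()
```

After write_push, every replica already holds the new value. After write_pull, a replica keeps serving its old value until it calls sync(), which is the temporary inconsistency accepted in exchange for availability.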