Distributed Database Flashcards
How have DB systems evolved?
Each unit defines and maintains its own data
Centralised DB Systems: Data is defined and administered centrally under the control of a single DBMS.
Distributed DBMS: Data can be accessed at a set of distributed sites.
Why were distributed DBs developed?
Data in all units is accessible
Data is stored in proximity to locations where it is most often used.
Should improve shareability of data and efficiency of data access.
Should resolve the "islands of information" problem.
What are a distributed database and a distributed DBMS?
Distributed DATABASE:
A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
Distributed DBMS:
Software system that permits the management of the distributed DB and makes the distribution transparent to users.
What are the components of DDB and DDBMS ?
DDB:
A logical DB split into a number of FRAGMENTS.
Each fragment is stored on one or more computers under the control of a separate DBMS.
The computers communicate over a network.
DDBMS: Users access the distributed DB via applications:
LOCAL: Don’t require data from other sites
GLOBAL: Require data from other sites.
A DDBMS has at least one global application.
What are the characteristics of DDBMS?
Collection of logically related shared data
Data is split into fragments, which may be replicated.
Fragments and replicas are allocated to sites, and the sites are linked by a communication network.
Data at each site is under the control of a DBMS.
Each DBMS participates in at least one global application.
What is parallel DBMS?
A DBMS designed to run across multiple processors and disks to improve performance.
Parallel DBMSs link multiple smaller machines to achieve the same throughput as a single larger machine, with greater reliability.
Architectures are: Shared memory, Shared disk and Shared nothing.
What are the advantages and disadvantages of a DDBMS?
Advantages:
Reflects organizational structure.
Improved availability and reliability.
Modular growth
Improved performance
Disadvantages:
Complexity
Cost (manpower needed)
Lack of standards and experience
DB design more complex
What can have gone wrong while a node waits for a response to a packet it sent?
The remote node may have failed.
The request may have been lost.
The response may have been lost in the network.
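A minimal sketch of why a timeout alone cannot tell these cases apart. The network is simulated, and the function name `send_request` and the failure-mode labels are hypothetical, chosen only for this illustration.

```python
from typing import Optional

# Hypothetical failure modes, mirroring the three cases on this card.
FAILURE_MODES = ("node_failed", "request_lost", "response_lost")


def send_request(payload: str, failure: Optional[str] = None) -> Optional[str]:
    """Simulate one request/response round trip.

    Returns the reply, or None when the sender's timeout expires.
    """
    if failure == "request_lost":
        return None  # the request never reached the remote node
    if failure == "node_failed":
        return None  # the remote node crashed and never processed it
    reply = f"ack:{payload}"  # the remote node processed the request
    if failure == "response_lost":
        return None  # the work was done, but the reply vanished
    return reply


# From the sender's side every failure looks the same: no reply in time.
observations = {mode: send_request("write x=1", mode) for mode in FAILURE_MODES}
```

In the `response_lost` case the remote node did apply the request, so retrying blindly is only safe if the operation is idempotent.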
Could timestamps save us?
Each machine has its own clock, driven by a quartz crystal oscillator.
These clocks are not perfectly accurate, so each machine has its own notion of time.
Machines can synchronize using the Network Time Protocol (NTP), which adjusts the computer's clock according to the time reported by a group of servers.
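A rough sketch of how a client can estimate its clock offset from one request/response exchange with a time server, using the standard NTP offset and delay formulas. The function name and the sample timestamps are hypothetical, and the estimate assumes the network delay is the same in both directions. Real NTP also combines samples from several servers and adjusts the clock gradually.

```python
from typing import Tuple


def ntp_offset_and_delay(t0: int, t1: int, t2: int, t3: int) -> Tuple[float, int]:
    """Estimate clock offset and round-trip delay from four timestamps.

    t0: client clock when the request is sent
    t1: server clock when the request is received
    t2: server clock when the response is sent
    t3: client clock when the response is received
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # how far the client is behind the server
    delay = (t3 - t0) - (t2 - t1)         # time spent on the network, both ways
    return offset, delay


# Illustrative timestamps in milliseconds: the client clock is 5000 ms behind,
# each network hop takes 100 ms, and the server takes 50 ms to reply.
offset_ms, delay_ms = ntp_offset_and_delay(100_000, 105_100, 105_150, 100_250)
```

With these numbers the sketch recovers an offset of 5000 ms and a round-trip delay of 200 ms. If the two directions had different delays, the offset estimate would be wrong by half the difference.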
How would a node know who is its current leader?
1. The leader acquires a lease with a timeout.
2. Only one node can hold the lease at a time.
3. The leader renews the lease periodically, before the timeout expires.
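The three steps above can be sketched as a small single-holder lease. This is an illustration under assumptions, not a production protocol: the `Lease` class and its method names are hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow.

```python
from typing import Optional


class Lease:
    """A single-holder lease with a timeout (hypothetical sketch)."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def acquire(self, node: str, now: float) -> bool:
        """Grant the lease if it is free, expired, or already held by `node`.

        A call by the current holder acts as a renewal.
        """
        if self.holder is None or now >= self.expires_at or self.holder == node:
            self.holder = node
            self.expires_at = now + self.duration
            return True
        return False

    def leader(self, now: float) -> Optional[str]:
        """Return the current leader, or None if the lease has lapsed."""
        return self.holder if now < self.expires_at else None


lease = Lease(duration=10.0)
a_first = lease.acquire("A", now=0.0)    # A becomes leader until t=10
b_blocked = lease.acquire("B", now=5.0)  # refused: A still holds the lease
a_renewed = lease.acquire("A", now=8.0)  # renewal extends the lease to t=18
b_later = lease.acquire("B", now=20.0)   # A did not renew, so B takes over
```

In a real system each node judges expiry by its own clock, so leases inherit the clock-accuracy problem from the timestamps card.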
What are the types of DDBMS?
HOMOGENEOUS:
All sites use the same local DBMS product (e.g. the same SQL DBMS).
Easier to design and manage
Similar to a centralised DB running over a distributed system.
Allows incremental growth and improved performance.
HETEROGENEOUS:
Sites may run different DBMS products, possibly with different underlying data models.
Occurs when sites have implemented their own DBs and integration is an afterthought.
Translations are needed for different hardware and DBMS products.
The typical solution is to use gateways.
What are the functions of a DDBMS?
Extended communication service
Extended data dictionary and concurrency control
Distributed query processing
What are the key issues in the design of a distributed DB?
FRAGMENTATION:
A relation may be divided into sub-relations (fragments), which are then distributed.
ALLOCATION: each fragment is stored at the site that gives the optimal distribution.
REPLICATION: a copy of a fragment may be maintained at several sites.
What are the four strategies for the placement of data (data allocation)?
CENTRALIZED: single database and DBMS stored at one site with users distributed across the network.
PARTITIONED: DB partitioned into fragments and each is assigned to one site
COMPLETE REPLICATION: maintaining complete copies of DB at each site
SELECTIVE REPLICATION: combination of partitioning, replication, and centralization.
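A small sketch of the four strategies as catalogs that map each fragment to the sites holding a copy. The site names, fragment names, and the `is_local` helper are hypothetical. The centralized case is modelled here as every fragment living at one site, which is a simplification.

```python
from typing import Dict, List, Set

# Hypothetical sites and fragments, used only for illustration.
SITES = ["london", "new_york", "dublin"]
FRAGMENTS = ["F1", "F2", "F3"]

# Each catalog maps a fragment to the sites that store a copy of it.
Catalog = Dict[str, List[str]]

centralized: Catalog = {f: ["london"] for f in FRAGMENTS}
partitioned: Catalog = {"F1": ["london"], "F2": ["new_york"], "F3": ["dublin"]}
complete_replication: Catalog = {f: list(SITES) for f in FRAGMENTS}
selective_replication: Catalog = {
    "F1": ["london", "new_york"],  # replicated at two sites
    "F2": ["new_york"],            # held at one site only
    "F3": ["dublin", "london"],    # replicated at two sites
}


def is_local(catalog: Catalog, site: str, needed: Set[str]) -> bool:
    """True if every needed fragment has a copy at `site`."""
    return all(site in catalog[f] for f in needed)
```

An application at a site is LOCAL when `is_local` holds for the fragments it needs and GLOBAL otherwise. Under complete replication every read is local, at the price of sending each update to every site.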
Why do we fragment relations?
Usage: applications work with views rather than entire relations
Efficiency: Data is stored close to where it’s used. Data not needed by local applications is not stored.
Parallelism: Transactions can be divided into subqueries that operate on separate fragments.
Security: Data not required by local applications is not stored and not available to unauthorized users.
ADVANTAGES:
Locality of reference
Improved performance and reliability
Minimal communication costs
DISADVANTAGES:
Performance: applications that pull data from many fragments will be slower because of network delays.
Integrity: integrity control is more difficult. Communicating updates to all replicas may be slow.
What are the three main types of fragmentation ?
1. HORIZONTAL: A fragment is a subset of the tuples of a relation, identified by a selection query.
EX: employee relation for a company with employees in London and New York.
We could fragment it so that employees in departments D1 and D3 are stored in New York
and employees in department D2 (the London workers) are stored in London.
Reconstruct the original table with a UNION operation.
This can lead to HOTSPOTS.
2. VERTICAL: A fragment is a subset of the attributes of a relation. EX: split the relation into two vertical fragments, the first holding name, gender, etc. and the second holding salary and SSN. Each fragment keeps the primary key so the relation can be rebuilt with a join.
3. HYBRID:
A mixed fragment is a horizontal fragment that is subsequently vertically fragmented, or a vertical fragment that is subsequently horizontally fragmented.
EX: store the work-related and personal fragments at separate sites, and also split the work-related vertical fragment horizontally between New York and Dublin.
Reconstruct with outer joins and union operations.
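The three types can be sketched on a tiny in-memory relation. The column names, rows, and department-to-city split are made up for illustration, and plain Python lists stand in for tables at different sites.

```python
# Hypothetical employee relation; column names and values are illustrative.
employees = [
    {"id": 1, "name": "Ann", "dept": "D1", "salary": 50000},
    {"id": 2, "name": "Bob", "dept": "D2", "salary": 42000},
    {"id": 3, "name": "Cy", "dept": "D3", "salary": 61000},
]

# Horizontal fragmentation: each fragment is a selection on the rows.
new_york = [r for r in employees if r["dept"] in ("D1", "D3")]
london = [r for r in employees if r["dept"] == "D2"]

# Reconstruct with a union (the fragments are disjoint, so concatenation works).
rebuilt_h = sorted(new_york + london, key=lambda r: r["id"])

# Vertical fragmentation: each fragment is a projection on the columns.
# Both fragments keep the key `id` so rows can be matched up again.
personal = [{"id": r["id"], "name": r["name"]} for r in employees]
work = [{"id": r["id"], "dept": r["dept"], "salary": r["salary"]} for r in employees]

# Reconstruct with a join on the shared key.
work_by_id = {r["id"]: r for r in work}
rebuilt_v = [{**p, **work_by_id[p["id"]]} for p in personal]

# Hybrid: horizontally fragment the vertical `work` fragment by department.
work_ny = [r for r in work if r["dept"] in ("D1", "D3")]
work_london = [r for r in work if r["dept"] == "D2"]
rebuilt_work = sorted(work_ny + work_london, key=lambda r: r["id"])
```

Both `rebuilt_h` and `rebuilt_v` equal the original `employees` list, which is the check that a fragmentation loses no data.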
Transparencies in DDBMS
The distribution should be transparent to the user
There are three types:
DISTRIBUTION
TRANSACTION
PERFORMANCE