Distributed Databases Flashcards
Distributed Databases
A distributed DBMS provides access to data stored at all sites.
Let's say we have one store in Liverpool. The business might eventually expand to Manchester, London, etc., so its data ends up spread across several sites.
The concrete definition of a distributed database is a collection of multiple logically interrelated databases distributed over a computer network.
Advantages of distributed databases
- give us access to the data held at different sites
- we don’t have to specify where the data is stored; we can just grab it from wherever it is
- can give many users access to large datasets
- answer queries faster by distributing tasks over the nodes
- easier to scale (just add a new node)
Fragmentation
Split the database into parts (fragments) that can be stored at different nodes.
Horizontal Fragmentation
Fragmenting a relation by rows.
Each fragment holds a subset of the tuples, with each tuple stored whole.
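A minimal Python sketch of horizontal fragmentation, assuming a toy customer relation held as a list of dicts and a hypothetical `city` attribute used to decide which store's site gets each tuple. Every fragment keeps whole tuples; only the set of rows differs.

```python
# Hypothetical customer relation; attribute names are illustrative only.
customers = [
    {"id": 1, "name": "Ada", "city": "Liverpool"},
    {"id": 2, "name": "Ben", "city": "Manchester"},
    {"id": 3, "name": "Cy", "city": "Liverpool"},
    {"id": 4, "name": "Dee", "city": "London"},
]


def horizontal_fragments(relation, attr):
    """Split a relation by rows: each fragment gets the whole tuples
    whose value of `attr` matches that fragment's site."""
    fragments = {}
    for row in relation:
        fragments.setdefault(row[attr], []).append(row)
    return fragments


frags = horizontal_fragments(customers, "city")
for site, rows in frags.items():
    print(site, [r["id"] for r in rows])
# Liverpool [1, 3]
# Manchester [2]
# London [4]
```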
Vertical Fragmentation
Fragmenting a relation by columns.
Each fragment holds a subset of the attributes (usually plus the primary key) and can be stored in a different database.
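A minimal sketch of vertical fragmentation in Python, assuming the same kind of toy relation and a key attribute called `id` (both illustrative). The sketch always copies the key into every fragment, the common convention that lets the fragments be joined back together later.

```python
# Hypothetical customer relation; attribute names are illustrative only.
customers = [
    {"id": 1, "name": "Ada", "city": "Liverpool", "balance": 120},
    {"id": 2, "name": "Ben", "city": "Manchester", "balance": 75},
]


def vertical_fragment(relation, columns, key="id"):
    """Project a relation onto `columns`, always keeping the key so the
    fragments can be joined back together."""
    cols = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in cols} for row in relation]


contact = vertical_fragment(customers, ["name", "city"])  # e.g. at one site
accounts = vertical_fragment(customers, ["balance"])      # e.g. at another
print(contact)
print(accounts)
```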
Fragmentation Transparency
The user does not see this fragmentation, just the full relations.
Entire Relation
The union of the horizontal fragments, or the join (on the key) of the vertical fragments.
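Rebuilding the entire relation depends on how it was split: horizontal fragments are combined with a union, while vertical fragments are joined on the key they share. The Python sketch below shows both on small hypothetical fragments (names and the `id` key are assumptions for illustration).

```python
# Horizontal fragments of a hypothetical customer relation (by city).
liverpool = [{"id": 1, "name": "Ada"}, {"id": 3, "name": "Cy"}]
manchester = [{"id": 2, "name": "Ben"}]

# Vertical fragments of a hypothetical relation, both carrying the key "id".
names = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Ben"}]
balances = [{"id": 2, "balance": 75}, {"id": 1, "balance": 120}]


def union(*fragments):
    """Rebuild a horizontally fragmented relation: the union of all rows."""
    result = []
    for frag in fragments:
        for row in frag:
            if row not in result:  # set semantics: drop duplicate tuples
                result.append(row)
    return result


def join_on_key(left, right, key="id"):
    """Rebuild a vertically fragmented relation: join fragments on the key."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]


print(union(liverpool, manchester))
print(join_on_key(names, balances))
```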
Fragmentation Advantage
Fragmentation can help resilience: if one store fails, only the fragments held there are affected, and the other stores can still serve the fragments they hold.
Types of replication
- full replication
- no replication
- partial replication
Full Replication
Each fragment is stored at every site.
No Replication
Each fragment is stored at exactly one site.
Partial Replication
Only some fragments are replicated, and the number of copies of each is limited.
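The three replication types differ only in where each fragment is placed. This Python sketch assigns hypothetical fragments `F1`..`F4` to three hypothetical sites under each scheme; the round-robin placement and the `copies` dict for partial replication are assumptions made purely for illustration, not a prescribed policy.

```python
import itertools

SITES = ["Liverpool", "Manchester", "London"]   # hypothetical sites
FRAGMENTS = ["F1", "F2", "F3", "F4"]            # hypothetical fragments


def full_replication(fragments, sites):
    """Every fragment is stored at every site."""
    return {f: list(sites) for f in fragments}


def no_replication(fragments, sites):
    """Every fragment is stored at exactly one site (round-robin here)."""
    return {f: [site] for f, site in zip(fragments, itertools.cycle(sites))}


def partial_replication(fragments, sites, copies):
    """Only fragments listed in `copies` are replicated, each to a limited
    number of sites; every other fragment gets a single site."""
    placement = {}
    start_positions = itertools.cycle(range(len(sites)))
    for f in fragments:
        start = next(start_positions)
        n = min(copies.get(f, 1), len(sites))
        placement[f] = [sites[(start + i) % len(sites)] for i in range(n)]
    return placement


print(full_replication(FRAGMENTS, SITES)["F1"])
print(no_replication(FRAGMENTS, SITES))
print(partial_replication(FRAGMENTS, SITES, {"F1": 2}))
```

Full replication makes reads available everywhere but every update must reach every copy; no replication is the opposite; partial replication sits between the two.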
Types of transparency
- fragmentation transparency
- replication transparency
- location transparency
- naming transparency
Fragmentation Transparency
Users query the full relations without needing to know how they are fragmented.
Replication Transparency
Data items can be copied to several sites, while users do not see how many copies there are or where they are.
Location Transparency
Users do not need to know where the data is stored in order to query it.
Naming Transparency
A given name has the same meaning everywhere in the system; for example, a relation name must refer to the same relation at every site.
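One way to picture the four transparencies together is a global catalog that the system, not the user, consults. In this Python sketch the `CATALOG` and `STORAGE` dicts, the fragment names and the `scan` function are all hypothetical: the caller only names the relation, and the catalog works out which fragments exist, where they live and which copy to read.

```python
# Hypothetical global catalog: relation name -> fragment -> replica sites.
CATALOG = {
    "Customer": {
        "Customer_LIV": ["Liverpool", "London"],   # replicated fragment
        "Customer_MAN": ["Manchester"],
    },
}

# Hypothetical per-site storage: (site, fragment) -> rows.
STORAGE = {
    ("Liverpool", "Customer_LIV"): [{"id": 1, "city": "Liverpool"}],
    ("London", "Customer_LIV"): [{"id": 1, "city": "Liverpool"}],
    ("Manchester", "Customer_MAN"): [{"id": 2, "city": "Manchester"}],
}


def scan(relation, down_sites=frozenset()):
    """Return all rows of a global relation, given only its name.

    The caller never names a fragment (fragmentation transparency),
    a site (location transparency) or a copy (replication transparency),
    and the same name works from any site (naming transparency).
    """
    rows = []
    for fragment, sites in CATALOG[relation].items():
        live = [s for s in sites if s not in down_sites]
        if not live:
            raise RuntimeError(f"no live copy of {fragment}")
        rows.extend(STORAGE[(live[0], fragment)])  # read any one live copy
    return rows


print(scan("Customer"))
print(scan("Customer", down_sites={"Liverpool"}))  # served from London copy
```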
Concurrency Control in DDBMS
Locking is the main mechanism used for concurrency control.
Types of locking distributions for DDBMS
- one computer grants all locks
- one computer grants all locks, with backups
- many computers with different authorities
- many computers with different authorities, with backups
One computer grants all locks
If that computer fails, every running transaction must be restarted, since there is no backup.
It can also become a bottleneck: there may be too many transactions for one computer to handle.
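A minimal Python sketch of the single-computer scheme, assuming exclusive locks only (real lock managers also offer shared locks and wait queues). The class name `CentralLockManager` is hypothetical. All lock state lives in one object, which is exactly why it is both a single point of failure and a bottleneck.

```python
class CentralLockManager:
    """One computer granting all (exclusive) locks for every site."""

    def __init__(self):
        self.owner = {}  # data item -> transaction holding its lock

    def lock(self, txn, item):
        """Grant the lock if the item is free or already held by txn."""
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False  # caller must wait or abort

    def unlock(self, txn, item):
        if self.owner.get(item) == txn:
            del self.owner[item]


manager = CentralLockManager()   # every site sends its requests here
print(manager.lock("T1", "x"))   # True
print(manager.lock("T2", "x"))   # False: T1 holds x
manager.unlock("T1", "x")
print(manager.lock("T2", "x"))   # True
```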
One computer grants all locks, with backups
This solves the restart problem, since a backup can take over, but the backups must be kept in sync with the main computer.
Many computers with different authorities
Each computer grants locks only for its own items, so a transaction must work out which computer to ask for which lock.
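When several computers each have authority over different items, every lock request needs an extra lookup step first. In this Python sketch the `LockSite` class, the item split and the `find_authority` directory lookup are hypothetical; a real system would also need every site to agree on that directory.

```python
class LockSite:
    """One computer with authority over a subset of the data items."""

    def __init__(self, name, items):
        self.name = name
        self.items = set(items)
        self.owner = {}  # item -> transaction holding its lock

    def lock(self, txn, item):
        if item not in self.items:
            raise KeyError(f"{self.name} has no authority over {item!r}")
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False


# Hypothetical split of authority between two sites.
sites = [LockSite("Liverpool", {"x", "y"}), LockSite("London", {"z"})]


def find_authority(item):
    """The extra step this scheme needs: find the site that owns item."""
    for site in sites:
        if item in site.items:
            return site
    raise KeyError(item)


print(find_authority("z").name)              # London
print(find_authority("x").lock("T1", "x"))   # True
```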
Many computers with different authorities, with backups
The problem of knowing which computer to ask remains, and now the backups must also be kept in sync.
Voting
1) Each site with a copy of an item has a local lock for that item, which it can grant to transactions.
2) If a transaction obtains the local locks from more than half of the sites holding a copy of the item, it receives the global lock on that item. Since two transactions cannot both hold more than half of the local locks, at most one transaction holds the global lock at a time. Once it has the global lock, the transaction must tell all sites with a copy that it holds it.
3) If the transaction takes too long to obtain the global lock, it must stop trying and give up.
4) The drawback is that it requires a lot more communication, since every lock involves messages to many sites.
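The voting steps can be sketched in Python as follows. The `Site` class, `acquire_global` and the `timeout` parameter are hypothetical names; a down site is modelled as one that never grants. When the transaction gives up, the sketch also releases any local locks it collected, an assumption not spelled out in the steps above.

```python
import time


class Site:
    """A site holding a copy of some items, each with its own local lock."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up              # a down site never answers
        self.local = {}           # item -> transaction holding local lock
        self.global_holder = {}   # item -> transaction told it holds global

    def request(self, txn, item):
        if not self.up:
            return False
        holder = self.local.get(item)
        if holder is None or holder == txn:
            self.local[item] = txn
            return True
        return False

    def release(self, txn, item):
        if self.local.get(item) == txn:
            del self.local[item]

    def notify_global(self, txn, item):
        if self.up:
            self.global_holder[item] = txn


def acquire_global(txn, item, copies, timeout=1.0):
    """Majority voting: the global lock needs local locks from more than
    half of the sites holding a copy of `item`."""
    deadline = time.monotonic() + timeout
    granted = []
    while time.monotonic() < deadline:
        for site in copies:                 # step 1: ask for local locks
            if site not in granted and site.request(txn, item):
                granted.append(site)
        if len(granted) > len(copies) // 2:
            for site in copies:             # step 2: tell every copy holder
                site.notify_global(txn, item)
            return True
        time.sleep(0.01)                    # wait, then ask again
    for site in granted:                    # step 3: give up, free votes
        site.release(txn, item)
    return False


copies = [Site("Liverpool"), Site("Manchester"), Site("London", up=False)]
print(acquire_global("T1", "x", copies))                # True: 2 of 3 votes
print(acquire_global("T2", "x", copies, timeout=0.05))  # False: no majority
```

Note that T1 still gets the lock with one site down, which is the point of voting; the cost is that every attempt sends requests to every copy.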