Distributed Databases and MapReduce Flashcards
1
Q
Data Partitioning
A
- Data is split into fragments (partitions) that are stored on different machines
2
Q
Data Replication
A
- Copies of the same data are made available on multiple machines
3
Q
Horizontal Fragmentation
A
- Divides up the rows of a collection of records
4
Q
Vertical Fragmentation
A
- Divides up the columns of a collection of records
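Example (cards 3 and 4): a minimal sketch of both kinds of fragmentation on an in-memory table. The sample records and the helper names horizontal_fragments and vertical_fragment are hypothetical, chosen only for illustration.

```python
# Hypothetical sample table: each record is a dict keyed by column name.
records = [
    {"id": 1, "name": "Ann", "city": "Boston"},
    {"id": 2, "name": "Bob", "city": "Denver"},
    {"id": 3, "name": "Cy", "city": "Boston"},
]

def horizontal_fragments(rows, key):
    """Group whole rows into fragments by the value of one column."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

def vertical_fragment(rows, columns):
    """Keep only the chosen columns of every row."""
    return [{col: row[col] for col in columns} for row in rows]

# Horizontal: each fragment holds complete rows (here, one fragment per city).
by_city = horizontal_fragments(records, "city")

# Vertical: each fragment holds some columns of every row.
# Both fragments keep "id" so the original rows could be rebuilt by joining on it.
names = vertical_fragment(records, ["id", "name"])
cities = vertical_fragment(records, ["id", "city"])
```

Keeping the key column in every vertical fragment is a common design choice, not something the cards state.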
5
Q
Advantages of a Distributed Database
A
- Improves performance
- High availability
- Modular growth
- Integrates data from multiple existing systems
6
Q
Challenges of a Distributed Database
A
- Distributing the data
- Efficient query execution
- Maintaining integrity constraints (PK, FK, etc.)
- Keeping replicated data consistent
- Managing distributed transactions
7
Q
Distributed Transaction
A
- A transaction that involves data stored at multiple sites
- One site serves as the coordinator
8
Q
Synchronous Replication
A
- Transactions are guaranteed to see the most up-to-date value of an item
9
Q
Asynchronous Replication
A
- Transactions are not guaranteed to see the most up-to-date value of an item
10
Q
Primary-Site Replication
A
- One replica is designated the primary replica
- The primary receives all writes and propagates them to the secondary replicas
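Example (card 10): a rough sketch of the write path, assuming one logical item and immediate propagation to the secondaries. The class name PrimarySiteItem and its fields are hypothetical.

```python
class PrimarySiteItem:
    """One logical item with a primary replica and some secondary replicas."""

    def __init__(self, initial, num_secondaries):
        self.primary = initial
        self.secondaries = [initial] * num_secondaries

    def write(self, value):
        # Every write is applied at the primary first...
        self.primary = value
        # ...and the primary then pushes the new value to each secondary.
        # Assumption: propagation happens immediately; an asynchronous
        # variant would defer this loop until later.
        for i in range(len(self.secondaries)):
            self.secondaries[i] = value

item = PrimarySiteItem("v1", num_secondaries=2)
item.write("v2")
```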
11
Q
Peer-to-Peer Replication
A
- More than one replica can be updated
12
Q
Synchronous Replication: Read-Any, Write-All
A
- When reading an item, access any of the replicas
- When writing an item, must update all of the replicas
13
Q
Synchronous Replication: Voting
A
- n = number of copies, w = copies written, r = copies read
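- Need r > n - w (equivalently, r + w > n), so every set of r copies read includes at least one of the w copies last written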
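Example (card 13): a small sketch of why the voting condition works. It assumes each copy stores a version number next to its value so a reader can tell which copy is newest; the version numbers and the function names are illustrative assumptions, not part of the card.

```python
def quorum_ok(n, w, r):
    """Voting condition from the card: r > n - w (same as r + w > n)."""
    return r > n - w

def write(replicas, w, value, version):
    # Update only the first w copies; the remaining copies stay stale.
    for i in range(w):
        replicas[i] = (version, value)

def read(replicas, r):
    # Worst case for the reader: look at the LAST r copies, which are the
    # ones the writer was least likely to touch, and return the value
    # that carries the highest version number.
    return max(replicas[-r:])[1]

n, w, r = 5, 3, 3
replicas = [(0, "old")] * n
write(replicas, w, "new", version=1)

fresh = read(replicas, r)   # r = 3 satisfies r > n - w, so the sets overlap
stale = read(replicas, 2)   # r = 2 violates the condition and can miss the write
```

Read-any, write-all is the special case w = n, r = 1, which also satisfies the condition.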
14
Q
Global Locks
A
- Shared and exclusive locks for a logical item
- No two transactions can hold a global exclusive lock for the same item
- Any number of transactions can hold a global shared lock for an item
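Example (card 14): a sketch of the compatibility rules for one logical item. The class GlobalLock is hypothetical, and it assumes the usual rule that a shared lock and an exclusive lock held by different transactions also conflict, which the card does not spell out.

```python
class GlobalLock:
    """Lock state for one logical item (not for any single replica)."""

    def __init__(self):
        self.shared_holders = set()
        self.exclusive_holder = None

    def acquire_shared(self, txn):
        # A shared request is refused only while another transaction
        # holds the exclusive lock.
        if self.exclusive_holder not in (None, txn):
            return False
        self.shared_holders.add(txn)
        return True

    def acquire_exclusive(self, txn):
        # An exclusive request is refused if another transaction holds
        # the exclusive lock or any shared lock on the item.
        if self.exclusive_holder not in (None, txn):
            return False
        if self.shared_holders - {txn}:
            return False
        self.exclusive_holder = txn
        return True

lock = GlobalLock()
results = [
    lock.acquire_shared("T1"),     # granted
    lock.acquire_shared("T2"),     # granted: any number of shared holders
    lock.acquire_exclusive("T3"),  # refused: T1 and T2 hold shared locks
]
```

The sketch returns False instead of blocking; a real lock manager would typically queue the waiting transaction.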
15
Q
Centralized Locking
A
- One site manages the lock requests for all items in the distributed database
- The lock site can become a bottleneck