Distributed Databases and MapReduce Flashcards

1
Q

Data Partitioning

A
  • Data is partitioned or fragmented across multiple machines
2
Q

Data Replication

A
  • Copies of the same data are made available on multiple machines
3
Q

Horizontal Fragmentation

A
  • Divides up the rows of a collection of records
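A minimal sketch of horizontal fragmentation, assuming hypothetical `(id, name, city)` records and an invented placement rule (Boston rows to site "A", everything else to site "B"). Each whole row lands in exactly one fragment; the helper `horizontal_fragments` is illustrative, not a real library call.

```python
# Hypothetical records of the form (id, name, city).
records = [
    (1, "Ana", "Boston"),
    (2, "Ben", "Denver"),
    (3, "Cal", "Boston"),
    (4, "Dee", "Austin"),
]


def horizontal_fragments(rows, site_of):
    """Place each whole row in exactly one fragment, chosen by site_of(row)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(site_of(row), []).append(row)
    return fragments


# Assumed placement rule: Boston rows go to site "A", all other rows to site "B".
frags = horizontal_fragments(records, lambda r: "A" if r[2] == "Boston" else "B")
print(frags["A"])  # [(1, 'Ana', 'Boston'), (3, 'Cal', 'Boston')]
print(frags["B"])  # [(2, 'Ben', 'Denver'), (4, 'Dee', 'Austin')]
```

Taking the union of all fragments gives back the original collection of rows.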
4
Q

Vertical Fragmentation

A
  • Divides up the columns of a collection of records
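A minimal sketch of vertical fragmentation on hypothetical employee records. One common practice, assumed here, is to repeat the key column (`id`) in every fragment so the original rows can be rebuilt by joining the fragments on that key. `vertical_fragment` and `rejoin` are illustrative helpers.

```python
records = [
    {"id": 1, "name": "Ana", "salary": 50000},
    {"id": 2, "name": "Ben", "salary": 62000},
]


def vertical_fragment(rows, columns, key="id"):
    """Project each row onto `columns`, always keeping the key column."""
    cols = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in cols} for row in rows]


def rejoin(frag1, frag2, key="id"):
    """Rebuild full rows by joining two vertical fragments on the key."""
    by_key = {row[key]: dict(row) for row in frag1}
    for row in frag2:
        by_key[row[key]].update(row)
    return list(by_key.values())


names = vertical_fragment(records, ["name"])       # could live at one site
salaries = vertical_fragment(records, ["salary"])  # could live at another
print(names)  # [{'id': 1, 'name': 'Ana'}, {'id': 2, 'name': 'Ben'}]
print(rejoin(names, salaries) == records)  # True
```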
5
Q

Advantages of a Distributed Database

A
  • Improves performance
  • High availability
  • Modular growth
  • Integrates data from multiple existing systems
6
Q

Challenges of a Distributed Database

A
  • Distributing the data
  • Efficient query execution
  • Maintaining integrity constraints (PK, FK, etc.)
  • Keeping replicated data consistent
  • Managing distributed transactions
7
Q

Distributed Transaction

A
  • A transaction that involves data stored at multiple sites
  • One site serves as the coordinator
8
Q

Synchronous Replication

A
  • Transactions are guaranteed to see the most up-to-date value of an item
9
Q

Asynchronous Replication

A
  • Transactions are not guaranteed to see the most up-to-date value of an item
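A toy model of why asynchronous replication can return stale values. It assumes, for illustration only, that replica 0 takes each write immediately and the other replicas are brought up to date later by a separate `propagate` step; the class name is hypothetical.

```python
class AsyncReplicatedItem:
    """Toy model: one replica takes the write now, the others catch up later."""

    def __init__(self, n, value):
        self.replicas = [value] * n
        self.pending = []  # (replica_index, value) updates not yet applied

    def write(self, value):
        self.replicas[0] = value  # assumption: replica 0 receives the write
        self.pending.extend((i, value) for i in range(1, len(self.replicas)))

    def read(self, i):
        return self.replicas[i]

    def propagate(self):
        """Apply the deferred updates to the other replicas."""
        for i, value in self.pending:
            self.replicas[i] = value
        self.pending.clear()


item = AsyncReplicatedItem(n=3, value=10)
item.write(20)
print(item.read(0), item.read(2))  # 20 10  (replica 2 is stale)
item.propagate()
print(item.read(2))  # 20
```

Under synchronous replication, the write would not complete until every affected replica had been updated, so the stale read above could not happen.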
10
Q

Primary-Site Replication

A
  • One replica is designated the primary replica
  • The primary receives all writes and propagates the updates to the secondary replicas
11
Q

Peer-to-Peer Replication

A
  • More than one replica can accept updates (not just a single primary)
12
Q

Synchronous Replication: Read-Any, Write-All

A
  • When reading an item, access any of the replicas
  • When writing an item, must update all of the replicas
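A minimal sketch of read-any, write-all on a single item, assuming an `available` flag per replica to model site failures (a hypothetical detail, not from the card). Reads pick any available replica because every write updated all of them; the trade-off is that a write cannot complete while any replica is unreachable.

```python
import random


class ROWAItem:
    """Toy read-any, write-all item with n replicas."""

    def __init__(self, n, value):
        self.replicas = [value] * n
        self.available = [True] * n  # assumption: models reachable sites

    def read(self):
        # Any single available replica suffices: all hold the latest value.
        choices = [i for i, up in enumerate(self.available) if up]
        return self.replicas[random.choice(choices)]

    def write(self, value):
        # Write-all: refuse the write unless every replica can be updated.
        if not all(self.available):
            raise RuntimeError("write-all needs every replica; write blocked")
        for i in range(len(self.replicas)):
            self.replicas[i] = value


item = ROWAItem(n=3, value=5)
item.write(7)
print(item.read())  # 7, whichever replica is chosen
```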
13
Q

Synchronous Replication: Voting

A
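  • n = number of copies, w = number of copies written, r = number of copies read
  • Need r > n - w (i.e., r + w > n), so every set of copies read overlaps the most recently written set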
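A small check of the voting condition, assuming each copy stores a `(version, value)` pair so that a reader can tell which of the copies it read is newest (the version numbers and helper names are assumptions of this sketch). `every_read_overlaps_write` brute-forces all read and write sets to confirm that `r > n - w` is exactly the condition under which they must share a copy.

```python
from itertools import combinations


def quorum_ok(n, r, w):
    """Voting condition from the card: r > n - w (equivalently r + w > n)."""
    return r > n - w


def every_read_overlaps_write(n, r, w):
    """Brute force: does every set of r copies share a copy with every set of w?"""
    return all(set(reads) & set(writes)
               for writes in combinations(range(n), w)
               for reads in combinations(range(n), r))


def vote_read(copies, read_set):
    """copies[i] is (version, value); return the value with the highest version."""
    return max((copies[i] for i in read_set), key=lambda c: c[0])[1]


n, w, r = 5, 3, 3
copies = [(1, "old")] * n
for i in (0, 1, 2):  # a write quorum of w = 3 copies gets version 2
    copies[i] = (2, "new")
print(quorum_ok(n, r, w))            # True
print(vote_read(copies, (2, 3, 4)))  # new (copy 2 is in both sets)
print(quorum_ok(n, 2, w))            # False: reading only copies 3 and 4 gives "old"
```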
14
Q

Global Locks

A
  • Shared and exclusive locks for a logical item
  • No two transactions can hold a global exclusive lock for the same item
  • Any number of transactions can hold a global shared lock for an item
15
Q

Centralized Locking

A
  • One site manages the lock requests for all items in the distributed database
  • The lock site can become a bottleneck
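A sketch of the lock table a single lock site might keep, enforcing the shared/exclusive rules from the Global Locks card. The class `LockSite` and its methods are hypothetical; a real lock manager would queue refused requests instead of simply returning `False`. Because every request for every item goes through this one object, it is easy to see how the lock site becomes a bottleneck.

```python
class LockSite:
    """Hypothetical lock table kept by the single lock site."""

    def __init__(self):
        self.shared = {}     # item -> set of transactions holding a shared lock
        self.exclusive = {}  # item -> transaction holding the exclusive lock

    def request(self, txn, item, mode):
        """Grant (True) or refuse (False) a lock request; mode is "S" or "X"."""
        owner = self.exclusive.get(item)
        if owner is not None and owner != txn:
            return False  # another transaction holds an exclusive lock
        if mode == "S":
            self.shared.setdefault(item, set()).add(txn)
            return True
        if mode == "X":
            if self.shared.get(item, set()) - {txn}:
                return False  # other transactions hold shared locks
            self.exclusive[item] = txn
            return True
        raise ValueError(f"unknown lock mode: {mode}")

    def release(self, txn, item):
        self.shared.get(item, set()).discard(txn)
        if self.exclusive.get(item) == txn:
            del self.exclusive[item]


site = LockSite()
print(site.request("T1", "x", "S"), site.request("T2", "x", "S"))  # True True
print(site.request("T3", "x", "X"))  # False: T1 and T2 hold shared locks
```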
16
Q

Primary-Copy Locking

A
  • One copy of an item is designated the primary copy
  • The site holding the primary copy handles all lock requests for that item
17
Q

Fully Distributed Locking

A
  • A transaction acquires a global lock for an item by locking a sufficient number of the item’s copies
  • n = total copies, x = number locked for global exclusive lock, s = number locked for global shared lock
  • Need x > n / 2
  • Need s > n - x
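A brute-force sanity check of the two conditions above, written as a sketch with hypothetical helper names. `x > n/2` guarantees that any two sets of x locked copies share a copy, so two transactions cannot both obtain the global exclusive lock; `s > n - x` guarantees that a shared-lock set always meets an exclusive-lock set. The loop prints the smallest workable s for each valid x when n = 5.

```python
from itertools import combinations


def locks_ok(n, x, s):
    """Conditions from the card: x > n/2 and s > n - x."""
    return x > n / 2 and s > n - x


def always_overlap(n, a, b):
    """True if every a-subset of the n copies shares a copy with every b-subset."""
    return all(set(p) & set(q)
               for p in combinations(range(n), a)
               for q in combinations(range(n), b))


n = 5
for x in range(n // 2 + 1, n + 1):  # smallest x with x > n/2 up to n
    print(f"x = {x}: need s >= {n - x + 1}")
# x = 3: need s >= 3
# x = 4: need s >= 2
# x = 5: need s >= 1
```

The last line (x = n, s = 1) corresponds to locking every copy for writes and any one copy for reads.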
18
Q

Distributed Deadlock Handling

A
  • Deadlocks that span multiple sites are difficult to detect, so a transaction is rolled back if it waits too long for a lock (timeout)
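A single-process sketch of timeout-based deadlock handling, using Python's `threading.Lock.acquire(timeout=...)` as a stand-in for waiting on a lock held at some site. `run_transaction` and `TimeoutRollback` are hypothetical names. A known drawback of this approach is that a transaction that was merely slow, not deadlocked, also gets rolled back.

```python
import threading


class TimeoutRollback(Exception):
    pass


def run_transaction(name, locks, timeout_s=0.5):
    """Acquire each lock in turn; if any wait exceeds timeout_s, roll back."""
    held = []
    try:
        for lock in locks:
            if not lock.acquire(timeout=timeout_s):
                raise TimeoutRollback(f"{name} waited too long; rolling back")
            held.append(lock)
        return f"{name} committed"
    finally:
        for lock in reversed(held):  # release locks after commit or rollback
            lock.release()


a, b = threading.Lock(), threading.Lock()
a.acquire()  # stand-in for another transaction holding lock a
try:
    run_transaction("T2", [b, a], timeout_s=0.1)
except TimeoutRollback as e:
    print(e)  # T2 waited too long; rolling back
print(b.locked())  # False: T2's lock on b was released during rollback
a.release()
print(run_transaction("T2", [b, a]))  # T2 committed
```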
19
Q

MapReduce

A
  • Splits the collection of records into subcollections that are processed in parallel
20
Q

Benefits of MapReduce

A
  • Parallel processing
  • Less data transferred across machines
  • Fault tolerance
21
Q

Mapper

A
  • Applies a map function to each record to create (key, value) pairs
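A single-process sketch of the map side, assuming the classic word-count task (the map function and helper names are illustrative). `group_by_key` stands in for the grouping step that the framework performs between the mappers and the reducers, so that each key ends up with its list of values.

```python
from collections import defaultdict


def word_count_map(record):
    """Hypothetical map function: emit (word, 1) for each word in a line."""
    return [(word.lower(), 1) for word in record.split()]


def run_mappers(records, map_fn):
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def group_by_key(pairs):
    """Stand-in for the framework's grouping: collect all values per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


lines = ["the cat sat", "the dog sat"]
pairs = run_mappers(lines, word_count_map)
print(pairs)
# [('the', 1), ('cat', 1), ('sat', 1), ('the', 1), ('dog', 1), ('sat', 1)]
print(group_by_key(pairs))
# {'the': [1, 1], 'cat': [1], 'sat': [1, 1], 'dog': [1]}
```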
22
Q

Reducer

A
  • Applies a reduce function to each (key, list of values) pair, where the list holds every value emitted by the mappers for that key
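Continuing the word-count illustration as a sketch: the input below is hard-coded as the grouped (key, list of values) data that reducers would receive, and the reduce function (a hypothetical example) sums each list.

```python
def word_count_reduce(key, values):
    """Hypothetical reduce function: sum the counts for one word."""
    return (key, sum(values))


def run_reducers(groups, reduce_fn):
    """Apply reduce_fn once per (key, list of values) pair."""
    return [reduce_fn(key, values) for key, values in groups.items()]


# Grouped input as the framework would hand it over: one value list per key.
groups = {"the": [1, 1], "cat": [1], "sat": [1, 1], "dog": [1]}
print(run_reducers(groups, word_count_reduce))
# [('the', 2), ('cat', 1), ('sat', 2), ('dog', 1)]
```

In a real cluster, different keys could be reduced by different reducer tasks in parallel, since each call depends only on its own key's values.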
23
Q

Chaining MapReduce Jobs

A
  • Map the results of the first job's reducers to one arbitrary constant key, so the second job has a single reducer task that sees all of them
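A sketch of chaining two jobs, assuming a simple in-memory `map_reduce` helper (hypothetical, single process) in place of a real framework. Job 1 counts words; job 2 maps every `(word, count)` result to the same constant key `"all"` (an arbitrary choice), so one reducer receives all of them and can pick the most frequent word.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Single-process stand-in for one MapReduce job."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return [reduce_fn(key, values) for key, values in groups.items()]


# Job 1: word count.
job1 = map_reduce(
    ["the cat sat", "the dog sat on the mat"],
    lambda line: [(w, 1) for w in line.split()],
    lambda word, counts: (word, sum(counts)),
)

# Job 2: send every (word, count) pair to one constant key, so a single
# reducer sees all of them and returns the pair with the largest count.
job2 = map_reduce(
    job1,
    lambda pair: [("all", pair)],
    lambda _key, pairs: max(pairs, key=lambda p: p[1]),
)
print(job2)  # [('the', 3)]
```

The constant key gives up parallelism in the second job's reduce phase, which is acceptable here because its input is only the small set of results from job 1.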