Week 9 L1 Flashcards

Question 1

Q

Problems with a traditional database?

Answer

A

Single point of failure (SPOF) if the machine, storage, or network breaks.
Must scale up vertically to a bigger machine.
Vertical scaling is expensive, inflexible, and one-way (hard to scale back down).

Question 2

Q

What do we mean by database on cluster?

Answer

A

A database spread across hundreds of connected commodity machines.

Question 3

Q

Advantages of a database on a cluster

Answer

A

Data is replicated across machines to provide resilience.
No SPOF: replicas on other nodes remain available.
Can scale out horizontally by adding more machines (cheaper, more flexible, and can scale in or out, e.g. by renting cloud services).

Question 4

Q

Why replicate data?

Answer

A

Resilience: databases and networks fail, but the business must continue as normal.
Performance improves to some extent, by giving access to a local replica or by balancing the workload across replicas.

Question 5

Q

What is synchronous replication?

Answer

A

All replicas are updated on every write.
Reads are guaranteed to be up to date, so it is safe to read from any node.
Each write must wait for all machines to be updated, which can be too slow for some applications.
Only used if reads MUST be up to date.
Works best when writes are relatively infrequent, e.g. online banking.
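
A minimal sketch of the idea, assuming hypothetical `Replica` and `SyncReplicatedStore` classes (not any real database API). The write loop stands in for "wait until every replica has acknowledged", so afterwards any replica can serve the read.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class SyncReplicatedStore:
    """Toy synchronous replication: a write completes only after all replicas apply it."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # Apply the write to every replica before returning (models waiting for all acks).
        for replica in self.replicas:
            replica.data[key] = value
        return True

    def read(self, key, replica_index):
        # Safe to read from any replica: they all hold the latest value.
        return self.replicas[replica_index].data.get(key)


store = SyncReplicatedStore([Replica("a"), Replica("b"), Replica("c")])
store.write("balance:alice", 100)
print([store.read("balance:alice", i) for i in range(3)])  # [100, 100, 100]
```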

Question 6

Q

What is asynchronous replication?

Answer

A

Writes propagate to the other replicas as soon as possible, but reads do not wait for them.
This means a read can be out of date.
This gives eventual consistency: the replicas converge once the writes have propagated.
Works well if reads can be a little out of date, e.g. social media posts.
Asynchronous methods include primary site and peer-to-peer.
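
A minimal sketch, assuming a hypothetical `AsyncReplicatedStore` with a queue of pending writes. Reads never wait, so a replica can return stale data until the background `propagate()` step runs, after which all replicas agree (eventual consistency).

```python
from collections import deque


class AsyncReplicatedStore:
    """Toy asynchronous replication: writes return before other replicas are updated."""

    def __init__(self, n_replicas):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.pending = deque()  # writes not yet applied to replicas 1..n-1

    def write(self, key, value):
        # Apply on replica 0 and return immediately; propagation is deferred.
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # Reads never wait, so they may see a stale or missing value.
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Background step: push queued writes to all other replicas.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas[1:]:
                replica[key] = value


store = AsyncReplicatedStore(3)
store.write("post:42", "Hello!")
print(store.read("post:42", 2))  # None (replica 2 has not caught up yet)
store.propagate()
print(store.read("post:42", 2))  # Hello!
```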

Question 7

Q

What is primary site method?

Answer

A

Used by MongoDB.
One replica is the primary node and the other nodes are secondary nodes.
All writes go to the primary node and are then propagated to the secondaries.
Secondaries can be read but not written.
Not a SPOF: if the primary fails, the other nodes elect a new primary.
Reading from the primary gives strict consistency, whereas reading from a secondary may return stale data.
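
A toy sketch of the primary site roles and failover, using a hypothetical `ReplicaSet` class (not MongoDB's driver API). To keep it short, propagation happens inside `write()`; in practice it is asynchronous, which is why secondaries can lag. The "election" simply picks the lowest-named live node, an assumption for illustration only.

```python
class ReplicaSet:
    """Toy primary site method: one primary accepts writes, secondaries are read-only."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.alive = set(node_names)
        self.primary = node_names[0]

    def write(self, node, key, value):
        # Only the primary accepts writes.
        if node != self.primary:
            raise PermissionError(f"{node} is a secondary and cannot accept writes")
        self.nodes[node][key] = value
        # Propagate to live secondaries (immediately here, asynchronously in practice).
        for name in self.alive:
            if name != node:
                self.nodes[name][key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)

    def fail(self, node):
        # If the primary fails, the remaining nodes elect a new primary.
        self.alive.discard(node)
        if node == self.primary:
            self.primary = min(self.alive)  # toy election: lowest name wins


rs = ReplicaSet(["n1", "n2", "n3"])
rs.write("n1", "x", 1)
rs.fail("n1")
print(rs.primary)          # n2
rs.write("n2", "x", 2)
print(rs.read("n3", "x"))  # 2
```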

Question 8

Q

Why is eventual consistency useful?

Answer

A

Reads can be spread across multiple secondary nodes, which increases performance.
Offline analytics can read historical data from a secondary node to avoid overloading the primary.
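
A minimal sketch of spreading read load, assuming hypothetical secondary node names and a simple round-robin policy. Because stale reads are acceptable, none of these reads needs to touch the primary.

```python
import itertools
from collections import Counter

# Hypothetical secondary nodes that serve read-only traffic.
secondaries = ["secondary-1", "secondary-2", "secondary-3"]
_rotation = itertools.cycle(secondaries)


def route_read():
    """Pick the secondary that should serve the next read (round-robin)."""
    return next(_rotation)


load = Counter(route_read() for _ in range(9))
print(load)  # each secondary serves 3 of the 9 reads
```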

Question 9

Q

What is the peer-to-peer method? When is it used?

Answer

A

All nodes are allowed to accept reads and writes (no primary node).
This reduces latency in systems with a high write rate (no primary node bottleneck).
Can cause inconsistency problems, e.g. where two peers receive conflicting updates.
Used for high-velocity, write-once apps or where each piece of data has a single owner.
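
A sketch of how a conflict arises, assuming hypothetical `Peer` objects and a `sync()` step. The resolution policy shown, last-writer-wins by timestamp, is only one possible choice (an assumption here) and silently discards the older update.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (value, timestamp)

    def write(self, key, value, ts):
        # Any peer accepts writes directly: there is no primary to go through.
        self.data[key] = (value, ts)


def sync(a, b):
    """Exchange data between two peers and return the keys that conflicted.

    Conflicts are resolved with last-writer-wins (highest timestamp).
    """
    conflicts = []
    for key in set(a.data) | set(b.data):
        va, vb = a.data.get(key), b.data.get(key)
        if va and vb and va[0] != vb[0]:
            conflicts.append(key)
        winner = max((v for v in (va, vb) if v), key=lambda v: v[1])
        a.data[key] = b.data[key] = winner
    return conflicts


p1, p2 = Peer("p1"), Peer("p2")
p1.write("seat:12A", "alice", ts=1)
p2.write("seat:12A", "bob", ts=2)  # conflicting update on another peer
print(sync(p1, p2))                # ['seat:12A']
print(p1.data["seat:12A"])         # ('bob', 2)
```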

Question 10

Q

What is sharding?

Answer

A

Partitioning a database into subsets of data so the data is spread across the nodes in a cluster.
For example, data might be split by location.
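
A minimal sketch of splitting by location, assuming a hypothetical `region` field and shard names. Each document is routed to exactly one shard based on its region.

```python
# Hypothetical mapping from a "region" field to shard names.
SHARD_BY_REGION = {"EU": "shard-eu", "US": "shard-us", "APAC": "shard-apac"}


def shard_for(document):
    """Route a document to a shard based on its location."""
    return SHARD_BY_REGION[document["region"]]


customers = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]
shards = {}
for doc in customers:
    shards.setdefault(shard_for(doc), []).append(doc["id"])
print(shards)  # {'shard-eu': [1, 3], 'shard-us': [2]}
```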

Question 11

Q

Advantages of sharding

Answer

A

Partitions the database into subsets of data, allowing the data (and the read/write load) to be spread across the nodes in a cluster.

Question 12

Q

What is a sharding key?

Answer

A

The sharding key determines how records are distributed among the shards.
It is based on one or more chosen fields.
Shard keys can be chosen manually.

Question 13

Q

How to choose a sharding key?

Answer

A

Sharding key must appear in every document.
Key should be splittable with high granularity (many distinct values).
Key should be uniformly distributed across records.
Key should relate to the common queries, for fast performance.
Use a compound key if no single key is suitable.
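
A sketch that checks a candidate key against the criteria above, assuming a hypothetical `evaluate_shard_key` helper. The `max_share` figure (largest fraction of documents sharing one key value) is an illustrative heuristic for uniformity, not a standard metric. It shows a low-granularity key failing and a compound key doing better.

```python
from collections import Counter


def evaluate_shard_key(docs, fields):
    """Score a candidate (possibly compound) shard key.

    Returns None if any document lacks one of the fields; otherwise the number
    of distinct key values (granularity) and the largest fraction of documents
    sharing one value (lower means a more uniform spread).
    """
    if not all(all(f in d for f in fields) for d in docs):
        return None  # the shard key must appear in every document
    counts = Counter(tuple(d[f] for f in fields) for d in docs)
    return {
        "distinct_values": len(counts),
        "max_share": round(max(counts.values()) / len(docs), 2),
    }


orders = [{"country": "UK", "customer_id": c} for c in range(1, 7)] + [
    {"country": "FR", "customer_id": 7}
]

print(evaluate_shard_key(orders, ["country"]))
# {'distinct_values': 2, 'max_share': 0.86}  (coarse and skewed)
print(evaluate_shard_key(orders, ["country", "customer_id"]))
# {'distinct_values': 7, 'max_share': 0.14}  (compound key spreads evenly)
print(evaluate_shard_key(orders, ["coupon"]))  # None: field missing
```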

Question 14

Q

What is query isolation?

Answer

A

For a query where the shard key values determine a single shard, reads and writes are faster.
For queries that don't include the shard key, all shards must be polled, so these queries take longer to complete.
Knowing the application's significant queries is important when choosing a shard key.
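
A toy query router, assuming hypothetical shards keyed by a `region` field. A filter that includes the shard key is sent to one shard; a filter without it has to poll every shard (scatter-gather).

```python
# Toy cluster: hypothetical shards keyed by "region" (the shard key).
SHARDS = {
    "shard-eu": [{"region": "EU", "name": "Ana"}, {"region": "EU", "name": "Ben"}],
    "shard-us": [{"region": "US", "name": "Cal"}],
    "shard-apac": [{"region": "APAC", "name": "Dev"}],
}
SHARD_FOR_REGION = {"EU": "shard-eu", "US": "shard-us", "APAC": "shard-apac"}


def run_query(filter_):
    """Return (matching documents, number of shards contacted)."""
    if "region" in filter_:
        # Targeted query: the shard key value identifies a single shard.
        targets = [SHARD_FOR_REGION[filter_["region"]]]
    else:
        # Scatter-gather: without the shard key every shard must be polled.
        targets = list(SHARDS)
    matches = [
        doc
        for shard in targets
        for doc in SHARDS[shard]
        if all(doc.get(k) == v for k, v in filter_.items())
    ]
    return matches, len(targets)


print(run_query({"region": "EU", "name": "Ben"})[1])  # 1 shard contacted
print(run_query({"name": "Dev"})[1])                  # 3 shards contacted
```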

Question 15

Q

Range-based sharding keys

Answer

A

Divides data into contiguous ranges determined by the shard key value. Documents with close shard key values are likely to be in the same chunk or shard.
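
A minimal sketch of range-based chunk lookup, assuming hypothetical chunk boundaries on a numeric key such as `customer_id`. `bisect_right` finds which contiguous range a value falls into.

```python
import bisect

# Hypothetical chunk boundaries on a numeric shard key:
# chunk-0: key < 1000, chunk-1: 1000-1999, chunk-2: 2000-2999, chunk-3: key >= 3000.
BOUNDS = [1000, 2000, 3000]
CHUNKS = ["chunk-0", "chunk-1", "chunk-2", "chunk-3"]


def chunk_for(key):
    """Find the contiguous range (chunk) that contains this shard key value."""
    return CHUNKS[bisect.bisect_right(BOUNDS, key)]


print([chunk_for(k) for k in (5, 999, 1000, 1001, 2500, 9999)])
# ['chunk-0', 'chunk-0', 'chunk-1', 'chunk-1', 'chunk-2', 'chunk-3']
```

Close values such as 5 and 999 share a chunk, but 999 and 1001 straddle a boundary, which is why the card says "likely" rather than "always".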

Question 16

Q

Hash based sharding keys

Answer


A

Uses a hashed index of a single field as the shard key to partition data across the sharded cluster. Provides a more even data distribution across the sharded cluster.
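
A minimal sketch of hash-based placement, assuming a hypothetical `hashed_shard` helper and 4 shards. MD5 from `hashlib` is used purely for illustration and is not claimed to match any particular database's hashed-index function.

```python
import hashlib
from collections import Counter

N_SHARDS = 4


def hashed_shard(value, n_shards=N_SHARDS):
    """Map a single field's value to a shard by hashing it (illustrative MD5)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


# Consecutive order IDs are scattered across shards, so counts come out roughly equal.
counts = Counter(hashed_shard(order_id) for order_id in range(10_000))
print(sorted(counts))  # [0, 1, 2, 3]
```

The trade-off against ranges: neighbouring keys no longer share a shard, so a range query on the key must poll every shard.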
