Week 9 L1 Flashcards

1
Q

Problems with a traditional database?

A

Single point of failure (SPOF) if the machine, storage, or network breaks.
Must scale up vertically to a bigger machine.
Expensive, inflexible, and one-way (hard to scale back down).

2
Q

What do we mean by a database on a cluster?

A

A database running on hundreds of connected commodity machines.

3
Q

Advantages of a database on a cluster?

A

Data is replicated across machines to provide resilience.
No SPOF: replicas on other nodes remain available.
Can scale out horizontally by adding more machines (cheaper and flexible; can scale in or out, e.g. by renting cloud services).

4
Q

Why replicate data?

A

Resilience: databases and networks fail, but the business must continue as normal.
Performance improves to some extent, by allowing access to a local replica or by balancing the workload.

5
Q

What is synchronous replication?

A

All replicas are updated on every write.
Reads are guaranteed to be up to date, so it is safe to read from any node.
Each write must wait for all machines to be updated, so it can be too slow for some applications.
Only used if reads MUST be up to date.
Works best for workloads with few writes, e.g. online banking.
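A minimal in-memory sketch of synchronous replication, assuming each replica is a plain Python dict; the `SyncCluster` class is hypothetical and stands in for the database. `write` only returns after every replica holds the new value, which is why any replica can then serve an up-to-date read, and why write latency grows with the number of replicas.

```python
class SyncCluster:
    """Toy cluster: every write is applied to all replicas before it returns."""

    def __init__(self, n_replicas: int) -> None:
        self.replicas = [dict() for _ in range(n_replicas)]

    def write(self, key, value) -> None:
        # The write is only acknowledged once every replica is updated,
        # so it is as slow as updating all of them.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key, replica_id: int):
        # Safe to read from any node: all replicas hold the same data.
        return self.replicas[replica_id].get(key)


cluster = SyncCluster(n_replicas=3)
cluster.write("balance:alice", 100)
print([cluster.read("balance:alice", i) for i in range(3)])  # [100, 100, 100]
```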

6
Q

What is asynchronous replication?

A

Writes propagate as soon as possible, but reads do not wait for them.
This means reads can be out of date.
Gives eventual consistency: replicas converge once all updates have propagated.
Works well if reads can be a little out of date, e.g. social media posts.
Methods include primary site and peer-to-peer.
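A toy sketch of asynchronous replication, assuming dict-based replicas and a hypothetical `AsyncCluster` class: a write is acknowledged after updating one replica, the other replicas are updated later by a background `propagate` step, and reads never wait, so they can briefly return stale data.

```python
from collections import deque


class AsyncCluster:
    """Toy cluster: writes hit one replica at once; the others catch up later."""

    def __init__(self, n_replicas: int) -> None:
        self.replicas = [dict() for _ in range(n_replicas)]
        self.pending = deque()  # updates not yet applied to the other replicas

    def write(self, key, value) -> None:
        # Acknowledge immediately after updating replica 0; queue the rest.
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def read(self, key, replica_id: int):
        # Reads never wait, so they may return out-of-date data.
        return self.replicas[replica_id].get(key)

    def propagate(self) -> None:
        # Background step: apply queued updates, in order, to the other replicas.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas[1:]:
                replica[key] = value


cluster = AsyncCluster(n_replicas=3)
cluster.write("post:1", "hello")
print(cluster.read("post:1", 2))  # None (replica 2 has not caught up yet)
cluster.propagate()
print(cluster.read("post:1", 2))  # hello (eventually consistent)
```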

7
Q

What is primary site method?

A

Used by MongoDB.
One replica is the primary node and the other nodes are secondary nodes.
All writes go to the primary node and are then propagated to the secondaries.
Secondaries can be read but not written.
No SPOF: if the primary fails, the other nodes select a new primary.
Reading from the primary gives strict consistency, whereas reading from a secondary may return stale data.
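A toy sketch of the primary-site method, assuming dict-based nodes and a hypothetical `PrimarySiteCluster` class. Writes to a secondary are rejected, secondaries are stale until `replicate` runs, and when the primary fails a new one is chosen. The election rule here (lowest-numbered live node wins) is an invented simplification, not MongoDB's actual election protocol.

```python
class PrimarySiteCluster:
    """Toy primary-site replication: one primary takes writes, secondaries are read-only."""

    def __init__(self, n_nodes: int) -> None:
        self.nodes = [dict() for _ in range(n_nodes)]
        self.alive = [True] * n_nodes
        self.primary = 0  # node 0 starts as the primary

    def write(self, node_id: int, key, value) -> None:
        # All writes must go to the primary node.
        if node_id != self.primary:
            raise PermissionError("secondaries can be read but not written")
        self.nodes[self.primary][key] = value

    def replicate(self) -> None:
        # Copy the primary's data to every live secondary. Until this runs,
        # reads from a secondary may be stale.
        for i, node in enumerate(self.nodes):
            if i != self.primary and self.alive[i]:
                node.update(self.nodes[self.primary])

    def read(self, node_id: int, key):
        return self.nodes[node_id].get(key)

    def fail(self, node_id: int) -> None:
        self.alive[node_id] = False
        if node_id == self.primary:
            # Hypothetical election rule: the lowest-numbered live node takes over.
            self.primary = self.alive.index(True)


cluster = PrimarySiteCluster(n_nodes=3)
cluster.write(0, "user:1", "Ann")
print(cluster.read(1, "user:1"))   # None: the secondary has not been updated yet
cluster.replicate()
print(cluster.read(1, "user:1"))   # Ann
cluster.fail(0)                    # the primary fails...
print(cluster.primary)             # 1: ...and a new primary is chosen
cluster.write(1, "user:2", "Bob")  # writes now go to node 1
```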

8
Q

Why is eventual consistency useful?

A

Reads can be spread across multiple secondary nodes, which increases performance.
Offline analytics can read historical data from a secondary node to avoid overloading the primary.
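One simple way to spread reads across secondaries is round-robin routing. This is a sketch under assumptions: the `ReadRouter` class and the node names are hypothetical, and real drivers offer richer read-routing policies. Because the reads tolerate eventual consistency, any secondary may answer them, leaving the primary free for writes.

```python
from itertools import cycle


class ReadRouter:
    """Toy router that sends each read to the next secondary in turn."""

    def __init__(self, secondaries: list[str]) -> None:
        self._next = cycle(secondaries)

    def pick_node(self) -> str:
        # Rotating through secondaries means no single node takes all the read load.
        return next(self._next)


router = ReadRouter(["secondary-1", "secondary-2", "secondary-3"])
print([router.pick_node() for _ in range(5)])
# ['secondary-1', 'secondary-2', 'secondary-3', 'secondary-1', 'secondary-2']
```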

9
Q

What is the peer-to-peer method? When is it used?

A

All nodes can accept reads and writes (no primary node).
This reduces latency in systems with a high write rate (no primary-node bottleneck).
Can cause inconsistency problems, e.g. when two peers receive conflicting updates.
Used for high-velocity, write-once apps or where each piece of data has one owner.

10
Q

What is sharding?

A

Partitioning a database into subsets of data so the data is spread across the nodes in a cluster.
Might split data by location.
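A minimal sketch of splitting data by location, assuming each record has a hypothetical `region` field and a hypothetical fixed mapping from region to node. Each node ends up storing only its own subset of the records.

```python
from collections import defaultdict

# Hypothetical mapping from a location field to the node holding that shard.
SHARD_FOR_REGION = {"EU": "node-1", "US": "node-2", "ASIA": "node-3"}


def shard_by_location(records: list[dict]) -> dict[str, list[dict]]:
    """Partition records so each node stores only its region's subset."""
    shards = defaultdict(list)
    for record in records:
        shards[SHARD_FOR_REGION[record["region"]]].append(record)
    return dict(shards)


customers = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]
for node, subset in shard_by_location(customers).items():
    print(node, [r["id"] for r in subset])
# node-1 [1, 3]
# node-2 [2]
```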

11
Q

Advantages of sharding

A

Partitions the database into subsets of data, allowing the data (and therefore the storage and query load) to be spread across the nodes in a cluster.

12
Q

What is a sharding key?

A

This determines the distribution of records among the shards.
Based on one or more chosen fields
Shard keys can be chosen manually

13
Q

How to choose a sharding key?

A

The sharding key must appear in every document.
The key should be splittable with high granularity (many distinct values).
Key values should be uniformly distributed across records.
The key should relate to common queries for fast performance.
Use a compound key if no single key is suitable.
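Three of these criteria can be checked roughly against sample data. The `assess_shard_key` helper below is hypothetical and its measures are illustrative heuristics: presence in every document, number of distinct values (granularity), and the share of records taken by the most common value (uniformity). Whether a key relates to the important queries depends on the workload and is not captured here.

```python
from collections import Counter


def assess_shard_key(docs: list[dict], key: str) -> dict:
    """Rough checks for a candidate shard key (illustrative heuristics only)."""
    missing = sum(1 for d in docs if key not in d)
    counts = Counter(d[key] for d in docs if key in d)
    present = len(docs) - missing
    return {
        "in_every_document": missing == 0,
        # Many distinct values means the data can be split finely.
        "distinct_values": len(counts),
        # Share held by the most common value; close to 1/distinct_values is uniform.
        "max_value_share": max(counts.values()) / present if present else 0.0,
    }


orders = [
    {"order_id": 1, "country": "UK"},
    {"order_id": 2, "country": "UK"},
    {"order_id": 3, "country": "UK"},
    {"order_id": 4, "country": "FR"},
]
print(assess_shard_key(orders, "country"))
# {'in_every_document': True, 'distinct_values': 2, 'max_value_share': 0.75}
print(assess_shard_key(orders, "order_id"))
# {'in_every_document': True, 'distinct_values': 4, 'max_value_share': 0.25}
```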

14
Q

What is query isolation?

A

For a query whose shard key values determine a single shard, reads and writes are faster.
For queries that don't include the shard key, all shards must be polled, so these queries take longer to complete.
Knowing the application's significant queries is important for choosing the shard key.
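A toy query router illustrating query isolation, under stated assumptions: three in-memory shards, a hypothetical `customer_id` shard key, and a simple modulo placement rule. A query that includes the shard key is sent to one shard; a query without it has to poll every shard.

```python
# Hypothetical cluster: three shards, documents placed by their "customer_id" shard key.
NUM_SHARDS = 3
shards = [[] for _ in range(NUM_SHARDS)]


def shard_for(customer_id: int) -> int:
    return customer_id % NUM_SHARDS  # simple placement rule for the sketch


def insert(doc: dict) -> None:
    shards[shard_for(doc["customer_id"])].append(doc)


def find(query: dict) -> tuple[list[dict], int]:
    """Return matching docs and how many shards had to be polled."""
    if "customer_id" in query:
        targets = [shard_for(query["customer_id"])]  # isolated: one shard only
    else:
        targets = list(range(NUM_SHARDS))  # no shard key: poll every shard
    hits = [
        doc
        for i in targets
        for doc in shards[i]
        if all(doc.get(k) == v for k, v in query.items())
    ]
    return hits, len(targets)


for cid, status in [(1, "paid"), (2, "open"), (4, "paid")]:
    insert({"customer_id": cid, "status": status})

print(find({"customer_id": 4})[1])  # 1 shard polled
print(find({"status": "paid"})[1])  # 3 shards polled
```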

15
Q

Range-based sharding keys

A

Divides data into contiguous ranges determined by the shard key value. Documents with close shard key values are likely to be in the same chunk or shard.
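A minimal sketch of range-based placement, assuming a numeric shard key and hypothetical, fixed split points. Binary search over the split points finds the contiguous range that contains a key, so nearby keys land in the same chunk and a range query only needs the chunks its range overlaps.

```python
from bisect import bisect_right

# Hypothetical split points on a numeric shard key, giving three contiguous chunks:
# chunk 0: key < 100, chunk 1: 100 <= key < 200, chunk 2: key >= 200.
SPLIT_POINTS = [100, 200]


def chunk_for(shard_key: int) -> int:
    """Range-based placement: return the chunk whose range contains the key."""
    return bisect_right(SPLIT_POINTS, shard_key)


def chunks_for_range(low: int, high: int) -> list[int]:
    """Chunks a range query low <= key <= high must visit (contiguous by construction)."""
    return list(range(chunk_for(low), chunk_for(high) + 1))


print([chunk_for(k) for k in (5, 99, 100, 150, 250)])  # [0, 0, 1, 1, 2]
print(chunks_for_range(90, 110))                        # [0, 1]
print(chunks_for_range(120, 180))                       # [1]
```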

16
Q

Hash-based sharding keys

A

Uses a hashed index of a single field as the shard key to partition data across the sharded cluster. Provides a more even data distribution across the sharded cluster.
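A minimal sketch of hash-based placement, assuming a string shard key and a hypothetical cluster of four chunks. MD5 is used here only because it is stable across runs; this is not a claim about the hash function any particular database uses. Sequential keys, which range-based sharding would keep together, are scattered evenly across the chunks.

```python
import hashlib
from collections import Counter

NUM_CHUNKS = 4  # hypothetical number of chunks in the cluster


def chunk_for(shard_key: str) -> int:
    """Hash-based placement: hash the key, then map the hash onto a chunk."""
    # MD5 gives the same result on every run (unlike Python's built-in hash() for str).
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CHUNKS


# Sequential keys end up spread across all chunks.
counts = Counter(chunk_for(f"order-{i}") for i in range(1000))
print(sorted(counts.items()))  # each chunk gets roughly a quarter of the keys
```

The trade-off is that documents with close key values no longer sit together, so a range query on the shard key generally has to visit every chunk.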