Challenges with Relational Database Flashcards

1
Q

Throw hardware at the problem by adding more memory, adding faster processors, and upgrading disks

A

Vertical Scaling

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What happens when you max out on vertical scaling? And what are the new challenges you face because of that?

A

Add hardware in the form of additional boxes in a database cluster. Now we have new problems like:

  1. Data Replication
  2. Consistency during regular usage and failover times
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is the problem that arises when keeping a caching layer in front of a database?

A

Consistency problem between updates in cache and updates in database, which is even more amplified in a cluster.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

2 main reasons why relational databases were successful?

A
  1. SQL

2. Transactions and ACID

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is atomic in ACID?

A

All or nothing. The transfer requires a debit from one account and a credit to another account. This operation cannot be subdivided; they must both succeed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is consistent in ACID?

A

Data moves from one correct state to another correct state. if a transaction attempts to delete a customer and their order history, it cannot leave order rows that reference the deleted customer’s primary key; this is an inconsistent state

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is isolate in ACID?

A

Transactions executing concurrently will not become entangled with each other; they each execute in their own space. If two different transactions attempt to modify the same data at the same time, then one of them will have to wait for the other to complete.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is durable in ACID?

A

Once a transaction has succeeded, the changes will not be lost.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How do we handle transactions spanning multiple shards of a database?

A

A transaction manager to orchestrate across the multiple nodes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is 2 phase commit?

A

It is used to achieve consensus in distributed systems with 2 sets of interactions between hosts known as Prepare phase and commit phase.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What kind of operations is suitable for 2 PC?

A

It is useful only for operations that can complete very quickly

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How does 2 phase commit works?

A

Two-phase commit blocks. clients (“competing consumers”) must wait for a prior transaction to finish before they can access the blocked resource.
The protocol will wait for a node to respond, even if it has died. Possible to avoid waiting forever in this event, because a timeout can be set that allows the transaction coordinator node to decide that the node isn’t going to respond and that it should abort the transaction.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Is there any chance of an infinite loop in 2 PC?

A

Yes. A node can send a message to the transaction coordinator node agreeing that it’s OK for the coordinator to commit the entire transaction. The node will then wait for the coordinator to send a commit response (or a rollback response if, say, a different node can’t commit); if the coordinator is down in this scenario, that node conceivably will wait forever.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How is the shortcoming of 2 PC handled?

A

With an idea of compensation. The operation is immediately committed, and then in the event that some error is reported, a new operation is invoked to restore the proper state.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How the database schema gets polluted with complex relationships?

A

For many-many relation, we have to join the table. you have to create a join table. This pollutes a pristine data model, where you’d prefer to just have students and courses. forces you to create more complex SQL statements to join these tables together. The join statements, in turn, can be slow.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Three strategies for determining shard structure??

A
  1. Feature-based Shard
  2. Key-based Shard
  3. Lookup-Table
17
Q

What is feature-based sharding?

A

the data is split not by dividing records in a single table, but rather by splitting into separate databases the features that don’t overlap with each other very much. E.g the users are in one shard, and the items for sale are in another

18
Q

What is key-based sharding?

A

you find a key in your data that will evenly distribute it across shards

19
Q

What is lookup table sharding?

A

A.K.A directory-based sharding. one of the nodes in the cluster acts as a “Yellow Pages” directory. and looks up which node has the data you’re trying to access.
Disadvantage:
1. additional hop.
2.single point of failure.

20
Q

what is shard-nothing architecture?

A

When there is no shared state but each node in the distributed system is independent. there is no client contention for shared resources. E.g Bigtable database, MapReduce Implementation or Cassandra.

21
Q

How is scalability handled in RDBMS?

A

We often need to find a way to get rid of your joins, which means denormalizing the data, which means maintaining multiple copies of data, and seriously disrupting your design, both in the database and in your application.

22
Q

Different types of NoSQL DB?

A
  1. Key-Value Store: Redis
  2. Column Stores: Wide column store or column-oriented store. Data is stored by column rather than by row. e.g all customer addresses might be stored together, allowing them to be retrieved in a single query. (Apache Hbase, Apache Kudu, Apache Druid)
  3. Document Store: MongoDB, CouchDB
  4. Graph Database: Neo4j
  5. Object Database- Amazon S3
  6. Multimodel Database: They are based on one primary underlying database(Most often relational, key-value, or column store) and expose additional models as an API on top of that underlying database.
    e. g Azure Cosmos: Exposes document, wide column, and graph API on top of key-value store.
23
Q

What is polyglot persistence?

A

Multimodel database