Storage and Databases Flashcards

Question 1

Q

Relational Databases - Basics

Answer

A

Disk types:
- SSDs are faster than HDDs, but more expensive per gigabyte
- SSDs are typically used for frequently accessed/modified data, and HDDs for rarely accessed/modified data

Relational database:
- A structured database where data is stored in tabular format
- Not all relational databases support SQL

Non-relational/NoSQL database: doesn’t impose a tabular structure on the data

Question 2

Q

Relational Databases - ACID

Answer

A

Atomicity: the operations of a transaction either all succeed or all fail
Consistency:
- The transaction cannot bring the database to an invalid state
- After the transaction is committed or rolled back, the rules (constraints) on each record still apply, and all future transactions see the effect of the transaction
Isolation: the execution of multiple transactions concurrently will have the same effect as if they had been executed sequentially
Durability: once a transaction is committed, it’s written to non-volatile storage, so it won’t be undone by a crash or power failure
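
A minimal sketch of atomicity using Python’s built-in sqlite3 module. The accounts table and the transfer function are hypothetical. The credit is applied before the debit on purpose: when the debit violates the CHECK constraint, the already-applied credit is rolled back too. It uses an in-memory database, so it illustrates atomicity only, not durability.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical accounts table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts; both updates commit together or not at all."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            # Fails the CHECK constraint if src would go negative.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer of 30 from alice to bob succeeds; a transfer of 500 fails and leaves both balances exactly as they were.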

Question 3

Q

Relational Databases - Indexes

Answer

A

Allow certain queries to run faster
Most commonly associated with relational databases, though many NoSQL databases support them too
They greatly speed up read queries, at the cost of slightly slower writes, because every write must also update the relevant indexes
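
A sketch using Python’s sqlite3 module to show an index changing how a query runs. The users table and the index name are hypothetical. SQLite’s EXPLAIN QUERY PLAN reports a full table scan before the index exists and an index search afterwards. The exact wording of the plan text varies between SQLite versions, so only the index name is checked.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    # EXPLAIN QUERY PLAN returns rows whose last column describes each step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
plan_before = query_plan(conn, lookup, ("user42@example.com",))  # scans the whole table

# From now on, every INSERT/UPDATE on users also has to maintain this index.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, ("user42@example.com",))  # searches the index
```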

Question 4

Q

Relational Databases - Consistency types and tools

Answer

A

Strong consistency: usually refers to the consistency of ACID transactions, as opposed to eventual consistency
Eventual consistency:
- Reads might return a stale view of the system
- Guarantees that the state of the database eventually reflects all writes, usually within seconds or minutes (there is no fixed time bound)
Tools: Postgres, MySQL, MSSQL
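
A toy model of eventual consistency, assuming one primary and one replica with asynchronous replication. The class and method names are hypothetical. A write is acknowledged once the primary has it, so a read from the replica can be stale until replication catches up. Always reading from the primary would instead give the reader the latest committed value.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy primary/replica pair with asynchronous replication (hypothetical design)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        # Acknowledged as soon as the primary has applied it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        # May return a stale value (or None) until replicate() has run.
        return self.replica.get(key)

    def replicate(self):
        # In a real system this would run in the background, after some delay.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value
```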

Question 5

Q

Key-Value Stores

Answer

A

A flexible NoSQL database that stores data as key-value pairs; it’s often used for caching and dynamic configuration.
Tools: etcd, Redis, ZooKeeper
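
A minimal in-process sketch of a key-value cache with per-key expiry, similar in spirit to setting a key with a TTL in Redis. This is not Redis’s API; the TTLCache class is hypothetical. The clock is injectable so expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Key-value cache where each entry may expire after `ttl` seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        # ttl=None means the entry never expires (e.g. a configuration value).
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Lazily evict the expired entry on read.
            del self._data[key]
            return default
        return value
```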

Question 6

Q

Blob (Binary Large Object) Storage

Answer

A

They store and retrieve unstructured data by the name of the blob
They are typically slower than KV stores, but values can be megabytes or gigabytes in size
Used to store large binaries, database snapshots, images, or other static assets a website might have
Building blob storage at scale requires infrastructure that typically only giant companies have, so most teams use a managed cloud service
Tools: Google Cloud Storage, Amazon S3
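
A toy, local-filesystem sketch of the blob storage idea: opaque bytes are written and read back by blob name only, with no schema. LocalBlobStore and its put/get methods are hypothetical; they are not the API of Google Cloud Storage or Amazon S3, which offer the same name-based access over HTTP at far larger scale.

```python
import hashlib
import tempfile
from pathlib import Path


class LocalBlobStore:
    """Stores opaque bytes under a blob name (hypothetical, single machine)."""

    def __init__(self, root: Path):
        self.root = root

    def _path(self, name: str) -> Path:
        # Hash the name so any string (including "/") maps to a safe flat file name.
        return self.root / hashlib.sha256(name.encode()).hexdigest()

    def put(self, name: str, data: bytes) -> None:
        self._path(name).write_bytes(data)

    def get(self, name: str) -> bytes:
        return self._path(name).read_bytes()


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalBlobStore(Path(tmp))
        store.put("snapshots/db-2024-01-01.bak", b"\x00\x01" * 1024)
        return store.get("snapshots/db-2024-01-01.bak")
```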

Question 7

Q

Time Series database (TSDB) and Spatial database

Answer

A

Time Series database (TSDB):
- Optimized for storing and analyzing time-indexed data
- Time-indexed data: data points that are each tied to a specific moment in time
- Tools: InfluxDB, Prometheus
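
A typical time series query is downsampling: aggregating raw points into fixed time windows. Below is a plain-Python sketch of averaging (timestamp, value) pairs per window. It is not InfluxDB or Prometheus query syntax; the function name and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def downsample_avg(points: Iterable[Tuple[int, float]], bucket_seconds: int) -> Dict[int, float]:
    """Average (unix_timestamp, value) points into fixed-size time buckets."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in points:
        bucket = ts - ts % bucket_seconds  # start of the bucket containing ts
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical CPU readings: (seconds since start, percent used).
cpu = [(0, 10.0), (15, 20.0), (59, 30.0), (60, 50.0), (119, 70.0)]
per_minute = downsample_avg(cpu, 60)  # {0: 20.0, 60: 60.0}
```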

Spatial database:
- Optimized for storing and querying spatial data, like locations on a map
- Rely on spatial indexes such as quadtrees to quickly perform queries like finding all locations in the vicinity of a region

Question 8

Q

Graph Database 1

Answer

A

Stores data following the graph data model
Data entries can have explicitly defined relationships
Can perform complex queries quickly on deeply connected data
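
A sketch of the kind of traversal a graph database is built for: "everyone within N hops of a user". The follows graph is hypothetical and stored as an adjacency list, so each hop just follows stored relationships. In a relational database, the same query typically needs one self-join per hop.

```python
from collections import deque
from typing import Dict, List


def within_hops(graph: Dict[str, List[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search returning each node reachable within max_hops and its distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
}
```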

Question 9

Q

Graph Database 2

Answer

A

Often preferred over relational databases when dealing with data points that naturally form a graph and have multiple levels of relationships
Cypher:
- It’s a graph query language developed for the Neo4j graph database
- It’s been adopted by other graph databases through the openCypher project, and it strongly influenced GQL, the ISO standard graph query language
Tools: Neo4j

Question 10

Q

Quadtree 1

Answer

A

A tree data structure that’s most commonly used to index two-dimensional spatial data
Each node has either zero (a leaf node) or four child nodes
Nodes:
- Contain some form of spatial data, like locations on a map, with a specified maximum capacity
- While a node isn’t at capacity, it remains a leaf node
- Once it reaches capacity, it’s given four child nodes, and its data entries are split among those children
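
A minimal point quadtree sketch following the rules above. In this version a leaf splits when a new point arrives while the leaf is already full. The class, its fields, and the max_depth guard are assumptions for illustration. The guard stops endless splitting when many identical points arrive. Each node covers a half-open rectangle, so every point belongs to exactly one child.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadTree:
    # Region covered by this node: x in [x, x + w), y in [y, y + h).
    x: float
    y: float
    w: float
    h: float
    capacity: int = 4
    depth: int = 0
    max_depth: int = 8  # hypothetical guard against splitting forever on duplicates
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadTree"]] = None  # None means this node is a leaf

    def contains(self, p: Point) -> bool:
        return self.x <= p[0] < self.x + self.w and self.y <= p[1] < self.y + self.h

    def insert(self, p: Point) -> bool:
        if not self.contains(p):
            return False
        if self.children is None:
            # Leaf with room (or at max depth): keep the point here.
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        # Only one child's region contains p, so exactly one insert succeeds.
        return any(child.insert(p) for child in self.children)

    def _split(self) -> None:
        hw, hh = self.w / 2, self.h / 2
        self.children = [
            QuadTree(cx, cy, hw, hh, capacity=self.capacity,
                     depth=self.depth + 1, max_depth=self.max_depth)
            for cy in (self.y, self.y + hh)
            for cx in (self.x, self.x + hw)
        ]
        # Move this node's entries down into the new children.
        old, self.points = self.points, []
        for p in old:
            any(child.insert(p) for child in self.children)

    def query(self, qx: float, qy: float, qw: float, qh: float) -> List[Point]:
        """Return all points inside the rectangle [qx, qx + qw) x [qy, qy + qh)."""
        if qx >= self.x + self.w or qx + qw <= self.x or qy >= self.y + self.h or qy + qh <= self.y:
            return []  # no overlap: skip this whole subtree
        found = [p for p in self.points if qx <= p[0] < qx + qw and qy <= p[1] < qy + qh]
        for child in self.children or []:
            found.extend(child.query(qx, qy, qw, qh))
        return found
```

The query prunes every subtree whose rectangle doesn’t overlap the search area, which is what makes "find all locations near a region" fast.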

Question 11

Q

Quadtree 2

Answer

A

Well suited for querying spatial data:
- It can be visualized as a grid of rectangles that are recursively divided into four sub-rectangles
- Each node corresponds to a rectangle, which represents a spatial region
Finding a location in a perfect quadtree takes O(log₄ x) time, where x is the total number of locations
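
A small sketch of where the log₄ comes from. Each level keeps one of four sub-rectangles and discards the other three. Assuming one location per leaf, a perfect quadtree over x locations therefore has about log₄ x levels. For example, 1,000,000 locations need only 10 levels, since 4^10 = 1,048,576. Both function names are hypothetical.

```python
from typing import List


def quadrant_path(px: float, py: float, size: float, depth: int) -> List[int]:
    """Quadrant picked at each level when descending a square region of side `size`.

    Quadrants: 0 = bottom-left, 1 = bottom-right, 2 = top-left, 3 = top-right.
    """
    x0, y0 = 0.0, 0.0
    path = []
    for _ in range(depth):
        size /= 2  # each level halves the side, so the area shrinks by 4x
        right = px >= x0 + size
        top = py >= y0 + size
        path.append(int(right) + 2 * int(top))
        if right:
            x0 += size
        if top:
            y0 += size
    return path


def perfect_quadtree_depth(num_locations: int) -> int:
    """Levels needed for at least one leaf per location, i.e. ceil(log4(n))."""
    depth, leaves = 0, 1
    while leaves < num_locations:
        leaves *= 4
        depth += 1
    return depth
```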

Question 12

Q

Replication

Answer

A

Act of duplicating data from one database server to others
Mostly used to increase redundancy and fault tolerance, for example by keeping copies in multiple regions or other types of locations
Sometimes used to move data closer to clients, reducing the latency of accessing that data
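
A toy sketch of replication for fault tolerance. Every write is copied synchronously to all live nodes, so reads keep working after the primary fails. The ReplicatedStore class and its failure flags are hypothetical. Real systems also need leader election and catch-up for nodes that come back.

```python
from typing import Dict, List, Optional


class ReplicatedStore:
    """Toy store whose writes are copied to every live node (nodes[0] is the primary)."""

    def __init__(self, num_replicas: int = 2):
        self.nodes: List[Dict[str, str]] = [{} for _ in range(num_replicas + 1)]
        self.alive = [True] * len(self.nodes)

    def write(self, key: str, value: str) -> None:
        # Synchronous replication: apply to every live node before returning.
        for node, up in zip(self.nodes, self.alive):
            if up:
                node[key] = value

    def read(self, key: str) -> Optional[str]:
        # Serve from the first live node, so reads survive a primary failure.
        for node, up in zip(self.nodes, self.alive):
            if up:
                return node.get(key)
        raise RuntimeError("all nodes are down")
```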

Question 13

Q

Sharding - Basics

Answer

A

Sometimes called data partitioning. Act of splitting a database into two or more pieces called shards
Typically done to increase the throughput of the database
A reverse proxy is usually used to route requests from application servers to database shards
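
A sketch of the routing layer that sits between application servers and shards, with in-memory dicts standing in for the shard databases. ShardRouter is hypothetical. It hashes the key with SHA-256 instead of Python’s built-in hash(), which is randomized per process for strings, so a key always lands on the same shard.

```python
import hashlib
from typing import Dict, List, Optional


class ShardRouter:
    """Stand-in for the proxy that routes each request to one of N shards."""

    def __init__(self, num_shards: int):
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Stable hash: the same key maps to the same shard across processes and restarts.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: str) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[self.shard_for(key)].get(key)
```

A design note: with plain modulo, changing the number of shards remaps most keys; consistent hashing is a common way to limit that movement.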

Question 14

Q

Sharding - Strategies

Answer

A

Based on the client’s region
Based on the type of data being stored. For example, user data gets stored in one shard, payment data gets stored in another shard
Based on the hash of a column (for example, a user ID). Only works for structured data
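
The three strategies above, sketched as shard-selection functions. The region and data-type mappings are hypothetical examples. The column-hash version uses a stable SHA-256 hash of a single column, so rows with the same user_id always land on the same shard.

```python
import hashlib
from typing import Dict

# Hypothetical mappings, for illustration only.
REGION_TO_SHARD: Dict[str, int] = {"us": 0, "eu": 1, "asia": 2}
TYPE_TO_SHARD: Dict[str, int] = {"user": 0, "payment": 1}


def shard_by_region(client_region: str) -> int:
    return REGION_TO_SHARD[client_region]


def shard_by_type(record_type: str) -> int:
    return TYPE_TO_SHARD[record_type]


def shard_by_column_hash(row: dict, column: str, num_shards: int) -> int:
    # Only the chosen column (e.g. user_id) decides the shard.
    digest = hashlib.sha256(str(row[column]).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```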

Question 15

Q

Replication and Sharding - Hot Spot

Answer

A

Occurs when a workload distributed across a set of servers spreads unevenly, so a few servers receive much more traffic than the others
This can happen if the ‘sharding key’ or ‘hashing function’ is suboptimal, or if the workload is naturally skewed
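
A sketch of a hot spot caused by a suboptimal sharding key. The workload is hypothetical and deliberately skewed, with most usernames starting with "a" or "e". Sharding by first letter sends 900 of 1,000 keys to shard 0, while a stable hash of the whole username spreads them roughly evenly.

```python
import hashlib
from collections import Counter


def shard_by_first_letter(username: str, num_shards: int) -> int:
    # Suboptimal key: usernames aren't evenly spread across letters.
    return (ord(username[0].lower()) - ord("a")) % num_shards


def shard_by_hash(username: str, num_shards: int) -> int:
    digest = hashlib.sha256(username.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical skewed workload: 600 "a..." names, 300 "e..." names, 100 "z..." names.
usernames = (
    [f"a_user{i}" for i in range(600)]
    + [f"e_user{i}" for i in range(300)]
    + [f"z_user{i}" for i in range(100)]
)

by_letter = Counter(shard_by_first_letter(u, 4) for u in usernames)  # {0: 900, 1: 100}
by_hash = Counter(shard_by_hash(u, 4) for u in usernames)  # roughly 250 per shard
```

Hashing fixes skew in how keys are spread. It can’t help when the workload itself is skewed, for example when a single key receives most of the requests.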

(15 cards)