Storage and Databases Flashcards

1
Q

Relational Databases - Basics

A

Disk types:
- SSD is faster and more expensive than HDD
- SSD is used for frequently accessed/modified data, and HDD for rarely accessed/modified data

Relational database:
- A structured database where data is stored in tabular format
- Most relational databases are queried with SQL, though support is not universal

Non-relational/NoSQL database: it’s free of imposed, tabular-like structure

2
Q

Relational Databases - ACID

A
  • Atomicity: the operations of a transaction will either all succeed or all fail
  • Consistency:
    - The transaction cannot bring the database to an invalid state
    - After the transaction is committed or rolled back, the rules for each record will still apply, and all future transactions will see the effect of the transaction
  • Isolation: the execution of multiple transactions concurrently will have the same effect as if they had been executed sequentially
  • Durability: any committed transaction is written to non-volatile storage, so it survives crashes and power loss
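
Atomicity can be sketched with Python's built-in sqlite3 module. The accounts table, the names, and the amounts below are hypothetical and only for illustration. The transfer credits the destination first and debits the source second, so when the debit violates the CHECK constraint, the rollback also undoes the credit that had already been applied.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Fails with IntegrityError if the balance would become negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

ok = transfer(conn, "alice", "bob", 30)       # valid transfer, committed
failed = transfer(conn, "alice", "bob", 500)  # overdraft, rolled back entirely
print(ok, failed, balances(conn))
```

After the failed transfer, the balances are the same as before it started, including the destination account that had briefly been credited inside the transaction.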
3
Q

Relational Databases - Indexes

A
  • Allow certain queries to run faster
  • Most commonly associated with relational databases, though many NoSQL databases support them too
  • They greatly speed up read queries at the cost of slightly slower writes, because each write must also update the relevant index
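
To see an index change how a query runs, the sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement. The users table, the index name idx_users_email, and the sample rows are assumptions made for illustration. The exact wording of the plan text varies between SQLite versions, so the code only prints it.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
args = ("user500@example.com",)

# Without an index, SQLite has to scan every row of the table.
plan_before = query_plan(conn, lookup, args)

# With an index on email, SQLite can search the index instead.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the table and the second reports a search using idx_users_email. The write-side cost is that every later INSERT or UPDATE touching email must update the index as well.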
4
Q

Relational Databases - Consistency types and tools

A
  • Strong consistency: usually refers to the consistency of ACID transactions, as opposed to eventual consistency
  • Eventual consistency:
    - Reads might return a view of the system that is stale
    - Will guarantee that the state of the database eventually reflects writes within a time period (could be seconds or minutes)
  • Tools: Postgres, MySQL, MSSQL
5
Q

Key-Value Stores

A
  • A flexible NoSQL database that’s often used for caching and dynamic configuration.
  • Tools: Etcd, Redis, ZooKeeper
6
Q

Blob (Binary Large Object) Storage

A
  • They store and retrieve unstructured data (blobs) by the name of the blob
  • They might be slower than KV stores but values can be MB or GB large
  • Used to store large binaries, database snapshots, images, or other static assets a website might have
  • Complex to build and operate at scale, so they are usually consumed as a managed service from a large cloud provider
  • Tools: Google Cloud Storage, Amazon S3
7
Q

Time Series database (TSDB) and Spatial database

A

Time Series database (TSDB):
- Optimized for storing and analyzing time-indexed data
- Time indexed data: data points that occur at a given moment of time
- Tools: InfluxDB, Prometheus

Spatial database:
- Optimized for storing and querying spatial data, like locations on a map
- Rely on spatial indexes like quadtree to quickly perform queries like finding all locations in the vicinity of a region

8
Q

Graph Database 1

A
  • Stores data following the graph data model
  • Data entries can have explicitly defined relationships
  • Can run complex queries on deeply connected data quickly
9
Q

Graph Database 2

A
  • Often preferred over relational databases when dealing with data points that naturally form a graph and have multiple levels of relationships
  • Cypher:
    - It’s a graph query language developed for the Neo4j graph database
    - It’s widely used, including outside Neo4j through openCypher, but it isn’t a universal standard across graph databases
  • Tools: Neo4j
10
Q

Quadtree 1

A
  • A tree data structure that’s most commonly used to index two-dimensional spatial data
  • Each node has either zero (a leaf node) or four child nodes
  • Nodes:
    - Contain some form of spatial data, like locations on a map, with a specified maximum capacity
    - When nodes aren’t at capacity they remain as leaf nodes
    - Once they reach capacity, they are given four child nodes, and their data entries are split between those children
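
The splitting behaviour above can be sketched as a small point quadtree. This is a minimal illustration, not a production index: the class name, the square-region layout, and the default capacity are assumptions, a leaf splits when a point arrives after it is already full, and points are assumed to be distinct (many identical points would split forever).

```python
class QuadTree:
    """Point quadtree over the square region [x, x+size) x [y, y+size)."""

    def __init__(self, x, y, size, capacity=4):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.points = []      # entries held while this node is a leaf
        self.children = None  # None for a leaf, otherwise four child nodes

    def contains(self, px, py):
        return (self.x <= px < self.x + self.size
                and self.y <= py < self.y + self.size)

    def insert(self, px, py):
        """Insert a point; return False if it lies outside this node's region."""
        if not self.contains(px, py):
            return False
        if self.children is None:
            self.points.append((px, py))
            if len(self.points) > self.capacity:
                self._split()
            return True
        # Internal node: exactly one child region contains the point.
        return any(child.insert(px, py) for child in self.children)

    def _split(self):
        """Create four children and move this node's entries into them."""
        half = self.size / 2
        self.children = [
            QuadTree(self.x, self.y, half, self.capacity),
            QuadTree(self.x + half, self.y, half, self.capacity),
            QuadTree(self.x, self.y + half, half, self.capacity),
            QuadTree(self.x + half, self.y + half, half, self.capacity),
        ]
        points, self.points = self.points, []
        for px, py in points:
            self.insert(px, py)

tree = QuadTree(0, 0, 100, capacity=2)
for point in [(10, 10), (20, 20), (80, 80)]:
    tree.insert(*point)
```

With a capacity of 2, the third insert splits the root: the two nearby points end up in the first child and (80, 80) in the fourth, while the root itself no longer stores any entries.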
11
Q

Quadtree 2

A
  • Good to query spatial data:
    - It can be represented as a grid filled with rectangles that are recursively divided into four sub-rectangles
    - Each node is represented by a rectangle, which represents a spatial region
  • Finding a location in a perfect quadtree runs in O(log4(x)) time (logarithm base 4), where x is the total number of locations
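
As a rough worked example of the base-4 logarithm, assume a perfect quadtree in which every leaf holds exactly one location. A tree of depth d then has 4^d leaves, so a lookup among x locations descends about log4(x) levels. The helper name below is hypothetical.

```python
def perfect_quadtree_depth(num_locations):
    """Smallest depth d such that 4**d leaves can hold num_locations."""
    depth, leaves = 0, 1
    while leaves < num_locations:
        leaves *= 4  # each extra level multiplies the leaf count by four
        depth += 1
    return depth

for n in (1, 4, 1_000, 1_000_000):
    print(n, perfect_quadtree_depth(n))
```

Under these assumptions, a million locations need only 10 levels, since 4^10 = 1,048,576.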
12
Q

Replication

A
  • Act of duplicating data from one database server to others
  • Most often used to increase redundancy, so the system tolerates the failure of a region or other location
  • Sometimes used to move data closer to clients, which decreases the latency of accessing it
13
Q

Sharding - Basics

A
  • Sometimes called data partitioning. Act of splitting a database into two or more pieces called shards
  • Typically done to increase the throughput of the database
  • A reverse proxy is usually used to route requests from application servers to database shards
14
Q

Sharding - Strategies

A
  • Based on the client’s region
  • Based on the type of data being stored. For example, user data gets stored in one shard, payment data gets stored in another shard
  • Based on the hash of a column. Only for structured data
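
The hash-based strategy can be sketched as follows. The function name, the key format, and the shard count are assumptions for illustration. SHA-256 is used because Python's built-in hash() is randomized per process for strings, which would route the same key to different shards after a restart.

```python
import hashlib
from collections import Counter

def shard_for(key, num_shards):
    """Map a column value (for example a user ID) to a shard index."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards

# Route 10,000 hypothetical user IDs across 4 shards and count rows per shard.
rows_per_shard = Counter(shard_for(f"user-{i}", 4) for i in range(10_000))
print(sorted(rows_per_shard.items()))
```

The same key always maps to the same shard, and a good hash spreads distinct keys roughly evenly, here close to 2,500 rows per shard.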
15
Q

Replication and Sharding - Hot Spot

A
  • Occurs when a workload distributed across a set of servers is spread unevenly, so some servers receive far more traffic than others
  • This can happen if the sharding key or hashing function is suboptimal, or if the workload is naturally skewed
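
A naturally skewed workload can create a hot spot even when the hashing function is good. In the sketch below, the request log is made up: 1,000 ordinary users send one request each, and one hypothetical very popular key receives 9,000 requests. The shard function and shard count are likewise assumptions.

```python
import hashlib
from collections import Counter

def shard_for(key, num_shards=4):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical request log: many ordinary keys plus one very popular key.
requests = [f"user-{i}" for i in range(1000)] + ["celebrity"] * 9000

load = Counter(shard_for(key) for key in requests)
hot_shard, hot_count = load.most_common(1)[0]

print(f"shard {hot_shard} handles {hot_count / len(requests):.0%} of requests")
```

Every request for the popular key lands on the same shard, so that shard handles at least 90% of the traffic no matter how evenly the other keys are spread.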