Partitioning, Indexes, Proxies Flashcards
Data Partitioning
The process of dividing a large database into smaller, more manageable parts called partitions (or shards)
data is partitioned based on criteria such as data range, data size, or data type.
each partition is assigned to a separate processing node, which can perform operations on its assigned data subset independently of the others
Why do we use data partitioning
- improve performance and scalability of large-scale data processing applications
- balances workload across multiple servers
Horizontal Partitioning
Horizontal Partitioning = Sharding
partitions data into sets of rows
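A minimal sketch of how rows might be routed to shards, assuming a hash of the row key decides the shard. The shard count, the `user_id` key, and the `shard_for` helper are illustrative assumptions, not part of the original card.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of shards, chosen only for illustration


def shard_for(key: str) -> int:
    """Map a row key to a shard index using a stable hash (md5 here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Hypothetical rows; every row keeps all of its columns.
rows = [
    {"user_id": "u1", "name": "Ada"},
    {"user_id": "u2", "name": "Grace"},
    {"user_id": "u3", "name": "Edsger"},
]

# Each shard holds a subset of the rows.
shards = [[] for _ in range(NUM_SHARDS)]
for row in rows:
    shards[shard_for(row["user_id"])].append(row)
```

A stable hash is used instead of Python's built-in `hash()` because the built-in is randomized per process for strings, so it would send the same key to different shards on different runs.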
Vertical Partitioning
Vertical partitioning splits a database table into multiple partitions wherein each partition is a set of columns
This can reduce the amount of data that needs to be scanned and prevents us from frequently accessing data that’s not needed
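A small sketch of the idea, assuming a user table is split into a frequently read column group and a rarely read one. The column names and the `split_row` helper are hypothetical; both groups keep the primary key so a row can be reassembled.

```python
# Hypothetical column groups; "id" is repeated so the parts can be rejoined.
HOT_COLUMNS = ("id", "name", "email")   # read on most requests
COLD_COLUMNS = ("id", "bio", "avatar")  # large and rarely needed


def split_row(row: dict) -> tuple[dict, dict]:
    """Split one row into its hot and cold column partitions."""
    hot = {col: row[col] for col in HOT_COLUMNS}
    cold = {col: row[col] for col in COLD_COLUMNS}
    return hot, cold


row = {
    "id": 1,
    "name": "Ada",
    "email": "ada@example.com",
    "bio": "A long biography...",
    "avatar": b"\x89PNG",
}
hot, cold = split_row(row)
# A query that only needs name and email reads `hot` and never touches
# the large `bio` and `avatar` values stored in `cold`.
```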
Hybrid Partitioning
Combines vertical partitioning and sharding
Partition Criteria
the rules or attributes used to decide which partition each record belongs to when dividing a large dataset into smaller parts
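As one illustration, a range criterion can be sketched as follows. The boundary values and the `range_partition` helper are assumptions made up for this example.

```python
from bisect import bisect_right

# Hypothetical boundaries for a numeric user ID:
# partition 0: id < 1000, 1: 1000-1999, 2: 2000-2999, 3: id >= 3000
BOUNDARIES = [1000, 2000, 3000]


def range_partition(user_id: int) -> int:
    """Return the index of the partition whose range contains user_id."""
    return bisect_right(BOUNDARIES, user_id)


# Group a few sample IDs by the partition the criterion assigns them to.
partitions: dict[int, list[int]] = {}
for uid in [5, 999, 1000, 2500, 7000]:
    partitions.setdefault(range_partition(uid), []).append(uid)
```

Range criteria keep neighbouring keys together, which helps range scans, but they can produce uneven partitions if the keys are not evenly distributed.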
Consistent Hashing
A hashing scheme used in distributed systems
represents requestors and servers as points on a virtual ring known as a hash ring; each request is served by the next server found clockwise on the ring
this keeps key placement largely independent of the number of servers available, which minimizes key relocation when the system scales, for example when servers are added or removed
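A minimal sketch of a hash ring, assuming md5 as the hash function and 100 virtual points per server. The `HashRing` class, its method names, and the server names are illustrative, not a reference implementation.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers=(), replicas: int = 100):
        self.replicas = replicas  # virtual points per server (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            if point not in self._owner:  # ignore the rare collision
                self._owner[point] = server
                bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            if self._owner.get(point) == server:
                del self._owner[point]
                self._points.remove(point)

    def lookup(self, key: str) -> str:
        """Return the first server at or after the key's position, wrapping around."""
        if not self._points:
            raise LookupError("ring has no servers")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["server-a", "server-b", "server-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("server-d")
after = {k: ring.lookup(k) for k in keys}

# Only keys that now belong to the new server change owner.
moved = [k for k in keys if before[k] != after[k]]
```

With a plain `hash(key) % number_of_servers` scheme, adding a server would typically change the owner of most keys; on the ring, a key moves only if one of the new server's points lands between it and its previous owner.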
Common problems with data partitioning
JOINS
Joins that span database partitions (which are spread across different machines) will be slow
REFERENTIAL INTEGRITY
enforcing relationships between tables is harder across partitions; for example, a row may be deleted in one partition while rows in another partition still reference it through a foreign key
Database Indexing
Indexes make it faster to search a table for the row or rows that we want
An index is a data structure that can be perceived as a table of contents
Indexes make reads faster, but writes slower because we must update the indexes when inserting new data into the table
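The read/write trade-off can be sketched with a toy in-memory table. This is a simplification that assumes a dictionary-based index on one column; real databases commonly use structures such as B-trees, and the `Table` class and its method names are hypothetical.

```python
from collections import defaultdict


class Table:
    def __init__(self):
        self.rows = []                        # the table itself
        self.email_index = defaultdict(list)  # email -> positions in self.rows

    def insert(self, row: dict) -> None:
        self.rows.append(row)
        # Extra work on every write: the index must be kept in sync.
        self.email_index[row["email"]].append(len(self.rows) - 1)

    def find_by_email_scan(self, email: str) -> list:
        # Without an index: check every row.
        return [row for row in self.rows if row["email"] == email]

    def find_by_email_indexed(self, email: str) -> list:
        # With an index: jump straight to the matching positions.
        return [self.rows[i] for i in self.email_index.get(email, [])]


table = Table()
for i in range(1000):
    table.insert({"id": i, "email": f"user{i}@example.com"})

hit = table.find_by_email_indexed("user42@example.com")
```

Both lookups return the same rows; the indexed one avoids visiting the other 999 rows, at the cost of the additional bookkeeping in `insert`.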
Proxy Server
“Proxy” on its own usually means a “Forward” Proxy
A proxy server is an intermediate piece of software or hardware that sits between the client and the server to facilitate traffic.
Makes requests on behalf of the client, anonymizing the client.
Proxies are used to cache data, filter requests, log requests, or transform requests.
Collapsed Forwarding
When a proxy combines the same data access requests into one request to prevent reading the same data from disk more than once
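A minimal sketch of the idea, assuming concurrent requests arrive on separate threads and the backing read is a plain function. The `CollapsingProxy` class and `slow_fetch` are hypothetical names used only for illustration.

```python
import threading
import time
from concurrent.futures import Future


class CollapsingProxy:
    def __init__(self, fetch):
        self._fetch = fetch    # the expensive read (for example from disk)
        self._lock = threading.Lock()
        self._in_flight = {}   # key -> Future shared by all waiting requests

    def get(self, key):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                # First request for this key: it will perform the fetch.
                future = Future()
                self._in_flight[key] = future
        if leader:
            try:
                future.set_result(self._fetch(key))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        # Followers block here until the leader's fetch completes.
        return future.result()


calls = []


def slow_fetch(key):
    calls.append(key)
    time.sleep(0.3)  # simulate a slow disk read
    return f"data for {key}"


proxy = CollapsingProxy(slow_fetch)
results = []
threads = [
    threading.Thread(target=lambda: results.append(proxy.get("/report")))
    for _ in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Five requests for the same key arrive while the first read is still in progress, so they share its result instead of each triggering a separate read. This sketch only collapses requests that overlap in time; it does not cache results afterwards.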
Reverse Proxy
A reverse proxy sits in front of one or more servers and receives requests on their behalf, anonymizing the server
Can be used for caching, load balancing or routing requests to appropriate resources
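The routing and load-balancing roles can be sketched as follows, assuming path-prefix routing and round-robin selection within each pool. The `ReverseProxy` class, the prefixes, and the backend addresses are made up for this example, and no actual network forwarding is performed.

```python
from itertools import cycle


class ReverseProxy:
    def __init__(self, routes: dict):
        # routes: path prefix -> list of backend addresses (hypothetical)
        self._pools = {prefix: cycle(backends) for prefix, backends in routes.items()}

    def pick_backend(self, path: str) -> str:
        """Route by longest matching prefix, then rotate through that pool."""
        matches = [prefix for prefix in self._pools if path.startswith(prefix)]
        if not matches:
            raise LookupError(f"no route for {path}")
        return next(self._pools[max(matches, key=len)])


proxy = ReverseProxy({
    "/api": ["api-1:8080", "api-2:8080"],
    "/": ["web-1:8080"],
})

# Three API requests alternate between the two API backends.
chosen = [proxy.pick_backend("/api/users") for _ in range(3)]
```

The client only ever sees the proxy's address; which backend handled the request stays hidden behind it.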