Partitioning, Indexes, Proxies Flashcards

1
Q

Data Partitioning

A

The process of dividing a large database into smaller, more manageable parts (called partitions or shards).

Data is partitioned based on criteria such as data range, data size, or data type.

Each partition is assigned to a separate processing node, which can operate on its subset of the data independently of the others.
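
As a rough illustration of partitioning by data range, the sketch below routes user IDs to partitions. The boundary values and the function name are assumptions made up for this example, not part of any particular database.

```python
from bisect import bisect_right

# Hypothetical exclusive upper bounds: partition 0 holds IDs below 1000,
# partition 1 holds 1000-1999, partition 2 holds 2000-2999, partition 3 the rest.
BOUNDARIES = [1000, 2000, 3000]

def range_partition(user_id: int) -> int:
    """Return the index of the partition whose range contains user_id."""
    return bisect_right(BOUNDARIES, user_id)

# Each partition could live on its own node; here they are plain lists.
partitions = {i: [] for i in range(len(BOUNDARIES) + 1)}
for uid in [5, 999, 1000, 2500, 7000]:
    partitions[range_partition(uid)].append(uid)
```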

2
Q

Why do we use data partitioning?

A
  • Improves the performance and scalability of large-scale data processing applications
  • Balances the workload across multiple servers
3
Q

Horizontal Partitioning

A

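Horizontal partitioning is also known as sharding.

It splits a table into sets of rows, with each set typically stored on a different server.

Choose the partition key carefully: a key whose values are unevenly distributed leaves some servers holding far more data (and traffic) than others.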
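
To see how the choice of partition key affects balance, here is a small sketch that counts rows per shard under two different keys. The order data, the country-to-shard mapping, and the shard count are all invented for the example.

```python
from collections import Counter

NUM_SHARDS = 3

# Hypothetical rows: 27 of the 30 orders come from a single country.
orders = [
    {"order_id": i, "country": "NZ" if i % 10 == 0 else "US"}
    for i in range(30)
]

# Hypothetical assignment of countries to shards.
COUNTRY_SHARD = {"US": 0, "NZ": 1, "CA": 2}

def load(shard_of):
    """Return the number of rows that land on each shard."""
    counts = Counter(shard_of(order) for order in orders)
    return [counts[shard] for shard in range(NUM_SHARDS)]

by_country = load(lambda o: COUNTRY_SHARD[o["country"]])    # skewed key
by_order_id = load(lambda o: o["order_id"] % NUM_SHARDS)    # evenly spread key
```

With this data, partitioning by country puts almost everything on one shard, while partitioning by order ID spreads the rows evenly.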

4
Q

Vertical Partitioning

A

Vertical partitioning splits a database table into multiple partitions, where each partition holds a subset of the columns.

This can reduce the amount of data that needs to be scanned, because queries no longer read columns they don't need.
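
A minimal sketch of the idea, using an in-memory list of dicts as a stand-in for a table. The table, its columns, and the helper names are hypothetical.

```python
# Hypothetical "users" table; the wide, rarely read column is "bio".
users = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "bio": "A long biography..."},
    {"id": 2, "name": "Lin", "email": "lin@example.com", "bio": "Another long text..."},
]

def project(rows, columns):
    """Keep only the given columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]

# Each vertical partition repeats the primary key so rows can be rejoined.
users_core = project(users, ("id", "name", "email"))
users_profile = project(users, ("id", "bio"))

def full_row(user_id):
    """Rebuild a complete row by joining both partitions on id."""
    core = next(r for r in users_core if r["id"] == user_id)
    profile = next(r for r in users_profile if r["id"] == user_id)
    return {**core, **profile}
```

A query that only needs names and emails reads users_core and never touches the bio column.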

5
Q

Hybrid Partitioning

A

Combines vertical partitioning and sharding

6
Q

Partition Criteria

A

The rules or attributes used to divide a large dataset into smaller parts or partitions

7
Q

Key or Hash Based Partitioning

A

A criterion used for partitioning data.

We apply a hash function to one or more key attributes of the entity; the result yields the partition number.
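
One way this could look in code, assuming four partitions and MD5 as the hash function (both are choices made for this sketch). A stable hash is used because Python's built-in hash() for strings varies between processes.

```python
import hashlib

NUM_PARTITIONS = 4  # assumed partition count for this sketch

def hash_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition number using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce modulo the
    # number of partitions.
    return int.from_bytes(digest[:8], "big") % num_partitions

assignments = {key: hash_partition(key) for key in ("user:1", "user:2", "user:42")}
```

Note that with a plain modulo, changing the number of partitions remaps most keys.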

8
Q

List Partitioning

A

Each partition is assigned a list of values. To insert a new record, we find the partition whose list contains the record's key and store it there.

For example, we can store all data on US and Canadian users in the North American partition.
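
A small sketch of list partitioning by country code. The North American list follows the example above; the European list and all names are assumptions added for illustration.

```python
# Each partition owns an explicit list (here a set) of key values.
PARTITION_LISTS = {
    "north_america": {"US", "CA"},
    "europe": {"DE", "FR"},  # hypothetical second partition
}

def list_partition(country_code: str) -> str:
    """Return the name of the partition whose list contains the key."""
    for name, values in PARTITION_LISTS.items():
        if country_code in values:
            return name
    raise KeyError(f"no partition lists {country_code!r}")

# Insert records into the partition selected by their key.
storage = {name: [] for name in PARTITION_LISTS}
for user in [{"name": "Ada", "country": "US"}, {"name": "Lin", "country": "CA"}]:
    storage[list_partition(user["country"])].append(user)
```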

9
Q

Consistent Hashing

A

A hashing scheme used in distributed systems.

It maps both keys (for example, request or object identifiers) and servers onto a virtual ring known as a hash ring.

This keeps the key-to-server mapping largely independent of the number of servers available, which minimizes key relocation when the system scales, for example when more servers are added.
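
The ring can be sketched as a sorted list of positions. This is a simplified illustration, not a production implementation: the class name, the use of MD5, and the 100 virtual nodes per server are all assumptions.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Stable position on the ring for a string."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes   # virtual nodes per server, to even out load
        self._ring = []        # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [(pos, s) for pos, s in self._ring if s != server]

    def lookup(self, key: str) -> str:
        """Return the first server at or after the key's position, wrapping around."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {k: ring.lookup(k) for k in keys}
ring.add("D")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

After adding server D, the only keys that change owner are the ones D takes over; keys never move between the existing servers.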

10
Q

Common problems with data partitioning

A

JOINS
Joins that span partitions (which are spread across different machines) are slow. Denormalizing the data so that a query can be answered from a single partition can help.

REFERENTIAL INTEGRITY
Relationships between tables are hard to enforce across partitions, for example when a row is deleted on one partition but rows on another partition still reference it through a foreign key.

11
Q

Database Indexing

A

Indexes make it faster to search a table for the row or rows that we want

An index is a data structure that can be thought of as a table of contents.

Indexes make reads faster but writes slower, because every index on a table must be updated when rows are inserted, updated, or deleted.
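
The read/write trade-off can be sketched with a dictionary acting as an index over a list of rows. Real databases typically use structures such as B-trees; the data and function names here are made up for illustration.

```python
from collections import defaultdict

# Hypothetical table stored as a list of rows.
rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lima"},
    {"id": 3, "city": "Oslo"},
]

def scan(city):
    """Without an index: examine every row."""
    return [row for row in rows if row["city"] == city]

# Index on "city": value -> positions of the matching rows.
city_index = defaultdict(list)
for position, row in enumerate(rows):
    city_index[row["city"]].append(position)

def lookup(city):
    """With an index: jump straight to the matching rows."""
    return [rows[pos] for pos in city_index.get(city, [])]

def insert(row):
    """Every write now has extra work: the index must be updated too."""
    rows.append(row)
    city_index[row["city"]].append(len(rows) - 1)
```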

12
Q

Proxy Server

A

“Proxy” by default means a forward proxy.

A proxy server is an intermediate piece of software or hardware that sits between the client and the server to facilitate traffic.

It makes requests on behalf of the client, which hides the client's identity from the server.

Proxies are used to cache data and to filter, log, or transform requests.

13
Q

Collapsed Forwarding

A

When a proxy combines identical data access requests into a single request, so the same data is not read from disk more than once. The single result is returned to every requester.
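
A minimal sketch of request collapsing using asyncio: concurrent requests for the same key share one in-flight backend call. The class name and the fake backend are assumptions for this example.

```python
import asyncio

class CollapsingProxy:
    def __init__(self, fetch):
        self._fetch = fetch    # coroutine function that reads from the backend
        self._in_flight = {}   # key -> Task for a request already under way

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First request for this key: start the backend read.
            task = asyncio.create_task(self._fetch(key))
            self._in_flight[key] = task
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        # Later requests for the same key wait on the same task.
        return await task

backend_calls = []

async def slow_backend(key):
    """Hypothetical backend; records each call so collapsing is visible."""
    backend_calls.append(key)
    await asyncio.sleep(0.01)
    return f"value-for-{key}"

async def demo():
    proxy = CollapsingProxy(slow_backend)
    return await asyncio.gather(*(proxy.get("a") for _ in range(5)))

results = asyncio.run(demo())
```

Five concurrent requests for the same key result in a single backend call, and all five callers receive the same value.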

14
Q

Reverse Proxy

A

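A reverse proxy sits in front of one or more servers and makes requests to them on behalf of clients, which hides (anonymizes) the servers.

It can be used for caching, load balancing, or routing requests to the appropriate resources.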
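
The routing and load-balancing roles can be sketched as a lookup from path prefix to a pool of backends, rotated round-robin. The class, the prefixes, and the backend addresses are hypothetical, and no real network traffic is involved.

```python
from itertools import cycle

class ReverseProxy:
    def __init__(self, routes):
        # routes: path prefix -> list of backend addresses
        self._pools = {prefix: cycle(backends) for prefix, backends in routes.items()}

    def pick_backend(self, path: str) -> str:
        """Route by longest matching prefix, then rotate through that pool."""
        matches = [prefix for prefix in self._pools if path.startswith(prefix)]
        if not matches:
            raise LookupError(f"no route for {path!r}")
        return next(self._pools[max(matches, key=len)])

proxy = ReverseProxy({
    "/api": ["10.0.0.1:8000", "10.0.0.2:8000"],
    "/": ["10.0.0.9:8080"],
})

picks = [proxy.pick_backend("/api/users") for _ in range(3)]
static_pick = proxy.pick_backend("/index.html")
```

Clients only ever see the proxy's address; which backend served a request stays hidden from them.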
