Data Engineering Fundamentals - Data Skew Techniques Flashcards

1
Q

______________ is the unequal distribution, or imbalance, of data across nodes or partitions in a distributed computing system.

a) Indexing
b) Partitioning
c) Data Skew

A

Data Skew

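Partitioning the data evenly doesn't help if the traffic is uneven.
For example, an IMDB-style site might hash the actor ID and distribute records across partitions by that hash. Whichever partition holds the most-requested actors receives more traffic and can become overloaded.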
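
A minimal sketch of the actor ID example above, under stated assumptions: the partition count, the CRC32 hash, the `actor-N` key format, and the traffic numbers are all hypothetical and chosen only for illustration. It shows that hashing spreads keys across partitions, yet one popular key still concentrates requests on a single partition.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def partition_for(actor_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(actor_id.encode("utf-8")) % num_partitions


def requests_per_partition(requests, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many requests land on each partition."""
    counts = Counter({p: 0 for p in range(num_partitions)})
    for actor_id in requests:
        counts[partition_for(actor_id, num_partitions)] += 1
    return counts


# Hypothetical traffic: 100 actors requested once each,
# plus 900 extra requests for one very popular actor.
requests = [f"actor-{i}" for i in range(100)] + ["actor-7"] * 900

load = requests_per_partition(requests)
hot = partition_for("actor-7")

for partition, count in sorted(load.items()):
    marker = " (hot)" if partition == hot else ""
    print(f"partition {partition}: {count} requests{marker}")
```

The keys themselves are spread out by the hash, but every request for the popular actor maps to the same partition, so that partition carries most of the load.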

2
Q

Non-__________ distribution of data is a cause of data skew.

A

Uniform.

3
Q

An inadequate partitioning strategy and temporal skew are causes of ___________

a) Data Skew
b) Indexing

A

Data Skew

4
Q

Adaptive partitioning,
Salting,
Repartitioning,
Sampling,
Custom partitioning

are ways to address _____________

a) Indexing
b) Data Skew

A

Data Skew
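
As one illustration of the techniques listed in this card, here is a rough sketch of salting. The helper names, the `#` separator, the salt count, and the CRC32 hash are assumptions rather than any particular system's API. The idea is to append a small random suffix to a hot key so its records spread over several partitions. The cost is that a reader must check every salted variant of the key.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count
NUM_SALTS = 8       # assumed number of salt values per hot key


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash partitioning (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salted_key(key: str, rng: random.Random, num_salts: int = NUM_SALTS) -> str:
    """Write side: append a random salt so one hot key becomes several keys."""
    return f"{key}#{rng.randrange(num_salts)}"


def all_salted_keys(key: str, num_salts: int = NUM_SALTS) -> list:
    """Read side: list every salted variant that must be queried and merged."""
    return [f"{key}#{salt}" for salt in range(num_salts)]


rng = random.Random(0)  # seeded so the sketch is repeatable
hot_writes = ["actor-7"] * 900  # hypothetical hot key

unsalted = Counter(partition_for(k) for k in hot_writes)
salted = Counter(partition_for(salted_key(k, rng)) for k in hot_writes)

print("unsalted:", dict(unsalted))
print("salted:  ", dict(salted))
```

Without salting, all 900 writes land on one partition. With salting they are divided among the partitions that the salted variants hash to, and `all_salted_keys` gives the keys a reader would need to fetch and combine.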
