Data Engineering Fundamentals - Data Skew Techniques Flashcards
1
Q
______________ is the unequal distribution or imbalance of data across nodes or partitions in a distributed computing system.
a) Indexing
b) Partitioning
c) Data Skew
A
Data Skew
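Even partitioning doesn't help if your traffic is uneven.
Example: in an IMDb-style dataset, the actor ID is hashed and records are distributed across partitions by that hash. Whichever partition holds the most-requested actors receives more traffic and can become overloaded.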
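The actor ID example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key names, the request counts, and the choice of `zlib.crc32` as the hash are hypothetical and do not come from any real system.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (CRC-32, chosen for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_loads(keys, num_partitions):
    """Count how many requests land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical traffic: one popular actor gets 900 requests,
# while 100 other actors get one request each.
requests = ["actor_popular"] * 900 + [f"actor_{i}" for i in range(100)]

loads = partition_loads(requests, num_partitions=4)

# Skew ratio: busiest partition compared with the average partition load.
# A value near 1.0 means balanced; larger values mean more skew.
skew_ratio = max(loads) / (sum(loads) / len(loads))

print(loads, round(skew_ratio, 2))
```

Every request for the popular key hashes to the same partition, so that partition carries at least 900 of the 1,000 requests no matter how evenly the remaining keys spread.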
2
Q
Non-__________ distribution of data is a cause of data skew.
A
Uniform.
3
Q
An inadequate partitioning strategy and temporal skew are causes of ___________
a) Data Skew
b) Indexing
A
Data Skew
4
Q
Adaptive partitioning, salting, repartitioning, sampling, and custom partitioning are ways to address _____________
a) Indexing
b) Data Skew
A
Data Skew
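Of the techniques listed in this card, salting is the easiest to show in code. The sketch below is a minimal illustration, not a production recipe: the `#` separator, the salt count of 8, the partition count of 4, and the `zlib.crc32` hash are all assumptions made for the example.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed cluster size for the example
NUM_SALTS = 8       # assumed number of salt values per hot key


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (CRC-32, chosen for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def salted_key(key: str, rng: random.Random, num_salts: int) -> str:
    """Append a random salt so one hot key becomes several distinct keys."""
    return f"{key}#{rng.randrange(num_salts)}"


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hot key: every record shares the same key.
records = ["actor_popular"] * 8000

# Without salting, all records hash to a single partition.
plain = Counter(partition_for(k, NUM_PARTITIONS) for k in records)

# With salting, the records spread over the partitions the salted keys hash to.
salted = Counter(
    partition_for(salted_key(k, rng, NUM_SALTS), NUM_PARTITIONS) for k in records
)

print("plain :", dict(plain))
print("salted:", dict(salted))
```

The trade-off is on the read side: any lookup or aggregation for the original key has to visit every salted variant (`actor_popular#0` through `actor_popular#7` here) and combine the partial results.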
5
Q
A