Untitled Deck Flashcards

1
Q

How do you profile and debug slow-running Spark jobs?

A

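Start in the Spark UI: look at the stage and task views for skewed or straggling tasks, large shuffle reads and writes, spills, and long GC time. Then use driver and executor logs and the metrics system to confirm the cause.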
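One way to make "identify performance issues" concrete is to look for stragglers, tasks that run far longer than their peers in the same stage. The sketch below is plain Python, not a Spark API: the function name `find_stragglers`, the 3x threshold, and the sample durations are all assumptions for illustration, standing in for per-task durations you might read off the Spark UI.

```python
from statistics import median

def find_stragglers(task_durations_s, factor=3.0):
    """Return indices of tasks that ran longer than factor x the median duration."""
    typical = median(task_durations_s)
    return [i for i, d in enumerate(task_durations_s) if d > factor * typical]

# Hypothetical per-task durations (seconds) for one stage.
durations = [12.0, 11.5, 13.2, 12.8, 95.0, 12.1]
stragglers = find_stragglers(durations)
print(stragglers)  # the one slow task, at index 4
```

A single task far above the median often points to data skew on one key rather than a cluster-wide problem.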

2
Q

What are the common bottlenecks in distributed systems?

A

Network latency and bandwidth, data serialization overhead, and contention for shared resources such as CPU, memory, and disk I/O.

3
Q

How does caching improve data processing performance?

A

Caching keeps computed or frequently read data in fast storage (typically memory), so repeated accesses avoid recomputation or slow reads.
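A minimal sketch of the idea, using Python's standard `functools.lru_cache`. The function `expensive_aggregate` and the counter `call_count` are hypothetical stand-ins for a costly computation and a way to observe how often it really runs.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually executes

@lru_cache(maxsize=None)
def expensive_aggregate(n):
    """Stand-in for a costly computation: sum of squares below n."""
    global call_count
    call_count += 1
    return sum(i * i for i in range(n))

first = expensive_aggregate(1000)   # computed
second = expensive_aggregate(1000)  # served from the cache, body not re-run
print(call_count)  # 1
```

The second call returns the stored result, so the body runs only once for a given argument.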

4
Q

What is the difference between vertical scaling and horizontal scaling?

A

Vertical scaling adds resources (CPU, memory, storage) to a single node, while horizontal scaling adds more nodes and spreads the work across them.

5
Q

How do you monitor the performance of ETL pipelines?

A

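Use monitoring tools to track execution time per step, data quality (row counts, null rates, schema checks), and resource usage (CPU, memory, I/O), and alert when any of them deviates from its normal range.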
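As a rough illustration of tracking execution time, the sketch below wraps a pipeline step in a context manager that records its duration and row count. The names `track_step` and `metrics` are hypothetical, and the in-memory list stands in for whatever monitoring system is actually used; data quality and resource usage are not covered here.

```python
import time
from contextlib import contextmanager

metrics = []  # in-memory sink; a real pipeline would ship these elsewhere

@contextmanager
def track_step(name):
    """Record wall-clock duration and row count for one pipeline step."""
    record = {"step": name, "rows": 0}
    start = time.perf_counter()
    try:
        yield record
    finally:
        # Runs even if the step raises, so failed steps are still timed.
        record["seconds"] = time.perf_counter() - start
        metrics.append(record)

with track_step("transform") as m:
    rows = [{"id": i, "value": i * 2} for i in range(1000)]
    m["rows"] = len(rows)

print(metrics[0]["step"], metrics[0]["rows"])
```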

6
Q

How do you handle high-cardinality data in distributed systems?

A

Partition the data by a hash of the high-cardinality key so values spread evenly across nodes, and index the key so lookups avoid full scans.
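Partitioning by key can be sketched as hashing each key into a fixed number of buckets. This example assumes string keys and 8 partitions; `partition_for` and the `user-N` ids are hypothetical. It uses `zlib.crc32` rather than Python's built-in `hash()`, because string hashes from `hash()` are randomized per process by default and would not give the same partition across runs.

```python
import zlib
from collections import Counter

def partition_for(key, num_partitions):
    """Map a key to a partition using a stable hash (crc32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical high-cardinality key: 10,000 distinct user ids.
user_ids = [f"user-{i}" for i in range(10_000)]
sizes = Counter(partition_for(uid, 8) for uid in user_ids)
print(sorted(sizes.items()))  # rows per partition
```

Printing the partition sizes is a quick check that no single partition holds a disproportionate share of the keys.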

7
Q

What techniques do you use to minimize latency in data pipelines?

A

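Batch records to amortize per-request overhead, process independent partitions in parallel, and use a compact serialization format to cut encoding time and bytes on the wire.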
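Batching, the first technique listed, can be sketched as grouping a stream of records into fixed-size chunks so that each downstream call handles many records at once. The helper name `in_batches` is an assumption for illustration.

```python
def in_batches(records, batch_size):
    """Yield lists of up to batch_size records, preserving order."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final partial batch instead of dropping it.
        yield batch

batches = list(in_batches(range(5), 2))
print(batches)  # [[0, 1], [2, 3], [4]]
```

Batch size is a trade-off: larger batches lower per-record overhead, but each record waits longer for its batch to fill.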

8
Q

Explain the role of job checkpointing in streaming data systems.

A

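Checkpointing periodically saves a job's state (such as stream offsets and in-progress aggregations) to durable storage, so that after a failure the job can resume from the last checkpoint instead of starting over.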
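A toy sketch of checkpoint-and-resume, assuming a local JSON file as the checkpoint store and a running sum as the job state. The function names, the file name `job.ckpt`, and the `fail_at` parameter (used only to simulate a crash) are all hypothetical; real streaming engines manage this for you.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write state atomically: write a temp file, then rename over the target."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"offset": 0, "total": 0}

def run(events, path, fail_at=None):
    """Sum events, checkpointing after each one; optionally simulate a crash."""
    state = load_checkpoint(path)
    for i in range(state["offset"], len(events)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += events[i]
        state["offset"] = i + 1
        save_checkpoint(path, state)
    return state["total"]

events = [5, 10, 15, 20]
ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run(events, ckpt, fail_at=2)   # crashes after processing two events
except RuntimeError:
    pass
total = run(events, ckpt)          # resumes at offset 2, no reprocessing
print(total)  # 50
```

Checkpointing after every event keeps the example simple; in practice the interval is a trade-off between checkpoint overhead and how much work is redone after a failure.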

9
Q

How do you optimize resource allocation in tools like EMR or Dataproc?

A

Enable auto-scaling, choose instance types that match the workload (memory-heavy vs. compute-heavy), and tune configuration such as executor memory and cores.

10
Q

What are some strategies for reducing costs when processing large-scale data on the cloud?

A

Use spot (preemptible) instances for fault-tolerant workloads, optimize data storage with compression and cheaper storage tiers, and schedule jobs during off-peak hours.
