6. Programming (Python, Scala, Java) Flashcards

1
Q

How do you read and process large CSV or JSON files using Python?

A

For large CSVs, avoid loading everything at once: Pandas can read in chunks (`pd.read_csv(path, chunksize=...)`), and the stdlib `csv` module streams rows one at a time. For large JSON, prefer newline-delimited JSON processed line by line (or a streaming parser such as `ijson`) over `json.load`, which parses the entire document into memory.
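A minimal stdlib sketch of both patterns (pandas's `read_csv(chunksize=...)` is the higher-level equivalent); `iter_csv_chunks` and `iter_json_lines` are illustrative helper names, and the in-memory `StringIO` data stands in for real files:

```python
import csv
import json
import io

def iter_csv_chunks(fileobj, chunk_size=2):
    """Yield lists of rows so only one chunk is in memory at a time."""
    reader = csv.DictReader(fileobj)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def iter_json_lines(fileobj):
    """Stream newline-delimited JSON (JSONL) one record at a time."""
    for line in fileobj:
        if line.strip():
            yield json.loads(line)

# Demo on in-memory data standing in for large files.
csv_data = io.StringIO("id,amount\n1,10\n2,20\n3,30\n")
total = sum(int(r["amount"]) for chunk in iter_csv_chunks(csv_data) for r in chunk)

jsonl_data = io.StringIO('{"id": 1}\n{"id": 2}\n')
ids = [rec["id"] for rec in iter_json_lines(jsonl_data)]
```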

2
Q

What are common Python libraries for data processing, such as Pandas and PySpark?

A

Pandas provides in-memory DataFrames for manipulating and analyzing data that fits on a single machine; PySpark is the Python API for Apache Spark, which distributes computation across a cluster (with lazy evaluation) for datasets too large for one machine.
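A small sketch of the pandas side, assuming pandas is installed; the commented PySpark line shows the equivalent distributed call and would need a running Spark session:

```python
import pandas as pd

# pandas operates on data held in RAM on one machine.
df = pd.DataFrame({"region": ["east", "west", "east"], "sales": [100, 200, 50]})
totals = df.groupby("region")["sales"].sum()

# Rough PySpark equivalent (sketch only, requires a SparkSession `spark`):
# spark.createDataFrame(df).groupBy("region").sum("sales")
```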

3
Q

Explain how you would handle schema evolution in JSON data.

A

Make readers tolerant and writers conservative: ignore unknown fields, supply defaults for fields that older records lack, and validate against a versioned schema (e.g., with `jsonschema`). Prefer additive changes (new optional fields) over renaming or removing fields, and normalize each schema version into one internal shape at ingestion time.
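A minimal sketch of normalizing two hypothetical schema versions at ingestion; the field names (`name` in v1, `first_name`/`last_name`/`email` in v2) are invented for illustration:

```python
import json

def normalize(record):
    """Map any known schema version onto one internal shape,
    defaulting fields that older versions lack."""
    if "name" in record:                      # hypothetical v1 shape
        first, _, last = record["name"].partition(" ")
        return {"first_name": first, "last_name": last, "email": None}
    return {                                   # hypothetical v2+ shape
        "first_name": record.get("first_name", ""),
        "last_name": record.get("last_name", ""),
        "email": record.get("email"),          # optional; defaults to None
    }

v1 = normalize(json.loads('{"name": "Ada Lovelace"}'))
v2 = normalize(json.loads('{"first_name": "Ada", "last_name": "Lovelace"}'))
```

Downstream code then depends only on the internal shape, not on any wire-format version.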

4
Q

How would you implement a sliding window algorithm in Python for streaming data?

A

Use `collections.deque` with `maxlen` set to the window size: appends are O(1) and the oldest element is evicted automatically, so the deque always holds exactly the current window.
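A minimal sketch: a moving average over a stream, where the `maxlen` deque does the eviction for you:

```python
from collections import deque

def moving_average(stream, window_size=3):
    """Yield the average of the most recent `window_size` values."""
    window = deque(maxlen=window_size)  # oldest value is dropped automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

averages = list(moving_average([1, 2, 3, 4, 5], window_size=3))
```

For very wide windows, keep a running sum (subtracting the evicted value) instead of calling `sum(window)` on every step.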

5
Q

How do you manage memory optimization in PySpark?

A

Tune executor memory and `spark.memory.fraction`, choose sensible partition counts, `cache()`/`persist()` only data that is reused (with an appropriate storage level), broadcast small lookup tables, read columnar formats such as Parquet, and avoid collecting large results to the driver.
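A configuration sketch only (requires `pyspark` and a cluster to actually run): the option names are real Spark settings, but the values and the `events/` path are illustrative, not recommendations:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("etl-job")
    .config("spark.executor.memory", "4g")           # per-executor heap
    .config("spark.memory.fraction", "0.6")          # share for execution + storage
    .config("spark.sql.shuffle.partitions", "200")   # partition count after shuffles
    .getOrCreate()
)

df = spark.read.parquet("events/")  # columnar format enables column pruning
df.cache()                          # persist only data reused across actions
```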

6
Q

What are Python decorators, and when would you use them in ETL scripts?

A

Decorators are functions that wrap other functions to extend their behavior without changing their code. In ETL scripts they are handy for cross-cutting concerns such as logging, timing, and retrying flaky steps.
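A minimal timing-decorator sketch; `log_timing` and `transform` are illustrative names, and `functools.wraps` preserves the wrapped function's metadata:

```python
import functools
import time

def log_timing(func):
    """Record how long each call takes; useful around ETL steps."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_duration = time.perf_counter() - start
        return result
    wrapper.last_duration = None
    return wrapper

@log_timing
def transform(rows):
    return [r.upper() for r in rows]

out = transform(["a", "b"])
```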

7
Q

How would you parallelize data processing in Python?

A

Use the `multiprocessing` module (or `concurrent.futures`) for CPU-bound work, since separate processes sidestep the GIL; use threads for I/O-bound work; and reach for Dask or PySpark when the data no longer fits on one machine.

8
Q

How does Scala handle immutability and parallel processing in Spark?

A

Scala encourages immutability (`val` and immutable collections by default), and Spark's core abstractions (RDDs, DataFrames) are themselves immutable: transformations return new datasets rather than mutating existing ones. Because tasks never share mutable state, Spark can safely distribute and re-run them across a cluster without locks.

9
Q

Explain the purpose of Python generators in handling large datasets.

A

Generators produce items lazily with `yield`, so you can iterate over a large dataset one item at a time in constant memory, and chain generators together into processing pipelines.
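A minimal pipeline sketch: two chained generators, each holding only one item at a time; the `iter([...])` input stands in for a large file's lines:

```python
def parse_records(lines):
    """Lazily turn raw lines into integers; nothing is materialized
    until the consumer asks for the next item."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)

def running_total(records):
    total = 0
    for r in records:
        total += r
        yield total

# Generators compose into a pipeline with O(1) memory per stage.
raw = iter(["1\n", "2\n", "\n", "3\n"])
totals = list(running_total(parse_records(raw)))
```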

10
Q

How do you handle exceptions in Python for robust ETL workflows?

A

Wrap risky steps in try-except blocks that catch specific exception types, log the failure, and route bad records to a dead-letter store so one bad row doesn't abort the whole run; add retries for transient errors and use `finally` or context managers to release resources.
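A minimal sketch of the catch-log-quarantine pattern; `safe_transform` is an illustrative name and `int()` stands in for a real transformation:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("etl")

def safe_transform(records):
    """Process what we can; quarantine bad records instead of crashing."""
    good, dead_letter = [], []
    for rec in records:
        try:
            good.append(int(rec))                    # the actual transformation
        except (TypeError, ValueError) as exc:       # catch specific errors only
            log.warning("skipping %r: %s", rec, exc)
            dead_letter.append(rec)
    return good, dead_letter

good, bad = safe_transform(["1", "oops", "3", None])
```

The dead-letter list (or table, or queue) can then be inspected and replayed after the bug is fixed.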
