Batch Processing Flashcards

1
Q

In designing a Data Factory pipeline, you need to run a webhook process after a pipeline activity completes successfully. Which activity dependency would be the best choice?

A

Success

Success triggers an activity upon the successful completion of a previous activity.

2
Q

You need to configure a pipeline trigger to run at fixed intervals, starting from last week. Which is the best option?

A

Tumbling

Tumbling window is the best option. The key requirement is historical data: a tumbling window trigger can be backdated to a start time in the past, so it will process the historical intervals as well.
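The idea behind a tumbling window can be sketched as a series of fixed-size, contiguous, non-overlapping intervals generated from a (possibly past) start time — one pipeline run per window. This is a minimal illustration of the windowing concept, not Data Factory's actual trigger implementation:

```python
from datetime import datetime, timedelta

def tumbling_windows(start, end, interval):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs."""
    current = start
    while current < end:
        yield current, min(current + interval, end)
        current += interval

# A trigger backdated to a past start time produces one run per window,
# including the historical intervals.
windows = list(tumbling_windows(
    datetime(2023, 1, 1), datetime(2023, 1, 1, 3), timedelta(hours=1)))
# Three one-hour windows: 00:00-01:00, 01:00-02:00, 02:00-03:00
```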

3
Q

You need to quickly update a database by identifying and loading only the difference. What’s the best solution?

A

Incremental data loading with watermark

A watermark records the highest value (such as a timestamp or ID) already loaded, so each run copies only the rows added or changed since the previous load.
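The watermark pattern can be sketched in a few lines. This is a hypothetical illustration (column names and values are invented), not the Data Factory template itself:

```python
# Incremental load with a watermark: the watermark stores the highest
# "modified" value already copied, so each run picks up only the delta.
source = [
    {"id": 1, "modified": "2023-01-01"},
    {"id": 2, "modified": "2023-01-05"},
    {"id": 3, "modified": "2023-01-09"},
]

watermark = "2023-01-05"  # value saved by the previous run

# Select only rows changed since the last load.
delta = [row for row in source if row["modified"] > watermark]

# After loading the delta, advance the watermark to the newest value seen.
if delta:
    watermark = max(row["modified"] for row in delta)
```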

4
Q

You need to create a pipeline step to insert select rows into a database table if they don’t exist, or update them if they do. Which data loading type would be the best solution?

A

Upserting

Upserting updates a row if its key already exists and inserts it if it does not, which matches this scenario exactly.
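The upsert semantics can be shown with a minimal sketch using a dictionary as a stand-in for a keyed table (names and data are invented for illustration):

```python
# Upsert: update the row if the key exists, insert it otherwise.
table = {1: {"name": "Ada"}, 2: {"name": "Grace"}}

def upsert(table, key, row):
    # Merge into the existing row if present; otherwise create a new one.
    table[key] = {**table.get(key, {}), **row}

upsert(table, 2, {"name": "Grace Hopper"})  # key exists -> update
upsert(table, 3, {"name": "Alan"})          # key missing -> insert
```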

5
Q

You are working in Databricks and need to change your programming language from Python to SQL for a single cell. Is this possible? And if so, how would you complete this task?

A

Yes — you would start the cell with the %sql magic command.

Starting a cell with %sql switches that single cell from the notebook’s default language (Python) to SQL, without changing the rest of the notebook.
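A sketch of what this looks like in a Databricks notebook whose default language is Python (the table name is an invented example):

```
# ── Cell 1 (Python, the notebook's default language) ──
df = spark.range(10)
df.createOrReplaceTempView("numbers")

# ── Cell 2 (the %sql magic switches only this cell to SQL) ──
%sql
SELECT COUNT(*) FROM numbers
```

The magic command must be the first line of the cell; it applies only to that cell, and other magics such as %python, %scala, and %r work the same way.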

6
Q

Which scenario would NOT lend itself to batch processing?

A) When you have complex transformation requirements and need low cost solutions

B) When working with Data Factory

C) When data is not required immediately

D) When you need data in real-time

A

D) When you need data in real-time

For real-time data, you would need to implement a streaming solution.

7
Q

Why is handling schema drift important?

A

If not handled, it can lead to a complete pipeline breakdown.

Schema drift occurs when source columns are added, removed, or renamed over time; a pipeline with rigid column mappings will fail as soon as the incoming schema no longer matches what it expects.
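The failure mode can be sketched in plain Python. This hypothetical example (column and field names invented) contrasts a rigid mapping, which breaks when a column is renamed, with a tolerant mapping that keeps the pipeline running:

```python
# Expected source schema, as hard-coded in a rigid pipeline.
expected = ["id", "name", "email"]

# The source schema has drifted: "email" was renamed to "contact_email".
record = {"id": 7, "name": "Ada", "contact_email": "ada@example.com"}

# Rigid mapping: raises KeyError on the missing "email" column.
try:
    row = [record[col] for col in expected]
except KeyError as err:
    failure = f"pipeline failed on missing column: {err}"

# Tolerant mapping: unknown columns are ignored, missing ones default to None.
row = [record.get(col) for col in expected]  # [7, 'Ada', None]
```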

8
Q

What would be a good reason to use data flows?

A

You need a visual, no-code solution to implement transformational logic in Azure Data Factory.

This is an excellent reason to use data flows.

9
Q

You have been experiencing issues with code-breaking deployments in production. What are two advantages of implementing GitHub in Data Factory?

A

Source Control
Increased collaboration

Both are definite advantages. Source control lets you track, review, and roll back changes before they reach production, and a shared repository increases collaboration across the team.
