Data Ingestion & Transformation Flashcards
Which is a feature of Azure Synapse pipelines?
Monitoring of Spark jobs for data flows
This is a feature of Azure Synapse pipelines.
What is Kusto?
A query language that allows you to interact with data
Kusto is a query language that allows you to interact with data.
What is the definition of “wrangling data flow?”
Utilizing Power Query for code-free data preparation
This is the definition of a wrangling data flow: it uses Power Query to provide code-free data preparation at scale.
What does “shredding JSON” mean?
Parsing data into columns
Yes! Shredding JSON means parsing JSON data into relational columns.
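In T-SQL, one way to shred JSON is the OPENJSON function, which parses a JSON document and returns its properties as columns. A minimal sketch, assuming a hypothetical @orders document:

    -- Shred a JSON array into relational columns with OPENJSON.
    DECLARE @orders NVARCHAR(MAX) = N'[
      { "orderId": 1, "customer": "Contoso",  "total": 120.50 },
      { "orderId": 2, "customer": "Fabrikam", "total": 75.00 }
    ]';

    SELECT orderId, customer, total
    FROM OPENJSON(@orders)
    WITH (
        orderId  INT            '$.orderId',
        customer NVARCHAR(100)  '$.customer',
        total    DECIMAL(10, 2) '$.total'
    );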
You want to create a Spark linked service in Data Factory. What do you need to do to create a Spark cluster?
Nothing; it is automatically created for you just-in-time by Data Factory
To create a Spark cluster in Data Factory, you don’t need to do anything. It is automatically created for you just-in-time by Data Factory.
What are some uses for T-SQL?
A) Perform code-free transformations on data types, or create aggregates.
B) Perform orchestration services, such as creating alerts or monitoring data pipeline activities.
C) Filter or alter data and return the query results as a data table
D) Create tables for results or save datasets.
C) Filter or alter data and return the query results as a data table
D) Create tables for results or save datasets.
Both of these are uses for T-SQL.
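A minimal T-SQL sketch of both uses, assuming a hypothetical Sales.Orders table:

    -- C) Filter or alter data and return the query results as a data table.
    SELECT orderId, customer, total
    FROM Sales.Orders
    WHERE total > 100;

    -- D) Create a table for the results / save the dataset.
    SELECT orderId, customer, total
    INTO Sales.LargeOrders   -- new table created from the query results
    FROM Sales.Orders
    WHERE total > 100;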
Which activity is NOT possible with Azure Data Factory?
A) Data streaming
B) Data movement
C) Control
D) Data transformation
A) Data streaming.
Data streaming is NOT possible with Azure Data Factory; data movement, control, and data transformation activities are.
What is a conditional split?
Routes data rows to particular streams based on specified conditions.
This is the definition of a conditional split, a transformation used in mapping data flows.
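A conditional split is configured visually in the mapping data flow designer rather than written by hand, but the routing idea can be sketched in T-SQL, assuming hypothetical Sales.Orders source and target tables:

    -- Rows that match the condition are routed to one stream/target...
    INSERT INTO Sales.DomesticOrders (orderId, customer, total)
    SELECT orderId, customer, total
    FROM Sales.Orders
    WHERE country = 'US';

    -- ...and all remaining rows are routed to the default stream/target.
    INSERT INTO Sales.InternationalOrders (orderId, customer, total)
    SELECT orderId, customer, total
    FROM Sales.Orders
    WHERE country <> 'US' OR country IS NULL;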
Choose two ways to cleanse data.
A) Use the Clean Missing Data module
B) Spark Cleaner Hive
C) Data wrangling services
D) Mapping data flows
A) Use the Clean Missing Data module
D) Mapping data flows
Both of these are ways to cleanse data.
Put the following activities in order:
Write Tests
Check Row Counts
Count Activities
Publish Pipeline
A) Write tests, publish pipeline, count activities, and check row counts.
This is the correct order.