Misc Flashcards

1
Q

You can use an SSAS data source in an ADF Copy activity

A

False

2
Q

ADF Copy activity can invoke the PolyBase feature to load an Azure Synapse Analytics SQL pool

A

True

3
Q

You can implement incremental load from Azure SQL Database by using change tracking combined with an ADF Copy activity

A

True

4
Q

Which type of transactional database system would work best for product data?

A

OLTP

5
Q

Suppose a retailer’s operations to update inventory and process payments run in the same transaction. A user applies a $30 store credit (covering the full amount) to an order from their laptop and simultaneously submits the exact same order using the same store credit from their phone, so two identical orders are received. The database behind the scenes is an ACID-compliant database. What will happen?

A

One order will be processed and use the in-store credit, and the other order won’t be processed.

6
Q

Which of the following describes a good strategy for creating storage accounts and blob containers for your application?

  • Create both your Azure Storage accounts and containers before deploying your application.
  • Create Azure Storage accounts in your application as needed. Create the containers before deploying the application.
  • Create Azure Storage accounts before deploying your app. Create containers in your application as needed.
A

Create Azure Storage accounts before deploying your app. Create containers in your application as needed.

7
Q

Which of the following can be used to initialize the Blob Storage client library within an application?

  • An Azure username and password.
  • The Azure Storage account connection string.
  • A globally unique identifier (GUID) that represents the application.
  • The Azure Storage account datacenter and location identifiers.
A

The Azure Storage account connection string.
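
As an illustration, a minimal sketch using the Python SDK (azure-storage-blob); the account name and key in the connection string are placeholders:

    from azure.storage.blob import BlobServiceClient

    # The connection string would normally come from configuration or Key Vault (value illustrative)
    conn_str = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
    service_client = BlobServiceClient.from_connection_string(conn_str)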

8
Q

What happens when you obtain a BlobClient reference from BlobContainerClient with the name of a blob?

  • A new block blob is created in storage.
  • A BlobClient object is created locally. No network calls are made.
  • An exception is thrown if the blob does not exist in storage.
  • The contents of the named blob are downloaded.
A

A BlobClient object is created locally. No network calls are made.
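
A sketch of the same idea with the Python SDK, continuing from the BlobServiceClient above; the container and blob names are illustrative:

    # Getting a client for a named blob is a purely local operation; no request is sent,
    # and no exception is raised if the blob does not exist
    container_client = service_client.get_container_client("images")
    blob_client = container_client.get_blob_client("photo.png")

    # A network call only happens when an operation is invoked, for example:
    exists = blob_client.exists()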

9
Q

Which is the default distribution used for a table in Synapse Analytics?

HASH.

Round-Robin.

Replicated Table.

A

Round-Robin.

10
Q

Which Index Type offers the highest compression?

Columnstore.

Rowstore.

Heap.

A

Columnstore

11
Q

How do column statistics improve query performance?

By keeping track of which columns are being queried.

By keeping track of how much data exists between ranges in columns.

By caching column values for queries.

A

By keeping track of how much data exists between ranges in columns.

12
Q

In what language can the Azure Synapse Apache Spark to Synapse SQL connector be used?

Python.

SQL.

Scala.

A

Scala

13
Q

When is it unnecessary to use import statements for transferring data between a dedicated SQL pool and an Apache Spark pool?

Use the integrated notebook experience from Azure Synapse Studio.

Use the PySpark connector.

Use token-based authentication.

A

Use the integrated notebook experience from Azure Synapse Studio.

14
Q

Which language can be used to define Spark job definitions?

Transact-SQL

PowerShell

PySpark

A

PySpark

15
Q

What Transact-SQL function verifies if a piece of text is valid JSON?

JSON_QUERY

JSON_VALUE

ISJSON

A

ISJSON

16
Q

What Transact-SQL function is used to perform a HyperLogLog function?

APPROX_COUNT_DISTINCT

COUNT_DISTINCT_APPROX

COUNT

A

APPROX_COUNT_DISTINCT

17
Q

Which ALTER DATABASE statement parameter allows a dedicated SQL pool to scale?

SCALE.

MODIFY

CHANGE.

A

MODIFY

18
Q

Which workload management feature influences the order in which a request gets access to resources?

Workload classification.

Workload importance.

Workload isolation.

A

Workload importance.

19
Q

Which Dynamic Management View enables you to view the active connections against a dedicated SQL pool?

sys.dm_pdw_exec_requests.

sys.dm_pdw_dms_workers.

DBCC PDW_SHOWEXECUTIONPLAN.

A

sys.dm_pdw_exec_requests.

20
Q

What would be the best approach to investigate if the data at hand is unevenly allocated across all distributions?

Grouping the data based on partitions and counting rows with a T-SQL query.

Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions.

Monitor query speeds by testing the same query for each partition.

A

Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions.

21
Q

To achieve improved query performance, which one would be the best data type for storing data that contains fewer than 128 characters?

VARCHAR(MAX)

VARCHAR(128)

NVARCHAR(128)

A

VARCHAR(128)

22
Q

Which of the following statements is a benefit of materialized views?

Reducing the execution time for complex queries with JOINs and aggregate functions.

Increased resiliency benefits.

Increased high availability.

A

Reducing the execution time for complex queries with JOINs and aggregate functions.

23
Q

You want to configure a private endpoint. You open up Azure Synapse Studio, go to the Manage hub, and see that the private endpoints option is greyed out. Why is the option not available?

Azure Synapse Studio does not support the creation of private endpoints.

A Conditional Access policy has to be defined first.

A managed virtual network has not been created.

A

A managed virtual network has not been created.

24
Q

You require an Azure Synapse Analytics workspace to access an Azure Data Lake Store while taking advantage of the security provided by Azure Active Directory. What is the best authentication method to use?

Storage account keys.

Shared access signatures.

Managed identities.

A

Managed identities.

25
Q

Which definition best describes Apache Spark?

A highly scalable relational database management system.

A virtual server with a Python runtime.

A distributed platform for parallel data processing using multiple languages.

A

A distributed platform for parallel data processing using multiple languages.

26
Q

You need to use Spark to analyze data in a parquet file. What should you do?

Load the parquet file into a dataframe.

Import the data into a table in a serverless SQL pool.

Convert the data to CSV format.

A

Load the parquet file into a dataframe.
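
For example, a minimal PySpark sketch, assuming a notebook with an active spark session and an illustrative file path:

    # Load the parquet file into a dataframe; parquet carries its own schema, so no inference is needed
    df = spark.read.parquet("abfss://files@mydatalake.dfs.core.windows.net/data/orders.parquet")
    df.printSchema()
    df.show(10)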

27
Q

You want to write code in a notebook cell that uses a SQL query to retrieve data from a view in the Spark catalog. Which magic should you use?

%%spark

%%pyspark

%%sql

A

%%sql

28
Q

Which of the following descriptions best fits Delta Lake?

A Spark API for exporting data from a relational database into CSV files.

A relational storage layer for Spark that supports tables based on Parquet files.

A synchronization solution that replicates data between SQL pools and Spark pools.

A

A relational storage layer for Spark that supports tables based on Parquet files.

29
Q

You’ve loaded a Spark dataframe with data that you now want to use in a Delta Lake table. What format should you use to write the dataframe to storage?

CSV

PARQUET

DELTA

A

DELTA
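
A minimal PySpark sketch, assuming df is a dataframe you have already loaded and the output path is illustrative:

    # Persist the dataframe in Delta format
    df.write.format("delta").mode("overwrite").save("/delta/mytable")

    # Alternatively, register it as a catalog table backed by Delta files
    df.write.format("delta").mode("overwrite").saveAsTable("mytable")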

30
Q

What feature of Delta Lake enables you to retrieve data from previous versions of a table?

Spark Structured Streaming

Time Travel

Catalog Tables

A

Time Travel
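
For illustration, a PySpark sketch against a hypothetical Delta table path:

    # Read an earlier version of the table by version number or by timestamp (values illustrative)
    df_v0 = spark.read.format("delta").option("versionAsOf", 0).load("/delta/mytable")
    df_jan = spark.read.format("delta").option("timestampAsOf", "2024-01-01").load("/delta/mytable")

    # The available versions can be inspected with DESCRIBE HISTORY
    spark.sql("DESCRIBE HISTORY delta.`/delta/mytable`").show()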

31
Q

You have a managed catalog table that contains Delta Lake data. If you drop the table, what will happen?

The table metadata and data files will be deleted.

The table metadata will be removed from the catalog, but the data files will remain intact.

The table metadata will remain in the catalog, but the data files will be deleted.

A

The table metadata and data files will be deleted.

32
Q

When using Spark Structured Streaming, a Delta Lake table can be which of the following?

Only a source

Only a sink

Either a source or a sink

A

Either a source or a sink
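
A short Structured Streaming sketch showing a Delta table on both sides; the paths are illustrative:

    # Delta table as the streaming source
    stream_df = spark.readStream.format("delta").load("/delta/source_table")

    # Another Delta table as the streaming sink, with a checkpoint location
    query = (stream_df.writeStream
             .format("delta")
             .option("checkpointLocation", "/delta/checkpoints/sink_table")
             .start("/delta/sink_table"))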

33
Q

What is one of the possible ways to optimize an Apache Spark Job?

Remove all nodes.

Remove the Apache Spark Pool.

Use bucketing.

A

Use bucketing.

34
Q

What can cause slower performance on join or shuffle jobs?

Data skew.

Enablement of autoscaling

Bucketing.

A

Data skew.

35
Q

Which of the following descriptions matches a hybrid transactional/analytical processing (HTAP) architecture?

Business applications store data in an operational data store, which is also used to support analytical queries for reporting.

Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis.

Business applications store operational data in an analytical data store that is optimized for queries to support reporting and analysis.

A

Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis.

36
Q

You want to use Azure Synapse Analytics to analyze operational data stored in a Cosmos DB core (SQL) API container. Which Azure Synapse Link service should you use?

Azure Synapse Link for SQL

Azure Synapse Link for Dataverse

Azure Synapse Link for Cosmos DB

A

Azure Synapse Link for Cosmos DB

37
Q

You plan to use Azure Synapse Link for Dataverse to analyze business data in your Azure Synapse Analytics workspace. Where is the replicated data from Dataverse stored?

In an Azure Synapse dedicated SQL pool

In an Azure Data Lake Gen2 storage container.

In an Azure Cosmos DB container.

A

In an Azure Data Lake Gen2 storage container.

38
Q

You have an Azure Cosmos DB core (SQL) account and an Azure Synapse Analytics workspace. What must you do first to enable HTAP integration with Azure Synapse Analytics?

Configure global replication in Azure Cosmos DB.

Create a dedicated SQL pool in Azure Synapse Analytics.

Enable Azure Synapse Link in Azure Cosmos DB.

A

Enable Azure Synapse Link in Azure Cosmos DB.

39
Q

You have an existing container in a Cosmos DB core (SQL) database. What must you do to enable analytical queries over Azure Synapse Link from Azure Synapse Analytics?

Delete and recreate the container.

Enable Azure Synapse Link in the container to create an analytical store.

Add an item to the container.

A

Enable Azure Synapse Link in the container to create an analytical store.

40
Q

You plan to use a Spark pool in Azure Synapse Analytics to query an existing analytical store in Cosmos DB. What must you do?

Create a linked service for the Cosmos DB database where the analytical store enabled container is defined.

Disable automatic pausing for the Spark pool in Azure Synapse Analytics.

Install the Azure Cosmos DB SDK for Python package in the Spark pool.

A

Create a linked service for the Cosmos DB database where the analytical store enabled container is defined.

41
Q

You’re writing PySpark code to load data from a Cosmos DB analytical store into a dataframe. What format should you specify?

cosmos.json

cosmos.olap

cosmos.sql

A

cosmos.olap
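
For example, a hedged PySpark sketch; the linked service and container names are placeholders:

    # Load the Cosmos DB analytical store into a dataframe via the cosmos.olap format
    df = (spark.read
          .format("cosmos.olap")
          .option("spark.synapse.linkedService", "CosmosDbLinkedService")
          .option("spark.cosmos.container", "my-container")
          .load())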

42
Q

You’re writing SQL code in a serverless SQL pool to query an analytical store in Cosmos DB. What function should you use?

OPENDATASET

ROW

OPENROWSET

A

OPENROWSET

43
Q

From which of the following data sources can you use Azure Synapse Link for SQL to replicate data to Azure Synapse Analytics?

Azure Cosmos DB

SQL Server 2022

Azure SQL Managed Instance

A

SQL Server 2022

44
Q

What must you create in your Azure Synapse Analytics workspace to implement Azure Synapse Link for Azure SQL Database?

A serverless SQL pool

A linked service for your Azure SQL Database

A link connection for your Azure SQL Database

A

A link connection for your Azure SQL Database

45
Q

You plan to use Azure Synapse Link for SQL to replicate tables from SQL Server 2022 to Azure Synapse Analytics. What additional Azure resource must you create?

Azure Data Lake Storage Gen2

Azure Key Vault

Azure Application Insights

A

Azure Data Lake Storage Gen2

46
Q

How many drivers does a Cluster have?

Only one

Two, running in parallel

Configurable between one and eight

A

Only one

47
Q

Spark is a distributed computing environment. Therefore, work is parallelized across executors. At which two levels does this parallelization occur?

The Executor and the Slot

The Driver and the Executor

The Slot and the Task

A

The Executor and the Slot

48
Q

What type of process are the driver and the executors?

Java processes

Python processes

C++ processes

A

Java processes

49
Q

Which notebook format is used in Databricks?

DBC

.notebook

.spark

A

DBC

50
Q

When creating a new cluster in the Azure Databricks workspace, what happens behind the scenes?

Azure Databricks provisions a dedicated VM that processes all jobs, based on your VM type and size selection.

Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.

When an Azure Databricks workspace is deployed, you are allocated a pool of VMs. Creating a cluster draws from this pool.

A

Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.

51
Q

To parallelize work, the unit of distribution is a Spark Cluster. Every Cluster has a Driver and one or more executors. Work submitted to the Cluster is split into what type of object?

Stages

Arrays

Jobs

A

Jobs

52
Q

How do you list files in DBFS within a notebook?

ls /my-file-path

%fs dir /my-file-path

%fs ls /my-file-path

A

%fs ls /my-file-path

53
Q

How do you infer the data types and column names when you read a JSON file?

spark.read.option("inferSchema", "true").json(jsonFile)

spark.read.inferSchema("true").json(jsonFile)

spark.read.option("inferData", "true").json(jsonFile)

A

spark.read.option("inferSchema", "true").json(jsonFile)

54
Q

Which DataFrame method do you use to create a temporary view?

createTempView()

createTempViewDF()

createOrReplaceTempView()

A

createOrReplaceTempView()
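
A minimal sketch, assuming df is an existing dataframe and the view name is illustrative:

    # Register the dataframe as a temporary view, then query it with Spark SQL
    df.createOrReplaceTempView("sales_view")
    spark.sql("SELECT COUNT(*) AS row_count FROM sales_view").show()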

55
Q

How do you create a DataFrame object?

Introduce a variable name and equate it to something like myDataFrameDF =

Use the createDataFrame() function

Use the DF.create() syntax

A

Introduce a variable name and equate it to something like myDataFrameDF =

56
Q

How do you cache data into the memory of the local executor for instant access?

.save().inMemory()

.inMemory().save()

.cache()

A

.cache()
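
For instance, assuming df is an existing dataframe:

    # Mark the dataframe for caching; the cache is populated by the first action that runs
    df.cache()
    df.count()   # triggers evaluation and materializes the cached data in executor memory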

57
Q

What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS?

IPGeocodeDF = parquet.read("dbfs:/mnt/training/ip-geocode.parquet")

IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")

IPGeocodeDF = spark.parquet.read("dbfs:/mnt/training/ip-geocode.parquet")

A

IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")

58
Q

Which of the following statements describes a wide transformation?

A wide transformation can be applied per partition/worker with no need to share or shuffle data to other workers

A wide transformation requires sharing data across workers. It does so by shuffling data.

A wide transformation applies data transformation over a large number of columns

A

A wide transformation requires sharing data across workers. It does so by shuffling data.
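
To make the distinction concrete, a small PySpark sketch, assuming df is an existing dataframe with a country column:

    from pyspark.sql.functions import col

    narrow_df = df.filter(col("country") == "US")   # narrow: each partition is processed independently
    wide_df = df.groupBy("country").count()          # wide: rows with the same key are shuffled to the same worker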

59
Q

Which feature of Spark determines how your code is executed?

Catalyst Optimizer

Tungsten Record Format

Java Garbage Collection

A

Catalyst Optimizer

60
Q

If you create a DataFrame that reads data from Azure Blob Storage and then create another DataFrame by filtering the initial DataFrame, what feature of Spark causes these transformations to be analyzed?

Tungsten Record Format

Java Garbage Collection

Lazy Execution

A

Lazy Execution
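
A sketch of the scenario in the question; the storage path and column name are illustrative:

    from pyspark.sql.functions import col

    raw_df = spark.read.csv("wasbs://data@myaccount.blob.core.windows.net/logs.csv", header=True)
    filtered_df = raw_df.filter(col("status") == "200")   # still only a logical plan; nothing has been read yet
    filtered_df.count()                                    # the action triggers analysis, optimization, and execution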

61
Q

Which command orders by a column in descending order?

df.orderBy("requests desc")

df.orderBy("requests").desc()

df.orderBy(col("requests").desc())

A

df.orderBy(col("requests").desc())

62
Q

Which command specifies a column value in a DataFrame’s filter? Specifically, filter by a productType column where the value is equal to book?

df.filter(col("productType") == "book")

df.filter("productType = 'book'")

df.col("productType").filter("book")

A

df.filter(col("productType") == "book")

63
Q

When using the Column Class, which command filters based on the end of a column value? For example, a column named verb and filtered by words ending with “ing”.

df.filter().col("verb").like("%ing")

df.filter("verb like '%ing'")

df.filter(col("verb").endswith("ing"))

A

df.filter(col("verb").endswith("ing"))

64
Q

Which method for renaming a DataFrame’s column is incorrect?

df.select(col("timestamp").alias("dateCaptured"))

df.alias("timestamp", "dateCaptured")

df.toDF("dateCaptured")

A

df.alias("timestamp", "dateCaptured")

65
Q

You need to find the average of sales transactions by storefront. Which of the following aggregates would you use?

df.select(col("storefront")).avg("completedTransactions")

df.groupBy(col("storefront")).avg(col("completedTransactions"))

df.groupBy(col("storefront")).avg("completedTransactions")

A

df.groupBy(col("storefront")).avg("completedTransactions")

66
Q

Which statement about the Azure Databricks Data Plane is true?

The Data Plane contains the Cluster Manager and coordinates data processing jobs

The Data Plane is hosted within a Microsoft-managed subscription

The Data Plane is hosted within the client subscription and is where all data is processed and stored

A

The Data Plane is hosted within the client subscription and is where all data is processed and stored

67
Q

In which modes does Azure Databricks provide data encryption?

At-rest and in-transit

At-rest only

In-transit only

A

At-rest and in-transit

68
Q

What does Azure Data Lake Storage (ADLS) Passthrough enable?

Automatically mounting ADLS accounts to the workspace that are added to the managed resource group

User security groups that are added to ADLS are automatically created in the workspace as Databricks groups

Commands running on a configured cluster can read and write data in ADLS without configuring service principal credentials

A

Commands running on a configured cluster can read and write data in ADLS without configuring service principal credentials

69
Q

What is an Azure Key Vault-backed secret scope?

It is the Key Vault Access Key used to securely connect to the vault and retrieve secrets

A Databricks secret scope that is backed by Azure Key Vault instead of Databricks

It is a method by which you create a secure connection to Azure Key Vault from a notebook and directly access its secrets within the Spark session

A

A Databricks secret scope that is backed by Azure Key Vault instead of Databricks
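
Once such a scope exists, secrets are read in a notebook exactly like Databricks-backed secrets; a sketch with illustrative scope and key names:

    # Retrieve a secret stored in Azure Key Vault through the Key Vault-backed scope
    storage_key = dbutils.secrets.get(scope="my-keyvault-scope", key="storage-account-key")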

70
Q

What is the Databricks Delta command to display metadata?

MSCK DETAIL tablename

DESCRIBE DETAIL tableName

SHOW SCHEMA tablename

A

DESCRIBE DETAIL tableName

71
Q

How do you perform UPSERT in a Delta dataset?

Use UPSERT INTO my-table

Use UPSERT INTO my-table /MERGE

Use MERGE INTO my-table USING data-to-upsert

A

Use MERGE INTO my-table USING data-to-upsert
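
A hedged PySpark sketch of the same upsert, with illustrative table, view, and column names:

    # Expose the incoming rows as a view, then merge them into the Delta table
    updates_df.createOrReplaceTempView("data_to_upsert")
    spark.sql("""
        MERGE INTO my_table AS target
        USING data_to_upsert AS source
        ON target.id = source.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)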

72
Q

What optimization does the following command perform: OPTIMIZE Students ZORDER BY Grade?

Creates an order-based index on the Grade field to improve filters against that field

Ensures that all data backing, for example, Grade=8 is colocated, then updates a graph that routes requests to the appropriate files

Ensures that all data backing, for example, Grade=8 is colocated, then rewrites the sorted data into new Parquet files

A

Ensures that all data backing, for example, Grade=8 is colocated, then rewrites the sorted data into new Parquet files

73
Q

What size does OPTIMIZE compact small files to?

Around 100 MB

Around 1 GB

Around 500 MB

A

Around 1 GB

74
Q

When doing a write stream command, what does the outputMode(“append”) option do?

The append mode allows records to be updated and changed in place

The append outputMode allows records to be added to the output sink

The append mode replaces existing records and updates aggregates

A

The append outputMode allows records to be added to the output sink

75
Q

In Spark Structured Streaming, what method should be used to read streaming data into a DataFrame?

spark.readStream

spark.read

spark.stream.read

A

spark.readStream

76
Q

What happens if the command option(“checkpointLocation”, pointer-to-checkpoint directory) is not specified?

It will not be possible to create more than one streaming query that uses the same streaming source since they will conflict

The streaming job will function as expected since the checkpointLocation option does not exist

When the streaming job stops, all state around the streaming job is lost, and upon restart, the job must start from scratch

A

When the streaming job stops, all state around the streaming job is lost, and upon restart, the job must start from scratch

77
Q

What is a lambda architecture and what does it try to solve?

An architecture that defines a data processing pipeline whereby microservices act as compute resources for efficient large-scale data processing

An architecture that splits incoming data into two paths - a batch path and a streaming path. This architecture helps address the need to provide real-time processing in addition to slower batch computations.

An architecture that employs the latest Scala runtimes in one or more Databricks clusters to provide the most efficient data processing platform available today

A

An architecture that splits incoming data into two paths - a batch path and a streaming path. This architecture helps address the need to provide real-time processing in addition to slower batch computations.

78
Q

What command should be issued to view the list of active streams?

Invoke spark.streams.active

Invoke spark.streams.show

Invoke spark.view.active

A

Invoke spark.streams.active

79
Q

What is required to specify the location of a checkpoint directory when defining a Delta Lake streaming query?

.writeStream.format("delta").checkpoint("location", checkpointPath) …

.writeStream.format("delta").option("checkpointLocation", checkpointPath) …

.writeStream.format("parquet").option("checkpointLocation", checkpointPath) …

A

.writeStream.format("delta").option("checkpointLocation", checkpointPath) …

80
Q

What’s the purpose of linked services in Azure Data Factory?

To represent a data store or a compute resource that can host execution of an activity

To represent a processing step in a pipeline

To link data stores or compute resources together for the movement of data between resources

A

To represent a data store or a compute resource that can host execution of an activity

81
Q

How can parameters be passed into an Azure Databricks notebook from Azure Data Factory?

Use the new API endpoint option on a notebook in Databricks and provide the parameter name

Use notebook widgets to define parameters that can be passed into the notebook

Deploy the notebook as a web service in Databricks, defining parameter names and types

A

Use notebook widgets to define parameters that can be passed into the notebook
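
In the notebook itself, a widget is defined and read with dbutils; the parameter name is illustrative, and Data Factory supplies the value through the notebook activity’s base parameters:

    # Define the widget (also provides a default when the notebook is run interactively)
    dbutils.widgets.text("input_path", "")

    # Read the value passed in by the Azure Data Factory notebook activity
    input_path = dbutils.widgets.get("input_path")
    print(f"Processing data from: {input_path}")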

82
Q

What happens to Databricks activities (notebook, JAR, Python) in Azure Data Factory if the target cluster in Azure Databricks isn’t running when the cluster is called by Data Factory?

If the target cluster is stopped, Databricks will start the cluster before attempting to execute

The Databricks activity will fail in Azure Data Factory – you must always have the cluster running

Simply add a Databricks cluster start activity before the notebook, JAR, or Python Databricks activity

A

If the target cluster is stopped, Databricks will start the cluster before attempting to execute

83
Q

What does the CD in CI/CD mean?

Continuous Delivery

Continuous Deployment

Both are correct

A

Both are correct

84
Q

What sort of pipeline is required in Azure DevOps for creating artifacts used in releases?

An Artifact pipeline

A Build pipeline

A Release pipeline

A

A Build pipeline

85
Q

What steps are required to authorize Azure DevOps to connect to and deploy notebooks to a staging or production Azure Databricks workspace?

Create an Azure Active Directory application, copy the application ID, then use that as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline

In the production or staging Azure Databricks workspace, enable Git integration to Azure DevOps, then link to the Azure DevOps source code repo

Create a new Access Token within the user settings in the production Azure Databricks workspace, then use the token as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline

A

Create a new Access Token within the user settings in the production Azure Databricks workspace, then use the token as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline

86
Q

What are the two prerequisites for connecting Azure Databricks with Azure Synapse Analytics that apply to the Azure Synapse Analytics instance?

Create a database master key and configure the firewall to enable Azure services to connect

Use a correctly formatted ConnectionString and create a database master key

Add the client IP address to the firewall’s allowed IP addresses list and use the correctly formatted ConnectionString

A

Create a database master key and configure the firewall to enable Azure services to connect

87
Q

Which is the correct syntax for overwriting data in Azure Synapse Analytics from a Databricks notebook?

df.write.mode("overwrite").option("…").option("…").save()

df.write.format("com.databricks.spark.sqldw").overwrite().option("…").option("…").save()

df.write.format("com.databricks.spark.sqldw").mode("overwrite").option("…").option("…").save()

A

df.write.format("com.databricks.spark.sqldw").mode("overwrite").option("…").option("…").save()

88
Q

What is SCIM?

An optimization that removes orphaned data from a given dataset

An open standard that enables users to bring their own auth key to the Databricks environment

An open standard that enables organizations to import both groups and users from Azure Active Directory into Azure Databricks

A

An open standard that enables organizations to import both groups and users from Azure Active Directory into Azure Databricks

89
Q

If mounting an Azure Data Lake Storage (ADLS) account to a workspace, what cluster feature must be used to have ACLs within ADLS applied to the user executing commands in a notebook?

Enable ADLS Passthrough on a cluster

Enable SCIM

Set spark.config.adls.impersonateuser(true)

A

Enable ADLS Passthrough on a cluster

90
Q

Mike is creating an Azure Data Lake Storage Gen2 account. He must configure this account to be able to process analytical data workloads for best performance. Which option should he configure when creating the storage account?

On the Basic tab, set the Performance option to Standard.

On the Basic Tab, set the Performance option to ON.

On the Advanced tab, set the Hierarchical Namespace to Enabled

A

On the Advanced tab, set the Hierarchical Namespace to Enabled.

91
Q

In which phase of big data processing is Azure Data Lake Storage located?

Ingestion

Store

Model & Serve

A

Store

92
Q

You are working on a project with a 3rd party vendor to build a website for a customer. The image assets that will be used on the website are stored in an Azure Storage account that is held in your subscription. You want to give read access to this data for a limited period of time. What security option would be the best option to use?

CORS Support

Storage Account

Shared Access Signatures

A

Shared Access Signatures

93
Q

When configuring network access to your Azure Storage Account, what is the default network rule?

To allow all connections from all networks

To allow all connection from a private IP address range

To deny all connections from all networks

A

To allow all connections from all networks

94
Q

Which Azure service detects anomalies in account activities and notifies you of potential harmful attempts to access your account?

Azure Defender for Storage

Azure Storage Account Security Feature

Encryption in transit

A

Azure Defender for Storage

95
Q

Which of the following technologies typically provide an ingestion point for data streaming in an event processing solution that uses static data as a source?

Azure IoT Hub

Azure Blob storage

Azure Event Hubs

A

Azure Blob storage

96
Q

To consume processed event streaming data in near-real-time to produce dashboards containing rich visualizations, which of the following services should you use?

Azure Cosmos DB

Event Hubs

Power BI

A

Power BI

97
Q

Applications that publish messages to Azure Event Hub very frequently will get the best performance using Advanced Message Queuing Protocol (AMQP) because it establishes a persistent socket.

True

False

A

True

98
Q

By default, how many partitions will a new Event Hub have?

1

2

3

4

A

4

99
Q

What is the maximum size for a single publication (individual or batch) that is allowed by Azure Event Hub?

256 KB

512 KB

1 MB

2 MB

A

1 MB

100
Q

Which of the definitions below best describes a Tumbling window?

A windowing function that clusters together events that arrive at similar times, filtering out periods of time in which there is no data.

A windowing function that segments a data stream into a contiguous series of fixed-size, non-overlapping time segments and operates against them. Events cannot belong to more than one tumbling window.

A windowing function that groups events by identical timestamp values.

A

A windowing function that segments a data stream into a contiguous series of fixed-size, non-overlapping time segments and operates against them. Events cannot belong to more than one tumbling window.

101
Q

Which of the following services is an invalid input for an Azure Stream Analytics job?

Blob storage

Azure Cosmos DB

Azure Event Hubs

A

Azure Cosmos DB

102
Q

Below is a list of key benefits of using Azure Stream Analytics to process streaming data. Which of the following statements is incorrect?

The ability to write and test transformation queries in the Azure portal

Being able to rapidly deploy queries into production by creating and starting an Azure Stream Analytics job

Integration with Azure Blob storage

A

Integration with Azure Blob storage

103
Q

Which technology is typically used as a staging area in a modern data warehousing architecture?

Azure Data Lake.

Azure Synapse SQL Pools.

Azure Synapse Spark Pools.

A

Azure Data Lake.

104
Q

Which component enables you to perform code free transformations in Azure Synapse Analytics?

Studio.

Copy activity.

Mapping Data Flow.

A

Mapping Data Flow.

105
Q

Which transformation in the Mapping Data Flow is used to route data rows to different streams based on matching conditions?

Lookup.

Conditional Split.

Select.

A

Conditional Split.

106
Q

Which transformation is used to load data into a data store or compute resource?

Window.

Source.

Sink.

A

Sink.

107
Q

In which of the following table types should an insurance company store details of customer attributes by which claims will be aggregated?

Staging table

Dimension table

Fact table

A

Dimension table

108
Q

You create a dimension table for product data, assigning a unique numeric key for each row in a column named ProductKey. The ProductKey is only defined in the data warehouse. What kind of key is ProductKey?

A surrogate key

An alternate key

A business key

A

A surrogate key

109
Q

What distribution option would be best for a sales fact table that will contain billions of records?

HASH

ROUND_ROBIN

REPLICATE

A

HASH

110
Q

You need to write a query to return the total of the UnitsProduced numeric measure in the FactProduction table aggregated by the ProductName attribute in the FactProduct table. Both tables include a ProductKey surrogate key field. What should you do?

Use two SELECT queries with a UNION ALL clause to combine the rows in the FactProduction table with those in the FactProduct table.

Use a SELECT query against the FactProduction table with a WHERE clause to filter out rows with a ProductKey that doesn’t exist in the FactProduct table.

Use a SELECT query with a SUM function to total the UnitsProduced metric, using a JOIN on the ProductKey surrogate key to match the FactProduction records to the FactProduct records and a GROUP BY clause to aggregate by ProductName.

A

Use a SELECT query with a SUM function to total the UnitsProduced metric, using a JOIN on the ProductKey surrogate key to match the FactProduction records to the FactProduct records and a GROUP BY clause to aggregate by ProductName.

111
Q

You use the RANK function in a query to rank customers in order of the number of purchases they have made. Five customers have made the same number of purchases and are all ranked equally as 1. What rank will the customer with the next highest number of purchases be assigned?

2

6

1

A

6. With RANK, tied rows share a rank and the next rank skips ahead by the number of ties (1, 1, 1, 1, 1, 6); DENSE_RANK would assign 2 instead.

112
Q

You need to compare approximate production volumes by product while optimizing query response time. Which function should you use?

COUNT

NTILE

APPROX_COUNT_DISTINCT

A

APPROX_COUNT_DISTINCT

113
Q

How does splitting source files help maintain good performance when loading into Synapse Analytics?

Optimized processing of smaller file sizes.

Compute node to storage segment alignment.

Reduced possibility of data corruptions.

A

Compute node to storage segment alignment.

114
Q

Which Workload Management capability manages minimum and maximum resource allocations during peak periods?

Workload Isolation.

Workload Importance.

Workload Containment.

A

Workload Isolation.

115
Q

Which T-SQL Statement loads data directly from Azure Storage?

LOAD DATA.

COPY.

INSERT FROM FILE.

A

COPY.

116
Q

How does splitting source files help maintain good performance when loading into Synapse Analytics?

Optimized processing of smaller file sizes.

Compute node to storage segment alignment.

Reduced possibility of data corruptions.

A

Compute node to storage segment alignment.

117
Q

Which Workload Management capability manages minimum and maximum resource allocations during peak periods?

Workload Isolation.

Workload Importance.

Workload Containment.

A

Workload Isolation.

118
Q

Which T-SQL Statement loads data directly from Azure Storage?

LOAD DATA.

COPY.

INSERT FROM FILE.

A

COPY.

119
Q

Which Azure Data Factory component orchestrates a transformation job or runs a data movement command?

Linked Services

Datasets

Activities

A

Activities

120
Q

You are moving data from an Azure Data Lake Gen2 store to Azure Synapse Analytics. Which Azure Data Factory integration runtime would be used in a data copy activity?

Azure-SSIS

Azure

Self-hosted

A

Azure

121
Q

In Azure Data Factory authoring tool, where would you find the Copy data activity?

Move & Transform

Batch Service

Databricks

A

Move & Transform

122
Q

You want to ingest data from a SQL Server database hosted on an on-premises Windows Server. What integration runtime is required for Azure Data Factory to ingest data from the on-premises server?

Azure-SSIS Integration Runtime

Self-Hosted Integration Runtime

Azure Integration Runtime

A

Self-Hosted Integration Runtime

123
Q

By default, how long are the Azure Data Factory diagnostic logs retained for?

15 days

30 days

45 days

A

45 days

124
Q

Which transformation in the Mapping Data Flow is used to route data rows to different streams based on matching conditions?

Lookup.

Conditional Split.

Select.

A

Conditional Split

125
Q

Which transformation is used to load data into a data store or compute resource?

Window.

Source.

Sink.

A

Sink

126
Q

Which SCD type would you use to keep history of changes in dimension members by adding a new row to the table for each change?

Type 1 SCD.

Type 2 SCD.

Type 3 SCD.

A

Type 2 SCD.

127
Q

Which SCD type would you use to update the dimension members without keeping track of history?

Type 1 SCD.

Type 2 SCD.

Type 3 SCD.

A

Type 1 SCD.

128
Q

What is a supported connector for built-in parameterization?

Azure Data Lake Storage Gen2

Azure Synapse Analytics

Azure Key Vault

A

Azure Synapse Analytics

129
Q

What is an example of a branching activity used in control flows?

The If-condition

Until-condition

Lookup-condition

A

The If-condition

130
Q

In which version of SQL Server was SSIS Projects introduced?

SQL Server 2008.

SQL Server 2012.

SQL Server 2016.

A

SQL Server 2012.

131
Q

Which tool is used to perform an assessment of migrating SSIS packages to Azure SQL Database services?

Data Migration Assistant.

Data Migration Assessment.

Data Migration Service.

A

Data Migration Assistant.

132
Q

Which tool is used to create and deploy SQL Server Integration Packages on an Azure-SSIS integration runtime, or for on-premises SQL Server?

SQL Server Data Tools.

SQL Server Management Studio.

dtexec.

A

SQL Server Data Tools.

133
Q

Which version control software does Azure Data Factory integrate with?

Team Foundation Server.

Source Safe.

Git repositories.

A

Git repositories.

134
Q

Which feature merges the changes of Azure Data Factory work in a custom branch back into the main branch of a Git repository?

Repo.

Pull request.

Commit.

A

Pull request.

135
Q

Which feature in alerts can be used to determine how an alert is fired?

Add rule.

Add severity.

Add criteria.

A

Add criteria.

136
Q

Suppose you have two video files stored as blobs. One of the videos is business-critical and requires a replication policy that creates multiple copies across geographically diverse datacenters. The other video is non-critical, and a local replication policy is sufficient. Which of the following options would satisfy both data diversity and cost sensitivity considerations?

Create a single storage account that makes use of Locally redundant storage (LRS) and host both videos from here.

Create a single storage account that makes use of Geo-redundant storage (GRS) and host both videos from here.

Create two storage accounts. The first account makes use of Geo-redundant storage (GRS) and hosts the business-critical video content. The second account makes use of Locally redundant storage (LRS) and hosts the non-critical video content.

A

Create two storage accounts. The first account makes use of Geo-redundant storage (GRS) and hosts the business-critical video content. The second account makes use of Locally redundant storage (LRS) and hosts the non-critical video content.

137
Q

The name of a storage account must be:

Unique within the containing resource group.

Unique within your Azure subscription.

Globally unique.

A

Globally unique.

138
Q

In a typical project, when would you create your storage account(s)?

At the beginning, during project setup.

After deployment, when the project is running.

At the end, during resource cleanup.

A

At the beginning, during project setup.

139
Q

How many access keys are provided for accessing your Azure storage account?

1

2

3

4

A

2

140
Q

You can use either the REST API or the Azure client library to programmatically access a storage account. What is the primary advantage of using the client library?

Cost

Availability

Localization

Convenience

A

Convenience

141
Q

Which of the following is a good analogy for the access keys of a storage account?

IP Address

REST Endpoint

Username and password

Cryptographic algorithm

A

Username and password

142
Q

You are working on a project with a 3rd party vendor to build a website for a customer. The image assets that will be used on the website are stored in an Azure Storage account that is held in your subscription. You want to give read access to this data for a limited period of time. What security option would be the best option to use?

CORS Support

Storage Account

Shared Access Signatures

A

Shared Access Signatures

143
Q

When configuring network access to your Azure Storage Account, what is the default network rule?

To allow all connections from all networks

To allow all connection from a private IP address range

To deny all connections from all networks

A

To allow all connections from all networks

144
Q

Which Azure service detects anomalies in account activities and notifies you of potential harmful attempts to access your account?

Azure Defender for Storage

Azure Storage Account Security Feature

Encryption in transit

A

Azure Defender for Storage