Misc Flashcards
You can use an SSAS data source in an ADF Copy activity
False
The ADF Copy activity can invoke the PolyBase feature to load an Azure Synapse Analytics SQL pool
True
You can implement incremental load from Azure SQL Database by using change tracking combined with an ADF Copy activity
True
Which type of transactional database system would work best for product data?
OLTP
Suppose a retailer’s operations to update inventory and process payments run in the same transaction. A user applies a $30 store credit (covering the full amount) to an order from their laptop while submitting the exact same order with the same store credit from their phone, so two identical orders are received. The underlying database is ACID-compliant. What will happen?
One order will be processed and use the store credit, and the other order won’t be processed.
Which of the following describes a good strategy for creating storage accounts and blob containers for your application?
- Create both your Azure Storage accounts and containers before deploying your application.
- Create Azure Storage accounts in your application as needed. Create the containers before deploying the application.
- Create Azure Storage accounts before deploying your app. Create containers in your application as needed.
Create Azure Storage accounts before deploying your app. Create containers in your application as needed.
Which of the following can be used to initialize the Blob Storage client library within an application?
- An Azure username and password.
- The Azure Storage account connection string.
- A globally-unique identifier (GUID) that represents the application.
- The Azure Storage account datacenter and location identifiers.
The Azure Storage account connection string.
What happens when you obtain a BlobClient reference from BlobContainerClient with the name of a blob?
- A new block blob is created in storage.
- A BlobClient object is created locally. No network calls are made.
- An exception is thrown if the blob does not exist in storage.
- The contents of the named blob are downloaded.
A BlobClient object is created locally. No network calls are made.
Which is the default distribution used for a table in Synapse Analytics?
HASH.
Round-Robin.
Replicated Table.
Round-Robin.
Which Index Type offers the highest compression?
Columnstore.
Rowstore.
Heap.
Columnstore
How do column statistics improve query performance?
By keeping track of which columns are being queried.
By keeping track of how much data exists between ranges in columns.
By caching column values for queries.
By keeping track of how much data exists between ranges in columns.
In what language can the Azure Synapse Apache Spark to Synapse SQL connector be used?
Python.
SQL.
Scala.
Scala
When is it unnecessary to use import statements for transferring data between a dedicated SQL pool and an Apache Spark pool?
Use the integrated notebook experience from Azure Synapse Studio.
Use the PySpark connector.
Use token-based authentication.
Use the integrated notebook experience from Azure Synapse Studio.
Which language can be used to define Spark job definitions?
Transact-SQL
PowerShell
PySpark
PySpark
What Transact-SQL function verifies if a piece of text is valid JSON?
JSON_QUERY
JSON_VALUE
ISJSON
ISJSON
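For example (a minimal sketch with a hypothetical variable): SELECT ISJSON(@json) returns 1 when @json contains valid JSON and 0 when it does not.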
What Transact-SQL function uses the HyperLogLog algorithm to approximate a distinct count?
APPROX_COUNT_DISTINCT
COUNT_DISTINCT_APPROX
COUNT
APPROX_COUNT_DISTINCT
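For example (hypothetical table and column names): SELECT APPROX_COUNT_DISTINCT(CustomerKey) FROM dbo.FactSales returns an approximate distinct count via HyperLogLog, trading a small amount of accuracy for much lower memory use on large tables.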
Which ALTER DATABASE statement parameter allows a dedicated SQL pool to scale?
SCALE.
MODIFY
CHANGE.
MODIFY
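For example (hypothetical pool name and service objective): ALTER DATABASE myDedicatedPool MODIFY (SERVICE_OBJECTIVE = 'DW300c'); scales the dedicated SQL pool to the DW300c performance level.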
Which workload management feature influences the order in which a request gets access to resources?
Workload classification.
Workload importance.
Workload isolation.
Workload importance.
Which Dynamic Management View lets you view the active connections against a dedicated SQL pool?
sys.dm_pdw_exec_requests.
sys.dm_pdw_dms_workers.
DBCC PDW_SHOWEXECUTIONPLAN.
sys.dm_pdw_exec_requests.
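For example, a minimal check for requests that are still active: SELECT * FROM sys.dm_pdw_exec_requests WHERE status NOT IN ('Completed', 'Failed', 'Cancelled');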
What would be the best approach to investigate whether the data at hand is unevenly allocated across all distributions?
Grouping the data based on partitions and counting rows with a T-SQL query.
Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions.
Monitor query speeds by testing the same query for each partition.
Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions.
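For example (hypothetical table name): DBCC PDW_SHOWSPACEUSED('dbo.FactInternetSales'); reports row counts and space used per distribution, so uneven allocation across the 60 distributions is easy to spot.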
To achieve improved query performance, which one would be the best data type for storing data that contains fewer than 128 characters?
VARCHAR(MAX)
VARCHAR(128)
NVARCHAR(128)
VARCHAR(128)
Which of the following statements is a benefit of materialized views?
Reducing the execution time for complex queries with JOINs and aggregate functions.
Increased resiliency benefits.
Increased high availability.
Reducing the execution time for complex queries with JOINs and aggregate functions.
You want to configure a private endpoint. You open Azure Synapse Studio, go to the Manage hub, and see that the private endpoints option is greyed out. Why is the option unavailable?
Azure Synapse Studio does not support the creation of private endpoints.
A Conditional Access policy has to be defined first.
A managed virtual network has not been created.
A managed virtual network has not been created.
You need an Azure Synapse Analytics workspace to access an Azure Data Lake Store with the security provided by Azure Active Directory. What is the best authentication method to use?
Storage account keys.
Shared access signatures.
Managed identities.
Managed identities.
Which definition best describes Apache Spark?
A highly scalable relational database management system.
A virtual server with a Python runtime.
A distributed platform for parallel data processing using multiple languages.
A distributed platform for parallel data processing using multiple languages.
You need to use Spark to analyze data in a parquet file. What should you do?
Load the parquet file into a dataframe.
Import the data into a table in a serverless SQL pool.
Convert the data to CSV format.
Load the parquet file into a dataframe.
You want to write code in a notebook cell that uses a SQL query to retrieve data from a view in the Spark catalog. Which magic should you use?
%%spark
%%pyspark
%%sql
%%sql
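A minimal notebook-cell sketch (products_view is a hypothetical view in the Spark catalog):

    %%sql
    SELECT * FROM products_view LIMIT 10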
Which of the following descriptions best fits Delta Lake?
A Spark API for exporting data from a relational database into CSV files.
A relational storage layer for Spark that supports tables based on Parquet files.
A synchronization solution that replicates data between SQL pools and Spark pools.
A relational storage layer for Spark that supports tables based on Parquet files.
You’ve loaded a Spark dataframe with data that you now want to use in a Delta Lake table. What format should you use to write the dataframe to storage?
CSV
PARQUET
DELTA
DELTA
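A minimal PySpark sketch, assuming df is an existing dataframe and the target path is hypothetical:

    df.write.format("delta").mode("overwrite").save("/delta/products")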
What feature of Delta Lake enables you to retrieve data from previous versions of a table?
Spark Structured Streaming
Time Travel
Catalog Tables
Time Travel
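A minimal PySpark sketch of time travel, with a hypothetical path and version number:

    df_v0 = spark.read.format("delta").option("versionAsOf", 0).load("/delta/products")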
You have a managed catalog table that contains Delta Lake data. If you drop the table, what will happen?
The table metadata and data files will be deleted.
The table metadata will be removed from the catalog, but the data files will remain intact.
The table metadata will remain in the catalog, but the data files will be deleted.
The table metadata and data files will be deleted.
When using Spark Structured Streaming, a Delta Lake table can be which of the following?
Only a source
Only a sink
Either a source or a sink
Either a source or a sink
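A minimal PySpark sketch that reads one Delta table as a streaming source and writes to another as the sink (paths are hypothetical):

    stream_df = spark.readStream.format("delta").load("/delta/events")
    query = (stream_df.writeStream
             .format("delta")
             .option("checkpointLocation", "/delta/checkpoints/events_copy")
             .start("/delta/events_copy"))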
What is one of the possible ways to optimize an Apache Spark Job?
Remove all nodes.
Remove the Apache Spark Pool.
Use bucketing.
Use bucketing.
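A minimal PySpark sketch of bucketing, with hypothetical table and column names; pre-hashing rows into buckets on the join key reduces shuffling in later joins and aggregations:

    (df.write
       .bucketBy(8, "customer_id")
       .sortBy("customer_id")
       .saveAsTable("sales_bucketed"))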
What can cause a slower performance on join or shuffle jobs?
Data skew.
Enablement of autoscaling
Bucketing.
Data skew.
Which of the following descriptions matches a hybrid transactional/analytical processing (HTAP) architecture?
Business applications store data in an operational data store, which is also used to support analytical queries for reporting.
Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis.
Business applications store operational data in an analytical data store that is optimized for queries to support reporting and analysis.
Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis.
You want to use Azure Synapse Analytics to analyze operational data stored in a Cosmos DB core (SQL) API container. Which Azure Synapse Link service should you use?
Azure Synapse Link for SQL
Azure Synapse Link for Dataverse
Azure Synapse Link for Cosmos DB
Azure Synapse Link for Cosmos DB
You plan to use Azure Synapse Link for Dataverse to analyze business data in your Azure Synapse Analytics workspace. Where is the replicated data from Dataverse stored?
In an Azure Synapse dedicated SQL pool
In an Azure Data Lake Gen2 storage container.
In an Azure Cosmos DB container.
In an Azure Data Lake Gen2 storage container.
You have an Azure Cosmos DB core (SQL) account and an Azure Synapse Analytics workspace. What must you do first to enable HTAP integration with Azure Synapse Analytics?
Configure global replication in Azure Cosmos DB.
Create a dedicated SQL pool in Azure Synapse Analytics.
Enable Azure Synapse Link in Azure Cosmos DB.
Enable Azure Synapse Link in Azure Cosmos DB.
You have an existing container in a Cosmos DB core (SQL) database. What must you do to enable analytical queries over Azure Synapse Link from Azure Synapse Analytics?
Delete and recreate the container.
Enable Azure Synapse Link in the container to create an analytical store.
Add an item to the container.
Enable Azure Synapse Link in the container to create an analytical store.
You plan to use a Spark pool in Azure Synapse Analytics to query an existing analytical store in Cosmos DB. What must you do?
Create a linked service for the Cosmos DB database where the analytical store enabled container is defined.
Disable automatic pausing for the Spark pool in Azure Synapse Analytics.
Install the Azure Cosmos DB SDK for Python package in the Spark pool.
Create a linked service for the Cosmos DB database where the analytical store enabled container is defined.
You’re writing PySpark code to load data from a Cosmos DB analytical store into a dataframe. What format should you specify?
cosmos.json
cosmos.olap
cosmos.sql
cosmos.olap
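A minimal PySpark sketch, assuming a hypothetical linked service and container name for the analytical store:

    df = (spark.read
          .format("cosmos.olap")
          .option("spark.synapse.linkedService", "CosmosDbLinkedService")
          .option("spark.cosmos.container", "orders")
          .load())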
You’re writing SQL code in a serverless SQL pool to query an analytical store in Cosmos DB. What function should you use?
OPENDATASET
ROW
OPENROWSET
OPENROWSET
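A minimal serverless SQL sketch (account, database, and container names are hypothetical, and the key is a placeholder): SELECT TOP 10 * FROM OPENROWSET('CosmosDB', 'Account=myaccount;Database=mydb;Key=<key>', Orders) AS rows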
From which of the following data sources can you use Azure Synapse Link for SQL to replicate data to Azure Synapse Analytics?
Azure Cosmos DB
SQL Server 2022
Azure SQL Managed Instance
SQL Server 2022
What must you create in your Azure Synapse Analytics workspace to implement Azure Synapse Link for Azure SQL Database?
A serverless SQL pool
A linked service for your Azure SQL Database
A link connection for your Azure SQL Database
A link connection for your Azure SQL Database
You plan to use Azure Synapse Link for SQL to replicate tables from SQL Server 2022 to Azure Synapse Analytics. What additional Azure resource must you create?
Azure Data Lake Storage Gen2
Azure Key Vault
Azure Application Insights
Azure Data Lake Storage Gen2
How many drivers does a Cluster have?
Only one
Two, running in parallel
Configurable between one and eight
Only one
Spark is a distributed computing environment. Therefore, work is parallelized across executors. At which two levels does this parallelization occur?
The Executor and the Slot
The Driver and the Executor
The Slot and the Task
The Executor and the Slot
What type of process are the driver and the executors?
Java processes
Python processes
C++ processes
Java processes
Which notebook format is used in Databricks?
DBC
.notebook
.spark
DBC
When creating a new cluster in the Azure Databricks workspace, what happens behind the scenes?
Azure Databricks provisions a dedicated VM that processes all jobs, based on your VM type and size selection.
Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.
When an Azure Databricks workspace is deployed, you are allocated a pool of VMs. Creating a cluster draws from this pool.
Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.
To parallelize work, the unit of distribution is a Spark Cluster. Every Cluster has a Driver and one or more executors. Work submitted to the Cluster is split into what type of object?
Stages
Arrays
Jobs
Jobs
How do you list files in DBFS within a notebook?
ls /my-file-path
%fs dir /my-file-path
%fs ls /my-file-path
%fs ls /my-file-path
How do you infer the data types and column names when you read a JSON file?
spark.read.option("inferSchema", "true").json(jsonFile)
spark.read.inferSchema("true").json(jsonFile)
spark.read.option("inferData", "true").json(jsonFile)
spark.read.option("inferSchema", "true").json(jsonFile)
Which DataFrame method do you use to create a temporary view?
createTempView()
createTempViewDF()
createOrReplaceTempView()
createOrReplaceTempView()
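A minimal PySpark sketch, assuming df is an existing dataframe and the view name is hypothetical:

    df.createOrReplaceTempView("sales_view")
    top_rows = spark.sql("SELECT * FROM sales_view LIMIT 10")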
How do you create a DataFrame object?
Introduce a variable name and equate it to something like myDataFrameDF =
Use the createDataFrame() function
Use the DF.create() syntax
Introduce a variable name and equate it to something like myDataFrameDF =
How do you cache data into the memory of the local executor for instant access?
.save().inMemory()
.inMemory().save()
.cache()
.cache()
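A minimal PySpark sketch; caching is lazy, so an action is needed to materialize it (df is a hypothetical dataframe):

    cached_df = df.cache()
    cached_df.count()  # the first action materializes the cached data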
What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS?
IPGeocodeDF = parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF = spark.parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")