Misc Flashcards
You can use an SSAS data source in an ADF Copy activity
False
The ADF Copy activity can invoke the PolyBase feature to load an Azure Synapse Analytics SQL pool
True
You can implement incremental load from Azure SQL Database by using change tracking combined with an ADF Copy activity
True
Which type of transactional database system would work best for product data?
OLTP
Suppose a retailer’s operations to update inventory and process payments run in the same transaction. A user applies a $30 store credit (covering the full amount) to an order from their laptop while submitting the exact same order with the same store credit from their phone, so two identical orders are received. The underlying database is ACID-compliant. What will happen?
One order will be processed and use the store credit, and the other order won’t be processed.
Which of the following describes a good strategy for creating storage accounts and blob containers for your application?
- Create both your Azure Storage accounts and containers before deploying your application.
- Create Azure Storage accounts in your application as needed. Create the containers before deploying the application.
- Create Azure Storage accounts before deploying your app. Create containers in your application as needed.
Create Azure Storage accounts before deploying your app. Create containers in your application as needed.
Which of the following can be used to initialize the Blob Storage client library within an application?
- An Azure username and password.
- The Azure Storage account connection string.
- A globally-unique identifier (GUID) that represents the application.
- The Azure Storage account datacenter and location identifiers.
The Azure Storage account connection string.
What happens when you obtain a BlobClient reference from BlobContainerClient with the name of a blob?
- A new block blob is created in storage.
- A BlobClient object is created locally. No network calls are made.
- An exception is thrown if the blob does not exist in storage.
- The contents of the named blob are downloaded.
A BlobClient object is created locally. No network calls are made.
Which is the default distribution used for a table in Synapse Analytics?
HASH.
Round-Robin.
Replicated Table.
Round-Robin.
Which Index Type offers the highest compression?
Columnstore.
Rowstore.
Heap.
Columnstore
How do column statistics improve query performance?
By keeping track of which columns are being queried.
By keeping track of how much data exists between ranges in columns.
By caching column values for queries.
By keeping track of how much data exists between ranges in columns.
In what language can the Azure Synapse Apache Spark to Synapse SQL connector be used?
Python.
SQL.
Scala.
Scala
When is it unnecessary to use import statements for transferring data between a dedicated SQL pool and an Apache Spark pool?
Use the integrated notebook experience from Azure Synapse Studio.
Use the PySpark connector.
Use token-based authentication.
Use the integrated notebook experience from Azure Synapse Studio.
Which language can be used to define Spark job definitions?
Transact-SQL
PowerShell
PySpark
PySpark
What Transact-SQL function verifies if a piece of text is valid JSON?
JSON_QUERY
JSON_VALUE
ISJSON
ISJSON
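For example (a minimal sketch with a hypothetical variable): SELECT ISJSON(@json) returns 1 when @json contains valid JSON and 0 when it does not.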
What Transact-SQL function uses the HyperLogLog algorithm to approximate a distinct count?
APPROX_COUNT_DISTINCT
COUNT_DISTINCT_APPROX
COUNT
APPROX_COUNT_DISTINCT
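For example (hypothetical table and column names): SELECT APPROX_COUNT_DISTINCT(CustomerKey) FROM dbo.FactSales returns an approximate distinct count via HyperLogLog, trading a small amount of accuracy for much lower memory use on large tables.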
Which ALTER DATABASE statement parameter allows a dedicated SQL pool to scale?
SCALE.
MODIFY
CHANGE.
MODIFY
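For example (hypothetical pool name and service objective): ALTER DATABASE myDedicatedPool MODIFY (SERVICE_OBJECTIVE = 'DW300c'); scales the dedicated SQL pool to the DW300c performance level.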
Which workload management feature influences the order in which a request gets access to resources?
Workload classification.
Workload importance.
Workload isolation.
Workload importance.
Which Dynamic Management View lets you view the active connections against a dedicated SQL pool?
sys.dm_pdw_exec_requests.
sys.dm_pdw_dms_workers.
DBCC PDW_SHOWEXECUTIONPLAN.
sys.dm_pdw_exec_requests.
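For example, a minimal check for requests that are still active: SELECT * FROM sys.dm_pdw_exec_requests WHERE status NOT IN ('Completed', 'Failed', 'Cancelled');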
What would be the best approach to investigate whether the data at hand is unevenly allocated across all distributions?
Grouping the data based on partitions and counting rows with a T-SQL query.
Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions.
Monitor query speeds by testing the same query for each partition.
Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions.
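For example (hypothetical table name): DBCC PDW_SHOWSPACEUSED('dbo.FactInternetSales'); reports row counts and space used per distribution, so uneven allocation across the 60 distributions is easy to spot.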
To achieve improved query performance, which one would be the best data type for storing data that contains fewer than 128 characters?
VARCHAR(MAX)
VARCHAR(128)
NVARCHAR(128)
VARCHAR(128)
Which of the following statements is a benefit of materialized views?
Reducing the execution time for complex queries with JOINs and aggregate functions.
Increased resiliency benefits.
Increased high availability.
Reducing the execution time for complex queries with JOINs and aggregate functions.
You want to configure a private endpoint. You open Azure Synapse Studio, go to the Manage hub, and see that the private endpoints option is greyed out. Why is the option unavailable?
Azure Synapse Studio does not support the creation of private endpoints.
A Conditional Access policy has to be defined first.
A managed virtual network has not been created.
A managed virtual network has not been created.
You need an Azure Synapse Analytics workspace to access an Azure Data Lake Store with the security provided by Azure Active Directory. What is the best authentication method to use?
Storage account keys.
Shared access signatures.
Managed identities.
Managed identities.
Which definition best describes Apache Spark?
A highly scalable relational database management system.
A virtual server with a Python runtime.
A distributed platform for parallel data processing using multiple languages.
A distributed platform for parallel data processing using multiple languages.
You need to use Spark to analyze data in a parquet file. What should you do?
Load the parquet file into a dataframe.
Import the data into a table in a serverless SQL pool.
Convert the data to CSV format.
Load the parquet file into a dataframe.
You want to write code in a notebook cell that uses a SQL query to retrieve data from a view in the Spark catalog. Which magic should you use?
%%spark
%%pyspark
%%sql
%%sql
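A minimal notebook-cell sketch (products_view is a hypothetical view in the Spark catalog):

    %%sql
    SELECT * FROM products_view LIMIT 10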
Which of the following descriptions best fits Delta Lake?
A Spark API for exporting data from a relational database into CSV files.
A relational storage layer for Spark that supports tables based on Parquet files.
A synchronization solution that replicates data between SQL pools and Spark pools.
A relational storage layer for Spark that supports tables based on Parquet files.
You’ve loaded a Spark dataframe with data that you now want to use in a Delta Lake table. What format should you use to write the dataframe to storage?
CSV
PARQUET
DELTA
DELTA
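A minimal PySpark sketch, assuming df is an existing dataframe and the target path is hypothetical:

    df.write.format("delta").mode("overwrite").save("/delta/products")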
What feature of Delta Lake enables you to retrieve data from previous versions of a table?
Spark Structured Streaming
Time Travel
Catalog Tables
Time Travel
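A minimal PySpark sketch of time travel, with a hypothetical path and version number:

    df_v0 = spark.read.format("delta").option("versionAsOf", 0).load("/delta/products")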
You have a managed catalog table that contains Delta Lake data. If you drop the table, what will happen?
The table metadata and data files will be deleted.
The table metadata will be removed from the catalog, but the data files will remain intact.
The table metadata will remain in the catalog, but the data files will be deleted.
The table metadata and data files will be deleted.
When using Spark Structured Streaming, a Delta Lake table can be which of the following?
Only a source
Only a sink
Either a source or a sink
Either a source or a sink
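A minimal PySpark sketch that reads one Delta table as a streaming source and writes to another as the sink (paths are hypothetical):

    stream_df = spark.readStream.format("delta").load("/delta/events")
    query = (stream_df.writeStream
             .format("delta")
             .option("checkpointLocation", "/delta/checkpoints/events_copy")
             .start("/delta/events_copy"))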
What is one of the possible ways to optimize an Apache Spark Job?
Remove all nodes.
Remove the Apache Spark Pool.
Use bucketing.
Use bucketing.
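A minimal PySpark sketch of bucketing, with hypothetical table and column names; pre-hashing rows into buckets on the join key reduces shuffling in later joins and aggregations:

    (df.write
       .bucketBy(8, "customer_id")
       .sortBy("customer_id")
       .saveAsTable("sales_bucketed"))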
What can cause a slower performance on join or shuffle jobs?
Data skew.
Enablement of autoscaling
Bucketing.
Data skew.
Which of the following descriptions matches a hybrid transactional/analytical processing (HTAP) architecture?
Business applications store data in an operational data store, which is also used to support analytical queries for reporting.
Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis.
Business applications store operational data in an analytical data store that is optimized for queries to support reporting and analysis.
Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis.
You want to use Azure Synapse Analytics to analyze operational data stored in a Cosmos DB core (SQL) API container. Which Azure Synapse Link service should you use?
Azure Synapse Link for SQL
Azure Synapse Link for Dataverse
Azure Synapse Link for Cosmos DB
Azure Synapse Link for Cosmos DB
You plan to use Azure Synapse Link for Dataverse to analyze business data in your Azure Synapse Analytics workspace. Where is the replicated data from Dataverse stored?
In an Azure Synapse dedicated SQL pool
In an Azure Data Lake Gen2 storage container.
In an Azure Cosmos DB container.
In an Azure Data Lake Gen2 storage container.
You have an Azure Cosmos DB core (SQL) account and an Azure Synapse Analytics workspace. What must you do first to enable HTAP integration with Azure Synapse Analytics?
Configure global replication in Azure Cosmos DB.
Create a dedicated SQL pool in Azure Synapse Analytics.
Enable Azure Synapse Link in Azure Cosmos DB.
Enable Azure Synapse Link in Azure Cosmos DB.
You have an existing container in a Cosmos DB core (SQL) database. What must you do to enable analytical queries over Azure Synapse Link from Azure Synapse Analytics?
Delete and recreate the container.
Enable Azure Synapse Link in the container to create an analytical store.
Add an item to the container.
Enable Azure Synapse Link in the container to create an analytical store.
You plan to use a Spark pool in Azure Synapse Analytics to query an existing analytical store in Cosmos DB. What must you do?
Create a linked service for the Cosmos DB database where the analytical store enabled container is defined.
Disable automatic pausing for the Spark pool in Azure Synapse Analytics.
Install the Azure Cosmos DB SDK for Python package in the Spark pool.
Create a linked service for the Cosmos DB database where the analytical store enabled container is defined.
You’re writing PySpark code to load data from a Cosmos DB analytical store into a dataframe. What format should you specify?
cosmos.json
cosmos.olap
cosmos.sql
cosmos.olap
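A minimal PySpark sketch, assuming a hypothetical linked service and container name for the analytical store:

    df = (spark.read
          .format("cosmos.olap")
          .option("spark.synapse.linkedService", "CosmosDbLinkedService")
          .option("spark.cosmos.container", "orders")
          .load())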
You’re writing SQL code in a serverless SQL pool to query an analytical store in Cosmos DB. What function should you use?
OPENDATASET
ROW
OPENROWSET
OPENROWSET
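A minimal serverless SQL sketch (account, database, and container names are hypothetical, and the key is a placeholder): SELECT TOP 10 * FROM OPENROWSET('CosmosDB', 'Account=myaccount;Database=mydb;Key=<key>', Orders) AS rows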
From which of the following data sources can you use Azure Synapse Link for SQL to replicate data to Azure Synapse Analytics?
Azure Cosmos DB
SQL Server 2022
Azure SQL Managed Instance
SQL Server 2022
What must you create in your Azure Synapse Analytics workspace to implement Azure Synapse Link for Azure SQL Database?
A serverless SQL pool
A linked service for your Azure SQL Database
A link connection for your Azure SQL Database
A link connection for your Azure SQL Database
You plan to use Azure Synapse Link for SQL to replicate tables from SQL Server 2022 to Azure Synapse Analytics. What additional Azure resource must you create?
Azure Data Lake Storage Gen2
Azure Key Vault
Azure Application Insights
Azure Data Lake Storage Gen2
How many drivers does a Cluster have?
Only one
Two, running in parallel
Configurable between one and eight
Only one
Spark is a distributed computing environment. Therefore, work is parallelized across executors. At which two levels does this parallelization occur?
The Executor and the Slot
The Driver and the Executor
The Slot and the Task
The Executor and the Slot
What type of process are the driver and the executors?
Java processes
Python processes
C++ processes
Java processes
Which notebook format is used in Databricks?
DBC
.notebook
.spark
DBC
When creating a new cluster in the Azure Databricks workspace, what happens behind the scenes?
Azure Databricks provisions a dedicated VM that processes all jobs, based on your VM type and size selection.
Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.
When an Azure Databricks workspace is deployed, you are allocated a pool of VMs. Creating a cluster draws from this pool.
Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.
To parallelize work, the unit of distribution is a Spark Cluster. Every Cluster has a Driver and one or more executors. Work submitted to the Cluster is split into what type of object?
Stages
Arrays
Jobs
Jobs
How do you list files in DBFS within a notebook?
ls /my-file-path
%fs dir /my-file-path
%fs ls /my-file-path
%fs ls /my-file-path
How do you infer the data types and column names when you read a JSON file?
spark.read.option("inferSchema", "true").json(jsonFile)
spark.read.inferSchema("true").json(jsonFile)
spark.read.option("inferData", "true").json(jsonFile)
spark.read.option("inferSchema", "true").json(jsonFile)
Which DataFrame method do you use to create a temporary view?
createTempView()
createTempViewDF()
createOrReplaceTempView()
createOrReplaceTempView()
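A minimal PySpark sketch, assuming df is an existing dataframe and the view name is hypothetical:

    df.createOrReplaceTempView("sales_view")
    top_rows = spark.sql("SELECT * FROM sales_view LIMIT 10")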
How do you create a DataFrame object?
Introduce a variable name and equate it to something like myDataFrameDF =
Use the createDataFrame() function
Use the DF.create() syntax
Introduce a variable name and equate it to something like myDataFrameDF =
How do you cache data into the memory of the local executor for instant access?
.save().inMemory()
.inMemory().save()
.cache()
.cache()
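A minimal PySpark sketch; caching is lazy, so an action is needed to materialize it (df is a hypothetical dataframe):

    cached_df = df.cache()
    cached_df.count()  # the first action materializes the cached data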
What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS?
IPGeocodeDF = parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF = spark.parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")