Misc Flashcards
You can use an SSAS data source in an ADF Copy activity
False
The ADF Copy activity can invoke the PolyBase feature to load an Azure Synapse Analytics SQL pool
True
You can implement an incremental load from an Azure SQL Database by using change tracking combined with an ADF Copy activity
True
Which type of transactional database system would work best for product data?
OLTP
Suppose a retailer's operations to update inventory and process payments are part of the same transaction. A user tries to apply a $30 store credit (covering the full amount) to an order from their laptop while submitting the exact same order, using the same store credit, from their phone, so two identical orders are received. The database behind the scenes is an ACID-compliant database. What will happen?
One order will be processed and use the in-store credit, and the other order won’t be processed.
Which of the following describes a good strategy for creating storage accounts and blob containers for your application?
- Create both your Azure Storage accounts and containers before deploying your application.
- Create Azure Storage accounts in your application as needed. Create the containers before deploying the application.
- Create Azure Storage accounts before deploying your app. Create containers in your application as needed.
Create Azure Storage accounts before deploying your app. Create containers in your application as needed.
Which of the following can be used to initialize the Blob Storage client library within an application?
- An Azure username and password.
- The Azure Storage account connection string.
- A globally-unique identifier (GUID) that represents the application.
- The Azure Storage account datacenter and location identifiers.
The Azure Storage account connection string.
What happens when you obtain a BlobClient reference from BlobContainerClient with the name of a blob?
- A new block blob is created in storage.
- A BlobClient object is created locally. No network calls are made.
- An exception is thrown if the blob does not exist in storage.
- The contents of the named blob are downloaded.
A BlobClient object is created locally. No network calls are made.
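For illustration, a minimal sketch with the azure-storage-blob Python SDK (the connection string, container, and blob names are placeholders); the clients are constructed locally, and a request is only sent when a data operation such as a download is called:

    from azure.storage.blob import BlobServiceClient

    # Build clients locally from a connection string; no network calls are made yet.
    service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    container = service.get_container_client("images")
    blob = container.get_blob_client("photo.jpg")  # still local, even if the blob does not exist

    # Only this line makes a network call (and raises if the blob is missing).
    data = blob.download_blob().readall()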
Which is the default distribution used for a table in Synapse Analytics?
HASH.
Round-Robin.
Replicated Table.
Round-Robin.
Which Index Type offers the highest compression?
Columnstore.
Rowstore.
Heap.
Columnstore
How do column statistics improve query performance?
By keeping track of which columns are being queried.
By keeping track of how much data exists between ranges in columns.
By caching column values for queries.
By keeping track of how much data exists between ranges in columns.
In what language can the Azure Synapse Apache Spark to Synapse SQL connector be used?
Python.
SQL.
Scala.
Scala
When is it unnecessary to use import statements for transferring data between a dedicated SQL pool and an Apache Spark pool?
Use the integrated notebook experience from Azure Synapse Studio.
Use the PySpark connector.
Use token-based authentication.
Use the integrated notebook experience from Azure Synapse Studio.
Which language can be used to define Spark job definitions?
Transact-SQL
PowerShell
PySpark
PySpark
What Transact-SQL function verifies if a piece of text is valid JSON?
JSON_QUERY
JSON_VALUE
ISJSON
ISJSON
What Transact-SQL function is used to perform a HyperLogLog function?
APPROX_COUNT_DISTINCT
COUNT_DISTINCT_APPROX
COUNT
APPROX_COUNT_DISTINCT
Which ALTER DATABASE statement parameter allows a dedicated SQL pool to scale?
SCALE.
MODIFY
CHANGE.
MODIFY
Which workload management feature influences the order in which a request gets access to resources?
Workload classification.
Workload importance.
Workload isolation.
Workload importance.
Which Dynamic Management View enables you to view the active connections against a dedicated SQL pool?
sys.dm_pdw_exec_requests.
sys.dm_pdw_dms_workers.
DBCC PDW_SHOWEXECUTIONPLAN.
sys.dm_pdw_exec_requests.
What would be the best approach to investigate if the data at hand is unevenly allocated across all distributions?
Grouping the data based on partitions and counting rows with a T-SQL query.
Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions.
Monitor query speeds by testing the same query for each partition.
Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions.
To achieve improved query performance, which would be the best data type for storing data that contains fewer than 128 characters?
VARCHAR(MAX)
VARCHAR(128)
NVARCHAR(128)
VARCHAR(128)
Which of the following statements is a benefit of materialized views?
Reducing the execution time for complex queries with JOINs and aggregate functions.
Increased resiliency benefits.
Increased high availability.
Reducing the execution time for complex queries with JOINs and aggregate functions.
You want to configure a private endpoint. You open up Azure Synapse Studio, go to the manage hub, and see that the private endpoints option is greyed out. Why is the option not available?
Azure Synapse Studio does not support the creation of private endpoints.
A Conditional Access policy has to be defined first.
A managed virtual network has not been created.
A managed virtual network has not been created.
You require an Azure Synapse Analytics workspace to access an Azure Data Lake Storage account while benefiting from the security provided by Azure Active Directory. What is the best authentication method to use?
Storage account keys.
Shared access signatures.
Managed identities.
Managed identities.
Which definition best describes Apache Spark?
A highly scalable relational database management system.
A virtual server with a Python runtime.
A distributed platform for parallel data processing using multiple languages.
A distributed platform for parallel data processing using multiple languages.
You need to use Spark to analyze data in a parquet file. What should you do?
Load the parquet file into a dataframe.
Import the data into a table in a serverless SQL pool.
Convert the data to CSV format.
Load the parquet file into a dataframe.
You want to write code in a notebook cell that uses a SQL query to retrieve data from a view in the Spark catalog. Which magic should you use?
%%spark
%%pyspark
%%sql
%%sql
Which of the following descriptions best fits Delta Lake?
A Spark API for exporting data from a relational database into CSV files.
A relational storage layer for Spark that supports tables based on Parquet files.
A synchronization solution that replicates data between SQL pools and Spark pools.
A relational storage layer for Spark that supports tables based on Parquet files.
You've loaded a Spark dataframe with data that you now want to use in a Delta Lake table. What format should you use to write the dataframe to storage?
CSV
PARQUET
DELTA
DELTA
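For example, a minimal PySpark sketch (the dataframe, path, and table name are placeholders):

    # Write the dataframe in Delta format; the target folder becomes a Delta Lake table.
    df.write.format("delta").mode("overwrite").save("/delta/sales")

    # Alternatively, register it as a catalog table instead of writing to a path.
    df.write.format("delta").saveAsTable("sales")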
What feature of Delta Lake enables you to retrieve data from previous versions of a table?
Spark Structured Streaming
Time Travel
Catalog Tables
Time Travel
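As a sketch, Time Travel in PySpark looks like this (the path, version number, and timestamp are placeholders):

    # Read an earlier version of the table by version number...
    old_df = spark.read.format("delta").option("versionAsOf", 0).load("/delta/sales")

    # ...or as of a point in time.
    old_df = spark.read.format("delta").option("timestampAsOf", "2024-01-01").load("/delta/sales")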
You have a managed catalog table that contains Delta Lake data. If you drop the table, what will happen?
The table metadata and data files will be deleted.
The table metadata will be removed from the catalog, but the data files will remain intact.
The table metadata will remain in the catalog, but the data files will be deleted.
The table metadata and data files will be deleted.
When using Spark Structured Streaming, a Delta Lake table can be which of the following?
Only a source
Only a sink
Either a source or a sink
Either a source or a sink
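For illustration, a minimal sketch using Delta tables on both sides of a Structured Streaming query (paths are placeholders):

    # Delta table as the streaming source.
    stream_df = spark.readStream.format("delta").load("/delta/source_table")

    # Delta table as the streaming sink; a checkpoint location is required.
    query = (stream_df.writeStream
             .format("delta")
             .option("checkpointLocation", "/delta/checkpoints/sink_table")
             .start("/delta/sink_table"))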
What is one of the possible ways to optimize an Apache Spark Job?
Remove all nodes.
Remove the Apache Spark Pool.
Use bucketing.
Use bucketing.
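Bucketing pre-partitions and pre-sorts data by a key so that later joins and aggregations on that key shuffle less. A minimal PySpark sketch (the bucket count, column, and table name are placeholders):

    # Write the data into 16 buckets on customer_id, stored as a managed table.
    (df.write
       .bucketBy(16, "customer_id")
       .sortBy("customer_id")
       .saveAsTable("sales_bucketed"))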
What can cause a slower performance on join or shuffle jobs?
Data skew.
Enablement of autoscaling
Bucketing.
Data skew.
Which of the following descriptions matches a hybrid transactional/analytical processing (HTAP) architecture?
Business applications store data in an operational data store, which is also used to support analytical queries for reporting.
Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis.
Business applications store operational data in an analytical data store that is optimized for queries to support reporting and analysis.
Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis.
You want to use Azure Synapse Analytics to analyze operational data stored in a Cosmos DB core (SQL) API container. Which Azure Synapse Link service should you use?
Azure Synapse Link for SQL
Azure Synapse Link for Dataverse
Azure Synapse Link for Cosmos DB
Azure Synapse Link for Cosmos DB
You plan to use Azure Synapse Link for Dataverse to analyze business data in your Azure Synapse Analytics workspace. Where is the replicated data from Dataverse stored?
In an Azure Synapse dedicated SQL pool
In an Azure Data Lake Gen2 storage container.
In an Azure Cosmos DB container.
In an Azure Data Lake Gen2 storage container.
You have an Azure Cosmos DB core (SQL) account and an Azure Synapse Analytics workspace. What must you do first to enable HTAP integration with Azure Synapse Analytics?
Configure global replication in Azure Cosmos DB.
Create a dedicated SQL pool in Azure Synapse Analytics.
Enable Azure Synapse Link in Azure Cosmos DB.
Enable Azure Synapse Link in Azure Cosmos DB.
You have an existing container in a Cosmos DB core (SQL) database. What must you do to enable analytical queries over Azure Synapse Link from Azure Synapse Analytics?
Delete and recreate the container.
Enable Azure Synapse Link in the container to create an analytical store.
Add an item to the container.
Enable Azure Synapse Link in the container to create an analytical store.
You plan to use a Spark pool in Azure Synapse Analytics to query an existing analytical store in Cosmos DB. What must you do?
Create a linked service for the Cosmos DB database where the analytical store enabled container is defined.
Disable automatic pausing for the Spark pool in Azure Synapse Analytics.
Install the Azure Cosmos DB SDK for Python package in the Spark pool.
Create a linked service for the Cosmos DB database where the analytical store enabled container is defined.
You’re writing PySpark code to load data from a Cosmos DB analytical store into a dataframe. What format should you specify?
cosmos.json
cosmos.olap
cosmos.sql
cosmos.olap
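A minimal PySpark sketch, assuming a linked service named CosmosDbLinked and a container named Orders (both placeholders):

    df = (spark.read
          .format("cosmos.olap")
          .option("spark.synapse.linkedService", "CosmosDbLinked")
          .option("spark.cosmos.container", "Orders")
          .load())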
You're writing SQL code in a serverless SQL pool to query an analytical store in Cosmos DB. What function should you use?
OPENDATASET
ROW
OPENROWSET
OPENROWSET
From which of the following data sources can you use Azure Synapse Link for SQL to replicate data to Azure Synapse Analytics?
Azure Cosmos DB
SQL Server 2022
Azure SQL Managed Instance
SQL Server 2022
What must you create in your Azure Synapse Analytics workspace to implement Azure Synapse Link for Azure SQL Database?
A serverless SQL pool
A linked service for your Azure SQL Database
A link connection for your Azure SQL Database
A link connection for your Azure SQL Database
You plan to use Azure Synapse Link for SQL to replicate tables from SQL Server 2022 to Azure Synapse Analytics. What additional Azure resource must you create?
Azure Data Lake Storage Gen2
Azure Key Vault
Azure Application Insights
Azure Data Lake Storage Gen2
How many drivers does a Cluster have?
Only one
Two, running in parallel
Configurable between one and eight
Only one
Spark is a distributed computing environment. Therefore, work is parallelized across executors. At which two levels does this parallelization occur?
The Executor and the Slot
The Driver and the Executor
The Slot and the Task
The Executor and the Slot
What type of process are the driver and the executors?
Java processes
Python processes
C++ processes
Java processes
Which notebook format is used in Databricks?
DBC
.notebook
.spark
DBC
When creating a new cluster in the Azure Databricks workspace, what happens behind the scenes?
Azure Databricks provisions a dedicated VM that processes all jobs, based on your VM type and size selection.
Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.
When an Azure Databricks workspace is deployed, you are allocated a pool of VMs. Creating a cluster draws from this pool.
Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.
To parallelize work, the unit of distribution is a Spark Cluster. Every Cluster has a Driver and one or more executors. Work submitted to the Cluster is split into what type of object?
Stages
Arrays
Jobs
Jobs
How do you list files in DBFS within a notebook?
ls /my-file-path
%fs dir /my-file-path
%fs ls /my-file-path
%fs ls /my-file-path
How do you infer the data types and column names when you read a JSON file?
spark.read.option("inferSchema", "true").json(jsonFile)
spark.read.inferSchema("true").json(jsonFile)
spark.read.option("inferData", "true").json(jsonFile)
spark.read.option("inferSchema", "true").json(jsonFile)
Which DataFrame method do you use to create a temporary view?
createTempView()
createTempViewDF()
createOrReplaceTempView()
createOrReplaceTempView()
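For example (the dataframe, view name, and column are placeholders):

    # Register the dataframe as a temporary view, then query it with SQL.
    df.createOrReplaceTempView("sales")
    top_sales = spark.sql("SELECT * FROM sales ORDER BY amount DESC LIMIT 10")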
How do you create a DataFrame object?
Introduce a variable name and equate it to something like myDataFrameDF =
Use the createDataFrame() function
Use the DF.create() syntax
Introduce a variable name and equate it to something like myDataFrameDF =
How do you cache data into the memory of the local executor for instant access?
.save().inMemory()
.inMemory().save()
.cache()
.cache()
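Note that .cache() is lazy: the data is only materialized in executor memory when an action runs. A minimal sketch (df is a placeholder dataframe):

    df.cache()      # mark the dataframe for caching
    df.count()      # the first action populates the cache
    df.unpersist()  # release the cached data when it is no longer needed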
What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS?
IPGeocodeDF = parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF = spark.parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")
Which of the following statements describes a wide transformation?
A wide transformation can be applied per partition/worker with no need to share or shuffle data to other workers
A wide transformation requires sharing data across workers. It does so by shuffling data.
A wide transformation applies data transformation over a large number of columns
A wide transformation requires sharing data across workers. It does so by shuffling data.
Which feature of Spark determines how your code is executed?
Catalyst Optimizer
Tungsten Record Format
Java Garbage Collection
Catalyst Optimizer
You create a DataFrame that reads some data from Azure Blob Storage, and then you create another DataFrame by filtering the initial DataFrame. What feature of Spark causes these transformations to be analyzed?
Tungsten Record Format
Java Garbage Collection
Lazy Execution
Lazy Execution
Which command orders by a column in descending order?
df.orderBy("requests desc")
df.orderBy("requests").desc()
df.orderBy(col("requests").desc())
df.orderBy(col("requests").desc())
Which command specifies a column value in a DataFrame’s filter? Specifically, filter by a productType column where the value is equal to book?
df.filter(col("productType") == "book")
df.filter("productType = 'book'")
df.col("productType").filter("book")
df.filter(col("productType") == "book")
When using the Column Class, which command filters based on the end of a column value? For example, a column named verb and filtered by words ending with “ing”.
df.filter().col("verb").like("%ing")
df.filter("verb like '%ing'")
df.filter(col("verb").endswith("ing"))
df.filter(col("verb").endswith("ing"))
Which method for renaming a DataFrame’s column is incorrect?
df.select(col("timestamp").alias("dateCaptured"))
df.alias("timestamp", "dateCaptured")
df.toDF("dateCaptured")
df.alias("timestamp", "dateCaptured")
You need to find the average of sales transactions by storefront. Which of the following aggregates would you use?
df.select(col("storefront")).avg("completedTransactions")
df.groupBy(col("storefront")).avg(col("completedTransactions"))
df.groupBy(col("storefront")).avg("completedTransactions")
df.groupBy(col("storefront")).avg("completedTransactions")
Which statement about the Azure Databricks Data Plane is true?
The Data Plane contains the Cluster Manager and coordinates data processing jobs
The Data Plane is hosted within a Microsoft-managed subscription
The Data Plane is hosted within the client subscription and is where all data is processed and stored
The Data Plane is hosted within the client subscription and is where all data is processed and stored
In which modes does Azure Databricks provide data encryption?
At-rest and in-transit
At-rest only
In-transit only
At-rest and in-transit
What does Azure Data Lake Storage (ADLS) Passthrough enable?
Automatically mounting ADLS accounts to the workspace that are added to the managed resource group
User security groups that are added to ADLS are automatically created in the workspace as Databricks groups
Commands running on a configured cluster can read and write data in ADLS without configuring service principal credentials
Commands running on a configured cluster can read and write data in ADLS without configuring service principal credentials
What is an Azure Key Vault-backed secret scope?
It is the Key Vault Access Key used to securely connect to the vault and retrieve secrets
A Databricks secret scope that is backed by Azure Key Vault instead of Databricks
It is a method by which you create a secure connection to Azure Key Vault from a notebook and directly access its secrets within the Spark session
A Databricks secret scope that is backed by Azure Key Vault instead of Databricks
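Once the Key Vault-backed scope exists, secrets are read the same way as with a Databricks-backed scope, for example (the scope and key names are placeholders):

    # Retrieve a secret stored in Azure Key Vault through the backed scope.
    password = dbutils.secrets.get(scope="keyvault-scope", key="sql-password")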
What is the Databricks Delta command to display metadata?
MSCK DETAIL tablename
DESCRIBE DETAIL tableName
SHOW SCHEMA tablename
DESCRIBE DETAIL tableName
How do you perform UPSERT in a Delta dataset?
Use UPSERT INTO my-table
Use UPSERT INTO my-table /MERGE
Use MERGE INTO my-table USING data-to-upsert
Use MERGE INTO my-table USING data-to-upsert
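A minimal sketch of an upsert against a Delta table using Spark SQL (table and column names are placeholders):

    spark.sql("""
        MERGE INTO customers AS target
        USING customer_updates AS source
        ON target.customer_id = source.customer_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)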
What optimization does the following command perform: OPTIMIZE Students ZORDER BY Grade?
Creates an order-based index on the Grade field to improve filters against that field
Ensures that all data backing, for example, Grade=8 is colocated, then updates a graph that routes requests to the appropriate files
Ensures that all data backing, for example, Grade=8 is colocated, then rewrites the sorted data into new Parquet files
Ensures that all data backing, for example, Grade=8 is colocated, then rewrites the sorted data into new Parquet files
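For example, run from a notebook on a Databricks cluster (the table and column names follow the question):

    # Compact small files and co-locate rows with similar Grade values.
    spark.sql("OPTIMIZE Students ZORDER BY (Grade)")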
What size does OPTIMIZE compact small files to?
Around 100 MB
Around 1 GB
Around 500 MB
Around 1 GB
When doing a write stream command, what does the outputMode(“append”) option do?
The append mode allows records to be updated and changed in place
The append outputMode allows records to be added to the output sink
The append mode replaces existing records and updates aggregates
The append outputMode allows records to be added to the output sink
In Spark Structured Streaming, what method should be used to read streaming data into a DataFrame?
spark.readStream
spark.read
spark.stream.read
spark.readStream
What happens if the command option(“checkpointLocation”, pointer-to-checkpoint directory) is not specified?
It will not be possible to create more than one streaming query that uses the same streaming source since they will conflict
The streaming job will function as expected since the checkpointLocation option does not exist
When the streaming job stops, all state around the streaming job is lost, and upon restart, the job must start from scratch
When the streaming job stops, all state around the streaming job is lost, and upon restart, the job must start from scratch
What is a lambda architecture and what does it try to solve?
An architecture that defines a data processing pipeline whereby microservices act as compute resources for efficient large-scale data processing
An architecture that splits incoming data into two paths - a batch path and a streaming path. This architecture helps address the need to provide real-time processing in addition to slower batch computations.
An architecture that employs the latest Scala runtimes in one or more Databricks clusters to provide the most efficient data processing platform available today
An architecture that splits incoming data into two paths - a batch path and a streaming path. This architecture helps address the need to provide real-time processing in addition to slower batch computations.
What command should be issued to view the list of active streams?
Invoke spark.streams.active
Invoke spark.streams.show
Invoke spark.view.active
Invoke spark.streams.active
What is required to specify the location of a checkpoint directory when defining a Delta Lake streaming query?
.writeStream.format("delta").checkpoint("location", checkpointPath) …
.writeStream.format("delta").option("checkpointLocation", checkpointPath) …
.writeStream.format("parquet").option("checkpointLocation", checkpointPath) …
.writeStream.format("delta").option("checkpointLocation", checkpointPath) …
What’s the purpose of linked services in Azure Data Factory?
To represent a data store or a compute resource that can host execution of an activity
To represent a processing step in a pipeline
To link data stores or compute resources together for the movement of data between resources
To represent a data store or a compute resource that can host execution of an activity
How can parameters be passed into an Azure Databricks notebook from Azure Data Factory?
Use the new API endpoint option on a notebook in Databricks and provide the parameter name
Use notebook widgets to define parameters that can be passed into the notebook
Deploy the notebook as a web service in Databricks, defining parameter names and types
Use notebook widgets to define parameters that can be passed into the notebook
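For illustration, a minimal sketch of the notebook side (the widget name and default are placeholders); Azure Data Factory supplies the value through the Notebook activity's base parameters:

    # Define a widget so the notebook can accept a parameter from Azure Data Factory.
    dbutils.widgets.text("input_path", "/mnt/data/default")
    input_path = dbutils.widgets.get("input_path")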
What happens to Databricks activities (notebook, JAR, Python) in Azure Data Factory if the target cluster in Azure Databricks isn’t running when the cluster is called by Data Factory?
If the target cluster is stopped, Databricks will start the cluster before attempting to execute
The Databricks activity will fail in Azure Data Factory – you must always have the cluster running
Simply add a Databricks cluster start activity before the notebook, JAR, or Python Databricks activity
If the target cluster is stopped, Databricks will start the cluster before attempting to execute
What does the CD in CI/CD mean?
Continuous Delivery
Continuous Deployment
Both are correct
Both are correct
What sort of pipeline is required in Azure DevOps for creating artifacts used in releases?
An Artifact pipeline
A Build pipeline
A Release pipeline
A Build pipeline
What steps are required to authorize Azure DevOps to connect to and deploy notebooks to a staging or production Azure Databricks workspace?
Create an Azure Active Directory application, copy the application ID, then use that as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline
In the production or staging Azure Databricks workspace, enable Git integration to Azure DevOps, then link to the Azure DevOps source code repo
Create a new Access Token within the user settings in the production Azure Databricks workspace, then use the token as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline
Create a new Access Token within the user settings in the production Azure Databricks workspace, then use the token as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline
What are the two prerequisites for connecting Azure Databricks with Azure Synapse Analytics that apply to the Azure Synapse Analytics instance?
Create a database master key and configure the firewall to enable Azure services to connect
Use a correctly formatted ConnectionString and create a database master key
Add the client IP address to the firewall’s allowed IP addresses list and use the correctly formatted ConnectionString
Create a database master key and configure the firewall to enable Azure services to connect
Which is the correct syntax for overwriting data in Azure Synapse Analytics from a Databricks notebook?
df.write.mode("overwrite").option("…").option("…").save()
df.write.format("com.databricks.spark.sqldw").overwrite().option("…").option("…").save()
df.write.format("com.databricks.spark.sqldw").mode("overwrite").option("…").option("…").save()
df.write.format("com.databricks.spark.sqldw").mode("overwrite").option("…").option("…").save()
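A fuller sketch of the same write, with placeholder values for the JDBC URL, staging location, and target table (the option names follow the Azure Synapse connector for Databricks):

    (df.write
       .format("com.databricks.spark.sqldw")
       .mode("overwrite")
       .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
       .option("forwardSparkAzureStorageCredentials", "true")
       .option("dbTable", "dbo.SalesFact")
       .option("tempDir", "wasbs://<container>@<account>.blob.core.windows.net/tempdir")
       .save())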
What is SCIM?
An optimization that removes orphaned data from a given dataset
An open standard that enables users to bring their own auth key to the Databricks environment
An open standard that enables organizations to import both groups and users from Azure Active Directory into Azure Databricks
An open standard that enables organizations to import both groups and users from Azure Active Directory into Azure Databricks
If mounting an Azure Data Lake Storage (ADLS) account to a workspace, what cluster feature must be used to have ACLs within ADLS applied to the user executing commands in a notebook?
Enable ADLS Passthrough on a cluster
Enable SCIM
Set spark.config.adls.impersonateuser(true)
Enable ADLS Passthrough on a cluster
Mike is creating an Azure Data Lake Storage Gen2 account. He must configure this account to be able to process analytical data workloads for best performance. Which option should he configure when creating the storage account?
On the Basic tab, set the Performance option to Standard.
On the Basic Tab, set the Performance option to ON.
On the Advanced tab, set the Hierarchical Namespace to Enabled.
On the Advanced tab, set the Hierarchical Namespace to Enabled.
In which phase of big data processing is Azure Data Lake Storage located?
Ingestion
Store
Model & Serve
Store
You are working on a project with a 3rd party vendor to build a website for a customer. The image assets that will be used on the website are stored in an Azure Storage account that is held in your subscription. You want to give read access to this data for a limited period of time. What security option would be the best option to use?
CORS Support
Storage Account
Shared Access Signatures
Shared Access Signatures
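For illustration, a minimal sketch that issues a read-only, time-limited SAS with the azure-storage-blob Python SDK (the account, key, container, and blob names are placeholders):

    from datetime import datetime, timedelta, timezone
    from azure.storage.blob import generate_blob_sas, BlobSasPermissions

    # Grant read-only access to a single blob for 7 days.
    sas_token = generate_blob_sas(
        account_name="mystorageaccount",
        container_name="website-images",
        blob_name="logo.png",
        account_key="<account-key>",
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(days=7),
    )
    url = f"https://mystorageaccount.blob.core.windows.net/website-images/logo.png?{sas_token}"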
When configuring network access to your Azure Storage Account, what is the default network rule?
To allow all connections from all networks
To allow all connection from a private IP address range
To deny all connections from all networks
To allow all connections from all networks
Which Azure service detects anomalies in account activities and notifies you of potential harmful attempts to access your account?
Azure Defender for Storage
Azure Storage Account Security Feature
Encryption in transit
Azure Defender for Storage
Which of the following technologies typically provide an ingestion point for data streaming in an event processing solution that uses static data as a source?
Azure IoT Hub
Azure Blob storage
Azure Event Hubs
Azure Blob storage
To consume processed event streaming data in near-real-time to produce dashboards containing rich visualizations, which of the following services should you use?
Azure Cosmos DB
Event Hubs
Power BI
Power BI
Applications that publish messages to Azure Event Hubs very frequently will get the best performance by using Advanced Message Queuing Protocol (AMQP) because it establishes a persistent socket.
True
False
True
By default, how many partitions will a new Event Hub have?
1
2
3
4
4
What is the maximum size for a single publication (individual or batch) that is allowed by Azure Event Hub?
256 KB
512 KB
1 MB
2 MB
1 MB
Which of the definitions below best describes a Tumbling window?
A windowing function that clusters together events that arrive at similar times, filtering out periods of time in which there is no data.
A windowing function that segments a data stream into a contiguous series of fixed-size, non-overlapping time segments and operates against them. Events cannot belong to more than one tumbling window.
A windowing function that groups events by identical timestamp values.
A windowing function that segments a data stream into a contiguous series of fixed-size, non-overlapping time segments and operates against them. Events cannot belong to more than one tumbling window.
Which of the following services is an invalid input for an Azure Stream Analytics job?
Blob storage
Azure Cosmos DB
Azure Event Hubs
Azure Cosmos DB
Below is a list of key benefits of using Azure Stream Analytics to process streaming data. Which of the following statements is incorrect?
The ability to write and test transformation queries in the Azure portal
Being able to rapidly deploy queries into production by creating and starting an Azure Stream Analytics job
Integration with Azure Blob storage
Integration with Azure Blob storage
Which technology is typically used as a staging area in a modern data warehousing architecture?
Azure Data Lake.
Azure Synapse SQL Pools.
Azure Synapse Spark Pools.
Azure Data Lake.
Which component enables you to perform code-free transformations in Azure Synapse Analytics?
Studio.
Copy activity.
Mapping Data Flow.
Mapping Data Flow.
Which transformation in the Mapping Data Flow is used to route data rows to different streams based on matching conditions?
Lookup.
Conditional Split.
Select.
Conditional Split.
Which transformation is used to load data into a data store or compute resource?
Window.
Source.
Sink.
Sink.
In which of the following table types should an insurance company store details of customer attributes by which claims will be aggregated?
Staging table
Dimension table
Fact table
Dimension table
You create a dimension table for product data, assigning a unique numeric key for each row in a column named ProductKey. The ProductKey is only defined in the data warehouse. What kind of key is ProductKey?
A surrogate key
An alternate key
A business key
A surrogate key
What distribution option would be best for a sales fact table that will contain billions of records?
HASH
ROUND_ROBIN
REPLICATE
HASH
You need to write a query to return the total of the UnitsProduced numeric measure in the FactProduction table aggregated by the ProductName attribute in the FactProduct table. Both tables include a ProductKey surrogate key field. What should you do?
Use two SELECT queries with a UNION ALL clause to combine the rows in the FactProduction table with those in the FactProduct table.
Use a SELECT query against the FactProduction table with a WHERE clause to filter out rows with a ProductKey that doesn’t exist in the FactProduct table.
Use a SELECT query with a SUM function to total the UnitsProduced metric, using a JOIN on the ProductKey surrogate key to match the FactProduction records to the FactProduct records and a GROUP BY clause to aggregate by ProductName.
Use a SELECT query with a SUM function to total the UnitsProduced metric, using a JOIN on the ProductKey surrogate key to match the FactProduction records to the FactProduct records and a GROUP BY clause to aggregate by ProductName.
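For illustration, the described query as a sketch, submitted here with pyodbc (the connection string is a placeholder; table and column names come from the question):

    import pyodbc

    conn = pyodbc.connect("<dedicated-sql-pool-connection-string>")
    sql = """
        SELECT p.ProductName, SUM(f.UnitsProduced) AS TotalUnitsProduced
        FROM FactProduction AS f
        JOIN FactProduct AS p ON f.ProductKey = p.ProductKey
        GROUP BY p.ProductName;
    """
    rows = conn.cursor().execute(sql).fetchall()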
You use the RANK function in a query to rank customers in order of the number of purchases they have made. Five customers have made the same number of purchases and are all ranked equally as 1. What rank will the customer with the next highest number of purchases be assigned?
2
6
1
6
You need to compare approximate production volumes by product while optimizing query response time. Which function should you use?
COUNT
NTILE
APPROX_COUNT_DISTINCT
APPROX_COUNT_DISTINCT
How does splitting source files help maintain good performance when loading into Synapse Analytics?
Optimized processing of smaller file sizes.
Compute node to storage segment alignment.
Reduced possibility of data corruption.
Compute node to storage segment alignment.
Which Workload Management capability manages minimum and maximum resource allocations during peak periods?
Workload Isolation.
Workload Importance.
Workload Containment.
Workload Isolation.
Which T-SQL Statement loads data directly from Azure Storage?
LOAD DATA.
COPY.
INSERT FROM FILE.
COPY.
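A minimal sketch of the COPY statement, shown as T-SQL submitted with pyodbc (the storage URL, target table, and connection string are placeholders):

    import pyodbc

    copy_sql = """
        COPY INTO dbo.StageSales
        FROM 'https://<account>.blob.core.windows.net/<container>/sales/'
        WITH (
            FILE_TYPE = 'CSV',
            FIRSTROW = 2,
            CREDENTIAL = (IDENTITY = 'Managed Identity')
        );
    """
    conn = pyodbc.connect("<dedicated-sql-pool-connection-string>", autocommit=True)
    conn.cursor().execute(copy_sql)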
Which Azure Data Factory component orchestrates a transformation job or runs a data movement command?
Linked Services
Datasets
Activities
Activities
You are moving data from an Azure Data Lake Gen2 store to Azure Synapse Analytics. Which Azure Data Factory integration runtime would be used in a data copy activity?
Azure-SSIS
Azure
Self-hosted
Azure
In Azure Data Factory authoring tool, where would you find the Copy data activity?
Move & Transform
Batch Service
Databricks
Move & Transform
You want to ingest data from a SQL Server database hosted on an on-premises Windows Server. What integration runtime is required for Azure Data Factory to ingest data from the on-premises server?
Azure-SSIS Integration Runtime
Self-Hosted Integration Runtime
Azure Integration Runtime
Self-Hosted Integration Runtime
By default, how long are the Azure Data Factory diagnostic logs retained for?
15 days
30 days
45 days
45 days
Which SCD type would you use to keep history of changes in dimension members by adding a new row to the table for each change?
Type 1 SCD.
Type 2 SCD.
Type 3 SCD.
Type 2 SCD.
Which SCD type would you use to update the dimension members without keeping track of history?
Type 1 SCD.
Type 2 SCD.
Type 3 SCD.
Type 1 SCD.
What is a supported connector for built-in parameterization?
Azure Data Lake Storage Gen2
Azure Synapse Analytics
Azure Key Vault
Azure Synapse Analytics
What is an example of a branching activity used in control flows?
The If-condition
Until-condition
Lookup-condition
The If-condition
In which version of SQL Server was SSIS Projects introduced?
SQL Server 2008.
SQL Server 2012.
SQL Server 2016.
SQL Server 2012.
Which tool is used to perform an assessment of migrating SSIS packages to Azure SQL Database services?
Data Migration Assistant.
Data Migration Assessment.
Data Migration Service.
Data Migration Assistant.
Which tool is used to create and deploy SQL Server Integration Packages on an Azure-SSIS integration runtime, or for on-premises SQL Server?
SQL Server Data Tools.
SQL Server Management Studio.
dtexec.
SQL Server Data Tools.
Which version control software does Azure Data Factory integrate with?
Team Foundation Server.
Source Safe.
Git repositories.
Git repositories.
Which feature is used to merge the changes of your Azure Data Factory work from a custom branch back into the main branch of a Git repository?
Repo.
Pull request.
Commit.
Pull request.
Which feature in alerts can be used to determine how an alert is fired?
Add rule.
Add severity.
Add criteria.
Add criteria.
Suppose you have two video files stored as blobs. One of the videos is business-critical and requires a replication policy that creates multiple copies across geographically diverse datacenters. The other video is non-critical, and a local replication policy is sufficient. Which of the following options would satisfy both the data diversity and cost sensitivity considerations?
Create a single storage account that makes use of locally redundant storage (LRS) and host both videos from here.
Create a single storage account that makes use of geo-redundant storage (GRS) and host both videos from here.
Create two storage accounts. The first account makes use of geo-redundant storage (GRS) and hosts the business-critical video content. The second account makes use of locally redundant storage (LRS) and hosts the non-critical video content.
Create two storage accounts. The first account makes use of geo-redundant storage (GRS) and hosts the business-critical video content. The second account makes use of locally redundant storage (LRS) and hosts the non-critical video content.
The name of a storage account must be:
Unique within the containing resource group.
Unique within your Azure subscription.
Globally unique.
Globally unique.
In a typical project, when would you create your storage account(s)?
At the beginning, during project setup.
After deployment, when the project is running.
At the end, during resource cleanup.
At the beginning, during project setup.
How many access keys are provided for accessing your Azure storage account?
1
2
3
4
2
You can use either the REST API or the Azure client library to programmatically access a storage account. What is the primary advantage of using the client library?
Cost
Availability
Localization
Convenience
Convenience
Which of the following is a good analogy for the access keys of a storage account?
IP Address
REST Endpoint
Username and password
Cryptographic algorithm
Username and password