DP-200 - Monitor and optimise data solution Flashcards
A company has an Azure SQL database. The database contains tables that have masked columns. The company wants to identify when a user tries to attempt to access the data from any one of the masked columns. Which of the following would you use for this requirement?
A. Azure Advanced Threat Protection
B. Auditing
C. Transparent Data Encryption
D. Azure Monitor Audit Logs
B - Auditing
In Azure Synapse Analytics what is workload management?
Workload management provides the capability to prioritise the query workloads that take place on the server. It is managed by three related areas:
Workload groups
Workload classification
Workload importance
In Azure Synapse Analytics workload management what are workload groups?
Workload groups enable you to define the resources to isolate and reserve resources for its use.
In In Azure Synapse Analytics workload management what is workload classification?
Using T-SQL, you can create a workload classifier to map queries to a specific classifier. A classifier can define the level of importance of the request, so that it can be mapped to a specific workload group that has an allocation of specific resources during the execution of the query.
In In Azure Synapse Analytics workload management what is workload classification?
Workload importance is defined in the CREATE WORKLOAD CLASSIFIER command, and enables higher priority queries to receive resources ahead of lower priority queries that are in the queue. By default, queries are released from the queue on a first-in, first-out basis as resources become available, but workload importance overrides this qualifier.
What are the benefits of workload groups?
Reserves resources for a group of requests.
Limits the amount of resources a group of requests can consume.
Accesses shared resources based on importance level.
Sets query timeout value. Gets DBAs out of the business of terminating runaway queries.
In In Azure Synapse Analytics workload management what is workload importance?
Workload importance is defined in the CREATE WORKLOAD CLASSIFIER command, and enables higher priority queries to receive resources ahead of lower priority queries that are in the queue. By default, queries are released from the queue on a first-in, first-out basis as resources become available, but workload importance overrides this qualifier.
What is the benefit of result-set cache?
In scenarios where the same results are requested on a regular basis, result-set caching can improve the performance of the queries that retrieve these results. When result-set caching is enabled, the results of the query are cached in the SQL pool storage.
What is a materialised view?
A materialized view can pre-compute, store, and maintain data like a table. These views are automatically updated when data in underlying tables are changed.
What is descriptive analytics?
Descriptive analytics answers the question “What is happening in my business?” The data to answer this question is typically found through the creation of a data warehouse.
What is diagnostic analytics?
Diagnostic analytics deals with answering the question “Why is it happening?” This type of analytics may involve exploring information that already exists in a data warehouse, but typically involves a wider search of your data estate to find more data to support this type of analysis.
What is predictive analytics?
Azure Synapse Analytics also enables you to answer the question “What is likely to happen in the future based on previous trends and patterns?” by using its integrated Apache Spark engine. This can also be used with other services, such as Azure Machine Learning Services, or Azure Databricks.
What is prescriptive analytics?
This type of analytics looks at executing actions based on real-time or near real-time analysis of data, using predictive analytics. Azure Synapse Analytics provides this capability through both Apache Spark, Azure Synapse Link, and by integrating streaming technologies such as Azure Stream Analytics.
What is the best way to increase the performance of Azure Synapse Analytics?
Change the service level which alters the number of Data Warehouse Units (DWUs) that are allocated to the system.
What are the three sharding patterns used within Azure Synapse Analytics?
Hash
Round Robin
Replicate
In relation to Azure Synapse Analytics; what is hash (sharding)?
A hash-distributed table can deliver the highest query performance for joins and aggregations on large tables.
To shard data into a hash-distributed table, Azure Synapse Analytics uses a hash function to assign each row to one distribution deterministically. In the table definition, one of the columns is designated as the distribution column. The hash function uses the values in the distribution column to assign each row to a distribution.
In relation to Azure Synapse Analytics; what is round robin (sharding)?
Distributes evenly across all nodes. Great for temporary/staging tables. Data with no obvious joining key or good candidate column. BUT performance is slow due to data movement.
In relation to Azure Synapse Analytics; what is replicated (sharding)?
A replicated table provides the fastest query performance for small tables. Small-dimension tables in a star schema with less than 2GB of storage after compression (~5x compression).
A table that is replicated caches a full copy on each compute node. Consequently, replicating a table removes the need to transfer data among compute nodes before a join or aggregation. Extra storage is required, and there are additional overheads that are incurred when writing data which make large tables impractical.
Which of the following terms refers to the compute scale that’s used in a data warehouse in Azure Synapse Analytics?
RTU
DWU
DTU
DWU - Data warehouse unit
You have an Azure Synapse Analytics database. Within the database, you have a dimension table named Stores that contains store information in three columns: StoreID as an integer, StoreName as varchar(200), and Category as as varchar(40). You have a total of 263 stores nationwide. Store information is retrieved in more than half of the queries that are issued against this database. These queries include staff information per store, sales information per store, and finance information. You want to improve the query performance of these queries by configuring the table geometry of the Stores table. Which is the appropriate table geometry to select for the Stores table?
- Round-robin
- Non-clustered
- Replicated table
A replicated table is an appropriate table geometry choice because the size of the data in the table is less than 2 GB compressed. The table will be replicated to every distribution node of a data warehouse in Azure Synapse Analytics to improve performance.
What is a fan-out query? (Cosmos DB)
Fan-out queries check all partitions, which will cost you extra RU/s and may affect the performance of your application. Caused by queries without a partition key.
In CosmosDB, what problems can be caused with unevenly distributed data?
The total RU/s for a collection is divided amongst all partitions. This means that 1000 RU/s will be distributed across five partitions as 200 RU/s for each partition. If you try to write more than 200 RU/s to any of these partitions, calls will begin to fail as you cross the threshold.
A partition key can currently have 10 GB of data at maximum (this may change in future), making it important to use a partition key which fills all partitions efficiently.
What are the steps needed to enable Transparent Data Encryption for an Azure SQL database?
- Create a master key
- Create or obtain a certificate protected by the master key
- Create a database encryption key and protect it by the certificate
- Set the database to use encryption
What is strong consistency?
Strong consistency, you get the highest consistency, lower performance, and lowest availability. The latency is also high and this type of model is best suited for Inventory applications, Financial transactions, Scheduling, or Forecasting workloads.
What is bounded staleness?
the data is consistent beyond the user-defined time or operations threshold. The performance of bounded staleness is better than the strong consistency however the availability is still low due to inherent lag for the replication. This level is used for apps that don’t need to fetch data in real-time, however still in the order, it was written.
What is session consistency?
session consistency provides strong consistency for the session, ensuring the data stays up to date for any active read-write session. The availability of the data is relatively high with lower latency and higher throughput than the bounded staleness. The possible candidate for this kind of model could be a typical e-commerce application, social media app, and other similar services with persistent user connection.
What is consistent prefix?
The consistent prefix model is similar to bounded staleness except, the operational or time lag guarantee. The replicas guarantee the consistency and order of the writes however the data is not always current. This model ensures that the user never sees an out-of-order write.
For example, if data is written in the order A, B, and C, the user may either see A, A,B or A,B,C, but never out-of-order entry like A,C or B,A,C. This model provides high availability and very low latency which is best for certain applications that can afford the lag and still function as expected.
What is eventual consistency?
This model offers high availability and low latency along with the highest throughput of all. This model suits the application that does not require any ordering guarantee. The best usage of this type of model would be the count of retweets, likes, non-threaded comments where the count is more important than any other information.
What are the 3 types of integration run time (IR)?
Azure
Self-hosted
Azure-SSIS
When to use self-hosted IR?
You use a self-hosted integration runtime when you:
- Copy data between cloud and on-premises stores
- Copy data between on-premises stores
- Execute activities using on-premises stores and services
When to use Azure-SSIS IR?
Execute SSIS Packages through Azure Data Factory
When to use Azure IR?
You use an Azure integration runtime when you:
- Copy data between cloud stores
- Transform data between cloud stores using data flows
- Execute activities using cloud stores and services
What are the 5 types of windowing functions in stream analytics?
Tumbling window Hopping window Sliding window Session window Snapshot window