DP-200 - Monitor and optimise data solution Flashcards

1
Q

A company has an Azure SQL database. The database contains tables that have masked columns. The company wants to identify when a user tries to attempt to access the data from any one of the masked columns. Which of the following would you use for this requirement?

A. Azure Advanced Threat Protection

B. Auditing

C. Transparent Data Encryption

D. Azure Monitor Audit Logs

A

B - Auditing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

In Azure Synapse Analytics what is workload management?

A

Workload management provides the capability to prioritise the query workloads that take place on the server. It is managed by three related areas:

Workload groups
Workload classification
Workload importance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

In Azure Synapse Analytics workload management what are workload groups?

A

Workload groups enable you to define the resources to isolate and reserve resources for its use.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

In In Azure Synapse Analytics workload management what is workload classification?

A

Using T-SQL, you can create a workload classifier to map queries to a specific classifier. A classifier can define the level of importance of the request, so that it can be mapped to a specific workload group that has an allocation of specific resources during the execution of the query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

In In Azure Synapse Analytics workload management what is workload classification?

A

Workload importance is defined in the CREATE WORKLOAD CLASSIFIER command, and enables higher priority queries to receive resources ahead of lower priority queries that are in the queue. By default, queries are released from the queue on a first-in, first-out basis as resources become available, but workload importance overrides this qualifier.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What are the benefits of workload groups?

A

Reserves resources for a group of requests.
Limits the amount of resources a group of requests can consume.
Accesses shared resources based on importance level.
Sets query timeout value. Gets DBAs out of the business of terminating runaway queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

In In Azure Synapse Analytics workload management what is workload importance?

A

Workload importance is defined in the CREATE WORKLOAD CLASSIFIER command, and enables higher priority queries to receive resources ahead of lower priority queries that are in the queue. By default, queries are released from the queue on a first-in, first-out basis as resources become available, but workload importance overrides this qualifier.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the benefit of result-set cache?

A

In scenarios where the same results are requested on a regular basis, result-set caching can improve the performance of the queries that retrieve these results. When result-set caching is enabled, the results of the query are cached in the SQL pool storage.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a materialised view?

A

A materialized view can pre-compute, store, and maintain data like a table. These views are automatically updated when data in underlying tables are changed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is descriptive analytics?

A

Descriptive analytics answers the question “What is happening in my business?” The data to answer this question is typically found through the creation of a data warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is diagnostic analytics?

A

Diagnostic analytics deals with answering the question “Why is it happening?” This type of analytics may involve exploring information that already exists in a data warehouse, but typically involves a wider search of your data estate to find more data to support this type of analysis.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is predictive analytics?

A

Azure Synapse Analytics also enables you to answer the question “What is likely to happen in the future based on previous trends and patterns?” by using its integrated Apache Spark engine. This can also be used with other services, such as Azure Machine Learning Services, or Azure Databricks.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is prescriptive analytics?

A

This type of analytics looks at executing actions based on real-time or near real-time analysis of data, using predictive analytics. Azure Synapse Analytics provides this capability through both Apache Spark, Azure Synapse Link, and by integrating streaming technologies such as Azure Stream Analytics.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the best way to increase the performance of Azure Synapse Analytics?

A

Change the service level which alters the number of Data Warehouse Units (DWUs) that are allocated to the system.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are the three sharding patterns used within Azure Synapse Analytics?

A

Hash
Round Robin
Replicate

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

In relation to Azure Synapse Analytics; what is hash (sharding)?

A

A hash-distributed table can deliver the highest query performance for joins and aggregations on large tables.

To shard data into a hash-distributed table, Azure Synapse Analytics uses a hash function to assign each row to one distribution deterministically. In the table definition, one of the columns is designated as the distribution column. The hash function uses the values in the distribution column to assign each row to a distribution.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

In relation to Azure Synapse Analytics; what is round robin (sharding)?

A

Distributes evenly across all nodes. Great for temporary/staging tables. Data with no obvious joining key or good candidate column. BUT performance is slow due to data movement.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

In relation to Azure Synapse Analytics; what is replicated (sharding)?

A

A replicated table provides the fastest query performance for small tables. Small-dimension tables in a star schema with less than 2GB of storage after compression (~5x compression).

A table that is replicated caches a full copy on each compute node. Consequently, replicating a table removes the need to transfer data among compute nodes before a join or aggregation. Extra storage is required, and there are additional overheads that are incurred when writing data which make large tables impractical.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Which of the following terms refers to the compute scale that’s used in a data warehouse in Azure Synapse Analytics?

RTU
DWU
DTU

A

DWU - Data warehouse unit

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

You have an Azure Synapse Analytics database. Within the database, you have a dimension table named Stores that contains store information in three columns: StoreID as an integer, StoreName as varchar(200), and Category as as varchar(40). You have a total of 263 stores nationwide. Store information is retrieved in more than half of the queries that are issued against this database. These queries include staff information per store, sales information per store, and finance information. You want to improve the query performance of these queries by configuring the table geometry of the Stores table. Which is the appropriate table geometry to select for the Stores table?

  • Round-robin
  • Non-clustered
  • Replicated table
A

A replicated table is an appropriate table geometry choice because the size of the data in the table is less than 2 GB compressed. The table will be replicated to every distribution node of a data warehouse in Azure Synapse Analytics to improve performance.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is a fan-out query? (Cosmos DB)

A

Fan-out queries check all partitions, which will cost you extra RU/s and may affect the performance of your application. Caused by queries without a partition key.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

In CosmosDB, what problems can be caused with unevenly distributed data?

A

The total RU/s for a collection is divided amongst all partitions. This means that 1000 RU/s will be distributed across five partitions as 200 RU/s for each partition. If you try to write more than 200 RU/s to any of these partitions, calls will begin to fail as you cross the threshold.

A partition key can currently have 10 GB of data at maximum (this may change in future), making it important to use a partition key which fills all partitions efficiently.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What are the steps needed to enable Transparent Data Encryption for an Azure SQL database?

A
  • Create a master key
  • Create or obtain a certificate protected by the master key
  • Create a database encryption key and protect it by the certificate
  • Set the database to use encryption
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is strong consistency?

A

Strong consistency, you get the highest consistency, lower performance, and lowest availability. The latency is also high and this type of model is best suited for Inventory applications, Financial transactions, Scheduling, or Forecasting workloads.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What is bounded staleness?

A

the data is consistent beyond the user-defined time or operations threshold. The performance of bounded staleness is better than the strong consistency however the availability is still low due to inherent lag for the replication. This level is used for apps that don’t need to fetch data in real-time, however still in the order, it was written.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

What is session consistency?

A

session consistency provides strong consistency for the session, ensuring the data stays up to date for any active read-write session. The availability of the data is relatively high with lower latency and higher throughput than the bounded staleness. The possible candidate for this kind of model could be a typical e-commerce application, social media app, and other similar services with persistent user connection.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What is consistent prefix?

A

The consistent prefix model is similar to bounded staleness except, the operational or time lag guarantee. The replicas guarantee the consistency and order of the writes however the data is not always current. This model ensures that the user never sees an out-of-order write.

For example, if data is written in the order A, B, and C, the user may either see A, A,B or A,B,C, but never out-of-order entry like A,C or B,A,C. This model provides high availability and very low latency which is best for certain applications that can afford the lag and still function as expected.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What is eventual consistency?

A

This model offers high availability and low latency along with the highest throughput of all. This model suits the application that does not require any ordering guarantee. The best usage of this type of model would be the count of retweets, likes, non-threaded comments where the count is more important than any other information.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

What are the 3 types of integration run time (IR)?

A

Azure
Self-hosted
Azure-SSIS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

When to use self-hosted IR?

A

You use a self-hosted integration runtime when you:

  • Copy data between cloud and on-premises stores
  • Copy data between on-premises stores
  • Execute activities using on-premises stores and services
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

When to use Azure-SSIS IR?

A

Execute SSIS Packages through Azure Data Factory

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

When to use Azure IR?

A

You use an Azure integration runtime when you:

  • Copy data between cloud stores
  • Transform data between cloud stores using data flows
  • Execute activities using cloud stores and services
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

What are the 5 types of windowing functions in stream analytics?

A
Tumbling window
Hopping window
Sliding window
Session window
Snapshot window
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

What is a tumbling window function?

A

Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them, such as the example below. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.

35
Q

What is a hopping window?

A

Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap and be emitted more often than the window size. Events can belong to more than one Hopping window result set. To make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the window size.

36
Q

What is a sliding window?

A

Sliding windows, unlike Tumbling or Hopping windows, output events only for points in time when the content of the window actually changes. In other words, when an event enters or exits the window. So, every window has at least one event. Similar to Hopping windows, events can belong to more than one sliding window.

37
Q

What is a session window?

A

Session window functions group events that arrive at similar times, filtering out periods of time where there is no data. It has three main parameters: timeout, maximum duration, and partitioning key (optional).

A session window begins when the first event occurs. If another event occurs within the specified timeout from the last ingested event, then the window extends to include the new event. Otherwise if no events occur within the timeout, then the window is closed at the timeout.

If events keep occurring within the specified timeout, the session window will keep extending until maximum duration is reached. The maximum duration checking intervals are set to be the same size as the specified max duration. For example, if the max duration is 10, then the checks on if the window exceed maximum duration will happen at t = 0, 10, 20, 30, etc.

When a partition key is provided, the events are grouped together by the key and session window is applied to each group independently. This partitioning is useful for cases where you need different session windows for different users or devices.

38
Q

What is a snapshot window?

A

Snapshot windows groups events that have the same timestamp. Unlike other windowing types, which require a specific window function (such as SessionWindow(), you can apply a snapshot window by adding System.Timestamp() to the GROUP BY clause.

39
Q

In Azure SQL data warehouse, what would you implement to get users to connect with two-factor authentication?

A

Conditional access policies - this can also help with:

  • -Blocking sign ins
  • -Blocking or granting access from specific locations
  • -Blocking risky sign-in behaviours
    • Requiring organisation managed devices for specific applications
40
Q

What is TDE?

A

Transparent data encryption

41
Q

Is TDE enabled automatically for all newly deployed Azure SQL databases?

A

Yes

42
Q

Data masking: What is default value?

A

Default value, which displays the default value for that data type instead.

43
Q

Data masking: Credit card value

A

Credit card value, which only shows the last four digits of the number, converting all other numbers to lower case x’s.

44
Q

Data masking: Email

A

which hides the domain name and all but the first character of the email account name.

45
Q

Data masking: Number

A

which specifies a random number between a range of values. For example, on the credit card expiry month and year, you could select random months from 1 to 12 and set the year range from 2018 to 3000.

46
Q

Data masking: Custom string

A

which allows you to set the number of characters exposed from the start of the data, the number of characters exposed from the end of the data, and the characters to repeat for the remainder of the data.

47
Q

Data masking: Can non-administrators see the non-masked values?

A

No, but this can be enabled by adding specific SQL users to an exclusion list.

48
Q

How would you locate suspicious activity on a database in Azure?

A

Azure SQL Database auditing

49
Q

** is required if you wish to use Azure threat detection

A

Auditing

50
Q

Where are audit logs written to?

A

Append blobs in a designated Azure Blob storage account

51
Q

What is Data discovery & classification?

A

Provides capabilities built into Azure SQL Database for discovering, classifying, labeling & protecting the sensitive data in your databases. It can be used to provide visibility into your database classification state, and to track the access to sensitive data within the database and beyond its borders.

52
Q

What is vulnerability assessment?

A

Is an easy to configure service that can discover, track, and help you remediate potential database vulnerabilities. It provides visibility into your security state, and includes actionable steps to resolve security issues, and enhance your database fortifications.

53
Q

What is Advanced Threat Protection (ADS)?

A

Detects anomalous activities indicating unusual and potentially harmful attempts to access or exploit your database. It continuously monitors your database for suspicious activities, and provides immediate security alerts on potential vulnerabilities, SQL injection attacks, and anomalous database access patterns. Advanced Threat Protection alerts provide details of the suspicious activity and recommend action on how to investigate and mitigate the threat.

54
Q

When Advanced Threat Protection (ADS) is enabled and Advanced Threat Protection Types is set to ‘All’ what values will we see?

A

SQL injection reports where SQL injection attacks have occurred.
SQL injection vulnerability reports where the possibility of a SQL injection is likely.
Anomalous client login looks at logins that are irregular and could be cause for concern, such as a potential attacker gaining access.

55
Q

Advanced Threat Protection (ADS) emails will send notifications to which users by default?

A

Sends the threats to the service administrators.

56
Q

Which of the following is the most efficient way to secure a database to allow only access from a VNet while restricting access from the internet?

An allow access to Azure services rule

A server-level IP address rule

A server-level virtual network rule

A database-level IP address rule

A

A server-level virtual network rule will allow you to allow connectivity from specific Azure VNet subnets, and will block access from the internet. This is the most efficient manner to secure this configuration.

57
Q

A mask has been applied to a column in the database that holds a user’s email address, laura@contoso.com. From the list of options, what would the mask display when a database administrator account accesses user data?

lxxx@xxxx.com

laura@xxxxxxx.com

laura@contoso.com

Data not available

A

laura@contoso.com
When database administrator accounts access data that have a mask applied, the mask is removed, and the original data is visible.

58
Q

Transparent Data Encryption will encrypt which database files?

Database files only

Log files and backup files only

Backup files only

Database files, log files, and backup files

A

Database files, log files, and backup files

Transparent Data Encryption encrypts all database, log, and backup files. When new Azure SQL databases are created, Transparent Data Encryption will be enabled by default.

59
Q

Is encrypted communication turned on automatically when connecting to an Azure SQL Server?

A

Yes

Azure SQL Database enforces encryption (SSL/TLS) at all times for all connections

60
Q

What is deterministic encryption?

A

always generates the same encrypted value for any given plain text value. Using deterministic encryption allows point lookups, equality joins, grouping and indexing on encrypted columns. However, it may also allow unauthorized users to guess information about encrypted values by examining patterns in the encrypted column, especially if there’s a small set of possible encrypted values, such as True/False, or

61
Q

What is randomised encryption?

A

uses a method that encrypts data in a less predictable manner. Randomized encryption is more secure, but prevents searching, grouping, indexing, and joining on encrypted columns.

62
Q

When accessing data in Azure Data Lake Storage via a virtual network service endpoint, you have to use a

A

Managed Identity Authentication

63
Q

A single logical partition has an upper limit of…

A

10GB storage

64
Q

What is the best way to choose a partition key?

A
  • Wide range of values and access patterns that are evenly spread across logical partitions.
  • That spreads the workload evenly and over time.

Good candidates are properties that appear frequently as a filter.

65
Q

What distribution would a Fact table use?

A

Hash distributed

66
Q

What distribution would a Dimension table use?

A

Replicated for smaller tables, if tables are too large to store on each compute node, use hash.

67
Q

What distribution to use for staging?

A

Round robin.

68
Q

In Azure Monitor, which resource type would you select to send an alert to the support department when the DataWarehouse consumes the maximum allotted resources to it?

A

SQL Data warehouse as this is for the consumption of this particular resource.

69
Q

In Azure databricks if you wanted to connect to Azure Data Lake Storage Gen2 with Active Directory, which cluser would you use?

A

High concurrency - They can be shared by multiple users but only support python, SQL and R.

70
Q

A standard databricks cluster with credential passthrough can support how many users?

A

One and only supports Python, SQL and Scala.

71
Q

To change the language in Databricks’ cells to either Scala, SQL, Python or R, prefix the cell with…

A

’%’, followed by the language. You will not need to prefix the cell with anything if you are using your primary language.

72
Q

What are the steps required to load data from Azure Data Lake Gen 2 to an Azure SQL Data warehouse?

A
  • Create a database scoped credential

- Create an external data source using ‘abfs’ as the file location

73
Q

In Azure Monitor, what are some of the locations that you can send diagnostic logs to?

A

Storage Account: Save your diagnostic logs to a storage account for auditing or manual inspection. You can use the diagnostic settings to specify the retention time in days.

Event Hub: Stream the logs to Azure Event Hubs. The logs become input to a partner service/custom analytics solution like Power BI.

Log Analytics: Analyze the logs with Log Analytics. The Data Factory integration with Azure Monitor is useful in the following scenarios:

74
Q

What is reference data?

A

(Lookup table) is a finite data set that is static or slowly changing in nature, used to perform a lookup.

75
Q

Stream Analytics supports which 2 locations for reference data?

A

Azure blob storage and Azure SQL Datawarehouse

76
Q

What are the 3 ways to create an exported copy of an Azure SQL Database?

A

Export to BACPAC using SSMS, save to storage account.

Export to a BACPAC file using Powershell and save locally

Export to BACPAC using SqlPackage utility.

77
Q

What is the maximum BACPAC file size in blob storage?

A

200GB

78
Q

What steps would you take to pull data from an on-premise SQL server and migrate the data to Azure Blob Stoage?

A

Create a Virtual Private Network Connection from the on-prem network to Azure.

Create new ADF resource

Create self-hosted integration runtime

79
Q

In Azure stream analytics, if you want to ensure that there are enough processing resources to handle the load of additional devices, which metric would you track?

A

Watermark delay

80
Q

If a company wants to migrate a set of on-premisis SQL databases to Azure with a simple lift and shift process, how would they do it?

A

Use Azure SQL Database managed instance

81
Q

If you want to host a single database on Azure (lift shift) what would you use?

A

Azure SQL database single database

82
Q

HDFS - What is a Datanode?

A

Used to perform block creation, deletion and replication upon instruction from the NameNode

83
Q

HDFS - What is a Namenode?

A

Executes file system namespace operations like opening, closing and renaming files and directories. Also determines mapping of blocks to datanodes

84
Q

In Azure storage accounts, how long are logs retained for?

A

If ‘Delete data’ is not specified then they are retained ‘indefinitely’