Azure DP-201 Flashcards
What are the tiers of Azure Blob Storage?
- Hot: frequently accessed data; highest storage cost, lowest access cost
- Cool: infrequently accessed data stored for at least 30 days; lower storage cost, higher access cost
- Archive: rarely accessed data stored for at least 180 days; lowest storage cost, highest retrieval cost (data is offline)
What is the recommended file size for Azure Data Lake Storage Gen1 when POSIX permissions are required and diagnostics logging is enabled for auditing?
250 MB or greater
What is horizontal partitioning?
aka Sharding
Data is partitioned horizontally to distribute rows across a scaled-out data tier. The schema is identical on all participating databases.
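A minimal Python sketch of the routing idea (server names and the shard key are hypothetical): every shard holds the same schema, and a stable hash of the shard key decides which shard a row lives on.

```python
import hashlib

# Hypothetical shard catalog: identical schema on every participating database.
SHARDS = [
    "shard0.database.windows.net",
    "shard1.database.windows.net",
    "shard2.database.windows.net",
]

def shard_for(customer_id: str) -> str:
    """Route a row by a stable hash of its shard key (customer_id here)."""
    digest = hashlib.md5(customer_id.encode()).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

print(shard_for("C-1001"))  # every row for C-1001 lands on the same shard
```

hashlib is used instead of Python's built-in hash() because the built-in is salted per process and would route the same key differently across runs.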
*** Which data storage solution should you recommend if you need to represent data by using nodes and relationships in graph structures?
Cosmos DB (Gremlin API)
What are the distribution types for tables in Azure Synapse Analytics?
Hash-distributed
Round-robin
Replicate
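A rough illustration (not Synapse's actual internals) of how the three types place rows; a dedicated SQL pool spreads every table across 60 distributions:

```python
import hashlib
from itertools import count

N_DISTRIBUTIONS = 60  # fixed number in a Synapse dedicated SQL pool

def hash_distribution(key: str) -> int:
    # Hash-distributed: the same key always lands in the same distribution,
    # so joins/aggregations on that key avoid data movement.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_DISTRIBUTIONS

_rr = count()
def round_robin_distribution() -> int:
    # Round-robin: rows are dealt out evenly regardless of content --
    # fastest to load, but joins may need to shuffle data later.
    return next(_rr) % N_DISTRIBUTIONS

# Replicate: a full copy of the table is cached on every compute node
# (best for small dimension tables), so there is nothing to route.
```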
What is Azure Synapse Analytics?
Formerly Azure SQL Data Warehouse.
Azure Synapse is an analytics service that brings together enterprise data warehousing and Big Data analytics.
In Azure Databricks, how would you keep an interactive cluster configuration even after it has been terminated for more than 30 days?
an administrator can pin a cluster to the cluster list
What are the core storage services in the Azure Storage platform?
- Azure Blobs
- Azure Files
- Azure Queues
- Azure Tables
- Azure Disks
Choosing Data Abstraction methods:
https://docs.microsoft.com/en-us/azure/hdinsight/spark/optimize-data-storage#choose-data-abstraction
What is the best data format for Spark jobs?
Parquet
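A quick PySpark sketch (paths are hypothetical) of converting raw CSV to Parquet; because Parquet is columnar, compressed, and splittable, later jobs read only the columns they need:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read raw CSV, then persist as Parquet for downstream Spark jobs.
df = spark.read.csv("/mnt/raw/trips.csv", header=True, inferSchema=True)
df.write.mode("overwrite").parquet("/mnt/curated/trips")

trips = spark.read.parquet("/mnt/curated/trips")  # column-pruned, fast reads
```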
Datasets vs. DataFrames
DataFrames:
- Best choice in most situations.
- Provides query optimization through Catalyst (see the sketch after this list).
- Whole-stage code generation.
- Direct memory access.
- Low garbage collection (GC) overhead.
- Not as developer-friendly as Datasets: no compile-time checks or domain object programming.
Datasets:
- Good in complex ETL pipelines where the performance impact is acceptable.
- Not good in aggregations where the performance impact can be considerable.
- Provides query optimization through Catalyst.
- Developer-friendly: provides domain object programming and compile-time checks.
- Adds serialization/deserialization overhead.
- High GC overhead.
- Breaks whole-stage code generation.
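To see Catalyst and whole-stage codegen at work on a DataFrame, print the query plans; a small PySpark sketch (the column names are made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000).withColumnRenamed("id", "n")

# Catalyst rewrites this logical plan (e.g. pushing the filter down)
# before whole-stage code generation compiles it to JVM bytecode.
result = df.selectExpr("n", "n * 2 AS doubled").filter("n > 10")
result.explain(True)  # parsed, analyzed, optimized, and physical plans
```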
What data models does Cosmos DB support?
document, key-value, graph, and column-family data models.
You work for a transportation logistics company. You are incurring large costs in the transformation step of your big data architecture. What is a possible way to reduce this cost?
Use PolyBase.
PolyBase allows for ELT instead of ETL: data is loaded first, then transformed inside the warehouse, removing the separate transformation step.
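A sketch of the ELT pattern (connection details and table names are hypothetical; the T-SQL is submitted from Python with pyodbc): land raw files in Blob storage, expose them as an external table via PolyBase, then transform with a CTAS inside the warehouse:

```python
import pyodbc

# Hypothetical connection to a Synapse dedicated SQL pool.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;DATABASE=mydw;UID=loader;PWD=<secret>"
)

# ELT: ext.TripsRaw is an external table over files already in Blob storage;
# the transform runs inside the warehouse as part of the load (CTAS).
conn.execute("""
    CREATE TABLE dbo.TripsClean
    WITH (DISTRIBUTION = ROUND_ROBIN)
    AS
    SELECT CAST(fare AS DECIMAL(10,2)) AS fare, pickup_datetime
    FROM ext.TripsRaw;
""")
conn.commit()
```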
What are two benefits of Databricks?
- It can utilize multiple APIs.
- It can visualize individual pieces of code.
What is Data Masking?
A way to hide sensitive data from users who should not have access to it.
Examples: Social Security number, credit card number
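In Azure SQL Database this is typically done with dynamic data masking; a sketch (the table and column names are hypothetical, T-SQL sent via pyodbc):

```python
import pyodbc

conn = pyodbc.connect("DSN=mydb")  # hypothetical DSN

# Mask all but the last four digits of the SSN for non-privileged users;
# the stored data is unchanged, only query results are masked.
conn.execute("""
    ALTER TABLE dbo.Customers
    ALTER COLUMN SSN ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)');
""")
conn.commit()
```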
What are reasons to use Data Masking?
- Protect non-production data
- Protect against insider threats
- Comply with regulatory requirements
What are use cases for SQL Database Auditing?
- Retain Audit Trails (see who has accessed the service)
- Report on event activity (visualize audit trails)
- Analyze (spot trends or unusual activity)
You work for a retail sales chain. Your marketing department needs to access client data to design marketing promotions. Concerns have been raised about access to the data. What is the most appropriate solution to protect the data and allow the marketing department to function?
Data Masking
This would protect sensitive data while still granting the marketing department access.
What is defense in depth?
A layered approach to security: multiple independent layers protect the data, rather than a single all-or-nothing perimeter defense.
What is the difference between Blob Storage and Data Lake Storage Gen2?
Data Lake Storage Gen2 has a hierarchical namespace: objects and files are organized into directories and sub-directories, similar to File Explorer on your computer.
What are the two options Azure offers for a relational cloud data store (RDBMS)?
- SQL Database
- Azure Synapse (SQL Data Warehouse)
What Azure big data service is best for transaction processing of relational data?
SQL Database
What are advantages of SQL Database?
- Consistent data that can handle complex queries
- Designed for transactional processing
- Single-source data capture
- Scales vertically
- For relational data
What are advantages of SQL Data Warehouse (Synapse)?
- Parallel processing
- Multiple relational source data capture
- Handles complex queries
- Scales horizontally
What are benefits of Cosmos DB?
- Global replication
- Multi-model
- For non-relational data
What are the 5 levels of consistency for Cosmos DB?
- Strong (strongest consistency; most expensive)
- Bounded Staleness
- Session
- Consistent Prefix
- Eventual (weakest consistency; least expensive)
What are the options for storing non-relational data in Azure?
- Cosmos DB
- Data Lake Gen2
- Blob storage
What are the two types of partitioning in Cosmos DB?
Logical
Physical
Logical Partitions are based on:
Partition Keys
*** What are things to consider when developing a partition key?
- Should be a property that exists on every object
- Anticipate your top queries
- Avoid fan-out (queries that must touch many partitions)
- Keys are immutable; once set, they cannot change (see the sketch after this list)
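A sketch with the azure-cosmos Python SDK (the account, database, and key path are hypothetical) showing these rules applied when creating a container:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
db = client.create_database_if_not_exists("logistics")

# "/customerId" exists on every document, matches the top query
# ("all orders for a customer", avoiding fan-out), and never changes --
# important because the key is immutable once the container exists.
container = db.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
)
```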
What is PolyBase used for?
Importing/exporting data between Azure Blob storage and Synapse (SQL Data Warehouse)
T/F Data Factory can ingest both structured and unstructured data.
True
What is Data Factory?
- An orchestration service.
- Primary method for ingesting data into an Azure architecture.
- Responsible for moving and monitoring the data
T/F Data Factory can be used for both ETL and ELT
True
*** What is a cluster in Databricks?
a group of compute resources
What are the languages available in Databricks?
R, SQL, Python, Scala, Java
T/F Databricks can be used for streaming and batch processing.
True
What is Databricks used for?
Exploration and visualization of data
What are components of Databricks?
Cluster: compute resources
Workspace: “filing cabinet” for Databricks work
Notebooks: “folders” that contain cells
Cells: individual pieces of code
Libraries: packages that provide additional functionality
Tables: where structured data is stored
What are ways to recover from failed queries when streaming in Databricks?
- Enable checkpointing (see the sketch after this list)
- Configure jobs to restart on failure
- Recover after changes to the streaming query
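A minimal PySpark Structured Streaming sketch of checkpointing (the rate source and paths are placeholders); on restart the query resumes from the checkpoint instead of reprocessing everything:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
stream = spark.readStream.format("rate").load()  # demo source

query = (stream.writeStream
    .format("parquet")
    .option("checkpointLocation", "/mnt/checkpoints/rate_demo")  # enables recovery
    .option("path", "/mnt/tables/rate_demo")
    .start())
```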
How do you optimize Databricks jobs using scheduler pools?
Group jobs into pools by weight.
By default, all queries in a notebook run in the same fair scheduler pool and are processed first in, first out (FIFO). Grouping jobs into separate pools with different weights lets more important jobs run first (see the sketch below).
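A sketch of assigning a query to its own pool (the pool name is arbitrary; pool weights themselves are defined in a fair-scheduler allocation file referenced by spark.scheduler.allocation.file):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.readStream.format("rate").load()

# Queries started after this call run in the "critical" pool; queries
# started without it stay in the default pool and no longer compete.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "critical")
query = df.writeStream.format("console").start()
```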
How do you optimize Databricks jobs using configuration settings?
use “compute-optimized” instances.
What are Watermark Policies in Databricks?
A way to set thresholds for late data coming in from input streams.
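A PySpark sketch (the rate source provides a timestamp column; the thresholds are arbitrary): events arriving more than 10 minutes behind the latest event time seen are dropped rather than kept in state:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window

spark = SparkSession.builder.getOrCreate()
events = spark.readStream.format("rate").load()  # has a 'timestamp' column

# The watermark bounds how late data may arrive; state for windows older
# than (max event time - 10 minutes) can be dropped, capping memory use.
counts = (events
    .withWatermark("timestamp", "10 minutes")
    .groupBy(window("timestamp", "5 minutes"))
    .count())
```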
What are methods to optimize streaming in Databricks?
- Enable autoscaling
- Optimize configuration settings
- Group jobs into pools by weight
- Recover from query failures
*** How are you charged for Cosmos DB?
- Storage (GB consumed)
- Throughput (provisioned Request Units per second, RU/s)
Which are appropriate questions for determining what solution should be used for ingesting and moving data?
- How cost sensitive is the project?
- What is the end result of the data?