Build data analytics solutions using Azure Synapse serverless SQL pools Flashcards

1
Q

Azure Synapse Analytics

A
  1. includes serverless SQL pools, which are tailored for querying data in a data lake
  2. can use SQL code to query data in files of various common formats without needing to load the file data into database storage.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Azure Synapse SQL

A

a distributed query system in Azure Synapse Analytics

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Azure Synapse SQL runtime environments

A
  1. Serverless SQL pool
  2. Dedicated SQL pool
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Dedicated SQL pool

A

Enterprise-scale relational database instances used to host data warehouses in which data is stored in relational tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Serverless SQL pool

A
  1. primarily used to work with data in a data lake
  2. pay-per-query endpoint to query the data in your data lake
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Benefits of using an SQL pool

A
  1. A familiar Transact-SQL syntax to query data in place (no load)
  2. Integrated connectivity from a wide range of business intelligence and ad-hoc querying tools
  3. Distributed query processing that is built for large-scale data
  4. Built-in query execution fault-tolerance - high success rate for long queries
  5. No infrastructure to setup or clusters to maintain.
  6. No charge for resources reserved, only for queries
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

When to use serverless SQL pools

A
  1. tailored for querying the data in the data lake
  2. great for unplanned or “bursty” workloads
  3. Workloads that require millisecond response times and are looking to pinpoint a single row in a data set are not good fit for serverless SQL pool.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Common use cases for serverless SQL pools include:

A
  1. Data exploration
  2. Data transformation
  3. Logical data warehouse.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Logical data warehouse

A
  1. can define external objects such as tables and views in a serverless SQL database.
  2. emains stored in the data lake files, but are abstracted by a relational schema that can be used by client applications and analytical tools to query the data as they would in a relational database hosted in SQL Server
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

File formats that can be queries

A
  1. Delimited text, such as comma-separated values (CSV) files.
  2. JavaScript object notation (JSON) files.
  3. Parquet files.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

External data source

A
  1. if you plan to query data in the same location frequently, it’s more efficient to define an external data source that references that location
  2. benefit of an external data source, is that you can simplify an OPENROWSET query to use the combination of the data source and the relative path to the folders or files you want to query
  3. ou can assign a credential for the data source to use when accessing the underlying storage, enabling you to provide access to data through SQL without permitting users to access the data directly in the storage account.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

External file format

A

encapsulate settings for delimited text files

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR = ‘,’,
STRING_DELIMITER = ‘”’
)
);
GO

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

External table

A

using the OPENROWSET function can result in complex code that includes data sources and file paths. To simplify access to the data, you can encapsulate the files in an external table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Some delimiter file settings

A
  1. With and without a header row.
  2. Comma and tab-delimited values.
  3. Windows and Unix style line endings.
  4. Non-quoted and quoted values, and escaping characters.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Persist the results of a query in an external table

A
  1. CREATE EXTERNAL TABLE AS SELECT (CETAS)
  2. an external table, stores its data in a file in the data lake.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

CETAS data sources

A
  1. an existing table
  2. view in a database
  3. OPENROWSET function that reads file-based data from the data lake
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Types of objects to be created to use with CETAS

A
  1. external data source
  2. external data format
18
Q

External data source

A
  1. encapsulates a connection to a file system location in a data lake
  2. use this connection to specify a relative path in which the data files for the external table crea
19
Q

LOCATION and BULK parameters

A

relative paths for the results and source files respectively

NB relative to the file system location referenced by the files external data source.

20
Q

External table

A

external tables are a metadata abstraction over the files that contain the actual data. Dropping an external table does not delete the underlying files.

21
Q

Benefits of stored procedures

A
  1. Reduces client to server network traffic (commands are executed in a single batch)
  2. Provides a security boundary
  3. Eases maintenance
  4. Improved performance (execution plan is held in the cache and reused on subsequent runs)
22
Q

a pipeline for the data transformation enables you to

A

schedule the operation to run

  1. at specific times
  2. based on specific events
23
Q

A lake database

A
  1. provides a relational metadata layer over one or more files in a data lake.
  2. can create a lake database that includes definitions for tables, including column names and data types as well as relationships between primary and foreign key columns.
  3. , the storage of the data files is decoupled from the database schema; enabling more flexibility than a relational database system typically offers.
24
Q

Lake database storage

A
  1. stored in the data lake as Parquet or CSV files
  2. can be managed independently of the database tables, making it easier to manage data ingestion and manipulation with a wide variety of data processing tools and technologies
25
Q

Lake database compute

A
  1. Azure Synapse serverless SQL pool
  2. Azure Synapse Apache Spark (Spark SQL API)
26
Q

Azure Synapse database designer

A
  1. can define the schema for your database
  2. Specifying the name and storage settings for each table.
  3. Specifying the names, key usage, nullability, and data types for each column.
  4. Defining relationships between key columns in tables.
27
Q

Serverless SQL pool authentication

A

how users prove their identity when connecting to the endpoint

28
Q

Two SQL Pool authentications

A
  1. SQL Authentication (username and password)
  2. Microsoft Entra authentication (managed by Microsoft Entra ID)
29
Q

Authorization

A
  1. is controlled by your user account’s database role memberships and object-level permissions.
30
Q

SQL Authentication

A

SQL user exists only in the serverless SQL pool and permissions are scoped to the objects in the serverless SQL pool. Access to securable objects in other services (such as Azure Storage) can’t be granted to a SQL user directly since it only exists in scope of serverless SQL pool. The SQL user needs get authorization to access the files in the storage account.

31
Q

Microsoft Entra authentication

A

a user can sign in to a serverless SQL pool and other services, like Azure Storage, and can grant permissions to the Microsoft Entra user

32
Q

Serverless SQL pool supports the following authorization types (for Azure storage)

A
  1. Anonymous access (publicly available files)
  2. Shared access signature (SAS)
  3. Managed Identity. (?)
  4. User Identity (?)
33
Q

Shared access signature (SAS)

A
  1. delegated access to resources in storage account
  2. can grant clients access to resources in storage account, without sharing account keys
  3. gives you granular control over the type of access you grant to clients who have the SAS: validity interval, granted permissions, acceptable IP address range, acceptable protocol
34
Q

Azure Storage Access Control Types

A
  1. Azure role-based access control (Azure RBAC)
  2. Access control lists (ACLs)
35
Q

ACL

A
  1. Each file and directory in your storage account has an access control list
  2. ACL check determines whether that security principal has the correct permission level to perform the operation.
36
Q

Kinds of access control lists:

A
  1. Access ACLs: access to an object
  2. Default ACLs: emplates of ACLs associated with a directory that determine the access ACLs for any child items that are created under that directory. Files do not have default ACLs.
37
Q

Permission types

A
  1. Read
  2. Write
  3. Execute
38
Q

Guidelines in setting up ACLs

A
  1. Always use Microsoft Entra security groups as the assigned principal in an ACL entry
  2. Resist the opportunity to directly assign individual users or service principals
39
Q

Roles: For users which need read only acces

A

Storage Blob Data Reader

40
Q
A