Build data analytics solutions using Azure Synapse serverless SQL pools Flashcards
Azure Synapse Analytics
- includes serverless SQL pools, which are tailored for querying data in a data lake
- can use SQL code to query data in files of various common formats without needing to load the file data into database storage.
Azure Synapse SQL
a distributed query system in Azure Synapse Analytics
Azure Synapse SQL runtime environments
- Serverless SQL pool
- Dedicated SQL pool
Dedicated SQL pool
Enterprise-scale relational database instances used to host data warehouses in which data is stored in relational tables
Serverless SQL pool
- primarily used to work with data in a data lake
- pay-per-query endpoint to query the data in your data lake
Benefits of using an SQL pool
- A familiar Transact-SQL syntax to query data in place (no load)
- Integrated connectivity from a wide range of business intelligence and ad-hoc querying tools
- Distributed query processing that is built for large-scale data
- Built-in query execution fault-tolerance - high success rate for long queries
- No infrastructure to setup or clusters to maintain.
- No charge for resources reserved, only for queries
When to use serverless SQL pools
- tailored for querying the data in the data lake
- great for unplanned or “bursty” workloads
- Workloads that require millisecond response times and are looking to pinpoint a single row in a data set are not good fit for serverless SQL pool.
Common use cases for serverless SQL pools include:
- Data exploration
- Data transformation
- Logical data warehouse.
Logical data warehouse
- can define external objects such as tables and views in a serverless SQL database.
- emains stored in the data lake files, but are abstracted by a relational schema that can be used by client applications and analytical tools to query the data as they would in a relational database hosted in SQL Server
File formats that can be queries
- Delimited text, such as comma-separated values (CSV) files.
- JavaScript object notation (JSON) files.
- Parquet files.
External data source
- if you plan to query data in the same location frequently, it’s more efficient to define an external data source that references that location
- benefit of an external data source, is that you can simplify an OPENROWSET query to use the combination of the data source and the relative path to the folders or files you want to query
- ou can assign a credential for the data source to use when accessing the underlying storage, enabling you to provide access to data through SQL without permitting users to access the data directly in the storage account.
External file format
encapsulate settings for delimited text files
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR = ‘,’,
STRING_DELIMITER = ‘”’
)
);
GO
External table
using the OPENROWSET function can result in complex code that includes data sources and file paths. To simplify access to the data, you can encapsulate the files in an external table
Some delimiter file settings
- With and without a header row.
- Comma and tab-delimited values.
- Windows and Unix style line endings.
- Non-quoted and quoted values, and escaping characters.
Persist the results of a query in an external table
- CREATE EXTERNAL TABLE AS SELECT (CETAS)
- an external table, stores its data in a file in the data lake.
CETAS data sources
- an existing table
- view in a database
- OPENROWSET function that reads file-based data from the data lake