Model, query, and explore data in Azure Synapse Flashcards
function used in a serverless SQL pool to read data stored in files in a data lake
OPENROWSET
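A minimal sketch of what an OPENROWSET query might look like, built here as a string in Python so it can be inspected before being sent to a serverless SQL pool. The helper name `openrowset_query`, the storage URL, and the `result` alias are illustrative assumptions, not part of the flashcards.

```python
def openrowset_query(bulk_url: str, file_format: str = "PARQUET", top: int = 100) -> str:
    """Build a T-SQL query that reads data lake files with OPENROWSET.

    bulk_url is a hypothetical path to the files; wildcards such as *.parquet
    can be used to match several files at once.
    """
    # Escape single quotes so the URL is a valid T-SQL string literal.
    escaped = bulk_url.replace("'", "''")
    return (
        f"SELECT TOP {int(top)} *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{escaped}',\n"
        f"    FORMAT = '{file_format}'\n"
        f") AS [result]"
    )


if __name__ == "__main__":
    # Hypothetical storage account and container names.
    print(openrowset_query("https://mylake.dfs.core.windows.net/files/sales/*.parquet"))
```

The generated text is ordinary T-SQL, so it can be pasted into a serverless SQL script as-is.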
difference between dedicated SQL pool and serverless SQL pool
serverless SQL pool runs SQL queries directly against files in the data lake; dedicated SQL pool hosts a relational database in which data is stored and queried
external database object that encapsulates the connection info to a file location in a data lake store
DATA SOURCE (created with CREATE EXTERNAL DATA SOURCE)
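As a rough illustration, the statement that defines such an object could be generated like this. The helper name `create_external_data_source`, the data source name `files`, and the storage URL are assumptions made for the example; a real definition may also need a credential, which is omitted here.

```python
def create_external_data_source(name: str, location: str) -> str:
    """Build a CREATE EXTERNAL DATA SOURCE statement for a data lake folder.

    name and location are hypothetical values supplied by the caller.
    """
    # Escape single quotes so the location is a valid T-SQL string literal.
    escaped = location.replace("'", "''")
    return (
        f"CREATE EXTERNAL DATA SOURCE [{name}]\n"
        f"WITH (\n"
        f"    LOCATION = '{escaped}'\n"
        f")"
    )


if __name__ == "__main__":
    print(create_external_data_source(
        "files", "https://mylake.dfs.core.windows.net/files/"
    ))
```

Once the data source exists, queries can refer to files by a path relative to its location instead of repeating the full URL each time.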
how to analyze a Parquet file using Spark
load the Parquet file directly into a Spark DataFrame (no need to first load the data into a serverless SQL pool)
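A small sketch of that direct load, assuming it runs in a notebook where a `spark` session already exists. The helper names `abfss_path` and `load_parquet`, and the container, account, and folder names, are hypothetical.

```python
def abfss_path(container: str, account: str, relative_path: str) -> str:
    """Build an abfss:// URI for a path in an Azure Data Lake Storage Gen2 container."""
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{relative_path.lstrip('/')}"
    )


def load_parquet(spark, path: str):
    """Read Parquet files straight into a Spark DataFrame.

    spark is an existing SparkSession; the schema comes from the Parquet
    files themselves, so no table definition is needed first.
    """
    return spark.read.parquet(path)


# Example use inside a notebook (assumes `spark` is already defined):
#   df = load_parquet(spark, abfss_path("files", "mylake", "sales/*.parquet"))
#   df.printSchema()
#   df.groupBy("Category").count().show()   # "Category" is a hypothetical column
```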
HASH vs ROUND_ROBIN
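HASH assigns each row to a distribution by hashing a chosen column, so rows with the same key value are stored together; this gives good query performance for large tables
ROUND_ROBIN spreads rows evenly across distributions without looking at their values, so queries that join or filter on a commonly used key get no benefit from how the data is placed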
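The difference can be seen in a toy simulation. This is only a sketch: the functions below are illustrative, `zlib.crc32` is a stand-in for the pool's internal hash function, and the distribution count of 4 is chosen for readability.

```python
import zlib
from collections import defaultdict


def hash_distribute(rows, key, n_distributions):
    """Place each row by hashing its key column: equal keys always land together."""
    buckets = defaultdict(list)
    for row in rows:
        # crc32 is a stand-in hash; it is deterministic across runs.
        d = zlib.crc32(str(row[key]).encode("utf-8")) % n_distributions
        buckets[d].append(row)
    return buckets


def round_robin_distribute(rows, n_distributions):
    """Place rows in turn: sizes stay even, but equal keys are scattered."""
    buckets = defaultdict(list)
    for i, row in enumerate(rows):
        buckets[i % n_distributions].append(row)
    return buckets


if __name__ == "__main__":
    # Hypothetical sales rows with a repeating customer_id.
    rows = [{"customer_id": i % 5, "amount": i} for i in range(20)]
    for name, buckets in (
        ("HASH", hash_distribute(rows, "customer_id", 4)),
        ("ROUND_ROBIN", round_robin_distribute(rows, 4)),
    ):
        for d in sorted(buckets):
            ids = sorted({r["customer_id"] for r in buckets[d]})
            print(name, "distribution", d, "rows:", len(buckets[d]), "customer_ids:", ids)
```

With HASH, every customer_id appears in exactly one distribution, which is what lets joins and aggregations on that key avoid moving data. With ROUND_ROBIN, the row counts are equal but each customer_id shows up in several distributions.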