Azure Data Terms 01 Flashcards
ACID
Atomicity, consistency, isolation, durability. A set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps.
AES
Advanced Encryption Standard. The encryption standard used for Transparent data encryption (TDE) in Microsoft SQL Server (all flavors).
BSON
Binary JSON. A binary-encoded serialization of JSON-like documents.
DMS
Azure Database Migration Service.
DPR
Supplier data protection requirements.
DQS
Data Quality Services. The data quality tool for Microsoft SQL Server.
DTU
Database transaction unit. On Microsoft Azure, a way of measuring computing resources for pricing, an alternative to serverless compute and provisioned compute pricing.
DWU
Data warehouse unit. In Azure Synapse Analytics, a measure based on CPU, memory, and I/O values.
EDA
Exploratory data analysis. An approach of analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. One of the best practices in MLOps.
GRS
Geo-redundant storage. A redundancy option in Azure Storage in which there are three copies of the data in the primary region using locally-redundant storage (LRS) and another three copies in a second region also using LRS. The other options are LRS, zone-redundant storage (ZRS), and geo-zone-redundant storage (GZRS) which is the same as GRS except that the data in the primary region uses ZRS.
ISV
Independent software vendor.
LRS
Locally-redundant storage. A redundancy option in Azure Storage in which there are three copies of the data within the same datacenter. The other options are zone-redundant storage (ZRS), geo-redundant storage (GRS), and geo-zone-redundant storage (GZRS).
MDS
Master Data Services. The MDM tool for Microsoft SQL Server.
MLOps
Machine Learning Operations. The Machine Learning analog of DevOps, as used in Azure Machine Learning (Azure ML).
MPP
Massively parallel processing. The coordinated processing of a program by multiple processors working on different parts of the program. Each processor has its own operating system and memory. MPP speeds the performance of huge databases that deal with massive amounts of data. Contrast with Symmetric multiprocessing (SMP).
MQL
MongoDB Query Language. The query langugage used for the MongoDB API in Azure Cosmos DB.
MVCC
Multi-version concurrency control. A feature of PostgreSQL.
ORC
Optimized Row Columnar. A highly efficient file format used for Apache Hive data.
RDP
Remote Desktop Protocol.
RU
Request unit. The unit of measure for pricing in Azure Cosmos DB.
SID
Security identifier. A unique value that is used to identify any security entity that the Windows operating system can authenticate. A security entity can be a security principal - a user account, a computer account or a process started by those accounts - or it can be a security group.
SLO
Service Level Objective. A standardized combination of measures like CPU, memory, and I/O used for pricing tiers.
SMP
Symmetric multiprocessing. A multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory, have full access to all input and output devices, and are controlled by a single operating system instance that treats all processors equally. Also called “shared-memory multiprocessing”. Contrast with massively parallel processing (MPP).
SQL-MI
Azure SQL Managed Instance.
SSPA
Supplier Security and Privacy Assurance. Microsoft’s program for ensuring that vendors adhere to data security and privacy standards.
TDE
Transparent data encryption. The feature in Microsoft SQL Server (all flavors) that encrypts data at rest.
TDS
Tabular Data Stream. An application layer protocol used to transfer data between a database server and a client.
TDSP
Team Data Science Process. An agile, iterative data science process.
Adjacency list
In graph theory and computer science, a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph.
Adjacency matrix
In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
Advanced Encryption Standard (AES)
The encryption standard used for Transparent data encryption (TDE) in Microsoft SQL Server (all flavors).
Apache Cassandra API
The API for column-family storage in Azure Cosmos DB.
Apache Gremlin API
The API for graph databases in Azure Cosmos DB.
Apache Hive
A distributed, fault-tolerant data warehouse system that enables analytics at a massive scale.
Atomicity
In ACID, the guarantee that each transaction is treated as a single unit that either succeeds completely or fails completely, even if it’s made up of multiple statements.
Atomicity, consistency, isolation, durability (ACID)
A set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps.
Batch layer
In the Lambda architecture, the layer that precomputes results using a distributed processing system that can handle very large quantities of data. The batch layer aims at perfect accuracy by being able to process all available data when generating views.
Binary JSON (BSON)
Binary JSON. A binary-encoded serialization of JSON-like documents.
Blocking transformation
In SQL Server Integration Services (SSIS), a data flow task that stalls the process, such as a large sort operation where there is not enough memory.
Card
In Microsoft Power BI, a report visual that focuses on displaying a single value.
Columnstore
Data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format. Contrast with rowstore.
Columnstore index
An index used for storing and querying large data warehouse fact and dimension tables.