Definitions Flashcards
* (asterisk)
Wildcard meaning “all columns” in a SELECT list. It is useful when you need to retrieve every column of a table at once.
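A minimal sketch of the wildcard in action. It uses Python's built-in sqlite3 module as a stand-in database, and the customers table is hypothetical. SELECT * returns every column, in the order the columns were defined:

```python
import sqlite3

# In-memory database with a small hypothetical table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers (name, city) VALUES ('Ana', 'Lisbon')")

# SELECT * returns every column, in table-definition order.
cur = conn.execute("SELECT * FROM customers")
columns = [d[0] for d in cur.description]  # first item of each description tuple is the column name
rows = cur.fetchall()
print(columns)  # ['id', 'name', 'city']
print(rows)     # [(1, 'Ana', 'Lisbon')]
conn.close()
```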
5 V’s of Big Data
Volume
Velocity
Variety
Veracity
Value
@ (at) symbol
Prefix for local variable and parameter identifiers in SQL Server (T-SQL), for example @OrderDate.
ABFS (Azure Blob Filesystem)
One of the primary ways to access data in Azure Data Lake Storage Gen2 is through the Hadoop FileSystem interface. Data Lake Storage Gen2 gives users of Azure Blob Storage access to a new driver, the Azure Blob File System driver (ABFS). ABFS is part of Apache Hadoop and is included in many commercial Hadoop distributions. With this driver, many applications and frameworks can access data in Azure Blob Storage without any code that explicitly references Data Lake Storage Gen2.
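Applications reach data through the ABFS driver by using URIs of the form abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>, where abfss is the TLS-secured variant. The sketch below builds and parses such URIs. The helper names and the container and account names are hypothetical, for illustration only:

```python
from urllib.parse import urlparse


def abfs_uri(file_system: str, account: str, path: str, secure: bool = True) -> str:
    """Build an ABFS URI: abfs[s]://<file_system>@<account>.dfs.core.windows.net/<path>."""
    scheme = "abfss" if secure else "abfs"  # abfss = ABFS over TLS
    return f"{scheme}://{file_system}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def parse_abfs_uri(uri: str) -> dict:
    """Split an ABFS URI back into file system (container), account, and path."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri}")
    file_system, _, host = parsed.netloc.partition("@")
    account = host.split(".", 1)[0]  # storage account is the first DNS label
    return {"file_system": file_system, "account": account, "path": parsed.path.lstrip("/")}


uri = abfs_uri("raw", "mydatalake", "sales/2024/orders.csv")
print(uri)  # abfss://raw@mydatalake.dfs.core.windows.net/sales/2024/orders.csv
print(parse_abfs_uri(uri))  # {'file_system': 'raw', 'account': 'mydatalake', 'path': 'sales/2024/orders.csv'}
```

In Spark, for example, a URI like this can be passed to a reader such as spark.read.csv(uri), provided the cluster has been configured with credentials for the storage account.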
ACID
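Set of properties that guarantee reliable transaction processing, a core OLTP concept:
- Atomicity: a transaction either completes entirely or has no effect at all.
- Consistency: a transaction moves the database from one valid state to another, respecting all defined constraints.
- Isolation: concurrent transactions do not see each other's intermediate results.
- Durability: once a transaction is committed, its changes survive system failures.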
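A minimal sketch of atomicity and consistency. It uses Python's built-in sqlite3 module as a stand-in for an OLTP database, and the accounts table and transfer function are hypothetical. The transfer credits one account and debits another inside a single transaction. If the debit violates the CHECK constraint, the whole transaction is rolled back, so the credit is undone as well:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move an amount between accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK constraint violated
        return False


ok = transfer(conn, "alice", "bob", 500)  # debit would make alice's balance negative
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(ok, balances)  # False {'alice': 100, 'bob': 50}
```

Bob's credit was executed first, yet it does not survive. The failed debit rolled back the entire unit of work, not just the statement that failed.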
AD (Active Directory)
With Azure Active Directory integrated authentication, the connection uses the same identity (SID) assigned to the user currently logged in to the client computer. This authentication method lets you define security boundaries by Active Directory group: you create a SQL login for the group instead of creating logins for users one by one. The Active Directory administrator can then grant or revoke access permissions simply by managing group membership.
ADF (Azure Data Factory)
PaaS data movement and orchestration engine that shines in cloud and hybrid scenarios. It has a handy web UI for developing your pipelines, strong integration with Azure DevOps, a rich set of REST APIs to interact with, and a prebuilt monitoring dashboard that lets you keep track of execution outcomes and resource consumption. You can also monitor activities through the Azure Monitor service.
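Interacting with ADF through its REST API can look like the following sketch, which triggers a pipeline run. It targets the Azure Resource Manager endpoint's "Pipelines - Create Run" operation with api-version 2018-06-01. The request is only prepared, not sent. The subscription ID, resource group, factory, pipeline, parameter, and token values are all hypothetical placeholders, and a real call needs an Azure AD access token obtained separately:

```python
import json
import urllib.request
from urllib.parse import quote

API_VERSION = "2018-06-01"  # assumed Data Factory REST API version


def build_create_run_request(subscription_id, resource_group, factory, pipeline, parameters, token):
    """Prepare (but do not send) a POST that starts a pipeline run."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{quote(factory)}"
        f"/pipelines/{quote(pipeline)}/createRun"
        f"?api-version={API_VERSION}"
    )
    body = json.dumps(parameters).encode("utf-8")  # pipeline parameters as a JSON body
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values for illustration only.
req = build_create_run_request(
    "00000000-0000-0000-0000-000000000000", "rg-data", "adf-demo",
    "CopySales", {"runDate": "2024-01-31"}, token="<access-token>",
)
print(req.get_method(), req.full_url)
```

Sending it with urllib.request.urlopen(req) and a valid token should return a JSON body containing the runId of the new pipeline run, which you can then track in the monitoring dashboard.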
AES (Advanced Encryption Standard)
Symmetric encryption algorithm: the same key is used to encrypt and decrypt protected data. The service uses an automatically generated certificate that is rotated as needed, so there is nothing for you to manage.
ALTER
Statement used to add, delete, or modify columns in an existing table. It changes part of an object's definition, but not every change is permitted. For example, you can add a column to a table, but you cannot change an existing column from a string type to a numeric type if the data contains non-numeric characters.
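A minimal sketch of a permitted change. It uses Python's built-in sqlite3 module as a stand-in database, and the products table is hypothetical. Adding a column alters part of the table definition, and existing rows pick up the column's default value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products (name) VALUES ('widget')")

# ALTER TABLE ... ADD COLUMN changes part of the table's definition;
# existing rows receive the new column's default value.
conn.execute("ALTER TABLE products ADD COLUMN price REAL DEFAULT 0.0")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(products)")]
rows = conn.execute("SELECT * FROM products").fetchall()
print(columns)  # ['id', 'name', 'price']
print(rows)     # [(1, 'widget', 0.0)]
```

SQLite's ALTER TABLE supports fewer changes than SQL Server's. It has no ALTER COLUMN for changing a column's type, which is another illustration that not every change to an object's definition is permitted.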
Always On availability groups
SQL Server solution for high availability and disaster recovery. It relies on Windows Server Failover Clustering (WSFC) and maintains database replicas across the members of the cluster. Replicas can use asynchronous commit when they must span long distances, but synchronous commit is the usual choice.
Apache Hive
Data warehouse system for Apache Hadoop. Hive enables summarization, querying, and analysis of data.
Apache Kafka
Open-source distributed streaming platform that can be used to build real-time streaming data pipelines and applications.
Apache Oozie
Workflow and coordination system that manages Hadoop jobs. Oozie is integrated with the Hadoop stack, and it supports the following jobs:
- Apache Hadoop MapReduce
- Apache Pig
- Apache Hive
- Apache Sqoop
You can also use Oozie to schedule jobs that are specific to a system, like Java programs or shell scripts.
Apache Spark
Unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R, as well as an optimized engine that supports general execution graphs. See also Azure Databricks.
ARM (Azure Resource Manager)
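Deployment and management service for Azure, used to perform administrative tasks. When you work in the Azure portal, the portal sends the actions and parameters you choose to ARM, which carries out the work. The Azure CLI, Azure PowerShell, SDKs, and REST API calls go through ARM in the same way.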