Azure Data Platforms Flashcards
What is TDE in Azure security?
Transparent Data Encryption
How does Azure SQL protect data in flight?
With the Always Encrypted feature, which keeps sensitive data encrypted at rest, during query processing and on the network.
What is Azure SQL dynamic data masking?
It limits the exposure of sensitive data to non-privileged users by masking it in query results. It works transparently, without changing the application.
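To make the idea concrete, here is a minimal Python sketch of masking. It is only an illustration: `mask_email`, `read_row` and the `privileged` flag are hypothetical names, and Azure SQL applies its masking rules inside the database engine rather than in application code like this.

```python
def mask_email(email: str) -> str:
    """Keep the first character and the top-level domain, hide the rest."""
    local, _, domain = email.partition("@")
    suffix = domain.rsplit(".", 1)[-1] if "." in domain else ""
    return f"{local[:1]}XXX@XXXX.{suffix}"


def read_row(row: dict, privileged: bool) -> dict:
    """Return the row unchanged for privileged users, masked otherwise."""
    result = dict(row)  # copy, so the stored data is never modified
    if not privileged:
        result["email"] = mask_email(row["email"])
    return result


row = {"id": 1, "email": "alice@contoso.com"}
print(read_row(row, privileged=False))
print(read_row(row, privileged=True))
```

The stored value never changes; only the result handed to the non-privileged caller is masked.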
How are Azure SQL users and groups managed?
With Azure Active Directory
Down to what level does Azure SQL security go?
To the row level
What is row-level security?
Row-level security means users can access only the rows they are authorized to see.
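A small Python sketch of the filtering idea, under stated assumptions: the `Order` type, the sample data and the `visible_rows` predicate are all hypothetical, and in Azure SQL the equivalent predicate is enforced by the database itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    sales_rep: str
    amount: float


ORDERS = [
    Order(1, "ann", 120.0),
    Order(2, "bob", 75.5),
    Order(3, "ann", 40.0),
]


def visible_rows(rows, user, is_manager=False):
    """Security predicate: managers see every row, reps only their own."""
    return [r for r in rows if is_manager or r.sales_rep == user]


print([o.order_id for o in visible_rows(ORDERS, "ann")])
print([o.order_id for o in visible_rows(ORDERS, "bob")])
```

Every query goes through the same predicate, so the application cannot forget to apply the filter.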
Does Azure SQL use whitelisting?
Yes, you can whitelist IP addresses to grant them access.
What are elastic pools?
Elastic pools host multiple databases on the same logical SQL server and let them share one pool of resources. Each pool has dedicated DTUs assigned to it, known as eDTUs.
How often does Azure SQL perform backups?
It performs a full backup weekly, differential backups hourly and transaction log backups every five minutes.
How are backups stored?
The backups are stored in geo-redundant storage, which keeps multiple copies across geographies. In case of a disaster, Azure support can help retrieve these backups and restore them.
What additional disaster recovery options are there?
Azure SQL also provides geo-replication of databases, with up to four readable secondary databases in different regions. The primary and secondary databases are continuously synchronised using asynchronous replication. It can also fail over automatically to a database in another region, and it supports point-in-time restore of databases.
How is performance measured in Azure SQL?
In DTUs
What are DTUs?
DTUs are Database Transaction Units: a blended measure of compute, memory and storage (I/O) resources.
What language does Azure SQL use?
T-SQL
What are the two components of Azure SQL?
The logical SQL server
The databases
What are the levels of performance provided by Azure SQL?
Basic, Standard, Premium and PremiumRS
What is an SLA?
Service Level Agreement
What do you need to tie SLAs to for disaster recovery?
The company's RPOs and RTOs
What is RPO?
Recovery Point Objective
What is RTO?
Recovery Time Objective
Explain RTO and RPO
The RTO, or recovery time objective, is the maximum length of time after an outage that your company is willing to wait for the recovery process to finish.
The RPO, or recovery point objective, is the maximum amount of data loss your company is willing to accept as measured in time.
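A short worked check of both objectives in Python. The five-minute log backup interval comes from the backup card earlier in this deck; the 15-minute RPO, the two-hour RTO and the 45-minute restore estimate are made-up figures, and `meets_rpo` / `meets_rto` are hypothetical helper names.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is the time elapsed since the last backup."""
    return backup_interval <= rpo


def meets_rto(estimated_restore: timedelta, rto: timedelta) -> bool:
    """The restore has to finish within the tolerated downtime."""
    return estimated_restore <= rto


log_backup_interval = timedelta(minutes=5)  # from the backup schedule card
rpo = timedelta(minutes=15)                 # hypothetical business target
rto = timedelta(hours=2)                    # hypothetical business target
estimated_restore = timedelta(minutes=45)   # hypothetical estimate

print(meets_rpo(log_backup_interval, rpo))
print(meets_rto(estimated_restore, rto))
```

If either check fails, the SLA cannot be promised with the current backup and restore setup.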
Is high availability for Azure SQL the same as for SQL DW?
Yes
What are an elastic pool's dedicated DTUs called?
eDTUs
What is Azure SQL?
It's a fully managed instance of SQL Server in the Azure cloud.
Is sharding a feature of Azure SQL?
Yes
What is Azure Stream Analytics?
Stream Analytics is a fully managed data and event processing engine that provides Extract-Transform-Load (ETL) functionality for real-time processing and analytics on streamed data.
What are the inputs for Stream Analytics?
Azure Blob storage, IoT Hub and Event Hubs
What types of transformation are done in Stream Analytics?
Filtering, augmentation and enrichment. Its SQL-like query language also supports grouping, aggregation and joining.
What is the core component of Stream Analytics?
The Stream Analytics engine
What does the Stream Analytics engine do?
It is where Stream Analytics jobs are executed and where every component interacts to ETL the data. The engine keeps running and continuously executes jobs over time-series windows, so that real-time insights can be gathered.
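One common kind of time window is the tumbling window: fixed-size, non-overlapping intervals. The Python sketch below counts events per window; it only illustrates the windowing idea, and the function name and sample events are hypothetical rather than anything from Stream Analytics itself.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window.

    `events` is an iterable of (timestamp_seconds, payload) pairs.
    Returns a dict mapping each window's start time to its event count.
    """
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Integer division snaps each timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


events = [(1, "a"), (4, "b"), (9, "c"), (10, "d"), (17, "e"), (21, "f")]
print(tumbling_window_counts(events, 10))  # {0: 3, 10: 2, 20: 1}
```

Each event belongs to exactly one window, which is what distinguishes a tumbling window from overlapping window types.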
What can you do with the Stream Analytics stream?
Send the data to durable storage such as Azure Blob storage, present it using visualisation tools such as Power BI, or use it to send notifications.
Name some Stream Analytics destinations
Azure Blob storage, Azure Data Lake, Power BI, Event Hubs (for another stream), SQL DB and SQL DW
Can Stream Analytics partition?
Yes, on all outputs except Power BI
What is data egress and why does it matter?
It is data going out of Azure to external destinations. It matters because Azure charges for it.
How is data egress handled for Stream Analytics?
If the data does not leave the region, the egress cost does not apply.
What is the Stream Analytics job topology?
The inputs, the query and the outputs
Can a Stream Analytics job combine multiple inputs?
Yes
Can a Stream Analytics job write to multiple outputs?
Yes
Name some tools that can access Azure Data Analytics
HDInsight Hadoop, Azure Machine Learning, Stream Analytics, Data Factory, Event Hubs
Where is Azure Data Lake Available?
East and Central US and North Europe
What type of storage is Azure Data Lake?
Azure Data Lake is hierarchical, file-based storage, like a Windows, Linux or Hadoop file system, optimized for efficient querying and storage of big data workloads.
Does Azure Data Lake have high availability?
Yes, it is replicated three times in different data centers.
Does Azure Data Lake have security?
Azure Data Lake has role-based access control, access control lists and whitelisting.
Can the data in Azure Data Lake be encrypted?
Yes. It can be encrypted at rest and is decrypted transparently on retrieval.
What is Azure Key Vault?
A vault where encryption keys are held for services that store encrypted data.
What is Azure Data Lake Analytics?
It's a service that lets you run analytics jobs against the data in the lake.
What language is used for analytics jobs in Azure Data Lake Analytics?
U-SQL
What is U-SQL?
A language that combines SQL and C# to form an extensible query language.
What data types are used in U-SQL?
C# data types
What can U-SQL be used to query?
Azure SQL, IaaS SQL Server and Azure SQL Data Warehouse
How is Azure Data Lake Analytics charged?
Pay as you go; you are charged only while a job is running.
Does Azure Data Lake Analytics scale?
Yes, it scales as needed while a job is running.
What processing is used in SQL DW?
MPP
What is MPP?
Massively parallel Processing
What is SMP?
Symmetric Multi-Processing
What is the architecture of SQL DW composed of?
The control node and the compute nodes
What is a control node?
The control node is the main brain, responsible for interactions with users and applications. It coordinates the compute nodes.
What is a compute node?
The compute nodes are the workers; they store and retrieve data.
What technology does SQL DW use for querying?
PolyBase
What is PolyBase?
PolyBase is a technology for querying non-relational data stores such as Blob storage and data lakes; it can read flat files.
What is the underlying storage for SQL DW?
The underlying storage for SQL DW is Blob storage.
What is DMS and what does it do in SQL DW?
The Data Movement Service. It works across all nodes, combining and aggregating data and then returning unified results to the requester.
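The division of labour between control node, compute nodes and the combining step can be sketched as a scatter-gather in Python. This is a simplified illustration under stated assumptions, not how SQL DW is implemented: the function names are hypothetical and each inner list stands in for the rows held by one compute node.

```python
def compute_node_partial(rows):
    """Work done on one compute node: a partial sum and a row count."""
    return sum(rows), len(rows)


def control_node_average(rows_per_node):
    """Scatter the work to every node, then combine the partial results."""
    partials = [compute_node_partial(rows) for rows in rows_per_node]
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(row_count for _, row_count in partials)
    return total / count if count else None


rows_per_node = [[10, 20], [30], [40, 50, 60]]
print(control_node_average(rows_per_node))  # 35.0
```

The nodes return sums and counts rather than local averages because averaging the per-node averages would give the wrong answer when nodes hold different numbers of rows.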
How is performance defined in SQL DW?
In DWUs (Data Warehouse Units)
What are the distribution types for SQL DW?
Hash and round-robin distribution
What is hash distribution?
Rows are assigned to distributions based on the hash of a key column.
What is round-robin distribution?
The default distribution, in which records are spread evenly across nodes.
When should you use hash distribution?
Use hash distribution only if you have a column whose values repeat often and that has more than 60 unique values.
How many distributions are there in SQL DW?
60
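The Python sketch below contrasts the two strategies over 60 distributions and shows why a hash column with few unique values is a poor choice. It is an illustration only: `zlib.crc32` is a stand-in for the service's internal hash function, and the function names and sample keys are hypothetical.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # the number of distributions given in the deck


def hash_distribution(key: str) -> int:
    """Same key always lands in the same distribution (crc32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS


def round_robin_distribution(row_index: int) -> int:
    """Rows are dealt out in turn, regardless of their content."""
    return row_index % NUM_DISTRIBUTIONS


# 600 rows but only 3 distinct key values: far fewer than 60.
keys = [f"customer-{i % 3}" for i in range(600)]
hash_counts = Counter(hash_distribution(k) for k in keys)
rr_counts = Counter(round_robin_distribution(i) for i in range(len(keys)))

print(len(hash_counts), "distributions used by hash")
print(len(rr_counts), "distributions used by round robin")
```

With only three distinct keys, hashing can fill at most three distributions and leaves the rest empty, while round robin spreads the same rows over all 60.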
What are the table types in SQL DW?
Columnstore, heap and clustered B-tree
What is Spark?
Spark is a lightning-fast unified analytics engine for big data and machine learning and an open-source cluster computing framework for real-time processing.
What is lazy evaluation?
It's where nothing executes in Spark until an action is called.
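Python generators behave in a similar way, so they make a handy analogy. The sketch below is plain Python, not Spark code: building the pipeline does no work, and the source is only read once the final `list(...)` call (playing the role of the action) consumes it. The `log` list and `numbers` source are hypothetical.

```python
log = []


def numbers():
    """A data source that records every element it actually produces."""
    for n in range(1, 6):
        log.append(f"read {n}")
        yield n


# "Transformations": the pipeline is only described here, nothing runs yet.
squared = (n * n for n in numbers())
evens = (n for n in squared if n % 2 == 0)
reads_before_action = len(log)

# "Action": consuming the pipeline triggers all the work at once.
result = list(evens)

print(reads_before_action)  # 0
print(result)               # [4, 16]
print(len(log))             # 5
```

Deferring the work this way lets the engine see the whole chain of steps before running any of them.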
What is an RDD in Spark?
A Resilient Distributed Dataset: a collection of data held in memory and distributed across many nodes, against which you can run transformations and actions.
What is a shuffle operation?
It's when Spark has to move data around the cluster to produce the results.
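A plain-Python sketch of why a shuffle is needed for a per-key aggregation such as a word count. It is not Spark code: the lists stand in for partitions on different nodes, and `partition_for` is a hypothetical partitioner chosen only because it is deterministic.

```python
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Hypothetical partitioner: decides which partition owns a key."""
    return sum(key.encode("utf-8")) % num_partitions


def shuffle(partitions, num_partitions):
    """Move every (key, value) pair to the partition that owns its key."""
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    for partition in partitions:
        for key, value in partition:
            shuffled[partition_for(key, num_partitions)][key].append(value)
    return shuffled


def reduce_by_key(partitions, num_partitions=2):
    """After the shuffle, each key can be summed locally on one partition."""
    return [
        {key: sum(values) for key, values in partition.items()}
        for partition in shuffle(partitions, num_partitions)
    ]


input_partitions = [
    [("a", 1), ("b", 1)],
    [("a", 1), ("c", 1)],
    [("b", 1), ("a", 1)],
]
result = reduce_by_key(input_partitions)
print(result)
```

Before the shuffle the pairs for "a" sit in three different partitions; afterwards they are all in one place, which is what makes the final sum possible and also what makes shuffles expensive.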
What is a stage in Spark?
A stage is like an execution plan container for tasks.
How do you create an RDD?
Use the Spark context (sc) on a text file or other stored data.
What operations does RDD support?
transformations and actions
What does a transformation do?
It creates a new RDD from an existing one using functions such as map, filter and reduceByKey. The new RDD holds the transformed data you want to work on; the old data is left unchanged.
What does an action do?
It's when Spark actually executes, and results are returned to the driver program or written to storage.
What is the driver program?
It's the code that drives the execution in Spark; it runs on the master node.
What is Spark Core and what does it do?
It's the base engine. It handles memory management and fault recovery; scheduling, distributing and monitoring jobs; and interacting with storage systems.
What's included in the Spark ecosystem?
Spark Core, Spark Streaming, MLlib, Spark SQL, GraphX
How does Spark Streaming work?
The input stream is divided into small batches, each represented as an RDD, and these are processed in near real time.
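The batching idea can be sketched in plain Python. This is a simplification under stated assumptions: Spark Streaming cuts batches by time interval, whereas this hypothetical `micro_batches` helper cuts them by element count to keep the example deterministic.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Split a (possibly endless) stream into small lists, one per batch."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Each batch is processed as its own small job as soon as it is complete.
for batch in micro_batches(range(7), batch_size=3):
    print(batch, "-> sum", sum(batch))
```

Because the stream is consumed lazily, the same helper works on an unbounded source.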
What is the Azure Data Factory Copy Data tool?
The Azure Data Factory Copy Data tool eases and optimizes the process of ingesting data into a data lake.
What is the cheapest Azure storage if you are not going to query the data?
Azure Blob storage
What is Azure Blob Storage?
A scalable object store for text and binary data
How can you ingest data into Azure Blob storage?
Use Azure Data Factory, Storage Explorer, the AzCopy tool, PowerShell or Visual Studio.
If you use the File Upload feature to import file sizes above 2 GB, use PowerShell or Visual Studio. AzCopy supports a maximum file size of 1 TB and automatically splits data files that exceed 200 GB.
What are the types of blobs?
Block blobs, append blobs and page blobs
When do you use page blobs?
For VHDs, i.e. VM disks
How are blobs secured?
Azure Storage encrypts all data that’s written to it. You secure access to the data by using account keys or shared access signatures.
What is the Blob storage permission model?
Role-based access control (RBAC). Use it to set permissions and assign roles to users, groups or applications.
What are the container access categories? Explain each.
Private - access only for the owner
Public Blob - read access to the blobs only, not the container
Public Container - full read access to the blobs and the container
What is a shared access signature (SAS)?
A URI (Uniform Resource Identifier) that grants restricted access to your blobs to those you don’t want to give the management keys to.
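The general mechanism behind a signed, restricted token can be sketched with an HMAC in Python. This is not Azure's actual SAS format or API: the field names, the `make_token` and `is_allowed` helpers and the demo key are all hypothetical, and the sketch only shows how a signature lets the service trust the permissions and expiry without handing out the key.

```python
import base64
import hashlib
import hmac


def _sign(account_key: bytes, resource: str, permissions: str, expiry: int) -> str:
    """Sign the restricted fields with the secret account key."""
    message = f"{resource}\n{permissions}\n{expiry}".encode("utf-8")
    digest = hmac.new(account_key, message, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


def make_token(account_key: bytes, resource: str, permissions: str, expiry: int) -> dict:
    """Issued by the key holder; the token itself never contains the key."""
    return {
        "resource": resource,
        "permissions": permissions,
        "expiry": expiry,
        "sig": _sign(account_key, resource, permissions, expiry),
    }


def is_allowed(account_key: bytes, token: dict, permission: str, now: int) -> bool:
    """Service-side check: signature intact, not expired, permission granted."""
    expected = _sign(account_key, token["resource"], token["permissions"], token["expiry"])
    return (
        hmac.compare_digest(expected, token["sig"])
        and now < token["expiry"]
        and permission in token["permissions"]
    )


KEY = b"demo-account-key"  # hypothetical secret, for illustration only
token = make_token(KEY, "/container/report.csv", "r", expiry=1_000)
print(is_allowed(KEY, token, "r", now=500))
print(is_allowed(KEY, token, "w", now=500))
```

Editing the permissions or expiry in the token invalidates the signature, so the holder cannot widen their own access.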
What is a container for Blob Storage?
A Container is like a folder
What are the redundancy options for a storage account?
Locally redundant storage (LRS) - data is replicated within the same data center.
Zone-redundant storage (ZRS) - data is replicated to other data centers (availability zones) in the same region.
Geo-redundant storage (GRS) - data is replicated to a second region, away from the original data center.
Read-access geo-redundant storage (RA-GRS) - data is replicated to a second region, and you can read (but not write) from the replica.
What are queues?
They are storage for messages that can be accessed from anywhere in the world.
What is File storage?
File shares in Azure that you can mount as a directory from your computer.
What is Table storage?
Azure Table storage stores large amounts of structured data without a fixed schema, so the structure can change as needed.
What is a resource group?
It's like a container that lets you group related assets together logically. It allows you to manage them together and delete them all at once.
It's like a namespace in .NET or a Sequence container in SSIS.
Which APIs can be used with Cosmos DB?
SQL, Cassandra, MongoDB, Gremlin, Azure Table storage
What makes up the structure of Cosmos DB?
Database account, databases, containers, items
What types of partitions exist in Cosmos DB?
Logical and physical
Which partition can you design in Cosmos DB?
Logical; physical partitions are managed by Cosmos DB
What is a container in Cosmos DB?
A collection, a table or a graph, depending on the API used
What are items in a Cosmos DB container?
Items are documents, rows, nodes or edges
Outside of items, what else can be in a Cosmos DB container?
Stored procedures, user-defined functions, triggers, conflicts, merge procedures
How is the cost of Cosmos DB calculated?
In Request Units (RUs)
Steps to add data to Cosmos DB
- Create CosmosDB Account
- Create Database
- Create Container
- Add Items
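The four steps above mirror the account, database, container, item hierarchy. The Python sketch below models that nesting with plain dictionaries so the order of the steps is easy to see. It does not use the Cosmos DB SDK: every function name, the account name and the `customerId` partition key are hypothetical.

```python
def create_account(name):
    """Step 1: the account is the top-level holder of databases."""
    return {"name": name, "databases": {}}


def create_database(account, name):
    """Step 2: a database lives inside an account."""
    return account["databases"].setdefault(name, {"containers": {}})


def create_container(database, name, partition_key):
    """Step 3: a container lives inside a database and names its partition key."""
    return database["containers"].setdefault(
        name, {"partition_key": partition_key, "items": []}
    )


def add_item(container, item):
    """Step 4: items go into a container and must carry the partition key."""
    key = container["partition_key"]
    if key not in item:
        raise ValueError(f"item is missing partition key field {key!r}")
    container["items"].append(item)
    return item


account = create_account("demo-account")
database = create_database(account, "shop")
container = create_container(database, "orders", partition_key="customerId")
add_item(container, {"id": "1", "customerId": "c-42", "total": 19.99})
print(account)
```

Each step needs the object created by the previous one, which is why the steps cannot be reordered.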
When should you use Cosmos DB?
When you need a database for an application that handles massive amounts of varied data, writes at global scale and needs near-real-time response times.