Azure Data Platforms Flashcards


1
Q

What is TDE in Azure security?

A

Transparent Data Encryption

2
Q

How does Azure SQL protect data in flight?

A

With the Always Encrypted feature, which keeps data encrypted at rest in storage, during query processing, and in transit over the network.

3
Q

What is Azure SQL dynamic data masking?

A

It limits the exposure of sensitive data to non-privileged users by masking it in query results. It works transparently, without changes to the application.
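A minimal sketch of turning masking on for one column, assuming a Python client with pyodbc; the server, database, credentials, table, and column names are all placeholders, and the masking rule itself is the T-SQL statement in the string:

```python
import pyodbc  # assumes the Microsoft ODBC Driver for SQL Server is installed

# Server, database, credentials, table, and column names are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=sqladmin;PWD=<password>",
    autocommit=True,
)

# Mask the Email column so non-privileged users see a masked value
# (e.g. aXX@XXXX.com) in query results; privileged users still see the data.
conn.cursor().execute(
    "ALTER TABLE dbo.Customers "
    "ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');"
)
```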

4
Q

How are Azure SQL users and groups managed?

A

With Azure Active Directory.

5
Q

To what level does Azure SQL security go down?

A

To the Row level

6
Q

What is Row level Security?

A

Row-level security means users can access only those rows that they are authorized to see.
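A minimal sketch of how this might be set up, assuming a pyodbc connection and a hypothetical dbo.Sales table with a SalesRep column; the policy filters rows to the current database user:

```python
import pyodbc  # assumes the Microsoft ODBC Driver for SQL Server is installed

# Connection details, table, and column names are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=sqladmin;PWD=<password>",
    autocommit=True,
)
cursor = conn.cursor()

# Predicate function: a row is visible only when its SalesRep value
# matches the database user running the query.
cursor.execute("""
CREATE FUNCTION dbo.fn_sales_rep_filter(@SalesRep AS nvarchar(128))
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN SELECT 1 AS allowed WHERE @SalesRep = USER_NAME();
""")

# Security policy that applies the predicate to every query on dbo.Sales.
cursor.execute("""
CREATE SECURITY POLICY dbo.SalesFilter
ADD FILTER PREDICATE dbo.fn_sales_rep_filter(SalesRep) ON dbo.Sales
WITH (STATE = ON);
""")
```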

7
Q

Does Azure SQL use whitelisting?

A

Yes. You can whitelist IP addresses so that only those addresses are allowed to connect.

8
Q

What are Elastic Pools?

A

Azure elastic pools let you host multiple databases on the same logical SQL server and place them in a shared pool. Each pool has dedicated DTUs assigned to it, known as eDTUs.

9
Q

How often does Azure SQL perform backups?

A

It performs a full backup every week, differential backups every 12 hours, and transaction log backups every 5 to 10 minutes.

10
Q

How are backups stored?

A

Backups are stored in geo-redundant storage, which keeps multiple copies across geographies. In the event of a disaster, Azure support can help retrieve these backups and restore them.

11
Q

What additional Disaster recovery options are there?

A

Azure SQL also provides geo-replication with up to four readable secondary databases in different regions. The primary and secondary databases are continuously synchronised using asynchronous replication, and auto-failover to a database in another region is supported.
Azure SQL provides point-in-time recovery and restore as well.

12
Q

How is performance Calculated in Azure SQL?

A

In DTUs

13
Q

What are DTUs?

A

DTUs are Database Transaction Units: a blended measure of compute, memory, and storage (I/O) resources calculated together.

14
Q

What language does Azure SQL use?

A

T-SQL

15
Q

What are the Two components of Azure Sql?

A

The logical SQL server

The Databases

16
Q

What are the performance levels provided by Azure SQL?

A

Basic, Standard, Premium and PremiumRS

17
Q

What is an SLA?

A

Service Level Agreement

18
Q

What do you need to tie SLAs to for Disaster Recovery?

A

The company's RPOs and RTOs

19
Q

What is RPO?

A

Recovery Point Objective

20
Q

What is RTO?

A

Recovery Time Objective

21
Q

Explain RTO and RPO

A

The RTO, or recovery time objective, is the maximum length of time after an outage that your company is willing to wait for the recovery process to finish.

The RPO, or recovery point objective, is the maximum amount of data loss your company is willing to accept as measured in time.

22
Q

Is high availability for Azure SQL the same as for SQL DW?

A

Yes

23
Q

What are an elastic pool's dedicated DTUs called?

A

eDTUs

24
Q

What is Azure SQL?

A

It's a fully managed version of SQL Server running in the Azure cloud.

25
Q

Is Sharding a Feature of Azure SQL ?

A

Yes

26
Q

What is Azure Stream Analytics?

A

Stream Analytics is a fully managed data and event processing engine that provides Extract-Transform-Load (ETL) functionality and real-time analytics on streaming data.

27
Q

What are the inputs for Stream Analytics?

A

Azure Blob Storage, IoT Hub, and Event Hubs

28
Q

What types of transformations are done in Stream Analytics?

A

Filtering, augmentation, and enrichment. It has a SQL-like query language that supports grouping, aggregation, and joining.
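As an illustration of that query language, here is a hypothetical job query kept in a Python string; the input and output aliases, column names, and the 30-second window are all made up for the example. It filters out null readings, then groups and averages the stream per device.

```python
# Hypothetical Stream Analytics job query (its SQL-like language): filters,
# then aggregates temperature per device over 30-second tumbling windows.
STREAM_ANALYTICS_QUERY = """
SELECT
    DeviceId,
    AVG(Temperature) AS AvgTemperature
INTO
    [powerbi-output]
FROM
    [iothub-input] TIMESTAMP BY EventTime
WHERE
    Temperature IS NOT NULL
GROUP BY
    DeviceId,
    TumblingWindow(second, 30)
"""
```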

29
Q

What is the core component of stream analytics?

A

The Stream analytics engine

30
Q

What does the Stream Analytics engine do?

A

It is where Stream Analytics jobs are executed and where the components interact to ETL the data. The engine runs continuously, executing jobs over time-series windows so that real-time insights can be gathered.

31
Q

What can you do with the Stream Analytics output stream?

A

Send data to durable storage such as Azure Blob Storage, present it using visualisation tools such as Power BI, or use it to send notifications.

32
Q

Name some Stream Analytics destinations.

A
Azure Blob Storage, Azure Data Lake, Power BI, Event Hubs (to feed another stream), SQL DB, and SQL DW
33
Q

Can Stream Analytics Partition?

A

Yes, on all outputs except Power BI.

34
Q

What is Data Egress and Why does it matter?

A

It is data going out of Azure to external destinations, and Azure charges for it.

35
Q

How is Data Egress handled for Stream Analytics?

A

If the data does not go out of the region, the charge does not apply.

36
Q

What is the Stream Analytics job topology?

A

The Inputs, the Query and the Outputs

37
Q

Can Stream Analytics combine multiple inputs?

A

Yes

38
Q

Can Stream Analytics write to multiple outputs?

A

Yes

39
Q

Name some tools that can be used for data analytics in Azure

A

HDInsight (Hadoop), Azure Machine Learning, Stream Analytics, Data Factory, and Event Hubs

40
Q

Where is Azure Data Lake Available?

A

East and Central US and North Europe

41
Q

What type of storage is Azure Data Lake?

A

Azure Data Lake is hierarchical, file-based storage, just like a Windows, Linux, or Hadoop file system, that has been optimized for efficient querying and storage of big data workloads.

42
Q

Does Azure Data Lake have High Availability ?

A

Yes, it is replicated three times across different data centers.

43
Q

Does Azure Data Lake have Security?

A

Yes. Azure Data Lake has role-based access control, access control lists, and IP whitelisting.

44
Q

Can the data in Azure Data Lake be Encrypted?

A

Yes. It can be encrypted at rest and decrypted transparently on retrieval.

45
Q

What is Azure Key vault?

A

A vault where encryption keys are held for services that store encrypted data.

46
Q

What is Azure Data Lake Analytics?

A

It's a service that lets you run analytics jobs against the data in the lake.

47
Q

What language is used for analytics jobs in Azure Data Lake Analytics?

A

U-SQL

48
Q

What is U-SQL?

A

It is a language that combines SQL and C# to form an extensible query language.

49
Q

What data types are used in U-SQL?

A

C# data types

50
Q

What can U-SQL be used to query?

A

Azure SQL, IaaS SQL Server and Azure SQL data warehouse

51
Q

How is Azure Data Lake Analytics charged?

A

Pay as you go, charged only when a job is running

52
Q

Does Azure Data Lake Analytics scale?

A

Yes, it scales as needed while a job is running.

53
Q

What processing is used in SQL DW?

A

MPP

54
Q

What is MPP?

A

Massively parallel Processing

55
Q

What is SMP?

A

Symmetric Multi-Processing

56
Q

What does the architecture of SQL DW consist of?

A

The control node and the compute nodes

57
Q

What is a control node?

A

The control node is the brain of the system, responsible for interactions with users and applications. It coordinates with the compute nodes.

58
Q

What is a compute node?

A

The compute nodes are the workers; they store and retrieve data.

59
Q

What is the technology used by Sql DW for Querying?

A

Polybase

60
Q

What is PolyBase?

A

PolyBase is a technology used to query non-relational data stores, such as Blob Storage and data lakes; it can read flat files.

61
Q

What is the underlying storage for SQL DW?

A

The underlying storage for SQL DW is Blob Storage.

62
Q

What is DMS and what does it do in Sql DW?

A

The Data Movement Service. It works across all nodes, combining and aggregating data and then returning a unified result set to the requester.

63
Q

How is performance defined in SQL DW?

A

In DWUs (Data Warehouse Units).

64
Q

What are the distribution types for SQL DW?

A

Hash and Round Robin Distribution

65
Q

What is the Hash Distribution?

A

The distribution that assigns each row to a distribution based on a hash of a chosen key column.
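A sketch of what choosing a hash key looks like in practice, assuming a pyodbc connection to the warehouse and a hypothetical fact table; the distribution choice is the WITH clause on the CREATE TABLE statement:

```python
import pyodbc  # assumes the Microsoft ODBC Driver for SQL Server is installed

# Server, database, and table names are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=mydw.database.windows.net;DATABASE=mydw;"
    "UID=sqladmin;PWD=<password>",
    autocommit=True,
)

# Rows with the same CustomerKey hash to the same distribution, which helps
# joins and aggregations on that column avoid data movement.
conn.cursor().execute("""
CREATE TABLE dbo.FactSales
(
    SaleId      bigint NOT NULL,
    CustomerKey int    NOT NULL,
    Amount      decimal(18, 2)
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);
""")
```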

66
Q

What is Round Robin Distribution ?

A

The default distribution, where records are distributed evenly (round robin) across the distributions.

67
Q

When should you use hash distribution?

A

Use hash distribution only if you have a column that is used often (for example in joins and aggregations) and has more than 60 unique values.

68
Q

How many distributions are there in SQL DW?

A

60

69
Q

What are the Table Types in Sql DW?

A

Clustered columnstore, heap, and clustered B-tree

70
Q

What is Spark?

A

Spark is a lightning-fast unified analytics engine for big data and machine learning and an open-source cluster computing framework for real-time processing.

71
Q

What is lazy evaluation?

A

It's where nothing is executed in Spark until an action is called.

72
Q

What is a RDD in Spark?

A

A Resilient Distributed Dataset: a collection of data stored in memory and distributed across many nodes, against which you can run transformations and actions.

73
Q

What is a shuffle operation?

A

It's when Spark has to move data around the cluster to produce the results.

74
Q

What is a stage in Spark?

A

A stage is like an execution-plan container for tasks; a new stage begins at each shuffle boundary.

75
Q

How do you create an RDD?

A

Use the Spark context (sc) on a text file or on some data already stored in memory.
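A minimal PySpark sketch of both routes (the file path and sample data are placeholders):

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "RDDCreationSketch")

# From a file: one record per line (the path is a placeholder).
lines = sc.textFile("/tmp/sample-data.txt")

# From an in-memory collection already in the driver program.
numbers = sc.parallelize([1, 2, 3, 4, 5])

print(numbers.count())  # 5 -- count() is an action, so this triggers execution
```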

76
Q

What operations does RDD support?

A

transformations and actions

77
Q

What does a transformation do?

A

It creates a new RDD from an existing one, using functions such as map, filter, and reduceByKey. The new RDD contains the transformed data you want to work with, derived from the old data.

78
Q

What does an action do?

A

An action is when Spark actually executes the work; results are returned to the driver program or written out to storage.
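A short PySpark sketch contrasting the two (the sample data is made up): the transformations only build up the lineage, and nothing runs until the actions at the end.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "TransformActionSketch")
words = sc.parallelize(["spark", "azure", "spark", "data"])

# Transformations: lazily define new RDDs; no work happens yet.
pairs = words.map(lambda w: (w, 1))
counts = pairs.reduceByKey(lambda a, b: a + b)

# Actions: trigger execution and return results to the driver.
print(counts.collect())  # e.g. [('azure', 1), ('data', 1), ('spark', 2)]
print(words.count())     # 4
```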

79
Q

What is the Driver program?

A

It's the code you write that drives execution in Spark; it runs on the master node.

80
Q

What is Spark core and what does it do?

A

It's the base engine. It handles memory management and fault recovery; scheduling, distributing, and monitoring jobs; and interacting with storage systems.

81
Q

What's included in the Spark ecosystem?

A

Spark Core, Spark Streaming, MLlib, Spark SQL, and GraphX

82
Q

How does Spark Streaming work?

A

It processes streaming data in near real time: the input data is divided into micro-batches, and each batch is processed as an RDD.
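A minimal DStream sketch under those assumptions, with a hypothetical socket source on localhost:9999 and 5-second micro-batches; each batch is handled with ordinary RDD operations.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "StreamingSketch")
ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

# Hypothetical text source; in Azure this would more likely be Event Hubs.
lines = ssc.socketTextStream("localhost", 9999)

counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print each micro-batch's word counts

ssc.start()
ssc.awaitTermination()
```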

83
Q

What is the Azure Data Copy Tool?

A

The Azure Data Factory Copy Data tool eases and optimizes the process of ingesting data into a data lake

84
Q

What is the Cheapest Azure storage if you are not going to query the data?

A

azure blob storage

85
Q

What is Azure Blob Storage?

A

A scalable object store for text and binary data

86
Q

How can you ingest data into Azure Blob Storage?

A

Use Azure Data Factory, Storage Explorer, the AzCopy tool, PowerShell, or Visual Studio.

If you need to upload files larger than 2 GB, use PowerShell or Visual Studio. AzCopy supports a maximum file size of 1 TB and automatically splits data files that exceed 200 GB.
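Another route, not listed on the card, is the Azure Storage Python SDK (azure-storage-blob); a sketch with placeholder connection string, container, and file names:

```python
from azure.storage.blob import BlobServiceClient

# Connection string, container, and file names are placeholders.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("raw-data")

with open("events.json", "rb") as data:
    container.upload_blob(name="events.json", data=data, overwrite=True)
```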

87
Q

What are the Types of Blob Storage?

A

Block Blobs and Page Blobs

88
Q

When do you use Page Blobs?

A

For vhds, or VM disks

89
Q

How are blobs secured?

A

Azure Storage encrypts all data that's written to it. You secure access by using account keys or shared access signatures.

90
Q

What is the blob storage permission Model?

A

Role-based access control (RBAC). Use this functionality to set permissions and assign roles to users, groups, or applications.

91
Q

What are the container access levels? Explain each.

A

Private - read access for the owner only (authenticated access)
Public Blob - read access to the blobs only, not to the container
Public Container - full read access to the blobs and the container

92
Q

What is Shared Access Signatures?

A

A URI (Uniform Resource Identifier) that grants restricted access to your blob for clients you don't want to give the account management keys to.
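A sketch of generating one with the azure-storage-blob Python SDK; the account, container, blob names, key, and one-hour expiry are placeholders:

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# All names and the key below are placeholders.
sas_token = generate_blob_sas(
    account_name="mystorageacct",
    container_name="reports",
    blob_name="summary.csv",
    account_key="<storage-account-key>",
    permission=BlobSasPermissions(read=True),   # read-only access
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# The SAS is appended to the blob URI and handed to the client.
url = "https://mystorageacct.blob.core.windows.net/reports/summary.csv?" + sas_token
```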

93
Q

What is a container for Blob Storage?

A

A Container is like a folder

94
Q

What are the redundancy options for a storage account?

A
Local redundancy (LRS) - data is replicated within the same data center.
Zone redundancy (ZRS) - data is replicated across data centers (availability zones) in the same region.
Geo redundancy (GRS) - data is replicated to a data center in a different region (the paired region).
Read-access geo redundancy (RA-GRS) - data is geo-replicated as with GRS, but the replica can only be read from.
95
Q

What are Queues?

A

They are storage for messages that can be accessed from anywhere in the world.

96
Q

What is File storage?

A

File storage in Azure, which means you can mount the share as a directory from your computer.

97
Q

What is Tables Storage?

A

The Azure Table storage service stores large amounts of structured, non-relational data; entities are schemaless, so their structure can vary.

98
Q

What is a resource Group?

A

It's like a container that lets you group related assets together logically, so you can manage them together and delete them all at once.

It's like a namespace in .NET or a sequence container in SSIS.

99
Q

What APIs can be used with Cosmos DB?

A

SQL, Cassandra, MongoDB, Gremlin, and Azure Table Storage

100
Q

what makes up the Structure of Cosmos DB?

A

Database Account, Databases, containers, Items

101
Q

What Types of Partitions exist in Cosmos DB?

A

Logical and Physical

102
Q

Which type of partition can you design in Cosmos DB?

A

The logical partition; physical partitions are managed by Cosmos DB.

103
Q

What will a container be in Cosmos DB?

A

A collection, a table, or a graph, depending on the API used.

104
Q

What are Items in a Cosmos DB container?

A

Items are documents, rows, nodes or edges

105
Q

Outside of Items what else can be in a Cosmos DB Container?

A

Stored Procs, user-defined functions, triggers, conflicts, merge procedures

106
Q

How is the cost of Cosmos DB calculated?

A

In Request Units (RUs)

107
Q

What are the steps to create and add data to Cosmos DB?

A
  1. Create CosmosDB Account
  2. Create Database
  3. Create Container
  4. Add Items
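A sketch of steps 2-4 with the azure-cosmos Python SDK, assuming the account (step 1) already exists; the endpoint, key, database, container, and item values are placeholders:

```python
from azure.cosmos import CosmosClient, PartitionKey

# Endpoint, key, and names are placeholders; the account itself (step 1)
# is created in the Azure portal or CLI.
client = CosmosClient(
    "https://myaccount.documents.azure.com:443/", credential="<account-key>"
)
database = client.create_database_if_not_exists(id="retail")           # step 2
container = database.create_container_if_not_exists(                   # step 3
    id="orders", partition_key=PartitionKey(path="/customerId")
)
container.upsert_item({                                                 # step 4
    "id": "order-1001",
    "customerId": "cust-42",
    "total": 129.99,
})
```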
108
Q

When to use CosmosDB?

A

When you need a database for an application that handles massive amounts of data, reads and writes at global scale, requires near-real-time response times, and works with a variety of data.