Azure Data Fundamentals Flashcards

1
Q

Storage Account

A

Azure Storage is a core Azure service that enables you to store data in:
Blob containers - scalable, cost-effective storage for binary files.
File shares - network file shares such as you typically find in corporate networks.
Tables - key-value storage for applications that need to read and write data values quickly.

2
Q

Blob Storage

A
3
Q

Tables

A
4
Q

Files

A
5
Q

Entity

A

The data structures in which data is organized often represent entities that are important to an organization (such as customers, products, and sales orders). Each entity typically has one or more attributes, or characteristics.

6
Q

Structured Data

A

Data that adheres to a fixed schema; usually tabular and often stored in databases.

7
Q

Semi Structured Data

A

Semi-structured data is information that has some structure but allows for some variation between entity instances. One common format for semi-structured data is JavaScript Object Notation (JSON).
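
A minimal sketch of the "variation between entity instances" idea, using Python's standard `json` module. The customer records and field names are made up for illustration; the point is only that both entities parse fine even though their attributes differ.

```python
import json

# Two hypothetical customer entities; the second carries an extra attribute
# and uses a different contact shape than the first.
customers_json = """
[
  {"firstName": "Joe", "lastName": "Jones",
   "contact": [{"type": "home", "number": "555 123-1234"}]},
  {"firstName": "Samir", "lastName": "Nadoy",
   "contact": [{"type": "email", "address": "samir@example.com"}],
   "loyaltyTier": "gold"}
]
"""

customers = json.loads(customers_json)

# Collect the attribute names each entity instance actually carries.
field_sets = [set(c.keys()) for c in customers]
shared = set.intersection(*field_sets)      # present on every entity
optional = set.union(*field_sets) - shared  # present on only some entities
```

Here `shared` holds the attributes every entity has, while `optional` holds the ones that vary; a fixed-schema table would have forced both records into the same columns.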

8
Q

Unstructured

A

Not all data is structured or even semi-structured. For example, documents, images, audio and video data, and binary files might not have a specific structure.

9
Q

Data Stores

A

There are two broad categories of data store: file stores and databases.

10
Q

BLOB

A

Binary Large Objects, such as video, audio, images, and application-specific documents, stored as raw binary files.

11
Q

Optimized File Formats

A

Avro, ORC, Parquet
While human-readable formats for structured and semi-structured data can be useful, they’re typically not optimized for storage space or processing. Over time, some specialized file formats that enable compression, indexing, and efficient storage and processing have been developed.

12
Q

Avro

A

Avro is a row-based format. It was created by Apache. Each record contains a header that describes the structure of the data in the record. This header is stored as JSON. The data is stored as binary information. An application uses the information in the header to parse the binary data and extract the fields it contains. Avro is a good format for compressing data and minimizing storage and network bandwidth requirements.
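
A toy illustration of the "JSON header plus binary data" idea, using only Python's standard library. This is not the real Avro encoding or any Avro library API; the `encode`/`decode` helpers, the field list, and the length-prefix layout are all assumptions made for the sketch.

```python
import json
import struct

def encode(record_fields, values):
    """Pack values behind a JSON header that describes their layout."""
    header = json.dumps({"fields": record_fields}).encode("utf-8")
    fmt = "<" + "".join(f["fmt"] for f in record_fields)
    body = struct.pack(fmt, *values)
    # 4-byte little-endian header length, then the header, then the binary body.
    return struct.pack("<I", len(header)) + header + body

def decode(blob):
    """Read the JSON header first, then use it to parse the binary body."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    fmt = "<" + "".join(f["fmt"] for f in header["fields"])
    values = struct.unpack_from(fmt, blob, 4 + header_len)
    return {f["name"]: v for f, v in zip(header["fields"], values)}

# Hypothetical record layout: a 32-bit integer id and a 64-bit float price.
fields = [{"name": "id", "fmt": "i"}, {"name": "price", "fmt": "d"}]
blob = encode(fields, (42, 9.5))
record = decode(blob)
```

The reader never needs to know the layout in advance: it recovers field names and types from the header, which is the property the card describes.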

13
Q

ORC (Optimized Row Columnar)

A

ORC organizes data into columns rather than rows. It was developed by Hortonworks to optimize read and write operations in Apache Hive (Hive is a data warehouse system that supports fast data summarization and querying over large datasets). An ORC file contains stripes of data. Each stripe holds the data for a column or set of columns. A stripe contains an index into the rows in the stripe, the data for each row, and a footer that holds statistical information (count, sum, max, min, and so on) for each column.
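
One way to picture why per-stripe footer statistics help, sketched in plain Python. The `Stripe` class and helper names are hypothetical and do not reflect the actual ORC file layout; the sketch only shows a reader skipping stripes whose statistics rule out a match.

```python
from dataclasses import dataclass

@dataclass
class Stripe:
    columns: dict  # column name -> list of values
    footer: dict   # column name -> {"min", "max", "count", "sum"}

def build_stripe(rows, column_names):
    """Store the rows column by column and record statistics in the footer."""
    columns = {name: [row[name] for row in rows] for name in column_names}
    footer = {
        name: {"min": min(vals), "max": max(vals),
               "count": len(vals), "sum": sum(vals)}
        for name, vals in columns.items()
    }
    return Stripe(columns, footer)

def stripes_to_scan(stripes, column, lower_bound):
    """Keep only stripes whose footer says values >= lower_bound may exist."""
    return [s for s in stripes if s.footer[column]["max"] >= lower_bound]

rows = [{"qty": q, "price": p} for q, p in [(1, 5), (2, 7), (9, 3), (8, 4)]]
stripes = [build_stripe(rows[:2], ["qty", "price"]),
           build_stripe(rows[2:], ["qty", "price"])]

# A query for qty >= 5 can skip the first stripe without reading its data.
candidates = stripes_to_scan(stripes, "qty", lower_bound=5)
```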

14
Q

Parquet

A

Parquet is another columnar data format. It was created by Cloudera and Twitter. A Parquet file contains row groups. Data for each column is stored together in the same row group. Each row group contains one or more chunks of data. A Parquet file includes metadata that describes the set of rows found in each chunk. An application can use this metadata to quickly locate the correct chunk for a given set of rows, and retrieve the data in the specified columns for these rows. Parquet specializes in storing and processing nested data types efficiently. It supports very efficient compression and encoding schemes.

15
Q

Data Documents

A

The collective forms in which data exists. Common types:
Datasets, Databases, Datastores, Data Warehouses, Notebooks

16
Q

Dataset

A

logical grouping of data

17
Q

Database

A

Structured or semi-structured data that can be quickly accessed and searched.

18
Q

Datastores

A

Unstructured or semi-structured data (for example, Azure Data Lake).

19
Q

Non-Relational Databases

A

Often referred to as NoSQL databases. There are four common types: key-value; document (similar to key-value, where the value is a JSON document); column family; and graph.

20
Q

Data Lake Store (Gen 2)

A

Hierarchical data storage for analytical data lakes; works with structured, semi-structured, and unstructured data. To create an Azure Data Lake Storage Gen2 file system, you must enable the Hierarchical Namespace option of an Azure Storage account.

21
Q

Azure File Storage

A

Many on-premises systems comprising a network of in-house computers make use of file shares. A file share enables you to store a file on one computer, and grant access to that file to users and applications running on other computers. This strategy can work well for computers in the same local area network, but doesn’t scale well as the number of users increases, or if users are located at different sites.
Azure Files enables you to share up to 100 TB of data in a single storage account.
Offers Standard and Premium tiers

22
Q

Azure Table Storage

A

A NoSQL key-value storage solution. The key in an Azure Table Storage table comprises two elements: the partition key, which identifies the partition containing the row, and a row key that is unique to each row in the same partition.
Azure Table Storage allows you to store key-value data at the cheapest per-GB rate. Cosmos DB also has a key-value option, but it is not as cost-effective.
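
The two-part key can be pictured with nested dictionaries. This is an in-memory sketch only, not the Azure SDK; the `upsert` and `get` helpers and the sample partition names are hypothetical.

```python
from collections import defaultdict

# partition key -> {row key -> entity properties}
table = defaultdict(dict)

def upsert(partition_key, row_key, **properties):
    """Insert or overwrite the entity identified by (partition key, row key)."""
    table[partition_key][row_key] = properties

def get(partition_key, row_key):
    """Go to the partition first, then find the row inside it."""
    return table[partition_key].get(row_key)

upsert("customers-uk", "001", name="Joe", email="joe@example.com")
upsert("customers-uk", "002", name="Ann")
# The same row key may be reused, because it only has to be unique per partition.
upsert("customers-us", "001", name="Samir")
```

Note that the entities in one partition need not share the same properties, which is part of what makes this a key-value store rather than a relational table.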

23
Q

Storage Account

A

needs to be set up first before other storage services are added

24
Q

Azure Cosmos DB

A

Azure Cosmos DB is a highly scalable cloud database service for NoSQL data. Cosmos DB uses indexes and partitioning to provide fast read and write performance and can scale to massive volumes of data. You can enable multi-region writes, adding the Azure regions of your choice to your Cosmos DB account so that globally distributed users can each work with data in their local replica. It includes APIs for MongoDB, Table (key-value tables), Cassandra (column-family storage), and Gremlin (graph structures).

25
Q

Synapse Analytics

A

A unified, end-to-end solution for large-scale data analytics that combines the benefits of SQL Server, data lakes, and open-source Apache Spark. All services within it can be managed through a single interface, Azure Synapse Studio.

26
Q

Data Explorer

A

standalone service for efficiently analyzing data

27
Q

Kusto Query Language

A

Kusto Query Language (KQL) is the language used to query Data Explorer tables.

28
Q

Manual Sharding

A

A common yet difficult method of distributing data transactions over multiple servers, used with a relational data store when transaction volumes get too high and database performance suffers.
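
A minimal sketch of one way an application might route rows to shards by hashing a key. The server names and the `shard_for` helper are hypothetical, and real sharding schemes vary (range-based, directory-based, and so on).

```python
import hashlib

SHARDS = ["server-0", "server-1", "server-2"]  # hypothetical server names

def shard_for(customer_id: str) -> str:
    """Pick a shard deterministically from a stable hash of the key."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]

# The application, not the database, decides where each customer's rows live.
placement = {cid: shard_for(cid) for cid in ["c-100", "c-101", "c-102", "c-103"]}
```

The difficulty hinted at in the card shows up as soon as the shard count changes: with simple modulo routing like this, adding a server remaps most keys, so existing data has to be moved, and queries that span customers must be fanned out and merged in application code.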

29
Q

Paginated Report

A

Designed to be printed or shared; formatted to fit well on a page; displays all the data in a table; pixel-perfect.

30
Q

Dynamic Data Masking

A

Dynamic data masking (DDM) limits sensitive data exposure by masking it to non-privileged users. It can be used to greatly simplify the design and coding of security in your application. Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to specify how much sensitive data to reveal with minimal impact on the application layer. DDM can be configured on designated database fields to hide sensitive data in the result sets of queries. With DDM the data in the database is not changed.
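
The key property, that masking happens in query results while stored data stays unchanged, can be sketched in plain Python. The mask functions and the `query` helper are hypothetical and only mimic the idea; they are not the database's built-in masking functions.

```python
STORED_ROWS = [
    {"name": "Joe", "email": "joe@example.com", "card": "4111111111111111"},
]

def mask_card(value):
    """Reveal only the last four digits."""
    return "X" * (len(value) - 4) + value[-4:]

def mask_email(value):
    """Reveal only the first character of the local part."""
    local, _, domain = value.partition("@")
    return local[:1] + "XXX@" + domain

# Masks are configured per field; unlisted fields are returned as-is.
MASKS = {"email": mask_email, "card": mask_card}

def query(rows, privileged):
    """Return copies of the rows, masked unless the caller is privileged."""
    if privileged:
        return [dict(r) for r in rows]
    return [{k: MASKS.get(k, lambda v: v)(v) for k, v in r.items()} for r in rows]

masked = query(STORED_ROWS, privileged=False)
unmasked = query(STORED_ROWS, privileged=True)
```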

31
Q

Transaction Optimized Storage

A

Provides a better cost profile for frequently accessed files: it costs more to store a file, but much less to access it. It is available in the Standard tier of storage.

32
Q

Azure Data Factory

A

Can move data from a source to a destination, with processing along the way. Azure Data Migration is more about simple movement of data from one source to another.

33
Q

Azure Cache for Redis

A

Azure Cache for Redis provides an in-memory data store based on the Redis software. Redis improves the performance and scalability of an application that uses backend data stores heavily. It’s able to process large volumes of application requests by keeping frequently accessed data in the server memory, which can be written to and read from quickly. Redis brings a critical low-latency and high-throughput data storage solution to modern applications.
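
The benefit of keeping frequently accessed data in memory can be sketched with a small read-through cache. This uses a plain dictionary rather than Redis or any Redis client; `InMemoryCache`, `load_from_backend`, and the key format are hypothetical.

```python
import time

backend_calls = 0

def load_from_backend(key):
    """Stand-in for a slow query against a backend data store."""
    global backend_calls
    backend_calls += 1
    return f"value-for-{key}"

class InMemoryCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]  # cache hit: served from memory
        value = load_from_backend(key)  # cache miss: go to the backend
        self._items[key] = (time.monotonic() + self.ttl, value)
        return value

cache = InMemoryCache(ttl_seconds=60)
first = cache.get("product:1")   # miss, loads from the backend
second = cache.get("product:1")  # hit, no backend call
```

After both reads the backend has been called only once, which is how a cache reduces load on stores that an application uses heavily.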

34
Q

Column-store data

A

Data organized into columns; faster at aggregating values for analytics; a NoSQL store or SQL-like database; great for vast amounts of data; great when you only need a few columns.
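
A small sketch of why a column layout suits aggregation when only a few columns are needed. The sample rows are made up; the layout is a plain dictionary of lists, not any particular database's storage format.

```python
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 2.5},
    {"id": 3, "region": "EU", "amount": 4.0},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregating one column touches only that column's values;
# the "id" and "region" data is never read.
total_amount = sum(columns["amount"])
```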

35
Q

Balanced Tree

A

A common data structure for storing indexes.
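
Writing a full balanced tree is beyond a flashcard, so the sketch below swaps in a sorted list with binary search (Python's `bisect` module) as a stand-in. It is not a tree, but it shows the same idea: the index keeps keys in order so a lookup takes a logarithmic number of comparisons instead of scanning every row. The table contents are hypothetical.

```python
import bisect

table = [
    {"id": 17, "name": "Ann"},
    {"id": 3, "name": "Joe"},
    {"id": 42, "name": "Samir"},
]

# The index maps each key to the row's position, kept in key order.
index = sorted((row["id"], pos) for pos, row in enumerate(table))
keys = [k for k, _ in index]

def lookup(key):
    """Binary-search the ordered keys, then follow the pointer to the row."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return table[index[i][1]]
    return None
```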

36
Q

Data Consistency

A

When data is kept in two different places, whether the copies match exactly.
Strongly consistent - every query returns the latest, consistent data.
Eventually consistent - a query may return inconsistent (stale) data for a short window (about 2 seconds) until the copies converge.
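
The difference between the two models can be sketched with a primary copy and a lagging replica. The `ReplicatedStore` class is hypothetical, and the explicit `sync` call stands in for whatever background replication a real system performs.

```python
class ReplicatedStore:
    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes not yet copied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))  # replicated later

    def sync(self):
        """Apply pending writes to the replica (replication catching up)."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

    def read(self, key, strong):
        # Strong reads go to the primary; eventual reads may hit a stale replica.
        source = self.primary if strong else self.replica
        return source.get(key)

store = ReplicatedStore()
store.write("stock", 5)
strong_read = store.read("stock", strong=True)      # sees the write at once
stale_read = store.read("stock", strong=False)      # replica has not caught up
store.sync()
converged_read = store.read("stock", strong=False)  # copies now match
```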

37
Q

Datamart

A

subset of data warehouse; has single business focus;

38
Q

Data Lake

A

A centralized storage repository that holds a vast amount of raw data (big data) in semi-structured or unstructured format; data hoarding for data scientists.