Core data concepts Flashcards

1
Q

What 3 factors influence the file format used for certain data?

A
  1. The type of data (structured, semi-structured, unstructured)
  2. The applications and services that need to read, write and process the data
  3. Whether files need to be readable by humans or optimized for some other factor, like storage
2
Q

What is the general form of a CSV file?
What are 3 other examples of that?

A

Delimited text file
Tab delimited (TSV), space delimited and fixed-width data.
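
As a minimal sketch of what "delimited" means in practice, the same records can be parsed from CSV or TSV by changing only the delimiter. The sample data and the `read_delimited` helper are hypothetical, chosen for illustration.

```python
import csv
import io

# Hypothetical sample data: the same two records in CSV and TSV form.
csv_text = "id,name,city\n1,Ana,Lisbon\n2,Ben,Oslo\n"
tsv_text = "id\tname\tcity\n1\tAna\tLisbon\n2\tBen\tOslo\n"

def read_delimited(text, delimiter):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

csv_rows = read_delimited(csv_text, ",")
tsv_rows = read_delimited(tsv_text, "\t")
```

Both calls yield the same records; only the separator character differs.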

3
Q

What are blob files?

A

Binary large objects. Binary files that are meant to be read by applications and have no human-readable encoding.

4
Q

What is Avro? When is it a good file format?

A

Row-based file format. Each record has a header, stored as JSON, that describes the structure of the data; the data itself is stored as binary. Good for compressing data and minimizing storage and network bandwidth requirements.

5
Q

What is ORC?

A

Optimized Row Columnar: a columnar file format that organizes its columns into stripes. Each stripe contains an index into the rows in the stripe, the data for each row, and a footer with statistical information (count, max, etc.) for each column.

6
Q

What is Parquet? When is it a good file format?

A

A columnar file format consisting of row groups. Within a row group, the data for each column is stored together in one or more chunks, and the file includes metadata describing the rows in each chunk.
Parquet specializes in storing and processing nested data, and supports compression and encoding.

7
Q

What are 4 common types of nonrelational databases?

A

Key-value, (json) document, column family (tabular with column groups), graph

8
Q

What are CRUD operations?

A

The transactional operations create, retrieve, update and delete
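
The four operations map directly onto SQL statements. A minimal sketch using Python's built-in `sqlite3` module, with a hypothetical `product` table chosen for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Create
conn.execute("INSERT INTO product (id, name, price) VALUES (?, ?, ?)", (1, "widget", 2.50))
# Retrieve
row_after_create = conn.execute(
    "SELECT name, price FROM product WHERE id = ?", (1,)
).fetchone()
# Update
conn.execute("UPDATE product SET price = ? WHERE id = ?", (3.00, 1))
row_after_update = conn.execute(
    "SELECT name, price FROM product WHERE id = ?", (1,)
).fetchone()
# Delete
conn.execute("DELETE FROM product WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
conn.close()
```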

9
Q

What are OLTP and OLAP?

A

Online transaction processing (often row-oriented storage) and online analytical processing (often column-oriented storage)
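
The row/column distinction can be sketched in plain Python. The sales records below are hypothetical. A row layout makes it cheap to fetch one complete record, while a column layout makes it cheap to aggregate one field across all records.

```python
# Hypothetical sales records.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Row-oriented access (OLTP style): fetch one complete record by key.
by_id = {r["order_id"]: r for r in rows}
order_2 = by_id[2]

# Column-oriented layout (OLAP style): one list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}
total_amount = sum(columns["amount"])
```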

10
Q

What is ACID?

A

An acronym related to processing transactional data: atomicity, consistency, isolation and durability.
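
Atomicity is the easiest of the four to demonstrate. A minimal sketch with Python's built-in `sqlite3` module, assuming a hypothetical `account` table and `transfer` helper: when the second update fails, the first is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount between accounts; both updates commit together or not at all."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )

try:
    transfer(conn, "alice", "bob", 500)  # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass  # the credit to bob was rolled back along with the failed debit

balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer, both balances are unchanged, so no partial update is ever visible.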

11
Q

What are a data lake, data warehouse and data lakehouse?

A

Data lake: place to store high volumes of raw data
Data warehouse: data storage optimized for analytical reading
Data lakehouse combines the two

12
Q

What 3 versions of Azure SQL are there?

A

Azure SQL Database, PaaS
Azure SQL Managed Instance, PaaS with automated maintenance but more responsibilities for the owner
Azure SQL VM, IaaS, a VM with SQL Server installed on it

13
Q

Name 3 open source databases offered on Azure

A

MySQL, MariaDB, PostgreSQL

14
Q

What is an index?

A

A way to speed up searching through a table. It’s extra data stored as a tree.
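
A rough illustration using SQLite, which stores its indexes as B-trees. The `customer` table and index name are hypothetical. `EXPLAIN QUERY PLAN` shows how the same query is planned before and after the index exists; the exact wording of the plan varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", "Oslo") for i in range(1000)],
)

query = "SELECT * FROM customer WHERE email = ?"
param = ("user500@example.com",)

def plan(conn):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, param).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the plan text

plan_without_index = plan(conn)  # expected to report a scan of the whole table
conn.execute("CREATE INDEX idx_customer_email ON customer (email)")
plan_with_index = plan(conn)     # expected to report a search using the index
```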

15
Q

What is automated and what is manual when choosing Azure SQL Managed Instance?

A

Automated: backup, patching, database monitoring, other general tasks
Manual: security, resource allocation

16
Q

What is the difference between Azure SQL Database single database vs elastic pool?

A

With an elastic pool, by default, multiple databases can share the same pool of resources, while a single database is more isolated.

17
Q

What is the data migration assistant?

A

A tool that can assess compatibility when migrating to Azure SQL Database or Azure SQL Managed Instance, or when upgrading SQL Server

18
Q

What is one notable feature supported by MariaDB?

A

temporal data

19
Q

What are 3 notable features supported by PostgreSQL?

A

Custom data types with non-relational properties, code modules that can be run by queries, and the ability to store and manipulate geometric data

20
Q

In Azure blob storage, what can the folder structure do?

A

Very little; it’s just virtual. There is no support for access control or bulk operations at the folder level
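
A plain-Python model of the idea, not the Azure SDK: the container is a flat mapping of blob names, and "folders" are only a naming convention derived from prefixes. The blob names and helper functions are hypothetical.

```python
# Hypothetical container contents: a flat mapping from blob name to data.
container = {
    "logs/2024/01/app.log": b"january log",
    "logs/2024/02/app.log": b"february log",
    "images/logo.png": b"png bytes",
}

def list_by_prefix(container, prefix):
    """Return blob names that start with the given prefix, sorted."""
    return sorted(name for name in container if name.startswith(prefix))

def virtual_folders(container, delimiter="/"):
    """Derive top-level 'folders' purely from the blob names."""
    return sorted(
        {name.split(delimiter, 1)[0] for name in container if delimiter in name}
    )

logs_2024 = list_by_prefix(container, "logs/2024/")
top_level = virtual_folders(container)
```

Nothing in this model stores a folder object, which is why folder-level operations have nothing to act on.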

21
Q

What 3 types of blobs does Azure blob storage support?

A

Block blobs for discrete, large binary objects that change infrequently
Page blobs, optimized for read and write, used for virtual disk storage for VMs
Append blobs, block blobs optimized for appends that don’t support updating or deleting existing blocks

22
Q

How do you create a data lake?
What is one limitation of this process?

A

Enable “Hierarchical namespace” on an Azure storage account. You can upgrade at any time, but you can’t revert from a data lake to a regular storage account

23
Q

What are the 4 key benefits of Microsoft OneLake in Fabric?

A
  1. Organization-wide data lake
  2. Distributed ownership and collaboration
  3. Open and compatible, built on Delta Parquet
  4. Easy to navigate via OneLake file explorer
24
Q

What is cognitive analytics?

A

Using certain AI services to transform data

25
Q

What are the 2 performance tiers for Azure files?

A

Standard: hard-disk based in a datacenter
Premium: solid-state disks

26
Q

What 2 common network protocols are supported by Azure files?

A

Server Message Block (SMB): commonly used across multiple operating systems
Network File System (NFS): used by some Linux and macOS versions, only available on premium

27
Q

What is Azure tables?

A

A NoSQL storage solution using key/value data items. All rows have a unique key, timestamp for last update and variable columns

28
Q

What is used for fast access for Azure tables? In what 4 ways does that improve scalability and performance?

A

Partitions. Partitions are independent of each other, they can grow and shrink, a table can have any number of partitions, and you can include the partition key in your searches to narrow them down.

29
Q

What are the 2 elements of an Azure Table storage key?

A

Partition key and row key
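
A plain-Python model of the two-part key, not the Azure SDK. The `upsert` and `get` helpers and the sample entities are hypothetical. The partition key selects a partition, and the row key identifies one entity within it.

```python
from collections import defaultdict

# partition key -> row key -> entity (a dict of arbitrary properties)
table = defaultdict(dict)

def upsert(partition_key, row_key, **properties):
    """Insert or replace the entity identified by (partition key, row key)."""
    table[partition_key][row_key] = properties

def get(partition_key, row_key):
    """Point lookup: go straight to one partition, then one row."""
    return table[partition_key][row_key]

upsert("EU", "order-001", amount=120, customer="Ana")
upsert("EU", "order-002", amount=50)                # entities may have different columns
upsert("US", "order-001", amount=80, express=True)  # same row key, different partition

eu_orders = sorted(table["EU"])  # a scan limited to a single partition
```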

30
Q

What 6 database engines does Azure Cosmos DB support?

A

Azure Cosmos DB for:
1. NoSQL (native), 2. MongoDB, 3. PostgreSQL, 4. Table (Azure Table Storage), 5. Apache Cassandra, 6. Apache Gremlin

31
Q

What is Microsoft Purview?

A

A unified solution for data governance, protection and management.