Own questions Flashcards

1
Q

What does ETL stand for?

A

Extract, Transform and Load.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the aggregate functions of the Cosmos DB SQL API?

A

COUNT, SUM, AVG, MAX, MIN

Unlike the ANSI SQL query language it does not support GROUP BY!

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are the three supported blob types in an Azure blob storage?

A

Block blobs: Up to 50’000 Blocks with size <100MB, the smallest unit for a r/w is one whole Block

Page blobs: A Collection of fixed 512-byte pages, total size up to 4.7TB, optimized for random r/w opperations, used for virtual disk storage for Azure virtual machines

Append blobs: Blob size <4MB, total size up to 195GB, only supports appending of blobs, no UPDATE or DELETE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Where are single blobs stored inside an Azure blob storage?

A

Blobs are stored in containers.

A container provides a convenient way of grouping related blobs together, and you can organize blobs in a hierarchy of folders inside a container, similar to files in a file system on disk.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the difference between DDL and DML?

A

DDL (Data Description Language): DDL statements are used to create or change databases, schemes, constraints etc.
E.g. CREATE, ALTER, DROP, RENAME

DML (Data Manipulation Language): DML statements are used to insert update or delete records.
E.g. SELECT, INSERT, UPDATE, DELETE

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is IaaS?

A

IaaS (Infrastructure as a Service): Azure enables you to create a virtual infrastructure in the cloud that mirrors the way an on-premises data center might work.

You take responsibility for installing and configuring the software, such as the DBMS, on these virtual machines.

Very similar to how you would run a system inside an organization but you don’t have to concern yourself with the hardware.

E.g. SQL Server running on a Azure virtual machine.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is PaaS?

A

PaaS (Platform as a Service): You specify the resources that you require and Azure creates the necessary virtual machines, networks, and other devices for you.

IaaS + database management and operating systems.

E.g. Azure SQL Databases

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is SaaS?

A

SaaS (Software as a Service): SaaS services are typically specific software packages that are installed and run on virtual hardware in the cloud.

E.g. Office 365

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the tasks performed by an analytical data processing system?

A

Data ingestion > Data transformation / Data processing > Data querying > Data visualization

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the difference between a relational and a non-relational database?

A

Relational database: Relational database store the data in rows and columns similar to how a spreadsheet would work. The data is easily structured into categories and the relationships between the tables are well defined. Most of the time the data has to go through a normalization process so that it can fit inside the structure of the database.

E.g. MySQL, postgreSQL, MariaDB etc.

Non-Relational Databases: Non-relational databases enable you to store the data in a format closer to the original structure. Queries can be executed faster as you don’t have to join as many tables to get all the necessary data. Data with a variable amount of data-points can be easily saved inside a non-relational database as the number of data-points inside on row may vary.

E.g MongoDB, Cassandra, Azure Cosmos DB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the ACID properties and when are they used?

A

The ACID properties are there to ensure that a transactional database remains consistent while processing transactions.

Atomicity: Atomicity guarantees that each transaction is treated as a single unit. This means that it is not possible that the first half of the unit succeeds and the second half does not. This has to be true even in case of an error, crashes or power outages during the transaction.

Consistency: Consistency ensures that the data used in the transaction is valid before and after the transaction.

E.g It is not possible to deduct the funds of a bank account to below zero.

Isolation: Isolation guarantees that concurrent transactions have the same result as if when they would have been done sequentially.

Durability: Durability guarantees that once a transaction has been finished, it doesn’t get reverted by a crash or a power outage.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are the three key roles in the world of data?

A

Database Administrators, Data Engineers and Data Analysts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is an index?

A

An Index contains a sorted list of references from one column of a table. When using a query for that specific column the database management system can use the index to fetch the data more quickly.

You can create as many indices of a table as you want, however every index consumes more storage space and each time you either INSERT, UPDATE or DELETE the data all indices have to be updated as well.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is a view?

A

A view is a virtual table based on the result set of a query. In the simplest case, you can think of a view as a window on specified rows in an underlying table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is semi-structured data?

A

Semi-structured data is data that contains fields. The fields don’t have to be the same in every entity. You only define the fields that you need on a per-entity basis. The Customer entities shown in the previous unit are examples of semi-structured data. The data must be formatted in such a way that an application can parse and process it. One common way of doing this is to store the data for each entity as a JSON document.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is unstructured data?

A

Unstructured data is data that doesn’t naturally contain fields. Each item is an amorphous blob of binary data. You can’t search for specific elements in this data.

E.g. video, audio and other media streams

17
Q

What is the difference between a Azure Single Database deployment and an Azure Elastic Pool

A

In an elastic pool by default multiple databases can share the same resources (memory, data storage space, processing power). Those resources are then referred to as a pool.

An elastic pool can help to reduce costs for databases with variable resource requirements.

E.g. You have two databases one for payroll processing and one for running reports. The first one will be used a lot at the end of the month while the other will be used more during the month.

18
Q

What is the difference between an Azure SQL Database managed instance and an Single Database or Elastic Pool?

A

Both the single database and the elastic pool restrict some of the administrative features available to SQL Server. A managed instance runs an effectively fully controllable SQL Server in the cloud.

19
Q

What is automated in a Azure SQL Database managed instance.

A

The Managed instance service automates backups, software patching, database monitoring, and other general tasks, but you have full control over security and resource allocation for your databases.

20
Q

What entails the Azure Database option for PostgreSQL Hypersacle?

A

Hyperscale (Citus) is a deployment option that scales queries across multiple server nodes to support large database loads. Your database is split across nodes. Data is split into chunks based on the value of a partition key or sharding key.

21
Q

What is Provisioning?

A

Provisioning is the act of running series of tasks that a service provider, such as Azure SQL Database, performs to create and configure a service. Behind the scenes, the service provider will set up the various resources (disks, memory, CPUs, networks, and so on) required to run the service. You’ll be assigned these resources, and they remain allocated to you (and charged to you), until you delete the service.

22
Q

What are the different tools you can use for the provisioning of an azure service?

A

The Azure portal: Website / you are prompted for the necessary parameters.

The Azure command-line interface (CLI): CMD for creating services. Can be used to create scripts and automate service creation.

Azure PowerShell: Azure provides some Azure-specific commands that can be used in PowerShell

Azure Resource Manager templates: An Azure Resource Manager template describes the service that you want to deploy as a JSON file.

23
Q

What is the default connectivity for Azure relational data services?

A

The default connectivity for Azure relational data services is to disable access to the world.

24
Q

What are the 4 core functionalities of the SQL Database gateway service (DoSGuard)

A
  • DoSGuard actively tracks failed logins from IP addresses. If there are multiple failed logins from a specific IP address within a period of time, the IP address is blocked from accessing any resources in the service for a short while.
  • It validates all connections to the database servers, to ensure that they are from genuine clients.
  • It encrypts all communications between a client and the database servers.
  • It inspects each network packet sent over a client connection. The gateway validates the connection information in the packet, and forwards it to the appropriate physical server based on the database name that’s specified in the connection string.
25
Q

What does OLTP and OLAP stand for?

A

OLTP: Online transaction processing captures, stores, and processes data from transactions in real time.

OLAP: Online analytical processing uses complex queries to analyze aggregated historical data from OLTP systems.

26
Q

What is the purpose of an Azure Data Factory?

A

The purpose of Azure Data Factory is to retrieve data from one or more data sources, and convert it into a format that you process. The data sources might present data in different ways, and contain noise that you need to filter out.

27
Q

What is an Azure Data Lake Storage?

A

A data lake is a repository for large quantities of raw data. Because the data is raw and unprocessed, it’s very fast to load and update, but the data hasn’t been put into a structure suitable for efficient analysis. You can think of a data lake as a staging point for your ingested data, before it’s massaged and converted into a format suitable for performing analytics.

28
Q

What is the difference between a Data Warehouse and an Data Lake Storage?

A

The data in a Data Warehouse has already been converted so that it can be used for an efficient analysis, while the data in an Azure Lake Storage still is in it’s raw state.

29
Q

What is PolyBase?

A

PolyBase is a feature of SQL Server and Azure Synapse Analytics that enables you to run Transact-SQL queries that read data from external data sources. PolyBase makes these external data sources appear like tables in a SQL database. Using PolyBase, you can read data managed by Hadoop, Spark, and Azure Blob Storage, as well as other database management systems such as Cosmos DB, Oracle, Teradata, and MongoDB.

Azure SQL Database does NOT support PolyBase!

30
Q

What is SSIS (SQL Server Integration Services)?

A

SQL Server Integration Services (SSIS) is a platform for building enterprise-level data integration and data transformations solutions. You can use SSIS to solve complex business problems by copying or downloading files, loading data warehouses, cleaning and mining data, and managing SQL database objects and data. SSIS is part of Microsoft SQL Server.

31
Q

What is Azure Databricks?

A

Azure Databricks is an analytics platform optimized for the Microsoft Azure cloud services platform. Databricks is based on Spark, and is integrated with Azure to streamline workflows. It provides an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

32
Q

What are the basic building blocks of Power BI?

A

Visualizations – A visual representation of data, sometimes just called visuals

Datasets – A collection of data that Power BI uses to create visualizations

Reports – A collection of visuals from a dataset, spanning one or more pages

Dashboards – A single-page collection of visuals built from a report

Tiles – A single visualization on a report or dashboard