Own questions Flashcards
What does ETL stand for?
Extract, Transform and Load.
What are the aggregate functions of the Cosmos DB SQL API?
COUNT, SUM, AVG, MAX, MIN
Unlike the ANSI SQL query language it does not support GROUP BY!
What are the three supported blob types in an Azure blob storage?
Block blobs: Up to 50’000 Blocks with size <100MB, the smallest unit for a r/w is one whole Block
Page blobs: A Collection of fixed 512-byte pages, total size up to 4.7TB, optimized for random r/w opperations, used for virtual disk storage for Azure virtual machines
Append blobs: Blob size <4MB, total size up to 195GB, only supports appending of blobs, no UPDATE or DELETE
Where are single blobs stored inside an Azure blob storage?
Blobs are stored in containers.
A container provides a convenient way of grouping related blobs together, and you can organize blobs in a hierarchy of folders inside a container, similar to files in a file system on disk.
What is the difference between DDL and DML?
DDL (Data Description Language): DDL statements are used to create or change databases, schemes, constraints etc.
E.g. CREATE, ALTER, DROP, RENAME
DML (Data Manipulation Language): DML statements are used to insert update or delete records.
E.g. SELECT, INSERT, UPDATE, DELETE
What is IaaS?
IaaS (Infrastructure as a Service): Azure enables you to create a virtual infrastructure in the cloud that mirrors the way an on-premises data center might work.
You take responsibility for installing and configuring the software, such as the DBMS, on these virtual machines.
Very similar to how you would run a system inside an organization but you don’t have to concern yourself with the hardware.
E.g. SQL Server running on a Azure virtual machine.
What is PaaS?
PaaS (Platform as a Service): You specify the resources that you require and Azure creates the necessary virtual machines, networks, and other devices for you.
IaaS + database management and operating systems.
E.g. Azure SQL Databases
What is SaaS?
SaaS (Software as a Service): SaaS services are typically specific software packages that are installed and run on virtual hardware in the cloud.
E.g. Office 365
What are the tasks performed by an analytical data processing system?
Data ingestion > Data transformation / Data processing > Data querying > Data visualization
What is the difference between a relational and a non-relational database?
Relational database: Relational database store the data in rows and columns similar to how a spreadsheet would work. The data is easily structured into categories and the relationships between the tables are well defined. Most of the time the data has to go through a normalization process so that it can fit inside the structure of the database.
E.g. MySQL, postgreSQL, MariaDB etc.
Non-Relational Databases: Non-relational databases enable you to store the data in a format closer to the original structure. Queries can be executed faster as you don’t have to join as many tables to get all the necessary data. Data with a variable amount of data-points can be easily saved inside a non-relational database as the number of data-points inside on row may vary.
E.g MongoDB, Cassandra, Azure Cosmos DB
What are the ACID properties and when are they used?
The ACID properties are there to ensure that a transactional database remains consistent while processing transactions.
Atomicity: Atomicity guarantees that each transaction is treated as a single unit. This means that it is not possible that the first half of the unit succeeds and the second half does not. This has to be true even in case of an error, crashes or power outages during the transaction.
Consistency: Consistency ensures that the data used in the transaction is valid before and after the transaction.
E.g It is not possible to deduct the funds of a bank account to below zero.
Isolation: Isolation guarantees that concurrent transactions have the same result as if when they would have been done sequentially.
Durability: Durability guarantees that once a transaction has been finished, it doesn’t get reverted by a crash or a power outage.
What are the three key roles in the world of data?
Database Administrators, Data Engineers and Data Analysts
What is an index?
An Index contains a sorted list of references from one column of a table. When using a query for that specific column the database management system can use the index to fetch the data more quickly.
You can create as many indices of a table as you want, however every index consumes more storage space and each time you either INSERT, UPDATE or DELETE the data all indices have to be updated as well.
What is a view?
A view is a virtual table based on the result set of a query. In the simplest case, you can think of a view as a window on specified rows in an underlying table.
What is semi-structured data?
Semi-structured data is data that contains fields. The fields don’t have to be the same in every entity. You only define the fields that you need on a per-entity basis. The Customer entities shown in the previous unit are examples of semi-structured data. The data must be formatted in such a way that an application can parse and process it. One common way of doing this is to store the data for each entity as a JSON document.