Data Warehousing Flashcards

1
Q

General four step data warehousing architecture

A

Data ingestion and processing

Analytical Data Store

Analytical Data Model

Data Visualization

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Data Ingestion Pipelines - Linked Services

A

Pipelines use linked services to load and process data

  • Azure Blob Store linked service to ingest the input database
  • Azure SQL
    Database to run stored procedures
  • Azure Databricks/Azure HDInsight for data processing
  • Azure Functions for custom processing logic
  • Azure Synapse Analytics as a data store
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Analytical Data Stores - The Data Warehouse

A

A data warehouse is a relational database in which the data is stored in a schema (e.g., star or snowflake) that is optimized for data analytics rather than transactional workloads.

A data warehouse is a great choice when you have transactional data that can be organized into a structured schema of tables, and you want to use SQL to query them.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Analytical Data Stores - The Data Lake

A

A data lake is a file store, usually on a distributed file system for high performance data access.

Technologies like Spark or Hadoop are often used to process queries on the stored files and return data for reporting and analytics.

These systems often apply a schema-on-read approach to define tabular schemas on semi-structured data files at the point where the data is read for analysis, without applying constraints when it’s stored.

Data lakes are great for supporting a mix of structured, semi-structured, and even unstructured data that you want to analyze without the need for schema enforcement when the data is written to the store.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Analytical Data Stores - “Data lakehouse” or “lake database”

A

The raw data is stored as files in a data lake, and a relational storage layer abstracts the underlying files and expose them as tables, which can be queried using SQL.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Azure Services for Analytical Stores - Azure Synapse Analytics

A

Azure Synapse Analytics is a unified, end-to-end solution for large scale data analytics.

It brings together multiple technologies and capabilities, enabling you to combine the data integrity and reliability of a scalable, high-performance SQL Server based relational data warehouse with the flexibility of a data lake and open-source Apache Spark.

It also includes native support for log and telemetry analytics with Azure Synapse Data Explorer pools, as well as built in data pipelines for data ingestion and transformation.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Azure Services for Analytical Stores - Azure Databricks

A

Azure Databricks is an Azure implementation of the popular Databricks platform.

Databricks is a comprehensive data analytics solution built on Apache Spark, and offers native SQL capabilities as well as workload-optimized Spark clusters for data analytics and data science.

Databricks provides an interactive user interface through which the system can be managed and data can be explored in interactive notebooks.

Due to its common use on multiple cloud platforms, you might want to consider using Azure Databricks as your analytical store if you want to use existing expertise with the platform or if you need to operate in a multi-cloud environment or support a cloud-portable solution.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Analytical Data Stores on Azure - Azure HDInsight

A

Azure HDInsight is an Azure service that supports multiple open-source data analytics cluster types. Although not as user-friendly as Azure Synapse Analytics and Azure Databricks, it can be a suitable option if your analytics solution relies on multiple open-source frameworks or if you need to migrate an existing on-premises Hadoop-based solution to the cloud.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Azure Services for Data Ingestion Pipelines

A

You can create and run pipelines using Azure Data Factory, or you can use the same pipeline engine in Azure Synapse Analytics if you want to manage all of the components of your data warehousing solution in a unified workspace.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Data Ingestion Pipeline flow

A

An input dataset provides the source data, and activities can be defined as a data flow that incrementally manipulates the data until an output dataset is produced.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly