Lakehouse Flashcards
What is a lakehouse?
A lakehouse presents as a database and is built on top of a data lake using Delta format tables.
What capabilities do lakehouses combine?
The SQL-based analytical capabilities of a relational data warehouse and the flexibility and scalability of a data lake.
What types of data formats can lakehouses store?
All data formats, including structured, semi-structured, and unstructured data.
What is the advantage of lakehouses being cloud-based?
They can scale automatically and provide high availability and disaster recovery.
What processing engines do lakehouses use?
Spark and SQL engines.
What is the schema-on-read format?
A format in which the schema is defined as the data is read, rather than being predefined before the data is stored.
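A minimal schema-on-read sketch for a Fabric notebook (the CSV path is hypothetical, and `spark` is the session the notebook provides):

```python
# The schema is inferred at read time (schema-on-read) rather than
# being defined before the file was stored.
df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("Files/raw/sales.csv")   # hypothetical path in the Files area

df.printSchema()   # shows the schema that was inferred on read
```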
What does ACID stand for in the context of lakehouses?
Atomicity, Consistency, Isolation, Durability.
What are the roles of different users in a lakehouse?
Data engineers ingest and transform the data, data scientists explore it and train machine learning models, and data analysts query it and build reports.
What is the ETL process?
Extract, Transform, Load.
What types of data sources can be ingested into a lakehouse?
Local files, databases, or APIs.
What are Fabric shortcuts?
Links to data stored elsewhere, such as in Azure Data Lake Storage Gen2 or in other OneLake locations.
What tools can be used to transform ingested data?
Apache Spark with notebooks or Dataflows Gen2.
What is the purpose of Data Factory pipelines?
To orchestrate different ETL activities and land prepared data into the lakehouse.
What familiar tool do Dataflows Gen2 utilize?
Power Query.
How can you analyze data in a lakehouse?
By querying its tables with SQL, through the SQL analytics endpoint.
What can be developed in Power BI using a lakehouse?
Reports.
How is lakehouse access managed?
Through workspace roles or item-level sharing.
What are sensitivity labels used for in lakehouses?
Classifying and protecting data, as part of Fabric's data governance features.
True or False: Item-level sharing is best for granting access for read-only needs.
True.
Fill in the blank: Lakehouses support _______ transactions through Delta Lake formatted tables.
ACID
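As an illustration only, writing a DataFrame in Delta format from a notebook produces a table with ACID behavior; the table name and sample rows below are made up:

```python
# Two tiny DataFrames standing in for ingested data (illustrative only).
df = spark.createDataFrame([(1, "laptop"), (2, "monitor")], ["id", "item"])
new_rows = spark.createDataFrame([(3, "keyboard")], ["id", "item"])

# Saving in Delta format creates an ACID-compliant table: each write is
# an atomic transaction recorded in the table's transaction log.
df.write.format("delta").mode("overwrite").saveAsTable("sales_orders")

# Appends either fully succeed or fully fail, and concurrent readers
# continue to see a consistent snapshot of the table.
new_rows.write.format("delta").mode("append").saveAsTable("sales_orders")
```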
What is a key benefit of using a lakehouse for analytics?
A scalable analytics solution that maintains data consistency.
What three items are automatically created in your workspace when you create a new lakehouse?
The lakehouse, the Semantic model (default), and the SQL analytics endpoint.
The lakehouse itself serves as a central hub for data management and contains shortcuts, folders, files, and tables.
What does the Semantic model (default) provide for Power BI report developers?
An easy data source for building Power BI reports over the lakehouse data.
The Semantic model simplifies data representation for reporting.
What is the purpose of the SQL analytics endpoint in a lakehouse?
Allows read-only access to query data with SQL.
This endpoint enables SQL-based interaction with the lakehouse data.
In what two modes can you work with data in the lakehouse?
Lakehouse mode and SQL analytics endpoint mode.
Each mode offers different capabilities for managing and querying data.
What is the first step in the ETL process for a lakehouse?
Ingesting data into your lakehouse.
This step is crucial for preparing data for analysis.
List the methods to ingest data into a lakehouse.
- Upload local files
- Dataflows Gen2
- Notebooks
- Data Factory pipelines
Each method has its own use case and benefits.
What should you consider when ingesting data to determine your loading pattern?
Whether to load all raw data as files or use staging tables.
This decision impacts performance and data processing efficiency.
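One possible pattern, sketched under the assumption that raw files land in the Files area and are then copied into a staging Delta table (all paths and names are hypothetical):

```python
# Land raw data as files first, then load it into a staging table
# before any transformation is applied.
raw = spark.read.option("header", "true").csv("Files/landing/orders.csv")
raw.write.format("delta").mode("overwrite").saveAsTable("stg_orders")

# Downstream steps read from the staging table rather than the raw files.
staged = spark.read.table("stg_orders")
```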
What can Spark job definitions be used for in a lakehouse?
To submit batch/streaming jobs to Spark clusters.
This allows for processing large volumes of data efficiently.
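Unlike a notebook, a Spark job definition runs a standalone script; a minimal batch-style sketch (paths and table names are assumptions) could look like this:

```python
# main.py - a minimal batch script that could be attached to a
# Spark job definition (paths and names are illustrative).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nightly-load").getOrCreate()

# Read raw files, keep only valid rows, and write the result to a Delta table.
orders = spark.read.option("header", "true").csv("Files/landing/orders/")
valid = orders.filter(orders["order_id"].isNotNull())
valid.write.format("delta").mode("append").saveAsTable("orders_clean")

spark.stop()
```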
What is the purpose of shortcuts in a lakehouse?
To integrate data while keeping it stored in external storage.
Shortcuts enhance data accessibility across different storage solutions.
How are source data permissions and credentials managed when using shortcuts?
They are managed by OneLake.
This central management simplifies access control across data sources.
What is required for a user to access data through a shortcut to another OneLake location?
The user must have permissions in the target location to read the data.
This ensures secure and authorized access to the data.
Where can shortcuts be created?
In both lakehouses and KQL databases.
This versatility allows for broader data integration options.
True or False: Shortcuts appear as a folder in the lake.
True.
This structure allows for organized data management within the lakehouse.
What is the main role of data transformations in the data loading process?
Most data requires transformation before it can be loaded into tables.
What tools can be used to transform and load data?
The same tools used to ingest data can also transform and load data.
What is a Delta table?
A table stored in the Delta Lake format, which adds ACID transaction support; transformed data can be loaded into the lakehouse either as files or as Delta tables.
Who favors notebooks for data engineering tasks?
Data engineers familiar with different programming languages including PySpark, SQL, and Scala.
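A short notebook-style PySpark transformation; the column names and target table are hypothetical:

```python
from pyspark.sql import functions as F

# Typical notebook transformation: derive a column, filter bad rows,
# then load the result into a Delta table for downstream use.
orders = spark.read.table("stg_orders")        # hypothetical staging table
transformed = (
    orders
    .withColumn("order_total", F.col("quantity") * F.col("unit_price"))
    .filter(F.col("order_total") > 0)
)
transformed.write.format("delta").mode("overwrite").saveAsTable("orders_gold")
```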
What interface do Dataflows Gen2 use?
The Power Query interface.
What do pipelines provide in the ETL process?
A visual interface to perform and orchestrate ETL processes.
How complex can pipelines be?
Pipelines can be as simple or as complex as needed.
What is required for data to be used after ingestion?
Data must be transformed and loaded.
What do Fabric items provide for organizations?
The flexibility to choose the ingestion and transformation tools (pipelines, notebooks, or Dataflows Gen2) that best fit each organization's needs.
What tools can data scientists use for exploring and training machine learning models?
Notebooks or Data Wrangler.
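For example, a quick exploratory step in a notebook might pull a small sample into pandas (the table name is hypothetical), which is the kind of step Data Wrangler assists with from its UI:

```python
# Pull a small sample of a lakehouse table into pandas for quick exploration.
sample = spark.read.table("orders_gold").limit(1000).toPandas()
print(sample.describe())
print(sample.isna().sum())   # quick look at missing values
```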
What can report developers create using the semantic model?
Power BI reports.
What can analysts use the SQL analytics endpoint for?
To query, filter, aggregate, and explore data in lakehouse tables.
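As one hedged sketch, the SQL analytics endpoint can be queried like a SQL Server endpoint, for example from Python with pyodbc; the server, database, and table names below are placeholders:

```python
import pyodbc

# Connect to the lakehouse SQL analytics endpoint (read-only); the
# connection string values are placeholders.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<sql-analytics-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=<lakehouse-name>;"
    "Authentication=ActiveDirectoryInteractive;"
)

# Query, filter, and aggregate lakehouse tables with T-SQL.
cursor = conn.cursor()
cursor.execute(
    "SELECT item, SUM(quantity) AS total_qty "
    "FROM orders_clean GROUP BY item ORDER BY total_qty DESC"
)
for row in cursor.fetchall():
    print(row.item, row.total_qty)
```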
What is the benefit of combining Power BI with a data lakehouse?
You can implement an end-to-end analytics solution on a single platform.
Fill in the blank: After data is ingested, transformed, and loaded, it’s ready for _______.
others to use.
True or False: Dataflows Gen2 are excellent for developers familiar with SQL only.
False. Dataflows Gen2 use the Power Query interface, so they best suit users familiar with Power Query rather than SQL.