Lakehouse Flashcards

1
Q

What is a lakehouse?

A

A lakehouse presents as a database and is built on top of a data lake using Delta format tables.

2
Q

What capabilities do lakehouses combine?

A

The SQL-based analytical capabilities of a relational data warehouse and the flexibility and scalability of a data lake.

3
Q

What types of data formats can lakehouses store?

A

All data formats, whether structured, semi-structured, or unstructured.

4
Q

What is the advantage of lakehouses being cloud-based?

A

They can scale automatically and provide high availability and disaster recovery.

5
Q

What processing engines do lakehouses use?

A

Spark and SQL engines.

6
Q

What is the schema-on-read format?

A

Data is organized in a schema-on-read format, meaning the schema is defined as needed rather than having a predefined schema.
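
As a quick illustration of schema-on-read, here is a minimal PySpark sketch; the file path and column names are assumptions for the example, not part of the flashcards.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()  # a Fabric notebook already provides this session

# The raw file carries no enforced schema; one is applied only when the data is read.
schema = StructType([
    StructField("product", StringType()),
    StructField("amount", DoubleType()),
])
df = spark.read.schema(schema).json("Files/raw/sales.json")

# The same file can be read again later with a different schema, or with schema inference.
df_inferred = spark.read.json("Files/raw/sales.json")
```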

7
Q

What does ACID stand for in the context of lakehouses?

A

Atomicity, Consistency, Isolation, Durability.

8
Q

What are the roles of different users in a lakehouse?

A

Data engineers, data scientists, and data analysts access and use data.

9
Q

What is the ETL process?

A

Extract, Transform, Load.

10
Q

What types of data sources can be ingested into a lakehouse?

A

Local files, databases, or APIs.

11
Q

What are Fabric shortcuts?

A

Links to data in external sources, such as Azure Data Lake Storage Gen2 or OneLake.

12
Q

What tools can be used to transform ingested data?

A

Apache Spark with notebooks or Dataflows Gen2.
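
For the notebook option, a minimal PySpark transformation sketch might look like the following; the table and column names (raw_sales, amount, order_date, sales_cleaned) are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # reuses the session a Fabric notebook provides

# Hypothetical raw table previously ingested into the lakehouse.
raw = spark.read.table("raw_sales")

# Typical light transformations: type casting, filtering, and a derived column.
cleaned = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount") > 0)
       .withColumn("order_year", F.year(F.to_date("order_date")))
)

# Write the result back to the lakehouse as a managed Delta table.
cleaned.write.format("delta").mode("overwrite").saveAsTable("sales_cleaned")
```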

13
Q

What is the purpose of Data Factory pipelines?

A

To orchestrate different ETL activities and land prepared data into the lakehouse.

14
Q

What familiar tool do Dataflows Gen2 utilize?

A

Power Query.

15
Q

What can you analyze using a lakehouse?

A

Data in the lakehouse tables, which you can query using SQL.

16
Q

What can be developed in Power BI using a lakehouse?

A

Reports.

17
Q

How is lakehouse access managed?

A

Through workspace roles or item-level sharing.

18
Q

What are sensitivity labels used for in lakehouses?

A

Classifying and protecting data, as part of Fabric's data governance features.

19
Q

True or False: Item-level sharing is best for granting access for read-only needs.

A

True.

20
Q

Fill in the blank: Lakehouses support _______ transactions through Delta Lake formatted tables.

A

ACID.

21
Q

What is a key benefit of using a lakehouse for analytics?

A

Scalable analytics solution that maintains data consistency.

22
Q

What three items are automatically created in your workspace when you create a new lakehouse?

A

The lakehouse, the Semantic model (default), and the SQL analytics endpoint.

The lakehouse item contains shortcuts, folders, files, and tables, and serves as a central hub for data management.

23
Q

What does the Semantic model (default) provide for Power BI report developers?

A

An easy data source.

The Semantic model simplifies data representation for reporting.

24
Q

What is the purpose of the SQL analytics endpoint in a lakehouse?

A

Allows read-only access to query data with SQL.

This endpoint enables SQL-based interaction with the lakehouse data.
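
Purely as an illustration, an analyst could query the endpoint from Python with pyodbc and the Microsoft ODBC Driver for SQL Server; the server, database, and table names below are placeholders, and the authentication option depends on your environment.

```python
import pyodbc  # requires the Microsoft ODBC Driver for SQL Server to be installed

# Placeholder connection details: copy the real ones from the lakehouse's
# SQL analytics endpoint properties in Fabric.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-sql-analytics-endpoint>;"
    "Database=<your-lakehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)

# The endpoint is read-only: SELECT queries work, but writes are not allowed.
cursor = conn.cursor()
cursor.execute("SELECT TOP 10 * FROM sales_cleaned")  # hypothetical table
for row in cursor.fetchall():
    print(row)
conn.close()
```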

25
Q

In what two modes can you work with data in the lakehouse?

A

Lakehouse mode and SQL analytics endpoint mode.

Each mode offers different capabilities for managing and querying data.

26
Q

What is the first step in the ETL process for a lakehouse?

A

Ingesting data into your lakehouse.

This step is crucial for preparing data for analysis.

27
Q

List the methods to ingest data into a lakehouse.

A
  • Upload local files
  • Dataflows Gen2
  • Notebooks (see the sketch below)
  • Data Factory pipelines

Each method has its own use case and benefits.
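
As a sketch of the notebook method, here is one way to pull records from a hypothetical REST API and land them as a table; the URL and table name are invented for the example.

```python
import requests
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

# Hypothetical REST API returning a JSON array of records.
records = requests.get("https://example.com/api/orders").json()

# Convert to a Spark DataFrame and land it as a Delta table in the lakehouse.
df = spark.createDataFrame([Row(**r) for r in records])
df.write.format("delta").mode("append").saveAsTable("orders_raw")
```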

28
Q

What should you consider when ingesting data to determine your loading pattern?

A

Whether to load all raw data as files or use staging tables.

This decision impacts performance and data processing efficiency.
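
A hedged sketch of the staging-table pattern (all names are hypothetical, and the final table is assumed to already exist): raw files are kept as-is, loaded into a staging Delta table, then merged into the final table.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Step 1: land raw data as files, kept unmodified for auditing and reprocessing.
raw = spark.read.option("header", True).csv("Files/landing/customers.csv")

# Step 2: load the raw data into a staging Delta table.
raw.write.format("delta").mode("overwrite").saveAsTable("stg_customers")

# Step 3: upsert staging rows into the final table on a hypothetical key column.
target = DeltaTable.forName(spark, "dim_customers")  # assumed to exist already
(
    target.alias("t")
    .merge(spark.read.table("stg_customers").alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```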

29
Q

What can Spark job definitions be used for in a lakehouse?

A

To submit batch/streaming jobs to Spark clusters.

This allows for processing large volumes of data efficiently.
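
A Spark job definition runs a script rather than an interactive notebook; a minimal batch script, with hypothetical table names, might look like this.

```python
# batch_job.py - a minimal script that could be attached to a Spark job definition.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

if __name__ == "__main__":
    spark = SparkSession.builder.appName("nightly-aggregation").getOrCreate()

    # Hypothetical table produced by an earlier ingestion/transformation step.
    sales = spark.read.table("sales_cleaned")

    # Aggregate and overwrite a summary table on each scheduled run.
    summary = sales.groupBy("order_year").agg(F.sum("amount").alias("total_amount"))
    summary.write.format("delta").mode("overwrite").saveAsTable("sales_by_year")

    spark.stop()
```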

30
Q

What is the purpose of shortcuts in a lakehouse?

A

To integrate data while keeping it stored in external storage.

Shortcuts enhance data accessibility across different storage solutions.

31
Q

How are source data permissions and credentials managed when using shortcuts?

A

They are managed by OneLake.

This central management simplifies access control across data sources.

32
Q

What is required for a user to access data through a shortcut to another OneLake location?

A

The user must have permissions in the target location to read the data.

This ensures secure and authorized access to the data.

33
Q

Where can shortcuts be created?

A

In both lakehouses and KQL databases.

This versatility allows for broader data integration options.

34
Q

True or False: Shortcuts appear as a folder in the lake.

A

True.

This structure allows for organized data management within the lakehouse.

35
Q

What is the main role of data transformations in the data loading process?

A

To prepare data for use: most data requires transformation before it's loaded into tables.

36
Q

What tools can be used to transform and load data?

A

The same tools used to ingest data can also transform and load data.

37
Q

What is a Delta table?

A

A table stored in the Delta Lake format, which adds relational semantics and ACID transaction support; transformed data can be loaded either as files or as Delta tables.
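
To make the file-versus-table distinction concrete, here is a small sketch with illustrative names: the same DataFrame written once as plain Parquet files and once as a managed Delta table.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("sales_cleaned")  # hypothetical transformed data

# Option 1: load as files in the lakehouse Files area (no table metadata, no ACID guarantees).
df.write.mode("overwrite").parquet("Files/exports/sales_cleaned")

# Option 2: load as a Delta table in the Tables area (ACID transactions, SQL-queryable).
df.write.format("delta").mode("overwrite").saveAsTable("sales_final")
```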

38
Q

Who favors notebooks for data engineering tasks?

A

Data engineers familiar with different programming languages including PySpark, SQL, and Scala.

39
Q

What interface do Dataflows Gen2 use?

A

The Power Query interface.

40
Q

What do pipelines provide in the ETL process?

A

A visual interface to perform and orchestrate ETL processes.

41
Q

How complex can pipelines be?

A

Pipelines can be as simple or as complex as needed.

42
Q

What is required for data to be used after ingestion?

A

Data must be transformed and loaded.

43
Q

What do Fabric items provide for organizations?

A

The flexibility needed for every organization.

44
Q

What tools can data scientists use for exploring and training machine learning models?

A

Notebooks or Data Wrangler.

45
Q

What can report developers create using the semantic model?

A

Power BI reports.

46
Q

What can analysts use the SQL analytics endpoint for?

A

To query, filter, aggregate, and explore data in lakehouse tables.

47
Q

What is the benefit of combining Power BI with a data lakehouse?

A

You can implement an end-to-end analytics solution on a single platform.

48
Q

Fill in the blank: After data is ingested, transformed, and loaded, it’s ready for _______.

A

others to use.

49
Q

True or False: Dataflows Gen2 are excellent for developers familiar with SQL only.