Fabric Flashcards

1
Q

If you have a local file and you want to upload it to lakehouse, what do you do?

A

local file upload with lakehouse explorer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

You need to recommend which Fabric architecture to create and the process for integrating the accounting data into Fabric. The solution must minimize administrative effort and costs.

Developers must be able to build reports from the lakehouse or warehouse based on the sales data. The developers must also be able to do ad-hoc analysis of the legacy data at the end of each year.

A

Because the legacy accounting data is large and infrequently accessed, there is no need to move the data into a Fabric workspace. Use a shortcut. Shortcuts enable the querying of remote data without having to move the data. Ingest the Sales Data in a Fabric lakehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What does a shortcut do/when to use it?

A

Shortcuts enable the querying of remote data without having to move the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

When to use the Copy Data Activity?

A

The Copy data activity provides the best performance when copying data from large datasets.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What’s a Type 1 Slowly changing dimension (SCD)?

A

Type 1 SCD: Overwrites existing data, doesn’t keep history.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What’s a Type 2 Slowly changing dimension (SCD)?

A

Type 2 SCD: Adds new records for changes, keeps full history for a given natural key.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

From Table X to table Y = what join?

A

Left outer join x to y

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What’s a Fabric Tenant?

A

Tenant is the most fundamental part of the structure of Fabric. A tenant is a single instance of Fabric for an organization and is aligned with a Microsoft Entra ID. One Tenant per organization.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Should semantic models be in separate workspaces?

A

They can be in the same workspace, but for good data governance, they probably should be in separate workspaces.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

When do you use WHERE and when do you use HAVING?

A

You use WHERE in order to filter rows before any groupings are made. HAVING filters after the GROUP BY clause is applied. That’s the context clue. “Groups where total” = HAVING

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

If you see TREATAS and USERELATIONSHIP, what are your context clues? What do they do?

A

Use TREATAS if there’s no pre-existing relationship. It creates relationships on the fly. USERELATIONSHIP relies on an existing defined relationship.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Power BI capacity size is what?

A

Set by the admin

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What does RELATED mean in DAX and how is it used?

A

The RELATED function retrieves the value from one side of a relationship. The RELATED function acts similar to a VLOOKUP. It get a related value from a secondary table to go into the first, provided there is an active relationship.

TABLE_FACT = NEW_COLUMN = RELATED(TABLE_DIM[DESIRED_COLUMN])

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What does RELATEDTABLE do in DAX?

A

The RELATEDTABLE function retrieves a table of rows, given an established and active relationship. Useful in aggregations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

One entity in a table can correspond to many in another. One product can have multiple categories.

A

One to many relationship

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

each customer has one account or one badge number

A

one to one relationship

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

products can below to multiple categories and categories can contain multiple products

A

many to many relationship

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

When to use performance analyzer?

A

measuring report performance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

When to use DAX Studio?

A

optimize DAX queries

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

When to use Tabular Editor?

A

model structure optimization

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

You need to compare the sales amount and ensure that the statement displays the value from the previous year in the PreviousSalesAmount column. What function do you use?

A

The LAG() function accesses the data from a previous row in the same result set by using a given physical offset.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

While a DataFrame can be saved to the Files section of a lakehouse, it will not be considered a table.

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What’s an anti-join?

A

It means you’re excluding data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Languages and examples

A

DAX for PBI, T-SQL for SSMS, Pyspark for Notebooks, M for Gen2 and PowerQuery.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Query Folding?

A

Query Folding is a technique used in data processing and analytics to optimize query performance by combining or “folding” multiple steps in a given query into a single operation.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

You are planning a Fabric analytics solution.

You need to recommend a licensing strategy to support 10 Microsoft Power BI report authors and 600 report consumers.

The solution must use Dataflow Gen2 for data ingestion and minimize costs.

Which Fabric license type should you recommend?

A

Starting with the F64 license, report consumers can use a free per-user license. PPU is incorrect, because you cannot create non-Power BI items (in this case Dataflow Gen2) with PPU.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

XMLA endpoint

A

Just a way to connect. Like an API.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What does OPTIMIZE do?

A

OPTIMIZE command; it merges all changes into bigger, consolidated parquet files.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

What do shortcuts do?

A

Shortcuts in a lakehouse allow users to reference data without copying it. It unifies data from different lakehouses,

30
Q

You are planning a Fabric analytics solution for the following users:

2,000 Microsoft Power BI consumers without an individual Power BI Pro license.
32 Power BI modelers with an individual Power BI Pro license.
16 data scientists
You need to recommend a Fabric capacity SKU. The solution must minimize costs.

What should you recommend?

A

F64 is the smallest Fabric capacity (equivalent to a P1 Power BI Premium capacity) that supports premium Fabric workspaces and does not require Power BI report consumers to have individual Power BI Pro licenses. F2 and F32 are incorrect since they require that the 2,000 employees have individual Power BI Pro licenses to consume Power BI content. F2048 is incorrect since it is not the smallest capacity that meets the stated requirements.

31
Q

You use Microsoft Power BI Desktop to connect to data stored in a CSV file.

You need to use Power Query Editor to identify the percentage of valid records in a column before loading the data to a report.

Which Power Query option should you use?

A

A percentage of valid records in the column is displayed when you enable Column quality.

32
Q

RLS

A

A way to restrict data access for given users. Filters restrict data access at the row level, and you can define filters within roles. Restricts access to specific attributes. Can be static or dynamic.

33
Q

OLS

A

OLS enables restricting access to semantic model objects, such as tables, columns, and calculations based on these columns.

34
Q

RLS & OLS Limitations

A

You can’t mix RLS and OLS in the same role.

35
Q

good modeling practices

A

Strive to define fewer datasets (data models) with well-designed roles.

Strive to create fewer roles by using dynamic rules. A data-driven solution is easier to maintain because you don’t need to add new roles.

When possible, create rules that filter dimension tables instead of fact tables. It will help to deliver faster query performance.

36
Q

You need to recommend a Fabric capacity SKU that minimizes costs.

A

F64 is the smallest Fabric capacity

37
Q

You need to alter the table to add CustomerKey as a primary key. Which command should you run?

A

PRIMARY KEY is only supported when NONCLUSTERED and NOT ENFORCED are both used.

38
Q

What can the Admin capability/role do?

A

Admin capability can do everything

39
Q

What can the member role do?

A

Member can do all except add or remove admins and update or delete workspace. The only one who can remove is admin. The Member can add at or below their level.

Think member of inner circle. Supervisor?

40
Q

What can the contributor role do?

A

Contributors can’t add or subtract people, update or delete workspace, or allow others to reshare items.

they clock in. Trusted FTE. Grunt worker.

41
Q

What can the viewer role do?

A

Viewer can read info. Viewing information. View and Read, connect to SQL Endpoint, read data through shortcuts, and view execution output of pipelines. Can’t do writing or anything else though.

42
Q

why would adding index columns prevent query folding?

A

Query folding is the ability to translate the entire transformation into a single SQL query. Adding an index column into power query is adding an operation that cannot be translated into SQL.

43
Q

What is a pbip file?

A

pbip is a text based pbix file. Very important for version control.

44
Q

What can you do in tabular editor?

A

You can create semantic model in tabular editor and deploy via xmla endpoint.

45
Q

What is the XMLA endpoint and where can you find the endpoint?

A

the XMLA endpoint is just an API with a URL to the workspace or dataset. Endpoint is at workspace settings.

46
Q

What is the default for endpoint connectivity?

A

By default, endpoint connectivity is read-only. Read-Write allows you to edit things.

47
Q

What’s the limitation of using XMLA endpoint for PBI deployment?

A

That the PBIX file can’t be downloaded. Why? Once modified via the endpoint, the connection between the PBIX file and the data model changes.

48
Q

Can you create tables in the lakehouse?

A

Yes

49
Q

What is denormalization?

A

Denormalization is the process of combining database tables into larger tables. For example, combining customer and customer details table into a single table containing all information from both tables.

50
Q

What is a snowflaked dimension table?

A

A dimension table with a snowflake schema. A snowflake schema is a type of schema where dimension tables are further normalized.

51
Q

What is column profile feature?

A

Provides stat data about values in column.

52
Q

What is column quality feature?

A

Feature that shows percentage of valid records.

53
Q

What is difference between distinct and unique in terms of values? Use list [1,2,2,3] as an example.

A

In the given list, the distinct values are 1, 2, 3. In the same list, the unique values are 1 & 3 because 2 repeats.

Unique means must be different.

54
Q

Difference between DENSE_RANK() and RANK()?

A

Simply put, RANK() skips the number of positions after records with the same rank number. It makes it so that if two rows are tied for 2nd place, they both get rank 2, and the next rank will be 4.

DENSE_RANK() makes it so that if two rows are tied for 2nd place, they both get rank 2, and the next rank will be 3.

55
Q

If adding primary keys, what are needed keywords?

A

Non clustered means Key is stored separately instead of with records. Non enforced means uniqueness won’t be enforced. Fabric is only non clustered and non enforced. Reminder is both “non”.

56
Q

What is ALM Toolkit used for?

A

ALM Toolkit is used for schema comparison for semantic models.

57
Q

How to avoid getting overwhelmed when it comes to the case study?

A

My tip here is to skip ahead to the first question immediately, and then go back and start reading the case-study.

58
Q

What Is The Difference Between SQL GREATEST and MAX?

A

GREATEST can be used to go across the row, and MAX can be used to go down the columns, or compare values in the same column in different rows.

I had a problem where the answer was GREATEST but I could have made the mistake and picked MAX. Problem is defined as highest value from a list. The list is one of columns.

59
Q

What are characteristics of ALM Toolkit?

A

A schema compare tool for Power BI models and semantic models.

You can perform straightforward deployment across environments and retain incremental refresh historical data.

You can diff and merge metadata files, branches, and repos.

Keywords:
A schema compare tool.
You can perform deployment.
You can diff and merge metadata files

60
Q

What’s a field parameter?

A

parameter that allows for dynamic visuals through selection. The parameter is on the axis of the graph in addition to the parameter list. This is what makes the dynamic aspect.

By the selecting a value from a visual list, the axis on the graph changes, in turn changing everything else.

61
Q

Dynamic format string

A

Your text output is formatted based on another attribute or a condition. Hence dynamic.
Example is sales in diff currency should have amount reflect in said currency.

62
Q

What’s a Capacity?

A

A pool of resources

63
Q

What’s a Domain?

A

A way to group workspaces in Fabric

64
Q

Why is minimize implementation a key phrase in cases?

A

It should point you towards tighter integration. Azure Repos vs Git. Dataflows vs Stored Procedures.

65
Q

List Case Study Techniques:

A
  1. Write down question
  2. Highlight Key Information: When reading the case, quickly highlight or underline key points. This can help you focus on the most important details without getting bogged down. For example: where is the data? This would help on a data retrieval question.
  3. Summarize in Your Own Words: Briefly summarize the case in your mind or on paper. This can help you retain the critical information.
66
Q

What does Table.Profile do?

A

provides a statistical summary of the columns in a table, including the maximum values for numeric columns

67
Q

Display vs Show?

A

The show function is used to display the contents of a Spark DataFrame in a tabular format within the notebook.

It does not provide the interactive charting capabilities that display offers.

68
Q

What should external tools be an indictor of?

A

The phrase external tools should be pointing the test taker toward XMLA Endpoints then.

69
Q

Define a semantic model

A

A custom semantic model can be thought of as a detailed map or blueprint for your data.

70
Q

Direct Lake vs Direct Query:

A

Direct Lake mode is optimized for performance, loading data into memory from Delta tables.

DirectQuery is used when real-time data access is crucial. Supported data sources: Commonly used with SQL, OData, and other supported data sources.