DP-600 Part 1 Flashcards
How to use the on-premises data gateway
- Install the data gateway on the on-premises server
- In Fabric, create a new on-premises data gateway connection
- Use the gateway in a dataflow or data pipeline to get data into Fabric
Fabric Admin Portal - Tenant Settings
- Allows users to create Fabric items
- Allows users to create workspaces
- Whole host of security features
- Allow service principal access to Fabric APIs
- Allow Git integration
- Allow Copilot
Some settings can be enabled for the entire organization, for specific security groups, or for everyone except certain security groups. Other settings are simply enabled or disabled tenant-wide
Fabric Admin portal - Capacity Settings
- Create new capacities
- Delete capacities
- Manage capacity permissions
- Change the size of the capacity
Where to increase the SKU of the capacity
Go to Admin portal > Capacity settings and click through to Azure to update your capacity
Structure of a Fabric implementation and where admin happens
- Tenant: Fabric Admin Portal (Tenant Settings)
- Capacity: Azure Portal and Fabric Admin Portal (Capacity Settings)
- Workspace: Workspace settings (in the workspace)
- Item level: data warehouse, lakehouse, semantic model
- Object level: individual objects such as tables and views (e.g. dbo.customer)
Capacity administration tasks in Fabric capacity settings
- Enable disaster recovery
- View capacity usage report
- Define who can create workspaces
- Define who is a capacity administrator
- Update Power BI connection settings from/to this capacity
- Permit workspace admins to size their own custom Spark pools based on workspace compute requirements
- Assign workspaces to the Capacity
Workspace administrator settings
- Edit the license for the workspace (Pro, PPU, Fabric, Trial, etc.)
- Configure Azure connections
- Configure Azure DevOps connection (Git)
- Setup workspace identity
- Power BI settings
- Spark settings
What is the XMLA endpoint
The XMLA endpoint is essentially a gateway that lets external tools communicate with the data stored in Microsoft Fabric. This feature is particularly useful for those who need more control or prefer working with tools they are already comfortable with, like SSMS or Excel.
Difference between XMLA and other Fabric loading options like mirroring, Copy data activity, etc.
XMLA: ETL (transform data using tools you’re familiar with outside of Microsoft Fabric and then load it into the lakehouse)
Others: ELT
load raw data into Fabric and then transform it using tools within the platform
Dataflow vs Data Pipeline
- Dataflows are for straightforward ETL processes and data preparation, with a focus on user-friendly transformation of data for analytics.
- Data Pipelines are for more complex orchestration and management of data workflows, handling multiple steps and dependencies, often in an ELT scenario. They are more suited for technical users who need to automate and manage comprehensive data workflows.
Shortcut vs mirroring in Fabric
Shortcut: a reference to data that already exists in OneLake or in supported external storage; no copy of the data is made
Mirroring: process that creates a replicated copy of data in OneLake. Useful when you need a local copy of data for performance reasons, redundancy, or to ensure that your operations are not impacted by the availability or performance of the original data source.
What is Azure Blob Storage
cloud-based storage service provided by Microsoft Azure, designed to store large amounts of unstructured data. “Blob” stands for Binary Large Object and refers to data types like images, videos, documents, and other types of unstructured data.
What determines the number of capacities required
- Compliance with data residency regulations (e.g. if some data must reside in the EU and other data in the US, separate capacities in each region are required)
- Billing preference within the organization
- Segregating by workload type (e.g. data engineering vs. business intelligence)
- Segregating by department
What determines the required sizing of a capacity
- Intensity of expected workloads (high volume of data ingestion)
- Heavy data transformation (i.e. Spark)
- The higher the SKU, the more expensive (budget)
- Can you afford to wait? (smaller SKUs queue or throttle sooner under heavy load)
- Whether you need access to F64+ features (e.g. Copilot requires F64 or above)
Options of data ingestion
- shortcut: ADLS Gen 2, Amazon S3, Google Cloud Storage or Dataverse
- database mirroring: Azure SQL, Azure Cosmos DB, Snowflake
- ETL - dataflow: on-premises SQL Server
- ETL - data pipeline: on-premises SQL Server
- ETL - notebook
- Eventstream: real-time events
Other: ETL by dataflow, data pipeline or notebook
*The above shows the preferred options and the possibilities that are open
Data Ingestion requirement
Location of the data
- On-premises data gateway: if the data lives in an on-premises SQL Server (or other on-premises source)
- VNet data gateway: if the data lives in an Azure virtual network or behind a private endpoint
- Fast copy
Volume of the data
- low (megabytes per day): standard ingestion is sufficient; fast copy and staging are not needed
- medium (gigabytes per day): fast copy and staging
- high (many GB or terabytes per day): fast copy and staging
Difference between Virtual network data gateway and On-premises data gateway
Virtual network data gateway: used when your data is stored within an Azure Virtual Network (VNet). Enables secure connections between cloud services (like Power BI) and data sources that are inside an Azure VNet.
On-premises data gateway: used when data is stored outside of Azure, such as on your local network or in another cloud provider's environment (AWS, Google Cloud), provided you have direct network connectivity (VPN or ExpressRoute) to those environments. Enables secure connections between cloud services (like Power BI) and data sources that are not within Azure
Data Storage Options
- Lakehouse
- Warehouse
- KQL database
Deciding factors for data storage
Data type:
- Lakehouse: structured, semi-structured, and/or unstructured
- Relational/structured: lakehouse or warehouse
- Real-time/streaming: KQL database
Skills exist in the team:
- T-SQL: data warehouse
- Spark: lakehouse
- KQL: KQL database
The admin portal can only be accessed by
Someone with a Fabric license and one of the following roles:
- Global admin
- Power Platform admin
- Fabric admin
Toby creates a new workspace with some Fabric items to be used by Data Analysts. Toby creates a new security group called Data analyst. He includes himself as a member of this security group. Toby gives the Data analyst security group the Viewer role in the workspace. What workspace role does Toby have?
Admin. Since he is the creator of the workspace, his admin role supersedes the viewer role
Toby wants to delegate some of the management responsibilities in the workspace. He wants to give this person the ability to share content within the workspace and invite new Contributors to the workspace, but not add new Admins to the workspace. Which role should Toby give this person?
Member
You have the Admin role in a workspace. Sheila is a data engineer in your team. Currently she has no access to the workspace. Sheila needs to update a data transformation script in a PySpark notebook. The script gets data from a Lakehouse table, cleans it and then writes it to a table in the same Lakehouse. You want to adhere to the principle of least privilege. What actions should you take to enable this?
Share the lakehouse data with ReadAll Spark Data permission and share the Notebook with Edit permission
You have admin role in a workspace. You want to pre-install some useful Python packages to be used across all notebooks in your workspace. How do you achieve this?
Create an environment, install the packages in the environment and then go to workspace settings > spark settings and set the default environment.
About Domains
It is a way of logically grouping together all the data in an organization that is relevant to a particular area or field.
To group data into domains, workspaces are associated with domains!!! When a workspace is associated with a domain, all the items in the workspace are also associated with the domain and they receive a domain attribute as part of their metadata.
Domain Roles
- Fabric admin (or higher): fabric admins can create and edit domains, specify domain admins and domain contributors, and associate workspaces with domains. Fabric admins see all the defined domains on the Domains tab in the admin portal and they can edit and delete domains
- Domain admin: can only see and edit the domains they're admins of
- domain contributor: are workspace admins whom a domain or fabric admin has authorized to assign the workspaces they’re the admins of to a domain, or to change the current domain assignment.
When you define a domain for specified users and/or security groups, the following happens
- The system scans the organization’s workspaces. When it finds a workspace whose admin is a specified user or member of a specified security group: if the workspace already has a domain assignment, it is preserved. The default domain doesn’t override the current assignment. If the workspace is unassigned, it is assigned to the default domain.
- After this, whenever a specified user or member of a specified security group creates a new workspace, it is assigned to the default domain.
The specified users and/or members of the specified security groups generally automatically become domain contributors of workspaces that are assigned in this manner
Delta vs Parquet
Parquet: a way to store data in a very organized and efficient manner. Great for big data tools like Apache Spark and Hadoop because it helps save space and speeds up data retrieval. However, once you write a parquet file, you can't change it. If you need to update it, you have to create a whole new file.
Delta is the upgraded version of parquet. It allows you to make changes to your data without creating new files every time. So if you need to update or delete something, you can do it easily. Handy for real-time applications
Relating to microsoft fabric:
- Using Parquet files: getting the benefits of efficient storage and fast queries
- Using Delta files: gain the ability to handle changes in your data more flexibly, which is great for applications that need to adapt quickly to new information.
According to the Spark definition, a managed table is a
table that is stored in the Fabric Tables section, where both the data and the metadata are managed by Spark
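A minimal sketch of creating a managed table from a Fabric notebook, assuming the default lakehouse is attached; the table name and sample rows are hypothetical:

```python
# Sketch: create a managed Delta table in the attached lakehouse.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created as `spark` in Fabric notebooks

df = spark.createDataFrame(
    [(1, "Widget", 9.99), (2, "Gadget", 19.99)],
    ["id", "product", "price"],
)

# saveAsTable registers the table under the lakehouse Tables section.
# Spark manages both the metadata and the Delta data files, so
# DROP TABLE removes the data as well as the metadata.
df.write.format("delta").mode("overwrite").saveAsTable("products")
```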
Deployment rules can be implemented to change things like
The default lakehouse (for a notebook) at different stages
There are other ways to manage deployment besides deployment pipelines with development, test/staging, and production stages
- managed through branching
- managed through Azure DevOps Pipelines (YAML templates)
- for semantic models, you can do it using the XMLA endpoint
In an Azure DevOps repo, the main branch is 'protected' (needs approval before any changes are merged into it). The repo contains one PBIP. You have to update the title in the report and merge these changes into the main branch. In which order should you carry out the following tasks?
- Clone the repository to your local machine
- Checkout a new feature branch from the MAIN branch
- Make the required changes to the report
- Commit and push the feature branch
- Open a pull request in Azure Repos
- Wait for approval, then merge into the main
You want to deploy a semantic model using the XMLA endpoint. Where can you go to find the XMLA endpoint to set up a connection with a third-party tool?
Go to the workspace settings for the workspace you want to deploy your model to
Different types of power bi files
- pbix: standard
- pbit: template
- pbip: track changes in Git for version control
What is the fabric capacity metrics app
Observe capacity utilisation trends to determine what processes are consuming CUs and whether any throttling is occurring
What is a DMV (dynamic management view)
Retrieves information about the current state of the data warehouse
sys.dm_exec_connections
Returns info about data warehouse connections
sys.dm_exec_sessions
Returns information about authenticated sessions
sys.dm_exec_requests
Returns information about active requests and provide details about SQL commands running in the data warehouse
Which DMVs do the Admin, Member, Contributor, and Viewer roles have permission to query?
Admin: all three
Member, Contributor, Viewer: all except sys.dm_exec_connections
3 Query Insights Views:
- queryinsights.exec_requests_history: details of each completed SQL query
- queryinsights.long_running_queries: details of queries with long execution times
- queryinsights.frequently_run_queries: details of frequently run queries
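A hedged sketch of querying the DMVs and Query Insights views above over the warehouse SQL connection string with pyodbc; the server and database values are placeholders, and the same statements can simply be run in the Fabric SQL query editor instead:

```python
# Sketch only: query warehouse DMVs / Query Insights views from Python.
# Requires the ODBC Driver for SQL Server on the client; connection values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-warehouse.datawarehouse.fabric.microsoft.com;"  # placeholder SQL endpoint
    "Database=Warehouse1;"                                       # placeholder warehouse name
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# Active requests currently running in the warehouse (DMV)
cursor.execute("SELECT session_id, status, command, start_time FROM sys.dm_exec_requests;")
print(cursor.fetchall())

# History of completed SQL queries (Query Insights view)
cursor.execute("SELECT TOP 10 * FROM queryinsights.exec_requests_history;")
print(cursor.fetchall())
```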
In Power BI Desktop, you enable incremental refresh for the table and load only one week's worth of data. You publish the semantic model to Workspace 1.
Due to the size of the semantic model, you need to bootstrap the initial full load.
What can you use to create the partitions in the Power BI service without processing them?
Use Tabular Editor to run the Apply Refresh Policy command on a table that has an incremental refresh policy defined in Power BI Desktop. This will create the partitions based on the policy but does not process them. This method is useful when working with very large datasets where the initial full load can take many hours
You are developing a large Microsoft Power BI semantic model that will contain a fact table. The table will contain 400 million rows.
You plan to leverage user-defined aggregations to speed up the performance of the most frequently run queries.You need to confirm that the queries are mapped to aggregated data in the tables.
Which two tools should you use? Each correct answer presents part of the solution.
SQL Server Profiler and DAX Studio: both can detect whether queries were answered from the in-memory cache by the storage engine or pushed to the data source by DirectQuery
You have Azure Databricks tables and a Fabric lakehouse.
You need to create a new Fabric artifact to combine data from both architectures. The solution must use data pipelines for the Azure Databricks data and shortcuts for the existing Fabric lakehouse.
What Fabric artifact should you create?
A lakehouse. Only a lakehouse can create shortcuts to other lakehouses; a Fabric data warehouse can use data pipelines but cannot use shortcuts!!!
You have a Fabric workspace that contains a lakehouse named Lakehouse1.
You need to create a data pipeline and ingest data into Lakehouse1 by using the Copy data activity.
Which properties on the General tab are mandatory for the activity?
Name only
You have a Fabric workspace that contains a set of Dataflow Gen2 queries.
You plan to use the native Dataflow Gen2 refresh scheduler to configure the queries to refresh as often as possible.
What is the fastest refresh interval that can be configured?
The native refresh scheduler for dataflows, just like semantic models, can be scheduled every 30 minutes.
You have a Fabric warehouse named Warehouse 1.
You discover a SQL query that performs poorly, and you notice that table statistics are out of date.
You need to manually update the statistics to improve query performance for the table.
Which column statistics should you update?
When manually creating or updating statistics to optimise query performance, you should focus on columns used in JOIN, ORDER BY, and GROUP BY clauses
How can you bring data into fabric
- Data ingestion: Dataflow, data pipeline, notebook, eventstream
- Shortcuts: external (amazon s3, adls) and internal (lakehouse, warehouse, kql tables)
- Database mirroring: Snowflake, Azure Cosmos DB, Azure SQL
Ingest data with a dataflow
Pros: no-code/low-code, performs ETL, can access on-premises data, can get multiple datasets at once (but better to space them out for data validation), can upload raw files or static files
Cons: struggles with large datasets, difficult to implement data validation, cannot pass in external parameters
Ingest data with data pipeline
Pros: able to ingest large datasets, import cloud data, when you need control-flow logic, trigger a wide variety of actions in Fabric (and outside of Fabric) like dataflows, notebooks, stored procs, KQL scripts, webhooks, Azure Functions, Azure ML, Azure Databricks
Cons: cannot transform data natively but can embed notebooks or dataflows, no ability to upload local files, does not currently work cross-workspace
Ingest data with notebook
Pros: extraction from APIs (using the Python requests library or similar), ability to use client libraries, good for code reuse, good for data validation and data quality testing of incoming data, fastest in terms of data ingestion (and most efficient for CU spend)
Cons: when you don't have Python capability in your organisation, and when you want to write to a data warehouse
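A hedged sketch of notebook-based ingestion along these lines; the API URL, response shape, and table name are all hypothetical:

```python
# Sketch: pull data from a REST API and land it as a Delta table in the lakehouse.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created as `spark` in Fabric notebooks

response = requests.get("https://api.example.com/v1/orders")  # placeholder endpoint
response.raise_for_status()
records = response.json()  # assumes the API returns a JSON array of objects

df = spark.createDataFrame(records)

# Simple data-quality gate before writing (one of the strengths of notebook ingestion)
assert df.count() > 0, "API returned no rows"

df.write.format("delta").mode("append").saveAsTable("orders_bronze")  # placeholder table
```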
What is monitoring hub used for
to track fabric items like semantic model
What is the capacity metrics app used for
Compute: capacity unit (CU) spend, time, overages, and a breakdown by workspace and Fabric item.
Storage: GB storage by workspace
What is performance monitoring and which data ingestion is it used for?
- Used for dataflows
- Provides a high level summary of a dataflow run which is available in the monitoring hub
- Lower-level data and specific understanding of a dataflow is available in the refresh history of a particular dataflow. (Inspect error messages and get a breakdown of the different sub-activities being performed by the dataflow)
Performance Optimization / Features
- Staging: when enabled, it's useful when you have huge data volumes and require lots of transformations. Not useful for small datasets
- Fast Copy: similar to data pipeline, used when struggling with performance
- DMV: receive live SQL Query lifecycle insights
Problem with delta tables and what are the ways to solve delta table problems
Context: Fabric is built on Delta/parquet files. While Delta is a great format because updates are tracked against the same table, frequent writes can leave many small files behind, which leads to poor performance and bloated storage size.
Solution: V-Order, OPTIMIZE, VACUUM, coalesce, and repartition are some of the options that help reduce these problems with Delta tables
How do I get a summary of a Delta table's files and how it is partitioned?
%%sql
-- my_table is a placeholder; returns file count, size in bytes, and partition columns
DESCRIBE DETAIL my_table
V-Order purpose
reduce file size and speed up reads by the Fabric engines (optimised sorting and compression applied when parquet files are written)
- to check if it's enabled: spark.conf.get('spark.sql.parquet.vorder.enabled')
- to enable/disable: spark.conf.set('spark.sql.parquet.vorder.enabled', 'true' / 'false')
Optimise purpose and features
It is idempotent (meaning it won't re-optimise files that have already been optimised)
- performs bin compaction by joining small files into larger files
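A minimal sketch, assuming a notebook attached to the lakehouse and a placeholder table name; the VORDER keyword is the Fabric-specific variant:

```python
# `spark` is the session pre-created in a Fabric notebook; "sales" is a placeholder table.
spark.sql("OPTIMIZE sales")         # bin compaction: merges small files into larger ones
spark.sql("OPTIMIZE sales VORDER")  # Fabric-specific: also rewrites the files with V-Order applied
```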
VACUUM purpose
removal of files that are no longer referenced by a delta table
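A one-line sketch with a placeholder table name; the retention window shown is the 7-day default expressed in hours:

```python
# Remove data files no longer referenced by the Delta log and older than the retention window.
spark.sql("VACUUM sales RETAIN 168 HOURS")  # 168 hours = 7 days
```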
COALESCE
Spark method to reduce the number of partitions in a DataFrame without a full shuffle. Let's say you have 100 partitions; you can coalesce them into 10 partitions
Repartition
Involves breaking up existing partitions to create new ones, which can be more or fewer than the original. Repartitioning is an expensive operation because it involves a full shuffle (unlike coalesce)
- always rewrites the files regardless of whether they have been optimised previously or not
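A short sketch contrasting the two on a placeholder DataFrame:

```python
# "sales" is a placeholder table; `spark` is the notebook session.
df = spark.read.table("sales")

# coalesce: merges existing partitions without a full shuffle; can only reduce the count
df_fewer = df.coalesce(10)

# repartition: full shuffle; the target count can be higher or lower than the original
df_reshuffled = df.repartition(50)

df_fewer.write.format("delta").mode("overwrite").saveAsTable("sales_compacted")  # placeholder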
V-Order increases the ___________ time but reduces the ___________ time
increases the write time but reduces the read time (i.e. improves read performance)
What level of access do you need to run Query Insights?
Contributor level (or higher). Viewer is not possible
Benefits of Direct Lake
- reads the parquet files directly (no import or duplication of data)
- for near-real-time needs, data stored in one Fabric data store, and large dataset sizes
- the data must be in a lakehouse or warehouse
What is composite model used for
for many-to-many relationships without the need for bridge tables. For example: for fact tables use direct query and for dim tables use import mode
Implicit vs Explicit measure
- Implicit measures in Power BI are measures that are automatically created by Power BI based on the data model. Power BI generates implicit measures based on the aggregation used in the visualizations. For example, if you create a bar chart and drag a field to the Value area, Power BI automatically creates a sum aggregation for that field, which becomes an implicit measure. Implicit measures are also created when you use the Quick Measure feature in Power BI.
- Explicit measures in Power BI are measures that are created by the user using DAX formulas. Explicit measures are highly customizable and can be used to create more complex calculations.
You have a Fabric tenant that contains a semantic model. You need to prevent report creators from populating visuals by using implicit measures. What are two tools that you can use to achieve the goal?
-Microsoft Power BI Desktop: allows you to control and manage how measures are used within your reports. By carefully defining and using explicit measures within your data model, you can ensure that report creators use only these predefined measures instead of creating implicit measures automatically
-Tabular Editor: powerful tool for managing and editing Power BI and Analysis service tabular models. It allows you to enforce best practices, such as disabling implicit measures, by modifying model’s properties and ensuring that only explicit measures are available for use in reports.
Transformations that support query folding
- removing or renaming columns
- merging foldable queries that are based on the SAME source
- appending foldable queries based on the SAME source
- numeric calculations
- joins
- pivot and unpivot
Transformations that do not support query folding
- merging or appending queries that are based on different sources
- using some functions while adding custom columns that do not have a counterpart in sql
- adding columns with complex logic. these refer to functions that do not have equivalent functions in the data source.
importance of query folding
- increased efficiency
- optimisation of cpu usage
- improved data security
What is the visualisation command in a Microsoft Fabric notebook to see a summary of the data
- display()
- display(df, summary=True) to check the statistics summary of a given Apache Spark DataFrame. The summary includes the column name, column type, unique values, and missing values for each column.
- Microsoft Fabric also supports displayHTML() option
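For example, in a notebook cell (the table name is a placeholder):

```python
df = spark.read.table("sales")  # placeholder table

display(df)                # interactive table/chart view of the DataFrame
display(df, summary=True)  # adds per-column statistics: type, unique values, missing values
```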
How to Embed a Power BI report in a notebook
- import powerbiclient
Capabilities:
- you can render an existing Power BI report (still in preview)
- create report visuals from a pandas Dataframe
- create report visuals from a spark dataframe
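A hedged sketch of the quick-visualize flow with the powerbiclient package (import names as commonly documented; check the package docs for your version):

```python
# Sketch: auto-generate Power BI visuals from a DataFrame inside a Fabric notebook.
from powerbiclient import QuickVisualize, get_dataset_config

pdf = df.toPandas()  # powerbiclient works with pandas DataFrames

qv = QuickVisualize(get_dataset_config(pdf))
qv  # rendering the object in the cell output shows the generated report visuals
```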
What does setting the XMLA Endpoint to read write do?
Allows users not only to read the data from the Power BI service but also to write back to it. Necessary for users to create and publish custom Direct Lake semantic models. Without write access, users would be unable to publish or update their models, which is a critical part of managing semantic models.
What does allowing XMLA Endpoint and Analyze in Excel with on-premises datasets to Enabled in the Tenant Settings do ?
We need to enable XMLA endpoints from the tenant settings to ensure that external tools can interact with the data models as and when necessary within the tenant's workspaces. The Analyze in Excel setting is just complementary to this setting in Fabric and is irrelevant here
RLS only applies to queries in _____ or _____ in Fabric. Power BI queries on a warehouse in Direct Lake mode will fall back to _____ to abide by RLS
- Applies to queries on warehouse or sql analytics endpoint
- will fall back to direct query mode
Query the $System.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS dynamic management view (DMV). What does this do?
Provides information about the column segments held in memory for the tables in a semantic model. Helps identify the frequently used columns that are loaded into memory
Clone table in microsoft fabric
Microsoft fabric offers the capability to create near-instantaneous zero-copy clones with minimal storage costs.
- table clones facilitate development and testing processes by creating copies of tables in lower environments
- provides consistent reporting and zero-copy duplication of data for analytical workloads and ML modeling and testing
- provide the capability of data recovery in the event of a failed release or data corruption by retaining the previous state of data.
- help create historical reports that reflect the state of data as it existed as of a specific point-in-time in the past.
What is zero-copy clone
creates a replica of the table by copying the metadata, while still referencing the same data files in OneLake. The metadata is copied while the underlying data of the table stored as parquet files is not copied. The creation of a clone is similar to creating a table within a Warehouse in Microsoft Fabric
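A sketch of the clone statement run over the warehouse SQL endpoint with pyodbc; connection values are placeholders, and the same T-SQL can be run directly in the warehouse query editor:

```python
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-warehouse.datawarehouse.fabric.microsoft.com;"  # placeholder
    "Database=Warehouse1;"                                       # placeholder
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()

# Zero-copy clone: only metadata is copied; the parquet files in OneLake are shared
cursor.execute("CREATE TABLE dbo.DimCustomer_test AS CLONE OF dbo.DimCustomer;")
conn.commit()
```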
Permissions to create a table clone
- users with the Admin, Member, or Contributor workspace roles can clone the tables within the workspace. The Viewer workspace role cannot create a clone
- SELECT permission on all the rows and columns of the source of the table clone is required
- users must have CREATE TABLE permission in the schema where the table clone will be created
Table clone inheritance
- inherits OLS from the source table of the clone
- inherits RLS and dynamic data masking
- all attributes that exist at the source, whether in the same schema or not
- inherits primary and unique key
Where can you do delta lake table optimisation and V-order?
ONLY IN THE LAKEHOUSE
You have a Fabric tenant that contains a warehouse.
You use a dataflow to load a new dataset from OneLake to the warehouse.
You need to add a PowerQuery step to identify the maximum values for the numeric columns.
Which function should you include in the step?
Use Table.Profile instead of Table.Max because Table.Max returns the row in a table that contains the maximum value for a specified column, rather than providing the maximum values for all numeric columns.
Where to enable XMLA read-write
By default, Premium capacity or Premium Per User semantic model workloads have the XMLA endpoint property setting enabled for read-only. This means applications can only query a semantic model. For applications to perform write operations, the XMLA Endpoint property must be enabled for read-write.
- Go to Settings > Admin Portal > Capacity Settings > Power BI premium > capacity name
- Expand workloads. In the XMLA endpoint setting, select read write.
TRUNCATE TABLE
removes all rows from a table, but the table structure and its columns, constraints, indexes, and so on, remain.
Query folding is not applicable to
CSV files (and other flat files), because there is no source query engine to push the transformations to
Are parquet files compressed?
By default, yes; there is no need to enable it. When writing parquet files, you can specify the desired compression codec to further optimise storage and performance
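A sketch of choosing a codec explicitly when writing from a notebook; the output path under the lakehouse Files area is a placeholder:

```python
df = spark.read.table("sales")  # placeholder table

# snappy is the usual default; gzip/zstd trade more CPU for smaller files
df.write.mode("overwrite").parquet("Files/exports/sales_snappy", compression="snappy")
```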