Databricks Data Analyst Flashcards
Describe the key audience and side audiences for Databricks SQL.
The primary audience is data analysts; data scientists and data engineers are side audiences
Describe that a variety of users can view and run Databricks SQL dashboards as stakeholders.
Users can view and run dashboards without having access to anything else in the platform. Basically like viewers in Tableau
Describe the 4 benefits of using Databricks SQL for in-Lakehouse platform data processing.
1) Allows users to query and analyze data stored in data lakes and warehouses using SQL.
2) Built on Databricks Lakehouse platform, provides unified platform for data engineering, science, and analytics.
3) Provides high performance query engine optimized for big data workloads
4) Provides a range of tools and features for data processing, including visualization, transformation, and ML
Describe how to complete a basic Databricks SQL query.
Go to query editor, select warehouse, and run a SELECT statement
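A minimal sketch of such a query (catalog, table, and column names are placeholders):
SELECT region, SUM(amount) AS total_sales
FROM my_catalog.my_schema.orders
GROUP BY region
ORDER BY total_sales DESC;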
Identify the information displayed in the schema browser from the Query Editor page.
Schema browser allows you to view all data objects, such as databases, tables, columns, and data types. Used to explore the structure of data
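The same structural information can also be pulled with SQL; a quick sketch (schema and table names are placeholders):
SHOW TABLES IN my_schema;           -- list tables in a schema
DESCRIBE TABLE my_schema.my_table;  -- columns and data types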
Identify Databricks SQL dashboards as a place to display the results of multiple queries at once.
Think of each query as a data source, multiple can be used in one dashboard
Describe how to complete a basic Databricks SQL dashboard.
Select dashboard on toolbar, add tiles, select query and visualization for each tile
Describe how dashboards can be configured to automatically refresh.
From the dashboard, click the Schedule button at the top; subscriptions can be set up here too. Note that query and dashboard refresh schedules are set independently
Describe the purpose of Databricks SQL endpoints/warehouses.
1) provide general compute resources for queries, visualizations and dashboards 2) provide a way to separate compute resources for SQL workloads from other workloads 3) Serverless, Pro, Classic
Identify Serverless Databricks SQL endpoint/warehouses as a quick-starting option.
Designed to be easy to set up and use; compute is managed by Databricks, so the warehouse starts in seconds with no cluster to provision
Describe the trade-off between cluster size and cost for Databricks SQL endpoints/warehouses.
Large clusters handle more concurrent queries and larger workloads, but cost more to run
Identify Partner Connect as a tool for implementing simple integrations with a number of other data products.
1) Provides a simpler alternative to manual connections by provisioning Azure Databricks resources on your behalf and passing details on to partners 2) Creates a trial account if you don’t already have an account with the partner
Describe how to connect Databricks SQL to ingestion tools like Fivetran.
1) Select Partner Connect 2) Click the partner 3) Enter connection info 4) Complete sign-in on the partner website in a new tab 5) Follow setup instructions and set the destination 6) Databricks creates a SQL warehouse (endpoint) and a table to receive the data
Identify the need to be set up with a partner to use it for Partner Connect.
Must have a license with partner in order to use it in Databricks
Identify small-file upload as a solution for importing small text files like lookup tables and quick data integrations.
Good for CSVs; the file is uploaded through the UI and written to a table. The upload is manual, so it isn't suited to automated refreshes of local files
Import from object storage using Databricks SQL.
Load files from a cloud storage path (S3, ADLS, GCS) into a table, e.g. with COPY INTO or by creating a table over an external location; requires access to the storage location
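A rough sketch using COPY INTO, assuming the target Delta table already exists and the storage path and credentials are set up (all names are placeholders):
COPY INTO my_catalog.my_schema.sales_raw
FROM 's3://my-bucket/landing/sales/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true');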
Identify that Databricks SQL can ingest directories of files when the files are the same type.
Databricks reads all the files and combines them into a single table if they have the same structure
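A hedged sketch using the read_files table function, assuming it is available in your workspace (path and names are placeholders):
CREATE TABLE my_schema.lookups_combined AS
SELECT * FROM read_files('s3://my-bucket/lookups/', format => 'csv', header => true);
All CSV files under the directory are read and combined into one table.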
Describe how to connect Databricks SQL to visualization tools like Tableau, Power BI, and Looker.
1) Navigate to the clusters tab 2) In advanced options, select the JDBC/ODBC tab 3) Follow instructions to download the driver for the viz tool 4) Configure the tool with the driver
Identify Databricks SQL as a complementary tool for BI partner tool workflows.
Take advantage of the scalability and performance of the Databricks platform while keeping the familiar interface and functionality of the BI tool
Describe the medallion architecture as a sequential data organization and pipeline system of progressively cleaner data.
bronze = raw data staging; silver = data warehouse; gold = published data sources
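A toy sketch of the flow, assuming bronze/silver/gold schemas (all names, columns, and cleansing logic are made up):
-- bronze: raw data as ingested
CREATE TABLE bronze.orders_raw AS
SELECT * FROM read_files('s3://my-bucket/orders/', format => 'json');
-- silver: cleaned and conformed
CREATE TABLE silver.orders AS
SELECT order_id, CAST(order_ts AS TIMESTAMP) AS order_ts, amount
FROM bronze.orders_raw
WHERE order_id IS NOT NULL;
-- gold: aggregated, ready for dashboards
CREATE TABLE gold.daily_sales AS
SELECT DATE(order_ts) AS order_date, SUM(amount) AS total_sales
FROM silver.orders
GROUP BY DATE(order_ts);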
Identify the gold layer as the most common layer for data analysts using Databricks SQL.
like published Tableau data sources
Describe the cautions and benefits of working with streaming data.
Similar to live data sources. Benefits: real-time insights, faster decision making, ability to respond quickly. Cautions: managing the volume of data, ensuring quality and consistency, need for specialized expertise
Identify that the Lakehouse allows the mixing of batch and streaming workloads.
You can have both extracts and live data in your environment, allowing you to build real-time applications while also running traditional batch processing for historical analysis
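A rough sketch of mixing both, assuming streaming tables are enabled for Databricks SQL in your workspace (names, columns, and path are placeholders):
-- streaming: continuously ingests new files as they arrive
CREATE OR REFRESH STREAMING TABLE events_stream AS
SELECT * FROM STREAM read_files('s3://my-bucket/events/', format => 'json');
-- batch: a scheduled aggregate over the same data for historical analysis
CREATE OR REPLACE TABLE events_daily AS
SELECT DATE(event_ts) AS event_date, COUNT(*) AS event_count
FROM events_stream
GROUP BY DATE(event_ts);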
Describe Delta Lake as a tool for managing data files.
Supports ACID transactions (Atomicity, Consistency, Isolation, Durability); highly scalable; provides tools like VACUUM (removes files no longer referenced by the table) and OPTIMIZE (compacts small files and optimizes the data layout)
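A quick sketch of both commands (table and column names are placeholders):
OPTIMIZE my_schema.sales ZORDER BY (customer_id);  -- compact small files and co-locate data by customer_id
VACUUM my_schema.sales RETAIN 168 HOURS;           -- remove files no longer referenced (default retention is 7 days)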