Syntax Flashcards
Search for database
show databases like '*name*'
Search for table
show tables in attribution like '*tablename*'
Query files under table
select distinct input_file_name() from database.tablename
Query table metadata (2 types)
describe table [extended] database.tablename
describe detail database.tablename
Get table history
describe history database.tablename
Query a table as of a point in time
select * from database.table timestamp as of '2023-04-12T13:43:30'
Search file system
dbutils.fs.ls("/")
Return X records in a SQL query
select * from database.tablename limit X
Use a DataFrame to build a temporary view
df.createOrReplaceTempView("viewname")
Move file data into dataframe
df = (
    spark.read
    .format("csv")
    .option("delimiter", ",")
    .option("header", "true")
    .load("/mnt/folder")
)
List secrets in scope
dbutils.secrets.list("scopename")
View all scopes
dbutils.secrets.listScopes()
Filter rows in data frame
df = df.filter(
    (df.city == "Ki") & (df.province == "Zu")
)
Execute notebook with widgets
dbutils.notebook.run(
    "/Users/",
    timeout_seconds=12,
    arguments={
        "mountPoint": "/mnt/"
    }
)
Activate intellisense
Tab
Run cell
Ctrl + Enter
Run cell & create new cell
Shift + Enter
Create parameter
dbutils.widgets.text(
    name="mountPoint",
    defaultValue=""
)
Use parameter
dbutils.widgets.get(
"mountPoint"
)
Create a generated surrogate key on a Delta table
personId BIGINT GENERATED ALWAYS AS IDENTITY
(
    START WITH 0
    INCREMENT BY 1
)
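For context, a minimal sketch showing the identity clause inside a full CREATE TABLE (the table and column names here are made up):

# Hypothetical table; personId is generated automatically on insert
spark.sql("""
CREATE TABLE database.person (
    personId BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 0 INCREMENT BY 1),
    name STRING
) USING DELTA
""")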
3 ways to create a unique key in Databricks (sketch of the first two below)
1) monotonically_increasing_id()
2) Window function (row_number)
3) Generated identity column
Set up Database
CREATE DATABASE database_name LOCATION '/mnt/gold/'
Create Parquet File
df.write.format("parquet").mode("append").save("/mnt/table_name/")
Delete table data from file storage
dbutils.fs.rm("/mnt/db/tablename", True)
Create Parquet file and hive table
df.write.format("parquet").option("path", "/mnt/database/table_name").saveAsTable("database.table_name")
SQL command to load each file only once into a table
COPY INTO database.table
FROM '/mnt/database/tablename/'
FILEFORMAT = CSV
Explain checkpointing
Each commit to a Delta table writes its own JSON file in the _delta_log directory. By default, every tenth commit also writes a checkpoint Parquet file that consolidates the log state, so readers can start from the checkpoint instead of replaying every JSON file.
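One way to see this, assuming a Delta table stored at a made-up path:

# Hypothetical path; shows the JSON commit files and *.checkpoint.parquet files
display(dbutils.fs.ls("/mnt/database/tablename/_delta_log/"))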
Create DataFrame from a list with metadata (schema) and data
df = spark.createDataFrame(<list>, schema=<schema>)
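A concrete sketch with made-up rows and column names:

data = [("Ki", "Zu"), ("Ams", "NH")]
df = spark.createDataFrame(data, schema=["city", "province"])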
Loop through data frame
for row in df.collect():
    print(row["<column_name>"])