Syntax Flashcards
Search for database
show databases like '*name*'
Search for table
show tables in attribution like '*tablename*'
Query files under table
select distinct input_file_name() from database.tablename
Query table metadata (2 types)
describe table [extended] database.tablename
describe detail database.tablename
Get table history
describe history database.tablename
Query a table as of a point in time
select * from database.table timestamp as of '2023-04-12T13:43:30'
Search file system
dbutils.fs.ls("/")
Return X records in a SQL query
select * from database.tablename limit X
Use a DataFrame to build a temporary view
df.createOrReplaceTempView("viewname")
Move file data into dataframe
df = (
    spark.read
    .format("csv")
    .option("delimiter", ",")
    .option("header", "true")
    .load("/mnt/folder")
)
List secrets in scope
dbutils.secrets.list("scopename")
View all scopes
dbutils.secrets.listScopes()
Filter rows in data frame
df = df.filter(
    (df.city == "Ki") & (df.province == "Zu")
)
Execute notebook with widgets
dbutils.notebook.run(
    "/Users/",
    timeout_seconds=12,
    arguments={
        "mountPoint": "/mnt/"
    }
)
Activate intellisense
Tab
Run cell
Ctrl + Enter
Run cell & create new cell
Shift + Enter
Create parameter
dbutils.widgets.text(
    name="mountPoint",
    defaultValue=""
)
Use parameter
dbutils.widgets.get(
"mountPoint"
)
Create a generated surrogate key on a Delta table
personId BIGINT GENERATED ALWAYS AS IDENTITY
(
    START WITH 0
    INCREMENT BY 1
)
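For context, a minimal sketch showing the identity clause inside a full CREATE TABLE (the table and column names here are made up):

# Hypothetical table; personId is generated automatically on insert
spark.sql("""
CREATE TABLE database.person (
    personId BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 0 INCREMENT BY 1),
    name STRING
) USING DELTA
""")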
3 ways to create a unique key in Databricks (sketch of the first two below)
1) monotonically_increasing_id()
2) Window function (row_number)
3) Generated identity column
Set up Database
CREATE DATABASE database_name LOCATION '/mnt/gold/'
Create Parquet File
df.write.format("parquet").mode("append").save("/mnt/table_name/")
Delete table data from file storage
dbutils.fs.rm("/mnt/db/tablename", True)
Create Parquet file and hive table
df.write.format("parquet").option("path", "/mnt/database/table_name").saveAsTable("database.table_name")
SQL command to load each file only once into a table
COPY INTO database.table
FROM '/mnt/database/tablename/'
FILEFORMAT = CSV
Explain checkpointing
Each commit to a Delta table writes its own JSON file in the _delta_log directory. By default, every tenth commit also writes a checkpoint Parquet file that consolidates the log state, so readers can start from the checkpoint instead of replaying every JSON file.
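One way to see this, assuming a Delta table stored at a made-up path:

# Hypothetical path; shows the JSON commit files and *.checkpoint.parquet files
display(dbutils.fs.ls("/mnt/database/tablename/_delta_log/"))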
Create DataFrame from a list with metadata (schema) and data
df = spark.createDataFrame(<list>, schema=<schema>)
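A concrete sketch with made-up rows and column names:

data = [("Ki", "Zu"), ("Ams", "NH")]
df = spark.createDataFrame(data, schema=["city", "province"])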
Loop through data frame
for row in df.collect():
    print(row["<column_name>"])