DataFrameWriter Flashcards
How can you save the DataFrame to the tmp folder
.write
.option("path", "tmp")
.save()
How many files does each partition create
1
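How many partitions are written out
One per partition of the DataFrame; after a shuffle that is 200 by default (spark.sql.shuffle.partitions) unless you repartition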
How do you repartition
.repartition(number of partitions)
How do you overwrite existing data
.write.mode(SaveMode.Overwrite)
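Example: a minimal sketch tying the cards so far together, assuming a local SparkSession. The object name, app name, sample data, and the partition count of 4 are illustrative assumptions, not part of the deck.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object WriteSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("write-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample data standing in for a real DataFrame.
    val df = spark.range(0, 1000).toDF("id")

    df.repartition(4)             // 4 in-memory partitions, one part file each
      .write
      .mode(SaveMode.Overwrite)   // replace whatever already exists at the path
      .option("path", "tmp")      // relative path, as on the card
      .save()                     // no format given, so the default format is used

    spark.stop()
  }
}
```

Note that an empty partition may not produce a part file, so the file count can be lower than the partition count on very small inputs.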
What is the default file format
parquet
How can you write files as JSON
.write.format("json")
What is the default compression for Parquet
snappy
How can you specify a compression type
.write.option("compression", "snappy")
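Example: a small sketch combining the format and compression cards, assuming a local SparkSession. The object name, sample data, and the output path "tmp_json" are hypothetical; gzip is used here only because it is one of the codecs the JSON writer accepts.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object JsonWriteSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("json-write-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample data.
    val df = spark.range(0, 100).toDF("id")

    df.write
      .mode(SaveMode.Overwrite)
      .format("json")                   // write JSON lines instead of the default format
      .option("compression", "gzip")    // codec applied to each output file
      .save("tmp_json")                 // hypothetical output path

    spark.stop()
  }
}
```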
How can you specify the partition column when writing the data
.write.partitionBy("column_name")
How can you read in a Parquet file
spark.read.parquet("file path")
What does partitionBy take as an argument
One or more column names, each as a string
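Example: a sketch of partitionBy with two column names followed by a read of the result, assuming a local SparkSession. The object name, the columns (country, year, amount), the sample rows, and the path "tmp_partitioned" are all made up for illustration.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object PartitionBySketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("partition-by-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows.
    val df = Seq(
      ("US", 2023, 10.0),
      ("US", 2024, 12.5),
      ("DE", 2024, 7.0)
    ).toDF("country", "year", "amount")

    df.write
      .mode(SaveMode.Overwrite)
      .partitionBy("country", "year")   // column names are passed as strings
      .parquet("tmp_partitioned")       // writes country=.../year=... subdirectories

    // Reading the root path recovers the partition columns from the directory names.
    val back = spark.read.parquet("tmp_partitioned")
    back.printSchema()

    spark.stop()
  }
}
```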