DataFrameWriter Flashcards

1
Q

How can you save the DataFrame to the tmp folder?

A

.write
.option("path", "tmp")
.save()
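
A minimal sketch of this card's answer, assuming a local Spark session and a made-up two-row DataFrame (the names `df` and `restored` and the sample rows are illustrative, not from the deck). No format is set, so the session's default data source is used, which is Parquet unless it has been reconfigured.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("writer-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq((1, "a"), (2, "b")).toDF("id", "label")

// Overwrite mode is used here only so the sketch can be re-run safely.
df.write
  .mode(SaveMode.Overwrite)
  .option("path", "tmp")
  .save()

// Read the written files back to check the round trip.
val restored = spark.read.parquet("tmp")
```

Passing the path directly, as in `.save("tmp")`, should behave the same way as setting the `path` option.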

2
Q

How many files does each partition create on write?

A

1

3
Q

How many partitions are written out?

A

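After a shuffle, 200 by default (the spark.sql.shuffle.partitions setting) unless you repartition; otherwise, as many partitions as the DataFrame already has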

4
Q

How do you repartition?

A

.repartition(number of partitions)
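
As a rough illustration, assuming a local Spark session and a throwaway DataFrame built with `spark.range` (the names `df` and `four` are hypothetical), the partition count after `repartition` can be inspected like this:

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("repartition-sketch")
  .getOrCreate()

// 100 rows in a single column named `id`.
val df = spark.range(0, 100).toDF("id")

// Redistribute the rows into exactly four partitions (this triggers a shuffle).
val four = df.repartition(4)

println(four.rdd.getNumPartitions) // prints 4
```

Writing `four` would then typically produce one part file per partition.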

5
Q

How do you overwrite existing data?

A

.write.mode(SaveMode.Overwrite)

6
Q

What is the default file format?

A

parquet

7
Q

How can you write files as JSON?

A

.write.format("json")

8
Q

What is the default compression?

A

snappy (for Parquet)

9
Q

How can you specify a compression type?

A

.write.option("compression", "snappy")

10
Q

How can you specify the partition column when writing the data?

A

.write.partitionBy("column_name")
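
A hedged sketch of a partitioned write, assuming a local Spark session; the DataFrame, the `label` column, the output directory `tmp_by_label`, and the choice of gzip-compressed JSON are all made up for illustration. It chains the writer calls and then lists the resulting subdirectories, which should follow the `column=value` naming scheme.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("partitionBy-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with two distinct values in `label`.
val df = Seq((1, "a"), (2, "b"), (3, "a")).toDF("id", "label")

df.write
  .mode(SaveMode.Overwrite)        // replace any earlier output
  .format("json")                  // override the default format
  .option("compression", "gzip")   // compress each part file
  .partitionBy("label")            // one subdirectory per distinct label
  .save("tmp_by_label")            // hypothetical output directory

// Collect the partition directory names created under the output path.
val partitionDirs = new java.io.File("tmp_by_label")
  .listFiles()
  .filter(_.isDirectory)
  .map(_.getName)
  .sorted
```

With this sample data, `partitionDirs` is expected to hold `label=a` and `label=b`. The partition column is encoded in the directory name rather than stored inside the data files.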

11
Q

How can you read in a Parquet file?

A

spark.read.parquet("file path")

12
Q

What does partitionBy take as an argument?

A

One or more column names, as strings
