04 Working With Spark Data Sources & Sinks Flashcards

1
Q

What are Spark data sources?

A

The systems and storage locations from which Spark reads data.

2
Q

What are the different types of data sources?

A
  1. External data sources.
  2. Internal data sources.
3
Q

Name some external data sources.

A

Oracle, SQL, Cassandra, Snowflake, Redshift, Kafka

4
Q

Give examples of internal storage systems.

A

HDFS, Azure, Amazon S3

5
Q

What are data sinks?

A

The place where Spark writes its output. A sink can be either internal or external.

6
Q

Name the file formats in which the schema is well defined.

A

Avro and Parquet.

7
Q

Can we store schema information in a JSON file?

A

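No. A JSON data file carries no embedded schema, so the schema must be inferred from the records or supplied when reading.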
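
To make the point concrete, here is a minimal sketch of what "no embedded schema" means in practice: a reader has to scan the records to work out field names and types. It uses only the Python standard library, not Spark, and the sample data and the `infer_schema` helper are hypothetical names chosen for illustration.

```python
import json

# Hypothetical JSON Lines data: each line is one record, and nothing in the
# file declares column names or types.
JSON_LINES = """\
{"id": 1, "name": "a", "score": 9.5}
{"id": 2, "name": "b"}
{"id": 3, "name": "c", "score": 7.0}
"""


def infer_schema(text):
    """Scan every record and collect the Python type names seen per field."""
    schema = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    # Sort the type names so the result is deterministic.
    return {field: sorted(types) for field, types in schema.items()}


inferred = infer_schema(JSON_LINES)
print(inferred)
```

Note that the second record has no `score` field at all, which is only discovered by reading the data. Formats such as Parquet and Avro avoid this scan because the schema travels with the file.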

8
Q

Which file formats store schema information in the file itself?

A

Parquet and Avro.

9
Q

What is the suggested file format for writing data?

A

Parquet

10
Q

How can we manage the data layout?

A

By partitioning. We can partition the data on one or more columns, which organizes the output into a separate subdirectory for each partition value.
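
The directory layout that partitioning produces can be sketched as follows. This is an illustration in plain standard-library Python, not Spark code: it mimics the `column=value` subdirectory naming that Spark's `DataFrameWriter.partitionBy` uses. The sample rows, the `write_partitioned` helper, and the `part-0.csv` file name are assumptions made for the example.

```python
import csv
import os
import tempfile

# Hypothetical sample rows.
ROWS = [
    {"country": "US", "year": "2023", "amount": "10"},
    {"country": "US", "year": "2024", "amount": "20"},
    {"country": "IN", "year": "2024", "amount": "30"},
]


def write_partitioned(rows, base_dir, partition_cols):
    """Write rows under base_dir/col=value/... with one CSV file per partition."""
    written = set()
    for row in rows:
        # Build the key=value directory path from the partition columns.
        parts = [f"{col}={row[col]}" for col in partition_cols]
        part_dir = os.path.join(base_dir, *parts)
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "part-0.csv")
        # Partition values are encoded in the path, so store only the rest.
        data = {k: v for k, v in row.items() if k not in partition_cols}
        is_new = not os.path.exists(path)
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(data))
            if is_new:
                writer.writeheader()
            writer.writerow(data)
        written.add("/".join(parts))
    return sorted(written)


base = tempfile.mkdtemp()
layout = write_partitioned(ROWS, base, ["country", "year"])
print(layout)
```

Partitioning on two columns gives nested directories, one level per column, in the order the columns are listed.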

11
Q

If the data is partitioned, how does the DataFrameReader read it?

A

The DataFrameReader reads all the partition subdirectories under the base path.

12
Q

What happens when we apply a filter on partitioned data?

A

If the filter is on a partition column, Spark does not scan all the files. It reads only the partition directories that match the filter.
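
The idea of skipping non-matching partitions can be sketched in plain standard-library Python. This is not how Spark implements it, only an illustration of the principle: the filter is checked against directory names, so files in other partitions are never opened. The layout, the `build_layout` and `scan` helpers, and the file names are hypothetical.

```python
import os
import tempfile

# Hypothetical partitioned layout: directory path -> file content.
PARTITIONS = {
    "country=US/year=2023": "10\n",
    "country=US/year=2024": "20\n",
    "country=IN/year=2024": "30\n",
}


def build_layout(base_dir):
    """Create the key=value directories with one small data file in each."""
    for rel_dir, content in PARTITIONS.items():
        part_dir = os.path.join(base_dir, *rel_dir.split("/"))
        os.makedirs(part_dir)
        with open(os.path.join(part_dir, "part-0.txt"), "w") as f:
            f.write(content)


def scan(base_dir, wanted=None):
    """Return (rows, files_opened). `wanted` maps partition column -> value."""
    wanted = wanted or {}
    rows, files_opened = [], 0
    for root, dirs, files in os.walk(base_dir):
        # Prune: drop subdirectories whose key=value contradicts the filter,
        # so os.walk never descends into them.
        kept = []
        for d in dirs:
            key, _, value = d.partition("=")
            if key in wanted and wanted[key] != value:
                continue
            kept.append(d)
        dirs[:] = sorted(kept)
        for name in sorted(files):
            files_opened += 1
            with open(os.path.join(root, name)) as f:
                rows.extend(int(line) for line in f if line.strip())
    return sorted(rows), files_opened


base = tempfile.mkdtemp()
build_layout(base)
all_rows, all_files = scan(base)                      # no filter: every file
us_rows, us_files = scan(base, {"country": "US"})     # filter on partition column
print(all_rows, all_files)
print(us_rows, us_files)
```

Without a filter, all three files are opened. With the filter on `country`, the `country=IN` directory is skipped entirely and only two files are read.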
