Week 5 - practice quiz Flashcards

Question 1

Q

Which company has created the MapReduce framework as a concept?

1) Amazon
2) Oracle
3) Microsoft
4) Google

Answer

A

4) Google

Question 2

Q

Which company has implemented Hadoop as an open-source version of MapReduce?

1) Google
2) Amazon
3) Microsoft
4) Yahoo

Question 3

Q

Which of the following is true about the Hadoop file system?

1) Files are append-only
2) Files are split into 1 GB blocks
3) Meta node stores metadata
4) Each node stores distinct data blocks

Answer

A

1) Files are append-only

Question 4

Q

What does HDFS stand for?

1) Highly Distributed File System
2) Highly Disturbed File System
3) High Definition File System
4) Hadoop File System

Answer

A

4) Hadoop File System

Hadoop Distributed File System

Question 5

Q

What is the data type used by Hadoop for a MapReduce process?

1) Column-based
2) Document-based
3) Graph-based
4) Key-value

Answer

A

4) Key-value

Question 6

Q

What is the output of the Map function in a MapReduce process?

1) List of graph nodes
2) List of key-value pairs.
3) List of table columns
4) List of network nodes

Answer

A

2) List of key-value pairs.

Question 7

Q

Where do mapper nodes save their outputs before serving to reducer nodes?

1) Local disk
2) Another node
3) Central node
4) Master node

Answer

A

1) Local disk

Question 8

Q

What does Hadoop do with a task that crashes in a node?

1) The task is retried on another node.
2) The node is rebooted.
3) The task is failed.
4) The node is shut down.

Answer

A

1) The task is retried on another node.

Question 9

Q

Apache Spark sorts its data processing operations, such as collect, filter, and sort, by building a graph called DAG. What does DAG stand for?

1) Derived Apache Graph
2) Distributed Apache Graph
3) Directed Acyclic Graph
4) Distributed Asymmetric Graph

Answer

A

3) Directed Acyclic Graph

Question 10

Q

Which of the following statements about the difference between Hadoop and Spark is true?

1) Hadoop supports in-memory cluster computing.
2) Hadoop is faster than Spark.
3) Both Hadoop and Spark can load data from Hadoop File System (HDFS)
4) Hadoop provides multiple built-in data processing operations such as filter and join.

Answer

A

3) Both Hadoop and Spark can load data from Hadoop File System (HDFS)

Question 11

Q

What is the input for the Reduce function in a MapReduce process?

1) Keys and their corresponding list of values.
2) Keys and their corresponding maps.
3) Keys and their corresponding nodes.
4) Maps and their corresponding values.

Answer

A

1) Keys and their corresponding list of values.

Question 12

Q

What is the output of the Reduce function in a MapReduce process?

1) List of key-value pairs
2) List of key-node pairs.
3) List of key-reducer pairs.
4) List of key-mapper pairs.

Answer

A

1) List of key-value pairs

Question 13

Q

Which of the following is the correct sequence of phases in a MapReduce process?

1) Input, Splitting, Shuffling, Mapping, Reducing, Output
2) Input, Splitting, Mapping, Reducing, Shuffling, Output
3) Input, Splitting, Mapping, Shuffling, Reducing, Output
4) Input, Mapping, Splitting, Shuffling, Reducing, Output

Answer

A

3) Input, Splitting, Mapping, Shuffling, Reducing, Output
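The phase order in this answer can be traced with a small word-count sketch. It runs in a single process in plain Python, so it only illustrates the data flow, not how Hadoop distributes the work; the function names and the sample text are made up for illustration.

```python
from collections import defaultdict

def split_input(text, n_splits):
    """Splitting: divide the input lines into n_splits chunks."""
    lines = text.splitlines()
    return [lines[i::n_splits] for i in range(n_splits)]

def map_phase(lines):
    """Mapping: emit a (word, 1) pair for every word in one split."""
    return [(word.lower(), 1) for line in lines for word in line.split()]

def shuffle_phase(mapped_splits):
    """Shuffling: gather the values for each key across all mapper outputs."""
    grouped = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducing: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(text, n_splits=2):
    splits = split_input(text, n_splits)          # Input -> Splitting
    mapped = [map_phase(s) for s in splits]       # Mapping
    grouped = shuffle_phase(mapped)               # Shuffling
    return reduce_phase(grouped)                  # Reducing -> Output

counts = word_count("big data\nbig cluster\ndata data")
print(counts)
```

Note that the reducer only sees a key together with the full list of its values, which is why shuffling has to sit between mapping and reducing.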

Question 14

Q

What does Hadoop do with a task that repeatedly crashes in a MapReduce system?

1) The task is failed.
2) The task is retried on another system.
3) The system is rebooted.
4) The system is shut down.

Answer

A

1) The task is failed.
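Taken together with the earlier card on a single crash, the behavior is: retry on another node, then give up after repeated failures. A minimal sketch of that policy follows. The limit of 4 attempts, the node names, and the helper functions are assumptions chosen for illustration, not values taken from this quiz.

```python
def run_with_retries(task, nodes, max_attempts=4):
    """Try the task on successive nodes; mark it failed after max_attempts crashes."""
    failures = []
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]  # move to a different node each attempt
        try:
            return task(node)
        except RuntimeError as err:
            failures.append((node, str(err)))
    raise RuntimeError(f"task failed after {max_attempts} attempts: {failures}")

def flaky_task(node):
    """Hypothetical task that crashes only on node-a."""
    if node == "node-a":
        raise RuntimeError("crash")
    return f"done on {node}"

def broken_task(node):
    """Hypothetical task that crashes everywhere."""
    raise RuntimeError("crash")

nodes = ["node-a", "node-b", "node-c"]
result = run_with_retries(flaky_task, nodes)

try:
    run_with_retries(broken_task, nodes)
    outcome = "succeeded"
except RuntimeError:
    outcome = "failed"

print(result, outcome)
```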

Question 15

Q

What does Hadoop do when a node crashes during a MapReduce process?

1) Ignores all of the maps created on all of the nodes.
2) Ignores all of the maps created on the node crashed.
3) Re-launches any maps the node previously ran.
4) Re-launches any maps all of the nodes previously ran.

Answer

A

3) Re-launches any maps the node previously ran.

Question 16

Q

Which of the following data operators requires implementation of a reduce function in a MapReduce process?

1) GROUP BY
2) SELECT
3) PROJECT
4) SORT

Answer

A

1) GROUP BY

Question 17

Q

What is the output of a JOIN operation in a MapReduce process?

1) Key-column pairs
2) Key-node pairs
3) Key-map pairs
4) Key-value pairs

Answer

A

4) Key-value pairs
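One common way to express a join in MapReduce is a reduce-side join, sketched below in plain Python. The mapper tags each record with its source table and emits it under the join key; the reducer pairs up records that share a key. The tables, field names, and helper functions are hypothetical, and this is only one of several possible join strategies.

```python
from collections import defaultdict

def map_join(users, orders):
    """Tag each record with its source table and emit it under the join key."""
    for user_id, name in users:
        yield user_id, ("user", name)
    for user_id, item in orders:
        yield user_id, ("order", item)

def reduce_join(key, tagged_values):
    """Pair every user record with every order record that shares the key."""
    names = [v for tag, v in tagged_values if tag == "user"]
    items = [v for tag, v in tagged_values if tag == "order"]
    return [(key, (name, item)) for name in names for item in items]

def join(users, orders):
    grouped = defaultdict(list)              # stands in for the shuffle phase
    for key, tagged in map_join(users, orders):
        grouped[key].append(tagged)
    result = []
    for key in sorted(grouped):
        result.extend(reduce_join(key, grouped[key]))
    return result

users = [(1, "Ana"), (2, "Ben")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
joined = join(users, orders)
print(joined)
```

The output is again a list of key-value pairs: the join key, and a value holding the matched fields. Keys present in only one table produce no output, so this sketch behaves like an inner join.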

Question 18

Q

What is Apache Spark?

1) A cloud-based spreadsheet software.
2) Interconnected computing nodes.
3) A cluster of server computers.
4) A distributed data-processing software.

Answer

A

4) A distributed data-processing software.

Question 19

Q

Apache Spark relies on a database concept called RDD. What does RDD stand for?

1) Relational Dynamic Database
2) Recoverable Distributed Database
3) Resilient Distributed Dataset
4) Rigorous Distributed Database

Answer

A

3) Resilient Distributed Dataset

Question 20

Q

There are two types of RDD operations in Apache Spark: transformation and action. Which of the following is an action operation?

1) Count
2) Map
3) Filter
4) Join

Question 21

Q

Which of the following was written on top of the Apache Spark software?

1) Python
2) GraphX
3) Java
4) Scala

Answer

A

2) GraphX

Question 22

Q

Which of the following big data software is implemented by Google to rank websites using their popular PageRank algorithm?

1) Oracle
2) MySQL
3) Spark SQL
4) GraphX

Answer

A

4) GraphX

Question 23

Q

What is the method implemented by Apache Spark to process live streaming data?

1) Real time processing
2) Batch processing
3) Binary processing
4) On-demand processing

Answer

A

2) Batch processing

Question 24

Q

Which of the following is an example of live streaming data?

1) Student grades submitted by an instructor.
2) An online banking statement for an individual.
3) A Wikipedia article about a historical figure.
4) A Twitter hashtag containing a company name.

Answer

A

4) A Twitter hashtag containing a company name.

Question 25

Q

During the processing of live streaming data by Apache Spark, what does each batch correspond to?

1) RDD (Resilient Distributed Dataset)
2) Node
3) Query
4) Second

Answer

A

1) RDD (Resilient Distributed Dataset)
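The idea of treating a live stream as a sequence of small batches can be sketched without Spark. In the plain-Python illustration below, each inner list stands in for the data that one batch would hold; the event format, the 1-second interval, and the function name are assumptions for illustration only.

```python
from collections import defaultdict

def micro_batches(events, interval):
    """Group (timestamp, value) events into consecutive fixed-length batches."""
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // interval)].append(value)
    # Return the non-empty batches in time order.
    return [batches[i] for i in sorted(batches)]

events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]
batches = micro_batches(events, interval=1.0)
print(batches)
```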

Question 26

Q

How is spatial data different from traditional data?

1) Spatial data is tied to physical space.
2) Spatial data represents simple spaces.
3) Spatial data has one dimension.

Answer

A

1) Spatial data is tied to physical space.

Question 27

Q

What is the best definition of a KNN query?

1) A KNN query is a query that is nested inside a SQL statement and is embedded in the where clause.
2) A KNN query retrieves all records where a value is between an upper and lower boundary.
3) A KNN (k-nearest-neighbor) query finds the k objects closest to a given query point q, based on spatial distance.

Answer

A

3) A KNN (k-nearest-neighbor) query finds the k objects closest to a given query point q, based on spatial distance.
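A brute-force version of such a query fits in a few lines of Python. This sketch assumes 2-D points and Euclidean distance and scans every point, whereas spatial systems typically use an index to avoid a full scan; the sample points and function name are made up.

```python
import heapq
import math

def knn_query(points, q, k):
    """Return the k points closest to q by Euclidean distance, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))

points = [(0, 0), (1, 1), (5, 5), (2, 2), (9, 0)]
nearest = knn_query(points, q=(2, 3), k=2)
print(nearest)
```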

Question 28

Q

How do Hadoop and MapReduce relate to each other?

1) MapReduce is the framework used by Hadoop software.
2) Hadoop is a computer operating system while MapReduce is a software application.
3) Hadoop is the framework used by MapReduce software.
4) Hadoop is a server-side application while MapReduce runs on client computers.

Answer

A

1) MapReduce is the framework used by Hadoop software.

Question 29

Q

Which of the following is the correct order of functions in the typical processing of big data?

1) Map and Reduce functions can run in parallel.
2) Map and Reduce functions can run simultaneously.
3) The Map function has to finish before the Reduce function starts.
4) The Reduce function has to finish before the Map function starts.

Answer

A

3) The Map function has to finish before the Reduce function starts.

Question 30

Q

What is the name of the transitional phase between the Map and Reduce phases in a big data process?

1) Data mapping
2) Data mining
3) Data scrubbing
4) Data shuffling

Answer

A

4) Data shuffling

Question 31

Q

What happens during the data shuffling phase in a typical big data process?

1) Data generated during the reduce phase is encrypted to make it secure.
2) Data generated during the reduce phase is routed to different nodes in the cluster.
3) Data generated during the map phase is routed to different nodes in the cluster.
4) Data generated during the map phase is encrypted to make it secure.

Answer

A

3) Data generated during the map phase is routed to different nodes in the cluster.
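A simple way to picture this routing is hash partitioning: every key is hashed to a reducer index, so all values for one key end up on the same reducer. The sketch below uses `zlib.crc32` as a stable hash purely for illustration; the partitioning function and the sample data are assumptions, not a description of Hadoop's internals.

```python
import zlib
from collections import defaultdict

def partition(key, n_reducers):
    """Pick a reducer index for a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % n_reducers

def shuffle(mapper_outputs, n_reducers):
    """Route every (key, value) pair from every mapper to its reducer."""
    reducers = [defaultdict(list) for _ in range(n_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            reducers[partition(key, n_reducers)][key].append(value)
    return reducers

mapper_outputs = [
    [("big", 1), ("data", 1)],      # output of a first mapper
    [("big", 1), ("cluster", 1)],   # output of a second mapper
]
reducers = shuffle(mapper_outputs, n_reducers=2)
for index, grouped in enumerate(reducers):
    print(index, dict(grouped))
```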

Question 32

Q

Which of the following phases in a typical Hadoop process provide full programming control to users?

1) Map and Reduce
2) Map and Shuffling
3) Reduce and Shuffling
4) Compress and Shuffling

Answer

A

1) Map and Reduce

Question 33

Q

How many copies of a piece of data are generated by the Hadoop File System (HDFS) in order to allow for fault tolerance?

1) 3
2) 8
3) 10
4) 64

Question 34

Q

In which programming language are Map and Reduce functions written?

1) HTML
2) Java
3) C++
4) Python

Question 35

Q

What is the size of each data block in the Hadoop file system?

1) 128 MB
2) 1 GB
3) 100 MB
4) 1 MB

Answer

A

1) 128 MB
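With a 128 MB block size, the number of blocks for a file is the file size divided by 128 MB, rounded up. The short calculation below works this through; the 1000 MB file is an arbitrary example and the helper names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # block size stated on this card

def block_count(file_size_mb):
    """Number of blocks needed to hold a file of the given size."""
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

def last_block_mb(file_size_mb):
    """Size of the final block, which may be smaller than a full block."""
    remainder = file_size_mb % BLOCK_SIZE_MB
    return remainder if remainder else BLOCK_SIZE_MB

blocks = block_count(1000)
tail = last_block_mb(1000)
print(blocks, tail)
```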

Question 36

Q

What does each node correspond to in a Hadoop cluster?

1) Data center
2) A data block
3) A computing machine
4) A data cloud

Answer

A

3) A computing machine

Question 37

Q

What is the name of the special node in a Hadoop cluster that stores metadata of the entire cluster?

1) Name node
2) Master node
3) Hub node
4) Meta node

Answer

A

2) Master node