4. Hadoop-Related Projects Flashcards
Hive was originally developed by:
A) Google
B) Facebook
C) Apache Software Foundation
D) IBM
B) Facebook
Which of the following is NOT a characteristic of Hive?
A) It supports real-time data processing
B) It uses a SQL-like language called HiveQL
C) It is built on top of Hadoop
D) It is used for data warehousing
A) It supports real-time data processing
What is the main purpose of Spark?
A) To provide a more efficient alternative to MapReduce
B) To support online transaction processing
C) To manage Hadoop clusters
D) To store large datasets
A) To provide a more efficient alternative to MapReduce
Resilient Distributed Datasets (RDDs) in Spark are:
A) Mutable collections of data items
B) Fault-tolerant and can be operated on in parallel
C) Stored on disk by default
D) Only accessible in Scala
B) Fault-tolerant and can be operated on in parallel
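The lineage idea behind RDD fault tolerance can be sketched in plain Python. This is a toy model under stated assumptions. `ToyRDD`, `partition`, and `lose_partition` are hypothetical names, not Spark's API. Unlike real Spark, the toy computes `map` eagerly for simplicity. Each dataset remembers its parent and the function that produced it, so a lost partition is rebuilt from its lineage rather than from a replica.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Toy model of lineage-based recovery (hypothetical, not Spark's API)."""

    def __init__(self, partitions: List[Optional[List[int]]],
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None):
        self._partitions = list(partitions)  # one list per partition
        self._parent = parent                # lineage: the input dataset
        self._fn = fn                        # lineage: how this was derived

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Each partition is transformed independently, so on a cluster
        # the partitions could be processed in parallel.
        parts = [[fn(x) for x in self.partition(i)]
                 for i in range(len(self._partitions))]
        return ToyRDD(parts, parent=self, fn=fn)

    def partition(self, i: int) -> List[int]:
        if self._partitions[i] is None:
            if self._parent is None or self._fn is None:
                raise RuntimeError("source data lost; nothing to recompute from")
            # Rebuild only the missing partition from the parent's partition.
            self._partitions[i] = [self._fn(x) for x in self._parent.partition(i)]
        return self._partitions[i]

    def lose_partition(self, i: int) -> None:
        self._partitions[i] = None  # simulate a failed worker

    def collect(self) -> List[int]:
        return [x for i in range(len(self._partitions)) for x in self.partition(i)]


base = ToyRDD([[1, 2], [3, 4]])
doubled = base.map(lambda x: x * 2)
doubled.lose_partition(1)
print(doubled.collect())  # [2, 4, 6, 8]
```

Only partition 1 is recomputed; partition 0 is reused as-is.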
Which of the following is an advantage of Spark over MapReduce?
A) Spark cannot handle large datasets
B) Spark writes intermediate results to disk
C) Spark can cache intermediate results in memory
D) Spark supports only batch processing
C) Spark can cache intermediate results in memory
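A toy contrast between the two execution styles, assuming a two-stage job (square each value, then sum). The function names are hypothetical. A local temporary file stands in for the HDFS write that separates chained MapReduce jobs. Both versions give the same answer. The difference is where the intermediate result lives between stages.

```python
import json
import os
import tempfile
from typing import List


def mapreduce_style(data: List[int]) -> int:
    # Stage 1 output is persisted to disk (HDFS in a real cluster)
    # before the next job reads it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage1.json")
        with open(path, "w") as f:
            json.dump([x * x for x in data], f)
        with open(path) as f:
            squares = json.load(f)  # stage 2 re-reads from disk
    return sum(squares)


def spark_style(data: List[int]) -> int:
    squares = [x * x for x in data]  # intermediate result stays in memory
    return sum(squares)


print(mapreduce_style([1, 2, 3]), spark_style([1, 2, 3]))  # 14 14
```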
In which language was Spark originally developed?
A) Java
B) Python
C) R
D) Scala
D) Scala
Which of the following was a limitation of HiveQL compared to ANSI SQL in early Hive releases (before Hive 0.14 added UPDATE and DELETE for transactional tables)?
A) It supports “insert into” for existing tables
B) It does not support the equality operator in join predicates
C) It does not support “update” or “delete” operations
D) It is fully ANSI-compliant
C) It does not support “update” or “delete” operations
Spark’s ability to cache intermediate results in memory is particularly useful for:
A) Online transaction processing
B) Iterative algorithms
C) Long-term data storage
D) Reducing network traffic
B) Iterative algorithms
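Why caching helps iterative algorithms can be shown with a toy loop that counts how often its input is read. `LoadCounter` and `estimate_mean` are hypothetical names. The "algorithm" is gradient descent toward the mean of the data, chosen only because every iteration must scan the full dataset. In Spark, the analogous step would be calling `cache()` on the RDD before the loop.

```python
from typing import List


class LoadCounter:
    """Hypothetical data source that counts how often it is read."""

    def __init__(self, data: List[float]):
        self.data = data
        self.loads = 0

    def read(self) -> List[float]:
        self.loads += 1
        return list(self.data)


def estimate_mean(source: LoadCounter, iterations: int, cache: bool) -> float:
    # With cache=True the data is read once and reused in every pass;
    # with cache=False every pass goes back to the source.
    cached = source.read() if cache else None
    guess = 0.0
    for _ in range(iterations):
        points = cached if cache else source.read()
        grad = sum(guess - p for p in points) / len(points)
        guess -= 0.5 * grad  # halves the distance to the mean each pass
    return guess


uncached = LoadCounter([2.0, 4.0, 6.0])
estimate_mean(uncached, iterations=20, cache=False)
cached = LoadCounter([2.0, 4.0, 6.0])
estimate_mean(cached, iterations=20, cache=True)
print(uncached.loads, cached.loads)  # 20 1
```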
In Hive, which command is used to load data into a table?
A) INSERT INTO
B) LOAD DATA INPATH
C) UPDATE TABLE
D) SET DATA
B) LOAD DATA INPATH
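A hypothetical Python helper that assembles the statement, to show the clause order of `LOAD DATA [LOCAL] INPATH '...' [OVERWRITE] INTO TABLE ...`. The path and table name are made up. The helper omits the optional PARTITION clause and does no quoting or escaping.

```python
def load_data_statement(path: str, table: str, local: bool = False,
                        overwrite: bool = False) -> str:
    """Build a HiveQL LOAD DATA statement (hypothetical helper).

    Without LOCAL, the path is on the cluster filesystem (e.g. HDFS) and the
    files are moved into the table's location; with LOCAL they are copied
    from the client's local filesystem. No escaping of quotes is done.
    """
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")
    parts.append(f"INPATH '{path}'")
    if overwrite:
        parts.append("OVERWRITE")  # replace existing table data instead of appending
    parts.append(f"INTO TABLE {table}")
    return " ".join(parts)


print(load_data_statement("/user/hive/staging/sales.csv", "sales"))
# LOAD DATA INPATH '/user/hive/staging/sales.csv' INTO TABLE sales
```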
Which of the following is NOT a feature of Spark’s RDDs?
A) They are mutable
B) They are distributed across the cluster
C) They are resilient
D) They can be cached in memory
A) They are mutable
Which of the following operations is an action in Spark?
A) map()
B) filter()
C) reduce()
D) flatMap()
C) reduce()
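The difference between transformations and actions can be sketched with a toy class (hypothetical, not Spark's API). Here `map()` and `filter()` only record steps and return a new object, leaving the original unchanged, which mirrors RDD immutability. `reduce()` is the action that actually runs the recorded steps.

```python
import functools
from typing import Callable, List


class LazyToyRDD:
    """Toy model: transformations record steps, actions execute them."""

    def __init__(self, data: List[int], steps: tuple = ()):
        self._data = tuple(data)   # immutable copy of the source data
        self._steps = steps        # recorded transformations, not yet run

    def map(self, fn: Callable[[int], int]) -> "LazyToyRDD":
        # Transformation: returns a NEW object; self is left unchanged.
        return LazyToyRDD(list(self._data), self._steps + (("map", fn),))

    def filter(self, pred: Callable[[int], bool]) -> "LazyToyRDD":
        return LazyToyRDD(list(self._data), self._steps + (("filter", pred),))

    def reduce(self, fn: Callable[[int, int], int]) -> int:
        # Action: only here are the recorded steps executed.
        items = list(self._data)
        for kind, f in self._steps:
            items = [f(x) for x in items] if kind == "map" else [x for x in items if f(x)]
        return functools.reduce(fn, items)


calls = []
nums = LazyToyRDD([1, 2, 3, 4])
squares = nums.map(lambda x: calls.append(x) or x * x)
evens = squares.filter(lambda x: x % 2 == 0)
print(len(calls))                          # 0: nothing has run yet
print(evens.reduce(lambda a, b: a + b))    # 20
print(len(calls))                          # 4
```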
HiveQL is primarily designed for which of the following?
A) Real-time processing
B) Transactional updates
C) Ad-hoc querying
D) In-memory computations
C) Ad-hoc querying
In Spark, an RDD can be created from:
A) Only HDFS files
B) Only local files
C) Both HDFS files and local files
D) Neither HDFS files nor local files
C) Both HDFS files and local files
Which of the following was a limitation of HiveQL in early Hive releases (before Hive 0.8 added INSERT INTO)?
A) It does not support JOIN operations
B) It cannot handle large datasets
C) It does not support “insert into” for existing tables
D) It requires data to be structured
C) It does not support “insert into” for existing tables
Spark’s ability to cache data in memory is beneficial for:
A) Long-term data storage
B) Real-time transaction processing
C) Iterative algorithms
D) Disk-based data processing
C) Iterative algorithms