spark1 Flashcards
Spark is best suited for _____ data.
* Real time
* Virtual
* Structured
* All of the above
All of the above
Which of the following is a feature of Apache Spark?
* Speed
* Supports multiple languages
* Advanced Analytics
* All of the above
All of the above
What does Spark Engine do?
* Scheduling
* Distributing data across cluster
* Monitoring data across cluster
* All of the above
All of the above
An RDD can NOT be created from data stored on which of the following?
* LocalFS
* Oracle
* S3
* HDFS
Oracle
For resource management, Spark can use which of the following?
* YARN
* Mesos
* Standalone cluster manager
* All of the above
All of the above
Fault Tolerance in RDD is achieved using?
* Immutable nature of RDD
* DAG (Directed Acyclic Graph) or data lineage
* Both A&B
* Neither A nor B
Both A&B
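The idea behind this answer can be sketched with a toy model in plain Python. This is not Spark's API: `ToyRDD`, `compute`, and `lose_data` are hypothetical names invented for illustration. The sketch assumes only that each dataset is never modified after creation and remembers its parent and the function that produced it, so lost data can be rebuilt by replaying that lineage.

```python
# Toy model only: this is NOT Spark's API. ToyRDD and its methods are
# hypothetical names used to illustrate lineage-based recovery.

class ToyRDD:
    def __init__(self, source=None, parent=None, fn=None):
        # Immutable input data (only set for the root dataset).
        self._source = tuple(source) if source is not None else None
        self._parent = parent   # lineage: the dataset this one was derived from
        self._fn = fn           # lineage: the function applied to the parent
        self._cache = None      # materialized data, which may be lost

    def map(self, fn):
        # Never modifies self; returns a new dataset that remembers how it was built.
        return ToyRDD(parent=self, fn=fn)

    def compute(self):
        # Rebuild from lineage whenever the materialized data is missing.
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source)
            else:
                self._cache = [self._fn(x) for x in self._parent.compute()]
        return self._cache

    def lose_data(self):
        # Simulate a failure that drops the computed data.
        self._cache = None


base = ToyRDD(source=[1, 2, 3])
doubled = base.map(lambda x: x * 2)

first = doubled.compute()      # computed from the parent
doubled.lose_data()            # simulated failure
recovered = doubled.compute()  # recomputed from lineage, same result
```

Because `base` is never changed, replaying `map` over it always yields the same data, which is why immutability and lineage work together.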
What is transformation in Spark RDD?
* Takes an RDD as input and produces one or more RDDs as output
* Returns the final results of RDD computations
* The way to send results from executors to the driver
* None of the above
Takes an RDD as input and produces one or more RDDs as output
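The difference between a transformation (returns a new dataset) and an action (returns a final result) can be sketched in plain Python. This is a toy model, not Spark's API: `LazyDataset` and `from_list` are hypothetical names, and the method names `map`, `filter`, `collect`, and `count` are chosen only to mirror common RDD operations.

```python
# Toy model only: LazyDataset is a hypothetical class, not part of Spark.

class LazyDataset:
    def __init__(self, make_iter):
        # Zero-argument callable that produces a fresh iterator over the data.
        self._make_iter = make_iter

    @classmethod
    def from_list(cls, items):
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: return a new LazyDataset and run nothing yet.
    def map(self, fn):
        return LazyDataset(lambda: (fn(x) for x in self._make_iter()))

    def filter(self, pred):
        return LazyDataset(lambda: (x for x in self._make_iter() if pred(x)))

    # Actions: trigger the computation and return a plain result.
    def collect(self):
        return list(self._make_iter())

    def count(self):
        return sum(1 for _ in self._make_iter())


calls = []

def square(x):
    calls.append(x)  # record when the function actually runs
    return x * x

numbers = LazyDataset.from_list([1, 2, 3, 4])
squares = numbers.map(square)                        # transformation: nothing runs yet
calls_before_action = len(calls)                     # still 0 at this point
even_squares = squares.filter(lambda x: x % 2 == 0)  # another transformation
result = even_squares.collect()                      # action: the whole chain runs now
```

In this sketch `square` is not called until `collect` runs, which is the lazy-evaluation behavior the card describes for transformations.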
Which of the following is a feature of Spark RDD?
* In-memory computation
* Lazy evaluation
* Fault Tolerance
* All of the mentioned
All of the mentioned
Four main components built on top of Spark Core
- Spark ML
- Spark SQL
- Spark Streaming
- Spark GraphX
Describe Spark ML
Spark ML provides simple APIs for running machine learning functions (classification, clustering, regression) and for creating execution pipelines
Describe Spark SQL
Spark module for working with structured data
Describe Spark Streaming
Large-scale, near-real-time stream processing framework
Describe Spark GraphX
Spark API for graph-parallel computation; includes:
- a growing collection of graph algorithms
- builders to simplify graph analytics tasks
Features of Hive
good abstraction
declarative language
less error prone
easier to learn & analyze
compiles to Java MapReduce code
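As a rough illustration of the last point, a declarative query such as `SELECT word, COUNT(*) FROM words GROUP BY word;` corresponds conceptually to map, shuffle, and reduce phases. The sketch below models those phases in plain Python. It is not what Hive generates: the table name `words`, the column name `word`, and all function names are hypothetical.

```python
from collections import defaultdict

# Conceptual sketch only: real Hive produces a MapReduce job plan, not Python.
# Query being modeled (table and column names are hypothetical):
#   SELECT word, COUNT(*) FROM words GROUP BY word;

def map_phase(rows):
    # Emit a (key, 1) pair for every input row.
    return [(word, 1) for word in rows]

def shuffle_phase(pairs):
    # Group all values by key, as happens between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the values for each key to get the per-group count.
    return {key: sum(values) for key, values in grouped.items()}

rows = ["spark", "hive", "spark", "pig", "hive", "spark"]
counts = reduce_phase(shuffle_phase(map_phase(rows)))
```

The one-line query hides all three phases, which is why the declarative form is listed as less error prone and easier to learn.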
Four key components of the Hive architecture
metastore
Thrift server
driver
HiveQL
Hive CLI
Different modes of execution in Apache Pig