Processing: 24% (EMR, Spark, Hive, Lambda, Glue, ECS) Flashcards
Be able to: a) determine appropriate data processing solution requirements; b) design a solution for transforming and preparing data for analysis; c) automate and operationalize a data processing solution.
1
Q
Name the four data processing methods
A
a) batch, for processing massive datasets all at once
b) periodic, for workloads that arrive at unpredictable times
c) near real-time, for small bursts of data that must be collected and processed within minutes
d) real-time, for tiny bursts of data that must be processed continually
2
Q
Name the four Hadoop modules
A
a) Common (or Core)
b) the Hadoop Distributed File System (or HDFS)
c) Yet Another Resource Negotiator (or YARN)
d) MapReduce
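To make the MapReduce module easier to remember, here is a minimal sketch of its programming model as a word count in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and only mirror the three stages a MapReduce job goes through. In a real cluster the map and reduce steps would run in parallel across nodes, with YARN allocating resources and HDFS holding the input.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only; not taken from the flashcards.
counts = word_count(["big data big clusters", "data lakes"])
print(counts)
```

The point of the split is that map_phase looks at one line at a time and reduce_phase looks at one key at a time, which is what lets a framework distribute both steps across many machines.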
3
Q
Name the difference in purpose between Hive and Presto
A
Hive is optimised for query throughput (large, long-running batch queries), whereas Presto is optimised for interactivity (low-latency, ad hoc queries)