MapReduce And ACID vs BASE Flashcards

Question 1

Q

MapReduce

Answer

A

A programming model, introduced by Google, that splits a large data-processing job into tasks that run in parallel on separate machines

Like a distributed GROUP BY

Question 2

Q

Phases

Answer

A

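Map phase: each node reads its own share of the data and emits key-value pairs. Runs on all nodes in parallel
Shuffle phase: pairs are grouped by key, so that all values for the same key (category) end up on the same node
Reduce phase: aggregates the values for each key, e.g. totals the number of things in each category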
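
Example (a minimal single-process sketch of the three phases, not how Hadoop itself is implemented). The function names and the sample chunks are made up for illustration; each string stands in for the data held on one node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (key, 1) pair for every word in one node's chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_phase(mapped_chunks):
    """Shuffle: gather all values that share a key into one group."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical sample data: one string per node.
chunks = ["big data big", "data lake", "big lake"]

# On a real cluster the map calls would run in parallel, one per node.
mapped = [map_phase(chunk) for chunk in chunks]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)  # {'big': 3, 'data': 2, 'lake': 2}
```

The result is the same as SELECT word, COUNT(*) ... GROUP BY word, which is why MapReduce is often compared to a distributed GROUP BY.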

Question 3

Q

ACID properties (of database transactions)

Answer

A

Atomicity
Consistency
Isolation
Durability

Question 4

Q

Atomicity

Answer

A

All or nothing. In a transaction all the operations must succeed or fail as a group
-> ensures data integrity

Achieved through COMMIT and ROLLBACK statements
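
Example (a minimal sketch using Python's built-in sqlite3 module with an in-memory database). The accounts table, the names, and the transfer helper are hypothetical; the CHECK constraint is only there to force a failure halfway through a transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        # Credit first, then debit, so a failed debit leaves work to undo.
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        conn.commit()    # both updates become permanent together
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # the credit already applied is undone as well
        return False

failed = transfer(conn, "alice", "bob", 500)  # debit would go negative
succeeded = transfer(conn, "alice", "bob", 30)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(failed, succeeded, balances)  # False True {'alice': 70, 'bob': 80}
```

The failed transfer leaves both balances untouched even though its first UPDATE had already run: the group of operations fails as a whole.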

Question 5

Q

Consistency

Answer

A

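Data must be consistent (satisfy all integrity rules) both before and after a transaction

Achieved through integrity constraints, with forward recovery and backward recovery restoring a consistent state after a failure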

Question 6

Q

Isolation

Answer

A

Concurrent transactions do not interfere with each other; each one behaves as if it were running alone

Achieved through Locking
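
Example (an in-process analogy only, assuming Python threads in place of concurrent transactions and threading.Lock in place of a database lock manager). The balance counter and the deposit function are made up for illustration.

```python
import threading

balance = 0
lock = threading.Lock()

def deposit(times):
    """Add 1 to the shared balance `times` times."""
    global balance
    for _ in range(times):
        # The lock lets only one thread at a time run the
        # read-modify-write, so no update is lost.
        with lock:
            current = balance
            balance = current + 1

threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # 40000
```

Without the lock, two threads could read the same value and overwrite each other's update; a database uses locks on rows or tables to prevent the same kind of interference between transactions.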

Question 7

Q

Durability

Answer

A

Once a transaction is committed, its changes are permanent, even if the system fails afterwards

Achieved through: Forward recovery, backward recovery

Question 8

Q

BASE properties (an alternative to ACID, suitable only if query results can tolerate some inconsistency)

Answer

A

Basic Availability
Soft state
Eventual Consistency

Question 9

Q

Basic availability

Answer

A

The system stays available most of the time because it can tolerate partial failure (e.g. the failure of a single node)

Question 10

Q

Soft state

Answer

A

The state of the system is in flux and may change over time, even without new input, as updates propagate between nodes
(Immediate correctness is traded away for availability)

Question 11

Q

Eventual consistency

Answer

A

Copies of the data may not be consistent in the short run, but they will eventually become consistent once updates have propagated to all nodes
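
Example (a toy simulation, not the protocol of any particular NoSQL system). The Replica class, the sync function, and the last-writer-wins rule based on a version number are all assumptions made for illustration.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, version, value):
        # Keep only the newest version (assumed last-writer-wins policy).
        if key not in self.data or self.data[key][0] < version:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def sync(replicas):
    """One propagation round: every replica receives every other's entries."""
    snapshots = [dict(replica.data) for replica in replicas]
    for replica in replicas:
        for snapshot in snapshots:
            for key, (version, value) in snapshot.items():
                replica.write(key, version, value)

replicas = [Replica("a"), Replica("b"), Replica("c")]
replicas[0].write("cart", 1, "book")  # the write reaches one node first

before = [replica.read("cart") for replica in replicas]
sync(replicas)
after = [replica.read("cart") for replica in replicas]

print(before)  # ['book', None, None]
print(after)   # ['book', 'book', 'book']
```

Between the write and the sync, a read can return stale data depending on which node answers; once propagation finishes, all nodes agree.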

Question 12

Q

Hadoop ecosystem

Answer

A

Tools that make Hadoop easy to use for people without Java programming skills.

Question 13

Q

Elements of the Hadoop ecosystem

Answer

A

Hive, Sqoop, Pig, Flume, HBase, Impala

Question 14

Q

Hive

Answer

A

Data warehouse (DW) system that works with HDFS; it is not a relational DBMS.
HiveQL is a declarative (what), SQL-like query language. Hive translates queries into MapReduce jobs.
Works best on large sets of data; doesn't return small sets of data quickly.

Question 15

Q

Pig

Answer

A

Uses the Pig Latin scripting language. Procedural (how).
Compiles Pig Latin scripts into MapReduce jobs.
Good for data transformation

Question 16

Q

Flume

Answer

A

(The one for web clickstreams)
Harvests large sets of data from server log files. Can be configured to import data on a regular schedule and can move data into HDFS.

Question 17

Q

Sqoop

Answer

A

SQL to Hadoop
Transfers data in both directions between a relational DBMS and Hadoop

Question 18

Q

HBase

Answer

A

A NoSQL database that works directly with HDFS
Does not rely on MapReduce
Suitable for fast access to small amounts of data
Very good at quickly processing sparse data sets
Used for Facebook's messaging system

Question 19

Q

Impala

Answer

A

Supports SQL queries that pull data directly from HDFS
Works well for processing large data sets into small result sets

(19 cards)