L8: BIG DATA SYSTEMS Flashcards

Question 1

Q

Two types of big data scaling

Answer

A

To handle more data in a similar time frame, you need to scale computing power.
1. Vertical Scaling:
Install more processors, more memory, and better/faster hardware in a single machine.

2. Horizontal Scaling:
Spread the workload across many machines.
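
As a rough illustration of horizontal scaling, the sketch below splits one workload into chunks and hands each chunk to a separate worker, then combines the partial results. It is only a single-machine simulation: threads stand in for separate machines, and the names `split_into_chunks` and `scaled_out_sum` are hypothetical helpers made up for this card.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_workers):
    """Divide items into n_workers roughly equal, contiguous chunks."""
    size, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def scaled_out_sum(items, n_workers=4):
    """Sum items by giving each worker its own chunk, then combining."""
    chunks = split_into_chunks(items, n_workers)
    # Threads simulate machines here; a real cluster would ship each
    # chunk to a different node.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


data = list(range(1, 1001))
total = scaled_out_sum(data, n_workers=4)
print(total)
```

Adding workers (more chunks) is the horizontal analogue of buying a bigger machine in vertical scaling.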

Question 2

Q

Hadoop (horizontal scaling)

Answer

A

A framework of open-source tools for examining data sets that are too large to fit into a traditional data warehouse or relational database, using reliable, scalable, distributed computing.
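
Hadoop's classic processing model is MapReduce. The sketch below imitates its three stages (map, shuffle, reduce) for a word count in plain Python on one machine. It does not use Hadoop's API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # On a cluster, each line (or block of lines) could be mapped on a
    # different node; here everything runs in one process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data systems", "big data scaling"])
print(counts)
```

Because each map call depends only on its own line and each reduce call only on its own key, the work can be spread across many machines.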

Question 3

Q

Spark

Answer

A

A newer tool (2010) that can run directly on the Hadoop Distributed File System (HDFS), inside MapReduce, or alongside MapReduce on the same cluster.
Its distinguishing feature is the ability to perform in-memory computations.
It allows data to be cached in memory, which eliminates Hadoop’s hard-disk overhead for iterative tasks.
It can be up to 100 times faster than MapReduce when the data fits in memory, and up to 10 times faster when the data resides on disk.
Spark supports streaming data and more complex analytics, such as graph algorithms and machine learning, along with SQL queries through Spark SQL.
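
To see why in-memory caching helps iterative tasks, the toy sketch below counts how often a slow store is read with and without caching. This is not Spark code: `SlowStore` is a hypothetical stand-in for disk-backed storage, and the read counter only mimics the disk overhead described above.

```python
class SlowStore:
    """Stands in for disk-backed storage; counts how often it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1  # each call represents one expensive disk read
        return list(self._values)


def iterate_without_cache(store, n_iterations):
    """Reload the data from the store on every iteration."""
    total = 0
    for _ in range(n_iterations):
        total += sum(store.load())
    return total


def iterate_with_cache(store, n_iterations):
    """Load the data once, keep it in memory, and reuse it."""
    cached = store.load()
    total = 0
    for _ in range(n_iterations):
        total += sum(cached)
    return total


disk_store = SlowStore([1, 2, 3])
no_cache_total = iterate_without_cache(disk_store, 10)

cached_store = SlowStore([1, 2, 3])
cache_total = iterate_with_cache(cached_store, 10)

print(disk_store.reads, cached_store.reads)
```

Both versions compute the same result, but the cached version touches the slow store once instead of once per iteration, which is the effect that matters most for iterative algorithms.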

Question 4

Q

Cloud Computing

Answer

A

Cloud computing allows the use of remote, vertically or horizontally scaled servers for data storage and analysis (e.g., Microsoft Azure).