L8: BIG DATA SYSTEMS Flashcards
Two types of big data scaling
To handle more data in a similar time frame, you need to scale computing power.
1. Vertical Scaling:
install more processors, more memory, and better/faster hardware in a single machine.
2. Horizontal Scaling:
spread the workload across many machines.
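A minimal Python sketch of the horizontal-scaling idea: partition the data, let each worker process only its own chunk, then combine the partial results. Threads stand in for separate machines here, so this shows the partition/combine pattern rather than real cluster speedup; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_workers):
    """Split data into at most n_workers roughly equal chunks (one per 'machine')."""
    size = max(1, -(-len(data) // n_workers))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_chunk(chunk):
    """Work done independently on one node: here, a sum of squares."""
    return sum(x * x for x in chunk)

def horizontal_sum_of_squares(data, n_workers=4):
    chunks = partition(data, n_workers)
    # Each thread stands in for a separate machine handling only its own chunk.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # Combine the partial results from every "node" into the final answer.
    return sum(partials)

data = list(range(1, 101))
print(horizontal_sum_of_squares(data))  # 338350
```

Vertical scaling would instead run `process_chunk(data)` on one bigger machine; horizontal scaling keeps each machine's share of the work small by adding more machines.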
Hadoop (horizontal scaling)
A framework of open-source tools that supports the analysis of data sets too large to fit into a traditional data warehouse or relational database, through reliable, scalable, distributed computing.
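Hadoop processes data with the MapReduce programming model. The following is a minimal single-process Python sketch of that model's map, shuffle and reduce phases for a word count; it is not Hadoop's actual (Java) API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(split):
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: combine the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(splits):
    # On a Hadoop cluster each split would be mapped on a different node; here we just loop.
    pairs = (pair for split in splits for pair in map_phase(split))
    return reduce_phase(shuffle_phase(pairs))

splits = ["big data systems", "big clusters store big data"]
print(word_count(splits))  # {'big': 3, 'data': 2, 'systems': 1, 'clusters': 1, 'store': 1}
```

Because mappers only see their own split and reducers only see one key's values, both phases can be spread across many machines.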
Spark
A newer tool (2010) that can run directly on the Hadoop Distributed File System (HDFS), inside MapReduce, and alongside MapReduce on the same cluster.
Its unique feature is the ability to perform in-memory computations.
It allows data to be cached in memory, eliminating Hadoop's hard-disk overhead for iterative tasks.
Up to 100 times faster than MapReduce when the data fits in memory, and up to 10 times faster when the data resides on disk.
Spark supports streaming data and more complex analytics, such as SQL queries (Spark SQL), graph algorithms, and machine learning.
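Why in-memory caching helps iterative tasks can be sketched in plain Python: a toy iterative job either re-reads its input from disk on every pass (roughly how a chain of MapReduce jobs behaves) or reads it once and keeps it in memory (the idea behind Spark's caching; in PySpark this corresponds to calling `.cache()` on an RDD or DataFrame). The file, class and function names below are hypothetical stand-ins, not Spark APIs.

```python
import os
import tempfile

def write_dataset(path, values):
    """Store a toy dataset on disk, one number per line (stand-in for a file in HDFS)."""
    with open(path, "w") as f:
        f.write("\n".join(str(v) for v in values))

class DiskReader:
    """Loads the dataset from disk and counts how many disk reads occur."""

    def __init__(self, path):
        self.path = path
        self.reads = 0

    def load(self):
        self.reads += 1
        with open(self.path) as f:
            return [float(line) for line in f]

def iterative_job(reader, iterations, cache):
    """Toy iterative algorithm: nudge an estimate toward the data mean on every pass.

    cache=True: read once and keep the data in memory (Spark-style).
    cache=False: re-read the data from disk on every iteration.
    """
    cached = reader.load() if cache else None
    estimate = 0.0
    for _ in range(iterations):
        data = cached if cache else reader.load()
        mean = sum(data) / len(data)
        estimate += 0.5 * (mean - estimate)
    return estimate

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    write_dataset(path, [1, 2, 3, 4])
    from_disk = DiskReader(path)
    in_memory = DiskReader(path)
    result_disk = iterative_job(from_disk, iterations=10, cache=False)
    result_memory = iterative_job(in_memory, iterations=10, cache=True)

print(from_disk.reads, in_memory.reads)  # 10 1
```

Both runs compute the same answer, but the uncached run pays for ten disk reads instead of one; the gap grows with the number of iterations, which is where the speedups quoted above come from.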
Cloud Computing
Cloud computing allows the use of remote, vertically or horizontally scaled servers for data storage and analysis (e.g., Microsoft Azure).