Big Data Flashcards
What is big data?
Big data is a collection of large and complex data sets which are difficult to process using common database management tools or traditional data processing applications.
What are the four dimensions (V’s) of Big Data?
Volume, Variety, Veracity, Velocity
What do the V’s imply?
Volume - Data at Rest (terabytes to exabytes of existing data to process)
Velocity - Data in Motion (Streaming data, milliseconds to seconds to respond)
Variety - Data in Many Forms (Structured, unstructured, text, multimedia)
Veracity - Data in Doubt (Uncertainty due to data inconsistency, incompleteness, ambiguity etc.)
What does scaling mean?
Scaling is the ability of a system to adapt to increased processing demands.
What are the two types of scaling? What do they mean?
Horizontal Scaling
- involves distributing the workload across many servers
Vertical Scaling
- involves installing more processors, more memory, and faster hardware, typically within a single server
Name one advantage and one disadvantage of
a. horizontal scaling
b. vertical scaling
Horizontal
adv. - increases performance in small steps as needed
disadv. - only a limited number of software packages can handle horizontal scaling
Vertical
adv. - easy to manage and install hardware within a single machine
disadv. - requires substantial financial investment
Give examples of horizontal and vertical scaling platforms.
Horizontal
- peer-to-peer networks
- Apache Hadoop
- Apache Spark
Vertical
- HPC (high-performance computing) clusters
- multicore processors
- GPU
What is the strategy that horizontal scaling focuses on?
Divide and Conquer Strategy
partition work in the beginning
let separate servers do the divided work
combine in the end for the result
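A minimal sketch of the three steps above, assuming a word-count task and using threads on one machine as stand-ins for separate servers. The names partition, count_words, and distributed_word_count are hypothetical and chosen only for illustration; a real horizontal platform such as Hadoop or Spark would handle the distribution across machines itself.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_workers):
    """Step 1: split data into n_workers roughly equal contiguous chunks."""
    size, extra = divmod(len(data), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks get one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def count_words(lines):
    """Step 2: the work one 'server' does on its own chunk."""
    return sum(len(line.split()) for line in lines)


def distributed_word_count(lines, n_workers=3):
    chunks = partition(lines, n_workers)
    # Each thread stands in for a separate server (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Step 3: combine the partial results into the final answer.
    return sum(partial_counts)


if __name__ == "__main__":
    sample = ["big data is large", "and complex", "data", "in many forms"]
    print(distributed_word_count(sample))
```

The combine step here is a simple sum because word counts from separate chunks can be added independently; tasks whose partial results depend on each other need a more careful combine step.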
Which one is better? Horizontal or vertical scaling?
It depends heavily on your requirements.