Big Data Flashcards

1
Q

What is big data?

A

Big data is a collection of large and complex data sets which are difficult to process using common database management tools or traditional data processing applications.

2
Q

What are the four dimensions (V’s) of Big Data?

A

Volume, Velocity, Variety, Veracity

3
Q

What do the V’s imply?

A

Volume - Data at Rest (terabytes to exabytes of existing data to process)
Velocity - Data in Motion (Streaming data, milliseconds to seconds to respond)
Variety - Data in Many Forms (Structured, unstructured, text, multimedia)
Veracity - Data in Doubt (Uncertainty due to data inconsistency, incompleteness, ambiguity etc.)

4
Q

What does scaling mean?

A

Scaling is the ability of a system to adapt to increased processing demands.

5
Q

What are the two types of scaling? What do they mean?

A

Horizontal Scaling
- involves distributing the workload across many servers

Vertical Scaling
- involves installing more processors, more memory and faster hardware, typically within a single server

6
Q

Name one advantage and one disadvantage of

a. horizontal scaling
b. vertical scaling

A

Horizontal

adv. - performance can be increased in small steps as needed
disadv. - only a limited number of software packages can take advantage of horizontal scaling

Vertical

adv. - easy to manage and install hardware within a single machine
disadv. - requires substantial financial investment

7
Q

Give examples of horizontal and vertical scaling platforms.

A

Horizontal

  • Peer-to-peer networks
  • Apache Hadoop
  • Apache Spark

Vertical

  • High-performance computing (HPC) clusters
  • Multicore processors
  • GPUs
8
Q

What is the strategy that horizontal scaling focuses on?

A

Divide and Conquer Strategy
- partition the work at the beginning
- let separate servers process the divided work
- combine the partial results at the end
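
The three steps above can be sketched in a few lines of Python. This is only an illustration, not how Hadoop or Spark are implemented: worker threads stand in for separate servers, word counting stands in for the real workload, and the function names (partition, count_words, combine, distributed_word_count) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Step 1: split items into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def count_words(lines):
    """Step 2: the work one 'server' does on its own chunk only."""
    counts = {}
    for line in lines:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def combine(partials):
    """Step 3: merge the partial results into the final answer."""
    total = {}
    for partial in partials:
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
    return total


def distributed_word_count(lines, n_workers=3):
    """Partition, process each chunk on its own worker, then combine."""
    chunks = partition(lines, n_workers)
    # Assumption: threads are a stand-in for independent servers.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return combine(partials)


if __name__ == "__main__":
    lines = ["big data", "data in motion", "data at rest", "big"]
    print(distributed_word_count(lines, n_workers=3))
```

Because each chunk is processed independently, adding capacity in this sketch means raising n_workers, which mirrors adding servers in a horizontally scaled system.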

9
Q

Which one is better? Horizontal or vertical scaling?

A

It depends heavily on your requirements.
