Big Data Flashcards

(9 cards)

1
Q

What are the 3 Vs of big data?

A
  • Volume
  • Variety
  • Velocity
2
Q

What are the phases of a MapReduce job?

A
  • Input
  • Map
  • Shuffle & Sort
  • Reduce
  • Output
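
The phases above can be sketched as a tiny in-memory word count. This is only an illustration, not Hadoop's API: the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `run_job`) are hypothetical, and everything runs in a single process instead of being distributed across a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group all values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Input -> Map -> Shuffle & Sort -> Reduce -> Output."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_and_sort(mapped)
    return [reduce_phase(key, values) for key, values in grouped]


if __name__ == "__main__":
    print(run_job(["big data", "big clusters"]))
    # prints [('big', 2), ('clusters', 1), ('data', 1)]
```

In a real cluster the map and reduce calls would run as separate tasks on different nodes, and the shuffle would move data over the network.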
3
Q

What does the NameNode do?

A

Metadata management: it maintains the file system namespace and the mapping of file blocks to DataNodes.

4
Q

What do DataNodes do?

A

Store the actual contents of files, which are split into blocks (chunks).
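
A minimal sketch of the idea, assuming a fixed block size and a simple round-robin placement. Both helper names (`split_into_blocks`, `place_blocks`) and the placement rule are hypothetical; real HDFS uses much larger blocks, replicates them, and applies its own placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last one may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks: list[bytes], datanodes: list[str]) -> dict[int, str]:
    """Assign each block index to a DataNode (round-robin, an assumption).

    The resulting mapping is the kind of metadata a NameNode would keep,
    while the block contents themselves live on the DataNodes.
    """
    return {i: datanodes[i % len(datanodes)] for i in range(len(blocks))}


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", 4)
    print(blocks)
    print(place_blocks(blocks, ["dn1", "dn2"]))
```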

5
Q

What does the JobTracker do?

A
  • Job scheduling and resource management
6
Q

What do TaskTrackers do?

A

Execute the map and reduce tasks assigned to them by the JobTracker.

7
Q

How to handle TaskTracker failure?

A
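  • Detect the failure via missing heartbeats
  • Re-execute all tasks that were in progress on the failed node
  • Re-execute finished map tasks of still-running MapReduce jobs (their output was stored on the failed node's local disk and is no longer reachable)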
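
The recovery rules above can be sketched as follows. This is a simplified model under stated assumptions, not JobTracker code: the `Task` record, the `failed_nodes` and `tasks_to_reexecute` helpers, and the timeout value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    kind: str                 # "map" or "reduce"
    state: str                # "running" or "done"
    node: str                 # TaskTracker the task ran on
    job_running: bool = True  # whether the owning job is still running


def failed_nodes(last_heartbeat, now, timeout):
    """Treat a node as failed if its last heartbeat is older than `timeout`."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def tasks_to_reexecute(tasks, dead):
    """Pick tasks on failed nodes that need to be scheduled again."""
    rerun = []
    for task in tasks:
        if task.node not in dead:
            continue
        in_progress = task.state == "running"
        # Finished map output sat on the failed node's local disk,
        # so it is lost while the job still needs it.
        finished_map = (
            task.kind == "map" and task.state == "done" and task.job_running
        )
        if in_progress or finished_map:
            rerun.append(task.task_id)
    return rerun


if __name__ == "__main__":
    dead = failed_nodes({"tt1": 100.0, "tt2": 40.0}, now=105.0, timeout=30.0)
    tasks = [
        Task("m1", "map", "done", "tt2"),
        Task("r1", "reduce", "running", "tt2"),
        Task("r2", "reduce", "done", "tt2"),
        Task("m2", "map", "running", "tt1"),
    ]
    print(dead, tasks_to_reexecute(tasks, dead))
```

In this sketch a finished reduce task (`r2`) is not re-run, on the assumption that reduce output is written to the distributed file system and therefore survives the node failure.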
8
Q

How to handle JobTracker failure?

A
  • The JobTracker is a single point of failure; after a restart it resumes from the execution log
9
Q

What is a straggler?

A

A task that runs unusually slowly on a TaskTracker and thereby delays completion of the whole job.
