Week 2: Foundation For Big Data Systems Flashcards

1
Q

What is a distributed file system?

A
  • A physically distributed implementation of the traditional file system
  • Allows users to manipulate, organise, and share data seamlessly, regardless of its actual location on the network
2
Q

2 benefits of data replication

A
  1. Makes the system more fault tolerant
  2. Helps with scaling the access to this data by many users
3
Q

What 3 things do distributed file systems provide?

A
  1. Data scalability
  2. Fault tolerance
  3. High concurrency through partitioning and replication of data on many nodes
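
The idea of partitioning and replicating data across nodes can be sketched in a few lines of Python. This is a toy illustration only, not how any real distributed file system places data: the function names `place_blocks` and `read_all`, the round-robin placement, and the tiny block size are all assumptions made for the example.

```python
def place_blocks(data, nodes, block_size=4, replicas=2):
    """Split data into fixed-size blocks and assign each block to `replicas` distinct nodes.

    Hypothetical round-robin placement, for illustration only.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i in range(0, len(data), block_size):
        block_id = i // block_size
        # Pick consecutive nodes starting at block_id, wrapping around the node list.
        holders = [nodes[(block_id + r) % len(nodes)] for r in range(replicas)]
        placement[block_id] = (data[i:i + block_size], holders)
    return placement


def read_all(placement, failed):
    """Reassemble the data, reading each block from any node that has not failed."""
    out = b""
    for block_id in sorted(placement):
        block, holders = placement[block_id]
        if all(h in failed for h in holders):
            raise IOError(f"block {block_id} lost: all replicas are on failed nodes")
        out += block
    return out
```

With three nodes and two replicas per block, a read still succeeds when any single node fails, which is the fault-tolerance benefit; spreading blocks over several nodes is what lets many readers work on different parts of the data at once.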
4
Q

What is the frequency of updates in big data systems?

A
  • Data is written once
  • Updates are maintained as additional data sets over time
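
A minimal sketch of the write-once pattern, assuming an in-memory list stands in for immutable files. The names `batches`, `append_batch`, and `current_view` are hypothetical and chosen only for this example.

```python
from datetime import date

# Each batch is written once and never modified; later changes arrive as new batches.
batches = []  # hypothetical in-memory stand-in for immutable files


def append_batch(day, records):
    """Store a new immutable batch instead of editing earlier ones."""
    batches.append((day, tuple(records.items())))


def current_view():
    """Derive the latest value per key by replaying batches in date order."""
    view = {}
    for _, records in sorted(batches, key=lambda b: b[0]):
        view.update(records)
    return view
```

Appending a second batch that carries a new value for an existing key changes what `current_view()` returns, while the first batch stays exactly as it was written.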
5
Q

Parallel computing

A
  • Computation that requires more than one node, or parallel processing
6
Q

Commodity clusters

A
  • Affordable parallel computers with an average number of computing nodes
7
Q

3 commodity cluster cons

A
  1. Not as powerful as traditional parallel computers
  2. Often built out of less specialised nodes
  3. Higher potential for partial failures
8
Q

What is Apache Hadoop?

A

A framework that allows distributed processing of large data sets

9
Q

What are the 3 parts of the Hadoop ecosystem?

A
  1. Hadoop Distributed File System
  2. Hadoop YARN
  3. Hadoop MapReduce
10
Q

What is Hadoop HDFS?

A

A distributed file system that provides high-throughput access to application data

11
Q

What is Hadoop YARN?

A

A framework for job scheduling and cluster resource management

12
Q

What is Hadoop MapReduce?

A

A YARN-based system for parallel processing of large data sets
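
The MapReduce programming model itself can be illustrated with a word count. The sketch below runs in a single Python process and does not use Hadoop or its API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. In a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each `map_phase` call depends only on its own line and each `reduce_phase` call only on its own key, the work can be split across nodes without the calls needing to coordinate.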
