Week 2: Foundation For Big Data Systems Flashcards

Question 1

Q

What is a distributed file system?

Answer

A

A physically distributed implementation of the traditional file system,
allowing users to manipulate, organise and share data seamlessly,
regardless of its actual location on the network

Question 2

Q

What are 2 benefits of data replication?

Answer

A

Makes the system more fault tolerant
Helps with scaling the access to this data by many users

Question 3

Q

What 3 things do distributed file systems provide?

Answer

A

Data scalability
Fault tolerance
High concurrency through partitioning and replication of data on many nodes
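
A minimal sketch of how partitioning and replication could work together, assuming a simple round-robin placement. The helper names (`split_into_blocks`, `place_replicas`, `readable_blocks`) and the node names are hypothetical and chosen for illustration; real systems such as HDFS use more elaborate placement policies.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Partition a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (an assumed policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict[int, list[str]], failed: set[str]) -> list[int]:
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]


blocks = split_into_blocks(b"abcdefghij", block_size=4)
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(len(blocks), nodes, replication=3)

# Losing one node leaves every block readable from another replica.
survivors = readable_blocks(placement, failed={"node-a"})
```

Because each block lives on several nodes, a single node failure does not lose data (fault tolerance), and many readers can be served from different replicas at once (concurrency).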

Question 4

Q

What is the frequency of updates in big data systems?

Answer

A

Data is written once
Updates are maintained as additional data sets over time
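
The write-once idea can be sketched as an append-only store: existing records are never modified, and each update arrives as a new batch. The `Record` and `AppendOnlyStore` names are hypothetical and used only for illustration, not taken from any particular system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: a record cannot be changed after it is written
class Record:
    key: str
    value: str
    version: int


class AppendOnlyStore:
    """Keeps every batch ever written; updates are added as new batches."""

    def __init__(self) -> None:
        self._batches: list[tuple[Record, ...]] = []

    def append_batch(self, updates: dict[str, str]) -> None:
        version = len(self._batches) + 1
        self._batches.append(tuple(Record(k, v, version) for k, v in updates.items()))

    def latest(self, key: str) -> Optional[Record]:
        # Scan from the newest batch back to the oldest.
        for batch in reversed(self._batches):
            for record in batch:
                if record.key == key:
                    return record
        return None

    def history(self, key: str) -> list[Record]:
        return [r for batch in self._batches for r in batch if r.key == key]


store = AppendOnlyStore()
store.append_batch({"user1": "London"})
store.append_batch({"user1": "Paris"})  # an "update" is just another data set
```

Here `store.latest("user1")` returns the newest record, while `store.history("user1")` still shows both versions.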

Question 5

Q

What is parallel computing?

Answer

A

Computation that requires more than one node or parallel processing

Question 6

Q

What are commodity clusters?

Answer

A

Affordable parallel computers with an average number of computing nodes

Question 7

Q

What are 3 cons of commodity clusters?

Answer

A

Not as powerful as traditional parallel computers
Often built out of less specialised nodes
Higher potential for partial failures

Question 8

Q

What is Apache Hadoop?

Answer

A

A framework that allows distributed processing of large data sets

Question 9

Q

What are the 3 parts of the Hadoop ecosystem?

Answer

A

Hadoop Distributed File System
Hadoop YARN
Hadoop MapReduce

Question 10

Q

What is Hadoop HDFS?

Answer

A

A distributed file system that provides high-throughput access to application data

Question 11

Q

What is Hadoop YARN?

Answer

A

A framework for job scheduling and cluster resource management

Question 12

Q

What is Hadoop MapReduce?

Answer

A

A YARN-based system for parallel processing of large data sets
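
The MapReduce programming model can be illustrated with a word count in plain Python. This is a single-process sketch of the map, shuffle and reduce steps, not the Hadoop API; the function names are hypothetical, and on a real cluster the map and reduce calls would run in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines: list[str]) -> dict[str, int]:
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


result = word_count(["big data", "Big systems"])
```

Each line can be mapped independently and each word reduced independently, which is what makes the model suitable for splitting across many nodes.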

(12 cards)