Module 1 Flashcards

Question 1

Q

Name the four V’s of Big Data and explain what they represent

Answer

A

Velocity, Volume, Variety and Veracity.
These are features of a data set that help characterize big data.

Big data is traditionally defined as having high velocity, volume and variety; veracity is sometimes added as a fourth V.

Question 2

Q

Define Big Data

Answer

A

Massive data sets (volume) that cannot be maintained using traditional data processing techniques. They grow at an incredible rate (velocity) and are complex in nature as they can store structured, semi-structured or unstructured data (variety).

Key characteristics of big data include substantial volume, high velocity, and diverse variety.

Question 3

Q

Explain Variety from the 3 V’s of Big Data

Answer

A

Big data is complex and has high variety, meaning it can contain structured, semi-structured or unstructured data (it is heterogeneous).

This is a consequence of the ever-growing number of data sources available (e.g. processes, sensors, mobile devices, people, etc.).

Question 4

Q

Explain Volume from the 3 V’s of Big Data

Answer

A

Big data, by definition, involves massive amounts of data that keep growing.

Question 5

Q

Explain Velocity from the 3 V’s of Big Data

Answer

A

Big data systems need to ingest and process data continuously at a high rate, sometimes in real time or near real time.

Question 6

Q

Explain Veracity, which sometimes is said to be the fourth V of Big Data

Answer

A

Because we rely on such massive amounts of data, we need to ensure the data is as trustworthy as possible. Good-quality data is accurate, complete and unambiguous.
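A minimal sketch of a veracity check, assuming hypothetical sensor records and an assumed plausible temperature range (neither comes from the course material). It counts how many records are complete and how many of those fall inside the plausible range, as a rough stand-in for accuracy.

```python
# Hypothetical sensor readings; field names and values are made up for illustration.
records = [
    {"sensor_id": "a1", "temp_c": 21.5},   # complete and plausible
    {"sensor_id": "a2", "temp_c": None},   # incomplete: missing reading
    {"sensor_id": "a3", "temp_c": 999.0},  # complete but implausible
]


def quality_report(records, low=-50.0, high=60.0):
    """Count complete records and records within an assumed plausible range."""
    complete = [
        r for r in records
        if r.get("sensor_id") and r.get("temp_c") is not None
    ]
    plausible = [r for r in complete if low <= r["temp_c"] <= high]
    return {
        "total": len(records),
        "complete": len(complete),
        "plausible": len(plausible),
    }


report = quality_report(records)
print(report)
```

Real pipelines would use domain-specific rules; the thresholds here are only placeholders.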

Question 7

Q

Describe the Big Data life cycle

Question 8

Q

Explain the difference between Linear and Parallel processing when solving a problem

Answer

A

In Linear Processing, the solution is broken down into a set of instructions that are executed sequentially, one after another.
In Parallel Processing, the solution is broken down into a set of instructions, each assigned to a specific node in the cluster, so they can be executed at the same time.
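A rough sketch of the difference, assuming threads on one machine stand in for the nodes of a cluster (a real cluster would use a distributed framework). The function names are hypothetical. Both versions compute the same sum of squares; the parallel one splits the data into chunks, processes each chunk separately and combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Work done on one partition of the data."""
    return sum(x * x for x in chunk)


def linear(data):
    # One worker walks through the whole data set sequentially.
    return sum_of_squares(data)


def parallel(data, n_workers=4):
    # Split the data into roughly one chunk per worker (the stand-in "nodes").
    size = max(1, -(-len(data) // n_workers))  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # Combine the partial results into the final answer.
    return sum(partials)


data = list(range(1, 101))
print(linear(data), parallel(data))
```

Both calls return the same value; only the way the work is organized differs.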

Question 9

Q

What is a node?
What is a cluster and its purpose?

Answer

A

A node is an individual computer or server that has compute and storage capacity.

A cluster is a collection of interlinked nodes that work together, which makes parallel processing possible.

Question 10

Q

Explain the difference between scaling up (vertical) and out (horizontal)

Answer

A

Scaling up (vertically) means increasing the compute and/or storage capacity of a single node. Because of the three V’s of big data, scaling up might not be a sustainable solution.

Scaling out (horizontally) means adding more nodes to the cluster, which increases the cluster’s total storage and/or compute capacity.

Question 11

Q

How is Parallel Processing superior to Linear Processing in case of errors during Big Data problems?

Answer

A

If an error occurs during a calculation, linear processing requires the whole set of instructions to be executed again, while in parallel processing only the work assigned to the failed node needs to be executed again.
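A minimal sketch of rerunning only the failed piece of work. The `process` function and its simulated failure are hypothetical, and the chunks are processed one after another here to keep the example short; the point is that successful chunks are never recomputed.

```python
def process(chunk, attempt):
    # Hypothetical task: simulate a node failure on the first attempt
    # for any chunk that contains the value 13.
    if 13 in chunk and attempt == 0:
        raise RuntimeError("simulated node failure")
    return sum(chunk)


def run_with_retry(chunks, max_attempts=3):
    results = {}      # chunk index -> result, filled in as chunks succeed
    rerun_log = []    # indices of chunks that had to be scheduled again
    pending = list(range(len(chunks)))
    for attempt in range(max_attempts):
        failed = []
        for i in pending:
            try:
                results[i] = process(chunks[i], attempt)
            except RuntimeError:
                failed.append(i)
        rerun_log.extend(failed)
        pending = failed  # only failed chunks go into the next round
        if not pending:
            break
    if pending:
        raise RuntimeError(f"chunks {pending} still failing")
    return sum(results.values()), rerun_log


chunks = [[1, 2, 3], [10, 13, 20], [30, 40]]
total, rerun_log = run_with_retry(chunks)
print(total, rerun_log)
```

A linear version would have to restart the whole computation after the failure instead of repeating one chunk.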

Question 12

Q

How do Linear and Parallel processing compare in terms of node requirements and flexibility?

Answer

A

Storage and compute requirements per node are lower in parallel processing, because the task is broken down into a set of smaller instructions instead of everything being done on one node.

Parallel processing is also more flexible, as nodes can be added to or removed from the cluster depending on the complexity of the task.

Question 13

Q

What are embarrassingly parallel calculations?

Answer

A

Workloads that can easily be divided and run independently. If one workload fails, it has no impact on the other workloads and is easily rerun.

Question 14

Q

Scaling in the context of big data refers to ____

Answer

A

… adding more computing resources to handle increased data volume and processing demands

Question 15

Q

What are the types of data associated with Big Data?

Answer

A

structured
semi-structured
unstructured
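A small illustration of the three types, using made-up sample data: a CSV table with fixed columns (structured), JSON records whose fields vary from record to record (semi-structured), and free text (unstructured).

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (columns).
table = io.StringIO("id,name,age\n1,Ada,36\n2,Linus,29\n")
rows = list(csv.DictReader(table))

# Semi-structured: self-describing keys, but fields can differ per record.
events = json.loads(
    '[{"id": 1, "tags": ["new"]}, {"id": 2, "email": "x@example.com"}]'
)

# Unstructured: free text with no predefined fields.
note = "Customer called about a late delivery and asked for a refund."
words = note.split()

print(rows[0]["name"])           # access by a known column name
print(sorted(events[1].keys()))  # keys must be inspected per record
print(len(words))                # structure has to be derived from the text
```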

Question 16

Q

Why is parallel processing important in big data?

Answer

A

It reduces processing times.

Question 17

Q

What are the differences between structured, unstructured and semi-structured data?
