Big Data Flashcards
Elements that make up Big Data
- Velocity.
- Volume.
- Variety.
- Veracity.
- Value.
Velocity
is the speed at which data accumulates; data is generated very quickly and never stops.
Volume
is the scale of the data or the increase in the amount of data stored.
Variety
is the diversity of the data: data can be structured, semi-structured, or unstructured. Variety also refers to the different sources the data comes from, such as machines, people, and processes.
Veracity
is the quality and origin of the data and its conformity to facts and accuracy. With the large amounts of data obtained and accumulated, data needs to be assessed for whether it is real, accurate, and reliable.
Value
is our ability and need to turn data into value. Value can be profit, a medical benefit, or a social benefit.
Big Data Processing Tools
- Hadoop.
- Hive.
- Spark.
Hadoop
It is a Java-based, open-source framework that allows distributed storage and processing of large datasets across clusters of computers.
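As a rough illustration of the processing side, the sketch below walks through the word-count pattern that Hadoop's MapReduce model distributes across a cluster. Here the map, shuffle, and reduce stages simply run locally in one Python process, and the sample input lines are made up.

```python
# A minimal, self-contained sketch of the MapReduce word-count pattern that
# Hadoop distributes across a cluster. The map, shuffle, and reduce stages
# run locally in one process purely for illustration.
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    # Shuffle: group all values by key (Hadoop does this between nodes).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

if __name__ == "__main__":
    sample = ["big data never stops", "big data keeps growing"]
    print(reduce_phase(shuffle_phase(map_phase(sample))))
    # {'big': 2, 'data': 2, 'never': 1, 'stops': 1, 'keeps': 1, 'growing': 1}
```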
Hadoop Benefits
- You can incorporate emerging data formats, such as streaming audio, video, and social media, along with structured, semi-structured, and unstructured data.
- Provides stakeholders with real-time access to the data.
- Optimizes and streamlines costs in your enterprise data warehouse by consolidating data across the organization and moving “cold” data (data that is not frequently used) to a Hadoop-based system.
Hadoop Distributed File System (HDFS)
Storage system for big data.
Hadoop Distributed File System (HDFS) capacities
- HDFS provides reliable big data storage by partitioning files over multiple nodes.
- It splits large files across multiple computers, allowing parallel access to them (see the sketch after this list).
- It replicates (copies) file blocks on different nodes to prevent data loss.
- Fast recovery from hardware failures: HDFS can detect faults and recover automatically.
- Access to streaming data, such as video.
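A minimal sketch of reading a file that HDFS has split into blocks, using Spark (covered below) as the client. A running cluster is assumed, and the namenode host, port, and file path are hypothetical placeholders.

```python
# A minimal sketch of reading a file stored on HDFS, assuming a running
# cluster and a Spark installation; the namenode host, port, and file path
# are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-read-sketch").getOrCreate()

# Spark asks HDFS for the block locations and reads the blocks in parallel;
# replication means the read still succeeds if one datanode is down.
logs = spark.read.text("hdfs://namenode:9000/data/server_logs.txt")

print(logs.count())  # number of lines across all blocks
spark.stop()
```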
Hive
It is open-source data warehouse software for reading, writing, and managing large datasets that are stored directly in Hadoop or in other data storage systems.
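Hive tables are typically queried with a SQL-like language (HiveQL). Below is a minimal sketch of running such a query through Spark, assuming the session is configured with Hive support; the database and table names (sales.orders) and the columns are hypothetical.

```python
# A minimal sketch of querying a Hive table, assuming a Spark installation
# configured to talk to the Hive metastore; the database, table, and column
# names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-query-sketch")
    .enableHiveSupport()        # lets Spark read tables registered in Hive
    .getOrCreate()
)

# A SQL-style query over data that physically lives on HDFS (or another store).
top_customers = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_spent
    FROM sales.orders
    GROUP BY customer_id
    ORDER BY total_spent DESC
    LIMIT 10
""")
top_customers.show()
spark.stop()
```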
Spark capacities
- It has in-memory processing, which increases the speed of computations (illustrated in the sketch below).
- It has interfaces for major programming languages such as Java, Python, R, and SQL.
- It can access data in a large variety of data sources, including HDFS and Hive.
- It can process streaming data quickly.
- It can do complex analytics in real time.
Spark
A general-purpose data processing engine designed to extract and process large volumes of data for a wide range of applications in real time.
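A minimal sketch of Spark's in-memory DataFrame processing, assuming a Spark installation; the CSV path and column names are hypothetical. Caching the dataset keeps it in cluster memory so several computations can reuse it without re-reading from storage.

```python
# A minimal sketch of Spark's in-memory processing, assuming a Spark
# installation; the CSV path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-sketch").getOrCreate()

events = spark.read.csv(
    "hdfs://namenode:9000/data/events.csv", header=True, inferSchema=True
)

# cache() keeps the dataset in memory, so the two aggregations below
# reuse it instead of re-reading the file from storage each time.
events.cache()

events.groupBy("event_type").count().show()
events.groupBy("country").agg(F.avg("duration").alias("avg_duration")).show()

spark.stop()
```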