Big Data Flashcards

1
Q

Elements that make up Big Data

A
  • Velocity.
  • Volume.
  • Variety.
  • Veracity.
  • Value.
2
Q

Velocity

A

The speed at which data accumulates: data is generated very quickly and never stops.
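
To make "speed of accumulation" concrete, here is a minimal Python sketch that measures how many events arrived during the last few seconds. `SlidingWindowRate` is a hypothetical helper, not part of any big data tool, and it assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowRate:
    """Hypothetical helper: events per second over the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.timestamps = deque()  # arrival times still inside the window

    def record(self, t: float) -> None:
        # Assumes events arrive in non-decreasing timestamp order.
        self.timestamps.append(t)
        self._evict(t)

    def _evict(self, now: float) -> None:
        # Drop events at or before now - window; they no longer count.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()

    def rate(self, now: float) -> float:
        self._evict(now)
        return len(self.timestamps) / self.window
```

For example, recording events at t = 0, 1, 2.5, 9, 11 and 12 seconds with a 10-second window gives `rate(12.0) == 0.4` events per second, because only the four events after t = 2 are still inside the window.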

3
Q

Volume

A

The scale of the data, or the increase in the amount of data stored.

4
Q

Variety

A

The diversity of the data: data can be structured, semi-structured, or unstructured. Variety also refers to the different sources data comes from, such as machines, people, and processes.
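
A minimal Python sketch of how the three kinds of data differ when you load them. The sample inputs and function names are hypothetical; the point is that structured data maps onto fixed columns, semi-structured data carries its own field names, and unstructured text only yields features you derive yourself.

```python
import csv
import io
import json

# Hypothetical sample inputs, one for each kind of data.
structured = "user_id,age\n1,34\n2,27\n"                      # CSV: fixed columns
semi_structured = '{"user_id": 3, "tags": ["sports"]}'        # JSON: self-describing fields
unstructured = "Loved the new phone, battery lasts all day!"  # free text: no schema


def load_structured(text: str) -> list:
    # Every row has the same columns, so a CSV reader maps them to dicts directly.
    return list(csv.DictReader(io.StringIO(text)))


def load_semi_structured(text: str) -> dict:
    # Field names travel with the data, but fields can differ from record to record.
    return json.loads(text)


def load_unstructured(text: str) -> dict:
    # With no schema, only derived features are available, e.g. a word count.
    return {"text": text, "word_count": len(text.split())}
```

Note that `load_structured` returns every CSV field as a string; converting types is a separate step.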

5
Q

Veracity

A

The quality and origin of data, and how well it conforms to facts (its accuracy). Because large amounts of data are obtained and accumulated, the data must be assessed to determine whether it is real or false, accurate, and reliable.
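
One practical side of veracity is checking records before trusting them. This minimal Python sketch flags readings that come from an unknown origin, have a missing value, or have an implausible value; the `Reading` type, the trusted-source list, and the temperature range are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical list of known, trusted data origins.
TRUSTED_SOURCES = {"sensor-a", "sensor-b"}


@dataclass
class Reading:
    source: str
    temperature_c: Optional[float]


def check_veracity(r: Reading) -> List[str]:
    """Return the reasons a reading looks unreliable (an empty list means it passes)."""
    problems = []
    if r.source not in TRUSTED_SOURCES:
        problems.append("unknown origin")
    if r.temperature_c is None:
        problems.append("missing value")
    elif not -50.0 <= r.temperature_c <= 60.0:
        # Hypothetical plausibility range for an outdoor temperature sensor.
        problems.append("implausible value")
    return problems
```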

6
Q

Value

A

Our ability and need to turn data into value. Value can take the form of profit, or medical or social benefit.

7
Q

Big Data Processing Tools

A
  • Hadoop.
  • Hive.
  • Spark.
8
Q

Hadoop

A

A Java-based, open-source framework that allows distributed storage and processing of large datasets across clusters of computers.
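
Hadoop's classic processing model is MapReduce: mappers process input splits in parallel, the framework groups the intermediate results by key, and reducers combine them. The single-process Python sketch below imitates those phases on a word count. It illustrates the idea only; it is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    # Each mapper emits (word, 1) for every word in its input split.
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Each reducer sums the counts for one key.
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a cluster each split would go to a mapper on a different node;
    # here the "nodes" are just iterations of a loop.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle_phase(mapped))
```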

9
Q

Hadoop Benefits

A
  • Lets you incorporate emerging data formats, such as streaming audio, video, and social media data, along with structured, semi-structured, and unstructured data.
  • Gives stakeholders real-time access to the data.
  • Optimizes and streamlines costs in your enterprise data warehouse by consolidating data across the organization and moving “cold” data (data that is not frequently used) to a Hadoop-based system.
10
Q

Hadoop Distributed File System (HDFS)

A

Hadoop's storage system for big data.

11
Q

Hadoop Distributed File System (HDFS) capacities

A
  • HDFS provides reliable big data storage by partitioning files over multiple nodes.
  • It splits large files across multiple computers, allowing parallel access to them (different parts of a file can be read from different machines at the same time).
  • It replicates (copies) file blocks on different nodes to prevent data loss.
  • Fast recovery from hardware failures: HDFS can detect faults and recover automatically.
  • Access to streaming data (e.g., video).
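
The block splitting, replication, and recovery described above can be simulated in a few lines. This is a simplified Python sketch, not HDFS code: real HDFS uses large blocks (128 MB by default in recent versions), a default replication factor of 3, and rack-aware replica placement, whereas here the block size is tiny and replicas are placed round-robin. All function names are hypothetical.

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    # Cut the file into fixed-size blocks; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    # Put each block on `replication` distinct nodes (round-robin, not rack-aware).
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + k) % len(nodes)] for k in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    # The file stays readable if every block still has a replica on a healthy node.
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


def re_replicate(placement: Dict[int, List[str]], nodes: List[str],
                 failed: Set[str], replication: int) -> Dict[int, List[str]]:
    # Restore the replication factor by copying surviving blocks to other healthy nodes.
    healthy = [n for n in nodes if n not in failed]
    repaired = {}
    for block_id, replicas in placement.items():
        alive = [n for n in replicas if n not in failed]
        if not alive:
            raise RuntimeError(f"block {block_id} lost: no surviving replica")
        for n in healthy:
            if len(alive) >= replication:
                break
            if n not in alive:
                alive.append(n)
        repaired[block_id] = alive
    return repaired
```

With four nodes and a replication factor of 3, losing any single node leaves every block readable, and `re_replicate` restores three copies of each block on the remaining nodes.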
12
Q

Hive

A

Open-source data warehouse software for reading, writing, and managing large datasets stored directly in Hadoop (HDFS) or in other data storage systems.

13
Q

Spark capacities

A
  • It uses in-memory processing, which increases the speed of computations.
  • It has interfaces for major programming languages, such as Java, Python, R, and SQL.
  • It can access data from a large variety of data sources, including HDFS and Hive.
  • It can process streaming data quickly.
  • It can perform complex analytics in real time.
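
Much of the speed-up from in-memory processing comes from reusing data held in memory instead of rereading it for every computation. The toy Python class below is loosely modeled on Spark's lazy transformations, actions such as `count()` and `collect()`, and `cache()`, but it is a hypothetical illustration, not Spark's API. `CountingSource` stands in for a disk-backed source and counts how often it is scanned.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class CountingSource:
    """Pretend disk-backed data source that counts how often it is scanned."""

    def __init__(self, rows: Iterable[int]) -> None:
        self.rows = list(rows)
        self.scans = 0

    def __call__(self) -> Iterator[int]:
        self.scans += 1  # one full read of the "disk"
        return iter(self.rows)


class LazyDataset:
    """Toy dataset: transformations are lazy, actions compute, cache() keeps results in memory."""

    def __init__(self, compute: Callable[[], Iterator[int]]) -> None:
        self._compute = compute
        self._cache_requested = False
        self._cached: Optional[List[int]] = None

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Transformation: only records the work; nothing is read yet.
        return LazyDataset(lambda: (fn(x) for x in self._iterate()))

    def cache(self) -> "LazyDataset":
        # Ask for the results to be kept in memory once they are first computed.
        self._cache_requested = True
        return self

    def _iterate(self) -> Iterator[int]:
        if self._cached is not None:
            return iter(self._cached)  # served from memory, no re-read
        rows = self._compute()
        if self._cache_requested:
            self._cached = list(rows)  # materialize once and keep it
            return iter(self._cached)
        return rows

    # Actions: these actually trigger computation.
    def count(self) -> int:
        return sum(1 for _ in self._iterate())

    def collect(self) -> List[int]:
        return list(self._iterate())
```

With `cache()`, running `count()` and then `collect()` scans the source once; without it, each action rescans the source, which is the recomputation that keeping data in memory avoids.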
14
Q

Spark

A

A general-purpose data processing engine designed to extract and process large volumes of data for a wide range of applications in real time.
