Chapter 1: Introduction Flashcards

1
Q

Big Data Lifecycle

A
  1. Generation
  2. Collection
  3. Storage
  4. Processing
  5. Analysis
  6. Visualization
  7. Disposal
2
Q

Failure of Traditional DB in Handling Big Data

A
  • Exponential increase in volume
  • Majority of semi-structured/unstructured data
3
Q

3 V’s of Big Data

A
  • Volume: the size of the data
  • Velocity: the rate at which data is generated and processed
  • Variety: the range of data formats (structured, semi-structured, unstructured)
4
Q

Human-Generated Data

A
  • Data generated as an outcome of human interaction with machines
5
Q

Machine-Generated Data

A
  • Data generated by computer applications or hardware devices without active human intervention
6
Q

Types of Data

A
  • Structured
  • Unstructured
  • Semi-structured
7
Q

Hadoop

A
  • Open-source framework to support processing of large data sets
  • Core components: HDFS, Hadoop Common, and MapReduce
8
Q

Hadoop Distributed File System (HDFS)

A
  • Designed to store large data sets with a streaming access pattern, running on low-cost commodity hardware
9
Q

MapReduce

A
  • Uses divide and conquer
  • Scalable, reliable, and fault-tolerant
  • Used in parallel and distributed computing
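
The divide-and-conquer idea can be sketched as a single-process word count in plain Python. This is an illustration of the map, shuffle, and reduce steps only, not Hadoop's API; the function names and the sample chunks are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for each word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk could be mapped on a different node; here they run in sequence.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data big", "data lifecycle"]  # hypothetical input splits
counts = word_count(chunks)
```

Because each chunk is mapped independently and each key is reduced independently, both steps can be spread across machines, which is where the scalability comes from.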
10
Q

Challenges with Big Data (Textbook)

A
  • Heterogeneity and incompleteness
  • Volume and velocity
  • Storage
  • Privacy