Chapter 1: Introduction Flashcards
1
Q
Big Data Lifecycle
A
- Generation
- Collection
- Storage
- Processing
- Analysis
- Visualization
- Disposal
2
Q
Failure of Traditional DB in Handling Big Data
A
- Data volume grows exponentially, beyond what a traditional DB scales to
- Most of the data is semi-structured or unstructured, which does not fit a fixed relational schema
3
Q
3 V’s of Big Data
A
- Volume: the size of the data
- Velocity: the rate at which data is generated and processed
- Variety: the format of the data
4
Q
Human-Generated Data
A
- Data generated as an outcome of human interaction with machines
5
Q
Machine-Generated Data
A
- Data generated by computer applications or hardware devices without active human intervention
6
Q
Types of Data
A
- Structured
- Unstructured
- Semi-structured
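
Example (a minimal sketch, not from the textbook): the three types can be told apart by how much schema the data carries. The sample records below (the CSV row, the JSON document, and the sentence) are invented purely for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema (hypothetical CSV).
structured = "id,name,age\n1,Ada,36\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but fields may nest or vary per record
# (hypothetical JSON document).
semi_structured = '{"id": 1, "name": "Ada", "tags": ["admin"], "address": {"city": "London"}}'
record = json.loads(semi_structured)

# Unstructured: free text with no schema; any structure must be inferred
# (hypothetical sentence).
unstructured = "Ada emailed support on Monday about a login problem."
words = unstructured.split()

print(rows[0]["name"], record["address"]["city"], len(words))
```

The structured row is read by column name, the semi-structured record by walking its keys, and the unstructured text can only be tokenized until further analysis gives it structure.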
7
Q
Hadoop
A
- Open-source framework for distributed processing of large data sets
- Core components: HDFS, Hadoop Common, and MapReduce
8
Q
Hadoop Distributed File System (HDFS)
A
- Designed to store large data sets with a streaming access pattern, running on low-cost commodity hardware
9
Q
MapReduce
A
- Uses divide and conquer
- Scalable, reliable, and fault-tolerant
- Used in parallel and distributed computing
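
Example (a minimal single-process sketch, assuming a word-count task; this is not Hadoop's API): the divide-and-conquer idea can be shown as a map step that runs independently on each chunk of input, a shuffle step that groups results by key, and a reduce step that combines each group. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count`, and the sample chunks, are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently; in a real cluster these calls
    # would run in parallel on different nodes.
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data big", "data lifecycle"]  # hypothetical input splits
counts = word_count(chunks)
print(counts)
```

Because each map call depends only on its own chunk, a failed chunk can be re-run on its own, which is the basis for the scalability and fault tolerance listed above.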
10
Q
Challenges with Big Data (Textbook)
A
- Heterogeneity and incompleteness
- Volume and velocity
- Storage
- Privacy