Introduction Flashcards

2
Q

What is Big Data?

A

Refers to large, complex datasets that exceed the processing capabilities of traditional tools.

3
Q

What are the 4 V’s of Big Data?

A

Volume: Large quantity of data; Velocity: Speed of data production, consumption, and analysis; Variety: Structured, unstructured, and multimedia data; Veracity: Trustworthiness and quality of data.

4
Q

How is "Big Data" used as a noun versus as an adjective?

A

As a noun: a vague term, since there is no clear boundary between normal data and big data. As an adjective: it has a specific meaning (e.g., Big Data tools, Big Data architecture).

5
Q

Why is there hype around Big Data?

A

Rapid data growth from new sources; opportunities to extract insights; smarter applications such as Google Translate.

6
Q

What are some success stories of Big Data applications?

A

Crime prevention, healthcare, finance, astronomy, sports injury prevention.

7
Q

What are challenges in Big Data acquisition?

A

Selecting valuable data, filtering it, and collecting metadata.

8
Q

What are challenges in Big Data processing?

A

Parallelization, fault tolerance, scalability.

9
Q

What frameworks address Big Data processing challenges?

A

Hadoop and Spark.

10
Q

What are the three main scenarios for data processing solutions?

A

Analytics (batch), Interactive (near real-time), Streaming (near real-time).
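To contrast the batch (analytics) and streaming scenarios, here is a minimal Python sketch that computes the same per-event count two ways: once over a complete, stored dataset, and incrementally as each event arrives. The event names and the functions `batch_count` and `streaming_count` are illustrative assumptions, not part of any framework.

```python
from collections import Counter
from typing import Iterable, Iterator

def batch_count(events: list[str]) -> Counter:
    """Batch: the whole dataset is already stored; process it in one pass."""
    return Counter(events)

def streaming_count(events: Iterable[str]) -> Iterator[dict[str, int]]:
    """Streaming: update state per event and emit the current result immediately."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        yield dict(counts)  # snapshot of the running result after this event

events = ["click", "view", "click", "buy"]
print(batch_count(events))      # final answer, available only once all data is in
for snapshot in streaming_count(events):
    print(snapshot)             # partial answers, available as each event arrives
```

Both paths end with the same totals; the difference is when a result becomes available.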

11
Q

What is a Data Lake?

A

A centralized repository for raw data in various formats, processed as needed.
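"Processed as needed" is often called schema-on-read: raw files are stored unchanged, and structure is imposed only when a particular job reads them. The Python sketch below illustrates this with an in-memory dictionary standing in for the lake; the paths, formats, and the `read_temperatures` job are hypothetical.

```python
import csv
import io
import json

# Hypothetical lake: raw files kept in their original formats, no upfront schema.
lake = {
    "sensors/2024-01-01.jsonl": '{"sensor": "s1", "temp_c": 21.5}\n{"sensor": "s2", "temp_c": 19.0}\n',
    "sensors/2024-01-02.csv": "sensor,temp_c\ns1,22.0\ns3,18.5\n",
    "images/cam1.jpg": b"\xff\xd8\xff\xe0",  # unstructured data stored alongside
}

def read_temperatures(lake: dict) -> list[tuple[str, float]]:
    """Schema-on-read: parse only the files this job needs, at read time."""
    rows = []
    for path, raw in lake.items():
        if path.endswith(".jsonl"):
            for line in raw.splitlines():
                record = json.loads(line)
                rows.append((record["sensor"], float(record["temp_c"])))
        elif path.endswith(".csv"):
            for record in csv.DictReader(io.StringIO(raw)):
                rows.append((record["sensor"], float(record["temp_c"])))
        # Other formats (e.g. the image) are simply ignored by this job.
    return rows

print(read_temperatures(lake))
```

A different job could read the same raw files with a different schema, without any change to how the data was stored.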

12
Q

What are NoSQL/NewSQL DBMSs designed for?

A

Scalability and distributed environments.

13
Q

What are the types of analytics in Big Data?

A

Descriptive: Insights into past events; Diagnostic: Explains why events occurred; Predictive: Anticipates future trends; Prescriptive: Recommends actions to leverage or mitigate trends.
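To make the descriptive/predictive distinction concrete, the sketch below summarizes a short, made-up series of monthly sales (descriptive) and then extrapolates it with a least-squares straight line (predictive). The data and the linear model are illustrative assumptions; diagnostic and prescriptive analytics build on such outputs and are not shown.

```python
from statistics import mean

sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical monthly sales, oldest first

def describe(values: list[float]) -> dict[str, float]:
    """Descriptive: summarize what already happened."""
    return {"total": sum(values), "mean": mean(values), "max": max(values)}

def predict_next(values: list[float]) -> float:
    """Predictive: fit y = a + b*x by least squares and extrapolate one step ahead."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a + b * len(values)

print(describe(sales))      # {'total': 460.0, 'mean': 115.0, 'max': 130.0}
print(predict_next(sales))  # 140.0
```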

14
Q

What roles exist in Big Data careers?

A

Data analysts, architects, engineers, scientists.

15
Q

What skills are required for Big Data careers?

A

Programming, data management, statistical analysis, domain expertise.

16
Q

What are the two types of scaling in Big Data infrastructure?

A

Scale-Up (Vertical) and Scale-Out (Horizontal).

17
Q

What is SMP architecture, and what are its limitations?

A

Symmetric Multiprocessing: several processors share the same memory and I/O. Limitations: bottlenecks from contention for these shared resources, and limited scalability.

18
Q

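What is MPP architecture, and what are its challenges?

A

Massively Parallel Processing: many processors, each with its own memory, connected by a fast dedicated interconnect. Challenges: vendor lock-in and limited scalability.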

19
Q

What is cluster architecture, and what are its advantages and trade-offs?

A

Advantages: virtually unlimited scalability and no vendor lock-in. Trade-off: a slower interconnect than MPP.

20
Q

What are the pros and cons of commodity hardware in clusters?

A

Pros: cost-effective and scalable. Cons: individual machines fail often, so the software must detect and handle failures.
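Because commodity nodes fail routinely, cluster frameworks treat failure as normal and re-run lost work on another machine. The Python sketch below simulates that idea; the node names, the random failure model, and `run_with_retries` are hypothetical, and real schedulers (e.g. in Hadoop or Spark) are far more involved.

```python
import random

class NodeFailure(Exception):
    """Raised when a simulated commodity node crashes mid-task."""

def run_on_node(node: str, task, rng: random.Random, failure_rate: float):
    # Simulated hardware fault: the node dies with probability failure_rate.
    if rng.random() < failure_rate:
        raise NodeFailure(f"{node} failed")
    return task()

def run_with_retries(task, nodes: list[str], failure_rate: float = 0.3, seed: int = 0):
    """Try the task on each node in turn; a failure moves the task to the next node."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    for node in nodes:
        try:
            return node, run_on_node(node, task, rng, failure_rate)
        except NodeFailure:
            continue  # re-execute the same task on another node
    raise RuntimeError("task failed on every node")

node, result = run_with_retries(lambda: sum(range(10)), ["node-1", "node-2", "node-3"])
print(node, result)
```

Re-execution works because the task is deterministic and has no side effects, so running it twice gives the same result.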

21
Q

What is Lambda Architecture?

A

Combines a Hot Path (real-time processing of recent data) and a Cold Path (batch processing of all data, delayed but accurate); query results merge the outputs of both paths.
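A minimal Python sketch of the Lambda idea, assuming page-view counting as the workload: the cold path recomputes an accurate view from the full event history, the hot path keeps a fast incremental view of events that arrived since the last batch run, and a query merges the two. `LambdaCounter` and its methods are hypothetical names.

```python
from collections import Counter

class LambdaCounter:
    def __init__(self) -> None:
        self.master_log: list[str] = []          # append-only history of all events
        self.batch_view: Counter = Counter()     # cold path output: accurate but stale
        self.realtime_view: Counter = Counter()  # hot path output: fresh, recent events only

    def ingest(self, page: str) -> None:
        self.master_log.append(page)   # stored for the next cold-path run
        self.realtime_view[page] += 1  # hot path: updated immediately

    def run_batch(self) -> None:
        """Cold path: recompute from the full history, then drop the now-covered hot state."""
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, page: str) -> int:
        """Merge the batch view and the real-time view."""
        return self.batch_view[page] + self.realtime_view[page]

lc = LambdaCounter()
for p in ["home", "about", "home"]:
    lc.ingest(p)
lc.run_batch()
lc.ingest("home")        # arrives after the batch run
print(lc.query("home"))  # 3: 2 from the batch view + 1 from the real-time view
```

Clearing the real-time view right after the batch run is a simplification; a real system has to track exactly which events each batch run covered.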

22
Q

What is Kappa Architecture?

A

A single stream-processing path with no separate batch layer: every event is processed as it arrives, and reprocessing is done by replaying the stored event log.
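In contrast to Lambda's two paths, here is a minimal Python sketch of the Kappa idea, again assuming page-view counting as the workload: one stream processor consumes an append-only log, and changing the logic means replaying the same log through the new processor. The names are hypothetical, and a plain list stands in for a durable log such as a Kafka topic.

```python
from collections import Counter
from typing import Callable

EventLog = list[str]  # stands in for a durable, replayable event log

def stream_process(log: EventLog, key_fn: Callable[[str], str]) -> Counter:
    """Single path: consume events in order, updating state one event at a time."""
    state: Counter = Counter()
    for event in log:
        state[key_fn(event)] += 1
    return state

log: EventLog = ["/home", "/about", "/home", "/blog/post-1"]

v1 = stream_process(log, key_fn=lambda url: url)                # count per exact URL
v2 = stream_process(log, key_fn=lambda url: url.split("/")[1])  # new logic: replay the same log
print(v1["/home"], v2["blog"])  # 2 1
```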

23
Q

Who introduced MapReduce and what is it used for?

A

Introduced by Jeffrey Dean and Sanjay Ghemawat at Google (2004); used to process large datasets in parallel on clusters by writing user-defined Map and Reduce functions.

24
Q

What does the Map function do in MapReduce?

A

Takes an input key-value pair and produces a set of intermediate key-value pairs.

25
Q

What does the Reduce function do in MapReduce?

A

Receives an intermediate key together with all values emitted for it, and aggregates them into the output for that key.

26
Q

What are the execution steps of a Hadoop MapReduce job?

A

Input Splitting -> Mapping -> Shuffling & Sorting -> Reducing.
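The four steps can be simulated in a single Python process to show how data flows between them. This is a sketch, not Hadoop's API: `run_mapreduce`, the split size, and the max-temperature example job are assumptions chosen for illustration, and on a real cluster the map and reduce calls would run on different nodes.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable

def run_mapreduce(records: list, mapper: Callable, reducer: Callable, split_size: int = 2) -> dict:
    # 1. Input splitting: divide the input into independent chunks.
    splits = [records[i:i + split_size] for i in range(0, len(records), split_size)]
    # 2. Mapping: apply the mapper to every record of every split.
    intermediate = [pair for split in splits for record in split for pair in mapper(record)]
    # 3. Shuffling & sorting: bring all pairs with the same key together.
    intermediate.sort(key=itemgetter(0))
    # 4. Reducing: one reducer call per distinct key, with all of that key's values.
    return {key: reducer(key, [v for _, v in group])
            for key, group in groupby(intermediate, key=itemgetter(0))}

# Hypothetical job: maximum temperature per city from "city,temp" lines.
def max_temp_mapper(line: str) -> Iterable[tuple[str, int]]:
    city, temp = line.split(",")
    yield city, int(temp)

def max_temp_reducer(city: str, temps: list[int]) -> int:
    return max(temps)

lines = ["Paris,21", "Oslo,12", "Paris,25", "Oslo,9", "Rome,30"]
print(run_mapreduce(lines, max_temp_mapper, max_temp_reducer))
# {'Oslo': 12, 'Paris': 25, 'Rome': 30}
```

Sorting before `groupby` matters: `groupby` only merges adjacent items, which is why the shuffle step sorts by key.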

27
Q

How does MapReduce solve word count, the classic example task?

A

Map emits a (word, 1) pair for each occurrence of a word; Reduce sums the counts for each word.
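Word count written as the two user-supplied functions, with the shuffle step done by grouping values in a dictionary. This is a single-process Python sketch; the tokenization rule (lowercase, split on whitespace) and the names `wc_map` and `wc_reduce` are assumptions.

```python
from collections import defaultdict
from typing import Iterator

def wc_map(doc_id: int, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word occurrence; the key (doc_id) is unused."""
    for word in text.lower().split():
        yield word, 1

def wc_reduce(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: sum all the 1s emitted for the same word."""
    return word, sum(counts)

documents = {1: "the cat sat", 2: "the cat ran"}

# Shuffle: group intermediate values by key (the framework does this between map and reduce).
grouped: dict[str, list[int]] = defaultdict(list)
for doc_id, text in documents.items():
    for word, one in wc_map(doc_id, text):
        grouped[word].append(one)

word_counts = dict(wc_reduce(w, c) for w, c in sorted(grouped.items()))
print(word_counts)  # {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```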