Introduction Flashcards

Question 2

Q

What is Big Data?

Answer

A

Big Data refers to datasets so large and complex that they exceed the processing capabilities of traditional tools.

Question 3

Q

What are the 4 V’s of Big Data?

Answer

A

Volume: Large quantity of data; Velocity: Speed of data production, consumption, and analysis; Variety: Structured, unstructured, and multimedia data; Veracity: Trustworthiness and quality of data.

Question 4

Q

How can Big Data be referred to as a noun and adjective?

Answer

A

As a noun: the data itself, with only a vague boundary between "normal" data and Big Data; As an adjective: a specific meaning, describing things built to handle such data (e.g., Big Data tools, Big Data architecture).

Question 5

Q

Why is there hype around Big Data?

Answer

A

Data growth driven by new data sources; Opportunities to extract insights; Smarter data-driven applications such as Google Translate.

Question 6

Q

What are some success stories of Big Data applications?

Answer

A

Crime prevention, healthcare, finance, astronomy, sports injury prevention.

Question 7

Q

What are challenges in Big Data acquisition?

Answer

A

Selecting valuable data, filtering, and metadata collection.

Question 8

Q

What are challenges in Big Data processing?

Answer

A

Parallelization, fault tolerance, scalability.

Question 9

Q

What frameworks address Big Data processing challenges?

Answer

A

Hadoop and Spark.

Question 10

Q

What are the three main scenarios for data processing solutions?

Answer

A

Analytics (batch), Interactive (near real-time), Streaming (near real-time).

Question 11

Q

What is a Data Lake?

Answer

A

A centralized repository for raw data in various formats, processed as needed.
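
The "processed as needed" idea is often described as schema-on-read: raw data is stored untouched and structure is applied only when someone reads it. A minimal sketch, assuming a hypothetical in-memory `lake` dictionary and a hypothetical `read_records` helper (a real data lake would sit on distributed storage):

```python
import csv
import io
import json

# Hypothetical in-memory "lake": object name -> raw text, stored as-is.
lake = {
    "sales/2024.csv": "item,amount\npen,3\nbook,12\n",
    "clicks/2024.json": '[{"item": "pen", "clicks": 40}, {"item": "book", "clicks": 7}]',
}

def read_records(name):
    """Parse a raw object only when it is read, based on its format."""
    raw = lake[name]
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(raw)))
    if name.endswith(".json"):
        return json.loads(raw)
    raise ValueError(f"no parser for {name}")

# Structure is applied at read time; CSV values arrive as strings.
total_sales = sum(int(row["amount"]) for row in read_records("sales/2024.csv"))
total_clicks = sum(row["clicks"] for row in read_records("clicks/2024.json"))
print(total_sales, total_clicks)
```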

Question 12

Q

What are NoSQL/NewSQL DBMSs designed for?

Answer

A

Scalability and distributed environments.

Question 13

Q

What are the types of analytics in Big Data?

Answer

A

Descriptive: Insights into past events; Diagnostic: Explains why events occurred; Predictive: Anticipates future trends; Prescriptive: Recommends actions to leverage or mitigate trends.
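
The four types can be contrasted on one tiny dataset. The sketch below uses made-up monthly sales figures; the channel names, the straight-line forecast, and the stocking rule are all illustrative assumptions, not a recommended method.

```python
from statistics import mean

# Hypothetical monthly sales per channel (illustrative numbers only).
sales = {"online": [4, 5, 7, 9], "store": [6, 7, 7, 7]}
totals = [sum(month) for month in zip(*sales.values())]

# Descriptive: what happened?
average_sales = mean(totals)

# Diagnostic: why? Find the channel with the largest first-to-last change.
growth = {channel: values[-1] - values[0] for channel, values in sales.items()}
main_driver = max(growth, key=growth.get)

# Predictive: what is likely next? Fit a least-squares line and extrapolate.
xs = range(len(totals))
x_bar, y_bar = mean(xs), mean(totals)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, totals)) / sum(
    (x - x_bar) ** 2 for x in xs
)
forecast = y_bar + slope * (len(totals) - x_bar)

# Prescriptive: what should we do? A hypothetical rule on top of the forecast.
action = "increase stock" if forecast > totals[-1] else "hold stock"
print(average_sales, main_driver, forecast, action)
```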

Question 14

Q

What roles exist in Big Data careers?

Answer

A

Data analysts, architects, engineers, scientists.

Question 15

Q

What skills are required for Big Data careers?

Answer

A

Programming, data management, statistical analysis, domain expertise.

Question 16

Q

What are the two types of scaling in Big Data infrastructure?

Answer

A

Scale-Up (Vertical) and Scale-Out (Horizontal).
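
Scale-up means moving to a bigger machine, while scale-out means adding machines and spreading the data across them. One common way to spread data is hash partitioning; the sketch below assumes hypothetical helpers `node_for` and `partition` and uses CRC32 only because it gives the same result on every run.

```python
import zlib

def node_for(key, num_nodes):
    """Pick a node index by hashing the key (CRC32 is stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def partition(keys, num_nodes):
    """Scale-out: assign every key to exactly one of num_nodes nodes."""
    nodes = [[] for _ in range(num_nodes)]
    for key in keys:
        nodes[node_for(key, num_nodes)].append(key)
    return nodes

keys = [f"user-{i}" for i in range(100)]
three_nodes = partition(keys, 3)
five_nodes = partition(keys, 5)  # adding nodes spreads the same keys wider
print([len(node) for node in three_nodes])
```

Plain modulo hashing moves many keys when the node count changes, so this is only a starting point for thinking about how data is distributed.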

Question 17

Q

What is SMP architecture and its limitations?

Answer

A

Symmetric Multiprocessing (SMP): multiple processors share the same memory and I/O resources. Limitations: the shared resources become bottlenecks, so scalability is limited.

Question 18

Q

What is MPP architecture and its challenges?

Answer

A

Massively Parallel Processing (MPP): many processing nodes, each with its own memory, work on a problem in parallel. Challenges: vendor lock-in and limited scalability.

Question 19

Q

What is cluster architecture and its advantages/trade-offs?

Answer

A

A group of independent machines connected over a network. Advantages: practically unlimited scalability without vendor lock-in; Trade-off: slower interconnect speed compared to MPP.

Question 20

Q

What are the pros and cons of commodity hardware in clusters?

Answer

A

Pros: Cost-effective and scalable; Cons: Requires handling failures.

Question 21

Q

What is Lambda Architecture?

Answer

A

Combines Hot Path (real-time processing) and Cold Path (delayed but accurate processing).
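
A minimal sketch of the two paths, assuming a hypothetical page-view log and a hypothetical `BATCH_CUTOFF` marking how far the last batch job got. The merging `query` function plays the role usually called the serving layer; all names here are illustrative.

```python
from collections import Counter

# Hypothetical event log of page views; each event is (timestamp, page).
events = [(1, "home"), (2, "docs"), (3, "home"), (4, "home"), (5, "docs")]
BATCH_CUTOFF = 3  # assumption: the last batch job covered timestamps <= 3

def batch_view(all_events, cutoff):
    """Cold path: recompute exact counts from all stored events up to cutoff."""
    return Counter(page for ts, page in all_events if ts <= cutoff)

def speed_view(all_events, cutoff):
    """Hot path: incrementally count only events the batch has not seen yet."""
    view = Counter()
    for ts, page in all_events:
        if ts > cutoff:
            view[page] += 1
    return view

def query(page):
    """Merge both views so the answer is complete and up to date."""
    cold = batch_view(events, BATCH_CUTOFF)
    hot = speed_view(events, BATCH_CUTOFF)
    return cold[page] + hot[page]

home_views = query("home")
docs_views = query("docs")
print(home_views, docs_views)
```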

Question 22

Q

What is Kappa Architecture?

Answer

A

Unified stream processing where all events are processed in real-time.
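
In contrast to keeping two paths, a Kappa-style design keeps one processing function and replays the event log through it whenever results need to be rebuilt. A minimal sketch, assuming a hypothetical in-memory `log` and `process` function (a real system would use a durable, replayable log):

```python
from collections import Counter

# Hypothetical append-only event log of page views.
log = ["home", "docs", "home"]

def process(stream, state=None):
    """The single code path: fold each event into the running state."""
    state = Counter() if state is None else state
    for page in stream:
        state[page] += 1
    return state

# Live processing: events are handled as they arrive.
live_state = process(log)
log.append("home")
live_state = process(["home"], live_state)

# Reprocessing: replay the full log through the same function.
replayed_state = process(log)
print(live_state == replayed_state)
```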

Question 23

Q

Who introduced MapReduce and what is it used for?

Answer

A

Introduced by Dean & Ghemawat at Google; used for processing large datasets using Map and Reduce functions.

Question 24

Q

What does the Map function do in MapReduce?

Answer

A

Processes input key-value pairs to generate a set of intermediate key-value pairs.

Question 25

Q

What does the Reduce function do in MapReduce?

Answer

A

Aggregates intermediate values associated with the same key.

Question 26

Q

How does Hadoop MapReduce handle execution steps?

Answer

A

Input Splitting -> Mapping -> Shuffling & Sorting -> Reducing.
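
The four steps can be imitated in a few lines of single-process Python. This is a teaching sketch, not Hadoop's API: `run_mapreduce`, `temperature_mapper`, `max_reducer`, and the temperature readings are hypothetical.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer, num_splits=2):
    # 1. Input splitting: divide the input into independent chunks.
    splits = [records[i::num_splits] for i in range(num_splits)]

    # 2. Mapping: apply the mapper to every record of every split.
    intermediate = [
        pair for split in splits for record in split for pair in mapper(record)
    ]

    # 3. Shuffling and sorting: group values by key, then order the keys.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # 4. Reducing: collapse each key's values into one result.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}

# Hypothetical input: "city,temperature" readings.
readings = ["oslo,3", "cairo,31", "oslo,7", "cairo,28"]

def temperature_mapper(line):
    city, temp = line.split(",")
    yield city, int(temp)

def max_reducer(city, temps):
    return max(temps)

hottest = run_mapreduce(readings, temperature_mapper, max_reducer)
print(hottest)
```

On a real cluster each split would be mapped on a different node and the shuffle would move data over the network; here everything runs in one process to keep the order of the steps visible.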

Question 27

Q

How does MapReduce handle word count as an example task?

Answer

A

Map emits a (word, 1) pair for every word in the input; Reduce sums the counts for each word.
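
A minimal single-process sketch of word count in Python. The function names `map_words` and `reduce_counts` and the sample lines are hypothetical; a framework would run the same two functions across many machines.

```python
from collections import defaultdict

def map_words(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce: sum all the 1s emitted for a single word."""
    return word, sum(counts)

lines = ["big data is big", "data is everywhere"]

# Shuffle: group the emitted 1s by word before reducing.
grouped = defaultdict(list)
for line in lines:
    for word, one in map_words(line):
        grouped[word].append(one)

word_counts = dict(reduce_counts(word, ones) for word, ones in grouped.items())
print(word_counts)
```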