Infrastructure & Architecture Flashcards
What is scaling in the context of big data infrastructure?
Scaling refers to increasing the capacity of a system to handle more data or higher loads. It can be done through vertical scaling (adding resources to a single machine) or horizontal scaling (adding more machines).
What is vertical scaling (scale-up)?
Vertical scaling involves adding more resources like processors, RAM, and disks to a single machine or upgrading to a more powerful server.
What is horizontal scaling (scale-out)?
Horizontal scaling involves adding more machines to a system to increase capacity. This approach supports distributed computing and, in principle, near-unbounded scalability, though performance is constrained by network speed.
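A minimal Python sketch of the idea behind scale-out: work is partitioned across machines by hashing keys, so adding nodes shrinks each node's share of the load. The node names and the `route`/`distribute` helpers are purely illustrative, not part of any specific system.

```python
import hashlib
from collections import defaultdict

def route(key, nodes):
    # Deterministic hash routing: a given key always maps to the same node
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

def distribute(keys, nodes):
    placement = defaultdict(list)
    for key in keys:
        placement[route(key, nodes)].append(key)
    return placement

keys = [f"record-{i}" for i in range(1000)]
two_nodes = distribute(keys, ["node-a", "node-b"])
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])
```

Doubling the node count roughly halves the largest per-node share, which is the essence of scaling out; in a real cluster, rebalancing and network transfer costs temper this.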
What is Symmetric MultiProcessing (SMP) architecture?
SMP is an architecture in which multiple processors share the same RAM, I/O bus, and disks. It is common in traditional workstations but has physical and speed limitations.
What is Massively Parallel Processing (MPP) architecture?
MPP architecture involves a shared-nothing system where each module has its own RAM and disks. It is used for tasks split into independent processes and is common in data warehousing.
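A small Python sketch of the shared-nothing pattern, assuming a simple sum query: each "node" aggregates only its own partition, and a coordinator combines the partial results. The function names are illustrative.

```python
def node_local_sum(partition):
    # Shared-nothing: a node touches only its own RAM and disk (its partition)
    return sum(partition)

def mpp_total(partitions):
    # Coordinator step: combine the independent partial aggregates
    return sum(node_local_sum(p) for p in partitions)

data = list(range(100))
# Round-robin distribution of rows across 4 nodes
partitions = [data[i::4] for i in range(4)]
result = mpp_total(partitions)  # same answer as a single-machine sum
```

Because each partial aggregate is independent, the per-node work runs in parallel with no shared state, which is why this pattern dominates data warehousing.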
What is a cluster architecture in big data?
A cluster is a group of connected computers (nodes) working together to perform as a single system. Nodes are typically connected over a fast LAN; clusters offer scalability and avoid vendor lock-in.
What are the pros of using MPP architecture?
MPP architecture offers high-speed message passing, specialized hardware and software, and high reliability; it is well suited to single-purpose, vertically integrated solutions such as data warehousing.
What are the pros of using cluster architecture?
Cluster architecture scales out almost without limit, is cheaper to set up, uses commodity hardware, avoids vendor lock-in, and suits varied applications.
What is grid computing?
Grid computing involves using distributed computer resources from multiple locations for a common goal. It differs from clusters in that nodes perform different tasks and are geographically dispersed.
What is a data lake?
A data lake is a central repository for storing raw data in its original format, which is processed only when needed. It supports various data formats, including structured, semi-structured, and unstructured.
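A minimal sketch of the data lake's schema-on-read idea, using a local directory to stand in for the lake: data is ingested verbatim with no schema enforced, and structure is applied only when a query reads it. The file layout and `read_clicks` helper are hypothetical.

```python
import json
import pathlib
import tempfile

lake = pathlib.Path(tempfile.mkdtemp())

# Ingest: store records in their original raw form; nothing is validated
raw_lines = ['{"user": "a", "clicks": 3}', '{"user": "b"}', 'not json at all']
(lake / "events.jsonl").write_text("\n".join(raw_lines))

def read_clicks(path):
    # Schema-on-read: interpret (and tolerate) the raw data only at query time
    total = 0
    for line in path.read_text().splitlines():
        try:
            total += json.loads(line).get("clicks", 0)
        except json.JSONDecodeError:
            pass  # malformed raw records surface only when read, not at ingest
    return total

clicks = read_clicks(lake / "events.jsonl")
```

This contrasts with a warehouse, where the malformed third record would have been rejected at load time.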
What is the NIST’s reference architecture for big data?
NIST’s reference architecture outlines the ecosystem of tools and hardware needed for big data operations, defining roles such as Big Data Framework Provider, Big Data Application Provider, and System Orchestrator.
What is the role of a Big Data Framework Provider?
A Big Data Framework Provider supplies the general resources or services for creating big data applications, including infrastructure, data management, and processing frameworks.
What is the role of a System Orchestrator in big data architecture?
A System Orchestrator integrates application activities into a system, configures resources, manages workloads, and ensures quality requirements are met.
What is the Lambda architecture?
The Lambda architecture processes incoming data through two paths: a hot path for real-time processing and a cold path for more accurate but delayed batch processing.
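A toy Python sketch of the two Lambda paths, assuming a simple event-counting workload: the hot path updates a view incrementally as events arrive, while the cold path periodically recomputes an accurate view from the full log. All names here are illustrative.

```python
events = []     # immutable master dataset: input to the cold (batch) path
hot_view = {}   # speed-layer view, updated per event for low latency

def ingest(event):
    events.append(event)                          # cold path: append to the log
    hot_view[event] = hot_view.get(event, 0) + 1  # hot path: incremental update

def batch_view():
    # Cold path: full, accurate recomputation over all stored data
    view = {}
    for e in events:
        view[e] = view.get(e, 0) + 1
    return view

for e in ["page_a", "page_b", "page_a"]:
    ingest(e)
```

In a real deployment the serving layer merges the two views: batch results for older data, speed-layer results for events since the last batch run.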
What is the Kappa architecture?
The Kappa architecture processes all data using a single stream processing system, simplifying architecture by eliminating the need for separate batch processing.
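A toy Python sketch of the Kappa idea: one processing function handles both live events and historical reprocessing, because "batch" is just replaying the same immutable log through the same code. The names are illustrative.

```python
log = ["a", "b", "a"]  # immutable event log (e.g., a Kafka-style topic)

def process(stream):
    # The single processing path: used for live data and for replays alike
    counts = {}
    for event in stream:
        counts[event] = counts.get(event, 0) + 1
    return counts

live_view = process(log)

# Reprocessing after a logic change is simply replaying the log
# through the (updated) processor; no separate batch system exists.
replayed_view = process(iter(log))
```

This is what eliminates Lambda's dual-codebase problem: there is only one implementation to keep correct.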