Infrastructure & Architecture Flashcards

1
Q

What is scaling in the context of big data infrastructure?

A

Scaling refers to increasing the capacity of a system to handle more data or higher loads. It can be done through vertical scaling (adding resources to a single machine) or horizontal scaling (adding more machines).

2
Q

What is vertical scaling (scale-up)?

A

Vertical scaling involves adding more resources like processors, RAM, and disks to a single machine or upgrading to a more powerful server.

3
Q

What is horizontal scaling (scale-out)?

A

Horizontal scaling involves adding more machines to a system to increase capacity. This approach supports distributed computing and, in principle, near-unlimited scalability, but performance depends on network speed.
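
To make the idea concrete, here is a minimal sketch of one common way to spread data over added machines: hash partitioning. The function names and the `user-N` keys are illustrative assumptions, not part of any particular system.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(keys, num_nodes):
    """Group keys by the node that would store them."""
    shards = {n: [] for n in range(num_nodes)}
    for key in keys:
        shards[node_for_key(key, num_nodes)].append(key)
    return shards


# Hypothetical workload: 1000 user keys spread over 4 nodes.
shards = partition([f"user-{i}" for i in range(1000)], 4)
```

Adding a node means calling `partition` with a larger `num_nodes`. With plain modulo hashing most keys then move to a different node, which is one reason real systems often prefer consistent hashing.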

4
Q

What is Symmetric MultiProcessing (SMP) architecture?

A

In SMP architecture, multiple processors share the same RAM, I/O bus, and disks. It is common in traditional workstations, but contention for those shared resources imposes physical and speed limits on how far it can scale.

5
Q

What is Massively Parallel Processing (MPP) architecture?

A

MPP architecture is a shared-nothing system in which each module has its own RAM and disks. It suits tasks that can be split into independent processes and is common in data warehousing.
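
The shared-nothing pattern can be sketched as follows: each worker aggregates only its own partition, and a coordinator combines the small partial results. This is a single-machine simulation using threads (which do share memory), so it only illustrates the data flow; `local_sum_count` and `distributed_mean` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def local_sum_count(partition):
    """Work done by one module, touching only its own data."""
    return sum(partition), len(partition)


def distributed_mean(partitions):
    """Combine per-partition partial aggregates into a global mean."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


mean = distributed_mean([[1, 2, 3], [4, 5], [6]])
```

Only the `(sum, count)` pairs cross between workers and the coordinator, which is what keeps message passing cheap when the task splits cleanly.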

6
Q

What is a cluster architecture in big data?

A

A cluster is a group of connected computers (nodes) that work together as a single system. The nodes are connected via a fast LAN, and the design offers scalability while avoiding vendor lock-in.

7
Q

What are the pros of using MPP architecture?

A

MPP architecture offers high-speed message passing, specialized hardware and software, and better reliability. It is ideal for single, vertical solutions such as data warehousing.

8
Q

What are the pros of using cluster architecture?

A

Cluster architecture can, in principle, scale out almost without limit. It is also cheaper to set up, uses commodity hardware, avoids vendor lock-in, and is ideal for varied applications.

9
Q

What is grid computing?

A

Grid computing involves using distributed computer resources from multiple locations for a common goal. It differs from clusters in that nodes perform different tasks and are geographically dispersed.

10
Q

What is a data lake?

A

A data lake is a central repository for storing raw data in its original format, which is processed only when needed. It supports various data formats, including structured, semi-structured, and unstructured.
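
"Processed only when needed" is often called schema-on-read. A minimal sketch, assuming a hypothetical in-memory `RAW_OBJECTS` store standing in for the lake: payloads are kept exactly as they arrived and are parsed only when a query asks for them.

```python
import csv
import io
import json

# Hypothetical raw objects, stored in their original formats.
RAW_OBJECTS = {
    "events/a.json": '{"user": "ann", "amount": 12.5}',
    "events/b.csv": "user,amount\nbob,7.0\n",
}


def read_amounts(raw_objects):
    """Apply a (user, amount) schema at read time, per source format."""
    rows = []
    for name, payload in raw_objects.items():
        if name.endswith(".json"):
            record = json.loads(payload)
            rows.append((record["user"], float(record["amount"])))
        elif name.endswith(".csv"):
            for record in csv.DictReader(io.StringIO(payload)):
                rows.append((record["user"], float(record["amount"])))
    return rows


rows = read_amounts(RAW_OBJECTS)
```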

11
Q

What is NIST’s reference architecture for big data?

A

NIST’s reference architecture outlines the ecosystem of tools and hardware needed for big data operations, defining roles such as Big Data Framework Provider, Application Provider, and System Orchestrator.

12
Q

What is the role of a Big Data Framework Provider?

A

A Big Data Framework Provider supplies the general resources or services for creating big data applications, including infrastructure, data management, and processing frameworks.

13
Q

What is the role of a System Orchestrator in big data architecture?

A

A System Orchestrator integrates application activities into a system, configures resources, manages workloads, and ensures quality requirements are met.

14
Q

What is the Lambda architecture?

A

The Lambda architecture processes incoming data through two paths: a hot path for real-time processing and a cold path for more accurate but delayed batch processing.
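
A toy sketch of the two paths, assuming page-view events and hypothetical function names: the cold path recomputes a view from the full dataset, the hot path counts only events that arrived since the last batch run, and a query merges both.

```python
from collections import Counter


def batch_view(master_dataset):
    """Cold path: accurate counts recomputed from all stored events."""
    return Counter(event["page"] for event in master_dataset)


def speed_view(recent_events):
    """Hot path: counts for events not yet covered by a batch run."""
    return Counter(event["page"] for event in recent_events)


def query(page, batch, speed):
    """Serving step: merge the batch and real-time views."""
    return batch[page] + speed[page]


stored = [{"page": "home"}, {"page": "home"}, {"page": "docs"}]
recent = [{"page": "home"}]
home_views = query("home", batch_view(stored), speed_view(recent))
```

Both paths here happen to use the same counting logic. In practice they are often separate codebases, which is the maintenance cost usually cited against this design.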

15
Q

What is the Kappa architecture?

A

The Kappa architecture processes all data using a single stream processing system, simplifying architecture by eliminating the need for separate batch processing.
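
A toy sketch under the same assumptions (page-view events, hypothetical names): there is one append-only log and one processing function. Historical results are rebuilt by replaying the log from the start through the same code that handles new events.

```python
def process(log, start=0, state=None):
    """Single stream path: fold events from `start` onward into counts."""
    state = dict(state or {})
    for event in log[start:]:
        state[event["page"]] = state.get(event["page"], 0) + 1
    return state


log = [{"page": "home"}, {"page": "docs"}, {"page": "home"}]

# Incremental processing: first two events, then the newly arrived one.
live = process(log[:2])
live = process(log, start=2, state=live)

# Reprocessing: replay the whole log through the same function.
replayed = process(log)
```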

16
Q

How do Lambda and Kappa architectures differ?

A

Lambda architecture uses two separate paths (batch and streaming), while Kappa architecture relies on a single stream-processing path that handles both real-time and historical data.