Infrastructure & Architecture Flashcards

1
Q

scale-up (vertical scaling)

A

upgrading an existing machine with more RAM, a faster processor, or more storage

2
Q

scale-out (horizontal scaling)

A

adding more machines to the network; scaling is unlimited in theory, though coordination and network overhead limit it in practice

3
Q

Symmetric Multiprocessing (SMP)

A

the architecture of traditional notebooks/desktops: multiple processors share the same memory and storage, so the shared bus becomes a bottleneck

4
Q

Massively Parallel Processing (MPP)

A

each processor has its own dedicated memory and storage (shared-nothing); the drawback is vendor lock-in

5
Q

Cluster architecture

A

many computers connected to work as a single system (like MPP, but each node typically has multiple processors)

6
Q

how are nodes connected in a cluster

A

usually by gigabit Ethernet, with 8-64 nodes per rack; racks are connected by another level of network or a switch

7
Q

Only pro of MPP over cluster

A

faster message passing between nodes; ideal for single, vertical solutions such as data warehousing

8
Q

commodity hardware

A

standardized, market-priced hardware that is cheaper than proprietary solutions

9
Q

distributed computing

A

tasks split into smaller units processed simultaneously across machines

10
Q

challenges of distributed computing

A

splitting or assigning tasks/parallelization, resource allocation, fault-tolerance, aggregating results
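
The four challenges can be seen in miniature in the sketch below. It is only an illustration under loud assumptions: threads on one machine stand in for cluster nodes, and the names `split`, `run_with_retry`, and `distributed_sum` are hypothetical, not part of any framework.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_chunks):
    """Split a list into n_chunks nearly equal contiguous slices."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def run_with_retry(task, chunk, attempts=3):
    """Re-run a failed task a few times: a very crude form of fault tolerance."""
    for attempt in range(attempts):
        try:
            return task(chunk)
        except Exception:
            if attempt == attempts - 1:
                raise


def distributed_sum(data, n_workers=4):
    # 1. Split the job into smaller units.
    chunks = split(data, n_workers)
    # 2. Allocate resources: here a pool of threads stands in for machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # 3. Run the units in parallel, retrying any that fail.
        partials = list(pool.map(lambda c: run_with_retry(sum, c), chunks))
    # 4. Aggregate the partial results.
    return sum(partials)


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))
```

A real framework also has to handle data locality, stragglers, and node loss, which this sketch ignores.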

11
Q

solutions to challenges of distributed computing

A

use a big data framework to hide the complexity of distributed computing from developers

12
Q

Grid computing

A

collection of computer resources from multiple locations; each node performs a different task, and the grid is commonly used for a variety of purposes

13
Q

what is HPC

A

high-performance computing; the use of GPUs is increasing

14
Q

NIST reference architecture definition

A

National Institute of Standards and Technology (NIST) recommendation for big data system architecture

15
Q

NIST Reference Architecture components

A

1. Big Data Framework Provider (processing, storage, network)
2. Data Provider
3. Big Data Application Provider (e.g. BI analytics, ML)
4. Data Consumer
5. System Orchestrator (integrates components, meets goals, allocates resources)
16
Q

Big Data Application Provider definition

A

the component for which programs must be written to process the data, covering everything from collection to visualization and user access

17
Q

4 jobs of data provider component

A

besides just collecting data:
scrubbing sensitive information
creating metadata describing the data source, access rights, usage policies
enforcing access/authorizations
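
A minimal sketch of the three jobs beyond collection, assuming records are plain dictionaries and that e-mail addresses are the only sensitive data. The names `scrub`, `describe`, and `provide` are hypothetical illustrations, not NIST-defined interfaces.

```python
import re

# Assumption: e-mail addresses are the only sensitive information to mask.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def scrub(record):
    """Mask e-mail addresses in every string field of a record."""
    return {
        key: EMAIL.sub("[REDACTED]", value) if isinstance(value, str) else value
        for key, value in record.items()
    }


def describe(source, allowed_roles, usage_policy):
    """Create metadata about the data source, access rights, and usage policy."""
    return {
        "source": source,
        "allowed_roles": set(allowed_roles),
        "usage_policy": usage_policy,
    }


def provide(records, metadata, role):
    """Enforce access rights, then hand out scrubbed records."""
    if role not in metadata["allowed_roles"]:
        raise PermissionError(f"role {role!r} may not read {metadata['source']}")
    return [scrub(r) for r in records]


if __name__ == "__main__":
    meta = describe("crm", ["analyst"], "internal use only")
    rows = [{"id": 1, "note": "contact bob@example.com"}]
    print(provide(rows, meta, "analyst"))
```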

18
Q

4 extra components of big data architecture that the professor describes, not explicitly in NIST

A

Analytical data store
Analysis and reporting
Real-time message ingestion (and buffer)
Stream processing

19
Q

batch vs stream

A

analytical algorithms:
launched over large amounts of stored data
vs
running continuously over potentially infinite data as soon as it is collected
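
The contrast can be sketched with a mean computed both ways. The function names `batch_mean` and `stream_mean` are hypothetical, chosen only for this illustration.

```python
def batch_mean(stored):
    """Batch: launched once over a complete, stored data set."""
    return sum(stored) / len(stored)


def stream_mean(events):
    """Stream: update the result as each event arrives; the input may never end."""
    count, total = 0, 0.0
    for value in events:
        count += 1
        total += value
        yield total / count  # an up-to-date answer after every event


if __name__ == "__main__":
    readings = [2, 4, 6]
    print(batch_mean(readings))               # one answer, after all data is stored
    print(list(stream_mean(iter(readings))))  # one answer per incoming event
```

The batch version needs all the data before it can start; the stream version keeps only a small running state and never has to see the end of its input.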

20
Q

Lambda Architecture

A

Two pipelines, cold path for batch, hot path for real-time/stream
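
A toy sketch of the two paths, using event counting as the workload. It assumes that queries merge a batch view with a real-time view, as Lambda is usually described; the class `LambdaCounter` and its methods are hypothetical, and the batch job here runs instantly, which a real one would not.

```python
from collections import Counter


class LambdaCounter:
    def __init__(self):
        self.master = []             # append-only store that feeds the cold path
        self.batch_view = Counter()  # result of the last batch run
        self.speed_view = Counter()  # events seen since the last batch run

    def ingest(self, key):
        self.master.append(key)      # cold path: just store the event
        self.speed_view[key] += 1    # hot path: update the real-time view at once

    def run_batch(self):
        """Cold path: recompute the batch view from all stored data."""
        self.batch_view = Counter(self.master)
        self.speed_view.clear()      # the batch view now covers these events

    def query(self, key):
        """Merge both views to answer with complete, current data."""
        return self.batch_view[key] + self.speed_view[key]


if __name__ == "__main__":
    counter = LambdaCounter()
    for event in ["a", "a", "b"]:
        counter.ingest(event)
    counter.run_batch()
    counter.ingest("a")
    print(counter.query("a"))
```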

21
Q

Kappa Architecture

A

a single hot path: all data is processed as a real-time stream
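
A toy sketch of the single path. It goes slightly beyond the card by assuming the common description of Kappa, in which reprocessing is done by replaying an append-only event log through the same streaming code. The class `KappaCounter` and the helpers `count_one` and `count_ten` are hypothetical.

```python
def count_one(view, key):
    """Default stream logic: count each event once."""
    view[key] = view.get(key, 0) + 1


class KappaCounter:
    def __init__(self, apply_fn=count_one):
        self.log = []             # append-only event log (stand-in for a message broker)
        self.view = {}            # the only view, maintained by the stream processor
        self.apply_fn = apply_fn

    def ingest(self, key):
        self.log.append(key)
        self.apply_fn(self.view, key)  # the one and only processing path

    def reprocess(self, new_apply_fn):
        """Change the logic by replaying the whole log; there is no batch layer."""
        new_view = {}
        for key in self.log:
            new_apply_fn(new_view, key)
        self.view, self.apply_fn = new_view, new_apply_fn


if __name__ == "__main__":
    def count_ten(view, key):
        view[key] = view.get(key, 0) + 10

    counter = KappaCounter()
    for event in ["a", "b", "a"]:
        counter.ingest(event)
    counter.reprocess(count_ten)
    print(counter.view)
```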
