Infrastructure & Architecture Flashcards

1
Q

scale-up (vertical scaling)

A

Upgrade the existing machine with more powerful hardware (SMP, MPP)

2
Q

scale-out (horizontal scaling)

A

Add more machines to the network; capacity can, in principle, keep growing without a fixed upper limit

3
Q

Symmetric Multi-processing (SMP)

A

Traditional PC design: multiple processors share the same RAM and storage

4
Q

Massively Parallel Processing (MPP)

A

Each processor has its own dedicated RAM and storage (shared-nothing). Downside: risk of vendor lock-in

5
Q

Cluster Architecture

A

Many computers (nodes) connected so that they work as a single system

6
Q

How are nodes connected in a cluster?

A

Usually by Gigabit Ethernet, with 8-64 nodes per rack

7
Q

Only advantage of MPP over a cluster

A

Faster message passing between nodes. Ideal for single-purpose, vertical solutions such as data warehousing

8
Q

commodity hardware

A

Standardized, off-the-shelf hardware sold at market prices

9
Q

distributed computing

A

Tasks are split into smaller units that are processed simultaneously across multiple machines

10
Q

challenges of distributed computing (4) (ARFA)

A

Assigning split tasks, Resource allocation, Fault tolerance, Aggregating results
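
The four challenges can be made concrete with a toy single-process simulation. This is only a sketch: the worker names, the `flaky_worker` function, and the failure rate are hypothetical, and a real cluster would run the chunks on separate machines.

```python
import random

def split(data, n_chunks):
    """Assigning: cut the input into roughly equal chunks."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def flaky_worker(chunk, rng, fail_rate=0.3):
    """Hypothetical worker that sometimes crashes; otherwise sums its chunk."""
    if rng.random() < fail_rate:
        raise RuntimeError("worker died")
    return sum(chunk)

def run_job(data, workers=("w0", "w1", "w2"), max_retries=10, seed=0):
    rng = random.Random(seed)
    partials, assignments = [], {}
    for i, chunk in enumerate(split(data, len(workers))):
        for attempt in range(max_retries):
            # Resource allocation: round-robin, moving on after a failure.
            worker = workers[(i + attempt) % len(workers)]
            try:
                partials.append(flaky_worker(chunk, rng))
                assignments[i] = worker
                break
            except RuntimeError:
                continue  # Fault tolerance: retry the chunk elsewhere.
        else:
            raise RuntimeError(f"chunk {i} failed {max_retries} times")
    # Aggregating: combine the partial results into one answer.
    return sum(partials), assignments
```

Calling `run_job(list(range(100)))` returns the total together with a record of which worker finished each chunk.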

11
Q

solution to challenges of distributed computing

A

Use a framework to hide complexity of distributed computing from developers
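
As a rough illustration of what "hiding complexity" means, the sketch below uses Python's standard `concurrent.futures` executor as a single-machine stand-in for a cluster framework. The function names `word_count` and `count_words_parallel` are made up for this example; the developer writes only the per-chunk logic, and the executor handles scheduling and result collection.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(lines):
    """Per-chunk logic: the only part the developer has to write."""
    return sum(len(line.split()) for line in lines)

def count_words_parallel(lines, n_chunks=4):
    # Split the input into chunks of roughly equal size.
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # The executor assigns chunks to workers and gathers results in order.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(word_count, chunks))
    return sum(partials)
```

A thread pool on one machine does not provide fault tolerance across nodes; it only shows the programming model in which distribution details stay out of the application code.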

12
Q

Grid computing

A
  • Collect computer resources from multiple locations.
  • Each node performs a different task
  • Multi-purpose
13
Q

What is HPC?

A

High-performance computing (often GPU-intensive)

14
Q

What is NIST reference architecture for?

A

How a Big Data system should be designed

15
Q

5 Components of NIST Reference Architecture

A
  1. Big Data Framework Provider
    - processing
    - storage
    - networking
  2. Data Provider
  3. Application Provider
  4. Data Consumer
  5. System Orchestrator
    - integrate components
    - meet goals
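
One way to picture how the five roles fit together is the toy model below. It is a sketch under assumptions, not the NIST specification: the class and method names are invented for illustration, and the "framework" is just an in-memory list.

```python
from dataclasses import dataclass, field

@dataclass
class FrameworkProvider:
    """Processing, storage, and networking resources (here: a list)."""
    storage: list = field(default_factory=list)

    def store(self, records):
        self.storage.extend(records)

    def process(self, fn):
        return [fn(r) for r in self.storage]

class DataProvider:
    def provide(self):
        return [3, 1, 2]  # hypothetical raw records

class ApplicationProvider:
    """Runs the application logic on top of the framework."""
    def __init__(self, framework):
        self.framework = framework

    def run(self, records):
        self.framework.store(records)
        return sorted(self.framework.process(lambda r: r * 10))

class DataConsumer:
    def consume(self, results):
        return f"report: {results}"

class SystemOrchestrator:
    """Integrates the components so the system meets its goal."""
    def run(self):
        framework = FrameworkProvider()
        app = ApplicationProvider(framework)
        results = app.run(DataProvider().provide())
        return DataConsumer().consume(results)
```

Data flows from the Data Provider through the Application Provider to the Data Consumer, while the Framework Provider supplies the resources underneath and the System Orchestrator wires everything together.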
16
Q

What happens in application component?

A

Programs are written to process data from collection to visualization and user access

17
Q

What happens in the Data Provider component besides just data collection? (4) (SMA)

A
  1. Scrub sensitive info
  2. Create Metadata
    - data provenance
    - access rights
    - usage policies
  3. Enforce access rights and authorizations
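
The three activities listed above can be sketched in a few lines. Everything here is an illustrative assumption: the email-only scrubbing rule, the metadata field names, and the role-based check are stand-ins for whatever a real Data Provider would enforce.

```python
import re

# Assumption: the only "sensitive info" in this sketch is email addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def scrub(text):
    """Scrub sensitive info before the data leaves the provider."""
    return EMAIL.sub("[REDACTED]", text)

def make_metadata(source, allowed_roles, usage_policy):
    """Create metadata: provenance, access rights, usage policy."""
    return {
        "provenance": source,
        "access_rights": set(allowed_roles),
        "usage_policy": usage_policy,
    }

def fetch(record, metadata, role):
    """Enforce access: only roles named in the metadata may read."""
    if role not in metadata["access_rights"]:
        raise PermissionError(f"role {role!r} may not read this record")
    return record

meta = make_metadata("crm-export", ["analyst"], "internal use only")
clean = scrub("contact bob@example.com now")
```

Here `fetch(clean, meta, "analyst")` returns the scrubbed record, while any role not in `access_rights` raises `PermissionError`.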
18
Q

4 components the professor adds to the NIST architecture

A
  1. Analytical data store
  2. Analysis and reporting
  3. Real-time message ingestion (and buffering)
  4. Stream processing
19
Q

Batch vs Stream

A

Running analytical algorithms on:

  • Batch: large amounts of stored data
  • Stream: a continuously arriving, potentially infinite amount of data, processed as soon as it is collected
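
The difference shows up clearly when computing the same statistic both ways. In this minimal sketch (function names are illustrative), the batch version needs the whole dataset up front, while the stream version emits an updated answer after every record and never needs the data to end.

```python
from itertools import count, islice

def batch_mean(stored):
    """Batch: the full dataset is already stored; compute once."""
    return sum(stored) / len(stored)

def stream_means(source):
    """Stream: update the result as each record arrives."""
    n, total = 0, 0.0
    for x in source:
        n += 1
        total += x
        yield total / n

# The stream version works on an endless source; take the first 3 results.
first_three = list(islice(stream_means(count(1)), 3))
```

`batch_mean` would never return on an infinite source, which is why potentially unbounded data calls for the incremental form.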
20
Q

Lambda Architecture

A

Two pipelines:
1. Cold path (batch)
2. Hot path (real-time)
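
A toy event counter shows how the two paths cooperate. This is a sketch, assuming an in-memory log and a query that merges both views; the class and attribute names are hypothetical.

```python
from collections import Counter

class LambdaCounts:
    def __init__(self):
        self.master = []             # append-only log of every event
        self.batch_view = Counter()  # cold path: recomputed periodically
        self.speed_view = Counter()  # hot path: events since last batch

    def ingest(self, key):
        self.master.append(key)
        self.speed_view[key] += 1    # hot path updates immediately

    def run_batch(self):
        # Cold path: recompute from the full log, then reset the hot view
        # because the batch view now covers those events.
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, key):
        # Serving: merge the batch result with the real-time increment.
        return self.batch_view[key] + self.speed_view[key]
```

Queries stay up to date between batch runs because the hot path fills the gap left by the slower cold path.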

21
Q

Kappa Architecture

A

One pipeline only: all data goes through the hot path (real-time)
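
With no separate batch pipeline, reprocessing is done by replaying the event log through the same streaming code. The sketch below assumes an in-memory log and an invented `weight` parameter that stands in for a changed processing rule.

```python
class KappaCounts:
    def __init__(self):
        self.log = []    # retained event log, the single source of truth
        self.view = {}   # result maintained by the stream processor

    @staticmethod
    def _apply(view, key, weight=1):
        """The one piece of processing logic, shared by live and replay."""
        view[key] = view.get(key, 0) + weight

    def ingest(self, key):
        self.log.append(key)
        self._apply(self.view, key)

    def reprocess(self, weight):
        # Logic changed: replay the whole log through the stream code
        # into a fresh view, then switch over to it.
        new_view = {}
        for key in self.log:
            self._apply(new_view, key, weight)
        self.view = new_view
```

The same `_apply` function serves both live ingestion and replay, so there is only one codebase to maintain.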
