Infrastructure & Architecture Flashcards
scale-up (vertical scaling)
upgrading an existing machine (RAM, processor, storage)
scale-out (horizontal scaling)
adding more machines to the network; scaling is theoretically unlimited
Symmetric Multi-processing (SMP)
traditional notebooks/desktops; multiple processors share the same memory and storage (bus bottleneck)
Massively Parallel Processing (MPP)
Each processor has its own dedicated memory and storage (vendor lock-in)
Cluster architecture
many computers connected to work as a single system. (Like MPP, but this implies multiple processors per node).
how are nodes connected in a cluster
usually gigabit Ethernet; 8-64 nodes per rack, with racks connected by another level of network or a switch
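A rough sizing sketch for the two-level rack layout described above, assuming nodes fill racks in index order. The function names and the default of 40 nodes per rack are illustrative choices, not part of the card.

```python
import math

def racks_needed(total_nodes: int, nodes_per_rack: int = 40) -> int:
    """Number of racks required when each rack holds nodes_per_rack nodes."""
    if total_nodes < 0 or nodes_per_rack <= 0:
        raise ValueError("total_nodes must be >= 0 and nodes_per_rack > 0")
    return math.ceil(total_nodes / nodes_per_rack)

def same_rack(node_a: int, node_b: int, nodes_per_rack: int = 40) -> bool:
    """True if two node indices share a rack, so traffic stays on the rack-level network."""
    return node_a // nodes_per_rack == node_b // nodes_per_rack
```

Traffic between nodes in different racks has to cross the second network level, which is why the sketch separates the two cases.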
Only pro of MPP over cluster
faster message passing between nodes; ideal for single, vertical solutions like data warehousing
commodity hardware
standardized, market-priced hardware that is cheaper than proprietary solutions
distributed computing
tasks split into smaller units processed simultaneously across machines
challenges of distributed computing
splitting and assigning tasks (parallelization), resource allocation, fault tolerance, aggregating results
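A minimal sketch of the challenges listed above (splitting, assigning, fault tolerance, aggregating), assuming threads on one machine stand in for separate nodes. All function names here are hypothetical, and the retry loop is only a crude stand-in for real fault tolerance.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n_chunks):
    """Splitting: cut data into n_chunks nearly equal contiguous chunks."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def run_with_retry(task, chunk, max_attempts=3):
    """Fault tolerance (simplified): re-run a failed task a few times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except Exception:
            if attempt == max_attempts:
                raise

def distributed_sum(data, n_workers=4):
    """Assign one chunk per worker, then aggregate the partial sums."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: run_with_retry(sum, c), chunks))
    return sum(partials)
```

For example, `distributed_sum(list(range(101)))` splits the numbers 0 to 100 into four chunks, sums each chunk in a worker, and adds the partial results.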
solutions to challenges of distributed computing
use a big data framework to hide the complexity of distributed computing from developers
Grid computing
a collection of computer resources from multiple locations; each node performs a different task; commonly used for a variety of purposes
what is HPC
high-performance computing; use of GPUs is increasing
NIST reference architecture definition
National Institute of Standards and Technology (NIST) recommendation for big data system architecture
NIST Reference Architecture components
- Big Data Framework Provider (processing, storage, network)
- Data Provider
- Application Provider (e.g., BI analytics, ML)
- Data Consumer
- System Orchestrator (integrates components, meets goals, allocates resources)
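One way to picture how the five roles above relate is a toy pipeline. This is a loose sketch, not the NIST specification: the function bodies, the sample records, and the averaging step are all made up for illustration.

```python
from statistics import mean

def data_provider():
    """Data Provider: supplies raw records to the system (sample values)."""
    return [12.0, 15.5, 9.5, 11.0]

def framework_provider(func, records):
    """Big Data Framework Provider: stands in for the processing, storage, and network resources."""
    return func(records)

def application_provider(records):
    """Application Provider: the analytics logic (here, a simple average)."""
    return {"count": len(records), "mean": mean(records)}

def data_consumer(result):
    """Data Consumer: receives the final output."""
    return f"{result['count']} records, mean {result['mean']:.2f}"

def system_orchestrator():
    """System Orchestrator: wires the components together to meet the goal."""
    records = data_provider()
    result = framework_provider(application_provider, records)
    return data_consumer(result)
```

Calling `system_orchestrator()` runs the flow from provider to consumer and returns the summary string.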