Systems and Architecture Flashcards
Define Round Robin
Distributes client requests across a group of servers. Going down the list of servers in the group, the round‑robin load balancer forwards a client request to each server in turn. When it reaches the end of the list, the load balancer loops back and goes down the list again.
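A minimal sketch of the rotation described above, using Python's itertools.cycle. The server names are hypothetical and only illustrate the order in which requests are handed out.

```python
from itertools import cycle

def round_robin(servers):
    """Return an iterator that walks the server list and loops back at the end."""
    return cycle(servers)

# Hypothetical server names, for illustration only.
balancer = round_robin(["server-a", "server-b", "server-c"])

# Five incoming requests: the fourth one wraps around to the first server.
assignments = [next(balancer) for _ in range(5)]
print(assignments)
```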
Define Load Balancer
A load balancer sits between the clients and the server farm, accepting incoming network and application traffic and distributing it across multiple backend servers using methods such as round robin.
Give an example of DNS load balancing
A company can have a single domain name and four absolutely identical company home pages on four physical servers based in Europe, Asia, North America, and Africa.
Advantage of DNS Round Robin
It is simple to implement
Troubleshoot one load balancer failure
If one load balancer fails, the secondary detects the failure and becomes active. A heartbeat link between the two monitors their status.
Troubleshoot all load balancer failures
If all load balancers fail (or are accidentally misconfigured), the servers downstream are knocked offline until the problem is resolved or traffic is manually routed around the load balancers.
What is the Slashdot Effect?
Also known as slashdotting, it occurs when a popular website links to a smaller website, causing a massive increase in traffic. This overloads the smaller site, causing it to slow down or even become temporarily unavailable.
Active-Active High Availability Cluster
An active-active cluster is typically made up of at least two nodes, both actively running the same kind of service simultaneously. The main purpose of an active-active cluster is to achieve load balancing. Load balancing distributes workloads across all nodes in order to prevent any single node from getting overloaded. Because there are more nodes available to serve, there will also be a marked improvement in throughput and response times.
Active-Passive High Availability Cluster
An active-passive cluster consists of at least two nodes, but not all nodes are active. The passive (a.k.a. failover) server serves as a backup that’s ready to take over as soon as the active (a.k.a. primary) server gets disconnected or is unable to serve.
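A rough sketch of active-passive failover, assuming a health check stands in for the heartbeat. The node names and the health table are hypothetical; a real cluster would probe the nodes over the network.

```python
def pick_active(nodes, is_alive):
    """Return the first node in priority order that passes its health check."""
    for node in nodes:
        if is_alive(node):
            return node
    return None  # every node is down: nothing can serve traffic

# Hypothetical nodes listed in priority order: primary first, failover second.
nodes = ["primary", "failover"]
health = {"primary": True, "failover": True}

active_before = pick_active(nodes, health.get)  # primary serves while healthy

health["primary"] = False                       # simulate a missed heartbeat
active_after = pick_active(nodes, health.get)   # failover takes over

health["failover"] = False                      # simulate losing both nodes
active_none = pick_active(nodes, health.get)
```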
Define SATA
Serial Advanced Technology Attachment: a serial communication interface that began as an enhancement to the original ATA specification, which was also known as IDE and is today called PATA.
SCSI
The Small Computer System Interface
SAS
Serial Attached SCSI: a type of SCSI that uses serial operation rather than the parallel operation of the original SCSI.
IDE
Integrated Drive Electronics: the original name for the ATA (now PATA) drive interface
PATA
Parallel Advanced Technology Attachment
Define NAS
Network Attached Storage:
- Name given to a dedicated storage unit that can be directly attached to the network.
- Transfers data as files.
- Transfers files over local area network. (i.e. ethernet, wireless, fiber)
Define SAN
Storage Area Network
- It is a network that consists of storage units.
- Transfers data as blocks.
- Transfers blocks over a dedicated storage network, or across a wide area network using FCIP or iSCSI.
FCIP
Fibre Channel over IP
iSCSI
Internet Small Computer Systems Interface
- An IP-based storage networking standard that encapsulates SCSI commands within IP packets.
- Allows you to use the same network for storage that you use for the balance of the network.
- Can be used in a NAS
- First used in a SAN.
RAID
Redundant array of independent disks
- RAID arrays often use caching to improve performance
RAID 0
- Disk Striping with no parity.
- Need at least 2 Disks
- No Fault Tolerance
RAID 1
- Disk Mirroring
- Needs at least 2 Disks.
RAID 5
- Disk Striping with parity
- Need at least 3 Disks
- Most Widely used
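A minimal sketch of the parity idea behind RAID 5, assuming the usual XOR parity over equal-sized blocks. The byte values are made up, and a real array also rotates the parity block across the disks, which is not shown here.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Two data blocks from one stripe (made-up contents).
data_a = b"\x0f\xf0\xaa"
data_b = b"\x33\x55\x01"

# The parity block is stored on a third disk.
parity = xor_blocks([data_a, data_b])

# If the disk holding data_b fails, XOR of the survivors rebuilds it.
rebuilt_b = xor_blocks([data_a, parity])
```

This is why RAID 5 needs at least three disks: two hold data for a stripe and one holds the parity that covers them.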
RAID 0+1
- Disk Striping + Disk Mirroring.
- Every striped disk needs a disk mirroring it.
- 2 striped disks = 2 mirror disks (1:1 ratio)
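The RAID levels above trade capacity for redundancy. The sketch below compares usable capacity, assuming identical disks and the standard layouts; the function name is hypothetical, and the four-disk minimum for RAID 0+1 is taken from the 2 + 2 example on the card.

```python
def usable_capacity(level, disks, disk_size):
    """Return usable capacity for identical disks, in the same unit as disk_size."""
    minimum = {"0": 2, "1": 2, "5": 3, "0+1": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == "0":
        return disks * disk_size        # striping only: every disk holds data
    if level == "1":
        return disk_size                # mirroring: every disk holds the same data
    if level == "5":
        return (disks - 1) * disk_size  # one disk's worth of space goes to parity
    if disks % 2:
        raise ValueError("RAID 0+1 needs an even number of disks")
    return disks // 2 * disk_size       # half the disks mirror the striped half

# Example with 4 TB disks.
for level, disks in [("0", 2), ("1", 2), ("5", 3), ("0+1", 4)]:
    print(f"RAID {level} with {disks} disks: {usable_capacity(level, disks, 4)} TB")
```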
What is a Backplane?
- Provides data and control signal connectors for the hard drives.
- Provides the interconnect for the front I/O board, power and locator buttons, and system/component status LEDs.
- When a backplane fails, it affects all the drives that connect to it.
- Backplane failures are less likely than drive failures.
Types of RAID failures
RAID arrays often use caching to improve performance.
- A battery-backed cache is one that can maintain the data in the cache during a power outage, preventing the loss of data still residing in the cache at the moment of the power failure. When this battery fails, it can cause the loss of data.
- Disk failure: the disk cannot be accessed, data is corrupted, or the disk makes unusual noises (click of death).
Cache
Hardware or software that is used to store something, usually data, temporarily in a computing environment.
When the cache is turned off, you lose all of those performance benefits. Cache can be enabled in both the operating system and in the storage software.
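A small illustration of the performance benefit, using Python's functools.lru_cache as a stand-in for a storage cache. The read_block function and its counter are hypothetical; they only show that a repeated read is served from the cache instead of the slow path.

```python
from functools import lru_cache

reads_from_disk = 0  # counts how often the slow path actually runs

@lru_cache(maxsize=128)
def read_block(block_id):
    """Pretend to read a block from disk; cached results skip this body."""
    global reads_from_disk
    reads_from_disk += 1
    return f"data-{block_id}"

read_block(7)  # miss: goes to "disk"
read_block(7)  # hit: served from the cache
read_block(8)  # miss: a different block

print(reads_from_disk)
```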
Ways to prevent RAID failure
- No SPOF
- Secondary/backup RAID controller/disks
- Monitoring systems to identify issues before they become a failure
- Cloud based backup
Full Backup
- All Data is backed-up
- Slowest backup time
- Fastest restore time
- High storage space
Incremental Backup
- New/Modified data backed-up
- Fast backup time
- Slower restore time
- Low storage space
Differential Backup
- All data changed since the last full backup is backed up
- Moderate backup time
- Faster restore time
- Moderate storage space
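The difference in restore times across the three backup types comes from how many backup sets a restore has to read. A sketch, assuming a schedule that follows each full backup with either incrementals or differentials, not a mix of both (the function name is hypothetical):

```python
def restore_chain(history):
    """Return the indexes of the backups needed to restore the latest state."""
    last_full = max(i for i, kind in enumerate(history) if kind == "full")
    later = range(last_full + 1, len(history))
    diffs = [i for i in later if history[i] == "differential"]
    incs = [i for i in later if history[i] == "incremental"]
    if diffs:
        # A differential holds everything since the full, so only the newest is needed.
        return [last_full, diffs[-1]]
    # Each incremental holds only the changes since the previous backup, so all are needed.
    return [last_full] + incs

incremental_week = ["full", "incremental", "incremental", "incremental"]
differential_week = ["full", "differential", "differential", "differential"]

print(restore_chain(incremental_week))
print(restore_chain(differential_week))
```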
NFS
Network File System
SMB
Server Message Block
AFS
Andrew File System
Types of Storage failures
- Loss of data/data corruption
- Security issues
- Loss of connectivity
- Exposure of data
HBA
Host Bus Adaptor
What is a Fibre channel switch?
A network switch compatible with the FC protocol. It allows the creation of a Fibre Channel fabric, which is the core component of a SAN.
Troubleshoot HBA failures
- Check cables
- Reseat the adaptor in its slot
- Implement redundant HBAs
- If the HBA fails it is a SPOF
FCoE
Fibre Channel over Ethernet
- Encapsulates Fibre Channel traffic within Ethernet frames
- Unlike iSCSI, it does not use IP at all.
Causes and consequences of SAN failures over FCoE
- TCP/IP misconfiguration (inability for some/all nodes to access storage)
- Failure of a single NIC (increased load on the remaining NIC on a single node and possible reduced throughput for this node, or complete outage if this is the only onboard NIC)
- Incorrect/invalid LUN (inability to access logical storage device)
- Loss of network (total outage)
- Single misconfigured or failed standard switch (increased load on remaining switches and possible reduced throughput or storage outage; the standard data network may also be impacted)
Causes and consequences of storage area network (SAN) failures over the FC protocol
- Single misconfigured or failed Fibre switch
- Loss of all Fibre switches (complete loss of access to storage; the standard data network is unaffected)
- Failure of a single HBA (increased load on the remaining HBA on a single node and possible reduced throughput for this node, or complete outage if this is the only onboard HBA)
misconfigured NFS
loss of access to Linux/NAS network shares
misconfigured SMB
loss of access to Windows network shares
misconfigured AFS
loss of access to AFS network shares (Apple file sharing itself uses AFP, the Apple Filing Protocol)
misconfigured authentication/authorisation
loss of access to some/all NAS
What is a LUN
Logical Unit Number
-A number used to identify a logical unit, which is a device addressed by the SCSI protocol or SAN protocols which encapsulate SCSI, such as Fibre Channel or iSCSI.
LUN masking
The process of controlling access to a LUN by effectively “hiding” its existence from those who should not have access. This makes the storage available to some hosts but not to others.
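LUN masking can be pictured as a lookup from each LUN to the hosts allowed to see it. A toy sketch with hypothetical host names and LUN numbers; real arrays identify hosts by initiator identifiers such as WWNs or IQNs rather than plain names.

```python
# Hypothetical masking table: LUN number -> hosts allowed to see that LUN.
masking = {
    0: {"host-a", "host-b"},
    1: {"host-a"},
    2: {"host-c"},
}

def visible_luns(host, table):
    """Return the LUNs presented to a host; all other LUNs stay hidden from it."""
    return sorted(lun for lun, hosts in table.items() if host in hosts)

print(visible_luns("host-a", masking))
print(visible_luns("host-b", masking))
```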
Causes and consequences of SAN failures over iSCSI
- Single misconfigured or failed standard switch (increased load on remaining switches and possible reduced throughput or storage outage; the standard data network may also be impacted)
- TCP/IP misconfiguration (inability for some/all nodes to access storage)
- Failure of a single NIC (increased load on the remaining NIC on a single node and possible reduced throughput for this node, or complete outage if this is the only onboard NIC)
- Incorrect/invalid IQN address (inability to access logical storage device)
IQN
iSCSI Qualified Name
- Logical name that is not linked to an IP address
- Unique
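An IQN follows the pattern iqn.yyyy-mm.reversed-domain, optionally followed by a colon and an identifier. The check below is a simplified sketch of that shape, not a full validator; the function name and the sample names are illustrative.

```python
import re

# Simplified pattern: "iqn.", a year-month date, a reversed domain name,
# then an optional ":identifier" suffix chosen by the domain owner.
IQN_PATTERN = re.compile(
    r"iqn\.\d{4}-(?:0[1-9]|1[0-2])\.[a-z0-9][a-z0-9.-]*(?::.+)?"
)

def looks_like_iqn(name):
    """Return True when the whole string has the general shape of an IQN."""
    return IQN_PATTERN.fullmatch(name) is not None

print(looks_like_iqn("iqn.2001-04.com.example:storage.disk1"))
print(looks_like_iqn("192.168.0.10"))
```

Because the name carries no IP address, a target can move to a different address without being renamed.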
Causes and consequences of cloud storage failures
- Router/ISP failure (complete loss of access)
- TCP/IP misconfiguration (inability for some/all nodes to access storage)
- Misconfigured authentication/authorisation (loss of access to some/all cloud storage)
- Cloud service provider failure (loss of access to data and/or loss of data)
Types of cloud services
OneDrive, Dropbox, Google Drive, Amazon EC2 and Microsoft Azure.
Causes and impact of computer system failures (Hardware)
- Memory component failure (individual node crash)
- SSD/HDD failure (system crash and possible loss of data)
- CPU failure
- Power supply
- Cooling (intermittent crash or possibly permanent damage to components)
- Heat/power-related issues
SSD/HDD
Solid State Drive/Hard disk drive
Causes and impact of network failures
- NIC failure (loss of access from/to one network node)
- Switch failure (loss of access to LAN)
- Router failure (loss of access to WAN)
- Firewall (blocked IPs, protocols, or ports)
- Web proxy (no internet access)
- Cabling (incorrect cable type, i.e. straight-through vs crossover, exceeding recommended lengths, and/or EMI)
- Wireless (exceeding maximum distance and/or EMI or RFI)
EMI/RFI
Electromagnetic interference/ Radio-frequency interference
Patches and Hotfix
- Patch - Publicly released update to fix a known bug/issue.
- Hotfix - update to fix a very specific issue, not always publicly released