3.4 Resiliency Flashcards

1
Q

High availability

A

Redundancy doesn’t always mean always available
– May need to be powered on manually
* HA (high availability)
– Always on, always available
* May include many different components working together
– Active/Active can provide scalability advantages
* Higher availability almost always means higher costs
– There’s always another contingency you could add
– Upgraded power, high-quality server components, etc.
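The cost/availability trade-off can be made concrete with a little arithmetic. The sketch below assumes redundant units fail independently and that any single working unit is enough to stay available; both are simplifying assumptions, and the function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant units when any one unit suffices.

    Assumes failures are independent, which real systems only approximate.
    """
    if not 0.0 <= single <= 1.0 or copies < 1:
        raise ValueError("single must be in [0, 1] and copies must be >= 1")
    return 1 - (1 - single) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = combined_availability(0.99, n)
        print(n, "unit(s):", round(downtime_hours_per_year(a), 3), "hours down per year")
```

Each extra unit cuts expected downtime sharply, but every unit is another purchase, which is why higher availability usually costs more.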

2
Q

Server clustering

A

Combine two or more servers
– Appears and operates as a single large server
– Users only see one device
* Easily increase capacity and availability
– Add more servers to the cluster
* Usually configured in the operating system
– All devices in the cluster commonly use the same OS

3
Q

Load balancing

A

Load is distributed across multiple servers
– The servers are often unaware of each other
* Distribute the load across multiple devices
– Can be different operating systems
* The load balancer adds or removes devices
– Add a server to increase capacity
– Remove any servers not responding
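A minimal sketch of this behavior, assuming a simple round-robin policy (the card does not name a scheduling method) and a caller-supplied health check. The class and method names are hypothetical.

```python
class LoadBalancer:
    """Round-robin load balancer; the servers know nothing about each other."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # counter used to rotate through the pool

    def add_server(self, name):
        """Add a server to increase capacity."""
        if name not in self.servers:
            self.servers.append(name)

    def health_check(self, is_responding):
        """Remove any server for which is_responding(server) is false."""
        self.servers = [s for s in self.servers if is_responding(s)]

    def route(self):
        """Return the server that should handle the next request."""
        if not self.servers:
            raise RuntimeError("no servers available")
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server


if __name__ == "__main__":
    lb = LoadBalancer(["web-a", "web-b", "web-c"])
    print([lb.route() for _ in range(4)])
    lb.health_check(lambda s: s != "web-b")  # web-b stopped responding
    print(lb.servers)
```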

4
Q

Site resiliency

A

Recovery site is prepped
– Data is synchronized
* A disaster is called
– Business processes fail over to the alternate processing site
* Problem is addressed
– This can take hours, weeks, or longer
* Return to the primary location
– The process must be documented for both directions

5
Q

Hot site

A

An exact replica
– Duplicate everything
* Stocked with hardware
– Constantly updated
– You buy two of everything
* Applications and software are constantly updated
– Automated replication
* Flip a switch and everything moves
– This may be quite a few switches

6
Q

Cold site

A

No hardware
– Empty building
* No data
– Bring it with you
* No people
– Bus in your team

7
Q

Warm site

A

Somewhere between cold and hot
– Just enough to get going
* Big room with rack space
– You bring the hardware
* Hardware is ready and waiting
– You bring the software and data
* Geographic dispersion
– These sites should be physically separate from the organization’s primary location
– Many disruptions can affect a large area
– Hurricanes, tornadoes, floods, etc.
* Can be a logistical challenge
– Transporting equipment
– Getting employees on-site
– Getting back to the main office

8
Q

Platform diversity

A

Every operating system contains potential security issues
– You can’t avoid them
* Many security vulnerabilities are specific to a single OS
– Windows vulnerabilities don’t commonly affect Linux or macOS
– And vice versa
* Use many different platforms
– Different applications, clients, and OSes
– Spread the risk around

9
Q

Multi-cloud systems

A

There are many cloud providers
– Amazon Web Services, Microsoft Azure, Google Cloud, etc.
* Plan for cloud outages
– These can sometimes happen
* Data is dispersed both geographically and across cloud services
– A breach with one provider would not affect the others
– Plan for every contingency

10
Q

Continuity of operations planning (COOP)

A

Not everything goes according to plan
– Disasters can cause a disruption to the norm
* We rely on our computer systems
– Technology is pervasive
* There needs to be an alternative
– Manual transactions
– Paper receipts
– Phone calls for transaction approvals
* These must be documented and tested before a problem occurs

11
Q

Capacity planning

A

Match supply to the demand
– This isn’t always an obvious equation
* Too much demand
– Application slowdowns and outages
* Too much supply
– You’re paying too much
* Requires a balanced approach
– Add the right amount of people
– Apply appropriate technology
– Build the best infrastructure
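One way to picture "match supply to demand" is a rough sizing calculation. The sketch below is only illustrative: the requests-per-second figures and the headroom percentage are assumed inputs, not values from this card, and real capacity planning involves far more variables.

```python
def servers_needed(peak_rps: int, per_server_rps: int, headroom_pct: int = 20) -> int:
    """Smallest server count that covers peak demand plus a safety margin.

    All arguments are integers so the ceiling division below is exact.
    """
    if peak_rps < 0 or per_server_rps <= 0 or headroom_pct < 0:
        raise ValueError("demand and headroom must be >= 0, capacity must be > 0")
    required = peak_rps * (100 + headroom_pct)  # demand scaled by headroom
    capacity = per_server_rps * 100             # one server, same scale
    return -(-required // capacity)             # ceiling division


if __name__ == "__main__":
    # Too few servers risks slowdowns and outages; too many means overpaying.
    print(servers_needed(peak_rps=1000, per_server_rps=150))
```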

12
Q

People

A

Some services require human intervention
– Call center support lines
– Technology services
* Too few employees
– Recruit new staff
– It may be time consuming to add more staff
* Too many employees
– Redeploy to other parts of the organization
– Downsize

13
Q

Technology

A

Pick a technology that can scale
– Not all services can easily grow and shrink
* Web services
– Distribute the load across multiple web services
* Database services
– Cluster multiple SQL servers
– Split the database to increase capacity
* Cloud services
– Services on demand
– Seemingly unlimited resources (if you pay the money)

14
Q

Infrastructure

A

The underlying framework
– Application servers, network services, etc.
– CPU, network, storage
* Physical devices
– Purchase, configure, and install
* Cloud-based devices
– Easier to deploy
– Useful for unexpected capacity changes

15
Q

Recovery testing

A

Test yourselves before an actual event
– Scheduled update sessions (annual, semi-annual, etc.)
* Use well-defined rules of engagement
– Do not touch the production systems
* Very specific scenario
– Limited time to run the event
* Evaluate response
– Document and discuss

16
Q

Tabletop exercises

A

Performing a full-scale disaster drill can be costly
– And time consuming
* Many of the logistics can be determined through analysis
– You don’t physically have to go through a disaster or drill
* Get key players together for a tabletop exercise
– Talk through a simulated disaster

17
Q

Failover

A

A failure is often inevitable
– It’s “when,” not “if”
* We may be able to keep running
– Plan for the worst
* Create a redundant infrastructure
– Multiple routers, firewalls, switches, etc.
* If one stops working, fail over to the operational unit
– Many infrastructure devices and services can do this automatically
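The automatic case can be sketched as a priority list of redundant units with a health probe. This is a simplified illustration; the device names and the `is_up` callback are hypothetical, and real devices typically rely on heartbeat or redundancy protocols instead.

```python
def first_operational(units, is_up):
    """Return the first unit in priority order that passes the health probe."""
    for unit in units:
        if is_up(unit):
            return unit
    raise RuntimeError("no operational unit available")


if __name__ == "__main__":
    # Hypothetical status table: the primary router has stopped working.
    status = {"router-primary": False, "router-standby": True}
    active = first_operational(["router-primary", "router-standby"], status.get)
    print("traffic now handled by", active)
```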

18
Q

Simulation

A

Test with a simulated event
– Phishing attack, password requests, data breaches
* Going phishing
– Create a phishing email attack
– Send to your actual user community
– See who bites
* Test internal security
– Did the phishing get past the filter?
* Test the users
– Who clicked?
– Additional training may be required

19
Q

Parallel processing

A

Split a process across multiple (parallel) CPUs
– A single computer with multiple CPU cores or multiple physical CPUs
– Multiple computers
* Improved performance
– Split complex transactions across
multiple processors
* Improved recovery
– Quickly identify a faulty system
– Take the faulty device out of the list of available
processors
– Continue operating with the remaining processors
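The recovery idea above can be sketched with a thread pool standing in for the processors. This is an illustration under assumptions: worker names are hypothetical, any exception is treated as a faulty worker, and threads are used only for simplicity (CPU-bound work in CPython would normally use separate processes or machines).

```python
from concurrent.futures import ThreadPoolExecutor


def run_chunks(chunks, workers, process):
    """Spread chunks over workers; drop any worker that fails and retry.

    process(worker, chunk) returns a result or raises if the worker is faulty.
    Returns (results in chunk order, workers still available).
    """
    available = list(workers)
    results = [None] * len(chunks)
    pending = list(range(len(chunks)))
    while pending:
        if not available:
            raise RuntimeError("no processors left")
        # Assign pending chunks round-robin to the remaining workers.
        plan = [(i, available[n % len(available)]) for n, i in enumerate(pending)]
        with ThreadPoolExecutor(max_workers=len(available)) as pool:
            futures = [(i, w, pool.submit(process, w, chunks[i])) for i, w in plan]
        pending = []
        for i, worker, future in futures:
            try:
                results[i] = future.result()
            except Exception:
                # Take the faulty device out of the list and retry the chunk.
                if worker in available:
                    available.remove(worker)
                pending.append(i)
    return results, available


if __name__ == "__main__":
    def process(worker, chunk):
        if worker == "cpu1":  # simulate one faulty processor
            raise RuntimeError("hardware fault")
        return sum(chunk)

    print(run_chunks([[1, 2], [3, 4], [5, 6]], ["cpu0", "cpu1", "cpu2"], process))
```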

20
Q

Backups

A

Incredibly important
– Recover important and valuable data
– Plan for disaster
* Many different implementations
– Total amount of data
– Type of backup
– Backup media
– Storage location
– Backup and recovery software
– Day of the week

21
Q

Onsite vs. offsite backups

A

Onsite backups
– No Internet link required
– Data is immediately available
– Generally less expensive than offsite
* Offsite backups
– Transfer data over Internet or WAN link
– Data is available after a disaster
– Restoration can be performed from anywhere
* Organizations often use both
– More copies of the data
– More options when restoring

22
Q

Frequency

A

How often to back up
– Every week, day, hour?
* This may be different between systems
– Some systems may not change much each day
* May have multiple backup sets
– Daily, weekly, and monthly
* This requires significant planning
– Multiple backup sets across different days
– Lots of media to manage
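Managing several backup sets usually comes down to a calendar rule. The sketch below assumes one possible rotation (monthly on the 1st, weekly on Sundays, daily otherwise); the card does not prescribe a schedule, so treat the rule as an example only.

```python
from datetime import date


def backup_set_for(day: date) -> str:
    """Pick which backup set a given day's backup belongs to."""
    if day.day == 1:
        return "monthly"
    if day.weekday() == 6:  # Monday is 0, so Sunday is 6
        return "weekly"
    return "daily"


if __name__ == "__main__":
    for d in (date(2024, 1, 1), date(2024, 1, 7), date(2024, 1, 9)):
        print(d.isoformat(), backup_set_for(d))
```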

23
Q

Encryption

A

A history of data is on backup media
– Some of this media may be offsite
* This makes it very easy for an attacker
– All of the data is in one place
* Protect backup data using encryption
– Everything on the backup media is unreadable
– The recovery key is required to restore the data
* Especially useful for cloud backups and storage
– Prevent anyone from eavesdropping

24
Q

Snapshots

A

Became popular on virtual machines
– Very useful in cloud environments
* Take a snapshot
– An instant backup of an entire system
– Save the current configuration and data
* Take another snapshot after 24 hours
– Contains only the changes between snapshots
* Take a snapshot every day
– Revert to any snapshot
– Very fast recovery
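The "only the changes" idea can be sketched with an in-memory store that keeps a delta per snapshot. This is a toy model with hypothetical names; a real hypervisor or cloud platform tracks changed disk blocks, not dictionary keys.

```python
_DELETED = object()  # marker for keys removed since the previous snapshot


class SnapshotStore:
    """Stores each snapshot as the difference from the one before it."""

    def __init__(self):
        self._deltas = []

    def _state_at(self, snapshot_id):
        """Rebuild the full state by replaying deltas 0..snapshot_id."""
        state = {}
        for delta in self._deltas[: snapshot_id + 1]:
            for key, value in delta.items():
                if value is _DELETED:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state

    def take(self, current):
        """Record a snapshot of `current`; returns the new snapshot id."""
        previous = self._state_at(len(self._deltas) - 1)
        delta = {k: v for k, v in current.items()
                 if k not in previous or previous[k] != v}
        for key in previous:
            if key not in current:
                delta[key] = _DELETED
        self._deltas.append(delta)
        return len(self._deltas) - 1

    def changes_in(self, snapshot_id):
        """Number of entries stored for this snapshot."""
        return len(self._deltas[snapshot_id])

    def revert(self, snapshot_id):
        """Return the full state as of the given snapshot."""
        if not 0 <= snapshot_id < len(self._deltas):
            raise IndexError("unknown snapshot")
        return self._state_at(snapshot_id)


if __name__ == "__main__":
    store = SnapshotStore()
    day1 = store.take({"config": "v1", "data": "x"})
    day2 = store.take({"config": "v2", "data": "x"})
    print(store.changes_in(day2), store.revert(day1))
```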

25
Q

Recovery testing

A

It’s not enough to perform the backup
– You have to be able to restore
* Disaster recovery testing
– Simulate a disaster situation
– Restore from backup
* Confirm the restoration
– Test the restored application and data
* Perform periodic audits
– Always have a good backup
– Weekly, monthly, quarterly checks
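Confirming a restoration can be partly automated by comparing checksums. The sketch below assumes a SHA-256 digest was recorded when the backup was taken; the function names are illustrative, and a matching digest only shows the bytes are intact, so the restored application still needs to be tested.

```python
import hashlib


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def restore_is_valid(expected_digest, restored_path):
    """True if the restored file matches the digest recorded at backup time."""
    return sha256_of(restored_path) == expected_digest
```

A periodic audit could run this check against a sample of restored files and flag any mismatch for investigation.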

26
Q

Replication

A

An ongoing, almost real-time backup
– Keep data synchronized in multiple locations
* Data is available
– There’s always a copy somewhere
* Data can be stored local to all users
– Replicate data to all remote sites
* Data is recoverable
– Disasters can happen at any time

27
Q

Journaling

A

Power goes out while writing data to storage
– The stored data is probably corrupted
* Recovery could be complicated
– Remove corrupted files, restore from backup
* Before writing to storage, make a journal entry
– After the journal is written, write the data to storage
* After the data is written to storage, update the journal
– Clear the entry and get ready for the next
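The write sequence above can be sketched with an in-memory model. The class name and the simulated-crash flag are hypothetical, and the `recover` step (replaying pending journal entries after a crash) is an added assumption about how the journal is used, consistent with how journaling file systems generally work.

```python
class JournaledStore:
    """Toy model of journaled writes: journal first, then storage, then clear."""

    def __init__(self):
        self.journal = {}  # pending entries not yet confirmed in storage
        self.storage = {}

    def write(self, key, value, crash_before_storage=False):
        self.journal[key] = value   # 1. make the journal entry
        if crash_before_storage:
            return                  # simulated power loss mid-write
        self.storage[key] = value   # 2. write the data to storage
        del self.journal[key]       # 3. clear the journal entry

    def recover(self):
        """After a crash, replay whatever is still in the journal."""
        for key, value in list(self.journal.items()):
            self.storage[key] = value
            del self.journal[key]


if __name__ == "__main__":
    store = JournaledStore()
    store.write("a", 1)
    store.write("b", 2, crash_before_storage=True)
    store.recover()
    print(store.storage, store.journal)
```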

28
Q

Power resiliency

A

Power is the foundation of our technology
– It’s important to properly engineer and plan for outages
* We usually don’t make our own power
– Power is likely provided by third parties
– We can’t control power availability
* There are ways to mitigate power issues
– Short power outages
– Long-term power issues

29
Q

UPS

A

Uninterruptible Power Supply
– Short-term backup power
– Blackouts, brownouts, surges
* UPS types
– Offline/Standby UPS
– Line-interactive UPS
– On-line/Double-conversion UPS
* Features
– Auto shutdown, battery capacity, outlets, phone line surge suppression

30
Q

Generators

A

Long-term power backup
– Fuel storage required
* Power an entire building
– Some power outlets may be marked as generator-powered
* It may take a few minutes to get the generator up to speed
– Use a battery UPS while the generator is starting
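Whether a UPS can bridge the generator start-up is a rough energy calculation. The sketch below assumes a constant load and a fixed inverter efficiency of 90%, both simplifications; real battery runtime is nonlinear, so vendor runtime charts should be preferred. All figures are example inputs.

```python
def ups_runtime_minutes(battery_wh, load_w, efficiency=0.9):
    """Estimated minutes a UPS can carry a constant load."""
    if load_w <= 0:
        raise ValueError("load must be positive")
    return battery_wh * efficiency / load_w * 60


def bridges_generator_start(battery_wh, load_w, generator_start_minutes, efficiency=0.9):
    """True if the estimated UPS runtime covers the generator start-up time."""
    return ups_runtime_minutes(battery_wh, load_w, efficiency) >= generator_start_minutes


if __name__ == "__main__":
    print(bridges_generator_start(battery_wh=1000, load_w=450, generator_start_minutes=5))
```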