Section 28 - Planning for the Worst Flashcards
How can you plan for the worst?
By conducting redundancy and disaster recovery planning
What is “redundancy”?
A term that usually refers to something that’s extra or unnecessary. In the IT world, though, having something extra is a great thing.
*** It is always a good idea to have backups and multiples of things - like two servers, two connections to the internet, etc. That’s why mobile phones have both wi-fi and cellular data.
When you look at redundancy planning, what are you looking for?
You’re looking at your networks, servers and systems to make sure that they’re fault tolerant and that they can continue to provide services 24 hours a day.
Even with redundancy planning, things still go wrong. In these situations, what comes into play?
Disaster Recovery
What is a “redundant power supply”?
An enclosure that provides two or more complete power supplies inside of one.
Most servers are going to utilize how many individual power supplies?
Two at all times
This is an example of a redundant power supply
What are some conditions that describe the various issues that can be involved with a power source?
Surges
Spikes
Sags
Brownouts
Blackouts
What is a “surge”?
An unexpected increase in the amount of voltage that’s being provided
*** The US power supply is 120 volts. If it jumped up to 124, that small increase would be considered a surge. If it jumped up a lot, that would be considered a spike.
What is a “spike”?
A short transient voltage that can be caused by a short circuit, a tripped circuit breaker, a power outage, or even a lightning strike.
How can you protect against a surge or a spike?
A surge protector
What is a “sag”?
Kind of like a surge, but in reverse.
Where a surge goes up, a sag is an unexpected decrease in the amount of voltage provided.
*** Typically, this is for a short duration and it won’t affect the power to your computers. When it goes for a long time, it’s known as a brownout.
What is a “brownout”?
When the voltage is reduced for a long time.
*** Usually, your lights start dimming and your computer will turn off.
What is a “blackout”?
A total loss of power for an extended period of time
A problem associated with blackouts is that when power is re-established, you get a ___ in power that can damage your computers.
spike
*** This is why surge protectors are so important
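To make the differences between these five conditions concrete, here is a minimal Python sketch that labels a voltage reading. The thresholds (a 2% rise for a surge, a 15% jump for a spike, an 8% drop for a sag, and 60 seconds to turn a sag into a brownout) are hypothetical values picked only for illustration, not figures from any standard.

```python
# Hypothetical thresholds, chosen only to illustrate the categories;
# real power-quality standards and UPS vendors define their own limits.
NOMINAL_VOLTS = 120.0                 # US line voltage
SURGE_VOLTS = NOMINAL_VOLTS * 1.02    # assumed: a small rise above nominal is a surge
SPIKE_VOLTS = NOMINAL_VOLTS * 1.15    # assumed: a large jump is a spike
SAG_VOLTS = NOMINAL_VOLTS * 0.92      # assumed: below this the supply is sagging
BLACKOUT_VOLTS = 5.0                  # assumed: effectively no power at all
BROWNOUT_SECONDS = 60.0               # assumed: a low-voltage event this long is a brownout


def classify(volts: float, duration_s: float) -> str:
    """Label a voltage level that lasted `duration_s` seconds.

    This sketch treats any near-zero reading as a blackout and ignores
    how long the outage lasts.
    """
    if volts <= BLACKOUT_VOLTS:
        return "blackout"
    if volts >= SPIKE_VOLTS:
        return "spike"
    if volts >= SURGE_VOLTS:
        return "surge"
    if volts < SAG_VOLTS:
        # Same low voltage either way; only the duration separates the two.
        return "brownout" if duration_s >= BROWNOUT_SECONDS else "sag"
    return "normal"


print(classify(124, 1))      # surge
print(classify(100, 0.5))    # sag
print(classify(100, 300))    # brownout
```

Note that the only thing separating a sag from a brownout here is how long the low voltage lasts, which matches how the cards above describe them.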
What are the different types of backup power?
- UPS
- Backup generator
What is a UPS?
Uninterruptible Power Supply
Combines the functionality of a surge suppressor with a battery backup
These also provide line conditioning so they protect against brownouts, sags, and surges.
What is line conditioning?
Protects your machines against short-duration increases and decreases in voltage and keeps them running smoothly
*** This is helpful against things like brownouts, sags and surges.
What is a “backup generator”?
Part of an emergency power system.
Used when there’s an outage of your regular power supply from the electric grid.
What are the different types of generators?
- Portable gas-engine generators
- Permanently installed generators
- Battery-inverter generators
How does a portable gas-engine generator work?
The least expensive type. They run on gasoline (and sometimes solar power).
*** They are noisy, high maintenance, and have to be started manually. For this reason, they’re not typically used for large server rooms.
How does a permanently installed generator work?
Generally, these run on natural gas, propane or diesel fuel.
*** These are generally connected directly to your organization’s electrical panel. They are expensive, quiet and complex to install.
How does a battery inverter generator work?
These are based on lead-acid batteries.
*** They are super quiet, require little interaction, and need their batteries changed out every few years. However, they can’t sustain power for long periods of time, so these generators are often coupled with a diesel generator.
What does RAID stand for?
Redundant Array of Independent Disks
What does RAID do?
Essentially allows you to combine multiple physical hard disks into a single logical hard disk drive inside of the OS
What are the different types of RAID?
RAID 0
RAID 1
RAID 5
RAID 6
RAID 10
How does RAID 0 work?
Provides data STRIPING across multiple disks and is used to increase your performance
*** At least two disks working in tandem are required for RAID 0 to work. It provides no redundancy: if any one disk fails, the data on the whole array is lost.
How does RAID 1 work?
This is going to provide redundancy by mirroring the data identically to two hard drives. So if one fails, the other one continues to operate.
*** This typically uses two physical hard disks that present one single logical hard disk inside the OS.
How does RAID 5 work?
This is known as striping with parity.
It requires at least three physical disk drives to work, and it provides fault tolerance by striping the data, along with parity information, across multiple disks.
*** This means if one of those drives fails, I can pull it out, put in a new drive, and the array will rebuild the missing data from the remaining drives.
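Parity is usually explained with XOR: XOR the data blocks in a stripe to get the parity block, and XOR the surviving blocks to recreate a lost one. This is a minimal sketch of that idea for a single stripe on three disks; real RAID 5 controllers also rotate which disk holds the parity from stripe to stripe, which this sketch skips, and xor_blocks is a hypothetical helper name.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-disk array: two data blocks plus one parity block.
data = [b"HELLO", b"WORLD"]      # disk 0 and disk 1
parity = xor_blocks(data)        # disk 2

# Simulate losing disk 0, then rebuild its block from the surviving disks.
rebuilt = xor_blocks([data[1], parity])
print(rebuilt)  # b'HELLO'
```

Because XOR is its own inverse, any single missing block can be recomputed from the others, which is why RAID 5 survives exactly one failed drive.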
How does RAID 6 work?
This is a modified form of RAID 5.
Just like RAID 5, it uses data striping across multiple disks, but instead of one parity block per stripe it has two. This requires another disk in the array, so you’ll need at least four physical disks. This means you can lose up to two of these disks and the RAID will continue to function.
How does RAID 10 work?
Combines the advantages of RAID 1 and RAID 0
This requires at least four physical disks, just like RAID 6. It stripes data across mirrored pairs of drives, so it keeps running even if one disk in each mirrored pair fails.
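A quick way to compare these levels is usable capacity. This sketch assumes identical drives and ignores the small amount of space arrays reserve for metadata; the function name and the minimum-disk table are illustrative, built from the disk requirements listed in these cards.

```python
# Minimum disk counts per RAID level, as described in the cards above.
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_capacity(level: str, disks: int, disk_tb: float) -> float:
    """Usable space in TB for `disks` identical drives of `disk_tb` TB each."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level {level!r}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "0":
        return disks * disk_tb          # striping only: all space, no redundancy
    if level == "1":
        return disk_tb                  # every disk holds the same copy
    if level == "5":
        return (disks - 1) * disk_tb    # one disk's worth of space goes to parity
    if level == "6":
        return (disks - 2) * disk_tb    # two disks' worth of space goes to parity
    if disks % 2:                       # RAID 10: mirrored pairs, then striped
        raise ValueError("RAID 10 needs an even number of disks")
    return disks // 2 * disk_tb         # half the disks hold mirror copies


for level in ("0", "1", "5", "6", "10"):
    disks = 2 if level in ("0", "1") else 4
    print(f"RAID {level} with {disks} x 2 TB disks: {usable_capacity(level, disks, 2.0)} TB")
```

The trade-off shows up directly: every level that adds redundancy gives up some raw capacity to hold mirror copies or parity.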
RAIDS can be categorized as…?
failure resistant
fault tolerant
disaster tolerant
Which RAIDs are failure resistant?
RAID 1 and RAID 5
This is because they protect against any loss of data if a single disk in the array fails.
Which RAIDs are fault tolerant?
RAID 1, RAID 5 and RAID 6
This is because even if a single component fails, the RAID can continue to function properly.
Which RAIDs are disaster tolerant?
RAID 1 and RAID 10
This is because they have two independent zones with full access to the data at all times.
What is network redundancy?
This is focused on ensuring our network remains up and running at all times to increase its availability.
How do we ensure network redundancy?
- Servers often have two or more network interface cards (NICs).
- Switches or routers are connected by multiple, redundant network cables running between them.
To create redundancy for our servers, we’re going to use a concept known as…?
Clustering
What is “clustering”?
When you take two or more servers and have them work together to perform a particular job function
We can cluster our servers as either…?
Failover Clusters
Load-Balancing Clusters
What are “failover clusters?”
These are designed so that a secondary server can take over, with limited or no downtime, if the primary one fails.
They are concerned with high availability
What is an example of failover clusters?
Domain controllers
*** You have DC1 and DC2. DC1 acts as your primary server, and if it goes down, DC2 takes over.
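The failover idea can be sketched as "use the first healthy server, in priority order." The Server class and its healthy flag are hypothetical; real failover clusters detect failures with heartbeats or health checks rather than a flag someone sets by hand.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool  # hypothetical health flag; real clusters use heartbeats


def active_server(cluster: list[Server]) -> Server:
    """Return the first healthy server, checking the primary first."""
    for server in cluster:
        if server.healthy:
            return server
    raise RuntimeError("no healthy server left in the cluster")


dc1 = Server("DC1", healthy=True)   # primary
dc2 = Server("DC2", healthy=True)   # secondary
cluster = [dc1, dc2]

print(active_server(cluster).name)  # DC1
dc1.healthy = False                 # the primary goes down
print(active_server(cluster).name)  # DC2
```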
What is “load-balancing clustering”?
This is focused on servers sharing their resources such as their processing power, their memory and hard disks.
What is an example of load-balancing clustering?
A very busy website can’t have a single server handling all of the load, because one server cannot handle all of the requests on its own. Instead, it uses a large cluster of load-balancing servers to share the requests.
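One simple way a load-balancing cluster can spread requests is round-robin, where each new request goes to the next server in the list. This is a sketch under that assumption; real load balancers often also weigh server health and current load, and the server names are made up.

```python
class RoundRobinBalancer:
    """Hands each new request to the next server in turn."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._next = 0

    def pick(self) -> str:
        server = self._servers[self._next]
        self._next = (self._next + 1) % len(self._servers)  # wrap around
        return server


balancer = RoundRobinBalancer(["web1", "web2", "web3"])  # hypothetical server names
print([balancer.pick() for _ in range(5)])  # ['web1', 'web2', 'web3', 'web1', 'web2']
```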
What are redundant sites?
Literally another place to go in case the one you’re in experiences some kind of disaster
Redundant sites are classified as one of three categories:
Hot Sites
Warm Sites
Cold Sites
What is a “hot site”?
A near duplicate to your original location
It will have servers, phones, desks, lights, power, connectivity, everything. It’s just as if you’d picked up and moved to a different building.
What is a “warm site”?
Will have most of your prior capabilities but not all of them
It will have things like computers, phones and servers, but they may not be configured, patched or updated.
What is a “cold site”?
This is going to have things like tables, chairs, bathrooms and some technical setup like phones and data/electric lines.
It will not have computers or servers. None of it is configured.
What are the three different types of data backups?
Full
Incremental
Differential
What happens when you do a “full” data backup?
All of the contents of your drive are backed up. Every single file.
What happens when you do an “incremental” backup?
This is going to back up only the contents of the drive that have changed since the last full backup or the last incremental backup, whichever was more recent.
(Here, “drive” means whatever you choose to back up: an entire hard disk, or just a file/folder on that disk.)
What happens when you do a “differential” backup?
This will only back up the contents of the drive that have changed since the last full backup
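The difference between the three backup types comes down to which reference point decides what counts as "changed." This sketch assumes each file’s last-modified time is known; real backup tools may track changes differently (for example, with the archive bit on Windows), and files_to_back_up is a hypothetical function.

```python
from datetime import datetime
from typing import Optional


def files_to_back_up(
    modified: dict[str, datetime],
    kind: str,
    last_full: Optional[datetime],
    last_backup: Optional[datetime],
) -> list[str]:
    """Return the files a full, incremental, or differential backup would copy.

    modified    -- hypothetical map of file name -> last-modified time
    last_full   -- when the last full backup ran (None if there never was one)
    last_backup -- when the last backup of any kind ran
    """
    if kind not in ("full", "incremental", "differential"):
        raise ValueError(f"unknown backup kind {kind!r}")
    if kind == "full" or last_full is None:
        return sorted(modified)  # no full backup yet means everything is copied
    # Incremental: changed since the most recent backup of any kind.
    # Differential: changed since the last full backup.
    since = (last_backup or last_full) if kind == "incremental" else last_full
    return sorted(name for name, mtime in modified.items() if mtime > since)


mtimes = {
    "a.txt": datetime(2024, 1, 1),
    "b.txt": datetime(2024, 1, 3),
    "c.txt": datetime(2024, 1, 5),
}
full_run = datetime(2024, 1, 2)         # full backup on Jan 2
incremental_run = datetime(2024, 1, 4)  # incremental backup on Jan 4
print(files_to_back_up(mtimes, "incremental", full_run, incremental_run))   # ['c.txt']
print(files_to_back_up(mtimes, "differential", full_run, incremental_run))  # ['b.txt', 'c.txt']
```

That is also why restoring from differentials needs only the last full backup plus the latest differential, while incrementals need the full backup plus every incremental since.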
What is the most commonly used backup media?
Backup tape
There are three main rotation schemes when it comes to backup tapes. What are they?
10 tape rotation
Grandfather-father-son rotation
Tower of Hanoi
What is a “10 tape rotation”?
This method provides easy access to the data that has been backed up. It covers a two-week backup period.
*** Each tape is used once, one per business day, over two weeks, and then the entire set is reused. This means you never have a backup that is more than two weeks old, though.
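Assuming backups run only on the five weekdays, the 10 tape rotation is just the business-day count modulo 10. The function name and the zero-based day counter are illustrative assumptions.

```python
def tape_for_business_day(day: int, tapes: int = 10) -> int:
    """Return the tape number (1..tapes) for a business day counted from 0.

    Assumes one backup per weekday, so ten tapes cover two working weeks
    before tape 1 is overwritten again.
    """
    if day < 0:
        raise ValueError("day must be non-negative")
    return day % tapes + 1


print(tape_for_business_day(9), tape_for_business_day(10))  # 10 1
```

Day 10 lands back on tape 1, which is exactly the point where the oldest backup is overwritten.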
What is the “grandfather-father-son rotation”?
This design uses three sets of backup tapes:
- The daily (the son)
- The weekly (the father)
- The monthly (grandfather)
The daily tapes are rotated each day, and the last one of the week graduates to father status. The weekly tapes are then rotated on a weekly basis, and after four weeks the latest one becomes the grandfather. The monthly tapes are kept offsite.
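Here is one way to express the promotion rule, assuming five backups a week and four weeks per monthly cycle (both are parameters you can change). gfs_role is a hypothetical helper, not part of any backup product.

```python
def gfs_role(business_day: int, days_per_week: int = 5, weeks_per_cycle: int = 4) -> str:
    """Classify the backup taken on a business day (counting from 0).

    Assumes the last backup of each week is kept as the weekly (father) tape
    and the last weekly tape of every `weeks_per_cycle` weeks is kept as the
    monthly (grandfather) tape; every other backup is a daily (son) tape.
    """
    if business_day < 0:
        raise ValueError("business_day must be non-negative")
    day_in_week = business_day % days_per_week
    week_number = business_day // days_per_week
    if day_in_week != days_per_week - 1:
        return "son"
    if week_number % weeks_per_cycle == weeks_per_cycle - 1:
        return "grandfather"
    return "father"


roles = [gfs_role(day) for day in range(20)]  # four five-day weeks
print(roles.count("son"), roles.count("father"), roles.count("grandfather"))  # 16 3 1
```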
What is the “Tower of Hanoi” rotation?
This is a system based on the puzzle with the same name.
This scheme also uses three sets of backups, but they’re rotated differently.
- 1st tape - used every second day
- 2nd tape - used every fourth day
- 3rd tape - used every eighth day
*** This system becomes complicated to use especially when used in conjunction with the different categories of backups. For example, if you used full, differential and incremental then how do those work out with this rotation?
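This sketch encodes the three intervals exactly as listed above (tape 1 every second day, tape 2 every fourth, tape 3 every eighth), counting days from 1. Working it through shows that days that are multiples of eight don’t land on any of the three tapes, so a working schedule needs an extra rule for those days, such as an additional tape set. hanoi_tape is a hypothetical name.

```python
from typing import Optional


def hanoi_tape(day: int) -> Optional[int]:
    """Return which of the three tapes (1, 2, 3) is used on `day` (counting from 1).

    Tape k is used every 2**k days, starting on day 2**(k-1):
      tape 1 on days 1, 3, 5, ...   (every second day)
      tape 2 on days 2, 6, 10, ...  (every fourth day)
      tape 3 on days 4, 12, 20, ... (every eighth day)
    """
    if day < 1:
        raise ValueError("days are counted from 1")
    for tape in (1, 2, 3):
        period = 2 ** tape
        if day % period == period // 2:
            return tape
    return None  # multiples of 8: none of the three intervals lands here


print([hanoi_tape(d) for d in range(1, 9)])  # [1, 2, 1, 3, 1, 2, 1, None]
```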
What is a “snapshot”?
With this, all of the applications, the hard drives, and even the OS are backed up to create a full backup of the system as a virtual disk image.
*** This makes it easy to redeploy that system onto a cloud server or another offsite location but it takes up a lot of storage.
What is Disaster Recovery Planning?
This is the development of an organized, in-depth plan for problems that could affect access to your data or your organization’s building.
What is Business Impact Analysis?
Also known as BIA
This is a systematic activity that identifies organizational risks and determines their effect on ongoing, mission-critical operations.
When conducting BIA, this is governed by metrics that express your system in terms of availability. What are the different metrics used?
MTD
RTO
WRT
RPO
What is MTD?
Maximum Tolerable Downtime
This is the longest period of time a business can be inoperable without causing irrevocable business failure
*** Think, how long can you be down before you’re out of business?
What is RTO?
Recovery Time Objective
This is the length of time it takes after an event to resume your normal business operations and activities
*** If something goes down, how quickly do you need it back?
What is WRT?
Work Recovery Time
This is the length of time in addition to the RTO of individual systems to perform re-integration and testing of a restored or upgraded system, following an event.
What is RPO?
Recovery Point Objective
This is the longest period of time that an organization can tolerate lost data being unrecoverable
*** How much recent data, measured in time, can you afford to lose?
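These metrics are often checked against each other: the time to restore a system (RTO) plus the time to re-integrate and test it (WRT) should fit within the MTD, and backups have to run at least as often as the RPO requires. The RecoveryPlan class and its backup_interval field are hypothetical, a sketch of that check rather than any standard tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryPlan:
    mtd: timedelta              # maximum tolerable downtime
    rto: timedelta              # time to get the system running again
    wrt: timedelta              # extra time to re-integrate and test it
    rpo: timedelta              # longest window of data the business can lose
    backup_interval: timedelta  # hypothetical: how often backups actually run

    def problems(self) -> list[str]:
        """List the ways this plan fails to meet its own targets."""
        issues = []
        if self.rto + self.wrt > self.mtd:
            issues.append("RTO + WRT exceeds MTD: recovery outlasts what the business can survive")
        if self.backup_interval > self.rpo:
            issues.append("backup interval exceeds RPO: more data could be lost than tolerated")
        return issues


plan = RecoveryPlan(
    mtd=timedelta(hours=24),
    rto=timedelta(hours=8),
    wrt=timedelta(hours=4),
    rpo=timedelta(hours=4),
    backup_interval=timedelta(hours=24),  # nightly backups
)
print(plan.problems())
```

Here recovery fits inside the MTD (8 + 4 hours is under 24), but nightly backups can lose up to a day of data, which breaks a 4-hour RPO.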
What does “mean time to repair” mean?
MTTR
The time on average that it takes to go from system failure to resuming operations
What does “mean time between failures” mean?
MTBF
The average time a system runs between one failure and the next. For example, if you start running your servers today and they run for one year before failing, that year is one time to failure. Repair the system, keep tracking, and average the running time between each pair of failures: that average is the mean time between failures.
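Both metrics are simple averages. This sketch assumes you’ve recorded how long each repair took and how long the system ran before each failure; the numbers are made up for illustration.

```python
def average_hours(durations: list[float]) -> float:
    """Average of a list of durations in hours."""
    if not durations:
        raise ValueError("need at least one recorded duration")
    return sum(durations) / len(durations)


def mttr(repair_hours: list[float]) -> float:
    """Mean time to repair: average time from a failure to resumed operation."""
    return average_hours(repair_hours)


def mtbf(uptime_hours: list[float]) -> float:
    """Mean time between failures: average running time between one failure and the next."""
    return average_hours(uptime_hours)


uptimes = [8760.0, 4380.0, 6570.0]  # hypothetical: hours run before each failure
repairs = [4.0, 2.0, 6.0]           # hypothetical: hours spent on each repair
print(f"MTBF: {mtbf(uptimes)} hours")  # MTBF: 6570.0 hours
print(f"MTTR: {mttr(repairs)} hours")  # MTTR: 4.0 hours
```

A high MTBF and a low MTTR together mean a system that rarely fails and comes back quickly when it does.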