Lesson 14: Explaining Risk Management and Disaster Recovery Concepts Flashcards
vulnerable business processes
If a company operates with one or more vulnerable business processes, it could result in disclosure, modification, loss, destruction, or interruption of critical data, or it could lead to loss of service to customers. Quite apart from immediate financial losses arising from such security incidents, either outcome will damage a company’s reputation. If a bank lost its trading floor link to its partners, even for an hour, huge losses could result, because the organization’s primary function (trading) would be impossible. Consequently, when planning a network or other IT system, you must consider the impact of data loss and service unavailability on the organization.
Risk management
Process for identifying, assessing, and mitigating vulnerabilities and threats to the essential functions that a business must perform to serve its customers.
Risk management is performed over five phases:
- Identify mission essential functions—mitigating risk can involve a large amount of expenditure, so it is important to focus efforts. Part of risk management is to analyze workflows and identify the mission essential functions that could cause the whole business to fail if they are not performed. Part of this process also involves identifying critical systems and assets that support these functions.
- Identify vulnerabilities—for each function or workflow (starting with the most critical), analyze systems and assets to discover and list any vulnerabilities or weaknesses to which they may be susceptible. Vulnerability refers to a specific flaw or weakness that could be exploited to overcome a security system.
- Identify threats—for each function or workflow, identify the threats that may take advantage of or exploit or accidentally trigger vulnerabilities. Threat refers to the sources or motivations of people and things that could cause loss or damage.
- Analyze business impacts—the likelihood of a vulnerability being activated as a security incident by a threat and the impact of that incident on critical systems give factors for evaluating risks. There are quantitative and qualitative methods of analyzing impacts.
- Identify risk response—for each risk, identify possible countermeasures and assess the cost of deploying additional security controls. Most risks require some sort of mitigation, but other types of response might be more appropriate for certain types and level of risks.
mission essential function (MEF)
A function that cannot be deferred. This means that the organization must be able to perform the function as close to continually as possible, and if there is any service disruption, the mission essential functions must be restored first.
Analysis of mission essential functions is generally governed by four main metrics:
- Maximum tolerable downtime (MTD) is the longest period that a business function outage may last without causing irrecoverable business failure. Each business process can have its own MTD, such as a range of minutes to hours for critical functions, 24 hours for urgent functions, 7 days for normal functions, and so on. MTDs vary by company and event. Each function may be supported by multiple systems and assets. The MTD sets the upper limit on the amount of recovery time that system and asset owners have to resume operations. For example, an organization specializing in medical equipment may be able to exist without incoming manufacturing supplies for three months because it has stockpiled a sizeable inventory. After three months, the organization will not have sufficient supplies and may not be able to manufacture additional products, therefore leading to failure. In this case, the MTD is three months.
- Recovery time objective (RTO) is the period following a disaster that an individual IT system may remain offline. This represents the amount of time it takes to identify that there is a problem and then perform recovery (restore from backup or switch in an alternative system, for instance).
- Work Recovery Time (WRT). Following systems recovery, there may be additional work to reintegrate different systems, test overall functionality, and brief system users on any changes or different working practices so that the business function is again fully supported.
Note: RTO+WRT must not exceed MTD!
- Recovery Point Objective (RPO) is the amount of data loss that a system can sustain, measured in time. That is, if a database is destroyed by a virus, an RPO of 24 hours means that the data can be recovered (from a backup copy) to a point not more than 24 hours before the database was infected.
For example, a customer leads database might be able to sustain the loss of a few hours’ or days’ worth of data (the salespeople will generally be able to remember who they have contacted and re-key the data manually). Conversely, order processing may be considered more critical, as any loss will represent lost orders and it may be impossible to recapture web orders or other processes initiated only through the computer system, such as linked records to accounting and fulfilment.
MTD and RPO help to determine which business functions are critical and also to specify appropriate risk countermeasures. For example, if your RPO is measured in days, then a simple tape backup system should suffice; if RPO is zero or measured in minutes or seconds, a more expensive server cluster backup and redundancy solution will be required.
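The relationship between these metrics can be sketched in a few lines of Python. This is only an illustration: the `RecoveryTargets` class, the function names, the sample figures, and the 24-hour RPO threshold are all assumptions made for the example, not values taken from any standard.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTargets:
    """Recovery metrics for one business function, all expressed in hours."""
    mtd: float  # maximum tolerable downtime
    rto: float  # recovery time objective
    wrt: float  # work recovery time
    rpo: float  # recovery point objective (tolerable data loss, as time)


def within_mtd(t: RecoveryTargets) -> bool:
    """Return True when RTO + WRT does not exceed MTD."""
    return t.rto + t.wrt <= t.mtd


def suggest_countermeasure(t: RecoveryTargets) -> str:
    """Map RPO to a class of countermeasure.

    The 24-hour cut-off is a hypothetical threshold chosen for illustration.
    """
    if t.rpo >= 24:
        return "periodic backup"
    return "clustering and redundancy"


# Hypothetical figures for two functions.
order_processing = RecoveryTargets(mtd=4, rto=2, wrt=1, rpo=0)
customer_leads = RecoveryTargets(mtd=168, rto=24, wrt=8, rpo=48)

for name, targets in [("orders", order_processing), ("leads", customer_leads)]:
    print(name, within_mtd(targets), suggest_countermeasure(targets))
```

A function whose RTO and WRT together exceed its MTD fails the first check, which signals that either the recovery plan or the MTD estimate needs revisiting.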
For most businesses, the most critical functions will be those that enable customers to find them and for the business to interact with those customers. In practical terms, this means telecoms and web presence. Following that is probably the capability to fulfil products and services. Back-office functions such as accounting, HR, and marketing are probably necessary rather than critical.
identification of critical systems
To support the resiliency of mission essential and primary business functions, it is crucial for an organization to perform the identification of critical systems. This means compiling an inventory of its business processes and its tangible and intangible assets and resources. These could include:
- People (employees, visitors, and suppliers).
- Tangible assets (buildings, furniture, equipment and machinery (plant), ICT equipment, electronic data files, and paper documents).
- Intangible assets (ideas, commercial reputation, brand, and so on).
- Procedures (supply chains, critical procedures, standard operating procedures).
It is important to be up to date with best practice and standards relevant to the type of business or organization. This can help to identify procedures or standards that are not currently being implemented but should be. Make sure that the asset identification process captures system architecture as well as individual assets (that is, understand and document the way assets are deployed, utilized, and how they work together).
business process analysis (BPA)
For mission essential functions, it is important to reduce the number of dependencies between components. Dependencies are identified by performing a business process analysis (BPA) for each function.
The BPA should identify the following factors:
- Inputs—the sources of information for performing the function (including the impact if these are delayed or out of sequence).
- Hardware—the particular server or data center that performs the processing.
- Staff and other resources supporting the function.
- Outputs—the data or resources produced by the function.
- Process flow—a step-by-step description of how the function is performed.
Reducing dependencies makes it easier to provision redundant systems to allow the function to failover to a backup system smoothly. This means the system design can more easily eliminate the sort of weakness that comes from having single points of failure (SPoF) that can disrupt the function.
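One way to picture the search for single points of failure is to record, for each component a function depends on, how many interchangeable instances exist. The sketch below assumes that simplified model; the component names and counts are hypothetical, and a real BPA would capture far more detail.

```python
def single_points_of_failure(dependencies: dict[str, int]) -> list[str]:
    """Return components that have fewer than two interchangeable instances.

    `dependencies` maps a component name to the number of instances
    available to the function (a simplifying assumption for this sketch).
    """
    return sorted(name for name, count in dependencies.items() if count < 2)


# Hypothetical dependencies for an order-processing function.
order_processing = {
    "web server": 2,
    "database server": 1,
    "payment gateway link": 1,
    "fulfilment staff": 5,
}

print(single_points_of_failure(order_processing))
```

Any component the function returns is a candidate for provisioning a redundant system or for removing the dependency altogether.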
Key performance indicators (KPI)
Each IT system will be supported by assets, such as servers, disk arrays, switches, routers, and so on. Key performance indicators (KPI) can be used to determine the reliability of each asset.
Some of the main KPIs relating to service availability are as follows:
- Mean Time to Failure (MTTF) and Mean Time Between Failures (MTBF) represent the expected lifetime of a product. MTTF should be used for non-repairable assets. For example, a hard drive may be described with an MTTF, while a server (which could be repaired by replacing the hard drive) would be described with an MTBF. You will often see MTBF used indiscriminately, however. For most devices, failure is more likely early and late in life, producing the so-called “bathtub curve.”
- The calculation for MTBF is the total time divided by the number of failures. For example, if you have 10 devices that run for 50 hours and two of them fail, the MTBF is 250 hours/failure (10*50)/2.
- The calculation for MTTF for the same test is the total time divided by the number of devices, so (10*50)/10, with the result being 50 hours/failure.
MTTF/MTBF can be used to determine the amount of asset redundancy a system should have. A redundant system can failover to another asset if there is a fault and continue to operate normally. It can also be used to work out how likely failures are to occur.
- Mean Time to Repair (MTTR) is a measure of the time taken to correct a fault so that the system is restored to full operation. This can also be described as mean time to “replace” or “recover.” This metric is important in determining the overall Recovery Time Objective (RTO).
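The MTBF and MTTF calculations described above can be reproduced with a short Python sketch. The function names are illustrative; the formulas and the sample figures (10 devices, 50 hours, two failures) are the ones given in the text.

```python
def mtbf(devices: int, hours_each: float, failures: int) -> float:
    """Mean time between failures: total operating time / number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return (devices * hours_each) / failures


def mttf(devices: int, hours_each: float) -> float:
    """Mean time to failure as defined above: total time / number of devices."""
    if devices <= 0:
        raise ValueError("at least one device is required")
    return (devices * hours_each) / devices


print(mtbf(10, 50, 2))  # 250.0 hours/failure
print(mttf(10, 50))     # 50.0 hours/failure
```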
asset management process
An asset management process takes inventory of and tracks all the organization’s critical systems, components, devices, and other objects of value. It also involves collecting and analyzing information about these assets so that personnel can make more informed changes or otherwise work with assets to achieve business goals. There are many software suites and associated hardware solutions available for tracking and managing assets (or inventory). An asset management database can be configured to store as much or as little information as is deemed necessary, though typical data would be type, model, serial number, asset ID, location, user(s), value, and service information. Tangible assets can be identified using a barcode label or Radio Frequency ID (RFID) tag attached to the device (or more simply, using an identification number). An RFID tag is a chip programmed with asset data. When in range of a scanner, the chip activates and signals the scanner. The scanner alerts management software to update the device’s location. As well as supporting inventory control, this allows the management software to track the location of the device, making theft more difficult.
Within the inventory of assets and business processes, it is important to assess their relative importance. In the event of a disaster that requires that recovery processes take place over an extended period, critical systems must be prioritized over merely necessary ones.
It is also important to realize that asset management procedures can easily go astray—assets get mislabeled, new assets are not recorded, and so on. In these cases, some troubleshooting tactics can include:
- Ensure that all relevant assets are participating in a tracking system like barcodes or passive radio frequency IDs (RFIDs).
- Ensure that there is a process in place for tagging newly acquired or developed assets.
- Ensure that there is a process in place for removing obsolete assets from the system.
- Check to see if any assets have conflicting IDs.
- Check to see if any assets have inaccurate metadata.
- Ensure that asset management software can correctly read and interpret tracking tags.
- Update asset management software to fix any bugs or security issues.
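Two of the checks above (conflicting IDs and inaccurate or missing metadata) lend themselves to automation. The following is a minimal sketch, assuming a simplified in-memory inventory; the `Asset` fields are a subset of the typical data listed earlier, and the records are made up for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Asset:
    """A simplified inventory record (hypothetical schema)."""
    asset_id: str
    asset_type: str
    model: str
    serial_number: str
    location: str


def conflicting_ids(assets: list[Asset]) -> list[str]:
    """Return asset IDs that appear on more than one record."""
    counts = Counter(a.asset_id for a in assets)
    return sorted(asset_id for asset_id, n in counts.items() if n > 1)


def incomplete_records(assets: list[Asset]) -> list[str]:
    """Return serial numbers of records with an empty model or location."""
    return [a.serial_number for a in assets if not (a.model and a.location)]


inventory = [
    Asset("A-001", "server", "R640", "SN1001", "rack 3"),
    Asset("A-002", "switch", "", "SN1002", "rack 3"),
    Asset("A-001", "laptop", "T14", "SN1003", ""),
]

print(conflicting_ids(inventory))
print(incomplete_records(inventory))
```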
Threat assessment
means compiling a prioritized list of probable and possible threats. Some of these can be derived from the list of assets (that is, threats that are specific to your organization); others may be non-specific to your particular organization.
It is important to note that threats can arise from something that the organization is not doing or an asset that it does not own as much as from things that it is doing or assets that it does own. Consider (for instance) the impact on business processes of the following:
- Public infrastructure (transport, utilities, law and order).
- Supplier contracts (security of supply chain).
- Customers’ security (the sudden failure of important customers due to their own security vulnerabilities can be as damaging as an attack on your own organization).
- Epidemic disease.
A large part of threat assessment will identify human threat actors, both internal and external to the organization, so try to understand their motives to assess the level of risk that each type of threat actor poses. Threat actors discussed earlier—such as hackers, organized crime, nation state actors, and insider threat—can all be described as working with some sort of intent. Another threat source is the all-too-human propensity for carelessness and, consequently, accidental damage. Misuse of a system by a naïve user may not intend harm but can nonetheless cause widespread disruption. Misconfiguration of a system can create vulnerabilities that might be exploited by other threat agents. Threat actors also need not be human.
Threat awareness must consider threats posed by events such as natural disasters, accidents, and legal liabilities:
- Natural disaster—threat sources such as river or sea floods, earthquakes, storms, and so on. Natural disasters may be quite predictable (as is the case with areas prone to flooding or storm damage) or unexpected, and therefore difficult to plan for.
- Manmade disaster—intentional man-made threats such as terrorism, war, or vandalism/arson or unintentional threats, such as user error or information disclosure through social media platforms.
- Environmental—those caused by some sort of failure in the surrounding environment. These could include power or telecoms failure, pollution, or accidental damage (including fire).
- Legal and commercial—some examples include:
- Downloading or distributing obscene material.
- Defamatory comments published on social networking sites.
- Hijacked mail or web servers used for spam or phishing attacks.
- Third-party liability for theft or damage of personal data.
- Accounting and regulatory liability to preserve accurate records.
These cases are often complex, but even if there is no legal liability, the damage done to the organization’s reputation could be just as serious.
supply chain
Threat assessment should not be confined to analyzing your own business. You must also consider critical suppliers. A supply chain is a series of companies involved in fulfilling a product. Assessing a supply chain involves determining whether each link in the chain is sufficiently robust. Each supplier in the chain may have their own suppliers, and assessing “robustness” means obtaining extremely privileged company information. Consequently, assessing the whole chain is an extremely complex process and is an option only available to the largest companies. Most businesses will try to identify alternative sources for supplies so that the disruption to a primary supplier does not represent a single point of failure.
For each business process and each threat, you must assess the degree of risk that exists. Calculating risk is complex, but the two main variables are likelihood and impact:
- Likelihood is the probability of the threat being realized.
- Impact is the severity of the risk if realized as a security incident. This may be determined by factors such as the value of the asset or the cost of disruption if the asset is compromised.
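A common simplification is to rate likelihood and impact on a small ordinal scale and multiply the two to rank risks. The sketch below assumes a three-level scale and a simple product; both are illustrative choices rather than a prescribed method, and the sample risks are hypothetical.

```python
# Ordinal scale for both variables (an assumption for this sketch).
LEVELS = {"low": 1, "medium": 2, "high": 3}


def risk_score(likelihood: str, impact: str) -> int:
    """Combine likelihood and impact ratings into a single score."""
    return LEVELS[likelihood] * LEVELS[impact]


def rank_risks(risks: list[tuple[str, str, str]]) -> list[tuple[str, int]]:
    """Return (name, score) pairs ordered from highest to lowest score.

    Each input item is (name, likelihood, impact).
    """
    scored = [(name, risk_score(lik, imp)) for name, lik, imp in risks]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical risk register entries.
register = [
    ("telecoms failure", "medium", "high"),
    ("river flood", "low", "high"),
    ("user error", "high", "low"),
    ("supplier outage", "high", "high"),
]

for name, score in rank_risks(register):
    print(f"{score}: {name}")
```

The ranking only orders risks relative to one another; it says nothing about absolute cost.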
Business impact analysis (BIA)
Process of assessing what losses might occur for each threat scenario. For instance, if a roadway bridge crossing a local river is washed out by a flood and employees are unable to reach a business facility for five days, estimated costs to the organization need to be assessed for lost manpower and production. Impacts can be categorized in several ways.
impacts on life and safety
The most critical type of impact is one that could lead to loss of life or critical injury. The most obvious risks to life and safety come from natural disasters, man-made disasters, and accidents (such as fire). Sometimes industries have to consider life and safety impacts in terms of the security of their products, however. For example, a company makes wireless adapters, originally for use with laptops. The security of the firmware upgrade process is important, but it has no impact on life or safety. The company, however, earns a new contract to supply the adapters to provide connectivity for in-vehicle electronics systems. Unknown to the company, a weakness in the design of the in-vehicle system allows an adversary to use compromised wireless adapter firmware to affect the car’s control systems (braking, acceleration, and steering). The integrity of the upgrade process now has an impact on safety.
impacts on property
Again, risks whose impacts affect property (premises) mostly arise due to natural disaster, war/terrorism, and fire.
impacts on finance and reputation
It is important to realize that the value of an asset does not refer solely to its material value. The two principal additional considerations are direct costs associated with the asset being compromised (downtime) and consequent costs to intangible assets, such as the company’s reputation. For example, a server may have a material cost of a few hundred dollars. If the server were stolen, the costs incurred from not being able to do business until it can be recovered or replaced could run to thousands of dollars. In addition, that period of interruption, during which orders cannot be taken or go unfulfilled, leads customers to look at alternative suppliers, resulting in perhaps thousands more in lost sales and goodwill.
impacts on privacy
Another important source of risk is the unauthorized disclosure of personally identifiable information (PII). The theft or loss of PII can have an enormous impact on an individual because of the risk of identity theft and because once disclosed, the PII cannot easily be changed or recovered.
Organizations should perform regular audits to assess whether PII is processed securely. These may be modelled on formal audit documents mandated by US laws, notably The Privacy Act and the Federal Information Security Management Act (FISMA):
- Privacy Threshold Analysis (PTA)—An initial audit to determine whether a computer system or workflow collects, stores, or processes PII to a degree where a PIA must be performed. PTAs must be repeated every three years.
- Privacy Impact Assessment (PIA)—A detailed study to assess the risks associated with storing, processing, and disclosing PII. The study should identify vulnerabilities that may lead to data breach and evaluate controls mitigating those risks.
- System of Records Notice (SORN)—A formal document listing PII maintained by a federal agency of the US government.
There are two methods of assessing likelihood and risk:
quantitative and qualitative