Lesson 14: Explaining Risk Management and Disaster Recovery Concepts Flashcards
vulnerable business process
if a company operates with one or more vulnerable business processes, it could result in disclosure, modification, loss, destruction, or interruption of critical data or it could lead to loss of service to customers
risk management
process for identifying, assessing, and mitigating vulnerabilities and threats to the essential functions that a business must perform to serve its customers
phases of risk management
- Identify mission essential functions: mitigating risk can involve a large amount of expenditure, so it is important to focus efforts. Part of risk management is to analyze workflows and identify the mission essential functions that could cause the whole business to fail if they are not performed. Part of this process also involves identifying critical systems and assets that support these functions.
- Identify vulnerabilities: for each function or workflow (starting with the most critical), analyze systems and assets to discover and list any vulnerabilities or weaknesses to which they may be susceptible. Vulnerability refers to a specific flaw or weakness that could be exploited to overcome a security system.
- Identify threats: for each function or workflow, identify the threats that may exploit or accidentally trigger vulnerabilities. Threat refers to the sources or motivations of people and things that could cause loss or damage.
- Analyze business impacts: the likelihood of a vulnerability being activated as a security incident by a threat, and the impact of that incident on critical systems, are the factors used to evaluate risks. There are quantitative and qualitative methods of analyzing impacts.
- Identify risk response: for each risk, identify possible countermeasures and assess the cost of deploying additional security controls. Most risks require some sort of mitigation, but other types of response might be more appropriate for certain types and levels of risk.
mission essential function (MEF)
- mission essential function (MEF) is one that cannot be deferred
- means that the organization must be able to perform the function as close to continually as possible, and if there is any service disruption, the mission essential functions must be restored first
analysis of mission essential functions is generally governed by four main metrics
- Maximum tolerable downtime (MTD) is the longest period that a business function outage may last without causing irrecoverable business failure. Each business process can have its own MTD, such as a range of minutes to hours for critical functions, 24 hours for urgent functions, 7 days for normal functions, and so on. MTDs vary by company and event. Each function may be supported by multiple systems and assets. The MTD sets the upper limit on the amount of recovery time that system and asset owners have to resume operations.
- Recovery time objective (RTO) is the period following a disaster that an individual IT system may remain offline. This represents the amount of time it takes to identify that there is a problem and then perform recovery (restore from backup or switch in an alternative system, for instance).
- Work recovery time (WRT) is the additional time needed after systems recovery to reintegrate different systems, test overall functionality, and brief system users on any changes or different working practices so that the business function is again fully supported.
- Recovery point objective (RPO) is the amount of data loss that a system can sustain, measured in time. For example, if a database is destroyed by a virus, an RPO of 24 hours means that the data can be recovered (from a backup copy) to a point not more than 24 hours before the database was infected.
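The relationship between these metrics can be checked with simple arithmetic. The sketch below assumes that system recovery (RTO) plus work recovery (WRT) must fit within the MTD, which follows from the MTD being the upper limit on recovery time. The function name and the sample figures are illustrative only.

```python
def recovery_fits_mtd(rto_hours: float, wrt_hours: float, mtd_hours: float) -> bool:
    """Return True if system recovery plus work recovery fits within the MTD."""
    return rto_hours + wrt_hours <= mtd_hours


# Hypothetical business function with a 24-hour MTD.
print(recovery_fits_mtd(rto_hours=8, wrt_hours=4, mtd_hours=24))   # True
print(recovery_fits_mtd(rto_hours=20, wrt_hours=6, mtd_hours=24))  # False
```

If the check fails, either the recovery procedures need to be faster or the function needs more redundancy.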
identification of critical systems
- means compiling an inventory of the organization’s business processes and its tangible and intangible assets and resources
- these include:
- People (employees, visitors, and suppliers).
- Tangible assets (buildings, furniture, equipment and machinery (plant), ICT equipment, electronic data files, and paper documents).
- Intangible assets (ideas, commercial reputation, brand, and so on).
- Procedures (supply chains, critical procedures, standard operating procedures).
business process analysis (BPA)
- for mission essential functions, it is important to reduce the number of dependencies between components
- dependencies are identified by performing a business process analysis (BPA) for each function
- BPA should identify the following factors:
- Inputs: the sources of information for performing the function (including the impact if these are delayed or out of sequence).
- Hardware: the particular server or data center that performs the processing.
- Staff and other resources supporting the function.
- Outputs: the data or resources produced by the function.
- Process flow: a step-by-step description of how the function is performed.
single points of failure (SPoF)
reducing dependencies makes it easier to provision redundant systems so that the function can fail over to a backup system smoothly. This means the system design can more easily eliminate the sort of weakness that comes from having single points of failure (SPoF) that can disrupt the function
key performance indicators (KPI)
- used to determine the reliability of each asset
- main KPIs relating to service availability are as follows:
• Mean Time to Failure (MTTF) and Mean Time Between Failures (MTBF) represent the expected lifetime of a product. MTTF should be used for non-repairable assets
• The calculation for MTBF is the total operating time divided by the number of failures. For example, if you have 10 devices that each run for 50 hours and two of them fail, the MTBF is (10 × 50)/2 = 250 hours/failure
• The calculation for MTTF for the same test is the total operating time divided by the number of devices, so (10 × 50)/10, with the result being 50 hours/failure
• MTTF/MTBF can be used to determine the amount of asset redundancy a system should have. A redundant system can fail over to another asset if there is a fault and continue to operate normally. These metrics can also be used to work out how likely failures are to occur
• Mean Time to Repair (MTTR) is a measure of the time taken to correct a fault so that the system is restored to full operation. This can also be described as mean time to “replace” or “recover.” This metric is important in determining the overall Recovery Time Objective (RTO)
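The MTBF and MTTF calculations described above can be reproduced in a few lines. This is a minimal sketch that follows the definitions given in this card (total time divided by failures, and total time divided by devices). The function names are hypothetical.

```python
def mtbf(total_hours: float, failures: int) -> float:
    """Mean Time Between Failures: total operating time / number of failures."""
    return total_hours / failures


def mttf(total_hours: float, devices: int) -> float:
    """Mean Time to Failure, as defined in this card: total operating time / number of devices."""
    return total_hours / devices


total_hours = 10 * 50  # 10 devices, each running for 50 hours
print(mtbf(total_hours, failures=2))   # 250.0
print(mttf(total_hours, devices=10))   # 50.0
```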
asset management
- process takes inventory of and tracks all the organization’s critical systems, components, devices, and other objects of value
- also involves collecting and analyzing information about these assets so that personnel can make more informed changes or otherwise work with assets to achieve business goals
- many software suites and associated hardware solutions available for tracking and managing assets (or inventory)
- asset management database can be configured to store as much or as little information as is deemed necessary, though typical data would be type, model, serial number, asset ID, location, user(s), value, and service information
asset management troubleshooting tactics
- Ensure that all relevant assets are participating in a tracking system like barcodes or passive radio frequency IDs (RFIDs).
- Ensure that there is a process in place for tagging newly acquired or developed assets.
- Ensure that there is a process in place for removing obsolete assets from the system.
- Check to see if any assets have conflicting IDs.
- Check to see if any assets have inaccurate metadata.
- Ensure that asset management software can correctly read and interpret tracking tags.
- Update asset management software to fix any bugs or security issues.
threat assessment
- means compiling a prioritized list of probable and possible threats
- consider (for instance) the impact on business processes of the following:
- Public infrastructure (transport, utilities, law and order).
- Supplier contracts (security of supply chain).
- Customers’ security (the sudden failure of important customers due to their own security vulnerabilities can be as damaging as an attack on your own organization).
- Epidemic disease.
- threat awareness must consider threats posed by events such as natural disasters, accidents, and legal liabilities:
- Natural disaster—threat sources such as river or sea floods, earthquakes, storms, and so on. Natural disasters may be quite predictable (as is the case with areas prone to flooding or storm damage) or unexpected, and therefore difficult to plan for.
- Manmade disaster—intentional man-made threats such as terrorism, war, or vandalism/arson or unintentional threats, such as user error or information disclosure through social media platforms.
- Environmental—those caused by some sort of failure in the surrounding environment. These could include power or telecoms failure, pollution, or accidental damage (including fire).
- Legal and commercial—some examples include:
- Downloading or distributing obscene material.
- Defamatory comments published on social networking sites.
- Hijacked mail or web servers used for spam or phishing attacks.
- Third-party liability for theft or damage of personal data.
- Accounting and regulatory liability to preserve accurate records.
supply chain
- series of companies involved in fulfilling a product
- assessing a supply chain involves determining whether each link in the chain is sufficiently robust
- each supplier in the chain may have their own suppliers, and assessing “robustness” means obtaining extremely privileged company information
degree of risk
- two main variables are likelihood and impact:
- Likelihood is the probability of the threat being realized.
- Impact is the severity of the risk if realized as a security incident. This may be determined by factors such as the value of the asset or the cost of disruption if the asset is compromised.
business impact analysis (BIA)
process of assessing what losses might occur for each threat scenario
impacts on property
risks whose impacts affect property (premises) mostly arise due to natural disaster, war/terrorism, and fire
impacts on finance and reputation
- important to realize that the value of an asset does not refer solely to its material value
- two principal additional considerations are direct costs associated with the asset being compromised (downtime) and consequent costs to intangible assets, such as the company’s reputation
impacts on privacy
- important source of risk is the unauthorized disclosure of personally identifiable information (PII)
- privacy risk assessment is modelled on formal audit documents mandated by US laws, notably the Privacy Act and the Federal Information Security Management Act (FISMA):
- Privacy Threshold Analysis (PTA)—An initial audit to determine whether a computer system or workflow collects, stores, or processes PII to a degree where a PIA must be performed. PTAs must be repeated every three years.
- Privacy Impact Assessment (PIA)—A detailed study to assess the risks associated with storing, processing, and disclosing PII. The study should identify vulnerabilities that may lead to data breach and evaluate controls mitigating those risks.
- System of Records Notice (SORN)—A formal document listing PII maintained by a federal agency of the US government.
methods of assessing likelihood and risk
- quantitative risk assessment aims to assign concrete values to each risk factor:
• Single Loss Expectancy (SLE)—The amount that would be lost in a single occurrence of the risk factor. This is determined by multiplying the value of the asset by an Exposure Factor (EF). EF is the percentage of the asset value that would be lost.
• Annual Loss Expectancy (ALE)—The amount that would be lost over the course of a year. This is determined by multiplying the SLE by the Annual Rate of Occurrence (ARO).
- qualitative risk assessment:
- avoids the complexity of the quantitative approach and is focused on identifying significant risk factors
- Security Categorizations (SC) are applied to information systems based on the impact that a breach of confidentiality, integrity, or availability would have on the organization as a whole. Potential impacts can be classified as:
- Low: minor damage or loss to an asset or loss of performance (though essential functions remain operational).
- Moderate: significant damage or loss to assets or performance.
- High: major damage or loss or the inability to perform one or more essential functions.
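The quantitative formulas in this card (SLE = asset value × EF, and ALE = SLE × ARO) can be sketched as below. The asset value, exposure factor, and rate of occurrence are made-up figures for illustration, and the function names are hypothetical.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE: asset value multiplied by the fraction of value lost in one incident."""
    return asset_value * exposure_factor


def annual_loss_expectancy(sle: float, aro: float) -> float:
    """ALE: SLE multiplied by the Annual Rate of Occurrence."""
    return sle * aro


# Hypothetical asset worth 100,000 that loses 25% of its value per incident,
# with one incident expected every two years (ARO = 0.5).
sle = single_loss_expectancy(asset_value=100_000, exposure_factor=0.25)
ale = annual_loss_expectancy(sle, aro=0.5)
print(sle)  # 25000.0
print(ale)  # 12500.0
```

Comparing the ALE with the yearly cost of a countermeasure is one way to judge whether that countermeasure is cost-effective.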
risk response strategies
- Avoidance means that you stop doing the activity that is risk-bearing
- Transference (or sharing) means assigning risk to a third party (such as an insurance company or a contract with a supplier that defines liabilities)
- Acceptance (or retention) means that no countermeasures are put in place either because the level of risk does not justify the cost or because there will be unavoidable delay before the countermeasures are deployed
risk register
- document showing the results of risk assessments in a comprehensible format
- register may resemble a “traffic light” grid with columns for impact and likelihood ratings, date of identification, description, countermeasures, owner/route for escalation, and status
- commonly depicted as scatterplot graphs, where impact and likelihood are each an axis, and the plot point is associated with a legend that includes more information about the nature of the plotted risk
- risk register should be shared between stakeholders (executives, department managers, and senior technicians) so that they understand the risks associated with the workflows that they manage
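A risk register can be modelled as a simple list of records. The sketch below uses the columns named in this card. The 1 to 3 rating scale, the likelihood × impact score used for sorting, and the sample entries are assumptions for illustration, not part of any standard register format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RiskEntry:
    description: str
    likelihood: int        # assumed scale: 1 (low) to 3 (high)
    impact: int            # assumed scale: 1 (low) to 3 (high)
    identified: date
    countermeasures: str
    owner: str
    status: str = "open"

    @property
    def score(self) -> int:
        """Illustrative priority score: likelihood multiplied by impact."""
        return self.likelihood * self.impact


# Hypothetical entries.
register = [
    RiskEntry("Flooding of server room", 1, 3, date(2024, 1, 15),
              "Raised racks", "Facilities manager"),
    RiskEntry("Phishing against finance staff", 3, 2, date(2024, 2, 1),
              "Awareness training", "Security lead"),
]

# List the highest-scoring risks first.
for entry in sorted(register, key=lambda e: e.score, reverse=True):
    print(entry.score, entry.description, entry.status)
```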
reactive
describes a need for change that is forced on the organization
proactive
describes a need for change that is initiated internally
Request for Change (RFC)
- in a formal change management process, the need for change and the procedure for implementing the change are captured in a Request for Change (RFC) document and submitted for approval
- RFC will then be considered at the appropriate level
- major or significant changes might be managed as a separate project and require approval through a Change Advisory Board (CAB)
risk management processes
- Identify mission-essential functions and the critical systems within each function.
- Identify those assets supporting business functions and critical systems, and determine their values.
- Calculate MTD, RPO, RTO, MTTF, MTTR, and MTBF for functions and assets.
- Look for possible vulnerabilities that, if exploited, could adversely affect each function or system.
- Determine potential threats to functions and systems.
- Determine the probability or likelihood of a threat exploiting a vulnerability.
- Determine the impact of the potential threat, whether it be recovery from a failed system or the implementation of security controls that will reduce or eliminate risk.
- Identify impact scenarios that put your business operations at risk.
- Identify the risk analysis method that is most appropriate for your organization. For quantitative and semi-quantitative risk analysis, calculate SLE and ARO for each threat, and then calculate the ALE.
- Identify potential countermeasures, ensuring that they are cost-effective and perform as expected. For example, identify single points of failure and, where possible, establish redundant or alternative systems and solutions.
- Clearly document all findings discovered and decisions made during the assessment in a risk register.
Continuity of Operations Planning (COOP) or business continuity plan (BCP)
- collection of processes that enable an organization to maintain normal business operations in the face of some adverse event
single points of failure
- when implementing a network, the goal will always be to minimize the single points of failure and to allow ongoing service provision despite a disaster
IT Contingency Planning (ITCP)
- to perform IT Contingency Planning (ITCP), think of all the things that could fail, determine whether the result would be a critical loss of service, and whether this is unacceptable
high availability
availability is the percentage of time that the system is online, measured over the defined period (typically one year)
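Availability as a percentage can be worked out from downtime, and the reverse. This is a minimal sketch that assumes a 365-day year as the measurement period. The function names and figures are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability_percent(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Percentage of the period during which the system was online."""
    return 100 * (period_hours - downtime_hours) / period_hours


def max_downtime_hours(availability_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime allowed in the period for a given availability target."""
    return period_hours * (1 - availability_pct / 100)


print(round(availability_percent(8.76), 2))  # 99.9
print(round(max_downtime_hours(99.0), 1))    # 87.6
```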