Lesson 14: Explaining Risk Management and Disaster Recovery Concepts Flashcards

Question

Quantitative risk assessment aims to assign concrete values to each risk factor.

Answer 1

* Single Loss Expectancy (SLE)—The amount that would be lost in a single occurrence of the risk factor. This is determined by multiplying the value of the asset by an Exposure Factor (EF). EF is the percentage of the asset value that would be lost.
* Annual Loss Expectancy (ALE)—The amount that would be lost over the course of a year. This is determined by multiplying the SLE by the Annual Rate of Occurrence (ARO).

The problem with quantitative risk assessment is that the process of determining and assigning these values is complex and time-consuming. The accuracy of the values assigned is also difficult to determine without historical data (often, it has to be based on subjective guesswork). However, over time and with experience, this approach can yield a detailed and sophisticated description of assets and risks and provide a sound basis for justifying and prioritizing security expenditure.
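The SLE and ALE formulas reduce to simple arithmetic. The Python sketch below applies them to made-up figures; the function names, the asset value, the EF, and the ARO are illustrative assumptions, not values from the lesson.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = asset value x exposure factor, with EF given as a fraction (0.25 = 25%)."""
    if not 0.0 <= exposure_factor <= 1.0:
        raise ValueError("exposure_factor must be between 0 and 1")
    return asset_value * exposure_factor


def annual_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE x ARO (expected number of occurrences per year)."""
    return sle * aro


# Hypothetical figures: a $200,000 asset that loses 25% of its value per incident,
# with an incident expected once every two years (ARO = 0.5).
sle = single_loss_expectancy(200_000, 0.25)
ale = annual_loss_expectancy(sle, 0.5)
print(f"SLE = ${sle:,.0f}, ALE = ${ale:,.0f}")  # SLE = $50,000, ALE = $25,000
```

An ARO below 1 simply means the event is expected less often than once a year.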

Answer 2

Qualitative risk assessment avoids the complexity of the quantitative approach and is focused on identifying significant risk factors. The qualitative approach seeks out people's opinions of which risk factors are significant. Assets and risks may be placed in simple categories. For example, assets could be categorized as Irreplaceable, High Value, Medium Value, and Low Value; risks could be categorized as one-off or recurring and as Critical, High, Medium, and Low probability. Another simple approach is the "Traffic Light" impact grid. For each risk, a simple Red, Yellow, or Green indicator can be put into each column to represent the severity of the risk, its likelihood, cost of controls, and so on. This approach is simplistic but does give an immediate impression of where efforts should be concentrated to improve security.

Answer 3

* Low—minor damage or loss to an asset or loss of performance (though essential functions remain operational).
* Moderate—significant damage or loss to assets or performance.
* High—major damage or loss or the inability to perform one or more essential functions.

Answer 4

* High-value assets, regardless of the likelihood of the threat(s).
* Threats with high likelihood (that is, high ARO).
* Procedures, equipment, or software that increase the likelihood of threats (for example, legacy applications, lack of user training, old software versions, unpatched software, running unnecessary services, not having auditing procedures in place, and so on).

In theory, security controls or countermeasures could be introduced to address every vulnerability. The difficulty is that security controls can be expensive, so you must balance the cost of the control with the cost associated with the risk.
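One common way to make that balance explicit is to compare how much a control lowers the ALE with what the control costs each year. The lesson does not give a formula, so treat this as an assumption. The sketch below uses hypothetical figures and a hypothetical function name.

```python
def control_net_benefit(ale_before: float, ale_after: float, annual_control_cost: float) -> float:
    """Annual loss avoided by a control minus the control's own annual cost.

    A positive result suggests the control is cost-justified; a negative result
    suggests the control costs more than the risk it removes.
    """
    return (ale_before - ale_after) - annual_control_cost


# Hypothetical: a control cuts ALE from $25,000 to $5,000 and costs $8,000 per year
print(control_net_benefit(25_000, 5_000, 8_000))  # 12000
```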

Answer 5

It is not often possible to eliminate risk; rather, the aim is to mitigate risk factors to the point where the organization is exposed only to a level of risk that it can afford (residual risk). Risk mitigation (or remediation) is the overall process of reducing exposure to or the effects of risk factors. There are several ways of mitigating risk. If you deploy a countermeasure that reduces exposure to a threat or vulnerability, that is risk deterrence (or reduction). Risk reduction refers to controls that can make a risk incident less likely, less costly, or both. For example, if fire is a threat, a policy strictly controlling the use of flammable materials on site reduces likelihood, while a system of alarms and sprinklers reduces impact by (hopefully) containing any incident to a small area. Another example is offsite data backup, which provides a remediation option in the event of servers being destroyed by fire.

Answer 6

• Avoidance means that you stop doing the activity that is risk-bearing. For example, a company may develop an in-house application for managing inventory and then try to sell it. If, while selling it, the application is discovered to have numerous security vulnerabilities that generate complaints and threats of legal action, the company may decide that the cost of maintaining the security of the software is not worth the revenue and withdraw it from sale. Obviously, this would generate considerable bad feeling amongst existing customers. Avoidance is not often a credible option.
• Transference (or sharing) means assigning risk to a third party (such as an insurance company or a contract with a supplier that defines liabilities). For example, a company could stop in-house maintenance of an e-commerce site and contract the services to a third party, who would be liable for any fraud or data theft.
Note: In this sort of case, it is relatively simple to transfer the obvious risks, but risks to the company's reputation remain. If a customer's credit card details are stolen because they used your insecure e-commerce application, the customer won't care whether you or a third party were nominally responsible for security. It is also unlikely that legal liabilities could be completely transferred in this way.
• Acceptance (or retention) means that no countermeasures are put in place, either because the level of risk does not justify the cost or because there will be an unavoidable delay before the countermeasures are deployed. In this case, you should continue to monitor the risk (as opposed to ignoring it).

Answer 7

A risk register is a document showing the results of risk assessments in a comprehensible format. The register may resemble the "traffic light" grid described earlier, with columns for impact and likelihood ratings, date of identification, description, countermeasures, owner/route for escalation, and status. Risk registers are also commonly depicted as scatterplot graphs, where impact and likelihood are each an axis, and the plot point is associated with a legend that includes more information about the nature of the plotted risk. A risk register should be shared between stakeholders (executives, department managers, and senior technicians) so that they understand the risks associated with the workflows that they manage.
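The register columns listed above map naturally onto a record type. The Python sketch below is illustrative only: the three-point scale, the score (impact multiplied by likelihood), the Red/Yellow/Green thresholds, and the sample entries are all assumptions, not a prescribed register format.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical three-point scale shared by the impact and likelihood columns
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class RiskEntry:
    description: str
    impact: str        # "Low", "Medium", or "High"
    likelihood: str    # "Low", "Medium", or "High"
    identified: date
    countermeasures: str
    owner: str         # owner or route for escalation
    status: str

    def score(self) -> int:
        """Impact rating multiplied by likelihood rating (1 to 9)."""
        return LEVELS[self.impact] * LEVELS[self.likelihood]

    def traffic_light(self) -> str:
        """Map the score onto a Red/Yellow/Green indicator (thresholds are assumptions)."""
        score = self.score()
        if score >= 6:
            return "Red"
        if score >= 3:
            return "Yellow"
        return "Green"


register = [
    RiskEntry("Server room flood", "High", "Low", date(2024, 1, 15),
              "Raise equipment above ground floor", "Facilities manager", "Open"),
    RiskEntry("Phishing leads to credential theft", "High", "High", date(2024, 2, 3),
              "User training, MFA", "Security lead", "In progress"),
    RiskEntry("Printer outage", "Low", "Medium", date(2024, 3, 9),
              "Keep a spare printer", "Helpdesk", "Accepted"),
]

# List the most severe risks first
for entry in sorted(register, key=RiskEntry.score, reverse=True):
    print(f"{entry.traffic_light():<6} {entry.description} (owner: {entry.owner}, status: {entry.status})")
```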

Answer 8

In order to reduce the risk that changes to configuration items will cause service disruption, a documented change management process can be used to implement changes in a planned and controlled way. The need to change is often described either as reactive, where the change is forced on the organization, or as proactive, where the need for change is initiated internally. Changes can also be categorized according to their impact and level of risk (major, significant, minor, or normal, for instance).

Answer 9

In a formal change management process, the need for change and the procedure for implementing the change are captured in a Request for Change (RFC) document and submitted for approval. The RFC will then be considered at the appropriate level. This might be a supervisor or department manager if the change is normal or minor. Major or significant changes might be managed as a separate project and require approval through a Change Advisory Board (CAB).

Answer 10

* Identify mission-essential functions and the critical systems within each function.
* Identify those assets supporting business functions and critical systems, and determine their values.
* Calculate MTD, RPO, RTO, MTTF, MTTR, and MTBF for functions and assets.
* Look for possible vulnerabilities that, if exploited, could adversely affect each function or system.
* Determine potential threats to functions and systems.
* Determine the probability or likelihood of a threat exploiting a vulnerability.
* Determine the impact of the potential threat, whether it be recovery from a failed system or the implementation of security controls that will reduce or eliminate risk.
* Identify impact scenarios that put your business operations at risk.
* Identify the risk analysis method that is most appropriate for your organization. For quantitative and semi-quantitative risk analysis, calculate SLE and ARO for each threat, and then calculate the ALE.
* Identify potential countermeasures, ensuring that they are cost-effective and perform as expected. For example, identify single points of failure and, where possible, establish redundant or alternative systems and solutions.
* Clearly document all findings discovered and decisions made during the assessment in a risk register.
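MTBF and MTTR are commonly derived from operating logs as simple averages, and together they give an expected availability figure. The Python sketch below assumes those standard definitions; the log figures and function names are invented for illustration.

```python
def mtbf(total_operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating time divided by number of failures."""
    return total_operating_hours / failures


def mttr(total_repair_hours: float, repairs: int) -> float:
    """Mean time to repair: total repair time divided by number of repairs."""
    return total_repair_hours / repairs


def expected_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the asset is expected to be up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical log: 8,760 operating hours with 4 failures, 20 hours of repair in total
m_tbf = mtbf(8_760, 4)   # 2190.0 hours
m_ttr = mttr(20, 4)      # 5.0 hours
print(f"MTBF={m_tbf} h, MTTR={m_ttr} h, availability={expected_availability(m_tbf, m_ttr):.4%}")
```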

Answer 11

Business continuity is a collection of processes that enable an organization to maintain normal business operations in the face of some adverse event. There are numerous types of events, both natural and man-made, that could disrupt the business and require a continuity effort to be put in place. They may be instigated by a malicious party, or they may come about due to carelessness or negligence on the part of non-malicious personnel. The organization may suffer loss or leakage of data; damage to or destruction of hardware and other physical property; impairment of communications infrastructure; loss of or harm done to personnel; and more. When these negative events become a reality, the organization will need to rely on resiliency and automation strategies to mitigate their effect on day-to-day operations.

Answer 12

Computer systems require protection from hardware failure, software failure, and system failure (failure of network connectivity devices, for instance). When implementing a network, the goal will always be to minimize the single points of failure and to allow ongoing service provision despite a disaster. To perform IT Contingency Planning (ITCP), think of all the things that could fail, determine whether the result would be a critical loss of service, and whether this is unacceptable. Then identify strategies to make the system resilient. How resilient a system is can be determined by measuring or evaluating several properties.

Answer 13

One of the key properties of a resilient system is high availability. Availability is the percentage of time that the system is online, measured over the defined period (typically one year). The corollary of availability is downtime (that is, the percentage or amount of time during which the system is unavailable). The maximum tolerable downtime (MTD) metric states the requirement for a particular business function. High availability is usually loosely described as 24x7 (24 hours per day, 7 days per week) or 24x365 (24 hours per day, 365 days per year). For a critical system, availability is typically expressed in "nines," ranging from "two-nines" (99%) up to "five-nines" (99.999%) or "six-nines" (99.9999%).
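Each extra "nine" cuts the permitted downtime by a factor of ten, which is easiest to see by converting percentages into minutes per year. The Python sketch below assumes a 365-day year (leap years ignored); the function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def max_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a system may be down and still meet the availability target."""
    return HOURS_PER_YEAR * 60 * (1 - availability_percent / 100)


targets = [("two-nines", 99.0), ("three-nines", 99.9), ("five-nines", 99.999), ("six-nines", 99.9999)]
for label, pct in targets:
    print(f"{label:<12} {pct}%: {max_downtime_minutes(pct):,.1f} minutes/year")
```

Two-nines still permits roughly 5,256 minutes (about 3.7 days) of downtime a year, while five-nines allows only about 5 minutes, so the result can be compared directly with the MTD set for a business function.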

Answer 14

A system that can experience failures and continue to provide the same (or nearly the same) level of service is said to be fault tolerant. Fault tolerance is often achieved by provisioning redundancy for critical components and single points of failure. A redundant component is one that is not essential to the normal function of a system but that allows the system to recover from the failure of another component.

Answer 15

* Redundant components (power supplies, network cards, drives (RAID), and cooling fans) provide protection against hardware failures. Hot-swappable components allow for easy replacement (without having to shut down the server).
* Uninterruptible Power Supplies (UPS) and Standby Power Supplies.
* Backup strategies provide protection for data.
* Cluster services are a means of ensuring that the total failure of a server does not disrupt services generally.

While these computer systems are important, thought also needs to be given to how to make a business "fault tolerant" in terms of staffing, utilities (heat, power, communications, transport), customers, and suppliers.

Answer 16

A resilient system does not just need to be able to cope with faults and outages; it must also be able to cope with changing demand levels. These properties are measured as scalability and elasticity. Scalability means that the costs involved in supplying the service to more users are linear. For example, if the number of users doubles in a scalable system, the costs to maintain the same level of service would also double (or less than double). If costs more than double, the system is less scalable. To scale out is to add more resources in parallel with existing resources. To scale up is to increase the power of existing resources.

Answer 17

A resilient system does not just need to be able to cope with faults and outages; it must also be able to cope with changing demand levels. These properties are measured as scalability and elasticity. Elasticity refers to the system's ability to handle changes in demand in real time. A system with high elasticity will not experience loss of service or performance if demand suddenly doubles (or triples, or quadruples). Conversely, it may be important for the system to be able to reduce costs when demand is low. Elasticity is a common selling point for cloud services. Instead of running a cloud resource 24 hours a day, 7 days a week, that resource can diminish in power or shut down completely when demand for it is low. When demand picks up again, the resource will grow in power to the level required. This results in cost-effective operations.
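An elastic service needs a rule that converts current demand into an amount of provisioned capacity. The Python sketch below shows one simplified rule. The per-instance capacity and the minimum and maximum instance counts are hypothetical parameters, and this is not how any particular cloud provider's autoscaler works.

```python
import math


def desired_instances(requests_per_sec: float, capacity_per_instance: float,
                      min_instances: int = 0, max_instances: int = 20) -> int:
    """Instances needed for current demand, clamped to [min_instances, max_instances].

    min_instances=0 lets the service shut down completely when there is no demand.
    """
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# Hypothetical: each instance can handle 100 requests per second
for demand in [0, 150, 300, 900, 5000]:
    print(demand, "req/s ->", desired_instances(demand, 100), "instances")
```

The upper bound caps cost. Demand beyond it (5,000 req/s here) shows where a real deployment would have to raise the limit or accept degraded service.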

Answer 18

refers to the ability to switch between available processing and data resources to meet service requests. This is typically achieved using load balancing services during normal operations or automated failover during a disaster.

Answer 19

In a Redundant Array of Independent Disks (RAID), many disks can act as backups for each other to increase reliability and fault tolerance. If one disk fails, the data is not lost, and the server can keep functioning. The RAID Advisory Board defines RAID levels, numbered from 0 to 6, where each level corresponds to a specific type of fault tolerance. There are also proprietary and nested RAID solutions. Some of the most commonly implemented types of RAID are described in the following cards.

Answer 20

Striping without parity (no fault tolerance). Data is written in blocks across several disks simultaneously. This can improve performance, but if one disk fails, the whole volume fails and the data on it is lost.

Answer 21

Mirroring—Data is written to two disks simultaneously, providing redundancy (if one disk fails, there is a copy of data on the other). The main drawback is that storage efficiency is only 50%.

Answer 22

Striping with parity—Data is written across three or more disks, but additional information (parity) is calculated. This allows the volume to continue if one disk is lost. This solution has better storage efficiency than RAID 1.
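Parity is typically computed with a bitwise XOR across the data blocks in a stripe. Because XOR is its own inverse, any single missing block can be rebuilt from the surviving blocks plus the parity block. The Python sketch below shows this for one stripe using made-up 4-byte blocks. Real RAID 5 implementations also rotate the parity block across the disks from stripe to stripe, and that rotation is not modeled here.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a hypothetical four-disk RAID 5 set: three data blocks plus parity
data = [b"ACCT", b"LOGS", b"HR01"]
parity = xor_blocks(data)

# Simulate losing disk 1: rebuild its block from the surviving data blocks and the parity block
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'LOGS'
```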

Answer 23

Double parity, or RAID 5 with an additional parity stripe. This allows the volume to continue operating when up to two disks have been lost.
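The storage efficiency trade-off across the levels above can be expressed as the fraction of raw capacity left for data. The Python sketch below assumes the usual minimum disk counts (2 for RAID 0 and 1, 3 for RAID 5, 4 for RAID 6) and treats RAID 1 as a set of full mirrors. The function name is hypothetical.

```python
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4}


def usable_fraction(level: int, disks: int) -> float:
    """Fraction of raw capacity available for data at common RAID levels."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return 1.0                   # striping only, no redundancy
    if level == 1:
        return 1 / disks             # every disk holds a full copy
    if level == 5:
        return (disks - 1) / disks   # one disk's worth of parity
    return (disks - 2) / disks       # RAID 6: two disks' worth of parity


for level, disks in [(0, 4), (1, 2), (5, 4), (6, 4)]:
    print(f"RAID {level} on {disks} disks: {usable_fraction(level, disks):.0%} usable")
```

The two-disk mirror reproduces the 50% efficiency mentioned for RAID 1. Parity-based levels become more efficient as disks are added to the set.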

Answer 24

Nesting RAID sets generally improves performance or redundancy (for example, some nested RAID solutions can support the failure of more than one disk).

Answer 25

Network cabling should be designed to allow for multiple paths between the various servers, so that during a failure of one part of the network, the rest remains operational (redundant connections). Routers are valuable fault tolerant devices because they can communicate system failures to one another, allowing IP packets to be routed via an alternate device.

Answer 26

There are very few parts of IT infrastructure that cannot be automated through some sort of code (either a program or a script). Technologies such as Software Defined Networking (SDN), virtualization, and DevOps make it possible to provision network links and server systems through programming and scripting. This means that a resiliency strategy can specify automated courses of action that can work to maintain or to restore services with minimal human intervention or even no intervention at all.

Answer 27

An automation solution will have a system of continuous monitoring to detect service failures and security incidents. Continuous monitoring might use a locally installed agent or heartbeat protocol or may involve checking availability remotely. As well as monitoring the primary site, it is important to observe the failover components to ensure that they are recovery ready. You can also automate the courses of action that a monitoring system takes, like configuring an IPS to automatically block traffic that it deems suspicious.
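A heartbeat check can be reduced to comparing each node's last check-in time against a timeout. The Python sketch below applies this to both primary and failover components, since the text stresses that failover components must also be recovery ready. The node names, timestamps, and 30-second timeout are hypothetical.

```python
import time


def missed_heartbeats(last_seen: dict[str, float], now: float, timeout_s: float) -> list[str]:
    """Return nodes whose most recent heartbeat is older than timeout_s seconds."""
    return sorted(node for node, ts in last_seen.items() if now - ts > timeout_s)


# Hypothetical agent check-in times (seconds since the epoch) for primary and failover nodes
now = time.time()
last_seen = {
    "web-primary": now - 2,
    "web-failover": now - 45,   # the standby node has gone quiet
    "db-primary": now - 5,
}
print(missed_heartbeats(last_seen, now, timeout_s=30))  # ['web-failover']
```

In an automated setup, a non-empty result would typically trigger an alert or a course of action such as restarting the node, rather than just a printout.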

Answer 28

* Master image—this is the "gold" copy of a server instance, with the OS, applications, and patches all installed and configured. This is faster than using a template, but keeping the image up to date can involve more work than updating a template.
* Template—similar to a master image, this is the build instructions for an instance. Rather than storing a master image, the software may build and provision an instance according to the template instructions.

Another important process in automating resiliency strategies is to provide configuration validation. This process ensures that a recovery solution is working at each layer (hardware, network connectivity, data replication, and application). An automation solution for incident and disaster recovery will have a dashboard of key indicators and may be able to evaluate metrics such as compliance with RPO and RTO from observed data.

Answer 29

When recovering systems, it may be necessary to ensure that any artifacts from the disaster, such as malware or backdoors, are removed when reconstituting the production environment. This can be facilitated in an environment designed for non-persistence. Non-persistence means that any given instance is completely static in terms of processing function. Data is separated from the instance so that it can be swapped out for an "as new" copy without suffering any configuration problems.

Answer 30

* Snapshot/revert to known state—This is a saved system state that can be reapplied to the instance.
* Rollback to known configuration—A physical instance might not support snapshots but has an "internal" mechanism for restoring the baseline system configuration, such as Windows System Restore.
* Live boot media—another option is to use an instance that boots from read-only storage to memory rather than being installed on a local read/write hard disk.

Answer 31

* Be aware of the different ways your business could be threatened.
* Implement an overall business continuity process in response to real events.
* Ensure the continuity planning is comprehensive and addresses all critical dimensions of the organization.
* Draft an IT contingency plan to ensure that IT procedures continue after an adverse event.
* Ensure that IT personnel are trained on this plan.
* Incorporate failover techniques into continuity planning.
* Ensure that systems are highly available and meet an adequate level of performance.
* Ensure that critical systems have redundancy to mitigate loss of data and resources due to adverse events.
* Ensure that critical systems are fault tolerant so that service disruption is minimized in the event of failure or compromise.
* Ensure that systems are adequately scalable and can meet the long-term increase in demand as the business grows.
* Ensure that systems are elastic and can meet the short-term increase and decrease in resource demands.
* Consider consolidating multiple storage devices in a RAID for redundancy and fault tolerance.
* Choose the RAID level that provides the appropriate level of redundancy and fault tolerance for your business needs.
* Supplement manual security processes with automated processes in order to increase efficiency and accuracy.
* Consider incorporating non-persistent virtual infrastructure to more easily maintain baseline security.

Answer 32

As you have seen, part of Continuity of Operation Planning (COOP) is to provision fault tolerant systems that provide high availability through redundancy and failover. This sort of well-engineered system will hopefully be resilient to most types of fault and allow any recovery or maintenance operations to be performed in the background.

Answer 33

Providing redundant devices and spares or configuring a server cluster on the local network allows the redundant systems to be swapped in if existing systems fail. Enterprise-level networks often also provide for alternate processing sites or recovery sites. A site is another location that can provide the same (or similar) level of service. An alternate processing site might always be available and in use, while a recovery site might take longer to set up or only be used in an emergency.

Answer 34

Operations are designed to fail over to the new site until the previous site can be brought back online. Failover is a technique that ensures a redundant component, device, application, or site can quickly and efficiently take over the functionality of an asset that has failed. For example, load balancers provide failover in the event that one or more servers or sites behind the load balancer are down or are taking too long to respond. Once the load balancer detects this, it will redirect inbound traffic to an alternate processing server or site. Thus, redundant servers in the load balancer pool ensure there is no interruption of service.
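A load balancer's failover behavior amounts to skipping pool members that have failed their health checks. The Python sketch below models this with simple round-robin selection. The class and server names are hypothetical, and real load balancers add health probing, weighting, and session handling that are not modeled here.

```python
from itertools import cycle


class FailoverPool:
    """Round-robin pool that skips servers marked as down (a simplified load balancer model)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        """Called when a health check fails or times out."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Called when the server passes health checks again."""
        self.healthy.add(server)

    def next_server(self):
        """Return the next healthy server in rotation, skipping failed ones."""
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers in the pool")


pool = FailoverPool(["app-1", "app-2", "app-3"])
pool.mark_down("app-2")
print([pool.next_server() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```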

Answer 35

Recovery sites are referred to as being hot, warm, or cold. A hot site can fail over almost immediately. It generally means that the site is already within the organization's ownership and is ready to deploy.

Answer 36

A cold site takes longer to set up (up to a week), and a warm site is something between the two.

Answer 37

A warm site could be similar, but with the requirement that the latest data set will need to be loaded.

Answer 38

Clearly, providing redundancy on this scale can be very expensive. Sites are often leased from service providers, such as Comdisco or IBM (a subscription service).

Answer 39

Another option is for businesses to enter into reciprocal arrangements to provide mutual support. This is cost effective but complex to plan and set up. Another issue is that creating a duplicate of anything doubles the complexity of securing that resource properly. The same security procedures must apply to redundant sites, spare systems, and backup data as apply to the main copy.

Answer 40

- location selection
- distance and replication
- legal implications/data sovereignty

Answer 41

Choosing the location for a processing facility or data center requires considering multiple factors. A geographically remote site has advantages in terms of deterring and detecting intruders. It is much easier to detect suspicious activity in a quiet, remote environment than it is in a busy, urban one. On the other hand, a remote location carries risks. Infrastructure (electricity, heating, water, telecommunications, and transport links) may be less reliable and take longer to repair. Recruitment and retention of skilled employees can also be more difficult. In many locations, flooding is the most commonly encountered natural disaster hazard. Rising sea levels and changing rainfall patterns mean that previously safe areas can become subject to flood risks within just a few years. Common-sense measures that do not require much spending can minimize the impact of a flood. If possible, the computer equipment and cabling should be positioned above the ground floor and away from major plumbing. Certain local areas may also be subject to specific known hazards, such as earthquakes, volcanoes, and storms. If there is no other choice as to location, natural disaster risks such as these can often be mitigated by building designs that have been developed to cope with local conditions.

Answer 42

As well as choosing a suitable location for a data processing center, you must also consider the distance between the primary site and the secondary (alternate or recovery) site. Determining the optimum distance between two replicating sites depends on evaluating competing factors:
* Locating the alternate site a short distance from the primary site—in the same city, for example—makes it easier for personnel at the primary site to resume operations at the recovery site, or to physically transfer data from the backup site to the primary site.
* If the sites are too close together (within about 500 km), they could both be affected by the same disaster. For example, the entire Southeastern United States is susceptible to hurricane season. To avoid a disaster resulting from a hurricane, an organization with a primary site in Florida may choose to keep a recovery site in a different part of the country.
* The farther apart the sites are, the costlier replication will be.

Replication is the process of duplicating data between different servers or sites. RAID mirroring and server clustering are examples of disk-to-disk and server-to-server replication. Replication can be either synchronous or asynchronous. Synchronous replication means that the data must be written at both sites before it can be considered committed. Asynchronous replication means that data is committed at the primary site first and then mirrored to the secondary site, so the secondary copy may lag behind the primary. Disk-to-disk and server-to-server replication are relatively simple to accomplish, as they can use direct access RAID or local network technologies. Site-to-site replication is considerably harder and more expensive, as it relies on Wide Area Network technologies. Synchronous replication is particularly sensitive to distance, as the longer the communications pathway, the greater the latency of the link. Latency can be mitigated by provisioning fiber optic links.
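Synchronous replication is distance-sensitive because each write must wait for the remote site to acknowledge it. The Python sketch below estimates only the propagation-delay floor for that acknowledgement. It assumes light travels through optical fiber at roughly 200 km per millisecond (about two-thirds of its speed in a vacuum) and a cable path equal to the stated distance. Real links add routing detours, equipment delay, and protocol overhead.

```python
FIBER_KM_PER_MS = 200.0  # approximate signal speed in optical fiber (about 2/3 of c)


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Minimum added latency per synchronous write: the signal must reach the remote site and return."""
    return 2 * distance_km / FIBER_KM_PER_MS * round_trips


for km in [10, 100, 500, 2000]:
    print(f"{km:>5} km: at least {sync_write_penalty_ms(km):.1f} ms per write")
```

At the roughly 500 km separation mentioned above, every synchronous write waits at least 5 ms. Asynchronous replication avoids this wait, at the cost of the secondary copy lagging behind.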

Answer 43

For an organization handling cross-border transactions, there is the need to respect the national laws on privacy and data processing of the country in which a site is located. A different state or country will likely have its own specific laws and regulations that your data will be subject to. You may be forced to apply different data retention practices than what you're used to at your primary site or other local alternate sites. Aside from the direct legal implications, you must also consider the concept of data sovereignty. Data sovereignty describes the sociopolitical outlook of a nation concerning computing technology and information. Some nations may respect data privacy more or less than others; likewise, some nations may disapprove of the nature and content of certain data. They may even be suspicious of security measures such as encryption. There might be data sovereignty implications for cloud services, for replicating sites, and for data backups and archiving, if data is copied from one country to another.

Answer 44

If a site suffers an uncontrolled outage, in ideal circumstances processing will be switched to the alternate site and the outage can be resolved without any service interruption. If an alternate processing site is not available, then the main site must be brought back online as quickly as possible to minimize service disruption. This does not mean that the process can be rushed, however. A complex facility such as a data center or campus network must be reconstituted according to a carefully designed order of restoration. If systems are brought back online in an uncontrolled way, there is the serious risk of causing additional power problems or of causing problems in the network, OS, or application layers because dependencies between different appliances and servers have not been met.

Answer 45

1. Enable and test power delivery systems (grid power, Power Distribution Units (PDUs), UPS, secondary generators, and so on).
2. Enable and test switch infrastructure, then routing appliances and systems.
3. Enable and test network security appliances (firewalls, IDS, proxies).
4. Enable and test critical network servers (DHCP, DNS, NTP, and directory services).
5. Enable and test backend and middleware (databases and business logic). Verify data integrity.
6. Enable and test front-end applications.
7. Enable client workstations and devices and client browser access.
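An order of restoration is a dependency ordering, so it can be derived mechanically with a topological sort once each component's prerequisites are recorded. The Python sketch below uses the standard library's `graphlib` module (Python 3.9+). The component names and dependency map are a hypothetical, simplified rendering of the seven steps, not a complete data center inventory.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists what must be running before it
dependencies = {
    "power": set(),
    "switches": {"power"},
    "routers": {"switches"},
    "firewall": {"routers"},
    "dns": {"firewall"},
    "dhcp": {"firewall"},
    "directory": {"dns"},
    "database": {"directory"},
    "web_app": {"database", "dns"},
    "workstations": {"web_app", "dhcp"},
}

# static_order() yields every component only after all of its prerequisites
order = list(TopologicalSorter(dependencies).static_order())
print(" -> ".join(order))
```

If the map accidentally contains a cycle, `static_order()` raises `graphlib.CycleError`. This is a useful check that the documented order of restoration is actually achievable.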

Answer 46

An alternate business practice will allow the information flow to resume to at least some extent. A typical fallback plan is to handle transactions using pen-and-paper systems. This type of fallback can work only if it is well planned, though. Staff must know how to use the alternate system—what information must be captured (supply standard forms) and to whom it should be submitted (and how, if there are no means of electronic delivery). Alternate business practices can only work if the information flow is well documented and there are not too many complex dependencies on gathering and processing the data.

Answer 47

As well as risks to systems, a COOP has to take on the macabre issue of human capital resilience. Put bluntly, this means "Is someone else available to fulfill the same role if an employee is incapacitated?" Succession planning targets the specific issue of leadership and senior management. Most business continuity and DR plans are heavily dependent on a few key people to take charge during the disaster and ensure that the plan is put into effect. Succession planning ensures that these sorts of competencies are widely available to an organization.

Answer 48

* In the short term, files that change frequently might need retaining for version control. Short-term retention is also important in recovering from malware infection. Consider the scenario where a backup is made on Monday, a file is infected with a virus on Tuesday, and when that file is backed up later on Tuesday, the copy made on Monday is overwritten. This means that there is no good means of restoring the uninfected version of the file. Short-term retention is determined by how often the youngest media sets are overwritten.
* In the long term, data may need to be stored to meet legal requirements or to comply with company policies or industry standards. Any data that must be retained in a particular version past the oldest sets should be moved to archive storage.

Answer 49

For these reasons, backups are kept back to certain points in time. As backups take up a lot of space, and there is never limitless storage capacity, this introduces the need for storage management routines and techniques to reduce the amount of data occupying backup storage media while giving adequate coverage of the required recovery window. The recovery window is determined by the Recovery Point Objective (RPO), which is determined through business continuity planning. Advanced backup software can prevent media sets from being overwritten in line with the specified retention policy.

Answer 50

Full
- Data Selection: All selected data regardless of when it was previously backed up
- Backup/Restore Time: High/low (one tape set)
- Archive Attribute: Cleared

Incremental
- Data Selection: New files and files modified since the last backup
- Backup/Restore Time: Low/high (multiple tape sets)
- Archive Attribute: Cleared

Differential
- Data Selection: All data modified since the last full backup
- Backup/Restore Time: Moderate/moderate (no more than two sets)
- Archive Attribute: Not cleared

Answer 51

The choice of method is based on the time it takes to restore versus the time it takes to back up. Assuming a backup is performed every working day, an incremental backup only includes files changed during that day, while a differential backup includes all files changed since the last full backup. Incremental backups save backup time but can be more time-consuming when the system must be restored. The system must be restored from the last full backup set and then from each incremental backup that has subsequently occurred. A differential backup system only involves two tape sets when restoration is required. Doing a full backup on a large network every day takes a long time. A typical strategy for a complex network would be a full weekly backup followed by an incremental or differential backup at the end of each day.
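The restore trade-off can be made concrete by listing which backup sets a restore actually needs under each scheme. The Python sketch below assumes a weekly full backup on Friday followed by daily backups of a single type (incrementals or differentials, not a mix). The set labels and function name are hypothetical.

```python
def restore_chain(backups: list[tuple[str, str]]) -> list[str]:
    """Return the backup sets needed to restore to the newest backup.

    `backups` is a chronological list of (label, kind) pairs, where kind is
    "full", "incremental", or "differential". Assumes one scheme is used
    between full backups (incrementals or differentials, not a mix).
    """
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    chain = [backups[last_full][0]]
    later = backups[last_full + 1:]
    incrementals = [label for label, kind in later if kind == "incremental"]
    differentials = [label for label, kind in later if kind == "differential"]
    if incrementals:
        chain.extend(incrementals)        # every incremental since the full, in order
    elif differentials:
        chain.append(differentials[-1])   # only the newest differential is needed
    return chain


days = ["Mon", "Tue", "Wed", "Thu"]
weekly_incremental = [("Fri full", "full")] + [(d, "incremental") for d in days]
weekly_differential = [("Fri full", "full")] + [(d, "differential") for d in days]

print(restore_chain(weekly_incremental))   # ['Fri full', 'Mon', 'Tue', 'Wed', 'Thu']
print(restore_chain(weekly_differential))  # ['Fri full', 'Thu']
```

The incremental chain grows by one set every day, while the differential restore never needs more than two sets.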

Answer 52

Snapshots are a means of getting around the problem of open files. If the data that you're considering backing up is part of a database, such as SQL data, or a messaging system, such as Exchange, then the data is probably being used all the time. Copy-based mechanisms are often unable to back up open files, so short of closing the files (and with them the database), a copy-based system will not work. A snapshot is a point-in-time copy of data maintained by the file system. A backup program can use the snapshot rather than the live data to perform the backup. In Windows, snapshots are provided on NTFS volumes by the Volume Shadow Copy Service (VSS). They are also supported on Sun's ZFS file system and under some enterprise distributions of Linux.

Answer 53

Backed up and archived data need to be stored as securely as "live" data. A data backup has the same confidentiality and integrity requirements as its source. Typically, backup media is physically secured against theft or snooping by keeping it in a restricted part of the building, with other server and network equipment. Many backup solutions also use encryption to ensure data confidentiality should the media be stolen. Additionally, you must plan for events that could compromise both the live data and the backup set. Natural disasters, such as fires, earthquakes, and floods, could leave an organization without a data backup unless it has kept a copy offsite. Offsite storage is obviously difficult to keep up to date. Without a network that can support the required bandwidth, the offsite media must be physically brought onsite (and if there is no second set of offsite media, data is at substantial risk at this time), the latest backup performed, and the media then removed to offsite storage again. Quite apart from the difficulty and expense of doing this, transporting the data raises confidentiality and security issues.

Answer 54

Within the scope of business continuity planning, disaster recovery plans (DRPs) describe the specific procedures to follow to recover a system or site to a working state. A disaster could be anything from a loss of power or failure of a minor component to man-made or natural disasters, such as fires, earthquakes, or acts of terrorism.

Answer 55

* Identify scenarios for natural and non-natural disaster and options for protecting systems. Plans need to account for risk (a combination of the likelihood the disaster will occur and the possible impact on the organization) and cost.
  * There is no point implementing disaster recovery plans that financially cripple the organization. The business case is made by comparing the cost of recovery measures against the cost of downtime. Downtime cost is calculated from lost revenues and ongoing costs (principally salary). The cost of the recovery plan should not generally exceed the downtime cost. Of course, downtime also includes costs that are hard to quantify, such as loss of customer goodwill and restitution for not meeting service contracts.
* Identify tasks, resources, and responsibilities for responding to a disaster.
  * Who is responsible for doing what? How can they be contacted? What happens if they are not available?
  * Which functions are most critical? Where should effort first be concentrated?
  * What resources are available? Should they be pre-purchased and held in stock? Will the disaster affect availability of supplies?
  * What are the timescales for resumption of normal operations?
* Train staff in the disaster planning procedures and how to react well to change.

As well as restoring systems, the disaster recovery plan should identify stakeholders who need to be informed about any security incidents. There may be a legal requirement to inform the police, fire service, or building inspectors about any safety-related or criminal incidents. If third-party or personal data is lost or stolen, the data subjects may need to be informed. If the disaster affects services, customers need to be informed about the time-to-fix and any alternative arrangements that can be made.
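The downtime comparison can be expressed as simple arithmetic. The Python sketch below is a rule-of-thumb check, not a complete business-case model: the function names and figures are hypothetical, and intangible costs such as lost goodwill are deliberately left out because they cannot be calculated this way.

```python
def downtime_cost(outage_hours, lost_revenue_per_hour, ongoing_cost_per_hour):
    """Tangible downtime cost: lost revenue plus ongoing costs (principally salary)."""
    return outage_hours * (lost_revenue_per_hour + ongoing_cost_per_hour)


def recovery_spend_justified(recovery_cost, outage_hours,
                             lost_revenue_per_hour, ongoing_cost_per_hour):
    """Rule of thumb: recovery measures should not generally cost more than
    the downtime they are meant to address."""
    return recovery_cost <= downtime_cost(outage_hours, lost_revenue_per_hour,
                                          ongoing_cost_per_hour)


# Hypothetical figures: a 48-hour outage, $5,000/hour lost revenue, $2,000/hour salaries.
print(downtime_cost(48, 5_000, 2_000))                      # 336000
print(recovery_spend_justified(250_000, 48, 5_000, 2_000))  # True
print(recovery_spend_justified(400_000, 48, 5_000, 2_000))  # False
```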

Answer 56

* Walkthroughs, workshops, and orientation seminars: often used to provide basic awareness and training for disaster recovery team members, these exercises describe the contents of DRPs and other plans, and the roles and responsibilities outlined in those plans.
* Tabletop exercises: staff "ghost" the same procedures as they would in a disaster, without actually creating disaster conditions or applying or changing anything. These are simple to set up but do not provide any sort of practical evidence of things that could go wrong, time to complete, and so on.
* Functional exercises: action-based sessions where employees can validate DRPs by performing scenario-based activities in a simulated environment.
* Full-scale exercises: action-based sessions that reflect real situations. These exercises are held onsite and use real equipment and real personnel as much as possible. Full-scale exercises are often conducted by public agencies, but local organizations might be asked to participate.

Answer 57

Also identify timescales for reviewing disaster plans, to take account of changing circumstances and business needs. Following an incident, it is vital to hold a review meeting to analyze why the incident occurred, what could have been done to prevent it, and how effective the response was. An After-Action Report (AAR), or "lessons learned" report, documents how effective COOP and DR planning and resources were. An AAR would be commissioned after DR exercises or after an actual incident. In an ideal situation, someone will be delegated the task of recording actions taken and making notes about the progress of the exercise or incident; this is obviously easier in an exercise than in a real-life incident. The next phase is a post-incident or post-exercise meeting to discuss implementation of the lessons learned. It is vital that all staff are able to contribute freely and openly to the discussion, so these meetings must avoid apportioning blame and focus on improving procedures. If there are disciplinary concerns about staff not following procedure, those should be dealt with separately. The delegated person (or persons) will then complete a report containing a history of the incident, an impact assessment, and recommendations for upgrading resources or procedures.

Answer 58

* Implement disaster recovery to restore IT operations after a major adverse event.
* Form a recovery team with multiple job roles and responsibilities.
* Follow a disaster recovery process from notifying stakeholders to actually beginning recovery.
* Ensure the DRP includes alternate sites, asset inventory, backup procedures, and other critical information.
* Ensure that recovery processes are secure from attack or other compromise.
* Consider maintaining alternate recovery sites to quickly restore operations when the main site is compromised.
* Choose between a hot, warm, and cold site depending on your business needs and means.
* Determine an order of restoration to get business-critical systems back online first.
* Incorporate alternate business practices into the BCP if necessary.
* Draft a succession plan in case personnel are not available to put the DRP into effect.
* Choose a data backup type that meets your speed, reliability, and storage needs.
* Ensure that backups are stored in a secure location.
* Consider the security implications of maintaining multiple backups.
* Regularly test the integrity of your backups.
* Consider placing backups offsite to mitigate damage to a particular location.
* Be aware of the advantages and disadvantages of close vs. distant backup sites.
* Research the legal and data sovereignty issues affecting regions where your backup sites are located.
* Conduct testing exercises to prepare personnel for executing the DRP.
* Draft AARs to learn from your successes and mistakes.
* Ask yourself key questions about the event to identify areas for improvement.
* Modify the DRP as needed in response to lessons learned.

Answer 59

Computer forensics is the practice of collecting evidence from computer systems to a standard that will be accepted in a court of law. It is unlikely that a computer forensic professional will be retained by an organization, so such investigations are normally handled by law enforcement agencies. In some cases, however, an organization may conduct a forensic investigation without the expectation of legal action. Law enforcement agencies will prioritize the investigation of the crime over business continuity. This can greatly compromise the recovery process, especially in smaller businesses, as an organization's key assets may be taken as evidence.

Answer 60

Like DNA or fingerprints, digital evidence—often referred to as electronically stored information (ESI)—is mostly latent. Latent means that the evidence cannot be seen with the naked eye; rather, it must be interpreted using a machine or process. Forensic investigations are most likely to be launched against crimes arising from insider threats, notably fraud or misuse of equipment (to download or store obscene material, for instance). Prosecuting external threat sources is often extremely difficult, as the attacker may well be in a different country or have taken effective steps to disguise his or her location and identity. Such prosecutions are normally initiated by law enforcement agencies, where the threat is directed against military or governmental agencies or is linked to organized crime. Cases can take years to come to trial.

Answer 61

Due process is a term used in US and UK common law to require that people only be convicted of crimes following the fair application of the laws of the land. More generally, due process can be understood to mean having a set of procedural safeguards to ensure fairness. This principle is central to forensic investigation. If a forensic investigation is launched (or if one is a possibility), it is important that technicians and managers are aware of the processes that the investigation will use. It is vital that they are able to assist the investigator and that they not do anything to compromise the investigation. In a trial, defense counsel will try to exploit any uncertainty or mistake regarding the integrity of evidence or the process of collecting it. The first response period following detection and notification is often critical. To gather evidence successfully, it is vital that staff do not panic or act without thinking.

Answer 62

Legal hold refers to the fact that information that may be relevant to a court case must be preserved. Information subject to legal hold might be defined by regulators or industry best practice, or there may be a litigation notice from law enforcement or lawyers pursuing a civil action. This means that computer systems may be taken as evidence, with all the obvious disruption to a network that entails.

Answer 63

eDiscovery is a means of filtering the relevant evidence from all the data gathered by a forensic examination and storing it in a database in a format such that it can be used as evidence in a trial. eDiscovery software tools have been produced to assist this process.

Answer 64

* Identify and de-duplicate files and metadata: many files on a computer system are "standard" installed files or copies of the same file. eDiscovery filters these types of files, reducing the volume of data that must be analyzed.
* Search: allows investigators to locate files of interest to the case. As well as keyword search, software might support semantic search, which matches keywords only when they appear in a particular context.
* Security: at all points, evidence must be shown to have been stored, transmitted, and analyzed without tampering.
* Disclosure: an important part of trial procedure is that the same evidence be made available to both plaintiff and defendant. eDiscovery can fulfill this requirement. Recent court cases have required the parties to provide searchable ESI rather than paper records.
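De-duplication is commonly done by hashing file contents: files with identical hashes are identical copies, and hashes that match a reference set of standard installed files can be filtered out. A minimal Python sketch follows; the `dedupe` function and the `known_hashes` set are hypothetical stand-ins (real tools typically draw on curated reference hash sets, such as NIST's National Software Reference Library).

```python
import hashlib
from collections import defaultdict


def dedupe(files, known_hashes=frozenset()):
    """Group (name, content_bytes) pairs by SHA-256 of their content.

    Files whose hash appears in known_hashes (hypothetical reference set of
    standard installed files) are dropped. Returns {digest: [names]}, so each
    unique item only needs to be reviewed once.
    """
    groups = defaultdict(list)
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in known_hashes:
            continue  # a "standard" file: not of interest to the case
        groups[digest].append(name)
    return dict(groups)


files = [
    ("a.txt", b"invoice"),
    ("copy_of_a.txt", b"invoice"),
    ("notepad.exe", b"standard binary"),
]
known = {hashlib.sha256(b"standard binary").hexdigest()}
result = dedupe(files, known)
print(list(result.values()))  # [['a.txt', 'copy_of_a.txt']]
```

On a real drive, the pairs could be built with something like `(p, p.read_bytes())` for each file found by `pathlib.Path.rglob`, although very large files would be better hashed in chunks.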

Answer 65

The first phase of a forensic investigation is to document the scene. The crime scene must be thoroughly documented using photographs and ideally audio and video. Investigators must record every action they take in identifying, collecting, and handling evidence.

Note: Remember that if the matter comes to trial, the trial could take place months or years after the event. It is vital to record impressions and actions in notes.

If possible, evidence is gathered from the live system (including screenshots of display screens and the contents of cache and system memory) using forensic software tools. It is vital that these tools do nothing to modify the digital data that they capture.

Note: Also consider that in-place CCTV systems or webcams might have captured valuable evidence.

As well as digital evidence, an investigator should interview witnesses to establish what they were doing at the scene, whether they observed any suspicious behavior or activity, and also to gather information about the computer system. An investigator might ask questions informally and record the answers as notes to gain an initial understanding of the circumstances surrounding an incident. An investigator must ask questions carefully, to ensure that the witness is giving reliable information and to avoid leading the witness to a particular conclusion. Making an audio or video recording of witness statements produces a more reliable record but may make witnesses less willing to make a statement. If a witness needs to be compelled to make a statement, there will be legal issues around employment contracts (if the witness is an employee) and right to legal representation.

Answer 66

* CPU registers and cache memory (including cache on disk controllers, GPUs, and so on).
* Routing table, ARP cache, process table, kernel statistics.
* Memory (RAM).
* Temporary file systems.
* Disk.
* Remote logging and monitoring data.
* Physical configuration and network topology.
* Archival media.
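Collection tooling can encode the order of volatility directly, so that whatever sources are present at a scene are always captured most-volatile first. A Python sketch, assuming hypothetical short labels for each category in the list above:

```python
# Hypothetical labels for the categories above, most volatile first.
ORDER_OF_VOLATILITY = [
    "cpu_registers_cache",
    "routing_arp_process_kernel",
    "ram",
    "temp_filesystems",
    "disk",
    "remote_logs",
    "physical_config_topology",
    "archival_media",
]
_RANK = {label: position for position, label in enumerate(ORDER_OF_VOLATILITY)}


def collection_order(sources):
    """Return the evidence sources present at a scene, most volatile first.

    Raises KeyError for a label that is not in ORDER_OF_VOLATILITY.
    """
    return sorted(sources, key=lambda source: _RANK[source])


print(collection_order(["disk", "ram", "remote_logs"]))  # ['ram', 'disk', 'remote_logs']
```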

Answer 67

Different operating systems and file systems use different methods to record the time at which something occurred. The benchmark time is Coordinated Universal Time (UTC), which is essentially the time at the Greenwich meridian. Local time is the time within a particular time zone, which will be offset from UTC by several hours (or in some cases, half hours). The local time offset may also vary if seasonal daylight saving time is in place. NTFS uses UTC "internally", but many operating systems and file systems record timestamps as the local system time. When collecting evidence, it is vital to establish how a timestamp is calculated and to note the offset between the local system time and UTC. Forensics also needs to consider that a computer's system clock may not be properly synchronized to a valid time source or may have been tampered with. Most computers are configured to synchronize the clock to a Network Time Protocol (NTP) server. Closely synchronized time is important for authentication and audit systems to work properly. The right to modify a computer's time would normally be restricted to administrator-level accounts (on enterprise networks), and time change events should be logged.
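Normalizing a recorded local time to UTC is straightforward once the offset has been established. This sketch uses Python's standard `datetime` module; the `to_utc` helper and its `clock_skew_seconds` parameter (how far the investigator found the system clock to be running fast or slow against a trusted time source) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_time, utc_offset_hours, clock_skew_seconds=0):
    """Convert a naive local timestamp to UTC.

    utc_offset_hours: the offset in force when the timestamp was written,
        including any daylight saving adjustment (may be fractional, e.g. 5.5).
    clock_skew_seconds: hypothetical correction for a system clock found to be
        running fast (positive) or slow (negative) against a trusted source.
    """
    zone = timezone(timedelta(hours=utc_offset_hours))
    utc = local_time.replace(tzinfo=zone).astimezone(timezone.utc)
    return utc - timedelta(seconds=clock_skew_seconds)


# A timestamp recorded as local time on a system set to UTC+05:30.
print(to_utc(datetime(2019, 3, 1, 10, 0), 5.5))  # 2019-03-01 04:30:00+00:00
```

Recording both the offset and any measured skew alongside the converted value lets the conversion be reproduced and challenged later.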

Answer 68

On a typical network, sensor and logging systems are not configured to record all network traffic, as this would generate a very considerable amount of data. Protocol analyzers can do this job, but few organizations would deploy them continually. Most network appliances, such as firewalls and IDS, do log events, and these logs are likely to be valuable evidence of an intrusion or security breach. On the other hand, an organization with sufficient IT resources could choose to preserve a huge amount of data. A Retrospective Network Analysis (RNA) solution provides the means to record network events at either a packet header or payload level. As well as being used in a legal process, forensics has a role to play in cybersecurity: close examination of available digital evidence enables the detection of past intrusions, or of ongoing intrusions that have gone unnoticed. A famous quote attributed to former Cisco CEO John Chambers illustrates the point: "There are two types of companies: those that have been hacked, and those who don't know they have been hacked." Counterintelligence is the process of information gathering to protect against espionage and hacking. In terms of cybersecurity, most counterintelligence information comes from activity and audit logs generated by network appliances and server file systems. Analysis of adversary Tactics, Techniques, and Procedures (TTPs) provides information about how to configure and audit active logging systems so that they are most likely to capture evidence of attempted and successful intrusions.

Answer 69

Data acquisition is the process of obtaining a forensically clean copy of data from a device held as evidence. An image can be acquired from either volatile or non-volatile storage.

Answer 70

To obtain a forensically sound image from non-volatile storage, you need to ensure that nothing you do alters data or metadata (properties) on the source disk or file system. A write blocker ensures this by filtering write commands at the driver and OS level, preventing any data on the disk or volume from being changed. Mounting a drive as read-only is insufficient. A write blocker can be implemented as a hardware device or as software running on the forensics workstation. For example, the CRU Forensic UltraDock write blocker appliance supports ports for all main host and drive adapter types. It can securely interrogate hard disks to recover file system data, firmware status information, and data written to Host Protected Areas (HPA) and Device Configuration Overlay (DCO) areas. HPA is used legitimately with boot and diagnostic utilities. A DCO is normally used with RAID systems to make different drive models expose the same number of sectors to the OS. Both these areas can be misused to conceal data or malware.

Answer 71

A critical step in the presentation of evidence will be to demonstrate that analysis has been performed on an image of the data that is identical to the data present on the disk and that neither data set has been tampered with. The standard means of proving this is to create a cryptographic hash or fingerprint of the disk contents and of the image subsequently made of it.
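Verifying an image by hash comparison might look like the following Python sketch. It uses SHA-256 from the standard `hashlib` module as the cryptographic hash, which is an assumption (the text does not name an algorithm), and reads in chunks so that large disks need not fit in memory. The helper names are hypothetical.

```python
import hashlib
import io


def sha256_stream(stream, chunk_size=1024 * 1024):
    """Hash a binary stream (an open file or raw device) in fixed-size chunks."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def image_matches_source(source_stream, image_stream):
    """True only if the image is bit-for-bit identical to the source."""
    return sha256_stream(source_stream) == sha256_stream(image_stream)


# In-memory stand-ins; in practice these would be open(path, "rb") handles,
# with the source attached through a write blocker.
source = io.BytesIO(b"\x00" * 4096 + b"evidence")
image = io.BytesIO(source.getvalue())
print(image_matches_source(source, image))  # True
```

Recording the hash value itself, not just the fact that the two matched, allows the comparison to be repeated at any later point, including in court.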

Answer 72

Once the target disk has been safely attached to the forensics workstation and verified by generating a cryptographic hash of the contents, the next task is to use an imaging utility to obtain a sector-by-sector copy of the disk contents (a forensic duplicate).
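A sector-by-sector copy that hashes the data as it is read can be sketched as below. This is a simplified illustration, not a substitute for a forensic imaging utility: the function name is hypothetical, the 512-byte sector size is an assumption (many modern drives use 4,096-byte sectors), and it does no handling of read errors or bad sectors. The source must be opened read-only, ideally behind a write blocker.

```python
import hashlib
import io

SECTOR_SIZE = 512  # assumption: check the actual sector size of the evidence drive


def image_device(source, destination, sector_size=SECTOR_SIZE):
    """Copy source to destination sector by sector, hashing the data as it is copied.

    Only reads from source. Returns (sectors_copied, sha256_hex); a short final
    read counts as one sector.
    """
    h = hashlib.sha256()
    sectors = 0
    while True:
        block = source.read(sector_size)
        if not block:
            break  # end of the source
        destination.write(block)
        h.update(block)
        sectors += 1
    return sectors, h.hexdigest()


# In-memory stand-ins for a source disk (2,048 bytes = 4 sectors) and an image file.
src = io.BytesIO(bytes(range(256)) * 8)
dst = io.BytesIO()
count, digest = image_device(src, dst)
print(count)                                              # 4
print(dst.getvalue() == src.getvalue())                   # True
print(digest == hashlib.sha256(src.getvalue()).hexdigest())  # True
```

The digest returned here can be compared with the hash taken of the source before imaging, which ties the duplicate back to the original.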

Answer 73

Forensic procedures are assisted by having an appropriate software toolkit. These are programs that provide secure drive imaging, encryption, and data analysis. There are commercial toolkits, such as EnCase (https://www.guidancesoftware.com/encase-forensic) and AccessData's Forensic Toolkit (FTK) (https://accessdata.com/products-services/forensic-toolkit-ftk), plus free software, such as Autopsy/The Sleuth Kit (https://www.sleuthkit.org/autopsy).

Answer 74

It is vital that the evidence collected at the crime scene conform to a valid timeline. Digital information is susceptible to tampering, so access to the evidence must be tightly controlled. Depending on the strength of evidence required, physical drives taken from the crime scene can be identified, bagged, sealed, and labeled (using tamper-evident bags). It is also appropriate to ensure that the bags have anti-static shielding to reduce the possibility that data will be damaged or corrupted on the electronic media by ElectroStatic Discharge (ESD). Any other physical evidence deemed necessary is also "bagged and tagged."

Answer 75

The chain of custody form records where, when, and who collected the evidence, who subsequently handled it, and where it was stored. The chain of custody must show access to, plus storage and transportation of, the evidence at every point from the crime scene to the court room. Anyone handling the evidence must sign the chain of custody and indicate what they were doing with it. The evidence should be stored in a secure facility; this not only means access control, but also environmental control, so that the electronic systems are not damaged by condensation, ESD, fire, and other hazards. Similarly, if the evidence is transported, the transport must also be secure.
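A chain of custody is essentially an append-only log of hand-offs. The Python sketch below models one; the class and field names are hypothetical and do not represent any standard form. It rejects a transfer from anyone other than the current holder, and entries that are out of time order, because either would leave a gap in accountability that defense counsel could exploit.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class CustodyEntry:
    """One hand-off of the evidence (hypothetical field names)."""
    when: datetime
    released_by: str
    received_by: str
    purpose: str   # what the recipient will do with the evidence
    location: str  # where the evidence is stored or being taken


@dataclass
class ChainOfCustody:
    evidence_id: str
    collected_by: str
    collected_at: datetime
    entries: List[CustodyEntry] = field(default_factory=list)

    def current_holder(self) -> str:
        return self.entries[-1].received_by if self.entries else self.collected_by

    def transfer(self, entry: CustodyEntry) -> None:
        # Every hand-off must start from whoever holds the evidence now.
        if entry.released_by != self.current_holder():
            raise ValueError(f"{entry.released_by} does not hold {self.evidence_id}; "
                             f"current holder is {self.current_holder()}")
        # Entries must be in chronological order, starting from collection.
        last_time = self.entries[-1].when if self.entries else self.collected_at
        if entry.when < last_time:
            raise ValueError("custody entries must be in chronological order")
        self.entries.append(entry)


coc = ChainOfCustody("HDD-001", "J. Smith", datetime(2019, 5, 2, 9, 15))
coc.transfer(CustodyEntry(datetime(2019, 5, 2, 11, 0), "J. Smith",
                          "Evidence custodian", "Storage", "Secure room B"))
coc.transfer(CustodyEntry(datetime(2019, 5, 3, 8, 30), "Evidence custodian",
                          "A. Jones", "Imaging", "Forensics lab"))
print(coc.current_holder())  # A. Jones
```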

Answer 76

The purpose of a forensic investigation is to produce a forensics report detailing any matters of interest or potential evidence discovered. All analysis should be performed on a copy of the evidence rather than on the original devices or the secure image created at the crime scene. When analyzing information from hard drives taken as evidence (data recovery), one of the most significant challenges is dealing with the sheer volume of information captured. Within the thousands of files and hundreds of gigabytes there may only be a few items that provide incriminating evidence. Forensic analysis tools help to identify what could be of interest to the forensic examiner.

Answer 77

Big data analysis techniques can assist in this process. Big data refers to large stores of unstructured information. Big data analysis tools use search-query-like functions to identify patterns and information of interest within unstructured files such as documents and spreadsheets. The contents of the file, plus analysis of the file metadata, including time stamps, can reveal useful information. As well as examining the information on hard drives, big data techniques can also be used to analyze network traffic. Big data analysis tools oriented towards security and computer intrusion detection and forensics will certainly become more widely available over the next few years.

Answer 78

Big data analysis software often includes data visualization tools. Visualization is a very powerful analysis technique for identifying trends or unusual activity. For example, a graph of network activity will reveal unusually high activity from a particular host much more easily than analysis of the raw data packets. A "tag cloud" (a visual representation of how frequently words or phrases appear in a data store) of the information on a hard drive might reveal clues about malicious behavior that could not be found by examining each file individually.

Third-party investigators need to keep track of the man-hours spent on the investigation and note incidental expenses as part of the billing process. The overall cost of an incident and its investigation is important to establish so that it can feed back into risk assessment. It provides quantitative information about the impact of security incidents and the value of security controls. Establishing the true cost of an incident may also be required in a subsequent claim for compensation against the attacker.
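The data behind a tag cloud is just a word frequency count, which the visualization then renders with size proportional to frequency. A minimal Python sketch using `collections.Counter`; the `tag_cloud_weights` name, the tokenizing regular expression, and the tiny stop-word list are simplifying assumptions.

```python
import re
from collections import Counter

# Illustrative stop-word list; real tools use much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def tag_cloud_weights(texts, top_n=10):
    """Return the top_n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


docs = [
    "Transfer the funds to the offshore account",
    "Delete the logs before the audit",
    "Offshore account details attached",
]
print(tag_cloud_weights(docs, top_n=2))  # 'offshore' and 'account' each appear twice
```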

Answer 79

* Develop or adopt a consistent process for handling and preserving forensic data.
* Determine if outside expertise is needed, such as a consultant firm.
* Notify local law enforcement, if needed.
* Secure the scene, so that the hardware is contained.
* Collect all the necessary evidence, which may be electronic data, hardware components, or telephony system components.
* Observe the order of volatility as you gather electronic data from various media.
* Interview personnel to collect additional information pertaining to the crime.
* Report the investigation's findings to the required people.

(103 cards)