Chapter 11: Containment, Eradication, and Recovery Flashcards
Containment, Eradication, and Recovery Phase
Moves an org from the primarily passive incident response activities that take place during Detection and Analysis to more active undertakings
Once an org understands that a cybersecurity incident is underway, they take actions designed to minimize the damage caused by the incident and restore normal operations ASAP
Containment
The first activity that takes place during this phase, and it should begin as quickly as possible after analysts determine that an incident is underway
Containment activities are designed to limit the scope and impact of an incident
Scope of the Incident
* The number of systems or individuals involved in an incident
Impact of the Incident
* The effect that it has on the org
NOTE: Containment means very different things in the context of different types of security incidents
* EX: Let’s say the org is experiencing active exfiltration of data from a credit card processing system
* Incident responders might contain the damage by disconnecting the system from the network, which prevents attackers from continuing exfiltration
* But if the org is experiencing a DoS attack, disconnecting the network connection actually helps the attacker achieve their objective in this case
* Here, containment could mean placing filters on an upstream internet connection that block all inbound traffic from networks involved in the attack, or block web requests that bear a certain signature
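A minimal sketch of the filtering idea from the DoS example, assuming Python and made-up values. The network ranges, the `BAD_SIGNATURE` string, and the function name `should_drop` are hypothetical, and real filtering would normally run on upstream network equipment rather than in a script.

```python
import ipaddress

# Hypothetical attack sources (RFC 5737 documentation ranges, not real attackers).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical request signature observed in the attack traffic.
BAD_SIGNATURE = "User-Agent: flood-bot"


def should_drop(source_ip: str, raw_request: str = "") -> bool:
    """Return True if traffic comes from a blocked network or bears the signature."""
    address = ipaddress.ip_address(source_ip)
    from_blocked_network = any(address in network for network in BLOCKED_NETWORKS)
    return from_blocked_network or BAD_SIGNATURE in raw_request
```

Blocking whole networks is the blunt option and signature matching is the narrower one. Either can also drop legitimate traffic.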
Dion’s Containment Priorities
1. Ensure the safety and security of all personnel
2. Prevent an ongoing intrusion or data breach
3. Identify if the intrusion is the primary or secondary attack
4. Avoid alerting the attacker that the attack has been discovered
5. Preserve forensic evidence of the intrusion and attack
Containment Collateral Damage
Containment isn’t always perfect, and as such can cause collateral damage to a business
Using the previous examples:
* Disconnecting a credit card processing system may bring transactions to a halt
* Blocking inbound traffic may render the site inaccessible to legitimate users
Incident responders undertaking containment strategies must understand the potential side effects of their actions while weighing them against the greater benefit of the org
Containment Strategy Criteria
Selecting the right containment strategies is one of the most difficult tasks facing incident responders
NIST recommends the following criteria to develop an appropriate containment strategy and weigh it against business interests:
* Potential damage to, and theft of, resources
* Need for evidence preservation
* Service availability (EX: network connectivity, service provided to external parties)
* Time and resources needed to implement the strategy
* Effectiveness of the strategy (EX: partial containment, full containment)
* Duration of the solution (EX: emergency workaround to be removed in four hours, temporary workaround to be removed in two weeks, permanent solution)
NOTE: No formula or decision tree guarantees that responders will make the right decision during incident response. Understand these criteria, the intent of management, and the technical and business operating environments to do your best
Network Segmentation
Used as a proactive control in a defense-in-depth approach to infosec, and is used to prevent the spread of future security incidents
It’s also crucial in incident response
During the early stages of an incident, responders may realize that a portion of systems are compromised, but wish to continue to observe the activity on those systems while they determine other appropriate responses
At the same time, they want to protect other systems on the network from potentially compromised systems
So you could build a separate VLAN that contains those systems, keeps them somewhat isolated, and allows continued live analysis
Isolation
Network segmentation might not go far enough to meet containment objectives, and in those cases analysts may decide to use strong isolation practices to cut off an attack
Isolation is a mitigation strategy that involves removing an affected component from a larger environment
There are two primary isolation techniques:
1. Isolating Affected Systems
2. Isolating the Attacker
Isolating Affected Systems
This is taking network segmentation to the next level
Affected systems are completely disconnected from the remainder of the network, although they may still be able to communicate with each other and with the attacker over the internet
With an isolation approach, the quarantine network connects directly to the internet and has no access to other systems
It may be implemented by altering firewall rules rather than bypassing a firewall directly, but the objective remains the same regardless:
* Allow the attacker to access the isolated systems but restrict their ability to access other systems and cause further damage
NOTE: This technique can be used to physically and logically isolate extremely sensitive systems from other networks—commonly referred to as an airgapped system
Isolating the Attacker
A variation on the isolation strategy—depends on the use of sandbox systems that are set up purely to monitor attacker activity and don’t contain any information or resources of value to the attacker
Placing attackers in a sandbox environment allows continued observation in a fairly safe, contained environment
Some orgs will use honeypot systems for this
Removal
Removal of compromised systems from the network is the strongest containment technique in the analyst’s incident response toolkit
It’s different from segmentation and isolation in that the affected systems are completely disconnected from other networks, although they may still be allowed to communicate with other compromised systems within the quarantine VLAN
In some cases, each suspect system may be physically disconnected from the network so that they’re prevented from communicating even with each other
The exact details of removal will depend on the circumstances of the incident and the professional judgment of incident responders
NOTE: NIST goes to great lengths to reinforce that removal isn’t foolproof, despite being a strong weapon
Evidence Acquisition and Handling
The primary objective during the containment phase of incident response is to limit the damage to the org and its resources
While that takes precedence over other goals, responders still need to gather evidence during containment
This evidence can be crucial in continuing analysis of the incident for internal purposes, or it can be used during legal proceedings against the attacker
NIST recommends that investigators maintain a detailed evidence log that includes the following:
* Identifying information like the location, serial number, model number, hostname, MAC, and IP of a computer
* Name, title, and phone number of each individual who collected or handled the evidence during the investigation
* Time, time zone, and date of each occurrence of evidence handling
* Locations where the evidence was stored
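One way to picture the evidence log is as a small record structure. This is a sketch under assumptions, not a NIST-defined schema: the class names `EvidenceItem` and `HandlingEvent`, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class HandlingEvent:
    """One occurrence of evidence handling: who, when, and where it was stored."""
    handler_name: str
    handler_title: str
    handler_phone: str
    timestamp: datetime  # must be timezone-aware so the time zone is recorded
    storage_location: str


@dataclass
class EvidenceItem:
    """Identifying information for one piece of evidence, plus its handling history."""
    location: str
    serial_number: str
    model_number: str
    hostname: str
    mac_address: str
    ip_address: str
    events: List[HandlingEvent] = field(default_factory=list)

    def record_handling(self, name: str, title: str, phone: str,
                        storage_location: str,
                        when: Optional[datetime] = None) -> None:
        """Append a handling event; defaults to the current time in UTC."""
        when = when or datetime.now(timezone.utc)
        if when.tzinfo is None:
            raise ValueError("timestamp must include a time zone")
        self.events.append(
            HandlingEvent(name, title, phone, when, storage_location)
        )


# Hypothetical usage with made-up values.
item = EvidenceItem("Server room B", "SN-001", "Model-X", "pos-db01",
                    "00:00:5e:00:53:01", "192.0.2.20")
item.record_handling("A. Analyst", "IR Lead", "555-0100", "Evidence locker 3")
```

Rejecting timestamps without a time zone keeps every entry consistent with the "time, time zone, and date" requirement.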
Identifying Attackers
This is a complex task to accomplish, so before heading down the road of investigating an attack’s origin, ask why you’re pursuing it
Is there really business value in uncovering who attacked you, or would your time be better spent on containment, eradication, and recovery activities?
NIST says:
* “Identifying an attacking host can be a time-consuming and futile process that can prevent a team from achieving its primary goal—minimizing the business impact”
NOTE: LEO may approach this situation with objectives that differ from those of the attacked org’s analysts—their responsibilities may conflict with the core cybersecurity objectives of containment, eradication, and recovery, so always take that into account before involving LEO
Eradication
Once the incident is contained, it’s time to move on to eradication
The primary purpose of the eradication phase is to remove any of the artifacts of the incident that may remain on the org’s network
This could include the removal of any malicious code from the network, sanitization of compromised media, and the securing of compromised user accounts
Dion’s Notes
* The simplest option for eradicating a contaminated system is to replace it with a clean image from a trusted source
* However, formatting the HD isn’t always enough, because some malware can survive it
* Make sure you do proper sanitization and disposal
Three Eradication Methods
* Reconstruction: A method of restoring a system that has been sanitized using scripted installation routines and templates
* Reimaging: A method of restoring a system that has been sanitized using an image-based backup
* Reconstitution: Restoring a system that can’t be sanitized using manual removal, reinstallation, and monitoring processes
Seven Steps for Reconstitution
1. Analyze processes and network activity for signs of malware
2. Terminate suspicious processes and securely delete them from the system
3. Identify and disable autostart locations to prevent processes from executing
4. Replace contaminated processes with clean versions from trusted media
5. Reboot the system and analyze for signs of continued malware infection
6. For continued malware infection, analyze firmware and USB devices for infection
7. If tests are negative, reintroduce the system to the production environment
Recovery
The recovery phase of incident response focuses on restoring normal capabilities and services
It includes reconstituting resources and correcting security control deficiencies that may have led to the attack—these are the actions taken to ensure that hosts are fully reconfigured to operate like before the incident occurred
This could include rebuilding and patching systems, reconfiguring firewalls, updating malware signatures, etc
The goal of recovery is not just to rebuild the org’s network, but also to do so in a way that reduces the likelihood of a successful future attack
Recovery Is Long and Challenging
It’s the longest and most challenging part of the IR, and to ensure it’s done right you must:
* Restore from known good backup
* Reinstall the OS
* Potentially buy new, trusted equipment
* Harden devices
* Change passwords
* Increase security
Four Main Types of Recovery Actions
* Patching: Installing a set of changes to a computer program or its supporting data designed to update, fix, or improve it
* Permissions: All types of permissions should be reviewed and reinforced after an incident
* Logging: Ensure that scanning, monitoring, and log retrieval systems are functioning properly following the incident
* System Hardening: The process of securing a system’s configuration and settings to reduce IT vulnerability and the possibility of being compromised
Dion’s Three Mottos for Hardening
1. Uninstall anything you aren’t using: hardware, software, programs, etc
2. If you need it, patch it frequently—scan, patch, scan
3. Always restrict users to the least privilege
Understand the Root Cause
During both eradication and recovery efforts, always work to develop a clear understanding of the incident’s root cause
This is critical to implementing a secure recovery that corrects control deficiencies that led to the original attack
Understanding the root cause of an attack is completely different than identifying the attacker
This process is also known as implementing compensating controls, because those controls compensate for the original security deficiency
NOTE: Root cause analysis can also help identify other systems in your environment that share the same vulnerability. If a Cisco router with a particular device configuration is compromised, you can go fix all your other Cisco routers that share it
Remediation and Reimaging
Once an attacker gains control of a system, consider it completely compromised and untrustworthy—it’s not safe to simply correct the security issue and move on, because the attacker can still have an undetected foothold on it
The system should be rebuilt, either from scratch or by using an image or backup of the system from a known, good, and secure state
Rebuilding and/or restoring should always be done with the incident root cause analysis in mind
* If a system was compromised because it contained a security vulnerability, backups and images of that system likely have the same vulnerability
* Rebuilding the system from scratch can reintroduce the same vulnerability as well, rendering the same system susceptible to the same attack
Patching Systems and Applications
During incident recovery, analysts will patch OS and apps involved in the attack
This is also a good opportunity to review the security patch status of all systems in the org and address other security issues lurking behind the scenes
First, focus your efforts on systems that were directly involved in the compromise, and then work your way outward
* As you go outward, address systems that were indirectly related to the compromise before touching systems that weren’t involved at all
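The inside-out patching order can be sketched as a simple sort. This assumes each system has already been labeled by its involvement in the incident. The `Involvement` levels, the `patch_order` function, and the host names are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List


class Involvement(IntEnum):
    """Lower values are patched first."""
    DIRECT = 0     # directly involved in the compromise
    INDIRECT = 1   # indirectly related to the compromise
    NONE = 2       # not involved at all


def patch_order(systems: Dict[str, Involvement]) -> List[str]:
    """Return host names sorted by involvement, then alphabetically."""
    return sorted(systems, key=lambda name: (systems[name], name))


# Hypothetical inventory.
inventory = {
    "hr-laptop": Involvement.NONE,
    "pos-web": Involvement.INDIRECT,
    "pos-db": Involvement.DIRECT,
}
queue = patch_order(inventory)
```

Here `queue` starts with the directly compromised host and ends with the uninvolved one.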
Sanitization and Secure Disposal
During the recovery effort, you may need to dispose of or repurpose media from systems that were compromised during the incident
In those cases, special care should be taken to ensure that sensitive information that was stored on the media isn’t compromised
There are three general options available for the secure disposition of media containing sensitive information, defined by NIST in SP 800-88 Guidelines for Media Sanitization:
1. Clear
2. Purge
3. Destroy
Dion’s Notes
* Cryptographic Erase (CE): For self-encrypting drives
* Zero-Fill: For magnetic drives
* Secure Erase (SE): For SSD
Page 409 flowchart
Clear
Applies logical techniques to sanitize data in all user-addressable storage locations for protection against simple noninvasive data recovery techniques
This is typically applied through standard Read and Write commands to the storage device:
* EX: Rewriting with a new value or using a menu option to reset the device to the factory state (where rewriting isn’t supported)
Purge
Applies physical or logical techniques that render target data recovery infeasible using state-of-the-art lab techniques
Examples of purging activities include:
* Overwriting
* Block erase
* Cryptographic erase
Degaussing
* Another form of purging that uses extremely strong magnetic fields to disrupt the data stored on a device
Destroy
Renders target data recovery infeasible using state-of-the-art lab techniques
Results in the subsequent inability to use the media for storage of data
Destruction techniques include:
* Disintegration
* Pulverization
* Melting
* Incinerating
* Drilling through the drive
Validating Data Integrity
Before concluding the recovery effort, incident responders should take time to verify that the recovery measures put in place were successful
The exact nature of this verification will depend on the technical circumstances of the incident and the org’s infrastructure
Five activities should always be included in these validation efforts:
Validate that only authorized user accounts exist on every system and app in the org
* Often, orgs already undertake periodic account reviews that verify the authorization for every account
* This process should be used during the recovery validation effort
Verify the proper restoration of permissions assigned to each account
* During the account review, responders should verify that accounts don’t have extraneous permissions that violate the least privilege principle
* This is true for normal user, admin, and service accounts
Verify the integrity of systems and data
* Confirm that systems involved in the incident are properly configured to meet security standards, and that no unauthorized changes have been made to settings or data
* You may need to restore systems from backups taken prior to the incident
Verify that all systems are logging properly
* Every system and app should be configured to log security-related information to a level that’s consistent with the org’s logging policy
* Log records should be sent to a centralized log repository that preserves them for archival use
* The validation phase should include verification that these logs are properly configured and received by the repository
Conduct vulnerability scans on all systems
* Vulnerability scans play an important role in verifying that systems are safeguarded against future attacks
* Run thorough scans against systems and initiate remediation workflows where necessary
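Two of the validation activities above (the account review and the integrity check) lend themselves to a short sketch. It assumes the org already has a list of authorized accounts and a baseline of known-good SHA-256 file hashes. The function names and sample data are hypothetical, and this doesn't replace dedicated account-review or file-integrity tooling.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Set


def unauthorized_accounts(present: Set[str], authorized: Set[str]) -> Set[str]:
    """Return accounts that exist on a system but are not on the authorized list."""
    return present - authorized


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(baseline: Dict[str, str], root: Path) -> List[str]:
    """Return relative paths that are missing or whose hash differs from the baseline."""
    changed = []
    for rel_path, expected in sorted(baseline.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(rel_path)
    return changed


# Hypothetical account review with made-up names.
extra = unauthorized_accounts(
    present={"alice", "svc_backup", "tmp_admin"},
    authorized={"alice", "svc_backup"},
)
```

Any account in `extra`, or any path returned by `changed_files`, is a candidate for follow-up before the recovery is declared complete.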
Closing Out the Incident Response
After containment, eradication, and recovery are complete, the CSIRT isn’t quite finished—you have to complete post-incident activities, including:
* Managing Change Control Processes
* Conducting a Lessons Learned Session
* Creating a Formal Written Incident Report
Managing Change Control Processes
During containment, eradication, and recovery, responders may have bypassed the org’s normal change control and configuration management processes in order to respond to the incident quickly
Once the urgency of response efforts passes, responders should turn back to these processes and use them to document any emergency changes made during the incident response effort
Conducting a Lessons Learned Session
At the conclusion of an incident, everyone involved in the response should participate in a formal lessons learned session that uncovers critical information about the response
The session should also highlight potential deficiencies in the incident response plan and procedures
During this session, the org may uncover potential changes to the incident response plan—in those cases, the leader should propose those changes and move them through the org’s formal change process to improve future incident response efforts
As part of the lessons learned review, the team should:
* Clearly identify any new IoC
* Make recommendations for updating the org’s security monitoring program to include those IoC
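As a rough illustration of folding new IoC into monitoring, the sketch below scans log lines for known indicator strings. It assumes IoC are plain strings such as IP addresses or domain names. The `match_iocs` function and the sample data are hypothetical, and plain substring matching can over-match (a short indicator also matches longer strings that contain it).

```python
from typing import Iterable, List, Set, Tuple


def match_iocs(log_lines: Iterable[str], iocs: Set[str]) -> List[Tuple[int, str]]:
    """Return (line_number, ioc) pairs for every IoC string found in a log line."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for ioc in sorted(iocs):  # sorted for a stable, repeatable order
            if ioc in line:
                hits.append((number, ioc))
    return hits


# Hypothetical log lines and indicators (documentation addresses and domains).
sample_log = [
    "GET /index.html from 192.0.2.10",
    "GET /shell.php from 203.0.113.45",
    "DNS query for evil.example",
]
new_iocs = {"203.0.113.45", "evil.example"}
hits = match_iocs(sample_log, new_iocs)
```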
Developing a Final Report
Every incident that activates the CSIRT should conclude with a formal written report that documents the incident
This serves several important purposes:
1. Creates an institutional memory of the incident that’s useful when developing new security controls and training new team members
2. Serves as an important record of the incident if there’s legal action as a result of the incident
3. The act of creating the report can help identify previously undetected deficiencies in the incident response process
Important elements that the report should include:
* Chronology of events for the incident and response efforts
* Root cause of the incident
* Location and description of evidence collected during the incident response process
* Specific actions taken by responders to contain, eradicate, and recover from the incident, including the rationale for those decisions
* Estimates of the impact of the incident on the org and its stakeholders
* Results of post-recovery validation efforts
* Documentation of issues identified during the lessons learned review
NOTE: Reports should be classified in accordance with the org’s classification policy and stored in a secured manner