Week 6: Problem Management Flashcards
What is a Post-Incident Review for?
To discuss:
- Why an incident happened
- Its impacts
- Actions taken to resolve it
- How to prevent it from happening again
What would be considered a “risk”?
Any potential incident
How are risks classified?
Based on severity and likelihood
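As an illustration, classification by severity and likelihood is often drawn as a risk matrix. The sketch below is one possible version, not a standard: the three-level scale, the multiplication of the two values, and the thresholds are all assumptions chosen for the example.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale used for both severity and likelihood."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def classify_risk(severity: Level, likelihood: Level) -> str:
    """Rate a risk from the product of severity and likelihood.

    The thresholds (6 and 3) are assumptions for this sketch; a real
    organisation would agree on its own matrix.
    """
    score = severity * likelihood  # possible scores: 1, 2, 3, 4, 6, 9
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


# Example: a severe but unlikely risk lands in the middle band.
print(classify_risk(Level.HIGH, Level.LOW))
```

A lookup table keyed on (severity, likelihood) would work equally well and makes it easier to tune individual cells of the matrix.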
What are the ways to manage risk?
- Avoidance
- Mitigation
- Transference
- Acceptance
What is RTO?
Recovery time objective - the maximum agreed period of time that can elapse after a service disruption before business functions are severely impacted.
What is RPO?
Recovery point objective - the point to which information used by a business activity must be restored so that the activity can operate when the service resumes.
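A small worked example may help keep RTO and RPO apart. The sketch below assumes that RTO is checked against the length of the outage and that RPO is expressed as a maximum tolerable window of lost data, measured from the last good backup to the start of the outage. The function names, timestamps, and targets are hypothetical.

```python
from datetime import datetime, timedelta


def meets_rto(outage_start: datetime, service_restored: datetime,
              rto: timedelta) -> bool:
    """True if the service came back within the agreed recovery time."""
    return service_restored - outage_start <= rto


def meets_rpo(last_good_backup: datetime, outage_start: datetime,
              rpo: timedelta) -> bool:
    """True if the data lost since the last backup fits the agreed window.

    Assumption: RPO is treated here as a duration of acceptable data loss.
    """
    return outage_start - last_good_backup <= rpo


# Hypothetical incident: outage at 10:00, service restored at 11:30,
# last good backup taken at 09:15.
outage = datetime(2024, 1, 1, 10, 0)
restored = datetime(2024, 1, 1, 11, 30)
backup = datetime(2024, 1, 1, 9, 15)

rto_ok = meets_rto(outage, restored, rto=timedelta(hours=2))       # 1h30 downtime
rpo_ok = meets_rpo(backup, outage, rpo=timedelta(minutes=30))      # 45 min of data lost
print(rto_ok, rpo_ok)
```

In this made-up case the recovery time target is met, but 45 minutes of data is lost against a 30-minute target, so the recovery point target is missed.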
What constitutes an “event”?
An observable occurrence that might indicate a POTENTIAL problem. It could also be a change in the status of a system, service, or application, usually in the form of log entries, notifications, or alerts.
What is the difference between an event and an incident?
Events are simply observable occurrences that could indicate a potential issue, while incidents are events that have already caused a negative impact on system performance and therefore require resolution.
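The distinction can be sketched in code: every incident starts as an event, but only events with a negative impact are treated as incidents. The Event class, its fields, and the sample messages are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """An observable occurrence, e.g. a log entry or an alert (hypothetical model)."""
    source: str
    message: str
    negative_impact: bool = False  # assumed flag: has the service already been degraded?


def incidents(events: list[Event]) -> list[Event]:
    """Keep only the events that have already caused a negative impact."""
    return [e for e in events if e.negative_impact]


observed = [
    Event("disk-monitor", "Disk usage at 80%"),                        # event only
    Event("web-server", "Service restarted after config change"),      # event only
    Event("web-server", "Checkout page returning errors", negative_impact=True),
]

for incident in incidents(observed):
    print(incident.source, "-", incident.message)
```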
What are the phases of problem management?
- Problem identification and logging
- Problem control
- Error control
What is the troubleshooting process in problem management?
Define, Gather, Determine, Recommend
When is the best time to stop root cause analysis (RCA)?
When you reach a cause that you have control over improving
What are two possible methods for performing RCA?
- Timeline analysis
- Fishbone diagram (Ishikawa, cause-and-effect)