recover Flashcards

Question 1

Q

what is an event

Answer

A

any change of state that has significance for the management of a service. Typically, events are notifications from monitoring tools

Question 2

Q

what is an incident

Answer

A

an unplanned interruption to a service or reduction in the quality of a service.

Question 3

Q

what is a problem
- what is a known error

Answer

A

a cause, or potential cause, of one or more incidents

known errors are problems that have been analysed but not resolved

Question 4

Q

what is incident management
- purpose?

Answer

A

to minimise the negative impacts of incidents by restoring normal service operation as quickly as possible
diagnose and escalate
reactive process
not a proactive measure

Question 5

Q

what is problem management
- purpose?

Answer

A

reduce likelihood and impact of incidents by identifying actual and potential causes of incidents and managing workarounds and known errors
reactive and proactive
typical signs: same incident occurring many times; incident affects many users

Question 6

Q

incident management process (4)

Answer

A

identify
log
categorise
prioritise

Question 7

Q

incident identification

Answer

A

come from:
1. users: walk-ups, self-service, emails, etc
2. alerts: application monitoring software

decide if issue is an incident OR request

Question 8

Q

incident logging

Answer

A

include:
- user’s name and contact information
- incident description
- date and time of incident
- date and time of incident report
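
A minimal sketch of how the four logged items could be held in one record. The class name `IncidentLog`, its field names, and the sample values are assumptions made for illustration, not part of any particular ticketing tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentLog:
    """Hypothetical log record holding the items listed on this card."""
    user_name: str
    user_contact: str
    description: str
    occurred_at: datetime   # date and time of the incident
    reported_at: datetime   # date and time of the incident report

    def reporting_delay_minutes(self) -> float:
        """Minutes between the incident happening and it being reported."""
        return (self.reported_at - self.occurred_at).total_seconds() / 60


# Made-up sample incident
log = IncidentLog(
    user_name="A. User",
    user_contact="a.user@example.com",
    description="Cannot log in to the payroll application",
    occurred_at=datetime(2024, 3, 1, 9, 0),
    reported_at=datetime(2024, 3, 1, 9, 25),
)
print(log.reporting_delay_minutes())
```

Keeping both timestamps separate makes it possible to measure how long incidents go unreported.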

Question 9

Q

incident categorisation
- purpose?

Answer

A

assigning a category + at least 1 subcategory
purpose: allows incidents to be sorted and modelled; enables automatic prioritisation; supports accurate incident tracking so that patterns can emerge
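
As a rough illustration of how categories let patterns emerge, the sketch below counts tickets by (category, subcategory) pair. The ticket data and the helper name `top_patterns` are invented for this example.

```python
from collections import Counter

# Each ticket carries a category and at least one subcategory (made-up data).
tickets = [
    ("Network", "VPN"),
    ("Software", "Email"),
    ("Network", "VPN"),
    ("Hardware", "Laptop"),
    ("Network", "Wi-Fi"),
]


def top_patterns(tickets, n=1):
    """Return the n most frequent (category, subcategory) pairs with counts."""
    return Counter(tickets).most_common(n)


print(top_patterns(tickets))
```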

Question 10

Q

incident prioritisation

Answer

A

determined by:
1. impact on users and the business: measure extent of potential damage
2. urgency: how quickly a resolution is required to reduce business impact
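
One common way to combine the two factors is a priority matrix. The sketch below assumes three levels for each factor and a 1 to 5 priority scale; the level names, the scale, and the function `priority` are illustrative assumptions, and real organisations define their own matrix.

```python
# Assumed three-level scale for both impact and urgency (1 = highest).
LEVELS = {"high": 1, "medium": 2, "low": 3}


def priority(impact: str, urgency: str) -> int:
    """Return a priority from 1 (handle first) to 5 (handle last)."""
    return LEVELS[impact] + LEVELS[urgency] - 1


print(priority("high", "high"))
print(priority("low", "medium"))
```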

Question 11

Q

incident tracking status (6)

Answer

A

New
Assigned
In progress
On hold
Resolved
Closed
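
The six statuses can be modelled as a small state machine. The statuses come from this card; the allowed transitions in `ALLOWED` are an assumption chosen for illustration, since the card does not say which moves are valid.

```python
from enum import Enum


class Status(Enum):
    NEW = "New"
    ASSIGNED = "Assigned"
    IN_PROGRESS = "In progress"
    ON_HOLD = "On hold"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


# Assumed transitions; a real tool may permit others (e.g. reassignment).
ALLOWED = {
    Status.NEW: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.ON_HOLD, Status.RESOLVED},
    Status.ON_HOLD: {Status.IN_PROGRESS},
    Status.RESOLVED: {Status.CLOSED, Status.IN_PROGRESS},  # reopen if the fix fails
    Status.CLOSED: set(),
}


def can_move(current: Status, target: Status) -> bool:
    """True if the assumed workflow permits moving from current to target."""
    return target in ALLOWED[current]


print(can_move(Status.NEW, Status.ASSIGNED))
print(can_move(Status.CLOSED, Status.NEW))
```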

Question 12

Q

post incident review

Answer

A

check users’ perception
check business process and infrastructure metrics
decide if an underlying problem exists and raise a ticket if necessary (problem management)

Question 13

Q

incident communication

Answer

A

find out what happened
escalation
updates
reporting incident impact and resolution
confirming the resolution with the users

Question 14

Q

user satisfaction surveys
- why?
- success?

Answer

A

good method of monitoring user perception and expectations
key points for success: scope, define, conduct, understand, publish, translate, follow through

Question 15

Q

incident report

Answer

A

basic summary: ticket number, description, impact, resolution time
causes found: technical analysis
actions taken: short-term workarounds, improvements to avoid similar occurrences
post-incident follow up: measurements taken after the fix, eliminate root cause/problem tickets raised, user surveys

Question 16

Q

summary report
- purpose?
- includes?

Answer

A

ensure incident management is effective

includes:
- number of incidents
- average resolution time
- types of incidents reported
- % of incidents handled within the agreed response time
- % closed by service desk without escalation
- summarise in non-technical language and show where improvements could be made
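
A sketch of how the figures above might be computed from a list of closed tickets. The dictionary keys (`type`, `resolution_hours`, `within_agreed_time`, `escalated`) and the sample data are assumptions for illustration only.

```python
from collections import Counter
from statistics import mean

# Made-up closed tickets; field names are hypothetical.
incidents = [
    {"type": "network", "resolution_hours": 2.0, "within_agreed_time": True, "escalated": False},
    {"type": "software", "resolution_hours": 6.0, "within_agreed_time": False, "escalated": True},
    {"type": "network", "resolution_hours": 1.0, "within_agreed_time": True, "escalated": False},
    {"type": "hardware", "resolution_hours": 3.0, "within_agreed_time": True, "escalated": True},
]


def summarise(incidents):
    """Compute the headline figures for a summary report."""
    total = len(incidents)
    return {
        "count": total,
        "avg_resolution_hours": mean(i["resolution_hours"] for i in incidents),
        "by_type": dict(Counter(i["type"] for i in incidents)),
        "pct_within_agreed_time": 100 * sum(i["within_agreed_time"] for i in incidents) / total,
        "pct_closed_without_escalation": 100 * sum(not i["escalated"] for i in incidents) / total,
    }


report = summarise(incidents)
print(report)
```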

Question 17

Q

security concerns

Answer

A

incident may occur due to security event (unauthorised access, virus, cyber attack)
elevated system access may need to be granted to resolve incident
data may be lost/leaked

Question 18

Q

support team’s role in IM

Answer

A

Receive and communicate all incidents
Filter out service/change requests
Resolve or escalate incidents as appropriate
Confirm and close tickets
Analyse incident logs
Report on incident trends and suggest improvements

Question 19

Q

types of root causes

Answer

A

(special cause) random root cause:
- hard to track down and fix
- log but no action unless occurs again
(random cause) root cause will produce more incidents if not fixed:
- problems
- find and fix

Question 20

Q

risks

Answer

A

potential incidents that have not manifested yet

Question 21

Q

risk management
- purpose?
- how?

Answer

A

any potential incident is a risk and should be considered as early as possible
ensure reliable enterprise solutions
avoid/mitigate/transfer/accept

Question 22

Q

risk classification

Answer

A

severity (business impact)
likelihood (probability of the event happening)
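
The two dimensions are often multiplied into a single score. The sketch below assumes a 1 to 5 scale for each dimension and arbitrary thresholds for "high", "medium", and "low"; both the scale and the thresholds are illustrative assumptions, not values from the course.

```python
def classify_risk(severity: int, likelihood: int) -> str:
    """Classify a risk from severity and likelihood, each rated 1 to 5 (assumed scale)."""
    if not (1 <= severity <= 5 and 1 <= likelihood <= 5):
        raise ValueError("severity and likelihood must be between 1 and 5")
    score = severity * likelihood
    if score >= 15:      # assumed threshold
        return "high"
    if score >= 6:       # assumed threshold
        return "medium"
    return "low"


print(classify_risk(5, 4))
print(classify_risk(1, 3))
```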

Question 23

Q

RTO

Answer

A

recovery time objectives
- maximum agreed acceptable period of time following a service disruption that can elapse before business functions are severely impacted
- how long to recover?

Question 24

Q

RPO

Answer

A

recovery point objectives
- the point to which information used by a business activity must be restored to enable the activity to operate on resumption of the service
- how far back is the last point at which data is in a usable format?
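
To make RTO and RPO concrete, the sketch below checks one outage against both objectives. The function names, timestamps, and the 4-hour RTO and 1-hour RPO are made-up values for illustration.

```python
from datetime import datetime, timedelta


def meets_rto(outage_start: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """RTO: was service restored within the agreed recovery time?"""
    return service_restored - outage_start <= rto


def meets_rpo(outage_start: datetime, last_good_backup: datetime, rpo: timedelta) -> bool:
    """RPO: is the newest usable data recent enough, i.e. is the data-loss window small enough?"""
    return outage_start - last_good_backup <= rpo


outage = datetime(2024, 3, 1, 10, 0)
restored = datetime(2024, 3, 1, 13, 0)   # 3 hours of downtime
backup = datetime(2024, 3, 1, 8, 30)     # 1.5 hours of data at risk

print(meets_rto(outage, restored, timedelta(hours=4)))
print(meets_rpo(outage, backup, timedelta(hours=1)))
```

In this example the recovery was fast enough for the RTO, but the last backup was too old for the RPO.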

Question 25

Q

phases of problem management

Answer

A

problem identification
problem control
error control

Question 26

Q

problem identification

Answer

A

detect duplicate and recurring issues
during major incident, identify risk that an incident could recur
analyse information received that may point to problems, such as security risks, vendor reports, and input from quality assurance teams

Question 27

Q

problem control

Answer

A

problem analysis (RCA) / troubleshooting
documenting workarounds
documenting known errors

Question 28

Q

troubleshooting process

Answer

A

define problem statement
gather information, data, etc
determine the cause (root cause analysis)
recommend solutions for eliminating or mitigating the problem

Question 29

Q

RCA

Answer

A

root cause analysis
- systematic process for identifying ‘root causes’ of problems/incidents and an approach for responding to them
- prevent problems
- pinpoint contributing factors to a problem
- creates RCI & RCR

Question 30

Q

RCA - time analysis

Answer

A

understand what happened and ensure all information is available
get data, sort by date and time, list in time order, then look for patterns
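
A small sketch of the time-ordering step: parse timestamps from different sources and sort them into one timeline. The events, the timestamp format, and the helper name `build_timeline` are assumptions for illustration.

```python
from datetime import datetime

# Made-up events collected from different sources, in arrival order.
raw_events = [
    ("2024-03-01 09:12", "app", "login failures reported"),
    ("2024-03-01 08:55", "db", "connection pool exhausted"),
    ("2024-03-01 08:40", "deploy", "config change released"),
]


def build_timeline(events):
    """Parse the timestamps and return the events in chronological order."""
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M"), src, msg) for ts, src, msg in events]
    return sorted(parsed, key=lambda event: event[0])


timeline = build_timeline(raw_events)
for when, source, message in timeline:
    print(when.strftime("%H:%M"), source, message)
```

Listing the events in time order puts the config change ahead of the database and login symptoms, which is the kind of pattern this analysis is meant to surface.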

Question 31

Q

RCA - fishbone diagram

Answer

A

helps to understand and visualise relationships between causes
helps with troubleshooting documentation
progressively break down potential causes of a problem
1. causes are grouped into categories
2. create possible causes under each category
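
A fishbone diagram can be captured as a simple nested structure for documentation. The problem statement, the category names, and the causes below are invented for this sketch; the `render` helper is likewise hypothetical.

```python
# Hypothetical fishbone for an example problem; categories and causes are made up.
fishbone = {
    "problem": "Users cannot log in",
    "categories": {
        "People": ["password recently changed", "account locked after retries"],
        "Process": ["no rollback step in release checklist"],
        "Technology": ["expired TLS certificate", "authentication server overloaded"],
    },
}


def render(diagram: dict) -> str:
    """Render the diagram as an indented outline: problem, category, cause."""
    lines = [diagram["problem"]]
    for category, causes in diagram["categories"].items():
        lines.append(f"  {category}")
        lines.extend(f"    - {cause}" for cause in causes)
    return "\n".join(lines)


print(render(fishbone))
```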

Question 32

Q

problem response: troubleshoot recommendation

Answer

A

design solution based on analysis
decide & plan implementation
follow change process

Question 33

Q

what is a workaround

Answer

A

solution that reduces or eliminates the impact of an incident/problem for which a full resolution is not yet available

Question 34

Q

error control

Answer

A

manage known errors
identify potential permanent solution
regularly reassess the status of known errors not yet resolved

Question 35

Q

disaster recovery

Answer

A

aims to protect an org from effects of significantly negative events
allows org to maintain or quickly resume mission-critical functions following a disaster

Question 36

Q

ESM role in disaster recovery

Answer

A

Escalating if a situation looks like a potential disaster
Help test DR plans
Check critical business processes
Triage incidents
Check if back to normal
