Lesson 11 A Implement Best Practice Documentation Flashcards
A standard operating procedure (SOP)
Employees must understand how to use computers and networked services securely and safely and be aware of their responsibilities. To support this, the organization needs to create written policies and procedures to help staff understand and fulfill their tasks and follow best practice:
- A policy is an overall statement of intent.
- A standard operating procedure (SOP) is a step-by-step list of the actions that must be completed for any given task to comply with policy. Most IT procedures should be governed by SOPs.
- Guidelines are for areas of policy where there are no procedures, either because the situation has not been fully assessed or because the decision-making process is too complex and subject to variables to be able to capture it in an SOP. Guidelines may also describe circumstances where it is appropriate to deviate from a specified procedure.
Typical examples of SOPs are as follows:
- Procedures for custom installation of software packages, such as verifying system requirements, validating download/installation source, confirming license validity, adding the software to change control/monitoring processes, and developing support/training documentation.
- New-user setup checklist as part of the onboarding process for new employees and employees changing job roles. Typical tasks include identification/enrollment with secure credentials, allocation of devices, and allocation of permissions/assignment to security groups.
- End-user termination checklist as part of the offboarding process for employees who are retiring, changing job roles, or have been fired. Typical tasks include returning and sanitizing devices, releasing software licenses, and disabling account permissions/access.
ticketing system
A ticketing system manages requests, incidents, and problems. Ticketing systems can be used to support both internal end-users and external customers.
The general process of ticket management is as follows:
- A user contacts the help desk, perhaps by phone or email or directly via the ticketing system. A unique job ticket ID is generated, and an agent is assigned to the ticket. The ticket will also need to capture some basic details:
  - User information: The user’s name, contact details, and other relevant information such as department or job role. It might be possible to link the ticket to an employee database or customer relationship management (CRM) database.
  - Device information: If relevant, the ticket should record information about the user’s device. It might be possible to link to the relevant inventory record via a service tag or asset ID.
- The user supplies a description of the issue. The agent might ask clarifying questions to ensure an accurate initial description.
- The agent categorizes the support case, assesses how urgent it is, and determines how long it will take to fix.
- The agent may take the user through initial troubleshooting steps. If these do not work, the ticket may be escalated to deskside support or a senior technician.
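The intake steps above can be sketched as a small record type. This is only an illustration, not the schema of any particular ticketing product: the names `Ticket` and `open_ticket`, the field names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Hypothetical in-memory ID source; a real system would rely on its database.
_ticket_ids = count(1)


@dataclass
class Ticket:
    ticket_id: int                  # unique job ticket ID
    agent: str                      # agent assigned to the ticket
    user_name: str                  # user information
    user_contact: str
    description: str                # issue as described by the user
    asset_id: Optional[str] = None  # link to an inventory record, if relevant
    category: Optional[str] = None  # set when the agent categorizes the case
    escalated: bool = False         # set if initial troubleshooting fails


def open_ticket(agent: str, user_name: str, user_contact: str,
                description: str, asset_id: Optional[str] = None) -> Ticket:
    """Create a ticket with the next unique ID and the basic intake details."""
    return Ticket(next(_ticket_ids), agent, user_name, user_contact,
                  description, asset_id)


first = open_ticket("agent.lee", "Sam Jones", "sam@example.com",
                    "Laptop will not connect to Wi-Fi", asset_id="LT-0042")
second = open_ticket("agent.lee", "Ana Ruiz", "ana@example.com",
                     "Request for a second monitor")
```

Keeping the asset ID as an optional reference, rather than copying device details into the ticket, mirrors the idea of linking to the inventory record.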
Categories
Categories and subcategories group related tickets together. This is useful for assigning tickets to the relevant support section or technician and for reporting and analysis.
Service management standards distinguish between the following basic ticket types:
- Requests are for provisioning things that the IT department has an SOP for, such as setting up new user accounts, purchasing new hardware or software, deploying a web server, and so on. Complex requests that aren’t covered by existing procedures are better treated as projects rather than handled via the ticketing system.
- Incidents are related to any errors or unexpected situations faced by end-users or customers. Incidents may be further categorized by severity (impact and urgency), such as minor, major, and critical.
- Problems are causes of incidents and will probably require analysis and service reconfiguration to solve. This type of ticket is likely to be generated internally when the help desk starts to receive many incidents of the same type.
Using these types as top-level categories for an end-user facing system is not always practical, however. End-users are not likely to know how to distinguish incidents from problems, for example. Devising categories that are narrow enough to be useful but not so numerous as to be confusing or to slow down the whole ticketing process is a challenging task.
One strategy is to offer a few simple, top-level categories that end-users can self-select, such as New Device Request, New App Request, Employee Onboarding, Employee Offboarding, Help/Support, and Security Incident. Then, when assigned to the ticket, the support technician can select from a longer list of additional categories and subcategories to help group related tickets for reporting and analysis purposes. Alternatively, or to supplement categories, the system might support adding standard keyword tags to each ticket. A keyword system is more flexible but does depend on each technician tagging the ticket appropriately.
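As a rough sketch of how self-selected categories and keyword tags might work together, the example below validates the top-level category and then groups tickets by tag for reporting. The function name `group_by_tag`, the dictionary layout, and the sample tickets are assumptions made for illustration.

```python
from collections import defaultdict

# Top-level categories taken from the examples in the text.
TOP_LEVEL = {"New Device Request", "New App Request", "Employee Onboarding",
             "Employee Offboarding", "Help/Support", "Security Incident"}


def group_by_tag(tickets):
    """Map each keyword tag to the IDs of the tickets that carry it."""
    groups = defaultdict(list)
    for ticket in tickets:
        if ticket["category"] not in TOP_LEVEL:
            raise ValueError(f"unknown category: {ticket['category']}")
        for tag in ticket["tags"]:
            # Normalize case so "VPN" and "vpn" land in the same group.
            groups[tag.lower()].append(ticket["id"])
    return dict(groups)


tickets = [
    {"id": 101, "category": "Help/Support", "tags": ["VPN", "laptop"]},
    {"id": 102, "category": "Help/Support", "tags": ["vpn"]},
    {"id": 103, "category": "New App Request", "tags": ["licensing"]},
]
by_tag = group_by_tag(tickets)
```

Normalizing tag case is one small way to reduce the dependence on every technician tagging in exactly the same style.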
TICKET MANAGEMENT
After opening an incident or problem ticket, the troubleshooting process is applied until the issue is resolved. At each stage, the system must track the ownership of the ticket (who is dealing with it) and its status (what has been done).
This process requires clear written communication and might involve tracking through different escalation routes.
Severity
A severity level is a way of classifying tickets into a priority order. As with categories, these should not be overcomplex. For example, three severity levels based on impact might be considered sufficient:
- Critical incidents have a widespread effect on customers or involve a potential or actual data breach.
- Major incidents affect a limited group of customers or involve a suspected security violation.
- Minor incidents do not have a significant effect on customer groups.
More discrete levels may be required if the system must prioritize hundreds or thousands of minor incidents per week. A more sophisticated system that measures both impact and urgency might be required. Severity levels can also drive a notification system to make senior technicians and managers immediately aware of major and critical incidents as they arise.
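The three impact-based levels described above could be expressed as a simple classification rule. This is a minimal sketch: the `Severity` enum, the `classify` and `should_notify` functions, and the string labels for scope and security status are hypothetical, and a real system might weigh urgency as well.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


def classify(scope: str, security: str = "none") -> Severity:
    """Return a severity from customer scope and security status.

    scope:    "none", "limited", or "widespread" (hypothetical labels)
    security: "none", "suspected", or "breach"   (hypothetical labels;
              "breach" covers a potential or actual data breach)
    """
    if scope == "widespread" or security == "breach":
        return Severity.CRITICAL
    if scope == "limited" or security == "suspected":
        return Severity.MAJOR
    return Severity.MINOR


def should_notify(severity: Severity) -> bool:
    """Major and critical incidents trigger an immediate notification."""
    return severity >= Severity.MAJOR
```

Using an ordered enum means tickets can be sorted by severity and the notification threshold can be expressed as a single comparison.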
Escalation
Escalation occurs when an agent cannot resolve the ticket. Some of the many reasons for escalation include:
- The incident is related to a problem and requires analysis by senior technicians or by a third-party/warranty support service.
- The incident severity needs to be escalated from minor to major or major to critical and now needs the involvement of senior decision-makers.
- The incident needs the involvement of sales or marketing to deal with service complaints or refund requests.
The support team can be organized into tiers to clarify escalation levels. For example:
- Tier 0 presents self-service options for the customer to try to resolve an incident via advice from a knowledge base or “help bot.”
- Tier 1 connects the customer to an agent for initial diagnosis and possible incident resolution.
- Tier 2 allows the agent to escalate the ticket to senior technicians (Tier 2 – Internal) or to a third-party support group (Tier 2 – External).
- Tier 3 escalates the ticket as a problem to a development/engineer team or to senior managers and decision-makers.
The ticket owner is the person responsible for managing the ticket. When a ticket is escalated, ownership may or may not be reassigned. Whatever system is used, it is critical to identify the current owner. The owner must ensure that the ticket is progressed to meet any deadlines and that the ticket requester is kept informed of status.
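One way to picture tiered escalation with explicit ownership is sketched below. The `Tier` and `TicketState` names, the one-tier-at-a-time rule, and the history format are assumptions for illustration, not a description of how any specific help desk product behaves.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Tier(IntEnum):
    SELF_SERVICE = 0   # knowledge base or help bot
    AGENT = 1          # initial diagnosis and possible resolution
    SENIOR = 2         # senior technicians or third-party support
    ENGINEERING = 3    # development/engineering team or senior managers


@dataclass
class TicketState:
    owner: str
    tier: Tier = Tier.AGENT
    history: List[str] = field(default_factory=list)

    def escalate(self, new_owner: Optional[str] = None) -> None:
        """Move up one tier; ownership changes only if new_owner is given."""
        if self.tier == Tier.ENGINEERING:
            raise ValueError("already at the highest tier")
        self.tier = Tier(self.tier + 1)
        if new_owner is not None:
            self.owner = new_owner
        # The current owner is recorded at every step so it is never ambiguous.
        self.history.append(
            f"escalated to tier {int(self.tier)}, owner {self.owner}")


state = TicketState(owner="agent.lee")
state.escalate(new_owner="senior.kim")   # ownership reassigned
state.escalate()                         # ownership retained
```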
Clear Written Communication
Free-form text fields allow ticket requesters and agents to add descriptive information. There are normally three fields to reflect the ticket life cycle:
- Problem description records the initial request with any detail that could easily be collected at the time.
- Progress notes record what diagnostic tools and processes have discovered and the identification and confirmation of a probable cause.
- Problem resolution sets out the plan of action and documents the successful implementation and testing of that plan and full system functionality. It should also record end-user or customer acceptance that the ticket can be closed.
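The three free-form fields can be modeled as a small structure, together with a check that a ticket is only closed once a resolution is recorded and accepted. The `TicketNotes` class, its field names, and the closing rule as coded here are a hypothetical sketch of the life cycle described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TicketNotes:
    problem_description: str                        # initial request
    progress_notes: List[str] = field(default_factory=list)
    problem_resolution: str = ""                    # plan, implementation, testing
    accepted_by_requester: bool = False             # end-user or customer acceptance

    def can_close(self) -> bool:
        """A ticket is closable once a resolution is recorded and accepted."""
        return bool(self.problem_resolution.strip()) and self.accepted_by_requester


notes = TicketNotes("User cannot print to the third-floor printer.")
notes.progress_notes.append("Print queue stalled; spooler restart cleared it.")
closable_before = notes.can_close()

notes.problem_resolution = "Restarted spooler; test page printed."
notes.accepted_by_requester = True
closable_after = notes.can_close()
```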
At any point in the ticket life cycle, other agents, technicians, or managers may need to decide something or continue a troubleshooting process using just the information in the ticket. Tickets are likely to be reviewed and analyzed. It is also possible that tickets will be forwarded to customers as a record of the jobs performed. Consequently, it is important to use clear and concise written communication to complete description and progress fields, with due regard for spelling, grammar, and style.
- Clear means using plain language rather than jargon.
- Concise means using as few words as possible in short sentences. State the minimum of fact and action required to describe the issue or process.
Incident Reports
For critical and major incidents, it may be appropriate to develop a more in-depth incident report, also referred to as an after-action report (AAR) or as lessons learned. An incident report solicits the opinions of users/customers, technicians, managers, and stakeholders with some business or ownership interest in the problem being investigated. The purpose of an incident report is to identify underlying causes and recommend remediation steps or preventive measures to mitigate the risk of a repeat of the issue.
Asset management uses a catalog of hardware and software to implement life-cycle policies and procedures for provisioning, maintaining, and decommissioning all the systems that underpin IT services.
It is crucial for an organization to have an inventory list of its tangible and intangible assets and resources. The tangible inventory should include all hardware that is currently deployed as well as spare systems and components kept on hand in case of component or system failure. The intangible asset inventory includes software licenses and data assets, such as intellectual property (IP).
asset-management database system
There are many software solutions available for tracking and managing inventory. An asset-management database system can be configured to store details such as type, model, serial number, asset ID, location, user(s), value, and service information. An inventory management suite can scan the network and use queries to retrieve hardware and software configuration and monitoring data.
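The details listed above map naturally onto a record type. The sketch below is illustrative only: `AssetRecord`, its field names, and the sample inventory entries are invented, and a real asset-management database would define its own schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssetRecord:
    asset_id: str
    asset_type: str
    model: str
    serial_number: str
    location: str
    users: List[str] = field(default_factory=list)
    value: float = 0.0
    service_info: str = ""


# Invented sample data for illustration.
inventory = [
    AssetRecord("LT-0042", "laptop", "Example 14", "SN123", "HQ-2F", ["sjones"], 1200.0),
    AssetRecord("PR-0007", "printer", "Example Jet", "SN456", "HQ-3F", value=400.0),
    AssetRecord("LT-0043", "laptop", "Example 14", "SN789", "Branch-1", ["aruiz"], 1200.0),
]

# Simple queries of the kind an inventory report might run.
laptops = [a.asset_id for a in inventory if a.asset_type == "laptop"]
total_value = sum(a.value for a in inventory)
```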
Asset Tags and IDs
For an inventory database to work, each instance of an asset type must be defined as a record with a unique ID. The physical hardware device must be tagged with this ID so that it can be identified in the field. An asset tag can be affixed to a device as a barcode label or radio frequency ID (RFID) sticker. Barcodes allow for simpler scanning than numeric-only IDs. An RFID tag is a chip programmed with asset data. When in range of a scanner, the chip powers up and signals the scanner. The scanner alerts management software to update the device’s location. As well as keeping the inventory current, this location tracking makes theft more difficult.
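The two requirements described above, a unique ID per asset and a location update whenever a tag is scanned, can be sketched as a small registry. The `Inventory` class and its method names are hypothetical; the sketch does not model any real RFID or barcode scanner interface.

```python
class Inventory:
    """Hypothetical registry keyed by asset ID."""

    def __init__(self):
        self._locations = {}   # asset_id -> last known location

    def register(self, asset_id: str, location: str) -> None:
        """Add a new asset; reject an ID that is already in use."""
        if asset_id in self._locations:
            raise ValueError(f"duplicate asset ID: {asset_id}")
        self._locations[asset_id] = location

    def record_scan(self, asset_id: str, scanner_location: str) -> None:
        """Update the location when a tagged asset is seen by a scanner."""
        if asset_id not in self._locations:
            raise KeyError(f"unknown asset ID: {asset_id}")
        self._locations[asset_id] = scanner_location

    def locate(self, asset_id: str) -> str:
        return self._locations[asset_id]


inv = Inventory()
inv.register("LT-0042", "HQ-2F")
inv.record_scan("LT-0042", "HQ-Lobby")
```

A scan of an unregistered ID raises an error rather than silently creating a record, which is one possible way to surface untagged or unknown devices.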
Network Topology Diagrams
A diagram is the best way to show how assets are used in combination to deliver a service. In particular, a network topology diagram shows how assets are linked as nodes. A topology diagram can be used to model physical and logical relationships at different levels of scale and detail.
In terms of the physical network topology, a schematic diagram shows the cabling layout between hosts, wall ports, patch panels, and switch/router ports. Schematics can also be used to represent the logical structure of the network in terms of security zones, virtual LANs (VLANs), and Internet Protocol (IP) subnets.
Schematics can either be drawn manually by using a tool such as Microsoft Visio or compiled automatically from network mapping software.
ASSET DOCUMENTATION
An asset procurement life cycle identifies discrete stages in the use of hardware and software:
- Change procedures approve a request for a new or upgraded asset, taking account of impacts to business, operation, network, and existing devices.
- Procurement determines a budget and identifies a trusted supplier or vendor for the asset.
- Deployment implements a procedure for installing the asset in a secure configuration.
- Maintenance implements a procedure for monitoring and supporting the use of the asset.
- Disposal implements a procedure for sanitizing any data remnants that might be stored on the asset before reusing, selling, donating, recycling, or destroying the asset.
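The life-cycle stages above can be sketched as an ordered set of states. This is a simplification under stated assumptions: the `Stage` enum and `advance` function are hypothetical, the stages are treated as strictly linear, and sanitization is modeled as a flag that must be set before an asset enters disposal.

```python
from enum import Enum


class Stage(Enum):
    CHANGE = 1        # request approved through change procedures
    PROCUREMENT = 2   # budget set, trusted supplier identified
    DEPLOYMENT = 3    # installed in a secure configuration
    MAINTENANCE = 4   # monitored and supported in use
    DISPOSAL = 5      # sanitized, then reused, sold, donated, recycled, or destroyed


def advance(stage: Stage, sanitized: bool = False) -> Stage:
    """Return the next stage, refusing disposal of an unsanitized asset."""
    if stage is Stage.DISPOSAL:
        raise ValueError("asset has already reached disposal")
    next_stage = Stage(stage.value + 1)
    if next_stage is Stage.DISPOSAL and not sanitized:
        raise ValueError("sanitize data remnants before disposal")
    return next_stage
```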