
5.6 Incident Management activities

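This section discusses in more detail the six activities encapsulated in Section 5.3 (Basic concepts), namely: Incident detection and recording; classification and initial support; investigation and diagnosis; resolution and recovery; Incident closure; and ownership, monitoring, tracking and communication. Each of these is covered in its own subsection below.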

5.6.1 Incident detection and recording

Incident details from Service Desk or event management systems are the inputs for Incident Management. Resultant actions are to:

Outputs will be:

All Incidents should be recorded: automatic generation of 'skeleton Incident records' in an Incident database by a system-monitoring tool is the ideal solution to this requirement. Symptoms, basic diagnostic data, and information about the related Configuration Item should be included in Incident records during detection and recording. Annex 5C illustrates the scope of data to be captured in records during the entire Incident Management process. This data is required both for Incident resolution/recovery and for management information on Incident types and trends.
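As a rough illustration of what a system-monitoring tool might write when it generates a 'skeleton Incident record', the sketch below captures the three kinds of data named above: symptoms, basic diagnostic data and the related Configuration Item. The field names, the skeleton_record_from_event helper and the layout of the event dictionary are assumptions made for this example only; Annex 5C remains the reference for the full scope of data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count

_next_id = count(1)  # hypothetical in-memory ID generator


@dataclass
class IncidentRecord:
    incident_id: int
    detected_at: datetime
    source: str                 # e.g. "monitoring", "service_desk", "user"
    ci_id: str                  # related Configuration Item
    symptoms: str
    diagnostics: dict = field(default_factory=dict)
    status: str = "new"


def skeleton_record_from_event(event: dict) -> IncidentRecord:
    """Build a skeleton Incident record from a monitoring event (assumed layout)."""
    return IncidentRecord(
        incident_id=next(_next_id),
        detected_at=datetime.now(timezone.utc),
        source="monitoring",
        ci_id=event["ci_id"],
        symptoms=event["message"],
        diagnostics=dict(event.get("metrics", {})),
    )


record = skeleton_record_from_event(
    {"ci_id": "SRV-0042", "message": "disk usage above threshold",
     "metrics": {"disk_pct": 97}}
)
```

The remaining fields of the record would be filled in by later activities (classification, investigation, resolution and closure).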

In the past, it has been common practice for all Incidents to be reported to the Service Desk, where personnel manually created a record in the Incident database. Where this was not practical or possible, support groups were allowed to record Incidents manually; in this case the Service Desk received alerts so that it was informed about possible degradation of services. With modern technology, however, Incidents can be reported by various means, including the ability for Users to log Incidents directly to the system. The fundamental requirement remains that all of these Incidents should reach the Incident Management database and that the Service Desk should receive appropriate alerts and maintain overall control - Incident monitoring remains the responsibility of the Service Desk.
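The requirement that every reporting channel ends in the same database, with the Service Desk kept informed, could be sketched as follows. The two lists stand in for the Incident Management database and the Service Desk alert queue, and the function name and channel labels are hypothetical.

```python
incident_db = []          # stands in for the Incident Management database
service_desk_alerts = []  # stands in for the Service Desk alert queue


def log_incident(source, reporter, description):
    """Record an Incident from any channel and keep the Service Desk informed."""
    record = {
        "id": len(incident_db) + 1,
        "source": source,          # "service_desk", "support_group" or "user"
        "reporter": reporter,
        "description": description,
    }
    incident_db.append(record)
    if source != "service_desk":
        # The Service Desk did not create this record, so it is alerted.
        service_desk_alerts.append(record["id"])
    return record


by_user = log_incident("user", "alice", "cannot print")
by_desk = log_incident("service_desk", "bob", "email unavailable")
```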

When service levels degrade seriously, the Service Manager should be alerted so that special action can be taken if necessary.

An Incident should be handled in conformance with standard SLM procedures. These specific procedures do not fall within the scope of the Incident Management process.

5.6.2 Classification and initial support

Inputs:

Incident records raised in the previous activity are now analysed to discover the reason for the Incident. The Incident should also be classified, since classification is the basis for further resolution actions. Annex 5A provides some examples of classification codes.

Actions:

Outputs:

Classification is the process of identifying the reason for the Incident and hence the corresponding resolution action. Many Incidents are regularly experienced and the appropriate resolution actions are well known. This is not always the case, however, and a procedure for matching Incident classification data against that for Problems and Known Errors is necessary. Successful matching gives access to proven resolution actions, which should require no further investigation effort.
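A minimal sketch of such a matching procedure is shown below. It assumes, purely for illustration, that an Incident and a Known Error are both described by a category, a CI type and a set of symptom keywords; the KnownError structure, the match_known_errors function and the sample data are hypothetical, and a real tool would use richer classification data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownError:
    error_id: str
    category: str
    ci_type: str
    keywords: frozenset
    resolution: str


def match_known_errors(category, ci_type, symptoms, known_errors):
    """Return Known Errors with the same classification, best keyword overlap first."""
    words = set(symptoms.lower().split())
    scored = []
    for ke in known_errors:
        if ke.category != category or ke.ci_type != ci_type:
            continue  # classification differs, so no match
        overlap = len(words & ke.keywords)
        if overlap:
            scored.append((overlap, ke))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ke for _, ke in scored]


KNOWN_ERRORS = [
    KnownError("KE-1", "Word Processing", "workstation",
               frozenset({"crash", "save"}), "Apply vendor patch"),
    KnownError("KE-2", "Word Processing", "workstation",
               frozenset({"print", "margin"}), "Reset printer driver defaults"),
]

matches = match_known_errors(
    "Word Processing", "workstation", "application crash on save", KNOWN_ERRORS
)
```

An empty result corresponds to an unsuccessful match, in which case the Incident needs further investigation.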

Classification is one of the most important aspects of Incident Management (and often one of the most difficult to get right). The classification is used to:

The final classification(s) may vary from the initially reported classification because end Users are only able to report symptoms of the Incident rather than the root Problem. The levels of classification will vary depending on the detail required. For example, a top-level classification of 'Word Processing' or 'Payroll Service' is adequate for an overview; however, it may then be necessary to obtain greater detail in areas such as:

As much information as possible should be provided when classifying Incidents. Classification data contributing to the matching process includes:

The process of classification and matching allows Incident Management to be carried out more quickly and with minimum recourse to support. The classification-matching process is an ideal application area for the use of so-called expert software.

The Service Desk collects information about affected CIs and should therefore be able to detect inconsistencies in the CMDB when asking a User for configuration ID numbers, serial numbers and so on. If inconsistencies are discovered, an exception report should be raised and the Configuration Management process informed. This can take place automatically via the Incident Management software or by reporting on a daily basis.
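The consistency check described above might look like the following sketch, which compares the details a User reports with the attributes held in the CMDB and lists every difference as an exception. The attribute names and the cmdb_exceptions function are assumptions for illustration; no particular CMDB product is implied.

```python
def cmdb_exceptions(reported: dict, cmdb_entry: dict) -> list:
    """Compare User-reported CI details with the CMDB entry and list the differences."""
    exceptions = []
    for attribute, reported_value in reported.items():
        recorded_value = cmdb_entry.get(attribute)
        if recorded_value != reported_value:
            exceptions.append(
                {"attribute": attribute,
                 "cmdb": recorded_value,
                 "reported": reported_value}
            )
    return exceptions


cmdb_entry = {"ci_id": "WS-1001", "serial_number": "SN-778", "location": "Floor 2"}
reported = {"ci_id": "WS-1001", "serial_number": "SN-779"}

# One exception: the serial number given by the User differs from the CMDB.
report = cmdb_exceptions(reported, cmdb_entry)
```

The resulting list could be passed to Configuration Management immediately or collected into a daily report.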

One of the important aspects of managing an Incident is to define its priority: how important it is and what its impact on the business is. The responsibility for definition lies with Service Level Management within the parameters set in the SLA. The priority with which Incidents need to be resolved, and therefore the amount of effort put into the resolution of and recovery from Incidents, will depend upon:

'Impact' is a measure of the business criticality of an Incident or Problem, often equal to the extent to which an Incident leads to degradation of agreed service levels. Impact is often measured by the number of people or systems affected. Criteria for assigning impact should be set up in consultation with the business managers and formalised in SLAs.

When determining impact, information in the CMDB should be accessed to detect how many Users will suffer as a result of the technical failure of, for example, a hardware component. The Service Desk should have access to tools that enable it rapidly to:

'Urgency' is about the necessary speed of solving an Incident of a certain impact. A high-impact Incident does not, by default, have to be solved immediately. For example, a User having operational difficulties with his workstation (impact 'high') can have the fault registered with urgency 'low' if he is leaving the office for a fortnight's holiday directly after reporting the Incident.

'Priority' is also influenced by expected effort. An Incident with a low impact and average urgency that can be resolved with minor effort will be resolved immediately in most organisations (e.g. a password reset).
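One way to combine impact, urgency and expected effort into a single priority is sketched below. The three-level scale, the scoring rule and the 1 to 5 priority range are assumptions chosen for this example; actual criteria should come from the SLA.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}


def priority(impact: str, urgency: str, minor_effort: bool = False) -> int:
    """Return a priority from 1 (handle first) to 5 (handle last).

    The scoring rule is an illustrative assumption, not taken from any SLA.
    """
    if minor_effort:
        # Quick fixes such as a password reset are resolved immediately.
        return 1
    score = LEVELS[impact] + LEVELS[urgency]   # ranges from 2 to 6
    return 7 - score                           # score 6 -> 1, score 2 -> 5


workstation_before_holiday = priority("high", "low")                 # 3
password_reset = priority("low", "medium", minor_effort=True)        # 1
```

In this sketch the workstation example above (impact 'high', urgency 'low') lands in the middle of the range, while the password reset is handled first because of its minor effort.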

Initial support involves resolution of the Incident to the satisfaction of the Customer by the Service Desk. The resolution may be derived from several areas, including:

After this, little further action is required by the Service Desk other than recording details of the resolution, the classification and Customer satisfaction.

Tip:

In the event that classification matching is unsuccessful, or the resolution process is complex, investigation and diagnosis by a support group is the next step.

Although responsibility for resolution is handed over to another support group, the Service Desk should retain ownership of the Incident, and manage it until it is resolved to the Customer's satisfaction.
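The distinction between ownership and responsibility for resolution can be made explicit in the Incident record by keeping two separate fields. The sketch below is one hypothetical way to do this; the class and field names are not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    incident_id: str
    owner: str = "Service Desk"        # does not change on hand-over
    assigned_to: str = "Service Desk"  # group currently working on the Incident
    history: list = field(default_factory=list)

    def assign(self, support_group: str) -> None:
        """Hand resolution over to another group; ownership is untouched."""
        self.history.append((self.assigned_to, support_group))
        self.assigned_to = support_group


inc = Incident("INC-7")
inc.assign("Network Support")
```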

5.6.3 Investigation and diagnosis

Inputs:

Actions:

Outputs:

Wherever possible, the relevant User should be provided with the means to continue business, perhaps via a degraded service. For example, if a printer is faulty, printing may have to take place at another, more distant location. The effect of such a Work-around is to minimise the impact of the Incident on the business and to provide more time to investigate and devise a structural resolution. Other Users may also need to be advised of temporary Work-arounds.

Once the Incident has been assigned to a support group, it should:

Investigation and diagnosis may become an iterative process, restarting with a different specialist support group once a previous possible cause has been eliminated. It may involve multisite support groups and support staff from different vendors. It may continue overnight, with a new shift of support staff taking over the next day. All this demands a rigorous, disciplined approach and a comprehensive record of actions taken with corresponding results.

Tip:

If it is not clear which support group should investigate or resolve a User-related Incident, the Service Desk, as the owner of all Incidents, should coordinate the Incident Management process. If there are differences of opinion or any other issues arise, the Service Desk should escalate the Incident to the Problem Management team.

Annex 5D shows a typical process of Incident investigation. Continual expansion of the Incident record should occur, with each progress point logging the action taken in a progress summary.
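The continual expansion of the Incident record can be pictured as an append-only progress log, in which each support group adds the action it took and the result. The sketch below is a simplified illustration with hypothetical class and field names; it is not the record layout of Annex 5D.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProgressEntry:
    at: datetime
    support_group: str
    action: str
    result: str


@dataclass
class IncidentLog:
    incident_id: str
    entries: list = field(default_factory=list)

    def log(self, support_group: str, action: str, result: str) -> ProgressEntry:
        """Append a progress entry; earlier entries are never modified."""
        entry = ProgressEntry(datetime.now(timezone.utc), support_group, action, result)
        self.entries.append(entry)
        return entry

    def summary(self) -> list:
        """Return the progress summary in the order the actions were logged."""
        return [f"{e.support_group}: {e.action} -> {e.result}" for e in self.entries]


progress = IncidentLog("INC-1")
progress.log("Network", "Checked switch port", "No fault found")
progress.log("Desktop", "Replaced cable", "Service restored")
```

Because entries are only ever appended, a new shift or a different support group can read the summary and see which possible causes have already been eliminated.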

5.6.4 Resolution and recovery

Inputs:

Actions:

Outputs:

After successful execution of the resolution or some circumvention activity, service recovery can be effected and recovery actions carried out, often by specialist staff (second- or third-level support). The Incident Management system should allow for the recording of events and actions during the resolution and recovery activity.

5.6.5 Incident closure

Inputs:

Actions:

Outputs:

When the Incident has been resolved, the Service Desk should ensure that:

Tips:

5.6.6 Ownership, monitoring, tracking and communication

Inputs:

Actions:

Outputs:

The Service Desk is responsible for owning and overseeing the resolution of all outstanding Incidents, whatever the initial source, by the following procedure to:

Following this procedure helps to ensure that each individual Incident is resolved within agreed timeframes or, at least, as soon as possible. Larger Service Desks should consider the establishment of a dedicated team for Incident monitoring and tracking.

In the event that an Incident fails to achieve satisfactory progress, the Service Desk should act in accordance with well-defined escalation procedures. These procedures should be agreed on by all support groups. In practice, it is important to watch for support staff becoming too engrossed in an Incident, spending much time on gathering diagnostics, and consequently losing sight of the immediate User need. In all circumstances, when agreed escalation thresholds (which are defined in SLAs) have been exceeded, action should be taken to escalate the matter regardless of the views of support staff.
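A threshold check of this kind is simple to automate, which removes the decision from the support staff working on the Incident. The sketch below assumes time-based thresholds per priority level; the values shown are placeholders and the needs_escalation function is hypothetical, since real thresholds are defined in the SLAs.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds per priority; real values come from the SLA.
ESCALATION_THRESHOLDS = {
    1: timedelta(hours=1),
    2: timedelta(hours=4),
    3: timedelta(hours=8),
}


def needs_escalation(priority, opened_at, now, resolved=False):
    """Return True when an unresolved Incident has exceeded its threshold."""
    if resolved:
        return False
    threshold = ESCALATION_THRESHOLDS.get(priority)
    if threshold is None:
        return False  # no threshold agreed for this priority
    return now - opened_at > threshold


opened = datetime(2024, 1, 1, 9, 0)
check_time = datetime(2024, 1, 1, 10, 30)

overdue_p1 = needs_escalation(1, opened, check_time)   # 1.5 hours open, limit 1 hour
overdue_p2 = needs_escalation(2, opened, check_time)   # 1.5 hours open, limit 4 hours
```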

Tips:
