AVAILABILITY MANAGEMENT: 8.6 Availability improvement

8.6 Availability improvement


A key output from the Availability Management Process is the creation of an Availability Plan.

The Availability Plan should be a long-term plan for the proactive improvement of IT Availability within the imposed Cost constraints.

The impetus to improve Availability comes from one or more of the following:

the inability of a new IT Service to meet its SLA on a consistent basis
period(s) of IT Service instability resulting in unacceptable levels of Availability
Availability measurement trends indicating a gradual deterioration in Availability
unacceptable IT Service recovery and restoration time
requests from the business to increase the level of Availability provided
increasing Impact on the business and its Customers from IT Service failures as a result of growth and/or increased Business functionality
a request from SLM to improve Availability as part of an overall SIP
Availability Management monitoring and trend analysis.

8.6.1 Important considerations

Availability Management monitoring and trend analysis

Availability Management should take a proactive Role in identifying and progressing cost justified Availability improvement opportunities. The ability to do this places reliance on having appropriate and meaningful Availability measurement and reporting.

To ensure Availability improvements deliver benefits to the business and Users, it is important that Availability measurement and reporting reflect not just IT component Availability but Availability from a business operation and User perspective (see Section 8.7 for additional guidance).

Determining (changed) Availability requirements

Where the business has a requirement to improve Availability the process outlined in Section 8.5 should be followed to reassess the IT Infrastructure and IT support organisation capability to meet these enhanced requirements.

An output of this activity is enhanced Availability and recovery design criteria.

The cost of improving Availability

Satisfying the business requirement for increased levels of Availability may require additional financial investment to enhance the underpinning IT Infrastructure and/or extend the Services provided by the IT support organisation.

It is important that any additional investment to improve the levels of Availability delivered can be cost justified. Determining the cost of IT failures can help support any financial investment decision. Section 8.4 provides additional guidance on how to derive the cost of IT failure.

However, a key benefit of Availability Management is the opportunity to optimise the Availability of the IT Infrastructure to deliver an improved level of Availability with reduced cost. This optimisation approach is a sensible first step to deliver better value for money. A range of Availability Management methods and techniques can be applied to identify the potential for improved levels of Availability at a much lower cost.

8.6.2 Methods and techniques

There are a number of methods and techniques that can be utilised to identify Availability improvement opportunities. These are described as follows:

Component Failure Impact Assessment

Component Failure Impact Assessment (CFIA) can be used to predict and evaluate the impact on IT Service arising from component failures within the IT Infrastructure. The output from a CFIA can be used to identify where additional Infrastructure resilience should be considered to prevent or minimise the impact of component failure to the business operation and Users.
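
As an illustration only, the idea behind a CFIA can be sketched as a small grid that maps each IT Service to the components able to perform each function it depends on. The service names, component names and the `failure_impact` helper below are hypothetical and not part of this guidance; the sketch also assumes that a function with only one component is a single point of failure for the service.

```python
from collections import defaultdict

# Hypothetical dependency grid, invented for illustration:
# service -> {function: [components that can each perform that function]}
SERVICE_DEPENDENCIES = {
    "order_entry": {
        "network": ["router_a", "router_b"],
        "database": ["db_server_1"],
    },
    "email": {
        "network": ["router_a", "router_b"],
        "mail": ["mail_server_1", "mail_server_2"],
    },
}


def failure_impact(dependencies):
    """Return component -> services that fail if that component alone fails.

    A component is treated as a single point of failure for a service when
    it is the only component listed for one of the service's functions.
    """
    impact = defaultdict(list)
    for service, functions in dependencies.items():
        for alternatives in functions.values():
            if len(alternatives) == 1:
                impact[alternatives[0]].append(service)
    return {component: sorted(services) for component, services in impact.items()}


cfia = failure_impact(SERVICE_DEPENDENCIES)
for component, services in cfia.items():
    print(f"{component}: failure impacts {', '.join(services)}")
```

In this invented grid only the database server lacks an alternative, so it would be the first candidate when considering additional resilience.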

Fault Tree Analysis

Fault Tree Analysis (FTA) is a technique that can be used to determine the chain of events that causes a disruption to IT Services. FTA in conjunction with calculation methods can offer detailed models of Availability. This can be used to assess the Availability improvement that can be achieved by individual IT Infrastructure design options.
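
A minimal sketch of how a fault tree might be combined with a calculation method is shown below. It uses the standard AND/OR gate probability formulas and assumes that the basic failure events are statistically independent; the failure probabilities and the two design options are invented for illustration and do not come from this guidance.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if ALL input events occur (independent events)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if ANY input event occurs (independent events)."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical probabilities that each component is unavailable.
NETWORK_DOWN = 0.01
SERVER_DOWN = 0.02

# Design option 1: one network, one server. Either failure stops the service.
option_1_unavailable = or_gate([NETWORK_DOWN, SERVER_DOWN])

# Design option 2: one network, two mirrored servers. Both servers must fail.
option_2_unavailable = or_gate([NETWORK_DOWN, and_gate([SERVER_DOWN, SERVER_DOWN])])

option_1_availability = 1 - option_1_unavailable
option_2_availability = 1 - option_2_unavailable

print(f"Option 1 availability: {option_1_availability:.4%}")
print(f"Option 2 availability: {option_2_availability:.4%}")
```

Comparing the two figures gives an estimate of the Availability improvement a given design option could deliver, which can then be weighed against its cost.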

CRAMM

CRAMM can be used to identify new risks and provide appropriate countermeasures associated with any Change to the business Availability requirement and revised IT Infrastructure design. See Paragraph 8.9.3 for fuller information on CRAMM.

Systems Outage Analysis

Systems Outage Analysis (SOA) is a technique designed to provide a structured approach to identifying the underlying causes of service interruption to the User. SOA utilises a range of data sources to assess where and why shortfalls in Availability are occurring. SOA enables a holistic view to be taken to drive not just IT Infrastructure improvements but improvements to the IT support organisation's processes, procedures and tools.

SOA is run as an assignment and may utilise other Availability Management methods and techniques to formulate the recommendations for improvement.

The Expanded Incident 'Lifecycle'

An aim of Availability Management is to ensure that the duration and impact of Incidents affecting IT Service are minimised, to enable business operations to resume as quickly as possible.

The expanded Incident 'lifecycle' enables the total IT Service downtime for any given Incident to be broken down and mapped against the major stages that all Incidents progress through (the lifecycle).

This makes it possible to identify where 'time is being lost' and provides the basis for the identification of improvements that can improve recovery and restoration times.
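
The breakdown described above could be sketched as follows. The stage names (detection, diagnosis, repair, recovery, restoration) and all timestamps are assumptions made for this example; each timestamp marks the moment the named stage was completed, with "occurrence" marking the start of the Incident.

```python
from datetime import datetime

# Hypothetical milestones for one Incident, invented for illustration.
MILESTONES = [
    ("occurrence", datetime(2024, 1, 10, 9, 0)),
    ("detection", datetime(2024, 1, 10, 9, 20)),
    ("diagnosis", datetime(2024, 1, 10, 10, 5)),
    ("repair", datetime(2024, 1, 10, 10, 35)),
    ("recovery", datetime(2024, 1, 10, 10, 50)),
    ("restoration", datetime(2024, 1, 10, 11, 0)),
]


def stage_durations(milestones):
    """Return stage -> minutes elapsed between the previous milestone and this one."""
    durations = {}
    for (_, start), (stage, end) in zip(milestones, milestones[1:]):
        durations[stage] = (end - start).total_seconds() / 60
    return durations


durations = stage_durations(MILESTONES)
total_downtime = sum(durations.values())
slowest_stage = max(durations, key=durations.get)

for stage, minutes in durations.items():
    print(f"{stage:<12} {minutes:>5.0f} min ({minutes / total_downtime:.0%} of downtime)")
print(f"Most time lost in: {slowest_stage}")
```

With these invented figures the diagnosis stage accounts for the largest share of downtime, so it would be the natural place to look first for improvements.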

Continuous Improvement

Availability Management can play an important role in helping the IT support organisation recognise where they can add value by exploiting their technical skills and competencies in an Availability context. The continuous improvement technique can be used by Availability Management to harness this technical capability. This can be used with either small groups of technical staff or a wider group within a workshop Environment.

Technical Observation Post

A Technical Observation Post (TOP) is a prearranged gathering of specialist technical support staff from within the IT support organisation, brought together to focus on specific aspects of IT Availability. Its purpose is to monitor events in real time, as they occur, with the specific aim of identifying improvement opportunities or bottlenecks that exist within the current IT Infrastructure.

For more detailed information and guidance on how these methods and techniques can be deployed please refer to Section 8.9.

8.6.3 The Availability Plan

To provide structure and aggregation of the wide range of initiatives that may need to be undertaken to improve Availability, these should be formulated within a single Availability Plan.

The Availability Plan should have aims, objectives and deliverables and should consider the wider issues of people, process, tools and techniques as well as having a technology focus. In the initial stages it may be aligned with an implementation plan for Availability Management, but the two are different and should not be confused.

As the Availability Management process matures the plan should evolve to cover the following:

Actual levels of Availability versus agreed levels of Availability for key IT Services. Where possible Availability measurements should be business focused to report Availability as experienced by the business and User.
Activities being progressed to address shortfalls in Availability for existing IT Services. Where investment decisions are required, options with associated costs and benefits should be included.
Details of changing Availability requirements for existing IT Services. The plan should document the options available to meet these Changed requirements. Where investment decisions are required the associated costs of each option should be included.
Details of the Availability requirements for forthcoming new IT Services. The plan should document the options available to meet these new requirements. Where investment decisions are required the associated costs of each option should be included.
A forward looking schedule for the planned SOA assignments. Regular reviews of SOA assignments should be completed to ensure that Infrastructure Availability is being proactively improved.
A technology futures section to provide an indication of the potential benefits and exploitation opportunities that exist for planned technology upgrades. Anticipated Availability benefits should be detailed, where possible based on business focused measures. The effort required to realise these benefits where possible should also be quantified.
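
As a rough sketch of the first item in the list above, actual Availability can be compared with agreed levels using the common percentage calculation (agreed service time minus downtime, divided by agreed service time). The service names, targets and hours below are invented for illustration, and the `availability_percent` helper is hypothetical.

```python
def availability_percent(agreed_service_hours, downtime_hours):
    """Availability as a percentage of the agreed service time."""
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100


# Hypothetical reporting data:
# service -> (agreed target %, agreed service hours, downtime hours)
REPORT_DATA = {
    "order_entry": (99.5, 400, 3),
    "email": (98.0, 400, 4),
}

actuals = {
    service: availability_percent(hours, downtime)
    for service, (_, hours, downtime) in REPORT_DATA.items()
}

# Services whose actual Availability falls below the agreed level.
shortfalls = [
    service
    for service, (target, _, _) in REPORT_DATA.items()
    if actuals[service] < target
]

for service, (target, _, _) in REPORT_DATA.items():
    status = "SHORTFALL" if service in shortfalls else "met"
    print(f"{service:<12} agreed {target:.2f}%  actual {actuals[service]:.2f}%  {status}")
```

Downtime recorded from a business operation and User perspective, rather than per component, would make such a comparison more meaningful for the plan.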

During the production of the Availability Plan, it is recommended that liaison with the following functional areas is undertaken:

Service Level Management, concerning changing business and User requirements for existing IT Services
IT Service Continuity Management concerning business impact and resilience improvements
Business Relationship Management to understand major Customer concerns and/or future needs that relate to IT Availability
Capacity Management, concerning the scenarios for upgrading (or downgrading) the software, hardware and network layers of the IT Infrastructure
IT Financial Management concerning the cost and budget implications of the various options identified for Availability improvement
Application Management, concerning the Availability requirements for new services
areas responsible for IT supplier management and the managing of relationships and contracts with suppliers
technical support groups responsible for testing and maintenance functions, concerning the reliability and maintainability of existing services.

The Availability Plan should cover a period of one to two years with a more detailed view and information for the first six months. The plan should be reviewed regularly with minor revisions every quarter and major revisions every half year. Where the IT Infrastructure is only subject to a low level of Change this may be extended as appropriate.

It is recommended that the Availability Plan is considered complementary to the Capacity Plan and that its publication is aligned with the Capacity and business Budgeting cycle.

If a demand is foreseen for high levels of Availability that cannot be met due to the constraints of the existing IT Infrastructure or budget, then exception reports may be required for the attention of both senior IT and business management.

Hints and Tips

There is potential for confusion on the purpose of an Availability Plan versus a Service Improvement Programme.

The Availability Plan is a forward looking plan aimed at improving the overall Availability of the IT Infrastructure to ensure that existing and future levels of Availability can be provided on a timely and cost effective basis.

The Availability Plan is a key output and deliverable of the Availability Management process. It is reviewed and revised on an ongoing basis. Improvements to the IT Infrastructure may benefit many IT Services where common Infrastructure is utilised.

Service Level Management is responsible for instigating service improvements that improve the overall quality of the whole IT Service provision.

The mechanism used to achieve this is the Service Improvement Programme (SIP). This SIP should be used to co-ordinate all IT Service improvement opportunities into an overall programme of improvement activities.

The SIP has a defined start and end and its scope can include all elements of the service provided to the business. This may include Availability, but equally may not, depending on the overall service indicators and the areas of desired improvement.

Availability Management can play a key role in supporting a SIP by the appropriate use of Availability Management techniques, e.g. Systems Outage Analysis.