Mastering Problem Management
Several approaches can help prevent Problem Management failure, beginning with a thorough understanding of how Incident and Problem Management affect each other. If IT organization leaders incorporate Incident Management into their Problem Management and follow the other recommendations in this white paper, they can likely reduce the quantity and duration of service disruptions.
IT Organization (ITO) leaders often confuse Incident and Problem Management, leading to more service disruptions of longer duration. Here we explore how an ITO leader can reduce the quantity and duration of service disruptions with Problem Management.
Problem Management starts with Incident Management. Incident Management and Problem Management are not the same, although they are interdependent. If you do not have both, then you most likely have more user downtime, lower customer satisfaction, and higher costs than you should, all of which reduce ITO Return on Investment (ROI).
Incident Management aims to restore service quickly by reducing the duration of disruptions. Problem Management seeks to prevent service disruptions by discovering the causes (or potential causes) of disruptions and then creating workarounds and permanent resolutions for them.
Together, these two processes can significantly reduce both the quantity and duration of disruptions. However, if you do not make the goals of each a priority, they tend to fall by the wayside under the day-to-day pressures of supporting users. Allocating staff and requiring activities to develop workarounds, identify root causes, and perform trend analysis is just as important as restoring service.
If one functional group cannot do both well, consider creating two groups, but ensure that both Incident and Problem Management activities get the management attention they require to operate well. You simply cannot have one without the other, and the two require tight integration.
Incident and Problem Management Depend on Each Other
Incident Management is a "one shot" process that starts in response to a report of a service disruption and ends when service is restored. Its goals are to restore service and to capture information for use by Problem Management. Problem Management is an "always on" process that continuously examines information from any source that has initiated, or could initiate, an Incident Management cycle. Its goal is prevention. Properly focusing on the objective of each (restoration vs. prevention) can result in higher service quality, increased customer satisfaction, and improved ITO ROI.
If you do not have both Incident and Problem Management, you can experience a higher number of longer-duration outages. Without Problem Management, ITO staff is doomed to "fix" the same issues repeatedly. Without Incident Management, ITO staff has limited data for analysis and cannot focus on Root Cause Analysis (RCA) and other prevention activities. One without the other usually results in more user downtime and steals valuable ITO resources away from efforts to add business value.
To make tangible contributions to the success of your firm, you need to improve efficiency so that resources are freed for other business-aligned projects (innovation). Following discrete Incident and Problem Management processes decreases call volume and reduces outage duration. These improvements can shift the balance between innovation and "Keeping the Lights On" (KTLO) enough to produce visible improvements in the ITO's business contribution, letting staff focus on adding value beyond basic operations.
Steps You Can Take To Get Started with Problem Management
To improve (or start) your Problem Management, you need to take the following steps:
1. Reassess your understanding of Incident and Problem Management. The former seeks to restore service quickly after an unplanned disruption, while the latter aims to prevent disruptions. These are two very different, albeit related, objectives. Consider how you currently operate with respect to restoration and prevention. Understand the objectives of Incident and Problem Management and what they mean to your firm and ITO.
2. Assume that you currently combine restoration and prevention activities with little formal management over either. In most ITOs, prevention efforts receive much lower priority than restoration efforts, even though prevention can significantly reduce restoration effort. It is also common to fail to collect accurate information from each Incident Management cycle. Test this assumption by seeking and comparing the opinions of ITO managers, supervisors, and staff, as well as customers and users, to get to the truth.
3. Investigate current activities to validate your findings. If warranted, attempt to document the number of outages and their average duration, and include user downtime if possible. Do you have management objectives for each process? What percentage of effort do you expend on each? Is your team capturing the information required to reduce its workload, and is it actively working to reduce the number and duration of outages?
4. Assess the capabilities of your staff. At a minimum, determine whether there are 1) methods in place to record all Incident Management details, 2) any formal method for developing workarounds to speed resolutions, and 3) preventive activities such as trend analysis using Pareto Analysis (a.k.a. the "80/20" rule). More mature ITOs also use formal RCA techniques such as CFIA, Ishikawa diagrams, Kepner-Tregoe, and others.
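Steps 3 and 4 both come down to simple arithmetic over incident records. The Python sketch below is one possible way to do it, not a prescribed method. It assumes your Incident Management tool can export each closed incident with a cause category, an outage duration in minutes, and a count of affected users. The `Incident` fields, the function names, and the sample records are all hypothetical. `outage_summary` produces a step 3 baseline (number of outages, average duration, total user downtime). `pareto` ranks cause categories by incident count and keeps the top categories until their cumulative share reaches the 80% cutoff suggested by the 80/20 rule.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    """One closed Incident Management record (hypothetical fields)."""
    category: str        # suspected or confirmed cause category
    minutes: float       # outage duration, from report to restoration
    users_affected: int  # users who lost service during the outage


def outage_summary(incidents):
    """Step 3 baseline: outage count, average duration, total user downtime."""
    count = len(incidents)
    avg_minutes = sum(i.minutes for i in incidents) / count if count else 0.0
    # User downtime is weighted by how many users each outage affected.
    user_minutes = sum(i.minutes * i.users_affected for i in incidents)
    return {
        "outages": count,
        "avg_minutes": avg_minutes,
        "user_downtime_minutes": user_minutes,
    }


def pareto(incidents, threshold=0.8):
    """Step 4 Pareto Analysis: rank categories by incident count and return
    the smallest set of top categories whose cumulative share of all
    incidents reaches `threshold`, as (category, count, cumulative_share)."""
    counts = Counter(i.category for i in incidents)
    total = sum(counts.values())
    vital_few = []
    cumulative = 0
    # most_common() sorts by count, highest first.
    for category, n in counts.most_common():
        cumulative += n
        vital_few.append((category, n, cumulative / total))
        if cumulative / total >= threshold:
            break
    return vital_few


if __name__ == "__main__":
    # Illustrative, made-up records; replace with an export from your own tool.
    sample = [Incident(cat, mins, users) for cat, mins, users in [
        ("storage", 45, 200), ("storage", 60, 150), ("storage", 15, 100),
        ("storage", 25, 80), ("storage", 30, 120), ("storage", 10, 40),
        ("network", 30, 500), ("network", 10, 300), ("network", 20, 250),
        ("application", 20, 50), ("access", 5, 1),
    ]]
    print(outage_summary(sample))
    for category, n, share in pareto(sample):
        print(f"{category:<12} {n:>3} incidents  cumulative {share:.0%}")
```

Ranking by incident count highlights the causes that generate the most calls. Weighting each incident by its user downtime (for example, summing `minutes * users_affected` per category instead of counting) would rank causes by business impact instead. Either list is a reasonable starting point for deciding where to apply formal RCA first.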