Incident Management Process Flow – Which Comes First, Categorization or Initial Diagnosis?
In the current version of the ITIL Foundation class, the following exam question appears in one of the two sample exams used in the class:
Which one of the following is the CORRECT sequence of activities for handling an incident?
- identification, logging, categorization, prioritization, initial diagnosis, escalation, investigation and diagnosis, resolution and recovery, closure
- prioritization, identification, logging, categorization, initial diagnosis, escalation, investigation and diagnosis, resolution and recovery, closure
- identification, logging, initial diagnosis, categorization, prioritization, escalation, investigation and diagnosis, resolution and recovery, closure
- identification, initial diagnosis, investigation, logging, categorization, escalation, prioritization, resolution and recovery, closure
The correct answer to this question is the first choice; however, students often disagree with it. The rationale given for the answer is simply, “The correct order is given in the diagram in the incident management process, and in the subsections of [SO] 4.2.5.” In this post, I will provide a better explanation of why the first choice is correct. First of all, the flow of activities in the incident management process is described in section 4.2.5 of the Service Operation book and shown visually in Figure 4.3.
As shown in Figure 4.3, the correct flow of activities in the incident management process begins with identification, which is followed by logging, which in turn is followed by categorization. Initial diagnosis occurs later in the process flow following prioritization.
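For readers who prefer to see the ordering spelled out, the flow described above can be written as an ordered enumeration. This is only an illustrative sketch: the names `Activity`, `is_correct_sequence`, `first_choice`, and `third_choice` are my own and do not come from the Service Operation book.

```python
from enum import IntEnum


class Activity(IntEnum):
    """Incident management activities, numbered in the order described above."""
    IDENTIFICATION = 1
    LOGGING = 2
    CATEGORIZATION = 3
    PRIORITIZATION = 4
    INITIAL_DIAGNOSIS = 5
    ESCALATION = 6
    INVESTIGATION_AND_DIAGNOSIS = 7
    RESOLUTION_AND_RECOVERY = 8
    CLOSURE = 9


def is_correct_sequence(steps):
    """Return True if steps lists every activity exactly once, in process order."""
    return list(steps) == list(Activity)


# The first answer choice from the sample exam question.
first_choice = [
    Activity.IDENTIFICATION, Activity.LOGGING, Activity.CATEGORIZATION,
    Activity.PRIORITIZATION, Activity.INITIAL_DIAGNOSIS, Activity.ESCALATION,
    Activity.INVESTIGATION_AND_DIAGNOSIS, Activity.RESOLUTION_AND_RECOVERY,
    Activity.CLOSURE,
]

# The third answer choice, which places initial diagnosis right after logging.
third_choice = [
    Activity.IDENTIFICATION, Activity.LOGGING, Activity.INITIAL_DIAGNOSIS,
    Activity.CATEGORIZATION, Activity.PRIORITIZATION, Activity.ESCALATION,
    Activity.INVESTIGATION_AND_DIAGNOSIS, Activity.RESOLUTION_AND_RECOVERY,
    Activity.CLOSURE,
]

print(is_correct_sequence(first_choice))  # True
print(is_correct_sequence(third_choice))  # False
```

The third choice is the one students most often argue for, and it differs from the first only in where initial diagnosis sits.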
While the Service Operation book is clear about the flow of activities, the logic behind why the activities are in this order is not completely clear. Very few people disagree that the incident management process begins with identification, which in turn is followed by logging. The disagreement primarily exists in what follows logging, whether it is categorization or initial diagnosis. A good way to summarize the flow of activities is that they flow from general to specific.
It often helps to clarify what the steps in the process do. Categorization assigns the incident a type. In practice, organizations often use a multi-level categorization scheme, where the top level consists of a few broad categories and subsequent levels add detail. Practically, I’ve always thought of categorization as a way of identifying, at a high level, the general area to which an incident belongs. For example, common top-level categories include things like “hardware”, “software”, “network”, “user-induced”, “supplier-induced”, etc. In fact, I once worked at a large organization that processed about 50,000 incident tickets per month with a set of 8 top-level categories. In other words, when categorization is done, we’re really just trying to identify the general area to which the incident most likely belongs. Categorization can be revisited, and it often changes throughout the lifecycle of an incident.
Prioritization accounts for the impact and urgency of the incident and assigns a pre-defined code that guides an organization’s response to it. In any population of incidents, an effective prioritization scheme tells the organization which incident to work on first. The ability to do this is critically important in high-volume environments where the organization has limited, shared resources that must respond to numerous simultaneous incidents. In other words, organizations have to decide how to marshal resources based on each incident’s impact on the business and how quickly service must be restored.
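To make the idea of a pre-defined code concrete, here is a minimal sketch of how impact and urgency might combine into a priority. The three-point scale, the 1-to-5 codes, the function name `priority_code`, and the ticket data are all assumptions for illustration; every organization defines its own scheme.

```python
# Hypothetical three-point scale; real schemes are defined per organization.
LEVELS = {"high": 0, "medium": 1, "low": 2}


def priority_code(impact, urgency):
    """Map impact and urgency to a code from 1 (work first) to 5 (work last)."""
    return 1 + LEVELS[impact] + LEVELS[urgency]


# Made-up incidents: (ticket id, impact, urgency)
incidents = [
    ("INC-1", "low", "medium"),
    ("INC-2", "high", "high"),
    ("INC-3", "medium", "high"),
]

# Sort the queue so the lowest code (highest priority) comes first.
work_order = sorted(incidents, key=lambda inc: priority_code(inc[1], inc[2]))
print([ticket for ticket, _, _ in work_order])  # ['INC-2', 'INC-3', 'INC-1']
```

Notice that nothing in this calculation requires knowing what is actually wrong with the service, which is why it can happen before any diagnosis.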
Initial diagnosis is described in the Service Operation book in section 4.2.5.5 as the activity where the service desk attempts to understand all symptoms of the incident in an effort to uncover what is wrong and attempt to correct it. During this activity, the service desk staff might use the known error database to speed incident resolution, or diagnostic scripts to identify the service fault.
The logical reason these steps are in this order is that during categorization and prioritization we try to uncover enough details about the incident so that it can be routed correctly through the process. For example, organizations might choose to handle hardware or network incidents differently than they handle software incidents. The same is true for prioritization, which seeks to establish facts about the incident in terms of its impact and urgency so that proper routing decisions can be made. For example, the highest priority is what is typically known as a “major incident”, which will often follow a specific procedure dedicated to handling major incidents.
Therefore, the early steps in the incident management process are focused on properly routing the incident. Knowing the category and priority helps organizations make effective decisions about routing incidents. Improperly routed incidents result in delayed restoration of service, which impacts users and customers and decreases satisfaction. For example, it would not make sense for a service desk to attempt initial diagnosis if its staff are not properly trained or equipped to investigate that category of incident. In fact, a service desk that spends time on initial diagnosis for incident categories for which it is not properly trained, and for which it lacks effective scripts and tools, will often cause delayed restoration of service, increased impact on users, and lower customer satisfaction.
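The routing argument can be sketched as a simple decision function. Everything here is hypothetical: the set of categories the service desk can handle, the use of code 1 for a major incident, and the destination names are assumptions made only to illustrate why category and priority need to be known before initial diagnosis begins.

```python
# Hypothetical setup: categories the service desk is trained and equipped for.
SERVICE_DESK_CATEGORIES = {"software", "user-induced"}
MAJOR_INCIDENT_PRIORITY = 1  # assumed code for the highest priority


def route(category, priority):
    """Decide where an incident goes once category and priority are known."""
    if priority == MAJOR_INCIDENT_PRIORITY:
        # Major incidents follow their own dedicated procedure.
        return "major incident procedure"
    if category in SERVICE_DESK_CATEGORIES:
        # The desk has the training and scripts to attempt initial diagnosis.
        return "service desk: initial diagnosis"
    # Otherwise, avoid spending time on a diagnosis the desk cannot do well.
    return f"escalate to {category} support group"


print(route("network", 1))   # major incident procedure
print(route("software", 3))  # service desk: initial diagnosis
print(route("hardware", 3))  # escalate to hardware support group
```

Both inputs to `route` come from the categorization and prioritization steps, so neither branch can be chosen sensibly if initial diagnosis comes first.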
Clearly, according to ITIL, categorization occurs early in the incident management process, and there are good reasons why this is the case.