Event management, although theoretically distinct, is essentially what most IT organizations call “monitoring.” Monitoring an organization’s environment to determine whether important assets are in the state they should be, and knowing when that state changes, is an important activity on which many organizations spend a significant portion of their budget.
Event management, while useful, can be dangerous if done poorly. The “ITIL Service Operation” book gives several policies to guide an event management process. In this post I will discuss the importance of those policies.
Policy One — Event Notifications Should be Sent Only to Those Responsible for Action
Events sent to people who are unable to act, or who do not need to act, have little value. Event management is a process that helps a service provider understand changes in state throughout its IT environment. The only people who need to be aware of a change in state are those responsible for some kind of action related to it.
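As a rough illustration of this policy, here is a minimal Python sketch that routes each event only to the team responsible for acting on it. The `Event` fields, the `OWNERS` map, and the team names are all hypothetical; a real environment would draw ownership from its own service catalog or configuration data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    source: str    # component that raised the event
    category: str  # kind of asset involved, e.g. "database"
    message: str


# Hypothetical ownership map: event category -> teams responsible for action.
OWNERS: Dict[str, List[str]] = {
    "database": ["dba-oncall"],
    "network": ["netops-oncall"],
}


def recipients_for(event: Event) -> List[str]:
    """Return only the teams that are responsible for acting on this event."""
    # Categories with no owner produce no notifications at all,
    # rather than falling back to a broad distribution list.
    return list(OWNERS.get(event.category, []))
```

The point of the design is the empty default: an event nobody is responsible for notifies nobody, which surfaces the ownership gap instead of hiding it in a mass mailing.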
Policy Two — Event Management and Support Should be Centralized as Much as Possible
In my experience, smaller organizations, up to a certain size, can conduct event management effectively in a decentralized way. However, as the organization grows, so does the need for event management. This growth typically drives a growth in monitoring tools, with different groups monitoring things in different ways. Ultimately, the combined monitoring load can amount to the organization accidentally running a denial-of-service attack against itself. The proliferation of tools also tends to mean the organization is investing in multiple tools that do the same things, with none of them being fully utilized. Centralized event management, by contrast, lets the organization define more clearly who is accountable and responsible for handling specific events.
Policy Three — Events Should Utilize a Common Set of Messaging and Logging Standards
An organization doesn’t have to become very large before the body of events becomes overwhelming. When those events all have different formats and structures and say different things (or the same things in different ways), it can be very difficult to effectively filter, correlate and take action on the body of events an organization faces.
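To make the idea concrete, here is a small Python sketch that maps events from two differently shaped sources onto one common structure. The tool names (`tool_a`, `tool_b`), their field names, and the severity vocabulary are assumptions made up for this example, not the format of any real monitoring product.

```python
from typing import Any, Dict

# Hypothetical mapping from each tool's severity labels to one shared vocabulary.
SEVERITY_MAP = {
    "info": "informational",
    "warn": "warning",
    "warning": "warning",
    "crit": "exception",
    "error": "exception",
}


def normalize(tool: str, raw: Dict[str, Any]) -> Dict[str, str]:
    """Convert a tool-specific event into the common event structure."""
    if tool == "tool_a":
        source, level, text = raw["host"], raw["level"], raw["msg"]
    elif tool == "tool_b":
        source, level, text = raw["node"], raw["sev"], raw["text"]
    else:
        raise ValueError(f"no field mapping defined for {tool!r}")
    return {
        "source": source,
        "severity": SEVERITY_MAP[level.lower()],
        "message": text,
    }
```

Once both tools produce the same structure, the same condition reported in two different ways compares as equal, which is what makes filtering and correlation practical.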
Policy Four — Event Handling Should be Automated When Possible
Effective automation tends to speed up the handling of events. If an event management process relies exclusively on humans to respond, the volume of events will quickly overwhelm them, and events will be missed or handled late.
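One simple way to picture this is a lookup of automated handlers keyed by event type, with a human queue as the fallback. The sketch below is illustrative only: the event types, the `request_restart` handler, and the returned strings are hypothetical stand-ins for whatever a real automation tool would actually do.

```python
from typing import Callable, Dict


def request_restart(event: Dict[str, str]) -> str:
    # Placeholder for a real automated action, such as calling an orchestration tool.
    return f"restart requested for {event['source']}"


# Hypothetical registry: event type -> automated handler.
AUTO_HANDLERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "service_down": request_restart,
}


def handle(event: Dict[str, str]) -> str:
    """Run the automated handler if one exists; otherwise hand off to a person."""
    handler = AUTO_HANDLERS.get(event["type"])
    if handler is None:
        return "queued for human review"
    return handler(event)
```

Humans stay in the loop for anything without a known response, while the routine cases never reach a person's queue.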
Policy Five — Events Should have Standard Classification Schemes and Escalation Procedures
In other words, a service provider should know what to do with the events that it generates. Not only is it pointless to send an event to someone who is unable to take action on it, it is equally foolish to send an event to an operational team without effective instructions for how to handle that event. Events that arrive without instructions tend to be ignored, which ultimately results in important events being missed.
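A classification scheme and its escalation procedure can be written down as plain data. The following Python sketch assumes three event classes and invented time thresholds and contact names; the actual classes, timings, and roles would come from the organization's own procedures.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical escalation steps per class:
# (minutes unacknowledged, who should be handling the event by then).
ESCALATION_STEPS: Dict[str, List[Tuple[int, str]]] = {
    "informational": [],  # logged only, nobody is notified
    "warning": [(0, "service-desk"), (60, "team-lead")],
    "exception": [(0, "on-call-engineer"), (15, "team-lead"), (30, "duty-manager")],
}


def escalation_target(severity: str, minutes_unacknowledged: int) -> Optional[str]:
    """Return who should currently own the event, or None if no one is notified."""
    target = None
    # Steps are listed in ascending order, so the last threshold reached wins.
    for threshold, contact in ESCALATION_STEPS.get(severity, []):
        if minutes_unacknowledged >= threshold:
            target = contact
    return target
```

Because the procedure is data rather than tribal knowledge, an operations team receiving an event can always answer the question of who handles it and when it moves up.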
Policy Six — All Recognized Events Should be Captured and Logged
If something is important enough to be considered an event, the organization must take steps to ensure that the event is predictably and consistently captured and logged. Otherwise, it is very difficult to rely on those events as triggers for automated activities within the organization’s IT environment.
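As a minimal sketch of consistent capture, the Python below writes every recognized event as one JSON line using the standard `logging` module. The logger name, the event fields, and the choice of JSON lines are assumptions for illustration; the important property is that every event passes through the same `capture` function.

```python
import json
import logging
from typing import Any, Dict, TextIO


def make_event_logger(stream: TextIO) -> logging.Logger:
    """Build a logger that writes one line per event to the given stream."""
    logger = logging.getLogger("event_capture_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate lines if called more than once
    logger.propagate = False  # keep events out of the root logger's output
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def capture(logger: logging.Logger, event: Dict[str, Any]) -> None:
    """Record an event as a single JSON line with a stable key order."""
    logger.info(json.dumps(event, sort_keys=True))
```

A predictable one-event-per-line record is what lets downstream automation treat the log as a dependable trigger source.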
Any service provider can set up monitoring tools, but only the top-performing service providers build an effective event management process based on an actionable and usable set of policies.