Are Production Server Reboots Standard Changes?

April 27, 2011
Michael Scarborough

I attended a meeting recently with a customer of mine and a potential new vendor. The new vendor was there to pitch his configuration and setup service offerings for a specific ITSM toolset.

My customer has already had one bad experience with an ITSM tool configuration vendor who promised one thing and delivered much less. He ended up with a tool that’s minimally used and not configured to match his business needs. He’s looking for a vendor that can understand his business needs and priorities and help him get his tool configured and working in a short time frame.

My role with this customer is to help him adopt and stay aligned with ITIL best practices. The meeting went well, and I felt like the vendor showed my customer something that would really help him get maximum benefit from his chosen ITSM toolset.

Then the topic of standard changes came up. My customer asked for examples of standard changes. The vendor responded, “Server reboots are an example of standard changes.”

Things like this often make my head want to explode. I’m fairly good at controlling my emotions in a professional setting, and it definitely wasn’t the time or the place to get into a theoretical debate about what is and what is not a standard change. Discretion is the better part of valor, and I chose to not sidetrack the meeting into a pointless argument about standard changes.

However, the topic is important to me and to this particular customer. ITIL v3 describes standard changes as low-risk, regularly occurring activities that are well understood and repeatable. Standard changes are often associated with service requests. They are simply a way to document regularly occurring activities that involve some aspect of change but are low risk, low cost, and repeatable.

There certainly is room for debate about what is and what isn’t a standard change. The point of the best practice is to communicate that controlling the risk of standard changes and documenting them is a good idea, not to tell you specifically what is and what isn’t a standard change in your environment. Therefore, what is and what isn’t a standard change, like many things described by ITIL, is highly context-driven.

The bulk of my experience is in financial services, in IT environments that are critical to daily business operations involving billions of dollars. In such environments, regularly rebooting a server as a precautionary measure is slothful. It’s seen as not pursuing some underlying issue to root cause, and then permanently correcting that root cause. In such environments, a server reboot is what ITIL calls a workaround, in that the reboot temporarily addresses the symptoms of a problem but does not permanently correct it. In such environments, these are not typically low-risk activities and therefore would not be considered standard changes. This is not to say that we didn’t have many people who would’ve loved to make such a thing a standard change; we did, but the needs of the business wouldn’t allow it.

One issue with deeming server reboots standard changes is that whatever condition the reboot is intended to address, usually some memory-related issue, might recur immediately after the reboot. I have seen this very thing happen. A server is rebooted to clear a memory leak, and almost immediately after the reboot, the server is again unusable because of the memory leak. The reboot only addresses the symptom temporarily, which is why it really is a workaround.

Making these types of things standard changes tends to encourage bad administrative behaviors and lets the effects of poor development linger in organizations for years. Think of the situation where server administrator John creates some automated process to reboot a server, doesn’t document it, and then leaves the company. Whoever follows John (and then follows that person) is often left to figure out why a process is in place to reboot the server.

And that is before considering the impact of a regular reboot on the business. Who knows (without asking) what impact it has on the business, how the business has adjusted and changed over time, or even whether the business is taking workaround actions of its own to ensure that the application hosted on the server is working properly?

The point is, server reboots are much more likely to be workarounds than they are to be standard changes. A workaround is an action taken to temporarily address the effect of an incident. Workarounds are sometimes called “temporary fixes”, which means that the workaround’s effect lasts for a limited time (it could be 1 minute, it could be 10 years). Furthermore, workarounds can be done preemptively. In most cases where I’ve seen organizations regularly reboot servers, what they are in fact doing is preemptively carrying out a workaround.

The Change Management process, where possible, should work to turn normal changes into standard changes. The Change Advisory Board should not review the same changes every week; an effective Change Management process figures out how to lower the risk of those changes and turn some (not all) of those normal change activities into low-risk standard changes.

Many of the methods, techniques, and constructs that ITIL describes are useful. Declaring something a standard change means that the activity is low-risk, well understood, and routine. Regular virus definition updates are a good example of a standard change. A regular reboot of a server to preemptively correct a memory leak in an application heavily used by the business is not a standard change. It is a potentially high-risk situation that, if called a standard change, understates the potential impact and undermines a critical part of the change management process.
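For readers configuring an ITSM tool, the three criteria can be written down as a simple rule. The following is only a minimal sketch in Python, not anything ITIL or a particular toolset prescribes: the `ChangeRequest` type, its field names, and the `classify` function are all hypothetical, and the true/false values assigned to the two examples are my own judgment for a business-critical environment. Your organization's assessment may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    """Hypothetical record of a proposed change (field names are assumptions)."""
    name: str
    low_risk: bool         # is the risk to the business low in THIS environment?
    well_understood: bool  # are the cause, procedure, and effects understood?
    routine: bool          # is it a regularly occurring, repeatable activity?


def classify(change: ChangeRequest) -> str:
    """Return 'standard' only if all three criteria hold; otherwise 'normal'."""
    if change.low_risk and change.well_understood and change.routine:
        return "standard"
    return "normal"


# Illustrative assessments only; the values are assumptions, not ITIL rulings.
virus_update = ChangeRequest(
    name="regular virus definition update",
    low_risk=True, well_understood=True, routine=True,
)
preemptive_reboot = ChangeRequest(
    name="regular reboot to clear a memory leak",
    low_risk=False,         # the application is heavily used by the business
    well_understood=False,  # the root cause of the leak has not been found
    routine=True,           # being routine alone is not enough
)

for change in (virus_update, preemptive_reboot):
    print(f"{change.name}: {classify(change)}")
```

The point of the sketch is that the criteria are combined with "and": an activity that happens every week but fails the risk test still goes through the normal change process.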

Much, if not most, of what ITIL describes is contextual in nature. How one organization applies the concept of a standard change might not be the same as how another organization applies the same concept. However, one thing is clear: any activity that is potentially high-risk to the business is not a standard change, and high-risk activities should not be managed through a process designed to handle low-risk, repeatable activities. To answer the question posed in this post: in most organizations, production server reboots are not standard changes.