If you work in IT, chances are you have your very own IT horror story. Who are we kidding? Chances are you have several IT horror stories. In honor of Halloween, we asked for the worst of the worst - the ones your IT buddies tell their IT buddies. We got some good ones, too. It was tough, but we compared the stories and narrowed them down to our three favorites, which we present here for your (scary) reading enjoyment.
The grand prize winner, taking home the iPad 3, is Grey Howe:
In 2011 the company I worked for purchased another company that dealt in power generation. They had about fifteen electrical engineers on staff who all "needed" admin access, and the rest of the staff (about forty, including machinists and administrative staff) also had admin access. For the two years prior to the acquisition, they had been running with no IT, only a single guy who would come in for a day once or twice a month to handle the emergency issues that had been written up for him. Some stations hadn't been patched in over nine months, there was no inventory, and at least two PCs weren't running licensed software. Some systems were home-built by the same IT guy, and one of them had an SLI setup so he could run CAD faster (laughable). And that's just the PCs!
The server room held four or five systems, dedicated to a Windows Server 2003 AD box, a file/print server, and a couple of application boxes (Quicken was one). They hadn't been patched. Ever. The patch panel was split between phones (a VoIP system using PoE) and Cat5 data. Nothing was routed, there was no cable management, and nothing was labeled. The company had bought into a three-year T1 contract that sacrificed their data traffic to the phones, resulting in performance worse than DSL at a cost of more than $1,200/month, and the firewall was a simple Juniper box with rules that were very complex right up until the end, where the final rule was an all/all/all. The UPS was amazing! The engineers had taken three batteries from solar installations (easily three feet tall apiece and over 200 lbs each), chained them together, added a 1000W inverter (the kind you might find in a car), and chained that to a Minuteman UPS and two small APC units (the inverter wasn't fast enough to switch over during an outage). This all combined for a total runtime of three days.
No racks for the servers, just tables. One of the servers, their "most important" one, sat on an old desk. No cooling, either, and it was all stored in a closet that was accessed every day by everyone because the paper boxes were in there, too.
My task? Fix it all. In two weeks. I worked until 11pm or later on this project, getting stuck inside once due to the alarm system, and at the halfway point my manager took the project away from me and declared that someone else, 2,500 miles away (Denver to Alaska), would be taking it over. He arrived a week later, and I was let go for being unable to perform quickly enough.
Our first runner-up:
I started a job at a rural hospital in California in the fall of 2009. On my first day, we started getting tickets that the clocks on almost all of the PCs were off by an hour. It didn't take me long to figure out that the PCs had not been patched for quite some time. After a little sleuthing, I discovered that all 250+ XP PCs had not been patched beyond Service Pack 1, and many didn't even have that. I asked the current sysadmin what was going on, and he said that he had set up a WSUS server to take care of everything, and that the PCs should be getting patched. I tried to log into the WSUS server, and it was dead (we eventually figured out it was a failed motherboard, but that's another story).
I was starting to wonder how someone could not know that his WSUS server was down and that the PCs had not been patched for several years, but first I had to figure out what needed to be done to fix the issue. No problem, we've got spare room on another server, so I'll just rebuild WSUS there, I thought to myself. I informed the IT director of what was happening and what I was going to do to remedy the situation. I then built the new WSUS server and started downloading all of the needed patches through our only connection to the Internet (a single T1 line). I let it run overnight, thinking that all of the patches would be downloaded by the time I got in the next morning.
The next day I came in and we were completely offline. Nobody seemed to know what was going on. Most pings to addresses outside our network timed out, and the few that did come back had extremely high latency. I soon realized to my horror that all 250+ PCs were trying to download several years' worth of patches, along with several service packs, from Microsoft through our single T1 line. I immediately asked the sysadmin what he had changed, and he said that he had deleted the GPO for the previous WSUS server since it wasn't working. He didn't seem to realize that this would cause the PCs to try to get their patches directly from Microsoft.
I quickly created a new GPO that pointed to the new WSUS, and we were back online later that morning. This was only one of many interesting IT experiences at this hospital.
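For anyone curious what that GPO actually controls: the client-side WSUS policy boils down to a handful of registry values under HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate on each PC. Below is a minimal, hypothetical Python sketch (Windows-only, standard-library winreg) that checks whether a machine is still pointed at an internal WSUS server or will fall back to pulling patches straight from Microsoft; the example server URL and the output wording are made up for illustration.

```python
# Hypothetical check of the registry values the WSUS GPO writes on a client.
# The key paths are the standard Windows Update policy locations; the
# fallback logic and messages are illustrative only.
import winreg

WU_KEY = r"SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate"
AU_KEY = WU_KEY + r"\AU"

def read_hklm(key_path, name):
    """Return a value from HKEY_LOCAL_MACHINE, or None if it doesn't exist."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _ = winreg.QueryValueEx(key, name)
            return value
    except OSError:
        return None

wsus_url = read_hklm(WU_KEY, "WUServer")      # e.g. http://wsus01:8530 (example)
use_wsus = read_hklm(AU_KEY, "UseWUServer")   # 1 = use the intranet server

if wsus_url and use_wsus == 1:
    print(f"Pointed at internal WSUS: {wsus_url}")
else:
    # With no policy applied (e.g. the GPO was deleted), Automatic Updates
    # goes straight to Microsoft's update servers over the WAN link.
    print("No WSUS policy found; patches will come from Microsoft directly.")
```

Delete those values across a fleet of badly unpatched clients and you get exactly the T1-crushing stampede described above.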
Our second runner-up:
I once deleted the company's entire DFS (Distributed File System) share infrastructure in the middle of the work day, which caused ALL 300+ users to lose access to their mapped drives, network shares, and desktops. Then, to add salt to that open wound, our engineer informed me that there was no backup of this data and no documentation of what was currently in the structure. So, in a live environment, we had to organize the IT support staff to recreate from memory what we recalled had been there. Thankfully, we have a great, very knowledgeable team that was able to get the DFS infrastructure back up within a few hours. This was definitely one of the most stressful moments of my IT career.
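The painful part of that last story is the missing snapshot, and on newer Windows Server versions the DFS Management tools make one easy: dfsutil can export a namespace to XML (and import it back). Here is a small, hypothetical Python wrapper along those lines; the namespace path and backup folder are made-up examples, and it assumes a dfsutil build that supports the root export verb.

```python
# Hypothetical pre-change snapshot of a DFS namespace using dfsutil's
# "root export" verb (DFS Management tools on newer Windows Server).
# The namespace path and backup directory are made-up examples.
import datetime
import os
import subprocess

def snapshot_namespace(namespace, out_dir=r"C:\dfs-backups"):
    """Export a DFS namespace to a timestamped XML file and return its path."""
    os.makedirs(out_dir, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    out_file = os.path.join(out_dir, f"namespace-{stamp}.xml")
    # Equivalent to running: dfsutil root export \\corp.example.com\shares <file>
    subprocess.run(["dfsutil", "root", "export", namespace, out_file], check=True)
    return out_file

if __name__ == "__main__":
    print(snapshot_namespace(r"\\corp.example.com\shares"))
```

With an export like that sitting on disk, "rebuild it from memory" becomes "restore it from the XML file."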