IT Horror Stories

We asked for your IT horror stories, and you delivered. Here they are:

Single Point of Failure

I cannot name the company for obvious reasons, except to say it was a very large enterprise. I worked in the IT department on middleware apps. One day the applications came to a halt, and not just the middleware apps but many, many others. A P1 incident was created and emergency alerts were sent out. We could not get back to normal work for nearly three full days. It had come to the attention of the IT department that there were not enough UPSes backing up the main data center, so a contractor came in to bolster the power load. He was assured by the data center team that advanced failover measures were in place, so he went to work.

He found that one switch did not have a UPS at all. He unplugged the switch, plugged in the UPS, and re-plugged the switch into the UPS. All is well, correct? Nope. It turned out the architecture drawings were incorrect. Hundreds of thousands of dollars later (in lost revenue and time), we found that the high-powered switch he unplugged had brought down over 3,000 servers and virtual servers. Each had to be rebooted in sequence to be brought back online. Yes, this was a nightmare! Lesson learned: never allow a single point of failure to exist on your enterprise network. Providing UPSes was the right thing to do; ironically, the very thing supplied to thwart an outage became the source of one. Submitted by Mike H.


 

Clean Up on Aisle Intranet

In 2000, due to a communication gap between my team and the IT team, I wasn’t aware that the files supporting our company’s intranet had been restructured. When I ran a report to clean up old files on servers, I mistakenly thought these 10K files were no longer in use and had them deleted. Net result – our entire intranet had been deleted and it took many resources several days’ work to restore it. I was SURE I was going to be fired, but instead it gave us the opportunity to review and fix what went wrong in our processes and I continued working there for another 10 years. Submitted by Yolanda J.

 

When a Patching Server Just Wants to Do Its Job

Some years back, we had a change window set up to patch the Windows servers. Everything had been done according to policy: Change Management had approved our change and its timing, and notification had been sent out that we would be doing the work after hours to minimize impact. At the start of the change window the admin logged onto the patch management server, opened the application, and kicked off the patching job. The patch management server immediately crashed and would not reboot. We cancelled the change per policy and contacted hardware support to come out and troubleshoot the patching server.

Due to the level of support that we had, the field tech did not arrive until business hours the following day. After running diagnostics he determined that a memory module in the patching server had failed. He replaced the module and booted the server. A few minutes later the monitoring console lit up like a Christmas tree and pagers started going off all around the server and application admins’ aisles. Servers were going down all across the enterprise. Senior managers started calling to find out what the heck was going on. Customers were being impacted as applications became unavailable.

After a little investigation we realized what had happened. The patching server had set a flag to run the patching jobs before it crashed. When it was repaired and restarted, it dutifully picked up where it had left off the night before and kicked off those patching jobs, patching all the servers in the middle of the business day! Lesson learned: don’t do work in the middle of the business day, even if it’s not supposed to cause impact. You just never know.
Submitted by Derrick B.

Add your IT horror story in the comment section

 

A Computer “Guru” Who Built Her Own Anti-Virus Program and Has Never Heard of Google

From 2004 to 2008 I worked for an IT services company that was part of a large corporation. The corporation also owned a cable and high-speed internet company, and at that time the company I worked for provided tech support for them, which was always interesting.

One evening, a client called looking for help with her e-mail, then proceeded to tell me that she knew almost everything about computers, what with her Master CNE designation and computer science degree. My first thought was, why are you calling, then? But, being the good help desk techie I am, I proceeded to troubleshoot her issues, verify settings, etc., etc.

Halfway through, she started cursing the pop-up ads that kept appearing, so I asked if she had run a virus scan on her computer. She said no, so I suggested she consider it and asked whether she had Norton, McAfee, or AVG installed. “None,” she said. “I wrote my own.”

Huh?

I asked her to confirm what anti-virus program she had installed and she replied, “None, as I wrote my own software.” WTF?? So I suggested some anti-spyware tools and she asked which ones. I suggested three, and she said okay, she might reverse engineer them after seeing the code. The call then took a turn for the weird.

She asked me how to find them and I told her to go to Google.com and “Google it.” She then said that she had never heard of Google, that they must have sprung up overnight, and that she was going to consult a lawyer about suing them over the name.

I asked her if there was anything else I could do for her, and she asked what qualifications I had for the job, to which I replied, “My qualifications and experience are only the concern of my boss and the HR department.” I then asked where she got her computer science degree; her reply was “F*** you” and the phone slammed down in my ear. Submitted by Donald M.

 

Servers and Oil

I was building three new 2012 servers to create a security network and performing the rack and stack. On the day the systems were set to go live, I walked into the server room, which was in an adjacent storage room in a parking garage. The room had an inch of oil over everything: the elevator seals holding the hydraulic fluid had burst in the next room, flooding both rooms. To my shock, the servers were still running, because hydraulic fluid is not conductive. I was very lucky: I was able to save the hard drives, and since this was a three-location project, I pressed the servers intended for the future sites into service. Submitted by Eric M.

 

Never Forget Computer Basics

A B2B e-commerce project was launched with top-of-the-line RAC servers running an Oracle 10g database. It was a cross-breed of applications: a Windows Server platform, an Oracle database cluster with PL/SQL cartridges, the early days of Java 1.3, PKI security using card authentication, and a mainframe backend. Everyone was excited for the first business internet product of the year 2000, which would eliminate physical cash transfers on the road between the bank and the client. The B2B e-commerce internet product was a game-changer for electronic fund transfers, payroll payouts, cash collection and more, so all eyes were on it. The initial setup and installation was a success, with the physical servers mounted in the data center, cabled, and all the prerequisite software installed.

Day 2 was the big deployment event, with the DBAs, developers, operations, project manager and head of IT all inside the locked data center. It was 10pm and everyone had their installation scripts and application packages ready for the three-hour installation. The deployment went smoothly and completed ahead of time, ending at 1am, so everyone was ready to hand it off for QA testing and about to head home… until something strange happened.

The PL/SQL packages started getting corrupted, the Java processes were not processing all the transactions, and the Windows Server started triggering system event logs. The DBA was immediately engaged to determine if there was data corruption, and indeed about 10,000 records were affected, with the number growing. The operations team checked the event logs and saw intermittent I/O errors on the mapped drives used by the database. The developers were seeing various Java exceptions in the logs and could not figure out whether a missing Java package had caused them. Oracle Support was called on-site within 15 minutes and raised a high-priority incident, requiring an immediate database backup and recovery. Surprisingly, even the backup process failed, so they dug deeper, rebuilding the corrupted database through the database index and structure files. It was already about 4am, and the big launch was set for 8am, with the press in attendance and the bank CEO presenting. This should never have happened after five cycles of QA testing and audit, and everyone was scrambling to figure out what was going on.

As the head of IT was about to call the deployment backout and face the dreaded shame, and the whole IT team was starting to think about their next job opportunities, a young operations guy was staring at the back of the server. He had a basic understanding of the server hardware from his home computer: the power supply, video card, PS/2 keyboard and mouse, audio card and network adapter card. He looked at it closely; something seemed out of place. And there it was…

The network adapter card was vibrating and the LAN cable was loose. The clip on the network cable was broken and the network adapter card was not secured to the frame. He seated the card securely and plugged the cable back in, and all of a sudden the errors went away. All the experts looked at him and said, “What did you do?” The pale-faced young ops guy said, “I noticed the network cable was not secured and plugged it back in.” Everyone went to the back of the server and confirmed that it was indeed a bad network connection.

Everyone was cheerful with the project saved; the corrupted records were immediately restored, and QA and audit completed at 6am. The team went home happy with a night to remember, and the CEO never heard about the frustration of that night. It has been 20 years and I still remember this experience, and why I always look at the back of the server. Submitted by Jorge G.

 

Every IT Professional’s Fear

Every day is a horror story, because nobody really knows what we do. Everyone specializes in something different and when a Network Specialist gets a question about a user’s cell phone, it hurts. Submitted by Natasha A.

 

The Good ‘ole “I Love You” Email From Your Manager

Well, I didn’t actually witness it, but my older brother tells a cautionary tale that goes something like this: Several years ago, while he was working for a tech company, his manager poked her head into his office one day and said, “If you receive an email from me with ‘I love you’ in the subject line, don’t open it.”

Well, a short time later, wouldn’t you know, an email appeared with the subject “I Love You”. Needless to say, curiosity got the best of him (did his boss actually LOVE him?) and he opened it, only to realize it was a virus, which immediately began sending itself to everyone on his contact list. At this point in the story he acts out how he lunged from his desk and yanked the cable out of the wall. I bet that was entertaining to watch. Submitted by Julia S.


 

Hot Dates Get In the Way of Weekly Backups

When I was asked to hire a computer operator, I was told to hire the first person who walked in the door, so I did. He worked out well for a while, backing up our data on big reel-to-reel tapes: incremental backups on weekdays, with full backups every Friday. Then one day our removable 300MB disk drive crashed, and we found out that the operator had been overwriting the full backups with incremental backups. He explained, “I have hot dates on Friday nights and the full backups took too long.” We had to recreate our order entry system because it hadn’t been changed in a long time, and so wasn’t on any incremental backup.

Luckily, the consultant who wrote the original programs was still in the business, but I estimate it took two weeks of full-time programming for both of us to finish the work. Needless to say, I verified every backup every day for the rest of my career. Submitted by Angel L.

 

The Reply-All Email That Crashed The Server

Years ago I worked for a software development company that had just migrated from Lotus Notes to Exchange 4.0 for email, and the users had some issues getting used to the new email client. The administrators also took some time to get used to the Exchange backend. After about a month of use, the new email server crashed: the hard drive was full. It turns out no one had thought to back up the email server, and as a result the database logs grew too large too quickly. Once we figured out the problem, we started backing up the server, but soon realized that we were greatly under-provisioned in hard drive space and ordered new drives.

We sent out a message to all users on our system asking them to refrain from sending large attachments until we solved the issue. One of our sales managers thought that this was the perfect time to use the fancy new mail client and “reply all” with a video of a disgruntled user bashing in his monitor. Yep, he took the server down for good with that one! Two days later we got the disks we needed to bring the server back online. Thankfully I was in user support, wasn’t held responsible for the server crash, and kept my job. The network admin wasn’t so lucky. Submitted by Tina I.

 

Not IT nightmares, but too good not to share

 

I Learned Computers Have a “Secret” CD Drive

A user complained that her CD drive was eating her CDs. When I opened her CD drive there was nothing in it; she was amazed at what I had done, having had no idea THAT was the CD drive. She showed me the small slot between the 5.25″ CD drive and the empty plastic cover on the second 5.25″ bay, and pushed a CD into that slot. It made a horrid noise when it hit the bottom of the case, as one would expect. I popped the side of her case open, and roughly a dozen CDs were at the bottom of the case. Submitted by Eddie N.

 

Excel Files Are Heavy

I was asked by my way-cool CEO to deliver a mini and a full-size laptop to a new bigwig arriving from Germany the next day, for her to choose which she preferred. This goes back about 10-15 years, so, yes, a CD-ROM drive was a big deal.
It went something like this:
“How much do these laptops weigh? Is there a big difference?”
“I’d guess 3 to 5 lbs maybe. The larger unit has a CD drive built in and this smaller unit does not; is that a concern or a request?”
“I think so… can a CD be added to the smaller unit?”
“Kind of. You’d need to carry this sleeve, the CD drive itself, and two additional cables though… and in the long run it may end up the same weight, only now it’d be easier to lose or misplace the parts.”
“Oh yes, I agree… so, 5 pounds, huh? How much more is it going to weigh when you add my data to it, like my emails, Word and Excel files… I have loooots of Excel files.”
“uhhhhhh…”
At this point I used every muscle in my face not to smile, let alone burst out laughing… but just know I was dying to say, “Any PowerPoint or PDFs? Because those are the worst offenders.” Submitted by Mark M.

 

Stories were edited for spelling and basic grammar.

What’s your IT Horror Story? Share in the comment section below

 



Join the Conversation

11 comments

  1. Snowdog

    I was working at an HPC center, and we made extensive use of the LMOD environment modules system for user software. We hired a new guy who thought he knew way more than he did (he formerly worked for Los Alamos), and while I was on vacation, he apparently decided to “reorganize” the entire modules tree, including several life science genomics titles I had painstakingly built and tested over the prior three years. I found out about this because I started getting a flood of emails from upset research users who suddenly couldn’t use any of the applications they’d been using trouble-free for the last three years. In addition to “reorganizing” the LMOD tree, he had changed ownership to himself, altered the permissions on the files, and then refused to answer his email over the weekend. When he finally responded and fixed the permissions, I spent 2.5 days at a condo in Florida fixing everything. Needless to say, the wife was not pleased.

    I’ve also had the “secret” CD drive problem pop up at a former job. Hilarious! Looks like it’s more common than I could have ever imagined.

  2. Mit MO

    NO BACKUP

    There was a customer company with 150 users working on a terminal server daily, with all their files saved on the server’s D drive.

    One Thursday afternoon a colleague on the network team sent us an email saying that a malware file had been downloaded and saved to the network; he had gotten the notification from the firewall! The firewall saw the malware and just said, yes, it has been downloaded.

    Unfortunately, before we could take any action, the client opened it and every folder he had access to got infected. And again unfortunately, one of our team members working the help desk ran the payload inside one of the folders, and yes, we were done!

    The weird thing is that the antivirus couldn’t catch it either!

    The terminal server was affected, the files in that folder were encrypted, and a note was saved to the folder telling us to pay somebody for the encryption key.

    At that point we unplugged all the network cables from the backs of the servers and connected them to a private, separate switch. I scanned all the servers with another AV, and nothing was affected except the terminal server.

    I had just moved the terminal server from VMware to Scale Computing and had been busy with storage management and other tasks, and I FORGOT to set up the backup for the server!

    I removed the ransomware from the terminal server, but it was unreliable. I still had the previous server on the VMware side, so I ran it, and clients could log in to the server a day later, but with file versions from two weeks before. I copied the D drive files from the affected server to a fresh new server and killed the old one. I scanned the files again to make sure there were no more viruses in the subfolders, then moved them to a separate file server and shared it as a mapped drive for the clients. AND I set up the backup.

    We got back to normal after two days (a weekend, fortunately) and they had everything by Monday morning. I thought I might get fired, but my boss was so nice and said, “All these things are experience. Learn and do better next time, and focus on the future; we cannot change the past.”
    IT FOLKS: THE FIRST THING YOU SHOULD CARE ABOUT IN A NETWORK IS BACKUP.

  3. Glenn B

    At a major company I worked for back in 2011, one of our server engineers wrote a small registry patch to fix an issue on all our Windows servers. SMS would copy it to the server, it would unpack itself, do its thing, erase its temp files when done, and the server would then wait for the next regular reboot cycle before the patch effectively applied. All well and good… except he didn’t test it on more than one O/S, it ran with full admin and /force permissions, and he let it run against every server in the entire Active Directory.

    Turns out that on Windows 2003 it unpacked its temp files into C:\windows\system32. And when it cleaned up after itself… you guessed it… it deleted every file in that directory. And nobody noticed until the next Sunday reboot cycle, when over 1,100 production servers rebooted and found they didn’t have enough of an operating system left to do anything.
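    The registry patch itself isn't shown in the story, but the failure mode generalizes to any deploy script that unpacks into a shared directory and then wipes that directory on cleanup. A hypothetical shell sketch of the pattern and the safer alternative (the original was a Windows patch; all paths and names here are illustrative only):

```shell
#!/bin/sh
# Hypothetical sketch: why a patch script should never stage its files in a
# shared system directory. Paths are stand-ins, not the original patch.

# DANGEROUS pattern (commented out): staging in a directory that already
# holds critical files means "cleanup" deletes those files too.
#   workdir=/windows/system32
#   rm -f "$workdir"/*        # wipes the OS along with the temp files

# SAFER pattern: a private scratch directory that only ever holds our files.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT   # cleanup is confined to our own directory

echo "patch payload" > "$workdir/payload.reg"
# ...apply the patch from "$workdir" here...
ls "$workdir"                   # prints: payload.reg
```

    With the `mktemp -d` approach, even a buggy `/force` rollout can only ever delete its own scratch files, never the operating system around them.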

    To top it off, our backup system had no full backups, only incremental ones, because, you know, full backups are just way too expensive. So… reload the O/S by hand, reinstall all the software by hand, then restore the data. Repeat 1,100 times while the business burns down around us.

    Six 100+ hour workweeks later, most stuff was back up and running. And each member of our team was awarded… a commemorative coin. (We didn’t get paid overtime; we were salaried.)

  4. Eric

    I invented the concept of “draining the swamp”. I deleted an entire* state government agency.

    This nightmare was of my own creation, with some help from terrible data practices. I was working on a data center upgrade contract for a state government agency, upgrading several servers and a storage array. The agency used Sun Solaris for its main servers. It had a few test servers used to test applications built in-house, but no systems for testing outside applications or patches: anything from an outside vendor went right to the production servers.

    Sun Microsystems released a new batch of patches for Solaris, so we patched the servers over the weekend, and everything seemed fine until Monday morning. The agency security group was responsible for adding new users to the systems, and a new programmer started that Monday. The security group used a graphical tool to add new users, and the graphical tool was not working. None of the agency employees knew how to add users from the command line, a problem compounded by the fact that the agency was using NIS to resolve all users on all the Solaris servers.

    It wasn’t part of my contract, but the agency asked if I could help them out, so I ran the useradd command and added the new programmer on the NIS server. I set his initial password, then ran a “tail -1” of /etc/passwd and redirected the output to the NIS passwd file, then ran “make” to rebuild the NIS database. The very second I hit the “Enter” key, I saw my mistake. I had wanted to append the new programmer to the NIS passwd file, which would have been done by using two “greater than” (>>) characters as part of my “tail -1” command. I forgot one of the “>” characters, which caused the system to overwrite the existing NIS passwd file, deleting all the existing users and replacing them with just the new programmer. So I didn’t really delete an entire* state government agency; I just downsized it from a few thousand people to one.
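    That one-character difference is the whole story: in the shell, “>” truncates the target file before writing, while “>>” appends to it. A minimal sketch of the mistake (file names are stand-ins, not the real NIS maps):

```shell
#!/bin/sh
# The one-character mistake: ">" truncates the file, ">>" appends to it.
# The file name below is a stand-in, not the real NIS passwd map.
printf 'alice:x:1001\nbob:x:1002\n' > passwd.nis   # two existing users

printf 'carol:x:1003\n' >> passwd.nis   # intended: append the new user
wc -l < passwd.nis                      # prints 3: everyone still present

printf 'dave:x:1004\n' > passwd.nis     # the typo: truncate-and-overwrite
wc -l < passwd.nis                      # prints 1: every other user is gone
rm -f passwd.nis
```

    One keystroke turns “add a user” into “replace the entire user database,” which is exactly what happened here.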

    What made this even more fun was the fact that we had the systems set up so no one could log in directly as the root user. You had to log in as a regular user and then switch users to become the root user.

    Neither the agency employees nor I could log into the backup server as ourselves or as root to restore the previous NIS passwd file or the previous NIS database. There were a few ways to fix this, but when time is of the essence, simple is often better. The only user that existed on the Solaris servers, including the backup server, was the new programmer, and I had just set his initial password. I logged into the backup server as the new programmer, switched to the root user, and restored the NIS database from the previous night’s backup. Problem solved.

  5. TN

    Years ago I worked for a company that did not have its iSeries on a UPS. There was a lot of road construction going on about 75 feet from our data center, as part of a widening of the main road. One day the crew must have severed a power cable in the street, and we lost power. It was a hard crash of the system. We got the system back up in a few hours, recovered a few damaged files, and got back to normal business.

    Well, a few days later, the same thing. Boom! Out goes the power again. It was a slow recovery process, but we got the system back. Now the scramble was on to get a UPS. We got one and had it installed a few days later.

    YEA WE ARE PROTECTED !!!

    WRONG..

    Yes, we had our UPS up and running and our iSeries was saved, or so we thought. The problem this time was that the network console that controlled the iSeries was not on a UPS. So when we lost power a third time, we could not do a safe shutdown of the iSeries. We had to watch as the UPS battery life drained away and the system ended abnormally again.

  6. Ian Frazer

    So many to choose from… so many ID-10-T’s I have worked for…

    How about the server farm that needed a new high-speed switch installed? We configured it at the office and shipped it over to the shared secure facility where we, the client, were not allowed onsite “due to security concerns”. Their highly skilled, trained, certified, and experienced on-site tech, who I will call “Bubba”, would do it. Bubba was about 5’7″ and apparently weighed somewhere north of 300 lbs. He just LOVED his huge metal belt buckles, as we sadly found out. The only space left in the 40U rack was at the top, so Bubba installed the rails and then lifted the 60 lb switch up and into them. Imagine what happened to his large stomach as he did this, and what his belt buckle did as he leaned in and slid the switch into place. Then he dropped his arms down, pinching his stomach, and his belt buckle, against about 48 fiber modules of another switch in the middle of the rack. Said fiber modules were then ripped out of the device as he struggled to get away… taking down some 5,300 users and many, many public-facing websites. The client’s name is not mentioned due to extreme embarrassment; ditto that of the secure facility located somewhere in the interior of BC, Canada (no, not that one; the -other- one…). I believe Bubba washes cars now for a living.

  7. Random Devops Guy

    A couple of years ago I was working on the DevOps team for a startup that had recently, and successfully, migrated all our infrastructure from a cloud/data-center hybrid to completely in the cloud.

    The cloud migration actually went reasonably well. Quite a bit behind schedule, as such things typically are (And is bound to happen when you go through three different DevOps managers in the period leading up to the move. Yay! Startups!), but when we finally did switch everything over it was relatively glitch free. Due mostly to months (Ok, a couple years. It dragged on a while.) of advance planning and prep.

    We now had a problem though. A rather sizeable monthly colo bill in addition to our now sizeable cloud-spend. One might think a data-center decommissioning should also be something that’s planned and prepped for well in advance, right?

    Nope, just put the devops team in charge of it, give the junior-most guy the bulk of the planning workload and give him a month or so to do it, along with all his other IT duties, with little to no supervision.

    There are companies that do this sort of work. Maybe we should look into contracting this one out. Nope. Too expensive.

    What could go wrong?

    The answer: Pretty much everything.

    Moving day finally arrives, and before we even get to the colo there is the first of *two* minor accidents in the rental box trucks (the second takes place with a truck full of equipment), including taking down an entire tree when pulling over to the side after the collision. Hundreds of thousands of dollars to low millions worth of servers, NAS boxes, and high-end data storage appliances eventually end up basically rattling around inside the backs of said trucks, ’cause they were affixed to pallets not with banding straps or anything meant for shipping such equipment, but with *shrink wrap*. And it’s miraculous that there were no injuries resulting in expensive workers’ comp claims after a bunch of out-of-shape ops guys, most of whom had probably never even *seen* a pallet jack or the inside of a box truck, much less operated either of them, were given the job of unracking, palletizing, loading, and unloading hundreds of pieces of rack-mount networking/storage/server equipment.

    The kicker: two years later, the last I heard, all those servers, switches, high-end storage appliances, etc. were still piled in a storage facility somewhere in the greater LA area, ’cause we couldn’t be bothered to take the time to remove and/or erase the drives prior to unracking the servers, so none of that equipment could even be sold after the move. A service, I might add, that is included in the cost of many of the companies that specialize in this sort of work.

  8. Pedro O.

    Back in 2002, I was the network administrator for a telecom company. We were using Norton Antivirus on our servers, workstations, and gateway when one day our department started receiving calls about replicating emails. It turns out the Bugbear worm had hit our network and flooded our email server and workstations, shutting them down by filling up disk space. The antivirus did not do its job, so we had to isolate our network for four days to clean up each server manually with another antivirus.

    When everything was clean and back online, I started investigating what had happened. It turns out the company’s CEO had ordered one of my technicians to disable the antivirus on his computer so he could use a floppy disk, which of course contained the virus. I told him never to do that again if he valued his company. I lasted six more months there.

  9. Joseph Rapoport

    I worked for a law firm. One of the partners was trying to install Office on his tower computer with 3.5″ drives. He started the process, ran into issues, and then called support. I was sent up to see him. He had put Disk 1 in the 3.5″ drive and, when that had finished loading, crammed Disk 2 into the drive and pressed the space bar to continue the installation, without taking Disk 1 out. I removed the disks and showed him how to install Office, but I always thought how funny it would have been to see him cram all 25 disks into that 3.5″ drive.

  10. Art Barnett

    I was the only SYSADMIN in a small manufacturing plant in CA. Fridays were normally slow days, with the owner telling me to NEVER make network changes on a Friday. With the owner’s permission, employees were allowed to bring in their personal PCs with any issues they were experiencing. Our lead engineer brought in his PC, explaining that it was shutting down within a few minutes of booting up. I opened the PC and found about an eighth of an inch of some oily substance in the bottom of the case. I asked, “What was done to your PC?” He said that he had replaced the CPU fan, and since he didn’t have any heat-sink grease, he had used wheel bearing grease between the CPU and the fan. I’d never seen a deep-fried CPU before!

  11. Daniel

    My wife tells a story where an email addressed to about 2,000 employees recently made it to her inbox. Needless to say, some folks were not happy to be on that mailing list. After the first few hit reply-all, the floodgates were open, with “me too” replies as well as the proverbial “please do not reply to all”. After all was said and done, she had to delete over 700 emails from this thread!