I have trained hundreds of cloud engineers. They work for organizations ranging in size from ten employees to tens of thousands of employees, so the scale of their responsibilities differs from job to job. Having said that, I see consistent trends regarding common mistakes that simply should not happen in the cloud.
Systems engineers are the women and men responsible for keeping your applications running and databases responsive, and while the cloud has removed many of their hardware responsibilities, their jobs are by no means easy. Lack of attention to detail is a factor in most mistakes, but the issues below stem from misperceptions and a lack of understanding of the cloud itself.
If you are looking to become or remain a cloud engineer, here are four mistakes you should strive to avoid:
1. Assuming the Cloud is Unbreakable
Hardware housed in a data center under someone else's control seems to promote a feeling of invulnerability. It is still the same hardware, and hardware fails. Overreliance on the maintenance and built-in redundancies of your cloud provider can lead to disaster. Four nines (99.99%) of availability is a common target for cloud services, but that does not relieve you of the need to have a plan for when things go wrong.
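To put "four nines" in perspective, here is a minimal sketch that converts an availability percentage into the downtime it still permits each year. It is plain arithmetic, not a statement about any provider's actual service-level agreement.

```python
# Convert an availability percentage into the downtime it still permits.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Even at four nines, that leaves close to an hour a year in which your application may be unreachable, and nothing says the hour will arrive at a convenient time.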
How to avoid this mistake:
Prepare disaster recovery plans that include more than simple backups of your data. Server images, responsibility charts, and consistent updating and maintenance are a good start. I have seen an unfortunate number of outdated plans that place important responsibilities under the supervision of people who have not worked there in years. Do not stop with the plan. Practice game day exercises until recovering from these issues is something that every team member is accustomed to doing.
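One way to catch an outdated plan is to compare its responsibility chart against the current staff list on a schedule. The sketch below assumes both are available as simple Python structures; the names, tasks, and the `find_orphaned_tasks` helper are hypothetical and only illustrate the check.

```python
# Hypothetical data: a disaster recovery responsibility chart and the current staff roster.
current_staff = {"alice", "bruno", "chen"}

recovery_plan = {
    "restore database from backup": "alice",
    "rebuild servers from images": "dmitri",  # no longer with the company
    "notify customers": "chen",
    "fail over DNS": "erin",                  # no longer with the company
}


def find_orphaned_tasks(plan: dict[str, str], staff: set[str]) -> dict[str, str]:
    """Return the tasks whose assigned owner is not on the current staff roster."""
    return {task: owner for task, owner in plan.items() if owner not in staff}


orphaned = find_orphaned_tasks(recovery_plan, current_staff)
for task, owner in orphaned.items():
    print(f"REASSIGN: '{task}' is still owned by {owner}")
```

In practice the roster would come from your HR system or directory service, and the check would run whenever someone leaves or changes roles.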
Disaster recovery is not something that should be left to chance, and much of that obligation still falls on an engineer's shoulders. Any system that is considered business-critical should never be managed by keeping your fingers crossed, and while the cloud has a good amount of inherent resiliency, preparation is the key to surviving any failure.
2. Overpaying for Cloud Services
Anyone who has worked in IT for a while comes from a place where we constantly had our hands out to the company's accountants. We need money for servers. We need money for storage. We need to hire more staff. The requests never stop, and the average number cruncher ducks for cover when they see an engineer coming.
The cloud offers us a new opportunity where we have a direct influence over day-to-day costs and the oddly refreshing ability to reduce them. Cloud providers always push the concept of "pay as you go" to explain that if you turn something off, you stop paying for it. This is mostly true, but the opposite is true as well. If you leave something running, you will continue to pay for it.
A client of mine called me frustrated with the bill he had received from his cloud provider. Outraged, he told me that he had only used two servers, but he was being charged for eight. When I looked at his logs, I pointed out that he had launched eight servers, and then a quick check of his portal showed that they were all still running. He explained that he had launched eight but that he only used two. His logic was flawed. If it is running, you are paying. Treat your hardware as a disposable resource and terminate any item that is not currently required.
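The arithmetic behind that bill is simple enough to sketch. The hourly rate below is a made-up figure, and 730 hours per month is a common billing approximation; real prices depend on provider, region, and server size.

```python
# Assumed figures for illustration only: real prices depend on provider, region, and size.
HOURLY_RATE = 0.10    # dollars per server-hour (hypothetical)
HOURS_IN_MONTH = 730  # common billing approximation (8,760 hours / 12)


def monthly_cost(running_servers: int, hourly_rate: float = HOURLY_RATE) -> float:
    """Cost is driven by what is running, not by what is being used."""
    return running_servers * hourly_rate * HOURS_IN_MONTH


launched, in_use = 8, 2
billed = monthly_cost(launched)
needed = monthly_cost(in_use)
print(f"Billed for {launched} running servers: ${billed:.2f}")
print(f"Cost of the {in_use} servers in use: ${needed:.2f}")
print(f"Spent on idle servers: ${billed - needed:.2f}")
```

Whatever the real rate is, eight running servers cost four times as much as two, so three quarters of that bill paid for servers nobody used.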
How to avoid the mistake of overpaying for cloud services
There are other easy ways to avoid wasting money. Servers that run consistently, day in and day out, can be billed as "reserved instances." This option can drastically decrease your costs (by up to 80%) in exchange for committing to usage for a minimum of a year. If you have a six-month project, this is the wrong choice, but for long-running infrastructure servers that aren't going anywhere, this is a great offer.
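A rough break-even comparison shows why the six-month project is a poor fit. This sketch assumes a simplified model in which the reservation is paid for the full one-year term whether or not the server runs. The rate and the 40% discount are hypothetical numbers chosen for illustration, not a quote from any provider.

```python
HOURS_IN_MONTH = 730      # common billing approximation
ON_DEMAND_RATE = 0.10     # dollars per hour (hypothetical)
RESERVED_DISCOUNT = 0.40  # 40% off the on-demand rate (hypothetical)
TERM_MONTHS = 12          # one-year commitment


def on_demand_cost(months_used: int) -> float:
    """Pay only for the months the server actually runs."""
    return ON_DEMAND_RATE * HOURS_IN_MONTH * months_used


def reserved_cost() -> float:
    """Pay the discounted rate for the whole term, whether or not the server runs."""
    return ON_DEMAND_RATE * (1 - RESERVED_DISCOUNT) * HOURS_IN_MONTH * TERM_MONTHS


def cheaper_option(months_used: int) -> str:
    """Name the cheaper billing model for a server that runs for months_used months."""
    return "reserved" if reserved_cost() < on_demand_cost(months_used) else "on-demand"


for months in (6, 12):
    print(f"{months} months of use: on-demand ${on_demand_cost(months):.2f}, "
          f"reserved ${reserved_cost():.2f} -> {cheaper_option(months)}")
```

Under these assumptions the reservation only pays off once the server runs for more than about seven months of the year. A deeper discount moves that break-even point earlier.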
Spot instances, managed services, and serverless options can also save money, but make sure your choices meet all your requirements. One of the basic tenets of cloud computing is to always use the right tool for the job, so avoid the flip side of this mistake by not compromising to save a few dollars.
3. Abusing Admin Credentials
Least privilege is a simple idea to understand, albeit a difficult one to implement. Put simply, give people access to what they need and deny them access to what they do not need. However, the truth is that certain accounts are not affected by policies and permissions, no matter how well designed those policies are.
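The core of least privilege is a default-deny check: an action is allowed only when it has been explicitly granted. The sketch below uses hypothetical role and action names and is not any provider's real policy syntax.

```python
# Hypothetical roles and actions; the names are illustrative, not a real provider's policy syntax.
ROLE_PERMISSIONS = {
    "developer": {"read_logs", "deploy_app"},
    "auditor": {"read_logs", "read_billing"},
}


def is_allowed(role: str, action: str) -> bool:
    """Default deny: an action is permitted only if it is explicitly granted to the role."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("developer", "deploy_app"))    # explicitly granted
print(is_allowed("developer", "read_billing"))  # not granted, so denied
print(is_allowed("intern", "read_logs"))        # unknown role, so denied
```

An account that bypasses this kind of check altogether, as a root or administrator account does, is exactly the kind of account the rest of this section is about.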
In some on-prem environments, people log in as domain admins on a regular basis. The equivalent poor decision in the cloud is to use your root account or Azure administrator account for daily work. The chance of your root or administrator account being compromised may be small, depending on how you use it, but the damage could be catastrophic. These accounts are all-powerful, and using them every day puts your entire organization at continuous risk.
How to avoid being an admin credentials n00b
It may seem like a small difference, but creating a new account and adding it to a group that grants admin privileges is the preferred option. While this new account gives nearly the same level of power, if it is compromised, the root account can always be used to override it. Using the root account for daily work gives up this safety mechanism, and a single hacked account could lead to disaster.
Similarly, you must protect your root identity when accessing the cloud via SDK or command line. While usernames and passwords may be used, you are more likely to see access keys and secret access keys. These are typically tied to accounts (including the root user) and carry the same level of permissions as the account itself. The best practice for the keys associated with the root account is to delete them immediately.
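On AWS, one place to check for lingering root keys is the IAM credential report, which is a CSV file. The sketch below parses a small made-up sample. The column names and the `<root_account>` user label follow the report format as I understand it, trimmed to the fields used here, so verify them against a report from your own account. The `root_account_findings` helper is hypothetical.

```python
import csv
import io

# Illustrative sample only. The column names are assumed to match the AWS IAM
# credential report, trimmed to the fields used here; the rows are made up.
SAMPLE_REPORT = """user,access_key_1_active,access_key_2_active
<root_account>,true,false
deploy-bot,true,false
jane,false,false
"""


def root_account_findings(report_csv: str) -> list[str]:
    """Return a warning for each access key that is still active on the root user."""
    findings = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        if row["user"] != "<root_account>":
            continue  # only the root user is checked in this sketch
        for key in ("access_key_1_active", "access_key_2_active"):
            if row[key] == "true":
                findings.append(f"root has an active access key ({key}): delete it")
    return findings


for finding in root_account_findings(SAMPLE_REPORT):
    print(finding)
```

An empty result is the goal: the root user should have no access keys at all.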
As a final note in this vein, there are tools that help protect passwords and connection strings if you take the time to use them. Azure Key Vault and AWS Secrets Manager are solid offerings that store sensitive information securely while letting you audit its use and misuse. Finally, Azure's role-based access control (RBAC) and AWS Identity and Access Management (IAM) roles let you forgo the use of passwords entirely.
4. Ignoring Your Advisors
All this discussion of the mistakes that plague engineers can make the job seem impossible. While there are pitfalls, there are also tools that make the job far easier to handle. Both AWS and Azure include account-wide services that can advise you on everything from best practices to possible security flaws. Regrettably, most people are unaware of their existence and miss the opportunity to fix the issues that would have been brought to their attention.
How to avoid the mistake of managing the cloud on your own
AWS has Trusted Advisor. This built-in tool surveys five areas of concern: Cost Optimization, Performance, Security, Fault Tolerance, and Service Limits. Some of its advice is free, and some requires a higher level of support to unlock. The service's sole purpose is to help lead you to a secure and highly available operation while keeping costs under control. It is a novice mistake to ignore its advice.
The Azure Advisor fulfills the same role for Microsoft. Personalized and actionable recommendations are provided to optimize the reliability, security and performance in your Azure account. You would do well to heed its advice as well.
Protect your organization and become an indispensable cloud engineer
These four mistakes can take a toll on any organization. Individually, they can cause harm, but when combined, these errors can leave an organization vulnerable to attack, over budget, and incapable of recovering from outages. Cloud engineers are valuable employees, but without the know-how to do things right, you will be replaced.
Jeff Peters is a systems engineer, cloud architect, and technical trainer with over 20 years of IT experience and a current focus on Amazon Web Services and Microsoft Azure. He holds several MCSEs from Microsoft along with Professional and Specialty certifications from Amazon. Jeff resides in Metuchen, NJ, with his wife and two sons, who all roll their eyes when he gets overly excited about technology.