“With great power…”
You know the rest of the quote (whether you’re a French Revolution fan or a Spider-Man fan).
Weighing in at over 100 services, AWS is a powerful toolkit that can cure many of the ills (scale, agility, cost reductions) your company may be experiencing. But like any tool, responsible usage is key. It’s easy for cost to become an issue when services are used indiscriminately and without oversight.
Scale and cost are often at odds with one another. If you need to spin up 1,000 servers to deal with a short-term load on your systems, it’s going to cost you. If you don’t have processes in place to spin down those servers during low-usage periods, that on-demand access to the “great power” of thousands of servers can result in a “great bill” at the end of the month.
AWS is a new paradigm
Before you start investigating the technical nuts and bolts of AWS, it’s critical to get a big-picture handle on how, why, and where AWS differs from on-prem. To help new customers save money, AWS has created, and aggressively updates, a document called the Well-Architected Framework, a compendium of best practices compiled from thousands of customer interactions. Although this blog focuses on cost savings (condensing AWS’ advice while adding tips and tricks of our own), we highly recommend you at least skim this framework because it’s filled with nuggets of gold.
Three pillars of cost savings
Although there are countless tactical, low-level actions one can (and must) take to save costs, in the experience of our 20-plus rock star AWS consultant trainers, each falls under one of three high-level considerations.
1. Turning stuff off!
2. Using the right tools in the right way.
3. Building, maintaining, and executing realistic plans.
Turning stuff off!
This is likely the No. 1 reason you’re encountering financial bleed in AWS. Spinning up large pools of infrastructure takes as little as five minutes of clicking about with a mouse, and it’s super easy for your developers to forget that they spun up a 500-node Hadoop cluster in those five minutes.
You may think that turning stuff off is a no-brainer in the cloud, but as an AWS consultant with nine years of client projects under my belt, I can assure you that you’re nearly guaranteed to have a fair amount of cruft lying about in your AWS architectures.
So how do you find and remove the existing gunk and prevent further cruft from building up? There are several tools AWS provides to manage this. Here are the most financially impactful ones:
• Billing roll-ups and alerts based on tags. AWS gives you the ability to break down your bill based on user-defined tags, and here’s the super cool part: this even works across accounts. You can (and should) create a “project_id=1234” tag and attach it to every one of the resources used by that project. Third-party tools like Graffiti Monkey or Netflix’s Janitor Monkey let you enforce and automate tagging of certain resources, or automatically delete resources that are not properly tagged. Then, via the IAM service, you can assign a billing role to one or more of your employees and let them either view the bills or receive alerts when certain bills have gone over a user-defined threshold.
• S3 Lifecycle policies. Think of S3 as a file-dumping ground that keeps expanding to let you add any number of files. Since S3 never stops expanding, you need “Lifecycle Policies,” which automatically move objects to cheaper storage or delete them once they reach a set age, or you’ll end up with an abnormally high amount of cruft after even a few months.
• Big data stacks. If you only need that Hadoop (EMR) or Redshift cluster for a few hours or months, why would you keep it running any other time? Because of the high-cost nodes needed for modern workloads, this is one area where you can rack up a five-figure bill overnight if you’re not careful.
• Auto-scaling. A handful of services like EC2 and DynamoDB allow you to set up auto-scaling, the ability to define when your capacity should grow and shrink based on a metric you define. Leveraging this lets your architectures “breathe” in and out without any user intervention.
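To make the S3 Lifecycle bullet above concrete, here is a minimal sketch of a lifecycle rule that archives objects to Glacier and later deletes them. The prefix, the day counts, and the helper name build_lifecycle_rule are assumptions for illustration only; the dictionary follows the shape that boto3’s put_bucket_lifecycle_configuration call accepts, but the sketch only builds and prints the configuration and does not touch AWS.

```python
import json


def build_lifecycle_rule(prefix, archive_after_days, delete_after_days):
    """Build one S3 lifecycle rule (hypothetical helper, not an AWS API).

    Objects under `prefix` move to Glacier after `archive_after_days`
    and are deleted after `delete_after_days`.
    """
    if delete_after_days <= archive_after_days:
        raise ValueError("objects must be archived before they are deleted")
    return {
        "ID": f"tidy-{prefix.strip('/') or 'bucket'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": delete_after_days},
    }


# Assumed policy: archive logs after 30 days, delete them after a year.
config = {"Rules": [build_lifecycle_rule("logs/", 30, 365)]}
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, this dictionary could be
# applied with something like:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket", LifecycleConfiguration=config)
# ("my-example-bucket" is a placeholder name.)
```

Keeping the rule in code rather than clicking it together in the console makes it easy to review and to apply the same policy to every bucket.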
Using the right tools in the right way
Ask yourself two questions at every phase (architecture, development, maintenance, etc.) of an AWS project: “Did I research all the alternatives?” and “Am I using the selected service properly?”
AWS iterates so quickly that if you’re not constantly revisiting your designs, you are likely using either the wrong service or the right service the wrong way. Let’s examine some specifics.
• Managed Services vs EC2-based app. Years before AWS even existed, folks had been using fantastic open source load balancer software like Apache, NGINX and HAProxy. In AWS, you can still use those tools by installing them on EC2. If your system is under any type of moderate load, though, you may need 10, 20 or 100-plus EC2 instances to provide this load balancer function. If you need 20 instances a month, that would be an average cost of just under $1,500. And that’s AWS costs only; it doesn’t take into account the time and effort to scale, maintain and monitor that stack. By contrast, the ELB service is around $20 per month, and it automatically scales, automatically patches itself, and comes with point-and-click monitors and alerts. Although this is one of the more extreme examples, the general rule is that if the managed service provides what you need, don’t try to build it in EC2.
• Lambda vs EC2 designs. AWS Lambda is one of the most heavily used AWS services, and for good reason: it’s much easier to deploy, easier to maintain and scale, and costs a fraction of what EC2 does. If you have custom code you need to run in AWS, first explore whether or not you can run it via Lambda. Just about the only thing you have to use EC2 for these days is installing a completely standalone application like WordPress, Drupal, or MongoDB.
• Reservations. All of the VM-based services (EC2, Redshift, EMR, RDS, ElastiCache) let the customer pay via “reservations.” A reservation is a bit like a stock option: it gives you the option, but not the obligation, to purchase something in the future at a lower cost. For example, if you know for sure that your company’s app needs at least 20 instances of type “x” in region “z” over the next three years, you can buy reservations for those instances and pay up to 60% less. Most reservations come in one- or three-year terms, and in our experience, the one-year reservation pays for itself after about six months of constant usage, while the three-year reservations are in the green after about a year of usage.
• Bandwidth considerations. Nearly every aspect of your AWS infrastructure is going to cost significantly less than what you’d pay on-premises. Bandwidth is the one glaring exception to the rule. There are two important things to realize about bandwidth in AWS:
o Costs vary by service. DynamoDB and Redshift have some of the most expensive bandwidth costs while CloudFront and ELB have some of the cheapest.
o Bandwidth is only charged out of the region. With the exception of EBS to EC2 (which needs dedicated NICs), nearly all intra-region bandwidth is free.
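The break-even claim in the Reservations bullet above can be checked with simple arithmetic. The sketch below is illustrative only: the hourly rates and upfront fee are made-up numbers, not AWS prices, and break_even_hours is a hypothetical helper. It assumes a reservation with an upfront payment plus a discounted hourly rate, and an instance that runs around the clock.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def break_even_hours(on_demand_hourly, upfront, reserved_hourly):
    """Hours of usage after which the reservation beats on-demand pricing."""
    savings_per_hour = on_demand_hourly - reserved_hourly
    if savings_per_hour <= 0:
        raise ValueError("reservation never pays off at these rates")
    return upfront / savings_per_hour


# Hypothetical one-year reservation: $300 upfront, $0.03/hour afterwards,
# compared with $0.10/hour on demand.
hours = break_even_hours(on_demand_hourly=0.10, upfront=300.0, reserved_hourly=0.03)
months = hours / HOURS_PER_MONTH
print(f"Break-even after roughly {months:.1f} months of constant usage")
```

With these assumed numbers the reservation pays for itself a little before the six-month mark, in line with the rule of thumb above. If the instance only runs part of the time, the break-even point moves out proportionally, which is why reservations suit baseline load rather than short-lived spikes.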
Building, maintaining, and executing realistic plans
You’ve probably been burned by plans before—we all have. Whether it was spending too much time in planning, limiting agility by adhering too strictly to a plan, or running a plan that didn’t work, we’ve come to think of plans as an archaic hold-out from the dark ages of technology.
But that’s not realistic, especially in AWS.
Since AWS has a “self-service” focus, your company must implement standards, guidance and yes, even some plans. The larger the organization, the more critical these plans are to ensuring that you not only have centralization, but that you have effective cost visibility and control. Here are the most important aspects of plans you’ll need to consider:
• Company-wide tagging topology. Tags are simple “key=>value” metadata that you can associate with nearly any resource. Common tags are things like “env=[dev,test,prod],” “cost-center=<client-number>,” “email@example.com,” “compliance=[hipaa,pci,etc…].” Tags provide (at least) four important benefits:
o Access Control. They integrate with the IAM service to allow us to set policies like “deny delete for a resource tagged as ENV=PROD for the DEVELOPERS group.”
o Automation. With everything in AWS being an API call, we can write scripts to list out any untagged resource and then apply required tags.
o Filtering. When your company has 5,000 resources in its account, it’s very convenient to filter that list down to the ones tagged “firstname.lastname@example.org”.
o Billing. As mentioned above, AWS allows us to roll up costs based on tags. If we tag everything, then we can see how much “env=dev” costs us, or if “email@example.com” assets have a higher cost than “firstname.lastname@example.org.”
• Deployment and maintenance plans. One of the biggest double-edged swords in AWS is the ability to do one single thing many different ways. When it comes to deployments and maintenance, however, that sword can dice you to pieces if you’re not careful. What you should aim for in AWS is “one button” deployments where everyone in the organization uses a simple tool to deploy updates to their applications. When you start running at speed, this will probably result in a CI/CD pipeline with a tool like Jenkins in the driver’s seat. When it comes to maintenance, you’ll need to decide what you’ll use to upgrade and patch your remaining EC2 fleet – hopefully a configuration management tool like Chef, Puppet, or Ansible.
• Backup and DR. A stellar manager I had some time ago left me with a warning that I remember to this day: “If you’ve never tested your backup plan, then you don’t have a backup.” Taking regular backups in AWS is just as important as it was on-premises. And if you’re not testing the restore of those backups via regular unannounced simulations, you won’t discover that the plan no longer works. With the high velocity of infrastructure changes in AWS, you’ll want to review DR plans more regularly than you did on-prem.
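The tagging “Automation” point above (a script that lists resources missing required tags) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the required tag keys, the resource IDs, and the shape of the inventory list are all hypothetical, and the inventory is hard-coded here rather than fetched from the AWS APIs.

```python
# Assumed company standard: every resource must carry these tag keys.
REQUIRED_TAGS = {"env", "cost-center"}


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tag keys that a resource lacks, sorted by name."""
    return sorted(required - set(resource.get("tags", {})))


def audit(resources, required=REQUIRED_TAGS):
    """Map each non-compliant resource ID to its list of missing tag keys."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource, required)
        if gaps:
            report[resource["id"]] = gaps
    return report


# Hypothetical inventory; in practice it would be built from AWS API calls.
inventory = [
    {"id": "i-0001", "tags": {"env": "prod", "cost-center": "1234"}},
    {"id": "i-0002", "tags": {"env": "dev"}},
    {"id": "vol-0003", "tags": {}},
]

for resource_id, gaps in audit(inventory).items():
    print(f"{resource_id}: missing {', '.join(gaps)}")
```

A report like this can feed a follow-up step that applies default tags or notifies the resource owner, depending on how strict your tagging plan is.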
This is far from the end of your journey
Hopefully you now have some actionable, big-ticket things to think about in terms of AWS cost savings, but it’s important to remember that every service has many settings and configurations that can dramatically impact the costs of using it. Additionally, some services do a more specific, and more cost-effective, task than others. Take S3 and Glacier for example—both store data at the same high durability, but S3 gives you real-time access that is 4-5x more expensive.
This blog is just the start of your journey. You should also review AWS’ “Cost Optimization” framework for additional details. Set up billing alerts, set up tagging topologies, and don’t forget to allow time for your developers and administrators to monitor and adjust every strategy, architecture, and plan you have.
If, like a lot of companies, your team is not yet up to speed on the AWS side of things, take an authorized AWS training course from Global Knowledge. It’s the fastest way to gain the skills you need and put them to use.