How to Optimize and Improve Your AWS Architecture Transcript

Speaker 1:

How to optimize and improve your AWS architecture. So, we have Ryan Dymek. He's an AWS consultant and instructor with Global Knowledge. Not only does he teach it, but he works in it every day, so a very great resource for us. He's been working with Cloud technologies for over nine years. As a consultant, Ryan helps optimize environments for some of the largest Fortune 500 companies in the world. Having earned all nine AWS certifications, Ryan teaches others throughout the world the best practices and strategies to improve their architecture, doing more with less while increasing security, agility, and performance.

Thank you so much for being here with us, Ryan, and feel free to take it away!

Ryan Dymek:

So, let's just go ahead and dive on in here. The topic of today's discussion is: How to optimize and improve your AWS architecture in AWS. Let me just kind of preface what we're going to walk through here, before we dive right on in. Obviously, there's a lot of different technologies you can build in AWS, a lot of different things you could work with. So, this webinar, we're going to go ahead and focus on a particular kind of ... What I like to call "low-hanging fruit" in the industry, and that would be something like a very simple web server design to show you how we might build that out, and the kinds of questions we would pose along the way, as far as how we might improve that architecture over time.

I'm going to kind of walk you through that starting point of where most people are when they first go into AWS. They're looking to build something out; they're probably going to start with infrastructure type services, just a classic web server and database kind of model. We'll go ahead and just start right with that, and then we're going to kind of improve this and optimize it as we go. What I'm going to do here is, in some cases, I will be showing you common paths that many of my customers have taken over the years, and pain points that you might encounter along the way, and ways that we can potentially avoid those pains ourselves. One thing I always like to encourage is, of course, learn from others whenever you can. Others' pain is better than our own, so a lot of this is coming from common, common problems that I see constantly from my customers in AWS.

If we're building a web server in AWS, we're going to start with a VPC. We're going to probably pick some sort of an availability zone there, one of them that you choose. We go ahead and launch a web server and a database. This is pretty traditional. There's not much going on here, not a lot of resiliency, very basic, single server. You may be prone to wanting to make this server bigger, and bigger, and bigger over time. Of course, that's just a single node, there's not a lot of resiliency, not any real scale there. You can certainly scale better than you could in your regular on-premises environments, because you could ... You could change the size of these resources when you need. Not necessarily on the fly. In this situation, we have some downtime potential. We might have to resize the server, and that's going to require at least the amount of time it takes to reboot it.

We're going to have to shut it down, change its size, and start it again, or the database is the same thing; if we need to expand the size of that database server, depending on how we built it, whether we're using Amazon Services or just our own virtual machines. So, pretty basic design. I think a lot of people as they go through, learning how to work with AWS, you see kind of these best practices. We talk about scaling and resiliency and all these things, and so you get into a much more advanced design. To expand on this design, you would add some high availability, you add some scaling ideas. We're going to have to balance that load, and so our final architecture might look a little bit more like this.

I'm just kind of taking you through that path, right? There's some good design. This is something that you'll see a lot of times discussed, and seen in a lot of white papers, and this is very infrastructure focused. It's got a load-balancer, you've got your various components that you need to deal with to make this a valid solution. Now, realistically, a lot of companies have a hard time getting from that point A to point B. Right? So we talk about auto-scaling. This is an ideal thing, but where do you stand on that front? I've actually had customers that knew they needed to scale.

Think about this, the reason we scale is really for two primary reasons: One is to save costs, and the other one, actually, is we can use auto-scaling for resiliency, where auto-scaling can maintain a certain amount of resources, right? But from a cost perspective, typically, you wouldn't do auto-scaling if costs were no issue, right? If we could just go out and buy a thousand instances and run them 24/7, it would be fine. We wouldn't have to worry about scaling. Obviously, that's not ideal. We're going to be throwing a lot of money away doing that, so auto-scaling is one of those things that's preached quite a bit in the Cloud. But what does this actually look like?

The hardest thing to auto-scaling is actually not turning it on. That's actually fairly easy to set up, an auto-scaling group and policies and things like that that make it happen. But what goes into those policies? Right? There's a lot of logic that happens there. What size instances should we be using and scaling across? What ... When do we scale? Do we scale based off of CPU metrics? Do we scale off of latency? Do we scale off of all these things? And that's where a lot of people tend to lean towards. Now, I've had companies know that they needed to scale, six months later, they're still not scaling properly here. They're just hard-fixed on instance sizes, and nothing's going away, neither shrinking in or expanding out, because they're having a hard time finding the right logic there.

So, right off the bat, there is an ability with auto-scaling to do time-based. I would like to encourage you to take a look at that, if this is kind of the path you're on. Time-based, at the very least, you can have a "scheduled action" is what we call it. Maybe you know that you have a higher load during the day than at night, so maybe you scale out and add many nodes during the day, and then at night, you go ahead and trim those back. At the very least, you're going to save a substantial amount of money doing that, so think about that from the cost perspective.
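As a rough illustration of the scheduled-action idea, here is a minimal sketch in Python. It assumes the boto3 Auto Scaling client and its `put_scheduled_update_group_action` call; the group name, action names, times, and node counts are all hypothetical and would need to match your own load pattern.

```python
# Hypothetical scheduled actions: scale out on weekday mornings, trim back at night.
# The keys mirror the parameters of boto3's put_scheduled_update_group_action.
# The group name, schedule, and sizes are assumptions made up for this example.
SCHEDULED_ACTIONS = [
    {
        "AutoScalingGroupName": "web-asg",           # assumed group name
        "ScheduledActionName": "scale-out-workday",
        "Recurrence": "0 8 * * 1-5",                 # cron format: 08:00, Monday to Friday
        "MinSize": 4,
        "MaxSize": 10,
        "DesiredCapacity": 6,
    },
    {
        "AutoScalingGroupName": "web-asg",
        "ScheduledActionName": "scale-in-night",
        "Recurrence": "0 18 * * 1-5",                # cron format: 18:00, Monday to Friday
        "MinSize": 1,
        "MaxSize": 10,
        "DesiredCapacity": 2,
    },
]


def apply_actions(client, actions):
    """Register each action with an Auto Scaling client,
    for example one created with boto3.client("autoscaling")."""
    for action in actions:
        client.put_scheduled_update_group_action(**action)
```

Keeping the schedule as plain data makes it easy to review and adjust before anything is sent to AWS.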

As an example, there's 40 hours in a work week. A lot of us are working more than that, but let's just go with easy numbers, 40 hours. There's 168 hours in an actual week. If you do the math there, that's roughly 75% of your time that could be saved, to some capacity. If it can scale to zero, great. If it scales to just be less, you know that 75% of the time, you could have less resources running, right? And that's going to have a net result. So I usually encourage people to take a look at that, but all this auto-scaling, all these different features, there's a lot more to this design than what we're seeing here, and what I have on my slide here.
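The arithmetic behind that figure can be checked with a short Python sketch. The node counts in the second half (10 nodes at peak, 2 off-peak) are assumptions chosen only to show the effect on instance-hours.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def off_hours_fraction(busy_hours, total_hours=HOURS_PER_WEEK):
    """Fraction of the week that falls outside the busy period."""
    return 1 - busy_hours / total_hours


def weekly_instance_hours(busy_hours, peak_nodes, off_peak_nodes,
                          total_hours=HOURS_PER_WEEK):
    """Instance-hours consumed when running peak_nodes during busy hours
    and off_peak_nodes for the rest of the week."""
    return busy_hours * peak_nodes + (total_hours - busy_hours) * off_peak_nodes


print(f"{off_hours_fraction(40):.1%}")  # 76.2%, the "roughly 75%" from the talk

always_on = weekly_instance_hours(40, peak_nodes=10, off_peak_nodes=10)  # 1680
scheduled = weekly_instance_hours(40, peak_nodes=10, off_peak_nodes=2)   # 656
print(f"{1 - scheduled / always_on:.1%}")  # 61.0% fewer instance-hours
```

The saving depends on how far the off-peak fleet can shrink: scaling to zero would save the full 76%, while keeping 2 of 10 nodes saves about 61%.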

There're components like storage, right? Whoops, my apologies. I hit the wrong direction. Let's try that again. There're things that we need to consider as this evolves, and there's a lot more to it than just this basic architecture that we see on the screen. Things like storage, right? So the web instances themselves, assuming that this is a web server that we're talking about, or web application. What are we using there? Are we using attached storage? Are we using EBS volumes? Are we using some sort of attached storage like EFS, Elastic File System? There's a lot of options we can come up with there. If we're doing something really rudimentary and using just regular EBS volumes, we're going to have to worry about having to replicate that information across our instances; as we upload information to a website, we're going to need to have that on all the nodes.

Now, let me talk about that one real quick. Not only is that a pain, right? That's a real pain and something you would have to try to synchronize. There's a much greater financial cost to you when you do that. So, let's just assume ... You know, in a perfect world, you're scaling and you've got many small instances instead of a couple of really big ones, which would meet good resiliency definitions. But now you've got more nodes. More nodes means more replication. Let's assume I have 10 nodes as part of my web server cluster, and for every one gig of data I upload, I'm not really uploading one gig of data. I'm actually having to deal with 10 times that, one for each node.

When I say "data" here, I'm not talking about the data that would go in your databases, but your images and your headers and footers, and static content that people may have to download off of your site, whatever that looks like. So, really, if it's 10 times the actual storage because of that replication that has to take place, your financial cost is 10 times as well. Literally 10 times. That can get very expensive in the Cloud. Good strategies here are not just about resiliency, there's a lot more to it. I want to be resilient. We want to also think about the ramifications of those resilient discussions. A lot of times, we'll talk about resiliency, and we'll say: "You know what? Instead of just running one node, we should run two. Or maybe instead of two, we run 10 that are smaller, but equal capacity," because 10 smaller nodes are going to give you more resiliency than just two bigger ones.
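A small Python sketch of the replication arithmetic above. The per-gigabyte price is a made-up placeholder, not a real AWS rate; only the ratio between the two results matters.

```python
def replicated_storage_gb(content_gb, nodes):
    """Total storage consumed when every node keeps its own full copy."""
    return content_gb * nodes


def monthly_cost(stored_gb, price_per_gb_month):
    """Simple linear storage cost model."""
    return stored_gb * price_per_gb_month


PRICE_PER_GB_MONTH = 0.10  # hypothetical price, for illustration only

per_node_copies = monthly_cost(replicated_storage_gb(1, nodes=10), PRICE_PER_GB_MONTH)
single_copy = monthly_cost(1, PRICE_PER_GB_MONTH)

print(replicated_storage_gb(1, nodes=10))   # 10 GB stored for 1 GB uploaded
print(round(per_node_copies / single_copy)) # 10 times the cost
```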

But then there's a trade-off there that you have to understand that just exists, right? So at that point, I'm duplicating my storage, and there's a bunch of other factors. But I can also abstract that storage out, and I can say: "You know what? I want to go ahead and do shared storage instead," and that's where things start changing. If I did something like EFS, Elastic File System, it certainly costs more than an EBS volume, but it's already highly replicated, and it's not going to cost you more as your environment grows. At the end, it can actually be cheaper than running something like EBS volumes. But EFS, and I don't have an icon for that up here, this is just a possible option there.

We'll talk more about that shared storage in a couple of moments here. Database storage is also something we're going to have to consider, right? Storage in general is always a problem. As this stuff grows, my databases grow, things become more problematic over time. My databases, in most cases, especially with the relational databases, are only going to be able to grow bigger. I'm not going to be able to go horizontal. I can't add more nodes to a database, usually, if we're talking relational databases. If we're talking NoSQL, then everything changes.

Network inefficiencies. So, there're some things here to be aware of. If I've got instances spread across availability zones to go ahead and make stuff more resilient, make stuff more able to handle an entire availability zone failure, which is typically touted as a good practice, or a best practice; there're some caveats that do come with that, that we want to be aware of. There are typically cross-AZ traffic costs, data transfer out. If this right instance has to talk that direction, and then you get the data over here, or you get the request and then the response goes the other direction, we're going to actually pay for outbound data transfer on both sides of that equation. So in that case, that could be a cost for me. It depends on what it is we're talking about here. If we're dealing with NoSQL databases in that scenario, that can actually get very costly for us. We need to be aware of that kind of data movement and what's going on.
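To see how paying for outbound transfer on both sides adds up, here is a minimal Python sketch. The per-gigabyte rate and the traffic volumes are hypothetical; check the pricing page of the specific product for real numbers.

```python
def cross_az_transfer_cost(request_gb, response_gb, price_per_gb):
    """Cost of a conversation between two availability zones.

    The request is billed as outbound transfer from the caller's side and
    the response as outbound transfer from the responder's side, so both
    directions count toward the bill.
    """
    return (request_gb + response_gb) * price_per_gb


RATE = 0.01  # hypothetical price per GB, for illustration only

# Assumed monthly traffic between a web tier in one AZ and a database in another.
monthly = cross_az_transfer_cost(request_gb=500, response_gb=5000, price_per_gb=RATE)
print(f"${monthly:.2f}")  # $55.00
```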

Keeping in mind, AWS doesn't charge us for networks. Networks are completely free. You don't pay for any routers and switches and any of that stuff, so this outbound data transfer does make sense, but it's commonly misunderstood that you don't pay for anything across the availability zones. It depends on the product we're talking about, and you want to go look at the product pages for each product to understand what those billing metrics are. EC2 instance to EC2 instance, or EC2 instance to the database, in most cases, you're going to pay a cross-AZ fee. However, there are some other services that we might consider, and we'll talk about these in a moment, something like S3. S3 does not incur that fee. We'll address that when we get to that point in the discussion.

Also, operational concerns. How are we going to get our code onto these servers? We're going to have to address that. Usually, configuration management, we're going to have to deal with something like Chef or Puppet or Ansible or Salt. Any of those types of products that might exist, or there's other ways to approach that as well, but that's all additional infrastructure that we're going to have to come up with now. That's all ... Other costs we're going to have to evaluate, right? Also, we have our code and our language efficiencies here. This is probably one of the least evaluated things in the Cloud for less mature companies.

I say "mature", I say mature in the Cloud, right? How efficient is the actual code that you have chosen, the language you've chosen? This is usually not something we have to evaluate when we're working on-premises, because you've already got the equipment purchased, your investment is already there. If one language is more bloated than another, or more inefficient than another, you're not really going to notice it much as long as the output to the customer doesn't change. If it requires more storage, or if it requires more IO, or if it requires more various components; the customer may not notice the difference if it's a couple hundred milliseconds, or a single digit millisecond difference, so you're not really going to care.

In the Cloud, this can magnify itself. An inefficient code language chosen may actually require me to run more resources at the end of the day. More resources means more cost, more scaling that I'm going to have to do, less users per instance than I could potentially handle. That kind of thing, right? So something very important to be aware of. Actually, the language you choose ... Let's say this is a traditional web server running PHP. You can compare that to something like Java or Python or other possibilities as well, and really evaluate that and what it is you're doing with this web server.

Then security. This is a really big one. By the way, just to kind of set the stage here: We have kind of a short time together. This is a very big topic, but right now, I'm just focusing mostly on this kind of more traditional thing that most people see. A lot of people are starting in the Cloud, working with these types of infrastructures. We are going to grow this as we go through the discussion. So, security. How is that affected here? We've got things like firewall rules with security groups, and network access control lists, and all those possibilities. You've got instances with operating systems on them, so you're going to have host operating systems there, which means you're going to have host firewalls potentially. But then you've got the security of, again, the code coming into play. If it's a good old classic PHP web app running Apache, you have to deal with your .htaccess files and your permissions on the OS, make sure you don't have any open permissions there; make sure there're no vulnerabilities, making sure you have all of your services stopped that shouldn't be used.

All of that are things that we're going to have to evaluate, and think of this at scale. So when it was just one instance, that's maybe not a big deal to manage all of that. We're talking about potentially a much more highly scaled system. Those are all things we're going to have to consider, and don't forget, scaling this stuff. As this evolves and grows, scaling security is not always an easy task. Granted, AWS certainly makes that easier than on-prem. Things like security groups being able to call other security groups is a huge factor there. Only having the ports that we need open and nothing more, since, by default, AWS closes all ports, right? Security groups only have allow rules, not denies, so we're really only focused on the allows; exactly what we want and nothing more.

That's all good stuff, but if you allow all traffic on port 80 to a web server, there's a lot that can happen on port 80. Right? There's a whole lot of stuff that can be breached and vulnerable there, and again, breaking it down to the code in the web server itself. A lot that we're going to have to deal with there from a vulnerability perspective. The more instances I have, the more possible invasions I can have, the more potential systems that we could have infected, so we're going to have to make sure we're watching for that very closely, and costs to all of this.

As this grows, my costs are pretty much going to grow exponentially, and if ... Go back full circle, I have inefficient storage, inefficient network communications happening here. I've got inefficient operational concerns, and a poor performing language or inefficiently written code. All of that is actually going to have a cost to it. There's actually going to be a financial cost that will come out of all of those decisions. These will grow and amplify themselves, and a small cost inefficiency when you're small, not usually a big deal, even if you know it is an inefficient thing, you're not thinking much of it on day one. You really do care when things have grown by a hundred times, right? And now suddenly, the discrepancies are enormous, and you're wondering why the Cloud is so expensive.

This is not a bad design, right? But we can improve on this and actually save on costs substantially, save on security, improve this overall function. Something like introduction of S3. S3 is a pretty basic service. It's one of the older services that Amazon introduced, and it's still extremely prevalent today, and that is kind of a centerpiece for a lot of things that we do in AWS. You may not be aware of it; you can actually host entire websites on S3, as long as you're not dealing with server-side code. It'll host your HTML and your JavaScript, and things like that. Maybe just the content. For instance, with the website I have built here, I can put all my headers and my footers, and my JPEGs and my CSS, my HTML, my JavaScript, all that stuff could actually be living on S3.

If I do that, S3 can serve it up. If it serves it up to my customers, my website is still serving up my code, but at that point, my instances are just dealing with my code, whether it be a PHP website, or whatever else it is you might be running here. But as much of that content we can get over to S3, we're actually going to run substantially smaller instances. Fewer of them, less of them. Also, you have a side effect of the storage, right? The storage was having to be replicated and duplicated continuously. S3 is the most robust storage that Amazon offers, but also the cheapest, so it's a great place to put that data, and now, you just put one copy of your object there. You put one copy of your file. Let's say I've got my PDF that people are going to be downloading off my website. I put that on S3, I only have to store it once, because S3 is already replicated, highly replicated across the region, and in fact, it offers us eleven nines of durability there.

I put that object there, I put that file there in S3, and now that's one less piece of information running on my instances, right? And I don't have to replicate that across my instances. In fact, my instances don't even have it at all. So, that's going to improve that design substantially. It's a step in the right direction, anyhow. Now my instances are just running my server-side code. All of my media, all of my static content, my CSS files, my HTML files, anything I can even convert to HTML. If it's PHP, I can't put it on S3, because that's server-side code, but what I could do instead is do some conversions, maybe generating static PHP sites into HTML or JavaScript, things like that; that all can sit on S3, or maybe even consider my overall language of what I'm using.

Maybe PHP isn't the long term choice for us here. I'm just using that as an example. There's obviously a lot of languages you can use, Java, and so on. Also maybe consider the ease of something like DynamoDB. Now, this is where things do diverge a little bit. If you're using a commercial application, you're not always going to have the ability to just switch over to DynamoDB. DynamoDB is a NoSQL database. If you're using even something less commercial, but it's just not designed for NoSQL, it's maybe not going to work out of the box.

If this is a website that you've created yourself entirely, and you have control over those calls to the database, you can consider the use of DynamoDB. DynamoDB will typically be cheaper and faster and more resilient than running database instances though. So, there's a bigger question to be had here, right? Is DynamoDB ... is a managed service ... All you really care about is the data. You don't care about any of the nodes, and in fact, you don't even have visibility into the nodes, it just automatically scales for you. There's more to it, but at the end of the day, you're just writing API calls to it. It's not even ODBC or JDBC type connections. We're talking about API calls now. In DynamoDB, you just pay for the amount of read and write capacity that you want to be able to handle, so in other words, how many reads or writes per second you want it to deal with, and it deals with it, and then you pay for the storage separately.

So you pay for the storage, you pay for the reads and writes, and that's what you got. That's typically a better pricing model than running big databases that have to be able to handle those fluctuations and they're scaled usually at a much larger size. Your costs at that point will typically be much cheaper in DynamoDB. With that said, not every application is going to be able to move over here, so what do I do? Well, it's not always an either/or, one-or-the-other decision. In many cases, you can start peeling pieces off, especially if this is a custom application or a custom web interface or a web app that you've created. You can start moving components. That low-hanging fruit, you can go ahead and take things like temporary data and caching information, and session information; a lot of that stuff could be moved over to DynamoDB quite easily.

Then you have to start looking at the core components, but if you can get stuff over there to DynamoDB, it's going to be accessible from anywhere in the world. Obviously, not permitted. There's a whole level of security there. You have to be authenticated and authorized, but you shifted away from the need to make ODBC or JDBC connections, IP addresses no longer matter, so operationally, this might be a little easier. And then also, we can consider using a caching service inside of our network, so something like ElastiCache runs either Redis or Memcached. So, something ... A really easy example discussion, since we're talking websites here, what if I'm running something like WordPress?

WordPress is the most popular content management system in existence right now. It comes with its own caveats and gotchas, but one thing you can do is run Memcached with it. There's plenty of plug-ins that work quite well with that natively. There's caching engines and caching plug-ins that will just plug right into it. So, if it works with Memcached, it'll work with ElastiCache. ElastiCache is now going to shift database calls to a caching engine when possible, so if it's in the cache, it'll get the results from the cache out of memory. That's going to speed things up, it's going to decrease the costs of my database instances.

My instances at that point, are going to be able to be smaller; less calls, fewer calls to the databases. Your resources, I can go smaller, right? A caching option might be an example there. I want to illustrate, I'm just throwing some ideas out here. As you're looking at these architectures, you're looking at how you can improve them. The point is, try to minimize the amount of reads and writes, the amount of compute requirements. Try to centralize that stuff as you can, as an example: Centralizing your storage ... You know, instead of dealing with individual scaling of nodes on my databases, maybe instead of having to scale the database, I add a cache and actually allow my databases to go smaller and not bigger.
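The effect of putting a cache in front of the database can be sketched in a few lines of Python. This is only an illustration of the read path: an in-process dictionary stands in for Memcached or Redis, and `fake_db_lookup` is a hypothetical placeholder for a real query.

```python
class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # callable: key -> value
        self._cache = {}                   # stand-in for Memcached/Redis
        self.db_reads = 0                  # how many reads reached the database

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # cache hit: no database call
        self.db_reads += 1                 # cache miss: one database read
        value = self._load_from_db(key)
        self._cache[key] = value
        return value


def fake_db_lookup(key):
    """Hypothetical database query used only for this demonstration."""
    return f"row-for-{key}"


cache = ReadThroughCache(fake_db_lookup)
for i in range(1000):
    cache.get(f"post-{i % 10}")  # 1,000 page views spread over 10 popular posts

print(cache.db_reads)  # 10
```

In this read-heavy case the database sees 10 queries instead of 1,000, which is what lets the database instance shrink rather than grow.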

Now, it all depends on if it's heavy read or heavy write. If it's a heavy write database, the cache is probably not going to help you. If it's a heavy read database, cache will. These are all things you have to kind of evaluate as we go, but these are some possible components that we could introduce to kind of help expand our capacities here. But is this a sure win? The bigger question here is: Is just plugging in S3 going to automatically make things cheaper and faster? Maybe. Is it going to make my instances not be as big? Probably. What are the results though? There's some interesting considerations we have to address here, and let's talk cost for a second.

S3 is the cheapest storage in all of AWS. However, you also pay for puts and gets to S3 separately. So, something you have to be aware of is the type of application it is you're running. It's something that you have to be really cognizant of, if it's a really high write rate to S3. It could actually be inefficient, and that's where we have to consider potentially a different product or a coupling of other products. If I was sending a whole bunch of logs to S3, log events, not log files, but logs, log events as they come through; that would be a whole bunch of little tiny micro writes. S3 is probably going to be very, very expensive for us in that case, but then that's when we consider other services like streaming services on Amazon for that purpose.

The point I'm trying to make there is: You hear this kind of advice all the time, "Oh, just move everything over to S3," and I would say, 95% of the time, that's probably going to be a good choice. But don't just blanketly make those choices. Make a conscious choice, and evaluate it. Evaluate your reads and your writes. The costs in S3, just because you are charged for puts and gets doesn't make it expensive. The question is how many puts and gets you're actually performing to S3, right? You have to do the math, right? Writes are 10 times ... approximately 10 times the cost of gets, of reads. Puts in S3 will typically run you about 10 times the price of gets, or lists. High write, we want to probably use this a little differently. It doesn't mean I won't put my data in S3. It means that I may have to have an intermediate solution there to kind of stage it and then batch it, or something like that, right? Streaming services, things like that.
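Doing that math might look like the following Python sketch. The request prices are illustrative placeholders chosen to keep the roughly 10-to-1 put-to-get ratio from the talk; they are not quoted AWS rates, and the event counts are assumptions.

```python
import math

# Hypothetical request prices per 1,000 requests, keeping puts at about 10x gets.
PUT_PRICE_PER_1000 = 0.005
GET_PRICE_PER_1000 = 0.0005


def put_requests(events, events_per_object):
    """Number of puts needed when events are staged and written in batches."""
    return math.ceil(events / events_per_object)


def request_cost(requests, price_per_1000):
    """Cost of a number of requests at a per-thousand price."""
    return requests / 1000 * price_per_1000


EVENTS = 1_000_000  # assumed number of log events

one_put_per_event = request_cost(put_requests(EVENTS, 1), PUT_PRICE_PER_1000)
batched = request_cost(put_requests(EVENTS, 1000), PUT_PRICE_PER_1000)

print(f"unbatched: ${one_put_per_event:.3f}")  # unbatched: $5.000
print(f"batched:   ${batched:.3f}")            # batched:   $0.005
```

Staging events and writing them in batches of 1,000 cuts the request bill by the same factor of 1,000, while the data still ends up in S3.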

Security with all of this, though. What have we done with our security? If we move much of that stuff over to S3, we have static web pages, we have HTML pages, JavaScript, things like that sitting on S3, and less of it is running on my instance; those things are actually now just read-only objects out of S3's storage. There's actually nothing to compromise. They can't be written to. There's no permission that somebody can modify on a file to make changes to it. So more of that kind of stuff we get, we get off the instance as well, it's less vulnerability points in our infrastructure. If I've got HTML being generated, even if it is PHP under the hood, but I'm generating HTML pages and putting those on S3, and hosting them there instead: Now there's no PHP code for somebody in that situation to compromise. That may be on page by page basis. You may do some pages that way, and other pages not, but you're actually improving your security substantially by getting things off of those instances.

EC2 instances, like your virtual machines, they are not insecure. I don't want to scare you from those instances, but compared to Amazon services that are much more robust, they are the least secure of pretty much all the other AWS services offered, and the reason being is you have to manage that security at the OS layer. You've got all of these different ports that you have to deal with. You've got a whole bunch of services running on those, you've got to worry about patching them on a constant basis, and there's a lot of entry points that could potentially be vulnerable to you. Whereas you take an Amazon service like S3 or DynamoDB, they're fronted by just an API call. That API call has to be, of course, authenticated with keys on every single call on everything that we do. We have to be authenticated and signed, and then you have to have, of course, the proper policies in place to allow that call to that resource.

So, substantially more secure by moving over to API-only kind of interfaces, like those services. It's abstracted out. There's no OS access, there's no way for me to actually get to the underlying S3 service, right? I can make API calls to it, I can make puts and gets, but I can't actually go somehow try to compromise underlying servers, they're not exposed to anybody. Resiliency of all this: Amazon services like S3 and DynamoDB are already fault tolerant across the entire region, so I am ... I am moving that resiliency to a much more resilient offering.

Something that is very important from an architectural perspective, and this is something that a lot of people overlook: If I have two components that are each built to be 5-9's, 5-9's meaning availability, and 5-9's would basically mean that I could afford up to about five minutes of downtime every year. If I have two components that are each 5-9, I don't end up with a 5-9 application. Think about this. Component A could fail, cause an application outage for five minutes, and then component B can fail at a different time for up to five minutes, making a total aggregated downtime in my application 10 minutes out of the year. That is not 5-9's.
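The five-nines reasoning can be checked with a short Python sketch. It assumes the application needs every component to be up and that the components fail independently of each other.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes(availability):
    """Expected downtime per year for a given availability (0.0 to 1.0)."""
    return (1 - availability) * MINUTES_PER_YEAR


def serial_availability(*components):
    """Availability of an application that needs every component to be up,
    assuming the components fail independently."""
    return math.prod(components)


FIVE_NINES = 0.99999

print(f"{downtime_minutes(FIVE_NINES):.2f}")  # 5.26 minutes per year
combined = serial_availability(FIVE_NINES, FIVE_NINES)
print(f"{downtime_minutes(combined):.2f}")    # 10.51 minutes per year
```

Two five-nines components in series give roughly 99.998% for the application as a whole, which matches the ten minutes of downtime described above.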

I have to make sure I'm using very fault tolerant application components in my application to make sure I increase that uptime, so services like S3 and DynamoDB are really going to kind of help us make those extremely resilient products, because the underlying components are much more resilient. An EC2 instance, a virtual machine, is more prone to fail than the S3 service or the DynamoDB service, just as an example. Now I've got components that make up my application that are much more resilient, okay? And I don't have to do anything for it. In fact, they're typically going to be cheaper than what I was designing previously.

Performance of all of this: These services are, of course, highly balanced. They're not scaled just for you, they're scaled for hundreds of thousands, if not millions of transactions going on every single second. What you're going to do to it is typically not going to cause any issues here. DynamoDB would be scaled specifically more towards you, based on your data structure and partitioning and all of that, but take S3 as an example. You're certainly not the only one using S3, and you don't get your own partitioning, if you will. It's going to be much greater than that, so S3 is ... And many of the other Amazon services. These are just two of the common ones I like to call out. Everybody's got different levels of experience in AWS. I want to make sure that we're using the simple stuff to start with here.

This is where you hear the term "serverless", and serverless could help you here. We could talk about using ... And of course, S3, DynamoDB, those were the start to going the serverless route. Now, the thing about serverless applications means, obviously, there're servers under the hood, but we don't see them. They're abstracted from us, we're just making API calls to the service, that's it. We don't care about the service, we don't care about the underlying components. There's near unlimited scale here with the serverless designs. We can build something today, and it can have 50 users today, and we can go to 50 million in the near future, 500 million, whatever you want to build here, and the stuff could grow with you, in theory. Stay with me, in theory. We'll talk about some of the things that can go wrong here.

Usually, the serverless designs are already fault tolerant. They're almost always highly available. Fault tolerant, just to clarify, fault tolerant means I can lose a component and have zero downtime and zero performance degradation. It is a subset of high availability. It's not the same, you can go read up on that if you like, if you're not familiar with the terms exactly. Typically, operational burden is substantially less. We're not dealing with scaling, we're not having to deal with replication across nodes, and when I say scaling, meaning dealing with things like having to scale that storage, scaling the nodes, all of that kind of stuff. It's all just done for me.

Typically, I get a lot better security, right? Because normally, you're going to have network controls like firewalls that are going to control traffic between resources in a more traditional environment. Networks are actually pretty low in their security, think about this. I allow server A to talk to server B over a particular port number, and what's going over that port number? What actions are actually happening? We don't know. We don't necessarily have any visibility into that. That's at a different layer. When you get into the API layer of things, now I can say I want this call, this one call, to be able to go to this one resource, and that's all that can happen. And by the way, maybe we can even have other restrictions around that call to that resource. When I say a resource, like a piece of storage or even a particular file, a particular object in S3. I can go very fine-grained here. But not only maybe just that call to that resource, but maybe something much more deeper than that, right?

I can even do time of day, I could do ... I could do something from certain source IPs still. I can still have those levels of controls, so there's a lot more of what we call "conditions" that I can apply to it that would have to be met for that call to occur. It's all software now, but we can go much more fine-grained than network ever offered us. Traditional network designs are substantially less secure in this sense. As an example: If I have a backup server or something I am backing up, let's say that web server application, I'm backing up some stuff every day, and it has to send some data over to S3; I would allow that web server to just make a write. It wouldn't be able ... It would do a put, it wouldn't be able to do gets, lists, deletes, nothing. All it could do is put, and all it could do is put exactly what I say I want it put, and that's it. That'd be very secure, compared to a server just talking to another server over a port number.
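To make the backup example concrete, here is a minimal sketch of what such a put-only policy could look like, written as a Python dictionary in IAM's JSON policy shape. The bucket name, key prefix, and IP range are hypothetical placeholders, not values from the talk.

```python
import json

# Hypothetical names used only for illustration.
BACKUP_BUCKET = "example-backup-bucket"
BACKUP_PREFIX = "webserver-backups/"
ALLOWED_CIDR = "203.0.113.0/24"  # documentation-only address range


def build_put_only_policy(bucket: str, prefix: str, cidr: str) -> dict:
    """Return an IAM-style policy that allows only s3:PutObject under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the write call; no get, list, or delete actions are granted.
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
                # Extra condition: the call must come from the allowed source range.
                "Condition": {"IpAddress": {"aws:SourceIp": cidr}},
            }
        ],
    }


policy = build_put_only_policy(BACKUP_BUCKET, BACKUP_PREFIX, ALLOWED_CIDR)
print(json.dumps(policy, indent=2))
```

Anything not explicitly allowed is denied by default, so the web server in this sketch could write backups under that one prefix and do nothing else.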

That's what I was getting to there, my apologies, I didn't advance that. Serverless is ideal, but there're some things that can absolutely go wrong here, so we need to understand our limits. What I mean here is: There are service limits on just about every single product that Amazon offers us. The limits are, many times, soft limits, so we can contact Amazon and increase those limits, but if we don't know those and we don't understand them, we can hit limits on our application that we didn't expect, and it could actually cause us some big problems.

Also, some of those limits aren't limits in the terms of API limits or service limits, but maybe limits in the way that we've created our application. Let me explain this one real quick. Actually, you know what? I've got this in one of the other bullets, so hold tight. It's kind of covered right here. There's this product called Lambda, and you may be familiar with it. It allows me to just run a function of code when I make a call to it, it'll just execute it. Sounds like magic, and it kind of is, it's an amazing product. But there are some gotchas that can come with that. Depending on the languages that you choose here, you can have substantially different performance outputs. As an example, I think Java is a great language, but specifically, the JVM can actually take a much longer time to start up than Python or Node. Python tends to be one of the fastest here.

Actually, surprisingly enough, Amazon recently released C# in this, and C# actually beats them all. Go figure. But Java is going to be the slowest to start, so if I'm activating ... I'm making a call to the service kind of ad hoc, or it is happening on occasion, and it makes a call to it; it may be very slow, it may be very inefficient, and that's actually not an inefficient factor of the service. It's actually an inefficient factor of my code and my code choices, so those are things that we need to pay attention to.

We want to also pay attention to the fact that we are, in a lot of cases, paying by the API call. We're going to pay, as an example, to trigger that Lambda function to run, right? So we send a call to it, and it runs. But think about this, I'm paying for Lambda to run in hundred millisecond increments. If it's inefficient code and it's running a lot longer than it needs to, my bills could be substantially higher, in magnitudes of hundreds of times higher than they need to be. So let me explain this one real quick.

As an example: I've seen poorly written Lambda functions where it should be stateless. You should make a call to it, it should run, it should run in a couple of ... Maybe a couple of hundred milliseconds at most, and it could be like a stateless request to a queue. From there, you could make database look-ups or whatever. But instead, maybe they have that Lambda function doing an ODBC connection to a database, and that in and of itself is not a problem, but what if the database is struggling and having problems, and is actually not able to respond to that request? You might actually have a time-out, and you might be waiting two minutes for the database connection to be severed.

So now Lambda is sitting there, running for two minutes, when it should have run for a hundred milliseconds, just as an example. That one time that happened just cost us a whole lot more. Now imagine if you're doing that over and over and over again, and you're waiting for it to time-out. The point there is structure and how I handle my code, I want to make sure I sever that connection early and have some good logic there, and make sure I say, "You know what? If it's happening in a hundred milliseconds, maybe I do a one second timeout there, and if it doesn't respond in one second, kill it and try something else." Or even return an error back to the customer. That's better than a hang, right?
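The sever-it-early idea can be sketched roughly as follows. This is a minimal illustration, not production code: `slow_query` is a hypothetical stand-in for a hanging database call, the helper names are made up for this example, and the 100 ms increment is the figure quoted in the talk, which may not match current pricing.

```python
import concurrent.futures
import math
import time

BILLING_INCREMENT_MS = 100  # increment quoted in the talk; treat as an assumption


def billed_increments(duration_ms: float) -> int:
    """Number of billing increments charged for one invocation of this length."""
    return max(1, math.ceil(duration_ms / BILLING_INCREMENT_MS))


def call_with_timeout(fn, timeout_s: float, fallback):
    """Run fn(), but give up after timeout_s seconds and return fallback instead."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback
    finally:
        # Do not block on the stuck call. A real client should also set its own
        # connect/read timeouts so the underlying work is actually abandoned.
        pool.shutdown(wait=False)


def slow_query():
    """Hypothetical database call that takes far longer than expected."""
    time.sleep(0.3)
    return "rows"


result = call_with_timeout(slow_query, timeout_s=0.05, fallback="error: database unavailable")
print(result)
print(billed_increments(120_000), "increments for a two-minute hang versus",
      billed_increments(100), "for a normal run")
```

Returning an error quickly keeps the billed duration close to the normal case instead of paying for the full two-minute wait.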

Also, you're paying by I/O, right? In the case of S3, puts and gets. A really high write rate could be a problem, so again, inefficient code will have a very high effect on my costs. If there's something where I don't have to do five puts or five gets, and instead, I can just do one and include all the information in the one, I've just decreased my costs on my puts and gets by 80%. Five to one at that point, so really efficient code structure is actually the focus of the Cloud, and the cool thing about this is that developers have always been struggling to get that extra time in their projects to write clean code, and it's always a complaint of developers that the business, a lot of times, pushes them faster, and they just can't get that time.
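The five-to-one arithmetic can be checked with a small sketch. `CountingStore` is a hypothetical in-memory stand-in for an object store that only counts billable put requests; it is not an AWS API.

```python
import json


class CountingStore:
    """In-memory stand-in for an object store that counts billable put requests."""

    def __init__(self):
        self.objects = {}
        self.put_requests = 0

    def put(self, key: str, body: str) -> None:
        self.put_requests += 1
        self.objects[key] = body


records = [{"id": i, "status": "ok"} for i in range(5)]

# Chatty version: one put per record.
chatty = CountingStore()
for record in records:
    chatty.put(f"records/{record['id']}.json", json.dumps(record))

# Batched version: all five records travel in a single put.
batched = CountingStore()
batched.put("records/batch-0.json", json.dumps(records))

saved = (chatty.put_requests - batched.put_requests) / chatty.put_requests
print(f"{chatty.put_requests} puts vs {batched.put_requests} put: {saved:.0%} fewer requests")
```

The same data arrives either way; only the number of billed requests changes.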

Now, clean code results in a bottom line savings, and in some cases, very, very sizable savings in AWS. Storage consumption, also good most of the time. Keep in mind, if you don't compress data, or you don't de-dupe it or things like that, there is a direct fee to doing that. I should be compressing and de-duping whenever I could, or whenever I can. If I compress, there's going to be compute processing taking place there, but typically, compute is substantially cheaper to just do that one time than to store it long-term in an unprocessed or uncompressed and duplicated data state, right? So storing it, you have to think about the efficiencies of how I'm storing my data.
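As a rough illustration of compressing and de-duping before storing, the sketch below keeps one gzip-compressed copy per unique piece of content, keyed by its SHA-256 hash. The sample data and the `store_deduplicated` helper are assumptions made for this example.

```python
import gzip
import hashlib


def store_deduplicated(blobs):
    """Compress each unique blob once, keyed by the SHA-256 of its raw content."""
    stored = {}
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in stored:  # duplicate content is skipped, not stored twice
            stored[digest] = gzip.compress(blob)
    return stored


log_line = b"GET /index.html 200 OK\n"
blobs = [log_line * 1000, log_line * 1000, b"unique payload " * 500]

raw_bytes = sum(len(b) for b in blobs)
stored = store_deduplicated(blobs)
stored_bytes = sum(len(c) for c in stored.values())
print(f"raw: {raw_bytes} bytes, stored: {stored_bytes} bytes")
```

The compute cost is paid once at write time, while the storage saving applies for as long as the data is kept.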

Let me tell you a quick thing here on code practices, coding practices. In a traditional development world, developers would write code and they'd be told to check ... Let's say you're creating a resource, you're creating a file, you're creating something. You would check for the existence of it first, if it doesn't exist, then you create it. That's two calls, not one. So, Amazon would actually encourage you, as an example: Maybe we're creating an S3 bucket, or we're creating a DynamoDB database, or whatever it is. Don't check for its existence, just do it, and do error handling. Let the error tell you, "Hey, that already exists. Try again," and that's a different logic. That's going to cut out though, most of my calls. I'm going to actually have a reduction in my calls by 50%, approximately by doing that.
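A small sketch of the difference in call counts, using a hypothetical `FakeService` that stands in for a cloud API and counts every call. The class, exception, and resource names are invented for illustration.

```python
class AlreadyExists(Exception):
    """Raised by the fake service when a resource name is already taken."""


class FakeService:
    """Hypothetical stand-in for a cloud API that counts every call made to it."""

    def __init__(self):
        self.resources = set()
        self.calls = 0

    def exists(self, name: str) -> bool:
        self.calls += 1
        return name in self.resources

    def create(self, name: str) -> None:
        self.calls += 1
        if name in self.resources:
            raise AlreadyExists(name)
        self.resources.add(name)


def create_with_check(service: FakeService, name: str) -> None:
    """Old habit: check first, then create (two calls for a new resource)."""
    if not service.exists(name):
        service.create(name)


def create_and_handle_error(service: FakeService, name: str) -> None:
    """Just create, and treat 'already exists' as a normal outcome (one call)."""
    try:
        service.create(name)
    except AlreadyExists:
        pass  # the resource is already there, which is the state we wanted


names = [f"bucket-{i}" for i in range(10)]

checked = FakeService()
for name in names:
    create_with_check(checked, name)

direct = FakeService()
for name in names:
    create_and_handle_error(direct, name)

print(checked.calls, "calls with check-then-create;", direct.calls, "calls with create-and-handle")
```

For new resources the check-first version makes twice as many calls, which is where the roughly 50% reduction comes from.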

So, the old practices ... Some of it has to change. We have to unlearn things that we've been doing all along. So, DevOps pipelines here, good pipelines are really important, good DevOps practices will really help streamline and save us money. Just keep in mind that your DevOps pipelines themselves become a production environment. So consider this design high level. We're getting short on time, but you've got this user here. I want to go ahead and show you, maybe we can rewrite our entire application here.

We have our users, and we actually have an entire website built entirely serverless. So now what we could do is maybe consider writing JavaScript and putting it on S3. Now we go to a website called: www.mywebsite.com. The JavaScript is returned to the browser. The browser actually goes and authenticates the user, so we're going to have user authentication built in with this service called "Cognito", it gives the authentication, and now it can issue secured signed API calls to this API gateway service, and run back-end functions that ultimately can make database calls and storage calls and so on. That design, if there're no users happening right now, if there's nothing going on here, I pay nothing, other than whatever storage might be happening. Otherwise, it's sitting there doing nothing. There's not going to be any fees for a Lambda function to exist, only to run.
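One of those back-end functions might look roughly like the sketch below. It assumes an API Gateway REST API using proxy integration with a Cognito user pool authorizer, in which case the verified token claims are expected under `requestContext.authorizer.claims`; the sample event and user ID are hypothetical.

```python
import json


def handler(event, context):
    """Sketch of a back-end function invoked through API Gateway (proxy integration)."""
    # Assumption: a Cognito user pool authorizer has already verified the caller
    # and placed the token claims under requestContext.authorizer.claims.
    claims = (
        event.get("requestContext", {})
        .get("authorizer", {})
        .get("claims", {})
    )
    user_id = claims.get("sub")
    if user_id is None:
        return {"statusCode": 401, "body": json.dumps({"error": "not authenticated"})}

    # A real function would make its database or storage calls here.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "hello", "user": user_id}),
    }


# Local invocation with a hand-built event; no AWS account is needed to run this.
sample_event = {"requestContext": {"authorizer": {"claims": {"sub": "user-123"}}}}
print(handler(sample_event, None))
```

The function holds no state between calls, so nothing runs, and nothing is billed for compute, until a request arrives.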

Just some basic considerations here. Again, all those things we've talked about all along: Costs, security, performance, resiliency, and maybe what the effects of my operations burden; these are all things that are coming out of what we call the well-architected framework. Go check that out and learn more about that if you're not familiar with it, but these are how you're going to evaluate the optimization of your environment, how to optimize it for costs, how to optimize it for security. There's a lot of great questions to be asked. Your good designs are going to come from asking really good questions.

Here's some additional resources. The YouTube channel is what I'd recommend. Amazon makes all new announcements in AWS, all the new announcements are constantly coming through. Keep a pulse on that, recommendations on what I would recommend today might be different tomorrow, if a new product comes out or a new way of doing things comes out, it doesn't mean the old one is wrong, but now it may be better, so you have to keep up with what's new in this stuff. Qwiklabs is a great place you can go sign up for free labs, and then the Well Architected Framework link there.

Any questions at this point?

Speaker 1:

What is the quick way to transfer large files from on-prem to AWS?

Ryan Dymek:

Oh, great question. The quick way to transfer, you still have to deal with the size of the data to transfer those, so quick is always relative. There's a few different ways you can do that. You can actually ship your data if it's big enough. Amazon has a few services to do that. There's Snowball, AWS Snowball. There's Snowball Edge, and there's even Snowmobile. It's actually a way to actually transfer up to exabytes of data. It's crazy. That's one option.

Also, if you want to move things into S3, there's multi-part uploads and things like that you can look into, where it will basically split your files and send them in parallel streams, so that the data is maximizing your bandwidth and getting them there. Also, on that real real quick, if you ever need to get your data back, the process is equal. It's the same, you're never stuck. You can get it back just the same as you can put it in.
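The splitting step behind a multi-part upload can be sketched like this. `upload_part` is a hypothetical stand-in that only measures each chunk instead of sending it; the 5 MiB figure is S3's minimum size for every part except the last.

```python
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 * 1024


def plan_parts(total_size: int, part_size: int):
    """Split total_size bytes into (part_number, offset, length) tuples."""
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


def upload_part(data: bytes, part):
    """Hypothetical stand-in for sending one part; returns (part_number, bytes_sent)."""
    part_number, offset, length = part
    chunk = data[offset:offset + length]
    return part_number, len(chunk)


data = bytes(12 * MIB)                   # a 12 MiB dummy payload
parts = plan_parts(len(data), 5 * MIB)   # two full parts plus a smaller final part

# Send the parts over parallel streams, then list them in part-number order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = sorted(pool.map(lambda p: upload_part(data, p), parts))

print(results)
```

Because each part is independent, a failed part can be retried on its own without resending the whole file.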

Speaker 1:

So what is the best way to synchronously replicate data, non-database, across AZs?

Ryan Dymek:

Synchronously replicating across the AZs. All right, so synchronously replicating, if we're talking databases, you just use the native database. I think the question said non-database. Your synchronous replication would mean that the write has to occur at the same time on both ends. So, if you're talking files or things like that, you're probably better off using S3 as an example, as kind of a staging place because S3 would have the data synchronous in all the AZs. Synchronous, though, you're still going to have a concept called "eventual consistency". You can go look it up, we don't have time for it, but that could be a situation that you have to deal with in S3. That's where using Amazon services that span the entire region is going to help you.

Something like DynamoDB is also a good place to store information, so maybe not files, but data, right? And in that case, you could write data to DynamoDB, and it could be consistent across all AZs. All of your AZs could reference that data, and it'd be very consistent there. Yeah, there's a few strategies there. If you just want to keep things in sync, and synchronous is less of a concern for you, if you're okay with slightly asynchronous, there are tools like rsync, or Robocopy on the Windows side, maybe. Those types of tools, and then, of course, you can always load third-party tools on your EC2 instances. There's a lot of third-party partners and tools out there that you could work with.

Speaker 1:

There's a couple of follow-ups here. Looking to replicate a SQL database on EC2 instances across AZs.

Ryan Dymek:

So, specifically, a SQL instance, you really would just use the native SQL replication. That's going to be your best offering. In fact, if you use a service at Amazon called RDS, that might be a way to do that for you, make that easy on you. RDS is kind of the de facto go-to for a lot of organizations running SQL databases, because it handles that replication for you. It handles the cross-AZ, and is taking a lot of the burden off the management. If you don't want to do that, you can always run EC2 instances and load it yourself, but all the stuff that Amazon has done, all that R&D to get that SQL replication set up, you would have to do yourself.

It's not any different in AWS than it is on-prem in that scenario. So just as you can do on-premises replication, and you could do synchronous or asynchronous, depending on if it's Microsoft SQL or if it's Oracle, or MySQL. They all have their own flavors of replication. You would just use the native database replication at that point. You're not going to do anything special. Don't overthink that one. We're not going to try and replicate disks or anything like that. That would get dangerous when it comes to databases, so native replication.

Speaker 1:

How can I get the true breakdown of my cost in AWS?

Ryan Dymek:

Ah, that's a really good one. This one, you're going to actually go into the billing area of AWS inside of your account, and there's an entire set of detailed billing reports and budget tools. There's a cost explorer tool; there's a lot of ways that you can visualize this stuff. If you do the detailed breakdown of your costs, if you actually do the detailed billing report, you can actually get your utilization by resource down to the per hour or less. It's a lot of data. It'll generate CSVs for you that you could actually process. For some companies, that actually turns into big data itself, and they actually create dashboards and things like that around it. Billing in AWS is also completely API driven, so you can make all of your own calls to this stuff as well, and make your own tool bases, and then, of course, third parties.
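Processing those CSVs can start as simply as the sketch below, which totals cost per service. The column names and rows are hypothetical and heavily simplified; real billing exports have many more columns and different headers.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical, simplified rows standing in for a detailed billing export.
SAMPLE_CSV = """service,resource_id,usage_hour,cost
AmazonEC2,i-0001,2024-01-01T00:00,0.0464
AmazonEC2,i-0001,2024-01-01T01:00,0.0464
AmazonS3,example-bucket,2024-01-01T00:00,0.0031
AWSLambda,resize-image,2024-01-01T00:00,0.0002
"""


def cost_by(column: str, csv_text: str) -> dict:
    """Sum the cost column, grouped by the values in the given column."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[column]] += Decimal(row["cost"])  # Decimal avoids float rounding drift
    return dict(totals)


by_service = cost_by("service", SAMPLE_CSV)
for service, total in sorted(by_service.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{service:10s} {total}")
```

Grouping by `resource_id` instead of `service` gives the per-resource view mentioned above.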

There's a lot of offerings out there as well that third parties could tie into your billing and give you advanced billing tools as well. A lot of options there on the billing side, if you really want to see what that breakdown looks like. There is no hidden cost. Amazon is all about you being able to have full control over every single penny that you spend. There is no packaged deals, if you will, where things are going to get buried. That's going to be something that you have extreme details on, and then use that to optimize your code, as an example. Right? If your I/O is too high on S3, if your puts and gets are too high, we use that to look back at our code and see how we can make it better.

Speaker 1:

Is there a tool or process to check for ... I mean, unused storage devices, servers, other services that could reduce the cost?

Ryan Dymek:

That's a good question. That's a really big, big question. As far as unattached, you can, in fact, run calls. You can go into like ... Like inside the EBS management area, and you can see which volumes are not attached to servers, as an example, which are not in use. You do want to also pay attention in S3. A lot of times, there're versions; if you turn versioning on, you can have old versions hanging around. You want to do the version commands or version look-ups. That's where the CLI, in a lot of cases, is going to help you. There's a command line utility that's going to actually give you visibility into things that maybe the web interface doesn't, and that's going to be a good starting point. Start with your bills. Look at your bills and see where that storage is, see where it seems high, and go back and investigate down to those lower levels.

Hey, there's old versions hanging around. Also, if you've done multi-part uploads and things like that in S3, there may be actually parts hanging out there from a failed upload, or something that only went halfway, and you want to remove those old parts in time, so there's some management, maintenance, that you want to do there. Also, when it comes to S3 storage; I know this is very targeted on S3, but there's life cycle management too, so you can set up that life cycle management to say, "Go ahead and transition storage to cheaper storage, delete old stuff," so it's actually good practice to not allow human beings to delete data, but let policy do so, so the policy will kind of hard control that.
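A lifecycle policy covering those three clean-ups might be shaped roughly like this. It is written as a Python dictionary in the structure the S3 lifecycle API accepts (for example through boto3's `put_bucket_lifecycle_configuration`); the rule ID and day counts are placeholder assumptions, not recommendations.

```python
import json

# Hypothetical rule values; the day counts are placeholders.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tidy-old-data",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix applies the rule to the whole bucket
            # Move current objects to cheaper storage after 30 days.
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            # Delete old versions 90 days after they stop being current.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            # Remove leftover parts from uploads that never completed.
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```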

That can be a way I can clean up my data as well, making sure that, again, I'm getting rid of old versions and old parts, and things like that. As far as old volumes sitting in EBS, again, look for unattached volumes, things that are unattached to an EC2 instance, and then snapshots tend to be a really big culprit. That would be a whole other discussion we would have to have, but snapshot management, making sure that your snapshots are well maintained, there's a much bigger discussion to be had there. If you're snap-shotting, a quick note on this one: If you're snap-shotting root volumes, like a C-drive on Windows, or a root volume in Linux, be very, very careful, because those snapshots are going to contain changes as you go, changed information. Think about that.

Your cache, your temporary files, all of that stuff is ... Your paging files, things like that, are actually all going to be part of that snapshot. I am not a fan of snap-shotting root volumes, C-drives, root volumes. I would actually rather create secondary volumes and at that point, snapshot the data and not really worry as much about the operating system. We can handle that one in a separate process, take images of it and do other things, but those are kind of the key areas where people tend to overspend on their storage, is unattached volumes, snapshot data, and old S3 data just kind of sitting around. Those are probably the biggest factors.
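Looking for unattached volumes can be sketched as a simple filter. The sample list below is hypothetical but shaped like the `Volumes` list in an EC2 describe-volumes response, where an unattached volume reports the state `available`; with boto3 the same data would come from `describe_volumes`.

```python
def find_unattached_volumes(volumes):
    """Return (volume_id, size_gib) for volumes not attached to any instance."""
    unattached = []
    for volume in volumes:
        # An unattached EBS volume reports the state "available" and has no attachments.
        if volume.get("State") == "available" and not volume.get("Attachments"):
            unattached.append((volume["VolumeId"], volume["Size"]))
    return unattached


# Hypothetical sample shaped like the "Volumes" list of a describe-volumes response.
sample_volumes = [
    {"VolumeId": "vol-aaa", "Size": 100, "State": "in-use",
     "Attachments": [{"InstanceId": "i-0001"}]},
    {"VolumeId": "vol-bbb", "Size": 500, "State": "available", "Attachments": []},
    {"VolumeId": "vol-ccc", "Size": 50, "State": "available", "Attachments": []},
]

idle = find_unattached_volumes(sample_volumes)
print(idle, "total GiB:", sum(size for _, size in idle))
```

A report like this is only a starting point; each volume still needs a human decision before anything is deleted.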

Speaker 1:

I've hit walls using dynamic PHP on AWS serverless artifacts. Is there any best practices out there for serverless architectures?

Ryan Dymek:

Yes, there are. You can start looking at SAM, the Serverless Application Model. It's actually a whole ... Another way of doing things, something worth looking into. To be honest with you, PHP just tends to not be great for serverless stuff. Usually, people are tending to shift more towards things like JavaScript, Python, and even Java at times, quite a bit as well, but taking a look at that Serverless Application Model, SAM, is something I highly recommend you go take a look at, and there's a lot of background around that and making good process, good automation around it all and having those good practices.

That's a big ... A very big discussion, obviously, in and of itself. I would highly recommend anybody to if you have any of these questions, there's a whole white paper section in AWS. Just do a Google search for AWS white papers. The white paper section, there's probably hundreds in there; you can go find the white papers on serverless architectures, and depending on the specific questions that you might have around that, there're white papers for just about everything you could imagine in there. That's a really good resource as well.
