Cutting Costs in the Cloud: How Can You Save Money on AWS?
Hi. I'm Max Clark. A common conversation I get into with customers is how to save money on their cloud bills. Now there's a lot of horrible terms. A lot of people call these things hyperscalers now.
Speaker 1:When I say cloud, I'm really talking about AWS, Azure, GCP. Go down that list. So we'll just make this really simple, and we'll talk about AWS. Since they're the largest, most people have them, and I can keep the terminology straight in my head.
Speaker 1:So how do you save money on your AWS bill? Oh, boy. Well, I just opened up a whole can of worms. It depends. You thought this was gonna be easy.
Speaker 1:You thought I was gonna give you some secret message, but it depends on everything. It depends on if you have written your application. It depends on if you have time with your engineering teams to modify your applications. It depends on if you did a lift and shift. I mean, it just depends.
Speaker 1:So let's talk about some of these it-depends items. I'm gonna try to bucket these into things you can do if your engineering team has time to do them and things you can do if your engineering team does not have time to do them, and what's common, and the pros and cons of these things. And there's no particular order in this. I'm just gonna kinda go off the cuff here and talk about it. One of the most common things you'll hear about is an EDP, and an EDP is a contract that you sign with AWS.
Speaker 1:It really has three components that you're gonna negotiate. First one being term length. They're usually 3 years. Second one is revenue commitment: what you're committing to AWS under that term. There's usually an escalation of spend.
Speaker 1:If your year 1 is $1,000,000, year 2 you're gonna be at 1.2. Year 3, you're gonna be at 1.4. Right? There's gonna be some sort of escalation or growth within that EDP in order for AWS to give you that discount. With that EDP, you're going to have support.
Speaker 1:You know? The default is AWS is gonna tell you you have to sign up for enterprise support, and what you're gonna negotiate is that rate. This is something that makes me crazy with EDPs: companies that sign an EDP thinking they're getting a great deal. Hey.
Speaker 1:Look. We just saved 10% on our AWS bill. Oh, by the way, we have a 10% support cost that we now have to factor into it. So did you save money? Oh, and guess what?
Speaker 1:We're committed to grow 130% over the next 3 years with AWS. So whoops. Support. I still see EDPs that do not have support in them. It is not a usual thing. If you're in a process and you're getting an EDP, or have an EDP, and you do not have support in it, you should start thinking about budgeting an added 10% for support at your renewal when you're doing this projection with your teams.
Speaker 1:It will save you a lot of pain and heartache if that gets forced on you. And, look, the name of the game used to be that you would threaten AWS that you're gonna go to Google, and maybe they would give you better terms. You know, your mileage may vary on that. Last part of the EDP is PPAs, the private pricing agreements or addendums. And these are things like negotiating discounts for your egress, Direct Connect, you know, managed services components.
Speaker 1:Basically, everything that bills you within AWS. You know, if you've got volume and leverage, you can talk about it. An EDP is a mechanism that can be put in place without engineering time and resources. So it's a, I don't wanna call it an easy button, because it is like heroin. You know? The first hit's free, but then after that, you're going to spiral down into the abyss very quickly and find yourself trapped in the EDP terms.
Speaker 1:And what I mean by trapped is, you know, as you try to save more money, the EDP requires you to spend more money, and you kinda get squeezed in between those two realities. You can't save money on your AWS bill if you're also required to spend more money on your AWS bill. And then you are going to start looking for other things that you could aggressively shift into AWS, including how you spend money via AWS's marketplace. And by the way, that eliminates direct pricing agreements and discounts with those service providers, because now you have to buy through the marketplace, and AWS will probably only give you 25% spend retirement. So if you spend, you know, $100,000 a month on Datadog, AWS is gonna credit you 25K of that on your EDP.
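The EDP arithmetic described so far can be put into a minimal Python sketch. The 10% discount, 10% support charge, 25% marketplace retirement, and the $1.0M to $1.4M commitments are the figures from the talk. Everything else is an assumption for illustration: the names `EdpTerms`, `net_annual_cost`, and `commitment_shortfall` are hypothetical, support is assumed to be charged on the discounted bill, and commitment is assumed to be retired dollar for dollar by AWS spend. Real contracts may define all of these differently.

```python
from dataclasses import dataclass


@dataclass
class EdpTerms:
    discount_pct: int            # negotiated discount on the bill (10 in the talk)
    support_pct: int             # support charge, assumed to apply to the discounted bill
    marketplace_credit_pct: int  # share of marketplace spend that retires commitment (25 in the talk)


def net_annual_cost(list_spend: int, terms: EdpTerms) -> int:
    """Bill after the EDP discount, plus the support charge (whole dollars)."""
    discounted = list_spend * (100 - terms.discount_pct) // 100
    support = discounted * terms.support_pct // 100
    return discounted + support


def commitment_shortfall(commit: int, aws_spend: int,
                         marketplace_spend: int, terms: EdpTerms) -> int:
    """How far short of the committed spend you land (0 if the commitment is met)."""
    retired = aws_spend + marketplace_spend * terms.marketplace_credit_pct // 100
    return max(0, commit - retired)


terms = EdpTerms(discount_pct=10, support_pct=10, marketplace_credit_pct=25)

# A $1,000,000 bill with a 10% discount and 10% support nets out to $990,000.
print(net_annual_cost(1_000_000, terms))

# Year 3 commitment of $1.4M, AWS spend held flat at $1.0M, and $100K a month
# ($1.2M a year) of marketplace spend that only retires 25 cents on the dollar.
print(commitment_shortfall(1_400_000, 1_000_000, 1_200_000, terms))
```

Under these assumptions the "10% saved" bill is only about 1% cheaper once support is added, and flat spend plus $1.2M of marketplace purchases still leaves a $100,000 gap against the year 3 commitment.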
Speaker 1:But EDPs are popular because the finance team or the executive team can do them without having to talk about engineering time. Next one is RIs, reserved instances. RIs come in lots of different flavors. You've got convertible and non-convertible instances. You've got different lengths.
Speaker 1:You know, what seems really easy becomes really complicated really quickly. And with an RI, you've got upfront or non-upfront payments as well. Again, with an RI, you're making a financial commitment to AWS for a set amount of time. And in exchange for that, you get a discount. Think of it like instead of Amazon making 200% or 300% on the cost of their equipment in the data center,
Speaker 1:they're gonna give you a discount and only charge you 150%. But, you know, you have to pay them for 3 years. Some math kinda like that. Who knows what the actual math is, but that's just how you can think about it. RIs, again, can be great because you can do it without engineering time.
Speaker 1:How many instances do we have in our fleet? Fantastic. Let's go out and buy RIs and lower the cost on some of them. What gets scary with RIs is trying to figure out: is your application gonna be static inside of that environment for that span of time? If you're gonna buy an RI and you're gonna buy it for 3 years, is your application not changing for 3 years?
Speaker 1:Is your capacity fixed for 3 years? Are you never going to have less capacity or need a different RI or all these different things? Now there's RI marketplaces and there's convertible instances and there's different strategies around how to deal with RIs, but RIs get really scary from that standpoint of trying to figure out how to gauge and size your RI need and commitments. Remember, the whole point of the cloud is it's supposed to be elastic. Right?
Speaker 1:It's supposed to grow up with you and it's supposed to shrink down with you. And RIs, just like EDPs with spend commitments, become very inelastic in nature. So if you have an application that has steady state, it's probably not a great application to have in the cloud in the first place. We'll come back to that. Okay.
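Why a fixed commitment gets scary when capacity changes can be shown with a small break-even sketch in Python. The talk gives no real RI prices, so the 10 cents an hour rate and the 40% discount here are made-up illustration values, the function names are hypothetical, and the model ignores upfront payment options, convertible RIs, and reselling on an RI marketplace.

```python
HOURS_IN_3_YEARS = 3 * 365 * 24  # 26,280 hours, ignoring leap days


def on_demand_cost(hourly_rate_cents: int, hours_used: int) -> int:
    """On demand: you pay only for the hours the instance actually runs."""
    return hourly_rate_cents * hours_used


def reserved_cost(hourly_rate_cents: int, discount_pct: int, hours_in_term: int) -> int:
    """Reserved: you pay the discounted rate for every hour of the term, used or not."""
    return hourly_rate_cents * (100 - discount_pct) * hours_in_term // 100


def break_even_utilization_pct(discount_pct: int) -> int:
    """Below this share of the term in use, the commitment costs more than on demand."""
    return 100 - discount_pct


# Hypothetical instance: 10 cents an hour on demand, 40% RI discount, 3-year term.
committed = reserved_cost(10, 40, HOURS_IN_3_YEARS)
half_used = on_demand_cost(10, HOURS_IN_3_YEARS // 2)

print(committed)                       # cents paid under the RI no matter what
print(half_used)                       # cents paid on demand if only half the hours are needed
print(break_even_utilization_pct(40))  # utilization needed for the RI to pay off
```

In this made-up case the RI only wins if the instance is needed for more than 60% of the three years; at 50% use, staying on demand would have been cheaper.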
Speaker 1:So you have RIs. You have savings plans. Savings plans are, again, a non-engineering method where your finance team can go and commit to writing a check to AWS. And in exchange for writing a very big check to AWS, they will give you a discount on your services. By the way, RIs and savings plans do not apply universally across everything inside of your AWS environment.
Speaker 1:So you need to make sure what you're actually running will qualify for a discount with an RI or savings plan. There's some cool things you can do with savings plans. They're really easy. There's some interesting partners in the market that'll help you offset the upfront purchase of it. So that way you're not writing a huge check and you're still realizing some savings out of it.
Speaker 1:And, you know, they'll help you size the RI, or sorry, size the savings plan, or figure out what you'd actually save in the savings plans. You know, are you running EC2 or managed services on top of EC2 or ECS or EKS or, you know, any of these things? And how does that actually apply based on what you're actually running? It depends. Spot's a really interesting animal.
Speaker 1:And if you've got an application that can auto scale up and down and, let's call it relatively predictably, needs capacity, Spot is just a financial technique that AWS created to be able to sell, you know, underutilized compute resources at a lower rate to try to stimulate consumption. Right? So if you've got an instance sitting idle, you can sell that as an RI with a commitment, you can do on demand, or you can do Spot. And the idea with Spot is instead of having it just sitting idle and not making any revenue for them, they can sell it at, let's call it, a lower margin, because it's not selling at a negative.
Speaker 1:They can sell it for a lower margin to you. Now, the good thing with Spot is you can save a ton of money. Hashtag if your application supports it. The bad thing with Spot is as soon as a customer comes along and wants an on-demand instance or needs an RI, AKA is going to pay Amazon more money than you're paying them in Spot, guess where that comes from?
Speaker 1:Turned-off Spot instances. You then need tooling to deal with that. And now AWS has been adding more and more functionality into their Spot Fleet. And then there's third parties that try to give you some signals and information. So, I mean, let's just say you've got, you know, 500 instances of EC2 running in Spot.
Speaker 1:It'd be a little disruptive if all 500 were turned off on you at the same exact instant without you, you know, creating any other capacity in your environment. And so there's mechanisms to actually try to predict when those events are going to occur. They're not great. They're better than nothing. And then you backfill that with on-demand instances.
Speaker 1:That way you don't have a performance hit to your application or other bad things happen to you. Okay. So that's Spot. This next one I kind of put in between needing engineering resources and time and not. And this is the managed services within AWS.
Speaker 1:And so if you take, what's a great one? Let's talk about RDS and Kinesis. Those are my two favorites. So what is Kinesis? Kinesis is AWS's black box managed Kafka environment for you.
Speaker 1:EKS is Amazon's black box managed Kubernetes for you. RDS is Amazon's, you know, black box managed database service for you. They're all great services. I mean, Amazon has done a phenomenal job with these things. AWS managed services.
Speaker 1:Kinesis is awesome. RDS is awesome. EKS is awesome. Right? If you don't have the tooling or team or wherewithal to manage these things yourself, yeah, you can just go and click in the console or go fire off the API and go boom, boom, boom, click, click, click, and boom, you've got it.
Speaker 1:Right? Now for that black box managed service, you know, gift-wrapped thing, you're paying a premium. And as your environment grows, I mean, what would we see in an environment? Right? So let's say you're spending $2,000,000 a month in Amazon.
Speaker 1:Call it you're spending 800K a month on EC2 or EC2 derivatives. And then you're probably spending, I don't know, $200,000, $300,000, $400,000 a month on Kinesis. That's, you know, kinda usual, typical math. Right? So maybe 10% of your AWS spend is on Kinesis in an application.
Speaker 1:Maybe higher, maybe lower, but let's just call it 10%. There's a lot of overhead and waste in that bill. So $200,000 a month in Kinesis. If you come off of Kinesis and you just run Kafka on top of EC2, it's gonna be a lot cheaper for you. Now this becomes a question.
Speaker 1:The question is, do you devote engineering and DevOps time to actually go out and create and run Kafka yourself and manage it yourself? Or do you bring in a partner who can give you a managed Kafka service on top of EC2 in your AWS environment and charge you to manage Kafka, and it's still a quasi black box to you? So there's a lot of examples of these things. You know, what's another example? ScyllaDB, for instance, a phenomenal database.
Speaker 1:There are API-compatible replacements for a lot of, you know, different managed databases within AWS. Right? Like, you know, you can run MySQL yourself. You can run it via managed service. You can run PostgreSQL yourself.
Speaker 1:You can run it via managed service. That trend, of course, comes from, like, black box managed service down into managed service with a partner, down to managed service with yourself. Right? We're moving as a CTO said to me once upon a time, I wanna get closer to the metal. Right?
Speaker 1:Up here, lots of stacks of margin built into it that you're paying for. As you come down closer and closer and closer, you eradicate those things, but you have to manage more yourself. You know, it's a scale. Right? And then it's not one for one.
Speaker 1:It's not like for every $1 you move here, you spend $1 over here in management. You know, it's, you know, orders of magnitude. Right? As this decreases, this one kinda slowly comes up. So, again, you can do this yourself or you can do it with a partner.
Speaker 1:And then the leverage that you can get out of a partner and out of an MSP to manage these things, you know, it's worth your while to talk about it. Let's just put it that way. Things that you have to do, oh, so here's another one that you can do without engineering. You can use a SaaS tool to help you auto size your instances. And this is profiling.
Speaker 1:Hey. You're buying this many of this size instance, and looking at the performance and utilization of these instances, instead of buying that instance you should buy this other instance. So there's a lot of interesting intelligence and things that you can do about that, like, did you size your instances properly in the first place? There's tools that can auto shut down unused instances, commonly marketed towards, like, development environments, staging and QA environments, test environments, things that don't actually run your production application but have to run during the day, or maybe don't have to run over the weekend. You know, if you can eradicate 4 weekends a month, so 8 out of 30 days, that has a financial impact for you as you start, you know, scaling that out over the course of the year.
Speaker 1:If you can shut things down at night, does that impact, you know, your finances? Right? So you've got, you know, 8 days out of 30 just in the weekends, but then you say, okay, my team only works 10 hours a day, so we've got another 14 hours a day that we can turn this stuff off. Right?
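The weekend and overnight arithmetic above can be sketched in a few lines of Python. The 30-day month, 8 weekend days, and 10-hour working day are the figures from the talk. The function names are hypothetical, and the sketch assumes the bill scales with running hours, which holds for instance compute but not for attached storage, which keeps billing while an instance is stopped.

```python
def running_hours(days_in_month: int = 30, weekend_days: int = 8,
                  hours_per_running_day: int = 24) -> int:
    """Hours a non-production environment stays up in a month."""
    return (days_in_month - weekend_days) * hours_per_running_day


def savings_pct(days_in_month: int = 30, weekend_days: int = 8,
                hours_per_running_day: int = 24) -> float:
    """Share of always-on instance hours avoided by the shutdown schedule."""
    total = days_in_month * 24
    running = running_hours(days_in_month, weekend_days, hours_per_running_day)
    return 100 * (total - running) / total


# Weekends off only: 22 days x 24 hours still running.
print(running_hours(), round(savings_pct(), 1))

# Weekends off and only up for a 10-hour working day.
print(running_hours(hours_per_running_day=10),
      round(savings_pct(hours_per_running_day=10), 1))
```

Turning things off on weekends alone removes a bit over a quarter of the instance hours; adding the 14 overnight hours on working days removes more than two thirds.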
Speaker 1:You can see where we're going with this. Some of these tools will help you find unattached EBS volumes, for instance. And surprisingly, boy, there's a lot of them in environments. If you're firing up EBS and you destroy an EC2 instance and you don't have a way of reclaiming or destroying that EBS volume as part of that EC2 instance going away, chances are you've got a lot of EBS volumes that are unattached, and running some sort of tool to go out and find them and kill them for you saves lots of money. Right?
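As a rough idea of what such a tool does for unattached EBS volumes, here is a minimal Python sketch. It relies on the EC2 `DescribeVolumes` response shape, where a volume that is not attached to any instance reports the state `available`. The function names are hypothetical, `boto3` is a third-party library that has to be installed and configured with credentials, and the sketch only reports candidates; it deliberately deletes nothing.

```python
def unattached_volumes(volumes: list) -> list:
    """Filter DescribeVolumes-style dicts down to volumes with no attachment."""
    return [
        v for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def fetch_volumes(region_name: str) -> list:
    """Pull every EBS volume in one region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the filter above works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    volumes = []
    for page in ec2.get_paginator("describe_volumes").paginate():
        volumes.extend(page["Volumes"])
    return volumes


# Made-up sample data in the same shape the API returns.
sample = [
    {"VolumeId": "vol-aaa", "State": "in-use", "Size": 100,
     "Attachments": [{"InstanceId": "i-123"}]},
    {"VolumeId": "vol-bbb", "State": "available", "Size": 500, "Attachments": []},
    {"VolumeId": "vol-ccc", "State": "available", "Size": 250, "Attachments": []},
]

orphans = unattached_volumes(sample)
print([v["VolumeId"] for v in orphans])
print(sum(v["Size"] for v in orphans), "GiB unattached")
```

Against a real account, `unattached_volumes(fetch_volumes("us-east-1"))` would produce the same kind of list; whether a given volume is safe to delete is still a judgment call for whoever owns it.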
Speaker 1:K. So these are all, like, non-engineering stuff. Now we get into the engineering stuff. And these are the things like, if you did a lift and shift of your application and you took an on-premises application to the cloud, it's not cloud native. What does that even mean?
Speaker 1:What that means from the cloud vendors is they want you to be using serverless tools. In AWS, they want you on Lambda, and they want you on, you know, a modern stack. And maybe that's good for your application. Maybe it's not. There is a lot of inherent lock-in when you go all the way serverless.
Speaker 1:It's maybe not a great idea for you. Containers are phenomenal for that because you still have portability of your application between clouds as long as you're not dependent on specific managed services from that cloud. So, you know, Google has an alternative to RDS, they have an alternative to Dynamo. They're different. I mean, in the RDS world, like, MySQL is MySQL is MySQL for the most part.
Speaker 1:Dynamo versus Bigtable are different. You know, there's just some realities. Right? You know, going through an architecture review and application modernization, this becomes a big engineering lift. And depending on your application and your engineering team, again, you can find partners and you can find a third party to help you go through that process and do that work for you.
Speaker 1:And maybe it's something that you have to run that you're not really investing in strategically. But, you know, if you address it and you make some changes to it and you containerize that application, it's gonna give you a lot of benefits. So maybe you bring a third party in to actually do that work for you and deliver you a containerized environment at the end of the day. So that's another example. But usually, when you start talking about these things, it's a lot of engineering time.
Speaker 1:Data. The way data flows around in your environment. Oh, boy. You should really take, I mean, when I say take a fine-tooth comb to this, you know, find a whiteboard or a really big piece of paper and start drawing it. I have seen environments that go Internet to load balancer to EC2 instances to load balancer to Kinesis pipeline to load balancer to Lambda gateways back to load balancer back to other EC2 instances back to Dynamo back to, you know, and then to a Lambda firing off.
Speaker 1:I mean, you know, it sounds like I'm making this up, but I'm really not. Look at that architecture and how that data is flowing. Data's expensive in cloud. Moving data is expensive in cloud. And figure out how your data is moving around inside of that cloud environment. You know, managed NAT gateways. I mean, jeez Louise, you wanna talk about a waste of money.
Speaker 1:You're using managed NAT gateways. Figure out how to not use managed NAT gateways. I mean, that's gonna save you a small fortune. Figure out how to use Glacier when you're using S3, and have tiering on your storage. Make sure you're setting TTLs in DynamoDB if you're not.
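For the S3 tiering and DynamoDB TTL points, here is a minimal Python sketch of what the configuration could look like with `boto3`. The two API calls, `put_bucket_lifecycle_configuration` and `update_time_to_live`, are real boto3 client methods; everything else is an assumption for illustration, including the helper names, the `logs/` prefix, the 90-day threshold, the `expires_at` attribute, and the bucket and table names. DynamoDB expects the TTL attribute to hold an expiry time as a Unix epoch timestamp in seconds.

```python
def glacier_lifecycle(prefix: str, days_until_glacier: int) -> dict:
    """S3 lifecycle configuration that moves objects under a prefix to Glacier."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days_until_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def ttl_specification(attribute_name: str) -> dict:
    """DynamoDB TTL settings: items expire once the named epoch-seconds attribute passes."""
    return {"Enabled": True, "AttributeName": attribute_name}


def apply_settings(bucket: str, table: str) -> None:
    """Push both settings to AWS (needs boto3, credentials, and real resource names)."""
    import boto3  # imported lazily so the builders above run without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_lifecycle("logs/", 90),
    )
    boto3.client("dynamodb").update_time_to_live(
        TableName=table,
        TimeToLiveSpecification=ttl_specification("expires_at"),
    )


print(glacier_lifecycle("logs/", 90))
print(ttl_specification("expires_at"))
```

Calling `apply_settings("example-bucket", "example-table")` would apply both; the right prefix and retention window depend entirely on how the data is actually read, and Glacier retrieval has its own costs and delays.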
Speaker 1:All these things, like data and the movement of data, people don't think about it. They think about instance cost and instance sizing, how much they spend in EC2. Maybe they think about what they're spending in their S3 buckets. But, like, actually moving data in and out of the cloud and between cloud services and between AZs and regions and stuff like that.
Speaker 1:It's crazy. It's crazy what AWS charges for this. Let's see, what else do I wanna talk about here within AWS? You know, the idea of multi-cloud is interesting. It maybe makes sense for you or maybe doesn't.
Speaker 1:You know, our large AWS customers almost always end up with machine learning pipelines within Google. Maybe it's BigQuery or Bigtable for data warehouse and analytics, and then maybe it's TensorFlow. It's very situational, again, whether the cost of moving data out of AWS and putting it into GCP, at the size of it, works for you. I'll also tell you that I've had a lot of customers get into trouble with EDPs where they've had to abandon their Google infrastructure and bring it back into AWS and spend a lot of engineering hours to bring that application back and shift from Bigtable to Redshift or whatever they were doing, because they hit an EDP spending, you know, shortfall. That's a big, big trap, you know, you don't wanna walk into.
Speaker 1:Bare metal and colocation are the ultimate expressions of, like, saving money on cloud. This sounds crazy because, of course, it's not cloud. But if you have an application, and most applications hit a steady state, you know, there's only a few Amazon.coms running Black Friday sales. Right? And by the way, you can still have an application running in a data center and you can still use AWS for elasticity and for capacity.
Speaker 1:And this is not a scary thing. And you can use the same tools. You can use Terraform. You can use CloudFormation. You can have an ALB and EC2, you know, or sorry, an ALB in AWS directing traffic between your bare metal or colocated infrastructure and AWS.
Speaker 1:And there's lots of different ways to skin this cat depending on what you're trying to achieve. I will tell you, if you have a generic containerized application serving traffic to the Internet and you're running EKS, and you look at that application running in AWS and you look at that application running on, let's go out and say we bought Supermicro or Dell servers or whatever it is, and you put it into a data center, the cost differential between AWS and your equipment that you purchased and put into a data center is 87%. So if you're spending $100,000 a month in AWS, you'll spend $13,000 a month in a data center. Now, of course, you know, this makes the most sense if it was, like, $1,000,000 a month in AWS and I can spend $130,000 a month in a data center. Right?
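The data center comparison is simple enough to check in Python. The 87% differential and the $100,000 and $1,000,000 monthly bills are the figures quoted in the talk; the function names are hypothetical, and the sketch takes the 87% at face value without modeling migration effort, staffing, or how the equipment purchase is amortized.

```python
def data_center_monthly_cost(aws_monthly: int, differential_pct: int = 87) -> int:
    """Monthly cost after moving, given the quoted cost differential."""
    return aws_monthly * (100 - differential_pct) // 100


def monthly_savings(aws_monthly: int, differential_pct: int = 87) -> int:
    """Dollars per month no longer spent in AWS."""
    return aws_monthly - data_center_monthly_cost(aws_monthly, differential_pct)


for aws_bill in (100_000, 1_000_000):
    print(aws_bill, data_center_monthly_cost(aws_bill), monthly_savings(aws_bill))
```

At $1,000,000 a month the quoted differential works out to $130,000 a month in the data center, a difference of $870,000 a month.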
Speaker 1:Like, that's a huge delta, and that's worth the engineering time. You're gonna save $800,000 a month. I mean, go hire some engineers and go make this transition happen. Right? Other math that we see is going from data centers to bare metal, where you don't buy the equipment, somebody else operates the equipment, maybe they give you a managed Kubernetes or they give you, you know, Terraform APIs to maintain this environment.
Speaker 1:You don't have to worry about, like, going out and buying servers and signing the colocation agreement, managing bandwidth, or bringing a partner in for that. You're just gonna have boxes sitting somewhere. It's an interesting euphemism. So if you're spending $1,000,000, you're gonna save $720,000 a month going from AWS to your own data center or going to a bare metal provider. Now look.
Speaker 1:There's a lot of caveats on that, and again, it depends on everything. If your application fits that mold and you can do it, it is more than likely worth the effort and energy for you to make, because, I mean, just think about that. Think about that in terms of your cost structure and your financial stack. Like, what do you spend? What is your overhead to deliver your application for your business to function?
Speaker 1:Now eradicate 70% of that cost and tell me what that does for your business. Lots of really wonderful things is the answer to that one. So there's no one-size-fits-all answer to this question of how do you save money in AWS. It is so situationally dependent on what you're running, where you're running it, what your capacity is, how you take advantage of these different programs, can you bring in a partner, can you not bring in a partner, do you have engineering resources? Do you not have engineering resources?
Speaker 1:What kind of timelines are you working with? What kind of pressure are you under? You know, anybody that's giving you a promise of one size fits all is lying to you. Don't believe them. Run away.
Speaker 1:Run away. Run away. Run away. But if there's something you'd like to chat about, I love having this conversation. It's always interesting to get into the bowels of an AWS platform and really talk with companies about what they're doing, how they're doing it, and what tweaks they can make to their platform to do what they actually wanna do, which is just win, dominate their market, grow, be more successful, generate more profit, cut their burn.
Speaker 1:You know, whatever phrasing you wanna use, it's all available to you. And the trick is just, you know, again, no engineering time, engineering time, no engineering time with partners, no engineering time without partners, yada yada yada. There's some scale that you're gonna go up and down. I'm Max Clark. That's a, you know, quick rant on saving money on your AWS bills in the cloud.
Speaker 1:Hope it helps.